Sunday, November 22, 2009

Linear Regression: Diamond Carat Weight and Price



I was interested in seeing exactly how the weight of a diamond relates to the price. I gathered some data and did a simple linear regression.

I collected a random sample of loose diamonds listed for sale on BlueNile.com. The sample was taken from all “round” cut diamonds graded as having very slight inclusions (clarity VS1), color ranging from D to H (near colorless), and carat weights of .25-.50. Blue Nile had 614 individual stones that matched these criteria. I labeled the diamonds 1-614 and took a random sample of 40 using a random number generator. I then recorded the size in carats (x) and price in dollars (y) of each selected diamond. The data are listed below.

Size (ct.) Price ($)
0.27 509
0.27 509
0.28 518
0.25 582
0.31 597
0.35 660
0.36 661
0.32 670
0.32 670
0.36 678
0.3 696
0.36 700
0.33 727
0.28 747
0.31 782
0.31 782
0.3 800
0.33 808
0.33 829
0.3 849
0.37 851
0.32 852
0.32 859
0.39 890
0.39 890
0.41 981
0.4 1017
0.4 1068
0.41 1071
0.39 1113
0.39 1113
0.42 1119
0.42 1310
0.42 1331
0.41 1398
0.43 1476
0.42 1516
0.46 1543
0.46 1595
0.48 1803

Interpretation of Regression Coefficients



The y-intercept is b0 = -836.79 and the slope is b1 = 4950.64. These numbers might seem unusual, but the slope is easy to think about once it is rescaled: each .01 ct increase in weight adds about $49.51 to the predicted price of these diamonds. The intercept has no practical meaning of its own, since a weight of zero is far outside the data; it only positions the line.
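As a rough check, the coefficients can be recomputed from the table with the usual least-squares formulas. This is a minimal sketch in plain Python, not the Gnumeric calculation itself; the names carats, prices and fit_line are my own. If the table is transcribed correctly, it should give an intercept close to -836.79 and a slope close to 4950.64.

```python
# Carat weights (x) and prices in dollars (y), copied in order from the table.
carats = [0.27, 0.27, 0.28, 0.25, 0.31, 0.35, 0.36, 0.32, 0.32, 0.36,
          0.30, 0.36, 0.33, 0.28, 0.31, 0.31, 0.30, 0.33, 0.33, 0.30,
          0.37, 0.32, 0.32, 0.39, 0.39, 0.41, 0.40, 0.40, 0.41, 0.39,
          0.39, 0.42, 0.42, 0.42, 0.41, 0.43, 0.42, 0.46, 0.46, 0.48]
prices = [509, 509, 518, 582, 597, 660, 661, 670, 670, 678,
          696, 700, 727, 747, 782, 782, 800, 808, 829, 849,
          851, 852, 859, 890, 890, 981, 1017, 1068, 1071, 1113,
          1113, 1119, 1310, 1331, 1398, 1476, 1516, 1543, 1595, 1803]


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # slope: dollars per carat
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through the means
    return b0, b1


b0, b1 = fit_line(carats, prices)
print(f"intercept b0 = {b0:.2f}")
print(f"slope     b1 = {b1:.2f}")
print(f"each extra .01 ct adds about ${b1 * 0.01:.2f}")
```

The slope is in dollars per full carat, which is why it looks so large; multiplying by .01 puts it on the scale of the weights actually in the sample.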

Interpretation of R and R^2

The correlation coefficient, R, is .8836, which indicates a strong positive linear association between the weight of a diamond and its price. The coefficient of determination, R^2, is .7807. This indicates that about 78.1% of the variation in the price of these diamonds is accounted for by the weight of the stones, leaving 21.9% of the variation in the residuals. This is consistent with the linear association seen in the scatter plot.
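R and R^2 can be recomputed from the same table. The sketch below is plain Python with names of my own choosing (correlation, r_squared); it assumes the table is transcribed correctly and should land close to .8836 and .7807.

```python
import math

# Carat weights (x) and prices in dollars (y), copied in order from the table.
carats = [0.27, 0.27, 0.28, 0.25, 0.31, 0.35, 0.36, 0.32, 0.32, 0.36,
          0.30, 0.36, 0.33, 0.28, 0.31, 0.31, 0.30, 0.33, 0.33, 0.30,
          0.37, 0.32, 0.32, 0.39, 0.39, 0.41, 0.40, 0.40, 0.41, 0.39,
          0.39, 0.42, 0.42, 0.42, 0.41, 0.43, 0.42, 0.46, 0.46, 0.48]
prices = [509, 509, 518, 582, 597, 660, 661, 670, 670, 678,
          696, 700, 727, 747, 782, 782, 800, 808, 829, 849,
          851, 852, 859, 890, 890, 981, 1017, 1068, 1071, 1113,
          1113, 1119, 1310, 1331, 1398, 1476, 1516, 1543, 1595, 1803]


def correlation(xs, ys):
    """Pearson correlation coefficient R between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


r = correlation(carats, prices)
r_squared = r * r  # in simple linear regression, R^2 is the square of R
print(f"R   = {r:.4f}")
print(f"R^2 = {r_squared:.4f} ({r_squared:.1%} of the variation in price)")
```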

Sample Predictions of Price from Weight

The regression equation can now be used to predict the price of a diamond from its weight. As an example, take a diamond of .25 carats, which is within the range of weights in the sample.

4950.6(.25)-836.79=400.86

So it would be reasonable to expect to pay $400.86 for a .25 ct diamond.

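None of the randomly selected diamonds in this data set were .5 ct (the heaviest is .48 ct), but .5 ct is still inside the .25-.50 range the sample was drawn from, so I can use the equation to predict a price: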

4950.6(.5)-836.79=1638.51

The equation predicts the cost of a .5 ct diamond to be $1638.51, so that is a price that I could reasonably expect to pay for a .5 ct stone.
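The two predictions above are just the regression equation evaluated at two weights. Here is a small sketch of that arithmetic, using the rounded coefficients from the worked examples; the function names and the range check are my own additions, with the range taken from the lightest and heaviest stones in the table.

```python
B0 = -836.79   # intercept, as reported above
B1 = 4950.6    # slope, rounded the same way as in the worked examples

# Lightest and heaviest stones that actually appear in the sample.
MIN_CT, MAX_CT = 0.25, 0.48


def predicted_price(carat):
    """Predicted price in dollars for a stone of the given carat weight."""
    return B0 + B1 * carat


def within_sample(carat):
    """True if the weight lies inside the range of weights in the sample."""
    return MIN_CT <= carat <= MAX_CT


for ct in (0.25, 0.50):
    note = "" if within_sample(ct) else " (just beyond the heaviest sampled stone)"
    print(f"{ct:.2f} ct: ${predicted_price(ct):.2f}{note}")
```

The check is only a reminder that a prediction gets less trustworthy the further the weight sits from the stones that were actually sampled.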

Confidence Interval for Predicted Mean Value

There is 95% confidence that the average price of a .25 ct diamond falls between $293.22 and $508.50. The interval is not too broad and gives an idea of the average prices for a diamond that is .25 ct.

Prediction Interval for Individual Predicted Value

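There is 95% confidence that the price of an individual .25 ct diamond will fall between roughly $65 and $737. This interval is much broader than the one for the average price, because it has to cover the stone-to-stone scatter around the regression line (a residual standard error of about $157) as well as the uncertainty in the line itself. It gives only a rough idea of the prices for a .25 ct stone, with the understanding that a price below the lower limit would be unusually cheap and a price above the upper limit unusually expensive for a stone in this category.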
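Both intervals come from the same fit and differ only in their standard error: the interval for an individual stone adds the residual scatter to the uncertainty in the mean. The sketch below recomputes them from the table in plain Python. The critical value 2.024 (97.5th percentile of t with 38 degrees of freedom) is taken from a t-table and is an assumption here, so results can differ by a few dollars from software output; the function and variable names are my own.

```python
import math

# Carat weights (x) and prices in dollars (y), copied in order from the table.
carats = [0.27, 0.27, 0.28, 0.25, 0.31, 0.35, 0.36, 0.32, 0.32, 0.36,
          0.30, 0.36, 0.33, 0.28, 0.31, 0.31, 0.30, 0.33, 0.33, 0.30,
          0.37, 0.32, 0.32, 0.39, 0.39, 0.41, 0.40, 0.40, 0.41, 0.39,
          0.39, 0.42, 0.42, 0.42, 0.41, 0.43, 0.42, 0.46, 0.46, 0.48]
prices = [509, 509, 518, 582, 597, 660, 661, 670, 670, 678,
          696, 700, 727, 747, 782, 782, 800, 808, 829, 849,
          851, 852, 859, 890, 890, 981, 1017, 1068, 1071, 1113,
          1113, 1119, 1310, 1331, 1398, 1476, 1516, 1543, 1595, 1803]

T_CRIT = 2.024  # assumed: t critical value for 95% confidence, 38 df


def intervals_at(x0, xs, ys, t_crit):
    """Return (y_hat, mean_ci, prediction_interval) at weight x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual standard error, with n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)       # uncertainty in the mean price
    se_pred = s * math.sqrt(1 + leverage)   # adds scatter of a single stone
    mean_ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pred_int = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, mean_ci, pred_int


y_hat, mean_ci, pred_int = intervals_at(0.25, carats, prices, T_CRIT)
print(f"predicted price at .25 ct: ${y_hat:.2f}")
print(f"95% CI for the mean price: ${mean_ci[0]:.2f} to ${mean_ci[1]:.2f}")
print(f"95% prediction interval:   ${pred_int[0]:.2f} to ${pred_int[1]:.2f}")
```

Because .25 ct is the lightest weight in the sample, it sits far from the mean weight, which widens both intervals compared with a stone near the middle of the range.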

Confidence Interval for Slope

There is 95% confidence that the slope of the true regression line is between 4086.51 and 5814.77. This means that we can be 95% confident that the predicted price of a diamond rises between $40.87 and $58.15 for every additional .01 ct in weight. Because the whole interval lies above zero, it can be concluded that there is a positive linear association between the variables.
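The slope interval is the estimate plus or minus a critical value times its standard error. Here is a short sketch of that arithmetic, using the slope and SE(b1) = 425.68 reported in the Mechanics section below. The critical value 2.024 (t-table, 38 degrees of freedom) is my assumption; the interval quoted above corresponds to a slightly larger value, so expect agreement only to within a few dollars per carat.

```python
B1 = 4950.64     # estimated slope, dollars per carat
SE_B1 = 425.68   # standard error of the slope, from the Mechanics section
T_CRIT = 2.024   # assumed: t critical value for 95% confidence, 38 df

margin = T_CRIT * SE_B1
slope_low = B1 - margin
slope_high = B1 + margin

print(f"95% CI for the slope: {slope_low:.2f} to {slope_high:.2f}")
# Rescale to the price change per .01 ct, the unit used in the text.
print(f"per .01 ct: ${slope_low * 0.01:.2f} to ${slope_high * 0.01:.2f}")
```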

Hypothesis:
H0: β1 = 0. The null hypothesis is that there is no linear association between the price of a diamond and its carat weight (the slope is zero).

Ha: β1 ≠ 0. The alternative hypothesis is that there is a linear association between the price of a diamond and its carat weight (the slope is not zero).

Model:
All of the linear regression t-test conditions are met:
The scatter plot appears linear
The residual plot has no apparent pattern
The residuals have a roughly constant spread
The normal probability plot appears basically straight

Mechanics:
The test statistics were calculated with Gnumeric. The calculation is also shown below.

t = (b1 - 0)/SE(b1) = 4950.64/425.68 = 11.63

P = P(|t| > 11.63) = 4.37E-14, or about 0.00000000000004
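The t statistic and P-value can be checked without a spreadsheet. The sketch below is not how Gnumeric does it: it integrates the density of the t distribution with 38 degrees of freedom (n - 2) numerically using Simpson's rule, which is a stand-in technique chosen to keep the example in plain Python. The function names and the integration settings are my own assumptions.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def upper_tail(t, df, upper_mult=20.0, steps=20000):
    """Approximate P(T > t) with Simpson's rule on [t, t * upper_mult].

    The density beyond t * upper_mult is negligible for the values used
    here, so the integral is cut off there. steps must be even.
    """
    a, b = t, t * upper_mult
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return total * h / 3


B1 = 4950.64     # estimated slope
SE_B1 = 425.68   # standard error of the slope
DF = 40 - 2      # sample size minus the two estimated coefficients

t_stat = (B1 - 0) / SE_B1
p_value = 2 * upper_tail(t_stat, DF)  # two-sided: both tails count

print(f"t = {t_stat:.2f}")
print(f"P = {p_value:.2e}")
```

The factor of 2 reflects the two-sided alternative: a slope this far below zero would count against the null hypothesis just as much as one this far above it.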

Conclusion:
With this very small P-value, the null hypothesis is rejected. If the actual slope were zero, the probability of observing a slope at least as far from zero as 4950.64 would be extremely small. This serves as significant evidence that there is a positive linear relationship between diamond weight and price.
