import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Price = -5,269 + 8,413 x Carat + 158.1 x Cut + 454 x Clarity
diamonds = pd.read_csv('data/diamonds.csv')
new_diamonds = pd.read_csv('data/new-diamonds.csv')
diamonds
| | Unnamed: 0 | carat | cut | cut_ord | color | clarity | clarity_ord | price |
---|---|---|---|---|---|---|---|---|
0 | 1 | 0.51 | Premium | 4 | F | VS1 | 4 | 1749 |
1 | 2 | 2.25 | Fair | 1 | G | I1 | 1 | 7069 |
2 | 3 | 0.70 | Very Good | 3 | E | VS2 | 5 | 2757 |
3 | 4 | 0.47 | Good | 2 | F | VS1 | 4 | 1243 |
4 | 5 | 0.30 | Ideal | 5 | G | VVS1 | 7 | 789 |
5 | 6 | 0.33 | Ideal | 5 | D | SI1 | 3 | 728 |
6 | 7 | 2.01 | Very Good | 3 | G | SI1 | 3 | 18398 |
7 | 8 | 0.51 | Ideal | 5 | F | VVS2 | 6 | 2203 |
8 | 9 | 1.70 | Premium | 4 | D | SI1 | 3 | 15100 |
9 | 10 | 0.53 | Premium | 4 | D | VS2 | 5 | 1857 |
10 | 11 | 0.39 | Premium | 4 | H | SI1 | 3 | 834 |
11 | 12 | 1.50 | Very Good | 3 | H | SI1 | 3 | 7708 |
12 | 13 | 1.00 | Premium | 4 | E | VS2 | 5 | 6272 |
13 | 14 | 1.29 | Ideal | 5 | J | VS1 | 4 | 5676 |
14 | 15 | 2.01 | Good | 2 | D | SI2 | 2 | 16776 |
15 | 16 | 1.13 | Ideal | 5 | G | VS1 | 4 | 7404 |
16 | 17 | 0.70 | Ideal | 5 | I | SI2 | 2 | 1702 |
17 | 18 | 0.38 | Very Good | 3 | I | VS1 | 4 | 606 |
18 | 19 | 1.17 | Ideal | 5 | H | SI2 | 2 | 5423 |
19 | 20 | 1.51 | Premium | 4 | F | SI1 | 3 | 8033 |
20 | 21 | 0.40 | Ideal | 5 | D | VVS1 | 7 | 1279 |
21 | 22 | 0.41 | Very Good | 3 | F | VS2 | 5 | 863 |
22 | 23 | 0.51 | Ideal | 5 | G | VVS1 | 7 | 1893 |
23 | 24 | 1.00 | Premium | 4 | H | SI2 | 2 | 3584 |
24 | 25 | 1.09 | Ideal | 5 | F | VVS2 | 6 | 10196 |
25 | 26 | 0.39 | Good | 2 | E | VS1 | 4 | 1082 |
26 | 27 | 0.72 | Premium | 4 | E | VS2 | 5 | 3024 |
27 | 28 | 1.14 | Very Good | 3 | E | SI2 | 2 | 5593 |
28 | 29 | 0.30 | Ideal | 5 | D | VS2 | 5 | 710 |
29 | 30 | 0.30 | Ideal | 5 | H | SI1 | 3 | 465 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
49970 | 49971 | 1.83 | Premium | 4 | H | SI1 | 3 | 10162 |
49971 | 49972 | 1.11 | Ideal | 5 | I | VS2 | 5 | 5506 |
49972 | 49973 | 1.00 | Very Good | 3 | E | SI2 | 2 | 3763 |
49973 | 49974 | 0.61 | Ideal | 5 | D | VVS1 | 7 | 3625 |
49974 | 49975 | 0.80 | Very Good | 3 | G | SI2 | 2 | 2451 |
49975 | 49976 | 0.53 | Ideal | 5 | E | SI1 | 3 | 1564 |
49976 | 49977 | 0.34 | Very Good | 3 | E | VS2 | 5 | 659 |
49977 | 49978 | 1.22 | Ideal | 5 | H | VS2 | 5 | 7584 |
49978 | 49979 | 0.57 | Very Good | 3 | J | VS1 | 4 | 1270 |
49979 | 49980 | 0.58 | Very Good | 3 | I | VVS1 | 7 | 1790 |
49980 | 49981 | 0.32 | Ideal | 5 | I | SI2 | 2 | 371 |
49981 | 49982 | 0.42 | Premium | 4 | D | SI1 | 3 | 1040 |
49982 | 49983 | 0.35 | Very Good | 3 | H | SI1 | 3 | 491 |
49983 | 49984 | 0.46 | Ideal | 5 | E | SI2 | 2 | 870 |
49984 | 49985 | 0.70 | Ideal | 5 | I | VS1 | 4 | 2398 |
49985 | 49986 | 0.43 | Ideal | 5 | G | VVS2 | 6 | 1129 |
49986 | 49987 | 1.01 | Premium | 4 | G | VS1 | 4 | 6932 |
49987 | 49988 | 1.03 | Premium | 4 | F | VS1 | 4 | 7328 |
49988 | 49989 | 0.47 | Ideal | 5 | H | VS2 | 5 | 1058 |
49989 | 49990 | 1.01 | Premium | 4 | I | VVS1 | 7 | 4989 |
49990 | 49991 | 1.21 | Good | 2 | F | VS2 | 5 | 7786 |
49991 | 49992 | 0.43 | Ideal | 5 | I | VVS1 | 7 | 848 |
49992 | 49993 | 0.70 | Premium | 4 | G | VVS1 | 7 | 3105 |
49993 | 49994 | 0.50 | Ideal | 5 | G | VS2 | 5 | 1449 |
49994 | 49995 | 0.33 | Very Good | 3 | G | VVS2 | 6 | 692 |
49995 | 49996 | 0.71 | Ideal | 5 | H | VVS1 | 7 | 2918 |
49996 | 49997 | 0.43 | Ideal | 5 | G | VVS2 | 6 | 1056 |
49997 | 49998 | 1.14 | Premium | 4 | G | VS2 | 5 | 6619 |
49998 | 49999 | 1.01 | Premium | 4 | E | VS2 | 5 | 6787 |
49999 | 50000 | 1.77 | Premium | 4 | J | VS2 | 5 | 9428 |
50000 rows × 8 columns
- Carat represents the weight of the diamond and is a numerical variable.
- Cut represents the quality of the cut of the diamond and falls into 5 categories: Fair, Good, Very Good, Premium, and Ideal. Each category is represented by a number, 1-5, in the cut_ord variable.
- Clarity represents the internal purity of the diamond and falls into 8 categories: I1, SI2, SI1, VS1, VS2, VVS2, VVS1, and IF. Each category is represented by a number, 1-8, in the clarity_ord variable.
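The ordinal encoding described above can be sketched as two lookup tables. The dictionary names and the `encode` helper are hypothetical, introduced only for illustration; the label-to-number pairs are read off the cut_ord and clarity_ord columns shown in the tables.

```python
# Hypothetical lookup tables; values taken from the cut_ord and
# clarity_ord columns displayed in the notebook's tables.
CUT_ORD = {"Fair": 1, "Good": 2, "Very Good": 3, "Premium": 4, "Ideal": 5}
CLARITY_ORD = {
    "I1": 1, "SI2": 2, "SI1": 3, "VS1": 4,
    "VS2": 5, "VVS2": 6, "VVS1": 7, "IF": 8,
}

def encode(cut, clarity):
    """Return the (cut_ord, clarity_ord) pair for the given labels."""
    return CUT_ORD[cut], CLARITY_ORD[clarity]

print(encode("Very Good", "VS2"))
```

Keeping the mapping in one place makes it easier to confirm that the numbers passed to the model match the labels in the data.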
new_diamonds
| | Unnamed: 0 | carat | cut | cut_ord | color | clarity | clarity_ord |
---|---|---|---|---|---|---|---|
0 | 1 | 1.22 | Premium | 4 | G | SI1 | 3 |
1 | 2 | 1.01 | Good | 2 | G | VS2 | 5 |
2 | 3 | 0.71 | Very Good | 3 | I | VS2 | 5 |
3 | 4 | 1.01 | Ideal | 5 | D | SI2 | 2 |
4 | 5 | 0.27 | Ideal | 5 | H | VVS2 | 6 |
5 | 6 | 0.52 | Premium | 4 | G | VS1 | 4 |
6 | 7 | 1.01 | Premium | 4 | F | SI1 | 3 |
7 | 8 | 0.59 | Ideal | 5 | D | SI1 | 3 |
8 | 9 | 1.01 | Good | 2 | E | SI1 | 3 |
9 | 10 | 2.03 | Ideal | 5 | F | SI2 | 2 |
10 | 11 | 1.35 | Premium | 4 | H | VS2 | 5 |
11 | 12 | 0.74 | Ideal | 5 | G | SI1 | 3 |
12 | 13 | 0.90 | Premium | 4 | D | SI1 | 3 |
13 | 14 | 0.30 | Good | 2 | G | VS2 | 5 |
14 | 15 | 1.01 | Good | 2 | F | VS2 | 5 |
15 | 16 | 1.02 | Good | 2 | H | SI2 | 2 |
16 | 17 | 2.05 | Premium | 4 | G | SI1 | 3 |
17 | 18 | 0.54 | Ideal | 5 | I | SI1 | 3 |
18 | 19 | 0.72 | Ideal | 5 | G | VS2 | 5 |
19 | 20 | 2.00 | Premium | 4 | J | SI2 | 2 |
20 | 21 | 1.57 | Premium | 4 | G | SI2 | 2 |
21 | 22 | 0.89 | Premium | 4 | G | SI1 | 3 |
22 | 23 | 0.33 | Premium | 4 | I | VVS2 | 6 |
23 | 24 | 0.30 | Very Good | 3 | G | IF | 8 |
24 | 25 | 1.79 | Very Good | 3 | I | VS2 | 5 |
25 | 26 | 1.11 | Ideal | 5 | E | SI2 | 2 |
26 | 27 | 0.79 | Premium | 4 | F | SI1 | 3 |
27 | 28 | 0.71 | Very Good | 3 | F | SI1 | 3 |
28 | 29 | 0.73 | Ideal | 5 | G | SI1 | 3 |
29 | 30 | 0.90 | Very Good | 3 | G | VS2 | 5 |
... | ... | ... | ... | ... | ... | ... | ... |
2970 | 2971 | 1.00 | Premium | 4 | F | VS2 | 5 |
2971 | 2972 | 1.07 | Premium | 4 | E | SI2 | 2 |
2972 | 2973 | 1.21 | Very Good | 3 | F | VS1 | 4 |
2973 | 2974 | 0.46 | Fair | 1 | G | VS1 | 4 |
2974 | 2975 | 0.79 | Good | 2 | I | VS2 | 5 |
2975 | 2976 | 0.32 | Ideal | 5 | F | VVS2 | 6 |
2976 | 2977 | 0.40 | Ideal | 5 | F | VS2 | 5 |
2977 | 2978 | 0.51 | Fair | 1 | E | VS2 | 5 |
2978 | 2979 | 1.05 | Premium | 4 | J | VS1 | 4 |
2979 | 2980 | 0.43 | Very Good | 3 | H | SI1 | 3 |
2980 | 2981 | 0.30 | Ideal | 5 | G | VVS2 | 6 |
2981 | 2982 | 1.51 | Premium | 4 | E | SI1 | 3 |
2982 | 2983 | 1.54 | Ideal | 5 | J | VS2 | 5 |
2983 | 2984 | 1.16 | Ideal | 5 | I | VS2 | 5 |
2984 | 2985 | 1.01 | Premium | 4 | D | SI1 | 3 |
2985 | 2986 | 1.01 | Very Good | 3 | I | VS2 | 5 |
2986 | 2987 | 0.83 | Premium | 4 | H | SI1 | 3 |
2987 | 2988 | 0.93 | Very Good | 3 | F | VS2 | 5 |
2988 | 2989 | 1.20 | Very Good | 3 | G | SI2 | 2 |
2989 | 2990 | 0.80 | Ideal | 5 | E | VVS2 | 6 |
2990 | 2991 | 1.11 | Ideal | 5 | D | VVS2 | 6 |
2991 | 2992 | 1.31 | Premium | 4 | H | SI2 | 2 |
2992 | 2993 | 1.11 | Good | 2 | F | SI1 | 3 |
2993 | 2994 | 1.10 | Good | 2 | D | SI2 | 2 |
2994 | 2995 | 1.22 | Premium | 4 | I | SI2 | 2 |
2995 | 2996 | 0.72 | Ideal | 5 | F | SI2 | 2 |
2996 | 2997 | 1.09 | Premium | 4 | I | VS2 | 5 |
2997 | 2998 | 1.05 | Very Good | 3 | G | SI1 | 3 |
2998 | 2999 | 0.70 | Fair | 1 | G | SI1 | 3 |
2999 | 3000 | 1.01 | Very Good | 3 | F | SI1 | 3 |
3000 rows × 7 columns
1. According to the model, if a diamond is 1 carat heavier than another with the same cut, how much more should I expect to pay? Why?
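def calculate_diamond_price(carat, cut, clarity):
    # Linear model: Price = -5,269 + 8,413 x Carat + 158.1 x Cut + 454 x Clarity
    diamond_price = -5269 + (8413 * carat) + (158.1 * cut) + (454 * clarity)
    return diamond_price

def format_currency(amount):
    # Format a number as US dollars with thousands separators and two decimals
    return '${:,.2f}'.format(amount)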
diamond_one_carat = calculate_diamond_price(1,1,1)
diamond_two_carats = calculate_diamond_price(2,1,1)
price_difference = diamond_two_carats - diamond_one_carat
print("Diamond 1 carat price:\t ",diamond_one_carat,"$\nDiamond 2 carats price:\t",diamond_two_carats,
"$\nPrice difference:\t ",price_difference,"$")
Diamond 1 carat price: 3756.1 $
Diamond 2 carats price: 12169.1 $
Price difference: 8413.0 $
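According to our equation, Price = -5,269 + 8,413 x Carat + 158.1 x Cut + 454 x Clarity, the carat coefficient is 8,413.
With cut and clarity held constant, every other term cancels in the difference, so one additional carat raises the predicted price by $8,413.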
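As a quick check of that reasoning, the sketch below recomputes the one-carat difference for every cut and clarity combination. It redefines the model locally under a hypothetical name, `predict_price`, so that it runs on its own; the coefficients are those of the equation above.

```python
import math

def predict_price(carat, cut, clarity):
    # Same coefficients as the notebook's linear model
    return -5269 + 8413 * carat + 158.1 * cut + 454 * clarity

# Price difference between a 2-carat and a 1-carat diamond
# for each of the 5 cut levels and 8 clarity levels
differences = [
    predict_price(2, cut, clarity) - predict_price(1, cut, clarity)
    for cut in range(1, 6)
    for clarity in range(1, 9)
]

# Every difference equals the carat coefficient (up to float rounding)
print(all(math.isclose(d, 8413) for d in differences))
```

The comparison uses `math.isclose` rather than `==` because the cut coefficient, 158.1, is not exactly representable as a float.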
2. If you were interested in a 1.5 carat diamond with a Very Good cut (represented by a 3 in the model) and a VS2 clarity rating (represented by a 5 in the model), how much would the model predict you should pay for it?
diamond_one_half_carat = calculate_diamond_price(1.5,3,5)
print("Diamond Carat: 1.5 Cut: Very Good Clarity: VS2 price:\t ",diamond_one_half_carat,"$")
Diamond Carat: 1.5 Cut: Very Good Clarity: VS2 price: 10094.8 $
The model predicts a price of $10,094.80.
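To see where that figure comes from, the following sketch splits the prediction into the contribution of each term. The `terms` dictionary is an illustrative construct, not part of the original notebook.

```python
# Contribution of each term for carat=1.5, cut=3 (Very Good), clarity=5 (VS2)
terms = {
    "intercept": -5269,
    "carat": 8413 * 1.5,
    "cut": 158.1 * 3,
    "clarity": 454 * 5,
}

for name, value in terms.items():
    print(f"{name:>9}: {value:>10,.2f}")

total = sum(terms.values())
print(f"{'total':>9}: {total:>10,.2f}")
```

The carat term dominates the total, which is consistent with the answer to the first question.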
1. Plot 1 - Plot the data for the diamonds in the database, with carat on the x-axis and price on the y-axis.
x = diamonds['carat']
y = diamonds['price']
plt.figure(figsize=(14,8))
plt.ylabel('Price')
plt.xlabel('Carat')
plt.title('Price vs Carat')
plt.scatter(x, y, alpha=0.4, color='green')
new_diamonds["predicted_price"] = calculate_diamond_price(new_diamonds['carat'],new_diamonds['cut_ord'], new_diamonds['clarity_ord'])
new_diamonds
| | Unnamed: 0 | carat | cut | cut_ord | color | clarity | clarity_ord | predicted_price |
---|---|---|---|---|---|---|---|---|
0 | 1 | 1.22 | Premium | 4 | G | SI1 | 3 | 6989.26 |
1 | 2 | 1.01 | Good | 2 | G | VS2 | 5 | 5814.33 |
2 | 3 | 0.71 | Very Good | 3 | I | VS2 | 5 | 3448.53 |
3 | 4 | 1.01 | Ideal | 5 | D | SI2 | 2 | 4926.63 |
4 | 5 | 0.27 | Ideal | 5 | H | VVS2 | 6 | 517.01 |
5 | 6 | 0.52 | Premium | 4 | G | VS1 | 4 | 1554.16 |
6 | 7 | 1.01 | Premium | 4 | F | SI1 | 3 | 5222.53 |
7 | 8 | 0.59 | Ideal | 5 | D | SI1 | 3 | 1847.17 |
8 | 9 | 1.01 | Good | 2 | E | SI1 | 3 | 4906.33 |
9 | 10 | 2.03 | Ideal | 5 | F | SI2 | 2 | 13507.89 |
10 | 11 | 1.35 | Premium | 4 | H | VS2 | 5 | 8990.95 |
11 | 12 | 0.74 | Ideal | 5 | G | SI1 | 3 | 3109.12 |
12 | 13 | 0.90 | Premium | 4 | D | SI1 | 3 | 4297.10 |
13 | 14 | 0.30 | Good | 2 | G | VS2 | 5 | -158.90 |
14 | 15 | 1.01 | Good | 2 | F | VS2 | 5 | 5814.33 |
15 | 16 | 1.02 | Good | 2 | H | SI2 | 2 | 4536.46 |
16 | 17 | 2.05 | Premium | 4 | G | SI1 | 3 | 13972.05 |
17 | 18 | 0.54 | Ideal | 5 | I | SI1 | 3 | 1426.52 |
18 | 19 | 0.72 | Ideal | 5 | G | VS2 | 5 | 3848.86 |
19 | 20 | 2.00 | Premium | 4 | J | SI2 | 2 | 13097.40 |
20 | 21 | 1.57 | Premium | 4 | G | SI2 | 2 | 9479.81 |
21 | 22 | 0.89 | Premium | 4 | G | SI1 | 3 | 4212.97 |
22 | 23 | 0.33 | Premium | 4 | I | VVS2 | 6 | 863.69 |
23 | 24 | 0.30 | Very Good | 3 | G | IF | 8 | 1361.20 |
24 | 25 | 1.79 | Very Good | 3 | I | VS2 | 5 | 12534.57 |
25 | 26 | 1.11 | Ideal | 5 | E | SI2 | 2 | 5767.93 |
26 | 27 | 0.79 | Premium | 4 | F | SI1 | 3 | 3371.67 |
27 | 28 | 0.71 | Very Good | 3 | F | SI1 | 3 | 2540.53 |
28 | 29 | 0.73 | Ideal | 5 | G | SI1 | 3 | 3024.99 |
29 | 30 | 0.90 | Very Good | 3 | G | VS2 | 5 | 5047.00 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2970 | 2971 | 1.00 | Premium | 4 | F | VS2 | 5 | 6046.40 |
2971 | 2972 | 1.07 | Premium | 4 | E | SI2 | 2 | 5273.31 |
2972 | 2973 | 1.21 | Very Good | 3 | F | VS1 | 4 | 7201.03 |
2973 | 2974 | 0.46 | Fair | 1 | G | VS1 | 4 | 575.08 |
2974 | 2975 | 0.79 | Good | 2 | I | VS2 | 5 | 3963.47 |
2975 | 2976 | 0.32 | Ideal | 5 | F | VVS2 | 6 | 937.66 |
2976 | 2977 | 0.40 | Ideal | 5 | F | VS2 | 5 | 1156.70 |
2977 | 2978 | 0.51 | Fair | 1 | E | VS2 | 5 | 1449.73 |
2978 | 2979 | 1.05 | Premium | 4 | J | VS1 | 4 | 6013.05 |
2979 | 2980 | 0.43 | Very Good | 3 | H | SI1 | 3 | 184.89 |
2980 | 2981 | 0.30 | Ideal | 5 | G | VVS2 | 6 | 769.40 |
2981 | 2982 | 1.51 | Premium | 4 | E | SI1 | 3 | 9429.03 |
2982 | 2983 | 1.54 | Ideal | 5 | J | VS2 | 5 | 10747.52 |
2983 | 2984 | 1.16 | Ideal | 5 | I | VS2 | 5 | 7550.58 |
2984 | 2985 | 1.01 | Premium | 4 | D | SI1 | 3 | 5222.53 |
2985 | 2986 | 1.01 | Very Good | 3 | I | VS2 | 5 | 5972.43 |
2986 | 2987 | 0.83 | Premium | 4 | H | SI1 | 3 | 3708.19 |
2987 | 2988 | 0.93 | Very Good | 3 | F | VS2 | 5 | 5299.39 |
2988 | 2989 | 1.20 | Very Good | 3 | G | SI2 | 2 | 6208.90 |
2989 | 2990 | 0.80 | Ideal | 5 | E | VVS2 | 6 | 4975.90 |
2990 | 2991 | 1.11 | Ideal | 5 | D | VVS2 | 6 | 7583.93 |
2991 | 2992 | 1.31 | Premium | 4 | H | SI2 | 2 | 7292.43 |
2992 | 2993 | 1.11 | Good | 2 | F | SI1 | 3 | 5747.63 |
2993 | 2994 | 1.10 | Good | 2 | D | SI2 | 2 | 5209.50 |
2994 | 2995 | 1.22 | Premium | 4 | I | SI2 | 2 | 6535.26 |
2995 | 2996 | 0.72 | Ideal | 5 | F | SI2 | 2 | 2486.86 |
2996 | 2997 | 1.09 | Premium | 4 | I | VS2 | 5 | 6803.57 |
2997 | 2998 | 1.05 | Very Good | 3 | G | SI1 | 3 | 5400.95 |
2998 | 2999 | 0.70 | Fair | 1 | G | SI1 | 3 | 2140.20 |
2999 | 3000 | 1.01 | Very Good | 3 | F | SI1 | 3 | 5064.43 |
3000 rows × 8 columns
2. Plot 2 - Plot the data for the diamonds for which you are predicting prices with carat on the x-axis and predicted price on the y-axis.
plt.style.use('seaborn')
plt.figure(figsize=(14,8))
x_new = new_diamonds['carat']
y_new = new_diamonds['predicted_price']
plt.ylabel('Predicted Price')
plt.xlabel('Carat')
plt.title('Predicted Price vs Carat')
plt.scatter(x_new, y_new, alpha=0.4, color = 'blue')
plt.figure(figsize=(14,8))
# Label each series so that plt.legend() has entries to display
plt.scatter(x, y, alpha=0.3, color='green', label='Actual price')
plt.scatter(x_new, y_new, alpha=0.3, color='blue', label='Predicted price')
plt.ylabel('Price', fontsize=15)
plt.xlabel('Carat', fontsize=15)
plt.title('Price vs Carat', fontsize=15)
plt.legend(loc='upper right', frameon=False, fontsize=15)
plt.show()
3. What strikes you about this comparison? After seeing this plot, do you feel confident in the model’s ability to predict prices?
Comparing both plots, we can see that the model makes a good prediction of prices on average. We conclude that the model works well and that its predicted prices are reasonable estimates.
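One way to probe that conclusion is to look for predictions that cannot be real prices. Because the intercept is negative, small diamonds with low cut and clarity scores can receive a negative predicted price; row 13 of the predictions table above (-158.90) is one such case. The sketch below applies the model to three rows copied from that table; the `sample_rows` list and the `predict_price` name are illustrative only.

```python
def predict_price(carat, cut_ord, clarity_ord):
    # Same coefficients as the notebook's linear model
    return -5269 + 8413 * carat + 158.1 * cut_ord + 454 * clarity_ord

# (carat, cut_ord, clarity_ord) copied from rows 0, 4 and 13 of new_diamonds
sample_rows = [(1.22, 4, 3), (0.27, 5, 6), (0.30, 2, 5)]

predictions = [predict_price(*row) for row in sample_rows]

# Keep only the rows whose predicted price is below zero
negative = [
    (row, round(price, 2))
    for row, price in zip(sample_rows, predictions)
    if price < 0
]
print(negative)
```

On the full data, the same check could be written as `(new_diamonds['predicted_price'] < 0).sum()`. A few such rows need not change the aggregate conclusion, but they are worth knowing about before relying on individual predictions.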
1. What price do you recommend the jewelry company to bid? Please explain how you arrived at that number.
total_diamonds_cost = np.sum(new_diamonds['predicted_price'])
print("Total diamonds cost estimate: ",format_currency(total_diamonds_cost))
Total diamonds cost estimate: $11,733,522.76
bid_value = (total_diamonds_cost * 70) / 100
print("Total bid value recommendation: ",format_currency(bid_value))
Total bid value recommendation: $8,213,465.93
The final retail price consumers will pay for all 3,000 diamonds adds up to $11,733,522.76.
Since the company generally purchases diamonds from distributors at 70% of that price, my bid recommendation is $8,213,465.93, which is 70% of the predicted retail value.
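The bid arithmetic can be wrapped in a small helper so that the purchase rate is explicit and easy to vary. The function name `recommend_bid` and its `purchase_rate_percent` parameter are hypothetical; the total is the sum of predicted prices reported above.

```python
def recommend_bid(total_predicted_retail, purchase_rate_percent=70):
    """Scale the predicted retail total by the distributor purchase rate."""
    return total_predicted_retail * purchase_rate_percent / 100

total = 11733522.76  # sum of predicted prices reported above
bid = recommend_bid(total)
print("${:,.2f}".format(bid))
```

Passing a different `purchase_rate_percent` shows how sensitive the bid is to the assumed 70% rate.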