uc-macss / persp-model-econ_w19
Course site for MACS 30150 (Winter 2019) - Perspectives on Computational Modeling for Economics
In PS7 Q2, you might encounter this error when creating the linear/cubic spline object:
ValueError: x must be strictly increasing
LSQUnivariateSpline has required the input array to be strictly increasing since 2017 (Github Issue). Here are three ways I have thought of to get around it:
Use an older Scipy version from before the check was implemented. It seems this is what some of last year's students did (Github issue). I don't think this is an efficient approach.
Implement the spline function from scratch. I did the linear spline and am working on the cubic spline. However, the question requires using LSQUnivariateSpline.
Group the data by age, averaging Coolness within each age:
df2 = df.groupby('Age').mean()
df2['Age'] = df2.index
The new dataframe will look like this:
Age | Cool
--- | ---
11.0 | 10.110237
12.0 | 9.365623
13.0 | 10.015882
14.0 | 11.747109
15.0 | 15.434739
This solves the problem.
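Putting the workaround together, here is a minimal sketch (with made-up Age/Cool data; the column names follow the post) showing that grouping by age makes the x array strictly increasing, so LSQUnivariateSpline accepts it:

```python
import numpy as np
import pandas as pd
from scipy.interpolate import LSQUnivariateSpline

# Made-up data with repeated ages, mimicking the PS7 setup.
rng = np.random.default_rng(0)
df = pd.DataFrame({'Age': rng.integers(11, 16, size=200).astype(float),
                   'Cool': rng.normal(10.0, 2.0, size=200)})

# Average Coolness within each age; the sorted group keys are strictly increasing.
df2 = df.groupby('Age').mean()
df2['Age'] = df2.index

# Interior knots must lie strictly inside (x[0], x[-1]); k=1 gives a linear spline.
knots = df2['Age'].to_numpy()[1:-1]
spline = LSQUnivariateSpline(df2['Age'].to_numpy(), df2['Cool'].to_numpy(),
                             t=knots, k=1)
```

With one knot at every interior age and k=1, the least-squares spline passes through each group mean exactly.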
Here is the link to my interactive plot:
Problem 4 in ACME: Numerical Differentiation says the data is stored in the file plane.npy. Where can we get the file?
Hi all,
I have completed grading PS4. There is a common mistake I want to bring to your attention:
When calculating the estimated variance-covariance matrix of the estimates, results.hess_inv is not an ndarray but a scipy.optimize.LbfgsInvHessProduct object, which interprets '*' as matrix multiplication instead of elementwise multiplication, thus producing an irregular VCV result. Here we should use results.hess_inv.todense() * OffDiagNeg, i.e., convert it to a dense matrix first and then do the elementwise multiplication. @rickecon
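A minimal sketch of the difference (toy objective; OffDiagNeg from the problem set is replaced by a hypothetical stand-in matrix):

```python
import numpy as np
import scipy.optimize as opt

# Toy objective minimized with L-BFGS-B.
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
results = opt.minimize(f, np.zeros(2), method='L-BFGS-B')

# results.hess_inv is an LbfgsInvHessProduct (a LinearOperator), so '*'
# would mean operator multiplication. Convert to a dense ndarray first:
vcv = results.hess_inv.todense()

# Now '*' is elementwise, as intended when masking with OffDiagNeg-style matrices.
OffDiagNeg = np.array([[1.0, -1.0], [-1.0, 1.0]])  # hypothetical stand-in
masked = vcv * OffDiagNeg
```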
Please don't hesitate to send me an email or come to my office hours if you have any questions regarding your PS4 grades.
Best,
Winston
In the data given for Q2, the values for the number of children are all fractional (float type). Do we use these values as given, or truncate the decimal part for accuracy (you can't have a fractional number of children in real life)?
In the notebook LogitKNN, we have the following for confusion matrix:
[[180, 32],
[ 48, 96]]),
where 180 is True Positives, 32 is False Positives, 48 is False Negatives, and 96 is True Negatives.
So the matrix is telling us that 180 and 96 are the numbers of correct predictions.
The model predicts 32 passengers who actually died as survived and 48 passengers who actually survived as died.
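For reference, scikit-learn's confusion_matrix puts actual classes on rows and predictions on columns, so which corner holds the true positives depends on the label ordering the notebook uses. A toy sketch with a hypothetical encoding:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical encoding: 0 = died, 1 = survived.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0])

# Rows are actual classes, columns are predicted classes; with labels [0, 1]:
# cm[0, 0] = TN, cm[0, 1] = FP, cm[1, 0] = FN, cm[1, 1] = TP
cm = confusion_matrix(y_true, y_pred)
```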
Here is a detailed discussion with Dr. Evans @rickecon about playing around with different expressions of the composite Simpson's Rule.
In the ACME Numerical Integration Notebook, the equation is:
In the Jupyter Notebook, the equation is:
There are several differences:
a. The denominators are 3(N+1), 6N, and 3n, respectively.
b. To have even intervals, 1 and 2 use 2N and 3 uses n. So it is inconsistent for 1 and 2 to have N intervals with nodes x_0, x_1, ..., x_2N; it should be 2N intervals in 1 and 2.
c. Minor things: 1 and 2 use g(x) and 3 uses f(x). 3 also has an error term, which can be ignored for now.
Example:
Suppose we use four intervals (n = 2N = 4) to approximate the integral.
In 1 and 2, 2N = 4, N = 2, so the term in square brackets is g(x_0) + 2*g(x_2) + 4*(g(x_1) + g(x_3)) + g(x_4).
In 3, n = 4, the term in square brackets is f(x_0) + 2*f(x_2) + 4*(f(x_1) + f(x_3)) + f(x_4).
So they are the same!
My code implementation using Equ 2 (with 2N):
N = 1000000
a, b = -10, 10
g = lambda x: 0.1*x**4 - 1.5*x**3 + 0.53*x**2 + 2*x + 1
intersim = [a + i*(b - a)/(2*N) for i in range(2*N + 1)]
appsim = (g(intersim[0]) + g(intersim[-1]) +
          4*sum(g(i) for i in intersim[1:-1:2]) +
          2*sum(g(i) for i in intersim[2:-2:2])) * (b - a)/(6*N)
appsim
The result will be 4373.333333333138
Conclusion: Following the equation in the Jupyter notebook will yield the correct answer. It's the typo "N intervals" that confused me in class; changing it to 2N makes it agree with the textbook formula. Hope this helps.
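As a quick cross-check (my addition, not part of the original post), scipy.integrate.quad gives the same value for this polynomial:

```python
from scipy.integrate import quad

g = lambda x: 0.1*x**4 - 1.5*x**3 + 0.53*x**2 + 2*x + 1

# The odd-power terms integrate to zero on the symmetric interval [-10, 10];
# the exact value is 13120/3 ≈ 4373.3333, matching the Simpson result above.
val, err = quad(g, -10, 10)
```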
Reference:
Numerical Analysis, 9th edition, by R. Burden and J. Faires
Numerical Integration by ACME at BYU
Numerical Integration notebook by Dr. Evans
@rickecon
Hi Rick,
In (b) logistic regression, we should tune the parameters penalty and C instead of max_depth, min_samples_split, and min_samples_leaf.
In (c) random forest, n_estimators is the number of trees in the forest. I think it makes more sense to use sp_randint(10, 200) instead of [10, 200]; with [10, 200] we are only alternating between two values. However, sampling the full range adds much more computing complexity and is not guaranteed to find the best score because of the randomness.
In (d) SVC, shrinking needs to be a boolean value, so it could be either [True, False] or [1, 0] (this was corrected in class).
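To illustrate the sp_randint point, a small sketch (toy data and a small n_iter to keep it fast; not the actual PS setup): RandomizedSearchCV then samples n_estimators anywhere in 10..199 instead of alternating between the two endpoints.

```python
from scipy.stats import randint as sp_randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=25)

# sp_randint(10, 200) draws integers uniformly from 10..199.
param_dist = {'n_estimators': sp_randint(10, 200)}
search = RandomizedSearchCV(RandomForestClassifier(random_state=25),
                            param_distributions=param_dist,
                            n_iter=5, cv=3, random_state=25)
search.fit(X, y)
best_n = search.best_params_['n_estimators']
```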
In the PS, for plotting the data we've been asked to use this package:
from pandas.plotting import scatter_matrix
scatter_matrix(df_quant, alpha=0.3, figsize=(6, 6), diagonal='kde')
But the plot quality isn't that great. The seaborn package does a much better job of producing the same plot:
import seaborn as sns
sns.pairplot(auto_df, dropna=True)
Can I go ahead with the seaborn version? Ty!
In exercise 5.16, the question says that epsilon's mean equals mu, but the distribution is given as N(0, sigma). I don't know if somebody else has the same question, but I think the center of the distribution should also be mu? Hopefully this issue is not raised too late.
In that question, we've been asked to set the normed=True option. But when I checked the function documentation online here, it says: "normed : bool, optional. Deprecated; use the density keyword argument instead." Also, the MLE notebook uses the density option. So do we still go ahead and use normed?
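For what it's worth, density=True is the drop-in replacement for the deprecated normed=True; with it, the bar areas integrate to 1, which is what you need to overlay an estimated PDF. A minimal sketch with made-up data:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for scripting
import matplotlib.pyplot as plt

data = np.random.default_rng(0).normal(loc=0.0, scale=1.0, size=500)

# density=True normalizes so that sum(counts * bin_width) == 1.
counts, bins, _ = plt.hist(data, bins=30, density=True)
```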
@rickecon
I tried using boosting to reduce MSE. It works well for the Hitters data. The snippets below use RandomForestRegressor, GradientBoostingRegressor, and AdaBoostRegressor from sklearn.ensemble, mean_squared_error from sklearn.metrics, and the fitted hit_tree2 from earlier.
For trees with max_depth=3, min_samples_leaf=4:
Decision Tree
mean_squared_error(y_test, hit_tree2.predict(X_test))
MSE= 146493
Random Forest
rfr = RandomForestRegressor(n_estimators=100, max_depth=3, min_samples_leaf=4, random_state=25)
rfr.fit(X_train,y_train)
mean_squared_error(y_test, rfr.predict(X_test))
MSE= 110005
Gradient Boosting
clf = GradientBoostingRegressor(n_estimators=100, max_depth=3, min_samples_leaf=4,random_state=25)
clf.fit(X_train,y_train)
mean_squared_error(y_test, clf.predict(X_test))
MSE= 102931
AdaBoost
ada = AdaBoostRegressor(hit_tree2, n_estimators=100,random_state=25)
ada.fit(X_train,y_train)
mean_squared_error(y_test, ada.predict(X_test))
MSE= 107001
Hi,
I think this is a typo, but I thought I would confirm anyway. So in Q2, you have:
Create a binary variable mpg high that equals 1 if mpg high≥ median(mpg high) and equals either 0 if mpg high< median(mpg high).
Shouldn't we have median(mpg) instead of median(mpg_high)? Thanks!
In the question, if we set K=2, then observation 4 and 6 will be in the neighborhood, but they are red and green respectively. How should we handle situations like this?
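For what it's worth, with an even K a 1-1 vote split like this can happen; common fixes are to use an odd K or distance weighting (weights='distance' in scikit-learn). A toy sketch with hypothetical points, showing the tie via predict_proba:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two training points equidistant from the query point, one per class.
X_train = np.array([[0.0], [2.0]])
y_train = np.array([0, 1])  # hypothetical: 0 = red, 1 = green

knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(X_train, y_train)

# With K=2 the neighborhood vote splits 1-1: predict_proba shows 0.5/0.5.
proba = knn.predict_proba([[1.0]])
```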
In this part, we've been asked to predict the mpg values from the model that was built. The given model year for prediction is 1999. In the auto data that's been given, the variable 'year' takes values between 70 and 82. So shouldn't we be using 99 and not 1999 for prediction?