MACS 30150 - Perspectives on Computational Modeling in Economics (Winter 2020)

             | Dr. Richard Evans          | Keertana Chidambaram
Email        | [email protected]            | [email protected]
Office       | 1155 S 60th St., Room 217  |
Office hours | T 10:30am-12:30pm          | Th 1-3pm (224 Lounge)
GitHub       | rickecon                   | keertanavc
  • Meeting day/time: MW 11:30am-1:20pm, Saieh Hall, Room 247
  • Office hours also available by appointment

Main text

Course description

This course is an economics-focused survey of modern computational modeling methods that are valuable for empirical, computational, and numerical research. The course begins with some basics of numerical derivatives and integrals. We then spend two class days on dynamic programming, which is a very general and flexible way to pose a dynamic problem and which has powerful iterative global solution techniques. We then transition into a week and a half of structural estimation methods, which generalize many of the specific estimation techniques covered later in the course. Finally, we spend the last half of the course on a survey of some of the most common statistical learning/machine learning methods.

Grades

You will have 9 problem sets throughout the term. I will drop everyone's lowest problem set score, so only your best eight scores (8 x 10 = 80 points) count. For this reason, problem sets account for 80 percent of your grade.

Assignment   | Quantity | Points each | Total points | Percent
Problem sets | 9        | 10          | 80           | 80%
Midterm exam | 1        | 20          | 20           | 20%
Total        | --       | --          | 100          | 100%

Late problem sets will be penalized 2 points for every hour (or fraction of an hour) they are late. For example, if an assignment is due on Monday at 11:30am, the following points will be deducted based on the time stamp of the last commit (a small code sketch of this rule follows the table).

Time of last commit | Points deducted
11:31am to 12:30pm  | -2 points
12:31pm to 1:30pm   | -4 points
1:31pm to 2:30pm    | -6 points
2:31pm to 3:30pm    | -8 points
3:31pm and beyond   | -10 points (no credit)
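
A minimal sketch of this rule (my own illustration, with an assumed helper name late_penalty), charging 2 points per started hour and capping at 10:

import math

def late_penalty(hours_late):
    # 2 points per started (possibly partial) hour late, capped at 10 (no credit)
    if hours_late <= 0:
        return 0
    return min(2 * math.ceil(hours_late), 10)

# Example: a last commit at 12:45pm (1.25 hours late) costs 4 points
print(late_penalty(1.25))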

Assignment submission procedure

You will submit your problem sets and project assignments to the ProblemSets folder on your fork of the class repository, github.com/YourGitHubHandle/persp-model-econ_W20/ProblemSets/. Commit and push your assignments to the appropriate folder ON YOUR FORK (NOT ON THIS MAIN COURSE REPOSITORY). For example, your files for PS1 should be committed to the PS1 folder on your fork of the class repository.

/persp-model-econ_W20/ProblemSets/PS1/YourFile.pdf

I will use a shell script to clone all class members' repositories at the time the assignments are due.

Disability services

If you need any special accommodations, please provide us with a copy of your Accommodation Determination Letter (provided to you by the Student Disability Services office) as soon as possible so that we can discuss how your accommodations may be implemented in this course.

Course schedule

Date    | Day | Topic                                             | Readings               | Assignment
Jan. 6  | M   | Model/theory building, data generating processes  | V1997, Slides          | PS1
Jan. 8  | W   | Numerical derivatives                             | Notes                  | PS2
Jan. 13 | M   | Numerical integration                             | Notebk                 |
Jan. 15 | W   | Dynamic programming                               | Notes                  | PS3
Jan. 20 | M   | No class (Martin Luther King, Jr. Day)            |                        |
Jan. 22 | W   | Dynamic programming                               |                        |
Jan. 27 | M   | Maximum likelihood estimation (MLE)               | Notebk                 | PS4
Jan. 29 | W   | Maximum likelihood estimation (MLE)               |                        |
Feb. 3  | M   | Generalized method of moments (GMM)               | Notebk                 | PS5
Feb. 5  | W   | Generalized method of moments (GMM)               |                        |
Feb. 10 | M   | Midterm (Evans)                                   |                        |
Feb. 12 | W   | Statistical learning and linear regression        | JWHT Chs. 2, 3, Notebk | PS6
Feb. 17 | M   | Classification and logistic regression            | JWHT Chs. 2, 4, Notebk |
Feb. 19 | W   | Resampling methods (cross-validation, bootstrap)  | Notebk                 | PS7
Feb. 24 | M   | Interpolation                                     | Notebk                 |
Feb. 26 | W   | Tree-based methods                                | JWHT Ch. 8, Notebk     | PS8
Mar. 2  | M   | Tree-based methods                                | JWHT Ch. 8             |
Mar. 4  | W   | Support vector machines                           | JWHT Ch. 9, Notebk     |
Mar. 9  | M   | Neural networks                                   | HTF Ch. 11, G Ch. 10   | PS9
Mar. 11 | W   | Neural networks                                   | Notebk                 |

References and Readings

All readings are required unless otherwise noted. Adjustments may be made throughout the quarter, so check this repository frequently to stay current on the assigned readings.

  • [A2019] Athey, Susan, "The Impact of Machine Learning on Econometrics and Economics," keynote presentation, American Economics Association/American Finance Association joint luncheon at the Allied Social Sciences 2019 Annual Meeting, Atlanta, Georgia (January 5, 2019). [Slides]
  • [A2018] Athey, Susan, "The Impact of Machine Learning on Economics," in The Economics of Artificial Intelligence: An Agenda, eds. Ajay Agrawal, Joshua Gans, and Avi Goldfarb, National Bureau of Economic Research (forthcoming).
  • [G2017] Géron, Aurélien, Hands-On Machine Learning with Scikit-Learn & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly Media (2017).
  • [HTF2009] Hastie, Trevor, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition, Springer (2009).
  • [JWHT2013] James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning, Springer (2013).
  • [V2016] VanderPlas, Jake, Python Data Science Handbook, O'Reilly Media (2016).
  • [V1997] Varian, Hal R., "How to Build an Economic Model in Your Spare Time," in Passion and Craft: Economists at Work, ed. Michael Szenberg, University of Michigan Press (1997).

Issues

Evans' maximum likelihood linear regression code (almost real code)

Here is the code in sketch form; the example data and starting values in it are illustrative placeholders.

# Linear regression by maximum likelihood
import numpy as np
import scipy.stats as sts
import scipy.optimize as opt

# Example data (illustrative placeholder; replace with your own y, x1, x2)
rng = np.random.default_rng(25)
N = 1000
x1 = rng.uniform(0.0, 10.0, N)
x2 = rng.uniform(0.0, 10.0, N)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(0.0, 2.0, N)

def log_lik(y, x1, x2, beta_0, beta_1, beta_2, sigma):
    # Log likelihood of the data given normally distributed errors
    epsilon = y - beta_0 - beta_1 * x1 - beta_2 * x2
    # logpdf is more numerically stable than np.log(pdf)
    return sts.norm.logpdf(epsilon, loc=0.0, scale=sigma).sum()

def crit_lr(params, *args):
    # Criterion for the minimizer: the negative log likelihood
    beta_0, beta_1, beta_2, sigma = params
    y, x1, x2 = args
    return -log_lik(y, x1, x2, beta_0, beta_1, beta_2, sigma)

# Starting values (rough guesses; OLS estimates are a good choice in practice)
params_init = np.array([1.0, 0.5, 0.5, 1.0])
args_lr = (y, x1, x2)
# Bound sigma strictly above zero so the scale parameter stays valid
results = opt.minimize(crit_lr, params_init, args=args_lr,
                       method='L-BFGS-B',
                       bounds=((None, None), (None, None),
                               (None, None), (1e-10, None)))
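
If the optimizer converges (results.success is True), results.x holds the estimated (beta_0, beta_1, beta_2, sigma) and -results.fun is the maximized log likelihood.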

Assignment Info

Hello everyone!

Please reply to this thread with your name, GitHub handle, and your CNetID (i.e., your @uchicago.edu ID). Thank you.

Keertana

Resources for PS 2 and 3

I received a few questions about the coding parts of the assignments, and I thought it would be useful to share a few functions that might help you with PS 2 and 3. Good luck!

# Converting a symbolic function to a regular Python function that takes arguments
import sympy as sy

x = sy.symbols('x')
f = 2 * sy.pi * sy.cos(x)
# Note: when you use functions like cosine/sine or constants like pi, not using
# the sympy version of the function/constant can lead to errors
print('function:', f)

# Method 1: lambdifying the function
g = sy.lambdify(x, f)
# g is a regular Python function that takes x and evaluates f(x)
print('lambdified evaluation:', g(10))  # = 2 * pi * cos(10)

# Method 2: substitution
print('substitution evaluation:', f.subs({x: 10}).evalf())

# Keeping track of how long code takes to run
import time

# Record start and end times (time.clock() was deprecated and later removed;
# use time.perf_counter() instead)
start_time = time.perf_counter()
count = 0
for i in range(100000):
    count += i
end_time = time.perf_counter()
print('Running time for code =', end_time - start_time, 'seconds')

# Working with distributions
import scipy.stats as ss
import numpy as np

x, mu, sigma = (0, 0, 1)
# Normal distribution
# Cumulative distribution function
print(ss.norm.cdf(x, mu, sigma))
# Probability density function
print(ss.norm.pdf(x, mu, sigma))

# Lognormal distribution: in scipy's parameterization, the shape s is the
# standard deviation of the underlying normal, loc should be 0, and the
# scale is exp(mu)
x = 1
# Cumulative distribution function
print(ss.lognorm.cdf(x, sigma, loc=0, scale=np.exp(mu)))
# Probability density function
print(ss.lognorm.pdf(x, sigma, loc=0, scale=np.exp(mu)))

# Random draws from a uniform distribution
np.random.seed(25)  # note: seed is a function call, not an attribute to assign
x_lb, x_ub, N = 0, 1, 10
# Setting the seed lets you get the same random draws every time
print(np.random.uniform(x_lb, x_ub, N))
# This draws 10 random samples uniformly distributed between 0 and 1

# Some standard numpy arrays
print(np.zeros((3, 5)))  # a 3x5 matrix of zeros
print(np.eye(4))  # a 4x4 identity matrix
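
Since PS2 covers numerical derivatives, here is one more sketch along the same lines (my own illustration, not part of the original post): comparing sympy's exact derivative of the function above with a centered finite-difference approximation.

# Exact vs. numerical derivative (illustrative example)
import sympy as sy

x = sy.symbols('x')
f = 2 * sy.pi * sy.cos(x)
f_prime = sy.lambdify(x, sy.diff(f, x))  # exact derivative as a Python function
g = sy.lambdify(x, f)

def centered_diff(g, x0, h=1e-5):
    # Centered finite-difference approximation of g'(x0)
    return (g(x0 + h) - g(x0 - h)) / (2.0 * h)

print('exact derivative at 10:  ', f_prime(10.0))
print('finite-difference approx:', centered_diff(g, 10.0))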

Getting Started with Git and GitHub

Dear MACS 30150 Perspectives on Computational Modeling for Economists students,

Here is a brief tutorial about how I want you to interact with this course through Git and GitHub. A good resource is the Git and GitHub tutorial (git_tutorial.pdf) that I have posted in the Tutorials folder; Figure 3.2 is particularly valuable. Here is a summary of the steps. If you have gone far down an incorrect road, you can always "burn down" (delete) your fork and restart by following these directions.

  1. Make sure Git is installed on your machine and set at least the user.name and user.email settings in git config as described in Section 3.2 of the tutorial.
  2. Fork this main remote repository by clicking on the "Fork" button in the upper-right corner of the repository main page. This will create a remote copy of this main repository on your GitHub account in the cloud.
  3. Clone your remote fork of this repository: go to the main page of your fork (https://github.com/[YourGitHubHandle]/persp-model-econ_W20/), click on the green "Clone or download" button near the top right of the page, and copy the URL in the window to your clipboard. Then open a terminal window and navigate to the directory where you want to store this repository on your local machine. Type git clone [URL], where the URL in brackets is the URL you copied from the "Clone or download" button (without the brackets).
  4. In your terminal, change the directory to your new local Git repository by typing cd persp-model-econ_W20. You should see this folder in your hard drive file structure. If you type git remote -v you'll see that your local repository has a remote associated with it named origin, and that remote is your remote fork of the main repository.
  5. Add another remote to be associated with your local repository. This remote will be the main remote repository for the class (my repository). We'll call this remote upstream as opposed to your other remote named origin. In your browser, go to the main repository home page for the class (https://github.com/UC-MACSS/persp-model-econ_W20). Click on the "Clone or download" button near the top-right of the page. Copy the URL in that window to your clipboard. Then go back to your terminal and type the following: git remote add upstream [URL], where the URL in brackets is just the URL you copied from the "Clone or download" button (without the brackets). Now if you type git remote -v, you'll see that your local git repository on your computer has two remotes that you can draw from. The remote origin is your remote fork of the main repository that is on your GitHub account. The remote upstream is the main remote class repository.
    • If you want to download any updates to your fork both locally and remotely, type the following commands in your terminal once you have navigated to your repository folder. The first command fetches all the changed content from the upstream main class repository so it can be incorporated into your local repo on your hard drive. The second command merges those changes into your local repository on your computer. The last command pushes those changes up to your remote fork of the main repository.
>>> git fetch upstream
>>> git merge upstream/master
>>> git push origin
  6. You can make changes to your fork of the repository locally on your computer. When you are done, or at any point that you want to save your changes to your local Git repository and to your remote fork of the repository, you can type the following commands.
>>> git add [filename]
or
>>> git add [directory]/*
or
>>> git add -A
>>> git commit -m "[give short description of commit]"
>>> git push origin

These are just some instructions to get you started. In September, I gave an 18-hour training at NYU on Git and GitHub research collaboration to advanced pre-doc students from around the country. All the slides, notes, notebooks, and problem sets for that training are publicly available in this repository (https://github.com/nyupredocs/githubtutorial). I will also be teaching a full course on this topic in the Spring term.

Questions about the initial guess for scipy.optimize.minimize

Dear Dr. @rickecon ,

In HW5 Q1.c, we are asked to perform a two-step GMM estimation using an optimal weighting matrix based on the result of a first GMM estimation that uses the identity weighting matrix. For the first GMM estimation, I set the initial guess to mu=11, sigma=0.5 and got a satisfactory result that fits the data well. However, when I use the same initial guess for the second GMM estimation with the two-step optimal weighting matrix, the result is clearly worse than the first one. If I instead use the first GMM result as the initial guess for the second estimation, the result ends up very close to that initial guess (the first GMM result).

It seems that the optimizer depends heavily on the initial guess, and even when I change the tolerance threshold, the problem persists. Is it true that the SciPy optimizer needs a somewhat "accurate" initial guess, or are there other methods to avoid this problem? (PS: in every case the optimizer reports convergence with "success: True".)

Thank you!
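
One common way to reduce sensitivity to the starting point is a multi-start search: rerun the optimizer from a grid of initial guesses and keep the best converged result. A minimal sketch with a toy criterion (the function and grid below are illustrative assumptions, not the problem-set code):

# Multi-start minimization (illustrative toy example)
import numpy as np
import scipy.optimize as opt

def crit(params):
    # Toy criterion with several local minima, standing in for a GMM criterion
    mu, sigma = params
    return (mu - 11.0) ** 2 + (sigma - 0.5) ** 2 + 0.5 * np.sin(5.0 * mu)

starts = [np.array([m, s])
          for m in (9.0, 10.0, 11.0, 12.0, 13.0)
          for s in (0.3, 0.5, 0.8)]
results = [opt.minimize(crit, p0, method='L-BFGS-B') for p0 in starts]
best = min((r for r in results if r.success), key=lambda r: r.fun)
print('best params:', best.x, ' criterion value:', best.fun)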

Notebook from Class

Hi Dr. Evans,

I was wondering if you would be able to post the Notebook from today's class as a jumping off point for the problems in the second half of Assignment 2.

Thanks,

Josh Laven

Unclear about problem 5

@rickecon Hi Dr. Rick,
As I read problem 5 in the Differential section, I am not sure which kind of function we are supposed to write. Should we define a single function that accepts an arbitrary function and calculates its Jacobian matrix (i.e., the argument can be "x" or anything else)? Or can we write a set of functions, separating the assignment of the function from the assignment of the float h and the point x? I am also wondering whether we need to consider the special case where the unknown variable must be inferred from the function itself. A similar question applies to the other problems in this problem set. Are we supposed to consider all possible cases of function formats and their unknown variables? Thanks.
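
For what it is worth, one minimal reading of the problem is a single generic function; the jacobian helper below is an illustrative sketch, not the official solution.

# Forward-difference Jacobian of an arbitrary vector-valued function
import numpy as np

def jacobian(f, x, h=1e-6):
    x = np.asarray(x, dtype=float)
    f0 = np.atleast_1d(f(x))
    J = np.empty((f0.size, x.size))
    for j in range(x.size):
        x_step = x.copy()
        x_step[j] += h  # perturb one coordinate at a time
        J[:, j] = (np.atleast_1d(f(x_step)) - f0) / h
    return J

# Example: f(x, y) = (x * y, x + y) has Jacobian [[y, x], [1, 1]]
f = lambda v: np.array([v[0] * v[1], v[0] + v[1]])
print(jacobian(f, [2.0, 3.0]))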

PS2 hasn't shown up in my repo

Hi, Prof. Evans. In my repo for this course, I haven't found a folder for PS2 in ProblemSets. Could you please tell me how to submit my answer? Should I just upload my PS2 to the ProblemSets folder?

Questions about tree tuning

Dear Dr. @rickecon, @keertanavc,

  1. Parameter distributions
    In the problem set, we are required to tune the parameters of a Decision Tree and a Random Forest regression model. As specified, the distributions of the parameters are set as follows:
  • In 1.(b)
from scipy.stats import randint as sp_randint
param_dist1 = {'max_depth': [3, 10], # a list?
               'min_samples_split': sp_randint(2, 20), 
               'min_samples_leaf': sp_randint(2, 20)}
  • In 1.(c)
param_dist2 = {'n_estimators': [10, 200], # a list?
               'max_depth': [3, 10],  # a list?
               'min_samples_split': sp_randint(2, 20), 
               'min_samples_leaf': sp_randint(2, 20), 
               'max_features': sp_randint(1, 5)}

While sp_randint is used for the other parameters, the distributions of max_depth and n_estimators are specified only by lists, which, according to the documentation of RandomizedSearchCV and GridSearchCV, means that only those two numbers will be tried.

I wonder if this is intended, or whether a more reasonable specification would be sp_randint(int, int)?

  2. Test MSE

In 1.(b) and 1.(c), the problem description says that

scoring='neg_mean_squared_error' will allow you to compare the MSE of the optimized tree (it will output the negative MSE) to the MSE calculated in part (a).

However, in 1.(a) we calculate the test MSE on the test set, whereas when we tune the trees in (b) and (c), the best_score_ attribute returns the cross-validated MSE calculated on the training set. I wonder whether it is the case that we cannot evaluate the performance of the tuning by comparing MSEs calculated on two different subsets of the data.

Thank you!
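
A minimal sketch of both points (the synthetic data and all parameter choices below are illustrative assumptions, not the problem-set specification): sample the integer-valued parameters with sp_randint, then compare the cross-validated score with the test-set MSE computed from the refit best estimator.

# Randomized search with integer distributions, plus a held-out test MSE
from scipy.stats import randint as sp_randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic regression data as a stand-in for the problem-set data
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=25)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=25)

param_dist = {'n_estimators': sp_randint(10, 200),
              'max_depth': sp_randint(3, 10),
              'min_samples_split': sp_randint(2, 20),
              'min_samples_leaf': sp_randint(2, 20),
              'max_features': sp_randint(1, 5)}

search = RandomizedSearchCV(RandomForestRegressor(random_state=25),
                            param_distributions=param_dist, n_iter=50,
                            scoring='neg_mean_squared_error', cv=5,
                            random_state=25)
search.fit(X_train, y_train)

# best_score_ is the cross-validated score on the training data...
print('CV MSE (training folds):', -search.best_score_)
# ...whereas the test MSE below is computed on the held-out test set,
# which is the number comparable to part (a)
y_pred = search.best_estimator_.predict(X_test)
print('Test MSE:', mean_squared_error(y_test, y_pred))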
