briatte / srqm
An introductory statistics course for social scientists, using Stata
Home Page: https://f.briatte.org/teaching/quanti/
The math appendix uses knitr, ggplot2 and gridExtra to produce plots with math notation in the document. If you try to add extrafont to the R script, the plots will fail to generate:
Warning in grid.Call.graphics(L_text, as.graphicsAnnot(x$label), x$x, x$y, :
font family 'LinLibertine' not found in PostScript font database
Quitting from lines 154-194 (A_math.Rnw)
Error in grid.Call.graphics(L_text, as.graphicsAnnot(x$label), x$x, x$y, :
invalid font type
I'll leave the code in a FALSE condition in case someone finds a fix.
Round 6 (2012) of the ESS is now available.
Spotted by a smart group of students. Relevant tweet.
This is still an issue, and the code in setup/srqm_pkgs.ado and utils.ado (the pkgs utility) is too complicated and not even guaranteed to work properly.
Three possible situations:
Issue (1) might be easy:
pwd either contains /Volumes/ (Mac) or does not contain c:\ (Win), or equivalent on Unix
srqm/pkgs folder
This might fail if the hard drive is not C: on a Windows machine.
Issue (2) is bothersome. So far, the approach is to try the PLUS folder, and if it fails, the PERSONAL folder, and if it fails, install locally.
Perhaps it would be easier and better to just try out the default option, using something as simple as ssc inst fre, and if it fails for whatever reason, to fall back on the local install.
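The simpler strategy described above could look roughly like the following sketch. The folder setup/pkgs is a hypothetical location for bundled ado-files (not necessarily the one the course uses), and fre is only the example package named above.

```stata
* Sketch only: try the default SSC install, fall back on a local copy.
* "setup/pkgs" is a hypothetical folder holding bundled ado-files.
capture which fre
if _rc {
    capture noisily ssc install fre
    if _rc {
        display as text "SSC install failed, adding the local folder to the ado-path"
        adopath + "`c(pwd)'/setup/pkgs"
    }
}
```

The point of this design is that nothing needs to be known about the PLUS or PERSONAL folders: the default install either works or it does not, and the local copy is only used in the second case.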
https://www.trentonmize.com/software/desctable
Requires Stata 15, though.
On http://f.briatte.org/teaching/quanti/, the link to the syllabus is broken (since you moved it around).
The new version should not be very different from the 15May2013 version.
Lost count around September 2018, it seems…
Fall 2023 team: Pol-angély PESCAYRE, Alexis GRIGORIEFF and myself.
For historical research via the Wayback Machine…
http://formation.sciences-po.fr/enseignement/2018/KGLM/2015
http://formation.sciences-po.fr/enseignement/2018/KOUT/2030
Might also be useful to dig into:
https://moodle.sciences-po.fr/course/search.php?q=reasoning&areaids=core_course-course
… helps to find, for semester '202010' (Fall, probably)
The course was originally written for Stata 10/11, and has been tested with Stata 12, but not Stata 13.
Slides should be an excuse to structure the first 45' of class:
… so the slides basically tie up everything together:
Disclosure: count 5' to 10' for lateness to class (both teacher and student-induced). This means that the 20' for (3) are more like 15', and that the break has to be 10' at most, even when the students have interesting questions and ideas about their projects to share over coffee.
Some of the linked resources are probably outdated or unavailable.
Course utilities is the most useful, at least to me. I keep rediscovering some of the stuff it lists…
Data lists stuff that I communicate to students.
The course history was never clear or accurate… Let's see if planning for 2.0 (#31) helps.
The "Code" wiki page does not exactly correspond to what I teach students. For instance, my first session focuses almost exclusively on pwd…
The "Stata" wiki page links only to English-language stuff, but I could add @methevenin's courses in French, which are very up-to-date:
https://mthevenin.github.io/stata_fr/
https://github.com/mthevenin/stata_fr
https://github.com/mthevenin/formation_stata
There are other pages, some with close to zero usefulness, unless I put the links or references in the Stata Guide, for instance. I keep re-creating lists of courses every time I teach a new course anyway.
Bottom line: use the wiki only to document the srqm internals (utilities), move everything else to the Stata Guide.
In the profile.do file, I encounter an error with Linux/Stata 12 at line 34. The problem seems to be with the c() instruction:
c(update_query) undefined
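One possible workaround, as a sketch: wrap the offending setting in capture so that the profile keeps running where the setting does not exist. The actual content of line 34 is not shown in this report, so the set update_query off command below is an assumption about what it does.

```stata
* Hypothetical replacement for the failing line: the original line 34 is not
* shown in this report, so "set update_query off" is an assumption.
capture set update_query off
if _rc {
    display as text "update_query is not available on this platform, skipping"
}
```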
https://ncgg.princeton.edu/wep/dataverse.html
https://ncgg.princeton.edu/wep/download.html
https://ncgg.princeton.edu/wep/IPE_Codebook.pdf (outdated)
wep2020? ipe2020?
lp_ ones in QOG (1980s)
https://github.com/sergiocorreia/stata-schemes
Should be opened in its own repo too.
What are the world-c and world-d data files? I can't seem to open them, Stata says "file data/world-d.dta not Stata format".
Related to #25 in a way: release svyplot.
https://gist.github.com/briatte/5099538
It's been used for good: https://twitter.com/PetGran/status/1046824377151619074
Version 2 is XeTeX-coded, so the sources should be out there too.
(Once publishable, it should be easy to bundle the replication material as a Stata package, which would also be a better way to distribute the course utilities. See Mark Lunt's epidemiology course or J. Scott Long's course for examples of courses-as-packages.)
Using notes from SRQM-TODO-2018. Other TODO files need to be added too.
With the aim of explaining better what the options are through a different use case:
Aim would be to have one 'extra' per week.
Suggestion (3) might be a good 'extra'.
CPDS, OECD, WDI are good candidates. Scruggs, if not too outdated?
See https://f.briatte.org/teaching/quanti/data for ideas.
Could you share (with me, at least) the syllabus source? I need to change some details, such as my name, the room etc. Thanks!
The course has not been tested with Stata 10 or 11 for a while, and keeping backward compatibility is important.
srqm_get fetches course material. It's useful to distribute do-files and slides, which are often edited at the last minute.
The code for srqm_get now points to srqm.briatte.org, which will redirect to srqm.apinc.org as soon as my zone file refreshes. There is a page at this address to remind students how srqm_get works.
@joelgombin: I'll send you the address and password to the FTP.
Closes #12 and #18 in favour of a reassessment in early 2023 (updated).
The ideal goal would be to maintain compatibility with all (SE/IC/MP) versions of Stata released in the last 10 years, with a focus on Stata SE.
(For reference, the course started running shortly after Stata 11.1 was released, and I think I remember testing it with Stata 10, from June 2007, possibly even Stata 9, April 2005.)
marginsplot in "if version" conditionals.
ci and changes to the defaults of margins (for xtlogit, re, so not affecting the course) suggest supporting only Stata 14+. That's the lazy option.
Given the above and some of the details below, the lazy objective of supporting only the last 3 versions (Stata 14+) might be more reasonable… Stata 14 was released in 2015, so that would result in a 5-year compatibility window, which is not so shabby.
There are comments about this in srqm_data.ado. The current format for all teaching datasets is Stata 12.
Solution: warn (or fail?) if datasets go over 2,048 - 100 variables in srqm_data.ado (leaving 100 variables for the user).
qog2019 is fine, 1,983 vars (leaves 65 free for the Stata/MP user to create)
qog2019 makes sense
GSS limited to 1976 and 2016 has ~ 1,100 vars and weighs 7.7 MB -- that should work.
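* keep the two survey years only
keep if year == 1976 | year == 2016
* store the list of variables in r(varlist)
d, varl
* drop every variable that has no non-missing values left
foreach i of varlist `r(varlist)' {
    di "`i'"
    count if !mi(`i')
    if r(N) == 0 {
        drop `i'
    }
}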
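The warning on the number of variables suggested above (2,048 - 100) could be sketched as follows. This is an illustration on the auto dataset shipped with Stata, not the code in srqm_data.ado; the local macro limit is a name made up for the example.

```stata
* Sketch only: warn when the dataset in memory leaves fewer than 100
* free variables under a 2,048-variable limit.
sysuse auto, clear
local limit = 2048 - 100
if c(k) > `limit' {
    display as error "dataset has " c(k) " variables, over the limit of `limit'"
}
else {
    display as text "dataset has " c(k) " variables, " `limit' - c(k) " below the limit"
}
```

Turning the warning into a failure would only require replacing the first display with an error command, which is the "warn (or fail?)" choice left open above.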
Discussed in #12. It's probably time to drop HTTP support — there is no satisfying solution to continue doing so, the course is available outside of Sciences Po only via HTTPS-only GitHub, and all Sciences Po students are on HTTPS.
There are more comments about this (HTTP/S on my Stata access point) in srqm_grab.ado.
srqm_grab.ado contains commands to import CSV/TSV and Excel data in Stata: it will show the commands to do so for Stata 13+ (one more argument in favour of dropping Stata 12 at that stage).
memory fails gracefully. Affects week1.do.
ci does not fail gracefully: it does not require mean in Stata 12 or in Stata 13, but does in Stata 14+. Affects week4.do and week5.do.
marginsplot is not supported in Stata 11-. Affects week11.do.
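The last two compatibility issues above could be handled along these lines. This is a sketch on the auto dataset shipped with Stata, not code from the course do-files: the ci part sidesteps the exact version cutoff by trying the newer syntax first, and the marginsplot part uses a plain version check.

```stata
* Sketch only, using the auto dataset shipped with Stata.
sysuse auto, clear

* ci: try the newer syntax, fall back on the older one if it errors
capture noisily ci means price
if _rc {
    ci price
}

* marginsplot: skip the plot on versions that lack the command
regress price i.foreign
margins foreign
if c(stata_version) >= 12 {
    marginsplot
}
```

One caveat with the capture approach: it would misfire on a dataset that happens to contain a variable called means, which is not the case here.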
This fix used to be able to set up the course on admin-restricted computers in the Sciences Po microlab. It failed today, so it needs to be tested again or modified.
https://briatte.github.io/srqm/
So badly outdated… Never really used it in class anyway. Deleting it entirely might be a better idea.
This course is now roughly 13 years old (first run: Fall 2010). The pretty dirty repo history shows it. It's time to think of version 2.0, although version 1.0 never got its release tag (release tags did not even exist when we started).
version n
version 13 to 'freeze' some commands, e.g. table, margins?
setup, especially unpublished ones; see also #26
lobbying.dta (Baumgartner), ebm2009 (Eurobarometer)
And perhaps even more importantly, but (perhaps, even) more time-consumingly:
My many TODO files from 2017, (especially) 2018, 2019, 2020 have suggestions of extra do-files to create: shorter ones, ones that cover extra stuff beyond the scope of the course (e.g. merging, panel data).
I also have some very short "demo" do-files that I use in the first hour, as recaps of the previous session + introduction to the second hour of the current one.
Use that as an opportunity to…
estout properly? (both for "Table 1" and regression tables)
week01, week02 … week12 for obsessive neatness?
week0*-recap do-files with just the essentials
week** ones
xtra01 to xtra12 -- one 'bonus' do-file per week (see below)
Bonus do-files (which will move out some stuff from the main ones, and will cover some intermediate/advanced topics):
xtra01-pca -- plot a map + demo PCA (see below)
xtra02-merge -- download additional data from online + merge
xtra03-svy -- survey weights: WVS 99-04
xtra04-bootstrap -- survey weights: NHIS 2017 (repeat?) + bootstrap
xtra05-export -- export descriptive stats with estout
xtra06-tests -- survey weights: ESS 2008 (repeat?) + other association tests with ranks
xtra07-ts -- QOG time series with (extract of) 2023 edition? (serial correlation)
xtra08-panels -- robust and clustered SEs, fixed and random effects with QOG time series
xtra09-export -- export regression results with estout
xtra10-logit -- AUC/ROC, predicted probabilities, ordinal logit, multinomial (?)
xtra11-mfx -- marginal effects,
xtra12-count -- survey weights: GSS + neg binomial, count, Poisson etc.?
PCA example:
pca popgrowth-safewater
scoreplot, ms(i) mlab(country)
// note: tried using `kountry` to convert country names, failed so far
loadingplot
// demo arch effect, no strong 2nd dimension
pca lexp-safewater
scoreplot
Leaves out:
I once considered publishing the Stata Guide, but publishing a Stata Guide, even though some publishers would take it, sounds bizarre in 2021. R is the current standard, with Julia and Python probably coming next or along.
The online data now goes up to 2014.
Basically a follow-up of #14
gea_ versions of educational attainment.
eu_ variables (small sample size) during data preparation? The data trimming script probably already does so.
Hello! I was just perusing Stata-related GitHub repositories and came across this. There's no license file, so I wasn't sure if the content was openly-licensed. By default, repositories without a license file have all rights reserved.
Secondly, I've been working on a Jupyter kernel for Stata, which allows Stata to be run directly from Jupyter Notebooks. I'd been wanting to make an example notebook anyways, so I converted the week2.do file into a Jupyter Notebook. It's nice because GitHub includes the output and graphs when it displays Jupyter Notebook files.
If you click here: https://github.com/kylebarron/srqm/blob/master/code/week2.ipynb you'll see all the output of the week2.do file rendered next to the Markdown descriptions.
The sty.tex file loads a "0_myboxes" package that is, however, not included in the folder. Any idea where I could get it? (I looked quickly at Taraborelli's GitHub page but didn't find it).
Closes #21, #22 and #23 (copied below), #27.
Stop updating the data, really.
data-raw/
srqm_data to use data-raw/
_readme documents
Detailed notes
qog2023
qog2019
eu_* variables
gss7221
gss7616 (but see below)
gss7616* to match files)
ess2008
ess0816, or ess2008 and ess2016 (different codebooks, so it's fine)
_merge problem
ess2016 despite not in use anywhere in the course do-files
wvs9904 -- keep old version for sharia law question
ess2016)
nhis202* recent year
nhis1020?
Note on QOG -- offers only this as a replacement in 2023, which is not ideal:
// school life expectancy
sc wdi_fertility wef_lse, ms(i) mlab(ccodealp) || lfit wdi_fertility wef_lse, ///
name(g1, replace)
// linear fit + SSA data points only, underpredicted
sc wdi_fertility wef_lse if ht_region == 4, ms(i) mlab(ccodealp) || ///
lfit wdi_fertility wef_lse, ///
name(g2, replace)
// all regions
forv i = 1/10 {
sc wdi_fertility wef_lse if ht_region == `i', ms(i) mlab(ccodealp) || ///
lfit wdi_fertility wef_lse, ///
name("region`i'", replace)
}
week12.do.
week6.do (which uses Round 4 only right now, despite trrtort also existing for Round 8).
ess0810 (note: in previous course versions, ess0810 contained Rounds 4 (2008) and 5 (2010))
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2019/
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2018/
Additional things to consider:
I like the initial "acronym + year" convention, but it produces strange names for multiple-year survey datasets:
ess1214 (not used) and ess0816
wvs9904 (unavoidable)
nhis1017 (unavoidable, unless we use a single year, but that removes any demo of keep if year)
gss7616 (unavoidable, unless we separate the years)
Is it still a good idea to do that for e.g. ESS? Probably not, esp. if we need to limit datasets at 2,048 variables for Stata/IC.
keep if year.
Both WVS and ESS are used to demo keep if inlist(country, …), the other subset we want to show.
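For reference, the two kinds of subsetting mentioned above can be sketched on a toy dataset. The country codes and years below are made up for illustration and do not come from the course datasets.

```stata
* Toy data (made up for illustration, not course data)
clear
input str2 country int year
"FR" 2008
"FR" 2016
"DE" 2008
"DE" 2016
"IT" 2016
end

* subset by survey year, then get the full data back
preserve
keep if year == 2016
list
restore

* subset by a list of countries
keep if inlist(country, "FR", "DE")
list
```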
It would make a lot of sense to have more datasets for the students to use than those used in the do-files.
Currently, the do-files are selective anyway: we provide ESS 2016 (Round 8) but do not use the data, even though the dependent variable also exists in that round.
See #21 for the equivalent issue with the QOG dataset.
Digging in tweets and bookmarks… My own scheme-burd seems to have a few issues in recent versions of Stata, so at least update it, or switch to one of those below.
https://github.com/mdroste/stata-scheme-modern#screenshots
Seems most promising. Would ideally like to support BuRd diverging colours.
ssc install g538schemes, replace all
https://danbischof.com/2017/09/05/a-final-stata-gift-538-schemes/
Via @rivelino22.
library(tidyverse)
d <- haven::read_dta('/Users/fr/Documents/Teaching/SRQM/data/qog2019.dta')
tibble(
  var = names(d),
  # data sources (variable name prefix, e.g. "wdi_")
  src = str_extract(names(d), ".*?_"),
  # number of non-missing observations per variable
  n = apply(d, 2, function(x) sum(!is.na(x)))
) %>%
  group_by(src) %>%
  # med_N added to match the columns of the output shown below
  summarise(n_vars = n(), min_N = min(n), med_N = median(n), max_N = max(n)) %>%
  arrange(min_N) %>%
  # arbitrary threshold at N = 50
  filter(!is.na(src), min_N < 50) %>%
  print(n = 100)
PSI, EU, OECD, WWBI and a few others are particularly at fault:
# A tibble: 28 x 5
src n_vars min_N med_N max_N
<chr> <int> <int> <dbl> <int>
1 psi_ 6 1 10.5 20
2 mad_ 4 15 29 163
3 eu_ 277 16 34 48
4 une_ 47 16 146 193
5 wwbi_ 38 17 41 62
6 oecd_ 281 19 37 44
7 wdi_ 278 19 156 192
8 dev_ 4 20 20 20
9 dpi_ 70 26 160. 175
10 bs_ 8 28 28 28
11 ess_ 9 28 28 28
12 ideavt_ 6 28 107 180
13 wel_ 36 29 32 189
14 wvs_ 42 29 34 34
15 aid_ 6 31 139 139
16 cses_ 2 31 31.5 32
17 gol_ 20 33 127 129
18 wiid_ 18 34 35 35
19 ucdp_ 2 35 70 105
20 cpds_ 49 36 36 36
21 h_ 11 37 165 185
22 lis_ 23 37 37 37
23 r_ 5 40 98 144
24 sgi_ 29 41 41 41
25 top_ 2 41 41 41
26 nelda_ 10 44 45 45
27 vi_ 13 45 48 50
28 qs_ 9 47 112 115
Not a bug, but leads students to build designs with low sample sizes.
https://www.trentonmize.com/teaching/cda
… and possibly others at the same location, e.g.
https://www.trentonmize.com/teaching/surveys
Also, basic Stata guide there:
https://drive.google.com/file/d/1wX0bXu7WOb3OW9eAyTCYTccsQLdGU2bF/view