ajdamico / lodown
locally download and prepare publicly-available microdata
# only download 2011
meps_cat <- get_catalog( "meps" , output_dir = "C:/My Directory/MEPS" )
lodown( "meps" , subset( meps_cat , year == 2011 ) )
i can't get to this ftp site from either a US or an Ivory Coast IP address :/ if it's only administrative data, is it big enough to require MonetDB? (2 million+ records?) thanks
@ajdamico, when I run
dbGetQuery( mdb_src$con , "SELECT RIGHT( cast( dtobito as text ) , 4 ) as ano , COUNT(*) from geral_cid10 GROUP BY ano order by ano" )
it returns:
     ano      L5
1   +004       1
2   +006  615147
3   +007 1467989
4   1996  908883
5   1997  903516
6   1998  931895
7   1999  938658
8   2000     496
9   2001  961492
10  2002  982807
11  2003 1002340
12  2004 1024073
13  2005 1006827
14  2006 1031691
15  2007 1047824
16  2008 1077007
17  2009 1103088
18  2011 1170498
19  2012 1181166
20  2013 1210474
21  2014 1227039
The first 3 lines and the year 2000 are wrong, as you can see at http://tabnet.datasus.gov.br/cgi/tabcgi.exe?sim/cnv/obt10uf.def when you select Linha: Ano do óbito (year of death) and all years.
Can you tell me why?
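One way to chase this down (a sketch, reusing the connection and table from the query above) is to look at the raw dtobito values behind the suspicious groups, to see whether they are malformed dates or numeric values printed in scientific notation:

```r
# inspect the raw dtobito values that produce the bad "ano" groups
# (sketch: assumes the same mdb_src connection and geral_cid10 table as above)
dbGetQuery( mdb_src$con ,
	"SELECT CAST( dtobito AS TEXT ) AS raw_dtobito , COUNT(*) AS n
	FROM geral_cid10
	WHERE RIGHT( CAST( dtobito AS TEXT ) , 4 ) IN ( '+004' , '+006' , '+007' )
	GROUP BY raw_dtobito
	ORDER BY n DESC
	LIMIT 20" )
```

If those rows turn out to be valid dates stored in a different format (or doubles rendered like 1e+006), the original query's RIGHT(..., 4) extraction would misclassify them, which could also explain the implausibly small year-2000 count.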
just in an asdfree book, or elsewhere?
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... NOTE
There are ::: calls to the package's namespace in its code. A package
almost never needs to use ::: for its own objects:
'recursive_ftp_scrape'
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... NOTE
get_catalog_mtps: no visible binding for global variable 'URLdecode'
lodown_mtps: no visible global function definition for 'read.csv2'
Undefined global functions or variables:
URLdecode read.csv2
Consider adding
importFrom("utils", "URLdecode", "read.csv2")
to your NAMESPACE file.
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking examples ... OK
* DONE
Status: 2 NOTEs
See
'C:/Users/anthonyd/Documents/GitHub/lodown.Rcheck/00check.log'
for details.
R CMD check results
0 errors | 0 warnings | 2 notes
R CMD check succeeded
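Since the package is roxygenized (see the note about re-roxygenizing below), the fix the second NOTE suggests can be expressed as a roxygen tag rather than a hand-edited NAMESPACE file (a sketch):

```r
# in any package source file, e.g. R/lodown.R, then re-run roxygen2;
# this generates importFrom("utils", "URLdecode", "read.csv2") in NAMESPACE
#' @importFrom utils URLdecode read.csv2
NULL
```

The first NOTE is fixed separately, by calling recursive_ftp_scrape() directly instead of lodown:::recursive_ftp_scrape(), since a package never needs ::: for its own objects.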
it should be another column in the data frame that get_catalog_* makes. that way, it defaults to something sensible, but users can change it the same way they can change other output names. you will also need to add this to the dir.create() line within lodown.R so the directory gets built. make sense? thanks
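A minimal sketch of that pattern (the column name output_filename and the file layout are assumptions for illustration, not the actual lodown schema):

```r
# sketch: inside a get_catalog_*() function, add a default output path column
# ('output_filename' is a hypothetical column name)
catalog$output_filename <-
	paste0( output_dir , "/" , catalog$year , " main.rds" )

# sketch: inside lodown.R, build the directory before any file is written
dir.create(
	dirname( catalog$output_filename[ i ] ) ,
	showWarnings = FALSE ,
	recursive = TRUE
)
```

Because it is just another catalog column, users can overwrite it after get_catalog() and before lodown(), the same way they already redirect other output names.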
capture the error and add the note about how to increase disk paging
verify platform independent
https://www.cdc.gov/nchs/slaits/cshcn.htm
looks very similar in structure to nsch
when you add datasets, make sure you also add lines in lodown.R and re-roxygenize. each new function pair needs one line in the first block and two lines in the second block. i don't have this hooked up to travis, so make sure you test-build in rstudio.. thanks bud
this forces you to ignore results on a case-by-case basis. read_fwf cannot be trusted to work on future datasets
the code in lodown_sim and lodown_sinasc looks nearly identical. is there a good reason to keep them separate?
if they stay separate, please remove the redundant code and replace it with datasus-wide custom functions. see lodown_icpsr and get_catalog_icpsr for an example of how to structure this. thanks
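One possible factoring (a sketch; the shared helper's name and signature are hypothetical, mirroring the lodown_icpsr / get_catalog_icpsr structure mentioned above):

```r
# sketch: datasus-wide helper holding the download/import logic
# that lodown_sim and lodown_sinasc currently duplicate
lodown_datasus <- function( data_name , catalog , ... ){
	# ...shared DATASUS download + import code goes here...
}

# the per-dataset functions become thin wrappers
lodown_sim <- function( data_name , catalog , ... )
	lodown_datasus( data_name , catalog , ... )

lodown_sinasc <- function( data_name , catalog , ... )
	lodown_datasus( data_name , catalog , ... )
```

Any genuinely dataset-specific behavior can then live in the wrapper, with everything else in one place.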
Please feel free to assign me to the issue if you can!
Poking @joelgombin, just in case.
loop through file listings like unzipped_files and import with haven/SAScii as much as possible. there is too much redundant code as it is
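One possible shape for that loop (a sketch; unzipped_files and the extension-based dispatch are assumptions):

```r
# sketch: walk the extracted files and import each with haven where possible
library(haven)

for( this_file in unzipped_files ){

	if( grepl( "\\.sas7bdat$" , this_file , ignore.case = TRUE ) ){

		x <- haven::read_sas( this_file )

	} else if( grepl( "\\.dta$" , this_file , ignore.case = TRUE ) ){

		x <- haven::read_dta( this_file )

	} else next
	# ...save or stack `x` here, falling back to SAScii for
	# fixed-width files that ship with a sas import script...
}
```

Centralizing the dispatch this way would replace the near-identical import blocks scattered across the per-dataset functions.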
similarly, should nrow( surveydesign ) be added?
that lots of MB will be downloaded automatically. anyone with a pay-per-GB connection should use this package with caution
unclear if/where this is worth the effort.. some current datasets are likely easy to add a design to the auto-output
probably everything on the continent, country, and regional pages, like http://download.geofabrik.de/central-america/cuba.html
@guilhermejacob i'll do this later
include some trigger in catalog creation that tells users to file a github issue for a manual check when a data website changes in an unpredictable way.. add to template.R as well
hi, in this commit 60388c4 i removed the db_tablename from the catalog because it is not used. should it be? is the escola table the main table here, or should there be some other merge happening automatically?
in the same commit, i added the case_count column to the catalog using the current year's escola table. if that's the main table for this dataset, then i think that's what we want? what do you think?