congress: A Tool for CongressData

congress is a package designed to let a user with only basic knowledge of R interact with CongressData, a dataset with nearly 800 variables that compiles information about all US congressional districts from 1789 to 2021. Specifically, the dataset tracks characteristics of congressional districts, the members of Congress themselves, and the behavior of those members in policymaking. Users can find variables related to demographics, politics, and policy; subset the data across multiple dimensions; create custom aggregations of the dataset; and access citations in both plain text and BibTeX for every variable. An associated web application is available here and the data-only package is here.

Read the Codebook and Manual

The CongressData codebook is available here.

The package’s manual contains information regarding each function and its arguments. It is available here: congress manual.

Installing this Package and the Data-only Companion Package

congress is a functional package that interacts with CongressData. We maintain the dataset in another package called congressData. You can use the data-only package if you simply want to access the data. Install them from GitHub like so:

# use the devtools library to download the package from GitHub
library(devtools)

# this will download congressData as well (NOTE: installation can take several minutes)
install_github("ippsr/congress")

# if there are issues or you only want to download congressData (NOTE: installation can take several minutes)
install_github("ippsr/congressData")
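
Because installation can take several minutes, it may be worth checking what is already installed before calling install_github. The following is a minimal sketch using only base R; missing_packages is a hypothetical helper name, not part of congress or devtools.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Report which of the two packages still need installing.
to_install <- missing_packages(c("congress", "congressData"))
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
}
```

If the message lists a package, run the matching install_github call shown above.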

Finding Variables

get_var_info: Retrieve information about variables in CongressData and identify variables of interest with get_var_info. The function lets you search the codebook to find the years each variable is observed in the data; a short and long description of each variable; and the source and citation(s) for each variable. Citations are available in both BibTeX and plain text. Use the function to search for broad terms like ‘tax’ with the related_to argument and/or partial-match variable names with var_names.

suppressMessages(library(dplyr))
library(congress)
#> Please cite:
#> Grossmann, M., Lucas, C., McCrain, J, & Ostrander, I. (2022). CongressData.
#> East Lansing, MI: Institute for Public Policy and Social Research (IPPSR).
#> 
#> You are using the version of the Congress Data stored in your local copy of congressData. Running `congressData::get_congress_version()` will print your local version number.

# variables related to health insurance
h_ins_cong <- get_var_info(related_to = "health insurance")

cat("There are",nrow(h_ins_cong),"variables related to health insurance in CongressData")
#> There are 41 variables related to health insurance in CongressData

head(h_ins_cong$variable)
#> [1] "percent_under18_healthins" "percent_private_under18"  
#> [3] "percent_public_under18"    "percent_privpub_under18"  
#> [5] "percent_pop18_34"          "percent_private_18_34"

# variables with 'under18' in their name
under18_cong <- get_var_info(var_names = "under18")

head(under18_cong$variable)
#> [1] "percent_under18"           "percent_under18_healthins"
#> [3] "percent_private_under18"   "percent_public_under18"   
#> [5] "percent_privpub_under18"   "under18"

get_var_info returns the following information to simplify using CongressData:

  • variable: Variable name
  • year: The precise years the variable is observed
  • short_desc: A short description of the variable
  • long_desc: A long description of the variable
  • source: The sources of the data
  • category: The variable’s category (not all are coded)
  • plaintext_cite (1-4): Plain text citations for the data
  • bibtext_cite (1-4): BibTeX citations for the data
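
To make the difference between var_names and related_to concrete, here is a toy sketch that runs without the package. It is not the package's implementation: search_codebook is a hypothetical helper, the descriptions are invented for illustration, and combining both arguments with AND is an assumption.

```r
# Toy codebook: variable names come from the output above; the descriptions
# are invented for illustration and are NOT the real codebook text.
toy_codebook <- data.frame(
  variable = c("percent_under18", "percent_under18_healthins",
               "percent_private_18_34", "percent_bus"),
  short_desc = c("Share of population under 18",
                 "Share under 18 with health insurance",
                 "Share aged 18-34 with private health insurance",
                 "Share commuting by bus"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: partial match on variable names (var_names) and/or a
# case-insensitive term search over names and descriptions (related_to).
search_codebook <- function(codebook, var_names = NULL, related_to = NULL) {
  keep <- rep(TRUE, nrow(codebook))
  if (!is.null(var_names)) {
    keep <- keep & grepl(var_names, codebook$variable, fixed = TRUE)
  }
  if (!is.null(related_to)) {
    hit <- grepl(related_to, codebook$variable, ignore.case = TRUE) |
      grepl(related_to, codebook$short_desc, ignore.case = TRUE)
    keep <- keep & hit
  }
  codebook[keep, , drop = FALSE]
}

by_name  <- search_codebook(toy_codebook, var_names = "under18")$variable
by_topic <- search_codebook(toy_codebook, related_to = "health insurance")$variable
```

In this sketch a name search only looks at the variable column, while a topic search can also match a variable whose name never mentions the term.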

Accessing Member-Year Data

get_cong_data: Access all or part of CongressData with get_cong_data. Subset by state names with states and by years with years (either a single year or a two-element vector giving the minimum and maximum years you want). You can also use the related_to argument to search across variable names, short/long descriptions from the codebook, and citations for non-exact matches of a supplied term. For example, searching ‘tax’ will return variables with words like ‘taxes’ and ‘taxable’ in any of those columns.

# return the entire dataset
all_the_dat <- get_cong_data()

# subset by state, topic, and years
cong_subset <- get_cong_data(states = c("Kentucky","Michigan","Pennsylvania")
                             ,related_to = "tax"
                             ,years = c(1960,1980)
                             )
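
The years argument accepts either one year or a min/max pair. The toy sketch below shows one way to read that rule, using invented rows rather than CongressData. filter_state_years is a hypothetical helper, and treating both bounds as inclusive is an assumption.

```r
# Toy member-year rows (invented values, not CongressData).
toy <- data.frame(
  state = c("Michigan", "Michigan", "Kentucky", "Kentucky", "Ohio"),
  year  = c(1959, 1960, 1975, 1981, 1970),
  stringsAsFactors = FALSE
)

# Hypothetical helper: `years` is either a single year or a min/max pair,
# with both bounds treated as inclusive.
filter_state_years <- function(df, states = NULL, years = NULL) {
  keep <- rep(TRUE, nrow(df))
  if (!is.null(states)) keep <- keep & df$state %in% states
  if (!is.null(years)) {
    stopifnot(length(years) %in% c(1, 2))
    keep <- keep & df$year >= min(years) & df$year <= max(years)
  }
  df[keep, , drop = FALSE]
}

range_rows  <- filter_state_years(toy, c("Kentucky", "Michigan"), c(1960, 1980))
single_rows <- filter_state_years(toy, years = 1970)
```

Because a single year is its own minimum and maximum, the same comparison covers both forms of the argument.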

Aggregate to Member-Session Data with Custom Schemes

aggregate_cong_dat: Choose how to aggregate the member-year data into member-session data across subsets (e.g. data sources) of CongressData. For each of the following chunks of the dataset, you can choose Mean, Sum, or First (the value in the first year of the session):

  • census_nonperc_vars: Non-percent Census Variables
  • census_perc_vars: Percent Census Variables
  • bill_vars: Congressional Bills Project Variables
  • com_vars: Committee Assignment Variables
  • else_vars: All the Other Variables

The variable names in the resulting dataset will reflect how they were aggregated (e.g. nbills_major_topic_10 becomes nbills_major_topic_10_mean). Note: while character variables will reflect the chosen aggregation scheme (e.g. character_var_sum), they are aggregated by pasting their unique values together.
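
As an illustration of the three schemes and the naming rule, here is a toy base-R sketch on invented data. aggregate_toy is a hypothetical helper, not the package's code, and the ", " separator used when pasting character values is an assumption.

```r
# Toy member-year data (invented values): two members, two years in one session.
toy <- data.frame(
  member  = c("A", "A", "B", "B"),
  session = c(100, 100, 100, 100),
  nbills  = c(4, 6, 1, 3),
  party   = c("D", "D", "R", "I"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse one variable to member-session rows.
# Rows are assumed to be in year order, so "First" takes the first row.
aggregate_toy <- function(df, var, how = c("Mean", "Sum", "First")) {
  how <- match.arg(how)
  summarise_one <- function(x) {
    # Character variables are pasted together whatever scheme is chosen.
    if (is.character(x)) return(paste(unique(x), collapse = ", "))
    switch(how, Mean = mean(x), Sum = sum(x), First = x[1])
  }
  groups <- split(df, list(df$member, df$session), drop = TRUE)
  out <- data.frame(
    member  = vapply(groups, function(g) g$member[1], character(1)),
    session = vapply(groups, function(g) g$session[1], numeric(1)),
    stringsAsFactors = FALSE
  )
  values <- lapply(groups, function(g) summarise_one(g[[var]]))
  # The new column name records the scheme, e.g. nbills_mean.
  out[[paste0(var, "_", tolower(how))]] <- unlist(values, use.names = FALSE)
  rownames(out) <- NULL
  out
}

bills_mean  <- aggregate_toy(toy, "nbills", "Mean")
bills_first <- aggregate_toy(toy, "nbills", "First")
party_sum   <- aggregate_toy(toy, "party", "Sum")
```

Here member B's party column becomes "R, I" under the name party_sum, which mirrors the note above: the suffix reflects the requested scheme even though the values were pasted rather than summed.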

# import the data using the default mean values
agg_cong <- aggregate_cong_dat()

# choose specific aggregation methods by subgroup
cong_custom <- aggregate_cong_dat(census_nonperc_vars = "Mean"
                                  ,census_perc_vars = "Sum"
                                  ,bill_vars = "First"
                                  ,com_vars = "Mean"
                                  ,else_vars = "Sum")

# default aggs for specific states/sessions and vars related to health
cong_subset <- aggregate_cong_dat(states = c("Kentucky","Michigan","Pennsylvania")
                                  ,related_to = "health"
                                  ,sessions = c(50:55)
                                  )
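
When choosing sessions it can help to translate them into calendar years. The sketch below assumes that the sessions argument holds Congress numbers (so 50 means the 50th Congress) and uses the fact that the 1st Congress convened in 1789 and each Congress spans two years. congress_years is a hypothetical helper, and the dataset's own year assignment may differ.

```r
# Hypothetical helper: map a Congress number to the calendar year it convened
# and the following year (1st Congress = 1789, 117th Congress = 2021).
congress_years <- function(session) {
  start <- 1787 + 2 * session
  data.frame(session = session, first_year = start, second_year = start + 1)
}

span <- congress_years(50:55)
range(span$first_year, span$second_year)
```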

Pulling Citations

get_var_info: Each variable in CongressData was collected from an external source, so please use get_var_info, described above, to obtain its citations (plain text and BibTeX). Supply a vector of variable names to the function with the var_names argument and collect the citations provided in the plain text or BibTeX columns. NOTE: Some variables have multiple citations, so check that you have them all.

# plain-text citation for a committee variable (BibTeX is also available)
get_var_info(var_names = "com_benghazi_299") %>%
  pull(plaintext_cite)
#> [1] "Charles Stewart III and Jonathan Woon. Congressional Committee Assignments, 103rd to 114th Congresses, 1993--2017: House of Representatives, 2017.\n"

# plain-text citation for a Census variable (BibTeX is also available)
get_var_info(var_names = "percent_bus") %>%
  pull(plaintext_cite)
#> [1] "U.S. Census Bureau. (2022). 2009-2019 American Community Survey 1-year Estimates. Retrieved from the Census Bureau Data API."
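
Since some variables carry more than one citation, one way to avoid missing any is to gather every citation column at once. The sketch below uses a toy stand-in for a get_var_info result: the numbered column name plaintext_cite2, the placeholder citations, and the helper collect_cites are all hypothetical, so check the real column names with names() before adapting it.

```r
# Toy stand-in for a get_var_info() result. The column names and the
# citation strings are placeholders, not real CongressData content.
toy_info <- data.frame(
  variable        = c("var_a", "var_b"),
  plaintext_cite  = c("Source One (2020).\n", "Source Two (2019)."),
  plaintext_cite2 = c(NA, "Source Three (2021)."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pool every column whose name starts with `prefix`,
# trim stray whitespace, and drop missing, empty, and duplicate entries.
collect_cites <- function(info, prefix = "plaintext_cite") {
  cite_cols <- grep(paste0("^", prefix), names(info), value = TRUE)
  cites <- trimws(unlist(info[cite_cols], use.names = FALSE))
  unique(cites[!is.na(cites) & nzchar(cites)])
}

all_cites <- collect_cites(toy_info)
```

The trimws call also removes a trailing newline like the one visible in the first citation printed above.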

Contact

For questions about congressData please reach out to Benjamin Yoel ([email protected]).

Citation

In addition to citing each variable’s source, we ask that you cite CongressData if you use this package or the dataset. Recommended citations are below, for congressData in general and for the specific version of the dataset.

Grossmann, M., Lucas, C., McCrain, J., & Ostrander, I. (2022). CongressData. East Lansing, MI: Institute for Public Policy and Social Research (IPPSR).

Rapanos, A., & Yoel, B. (2024). CongressData v1.4. East Lansing, MI: Institute for Public Policy and Social Research (IPPSR).
