h1b_visa_eda's Introduction

H-1B Visa Petitions Exploratory Data Analysis

The H-1B is an employment-based, non-immigrant visa category for temporary foreign workers in the United States. Every year, U.S. Citizenship and Immigration Services (USCIS) receives over 200,000 petitions and selects 85,000 of them through a random lottery. The application data is publicly available, which makes in-depth longitudinal research and analysis possible. It provides key insights into the prevailing wages for job titles sponsored by US employers under the H-1B visa category. In particular, I use the 2011-2016 H-1B petition disclosure data to analyze the employers with the most applications, data science related job positions, and the relationship between offered salaries and the cost of living index.

Data Set Source

The Office of Foreign Labor Certification (OFLC) publishes program data on immigration programs, including the H-1B visa. The disclosure data, updated annually, is available at https://www.foreignlaborcert.doleta.gov/performancedata.cfm

  • Click on the Disclosure Data tab
  • Go to the LCA Programs (H-1B, H-1B1, E-3) section
  • You will find data from 2008 onwards.

Requirements

  • R
  • RStudio
  • Packages: readxl, dplyr, hashmap, ggplot2, ggmap, ggrepel

Use install.packages("package_name") to install new packages in R.
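
As a minimal sketch, the packages listed above could be checked and installed in one step. The helper name `missing_packages` is an assumption for illustration and is not part of this repository. If a package is not available from your CRAN mirror, `install.packages` will report it, and it has to be obtained another way.

```r
# Packages listed under Requirements in this README.
required <- c("readxl", "dplyr", "hashmap", "ggplot2", "ggmap", "ggrepel")

# Hypothetical helper (not part of the repository): return the packages
# from `pkgs` that are not yet installed in the current R library.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required)

# Only install in an interactive session, so that sourcing this file
# from a script never triggers an unexpected download.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running this in an RStudio console installs only the packages that are not already present.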

Files

  • data_processing.md: Markdown document illustrating the key data transformations on the raw dataset.
  • data_analysis.md: Markdown document with the code for the plots and the corresponding data analysis.
  • helpers.R: helper functions used mainly for data analysis
  • spell_correcter.R: A suite of functions for performing spell correction in a given vector using the frequencies of occurrence of different elements in the vector.
  • coli/: Python Scrapy code directory for scraping cost of living plus rent index. The spider crawl file is the main file describing how the data should be scraped.
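
The idea behind frequency-based spell correction can be illustrated with a short sketch. This is not the code in spell_correcter.R; the function name `correct_by_frequency`, its parameters `min_count` and `max_dist`, and the use of edit distance via `utils::adist` are all assumptions made for illustration. Rare values are replaced by the most frequent value within a small edit distance.

```r
# Hypothetical sketch, not the implementation in spell_correcter.R.
# Values seen fewer than `min_count` times are treated as possible typos
# and mapped to the most frequent value within `max_dist` edits, if any.
correct_by_frequency <- function(x, min_count = 2, max_dist = 1) {
  counts <- table(x)
  common <- names(counts)[counts >= min_count]
  rare <- names(counts)[counts < min_count]
  if (length(common) == 0 || length(rare) == 0) return(x)

  # Edit distances between every rare value (rows) and common value (cols).
  d <- utils::adist(rare, common)

  # Start with each rare value mapped to itself.
  mapping <- setNames(rare, rare)
  for (i in seq_along(rare)) {
    close <- which(d[i, ] <= max_dist)
    if (length(close) > 0) {
      # Among close candidates, prefer the most frequent one.
      best <- close[which.max(counts[common[close]])]
      mapping[rare[i]] <- common[best]
    }
  }

  out <- x
  is_rare <- x %in% rare
  out[is_rare] <- unname(mapping[x[is_rare]])
  out
}

employers <- c("MICROSOFT", "MICROSOFT", "MICROSOFT", "MICROSFT",
               "GOOGLE", "GOOGLE", "GOGLE", "IBM")
corrected <- correct_by_frequency(employers)
```

Here "MICROSFT" and "GOGLE" are one edit away from a frequent value and get corrected, while "IBM" has no close frequent neighbor and is left unchanged.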

Shiny app

I extended this project to build a Shiny app based on the transformed data set.

Blogs

Please read my blogs for key data insights and more details:

Kaggle

I have released the transformed dataset on Kaggle for public use under the CC BY-NC-SA 4.0 license.

Acknowledgements

License

Open sourced under the MIT License.
