This code parses the Common Crawl database and extracts job-listing records that match a set of filters. It can run locally or be scaled up to run on a cluster.
skyler-myers-db / common-crawl-analysis
Parsing the Common Crawl database using Scala and Spark
License: MIT License
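The filters the project actually applies are not described here. As a minimal, hypothetical sketch of keyword-based filtering of crawl records, the plain-Scala code below models a record as a URL plus extracted page text. The `CrawlRecord` type, the `JobFilter.isJobListing` predicate, and the keyword list are illustrative assumptions, not code from the repository. Because the predicate is an ordinary function, the same logic could in principle be passed to Spark's `Dataset.filter`, which is one way a local run and a cluster run can share the same filtering code.

```scala
// Hypothetical sketch: keyword-based filtering of crawl records.
// CrawlRecord, JobFilter and JobKeywords are illustrative names,
// not taken from the common-crawl-analysis repository.
final case class CrawlRecord(url: String, text: String)

object JobFilter {
  // Assumed keywords suggesting that a page is a job listing.
  val JobKeywords: Seq[String] = Seq("job", "career", "hiring", "vacancy")

  // True if the URL or the page text contains any keyword (case-insensitive).
  def isJobListing(record: CrawlRecord): Boolean = {
    val haystack = (record.url + " " + record.text).toLowerCase
    JobKeywords.exists(keyword => haystack.contains(keyword))
  }

  def main(args: Array[String]): Unit = {
    val records = Seq(
      CrawlRecord("https://example.com/careers/123", "Senior Scala engineer wanted"),
      CrawlRecord("https://example.com/blog/post", "Notes on functional programming")
    )
    // Locally this is an ordinary collection filter. With Spark, the same
    // predicate could be applied to a Dataset[CrawlRecord] via Dataset.filter.
    records.filter(isJobListing).foreach(record => println(record.url))
  }
}
```

Running `main` prints only `https://example.com/careers/123`, since the second record mentions none of the assumed keywords. Substring matching is deliberately crude here; a real filter would likely need word boundaries, language handling, or structured-data checks to avoid false positives.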