git clone https://github.com/tarunmeena6846/Webscraper.git
cd Webscraper
npm install
To execute the scraping script, run the following command:
npx tsc && npm start
The structured data for each YC company is stored in the out/scraped.json file.
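As a quick check after a run, the output file can be loaded and inspected. The following is a minimal sketch, not part of the project: the helper names parseScraped and loadScraped are hypothetical, and it assumes the file holds a top-level JSON array with one object per company. The record fields are not documented here, so entries are treated as opaque objects.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical helper: parse the text of out/scraped.json.
// Assumption: the top level is a JSON array with one object per company.
function parseScraped(jsonText: string): Record<string, unknown>[] {
  const data: unknown = JSON.parse(jsonText);
  if (!Array.isArray(data)) {
    throw new Error("Expected a top-level JSON array");
  }
  return data as Record<string, unknown>[];
}

// Hypothetical helper: read the output file from disk and parse it.
async function loadScraped(
  filePath: string = "out/scraped.json"
): Promise<Record<string, unknown>[]> {
  const text = await readFile(filePath, "utf8");
  return parseScraped(text);
}

// Usage (after the scraper has run):
//   loadScraped().then((rows) => console.log(`${rows.length} companies scraped`));
```

Keeping the parsing step separate from the file read makes it possible to check the output format without touching the disk.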
inputs/companies.csv: CSV file containing the list of YC companies.
src/challenge.ts: TypeScript code for the scraping tool.
out/scraped.json: Output JSON file containing structured data about YC companies.
node_modules/: Dependencies directory.
package.json, package-lock.json: NPM package configuration files.
crawlee: Library for web scraping.
cheerio: HTML parsing library.
fast-csv or papaparse: CSV parsing library.
fs/promises, fs-extra: File system utilities.
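To illustrate the role of the file system utilities, here is a minimal sketch of writing scraped records to disk using only fs/promises from the Node.js standard library. It is not taken from src/challenge.ts: the helper names serializeRecords and writeScraped are hypothetical, and the pretty-printed array layout is an assumption about the output format.

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { dirname } from "node:path";

// Hypothetical helper: serialize records as pretty-printed JSON
// (2-space indent) followed by a trailing newline.
function serializeRecords(records: object[]): string {
  return JSON.stringify(records, null, 2) + "\n";
}

// Hypothetical helper: write records to outPath, creating the parent
// directory (for example out/) first if it does not exist yet.
async function writeScraped(outPath: string, records: object[]): Promise<void> {
  await mkdir(dirname(outPath), { recursive: true });
  await writeFile(outPath, serializeRecords(records), "utf8");
}

// Usage:
//   writeScraped("out/scraped.json", [{ name: "Example Co" }]);
```

Creating the directory with the recursive option means a fresh clone without an out/ folder does not cause the write to fail.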
The scraper outputs a properly structured JSON file at out/scraped.json. It handles various YC company pages and extracts relevant information accurately.