
userlane-challenge's Introduction

Web Crawler Command - Userlane Coding Challenge

A simple web crawler command, submitted as a solution to Userlane's Coding Challenge. I spent about 4 hours working on it.

Setup

  • Install required dependencies through npm install
  • Run npm run crawler to show the usage hints
  • Run npm run test to run the unit tests

Dependencies

  • TypeScript (ts-node & ts-jest)
  • command-line-args & command-line-usage (CLI args and self-documenting support)
  • JSDOM (HTML parsing)
  • validate (CLI args validation)
  • Jest (unit tests)

Local environment testing

I've included a static-html-demo folder containing a static HTML website that can be used to test this command easily. It is a separate project: running npm install and then npm run start inside that folder spins up an Express server that hosts the static website at localhost:9000. Then, back in the main project, run npm run crawler -- -u http://localhost:9000 to see the crawler in action.

Improvements and pending to-dos

With more time, I'd like to address the following:

  • Better error handling for non-HTML URLs (currently, this crashes the app)
  • Better error handling and retry handling for HTTP 429 errors
  • Better support for relative links (currently, it only supports absolute links)
  • Refactor the scripts/crawler.ts module, separating validation logic from the user hint support into different files
  • Provide an additional -o [file] option to write the output to a file. Currently, the only way to do this is shell redirection: npm run crawler -- -u testing.com > myFile.json
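
As an illustration of the relative-links item above, here is a minimal sketch of how href values could be resolved against the URL of the page they were found on. It is not taken from the project's source: the names tryParse and resolveLinks are hypothetical, and it assumes the crawler already has the raw href strings (for example, read from the anchors that JSDOM parsed). It relies only on the standard URL class available globally in Node.js.

```typescript
// Hypothetical helpers, not part of the project's actual code.

// Parse `href` relative to `base`; return null instead of throwing on bad input.
function tryParse(href: string, base: string): URL | null {
  try {
    // The URL constructor resolves absolute, root-relative ("/a"),
    // path-relative ("a.html", "../a") and fragment-only ("#top") references.
    return new URL(href, base);
  } catch {
    return null;
  }
}

// Resolve every href found on `pageUrl` into an absolute http(s) URL,
// dropping fragments and duplicates while preserving first-seen order.
function resolveLinks(pageUrl: string, hrefs: string[]): string[] {
  const resolved = new Set<string>();
  for (const href of hrefs) {
    const url = tryParse(href, pageUrl);
    if (url === null) continue; // unparsable href
    // Keep only crawlable schemes (skips mailto:, tel:, javascript:, ...).
    if (url.protocol !== "http:" && url.protocol !== "https:") continue;
    url.hash = ""; // a fragment points into the same document
    resolved.add(url.href);
  }
  return Array.from(resolved);
}

const links = resolveLinks("http://localhost:9000/blog/index.html", [
  "about.html",
  "/contact",
  "mailto:someone@example.com",
]);
console.log(links);
```

With a helper like this, the crawler would only ever queue absolute URLs, so the existing absolute-link handling could stay unchanged.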

userlane-challenge's People

Contributors

perlucas
