filterbyfilter's Introduction

Hello! I am currently a PhD student @ImperialCollege. My research lies at the intersection of machine learning, physics, and differential geometry. The goal of this research is to leverage reduced-order models for high-dimensional, complex physical phenomena, utilising tools from differential geometry and physics to operate directly on low-dimensional manifolds.

Previously, I worked as a machine learning research intern @NasaJPL, @ESA, @MindFoundry, @Dyson, and @BAESystems.

filterbyfilter's People

Contributors

danielkelshaw

filterbyfilter's Issues

Extract Information

Once a Coffee record has been defined, as per #5, a function is needed to extract the relevant information from a given ProductPage. This function needs to be bespoke for each shop.
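
One way to keep the extraction bespoke per shop is to put each shop behind a shared module signature. The sketch below is only illustrative: `product_page`, `coffee`, `SHOP`, and `Example_shop` are hypothetical stand-ins for the project's real types, and the toy extractor simply reads the text inside the first `<h1>` tag rather than using a proper HTML parser.

```ocaml
(* Hypothetical stand-ins for the project's ProductPage and Coffee types. *)
type product_page = { page_url : string; html : string }
type coffee = { name : string; url : string }

(* Each shop supplies its own extraction logic behind a shared signature. *)
module type SHOP = sig
  val shop_name : string
  val extract : product_page -> coffee option
end

(* Toy shop: takes the text between the first <h1> and </h1> as the name.
   A real page would need an HTML parser and shop-specific selectors. *)
module Example_shop : SHOP = struct
  let shop_name = "ExampleShop"

  (* Index of the first occurrence of [sub] in [s] at or after [from]. *)
  let find_sub s sub from =
    let n = String.length s and m = String.length sub in
    let rec go i =
      if i + m > n then None
      else if String.sub s i m = sub then Some i
      else go (i + 1)
    in
    go from

  let extract page =
    match find_sub page.html "<h1>" 0 with
    | None -> None
    | Some i ->
      let start = i + String.length "<h1>" in
      (match find_sub page.html "</h1>" start with
       | None -> None
       | Some j ->
         Some { name = String.sub page.html start (j - start);
                url = page.page_url })
end

(* Run any shop's extractor on a page via a first-class module. *)
let extract_with (module S : SHOP) (page : product_page) : coffee option =
  S.extract page
```

Returning `coffee option` lets a shop signal that a page could not be parsed without raising an exception.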

Printing Coffee Records

At the moment a coffee record can be constructed - it would be useful if this could be printed to stdout to show the results of the web-scraping.
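
A minimal sketch of what printing could look like, assuming a cut-down record with hypothetical fields `name`, `origin`, and `tasting_notes` (the real record may differ). Building the string separately from printing it keeps the formatting easy to test.

```ocaml
(* Hypothetical, cut-down coffee record used only for this illustration. *)
type coffee = {
  name : string;
  origin : string;
  tasting_notes : string list;
}

(* Render a record as a short multi-line string. *)
let coffee_to_string (c : coffee) : string =
  Printf.sprintf "Name: %s\nOrigin: %s\nTasting notes: %s"
    c.name c.origin (String.concat ", " c.tasting_notes)

(* Write the rendered record to stdout, followed by a newline. *)
let print_coffee (c : coffee) : unit =
  print_endline (coffee_to_string c)

let () =
  print_coffee
    { name = "Example Coffee"; origin = "Ethiopia";
      tasting_notes = [ "peach"; "jasmine" ] }
```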

Setup Repository

  • Provide basic dune file to allow for build / execute of .ml files.
  • Add a .gitignore to prevent unnecessary files being added.
  • Add a line in the README.md to explain that this is a project for me to learn OCaml - it will be improved as time goes on.

Read / Write Coffee from / to JSON

At the moment the coffee record can only be printed to stdout - it would be useful to write the data to file. It would also be nice to read the contents of that file back and load them into coffee records which can be operated on.
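
The write half could be sketched with only the standard library, as below. The record fields, `json_string`, `coffee_to_json`, and `write_coffee` are all hypothetical names chosen for illustration. Reading the file back needs a JSON parser, which is left out here; a library such as yojson would be the more likely choice for both directions in practice.

```ocaml
(* Hypothetical, cut-down coffee record used only for this illustration. *)
type coffee = { name : string; url : string; tasting_notes : string list }

(* Quote a string for JSON, escaping the characters JSON requires. *)
let json_string (s : string) : string =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (fun ch ->
      match ch with
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | '\r' -> Buffer.add_string b "\\r"
      | '\t' -> Buffer.add_string b "\\t"
      | c when Char.code c < 0x20 ->
        Buffer.add_string b (Printf.sprintf "\\u%04x" (Char.code c))
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

(* Serialise one record as a compact JSON object. *)
let coffee_to_json (c : coffee) : string =
  Printf.sprintf "{\"name\":%s,\"url\":%s,\"tasting_notes\":[%s]}"
    (json_string c.name) (json_string c.url)
    (String.concat "," (List.map json_string c.tasting_notes))

(* Write the JSON to [path], closing the channel even if writing fails. *)
let write_coffee (path : string) (c : coffee) : unit =
  let oc = open_out path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc (coffee_to_json c))
```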

Define Coffee Record

The information that we want to extract should be consistent across different sites and should follow the same template - a record is a good way to implement this in OCaml. This should also allow for easy JSON interactions later down the line if I want to save the results to file.

Some information I may want to include:

  • Name
  • URL
  • Origin
  • Prices / Masses
  • Description
  • Process
  • Tasting Notes
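
As a rough sketch, the list above might map onto a record like the following. The field names and types are assumptions made for illustration: optional fields use `option` because not every shop will list everything, and prices are stored in pence to avoid floating-point money.

```ocaml
(* Hypothetical field names and types; the list above only names the kinds
   of information to capture. *)
type offer = {
  price_pence : int;  (* price in pence, to avoid floating-point money *)
  mass_grams : int;   (* bag size in grams *)
}

type coffee = {
  name : string;
  url : string;                (* could become Uri.t if the uri library is used *)
  origin : string option;      (* not every shop lists every field *)
  offers : offer list;         (* one entry per price / mass pairing *)
  description : string option;
  process : string option;
  tasting_notes : string list;
}

(* Made-up example value, not scraped data. *)
let example : coffee = {
  name = "Example Coffee";
  url = "https://example.com/products/example-coffee";
  origin = Some "Ethiopia";
  offers = [ { price_pence = 1200; mass_grams = 350 } ];
  description = None;
  process = Some "Washed";
  tasting_notes = [ "peach"; "jasmine" ];
}
```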

SquareMileScraper -- Get Links to Relevant Products

To provide a generic case study, I shall be focusing on extracting information from SquareMileCoffee. This can be extended to a number of online shops once I have gained a better appreciation of how to do web scraping with OCaml.

The homepage of SquareMileCoffee lists a number of products - extracting the relevant links will allow us to scrape individual product pages for the information that we need. For now the result can be a Uri.t list, but in the future it may be wise to define a custom type.
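
The custom type mentioned above could be sketched as an abstract type with a checked constructor. This example uses plain strings instead of Uri.t so that it needs only the standard library, and it assumes that product pages live under a "/products/" path segment, which is a guess rather than something confirmed about the site. `Product_link` and `product_links` are hypothetical names.

```ocaml
(* Abstract type: values can only be built through [of_string]. *)
module Product_link : sig
  type t
  val of_string : string -> t option
  val to_string : t -> string
end = struct
  type t = string

  (* Assumption: product pages contain a "/products/" path segment. *)
  let marker = "/products/"

  (* True if [sub] occurs anywhere in [s]. *)
  let contains s sub =
    let n = String.length s and m = String.length sub in
    let rec go i = i + m <= n && (String.sub s i m = sub || go (i + 1)) in
    go 0

  let of_string s = if contains s marker then Some s else None
  let to_string t = t
end

(* Keep only the hrefs that look like product pages. *)
let product_links (hrefs : string list) : Product_link.t list =
  List.filter_map Product_link.of_string hrefs
```

Because `Product_link.t` is abstract, code elsewhere cannot pass an arbitrary URL where a product link is expected.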
