product-mapping-engine's People

Contributors

kackamac, equidem

Stargazers

Maruthi Basava

Forkers

natashalekh

product-mapping-engine's Issues

Devise smarter filters for candidate product pairs

Currently, price is essentially our only filter. This does not work when the price is missing, and even when it is present, it still leaves too many candidate product pairs to go through. We need smarter filters before we deploy the system in practice.

Prepare the engine for advanced filters

More advanced filters (e.g. filtering by descriptive words) require preprocessing before they can be applied, so preprocessing has to run before that portion of the filtering. However, to avoid preprocessing products that cannot possibly be paired with anything, the most basic filters that need no preprocessing (e.g. price-based filtering) should run first.
Also make sure no product is preprocessed more than once, which is what happens now.
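
The ordering described above could be sketched roughly as follows. This is only an illustration, not the engine's actual code: the product shape (`id`, `name`, `price`), the `tokenize` stand-in for preprocessing, and the `max_ratio` and `min_shared_words` thresholds are all assumptions made for the example.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical product shape: {"id": str, "name": str, "price": float}
Product = Dict[str, object]


def tokenize(name: str) -> FrozenSet[str]:
    """Stand-in for the expensive preprocessing step: a lowercase word set."""
    return frozenset(name.lower().split())


def prices_close(a: float, b: float, max_ratio: float) -> bool:
    """Cheap filter that needs no preprocessing."""
    low, high = sorted((a, b))
    return low > 0 and high / low <= max_ratio


def candidate_pairs(
    source: List[Product],
    target: List[Product],
    preprocess: Callable[[str], FrozenSet[str]] = tokenize,
    max_ratio: float = 2.0,
    min_shared_words: int = 1,
) -> List[Tuple[str, str]]:
    # Stage 1: the cheap filter runs on raw data, before any preprocessing.
    cheap = [
        (s, t)
        for s in source
        for t in target
        if prices_close(s["price"], t["price"], max_ratio)
    ]

    # Stage 2: preprocess lazily and cache per product, so each surviving
    # product is preprocessed at most once and rejected ones never are.
    cache: Dict[Tuple[str, str], FrozenSet[str]] = {}

    def words(side: str, product: Product) -> FrozenSet[str]:
        key = (side, product["id"])
        if key not in cache:
            cache[key] = preprocess(product["name"])
        return cache[key]

    return [
        (s["id"], t["id"])
        for s, t in cheap
        if len(words("source", s) & words("target", t)) >= min_shared_words
    ]


source = [
    {"id": "s1", "name": "Acme Laptop 15", "price": 1000.0},
    {"id": "s2", "name": "Acme Phone", "price": 10.0},
]
target = [
    {"id": "t1", "name": "Laptop Acme 15 inch", "price": 1100.0},
    {"id": "t2", "name": "Garden Hose", "price": 900.0},
]
print(candidate_pairs(source, target))  # [('s1', 't1')]
```

In this example "Acme Phone" is rejected by the price filter against every target, so it is never preprocessed at all.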

Handle the possibility of missing data and shift all the similarities to <-1,1> range

Since in some datasets a data column may be missing altogether or be missing only in some rows, we need to make sure the engine can deal with such cases without failing.
To do so properly, we also need to distinguish between the available texts (for instance) not matching and the texts not being there at all. To do this, let's shift every probability we calculate to the <-1,1> range, with values close to -1 signifying a total mismatch, values around 0 signifying either ambiguity or missing data, and values close to 1 signifying a total match.
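
A minimal sketch of that shift, assuming each existing similarity is a probability in the range 0 to 1 and that missing data is represented as `None` or NaN (both are assumptions; the function name `to_signed` is hypothetical):

```python
import math
from typing import Optional


def to_signed(similarity: Optional[float]) -> float:
    """Map a similarity from the 0..1 range to the -1..1 range.

    Missing data (None or NaN) maps to 0, the same value an ambiguous
    comparison (0.5) would get.
    """
    if similarity is None or math.isnan(similarity):
        return 0.0
    if not 0.0 <= similarity <= 1.0:
        raise ValueError(f"expected a value between 0 and 1, got {similarity}")
    return 2.0 * similarity - 1.0


print(to_signed(1.0))   # 1.0  (total match)
print(to_signed(0.0))   # -1.0 (total mismatch)
print(to_signed(0.5))   # 0.0  (ambiguous)
print(to_signed(None))  # 0.0  (data not present)
```

With this mapping, a missing column contributes a neutral 0 to any downstream score instead of being read as a mismatch.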

Create unit tests

Create automated tests to verify that new changes do not break existing functionality.

Devise smarter comparison of images

Currently, we only use image hashing to compare images. However, hashing captures only the visual features, not the semantics of what the image depicts.

Implement smarter comparison of parameter names

Currently, parameter names are simply compared directly, but that will almost never work, e.g. "RAM memory" vs "Operational memory (RAM)". We need to devise a smarter way to compare them.
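
One possible direction, shown here only as a sketch and not as the approach the project has chosen, is to compare normalized word sets with Jaccard similarity instead of comparing raw strings. The function names below are hypothetical.

```python
import re
from typing import FrozenSet


def name_tokens(name: str) -> FrozenSet[str]:
    """Lowercase the name and keep only alphanumeric words."""
    return frozenset(re.findall(r"[a-z0-9]+", name.lower()))


def parameter_name_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the word sets: |A & B| / |A | B|."""
    tokens_a, tokens_b = name_tokens(a), name_tokens(b)
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # neither name contains any usable word
    return len(tokens_a & tokens_b) / len(union)


a, b = "RAM memory", "Operational memory (RAM)"
print(a == b)                                     # False
print(round(parameter_name_similarity(a, b), 3))  # 0.667
```

Direct comparison rejects the pair from the issue, while the word-set comparison scores it at 2/3 because "ram" and "memory" are shared and only "operational" differs. Synonyms and translations would still need something beyond word overlap.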

Deal with products with no price

Currently, our filtering requires every product to have a price specified, but that is not always the case: a product might not be on sale yet, for instance, and we still want to be able to work with it.
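
A price filter that tolerates missing prices might look like the sketch below. It assumes a missing price is represented as `None` and that a pair with an unknown price should be kept for later filters rather than discarded; the function name and the `max_ratio` threshold are hypothetical.

```python
from typing import Optional


def passes_price_filter(
    price_a: Optional[float],
    price_b: Optional[float],
    max_ratio: float = 2.0,
) -> bool:
    """Keep a pair unless both prices are known and clearly too far apart."""
    # A missing or non-positive price gives no evidence either way,
    # so the pair is passed on to the remaining filters.
    if price_a is None or price_b is None or price_a <= 0 or price_b <= 0:
        return True
    low, high = sorted((price_a, price_b))
    return high / low <= max_ratio


print(passes_price_filter(100.0, 150.0))  # True
print(passes_price_filter(100.0, 500.0))  # False
print(passes_price_filter(None, 500.0))   # True
```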
