persister's People

Contributors

creekorful

persister's Issues

Do not delete duplicate content

Currently, persister prevents duplicate content by deleting any existing duplicate before adding new content.

To support the upcoming website snapshot feature, this behavior has to change.

This check has to be removed completely: deciding whether something should be saved is the scheduler's responsibility, not the persister's.
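
The intended behavior can be illustrated with a minimal sketch. It assumes an in-memory stand-in for the real storage; `SnapshotStore`, `save` and `history` are hypothetical names, not part of persister. The point is only that saving the same URL twice keeps both versions.

```python
from collections import defaultdict


class SnapshotStore:
    """In-memory stand-in for the persister's storage (hypothetical)."""

    def __init__(self):
        # url -> list of (timestamp, content) snapshots
        self._snapshots = defaultdict(list)

    def save(self, url, timestamp, content):
        # No duplicate check and no delete: every received resource is kept.
        # Whether a resource should be saved at all is decided upstream.
        self._snapshots[url].append((timestamp, content))

    def history(self, url):
        # Return all snapshots of a URL, oldest first.
        return sorted(self._snapshots.get(url, []))


store = SnapshotStore()
store.save("http://example.org/", 1570788418, "<html>v1</html>")
store.save("http://example.org/", 1570788500, "<html>v1</html>")
print(len(store.history("http://example.org/")))
```

Both snapshots survive even though their content is identical, which is what a snapshot feature needs.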

Implement filesystem resource storage

Instead of storing content in MongoDB, the persister will now store the raw resources directly on the filesystem.

The storage format will be the resource URL (without its scheme) followed by a 64-bit timestamp:

resource-url/64bit-timestamp

For example, the following resource:

http://login.google.com/secure/createAccount.html

will be stored on disk as:

login.google.com/secure/createAccount.html/1570788418
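
A minimal sketch of this mapping, assuming the scheme is dropped as in the example above. The function name `resource_path` is hypothetical, and two choices here are assumptions the issue does not specify: the query string is ignored, and `.` / `..` segments are discarded so a path cannot escape the storage root.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def resource_path(url: str, timestamp: int) -> str:
    """Build the relative storage path for a resource: host/path/timestamp."""
    parts = urlsplit(url)
    if not parts.netloc:
        raise ValueError(f"URL has no host: {url!r}")
    # Keep host and path only; skip empty, "." and ".." segments (assumption).
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    return str(PurePosixPath(parts.netloc, *segments, str(timestamp)))


print(resource_path("http://login.google.com/secure/createAccount.html", 1570788418))
# prints "login.google.com/secure/createAccount.html/1570788418"
```

Because the timestamp is the last path component, several snapshots of the same resource end up side by side in one directory.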

Interaction with Elasticsearch

Each time content is received by the persister, it will be persisted on disk once #7 is merged. That alone is not enough:

To improve text search performance, the persister will also structure the content and store it in an Elasticsearch instance.

The API will use this instance when performing a text search.
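
As a rough sketch of what "structuring" the content might look like, the snippet below turns a raw HTML page into a JSON document suitable for indexing. The field names (`url`, `time`, `title`, `body`) and the helper `build_document` are assumptions made for illustration; the issue does not define a schema. Sending the document to Elasticsearch is left out.

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the <title> and the visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def build_document(url, timestamp, html):
    """Return a dict with hypothetical fields for full-text indexing."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "time": timestamp,
        "title": "".join(parser.title_parts).strip(),
        "body": " ".join(parser.text_parts),
    }


page = "<html><head><title>Create account</title></head><body><p>Pick a name.</p></body></html>"
print(json.dumps(build_document("http://login.google.com/secure/createAccount.html", 1570788418, page)))
```

Keeping the raw resource on disk and only the extracted text in the index means the search side stays small while the original bytes remain available.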
