belfiore-search


  _            _   __  _                                                      _      
 | |          | | / _|(_)                                                    | |    
 | |__    ___ | || |_  _   ___   _ __   ___     ___   ___   __ _  _ __   ___ | |__   
 | '_ \  / _ \| ||  _|| | / _ \ | '__| / _ \   / __| / _ \ / _` || '__| / __|| '_ \  
 | |_) ||  __/| || |  | || (_) || |   |  __/   \__ \|  __/| (_| || |   | (__ | | | | 
 |_.__/  \___||_||_|  |_| \___/ |_|    \___|   |___/ \___| \__,_||_|    \___||_| |_| 
🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌻🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸🌸
🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩 

This project ingests a set of records from a CSV file into Orama, a modern lightweight full-text search engine, both to evaluate its performance and to run useful searches on the data.

Introduction

Every Italian citizen has a unique ID called the FISCAL CODE. It is often used for fiscal and health insurance purposes. The code is generated by an algorithm. Apart from your personal details, the algorithm requires an external CODE (commonly called the Belfiore code) that becomes part of the suffix of the algorithm's output.
Every city in Italy has its own code, and you can get the updated list at this url. As a user, you can easily remember the name of the city where you were born, but not its code. This API provides instant, real-time search and retrieval of the Belfiore code, starting from substrings of a city name, province, etc. You actually get much more than the code, since the entire archive of Italian cities is ingested.
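
To make the role of the Belfiore code concrete: a standard fiscal code is 16 characters long, and the four characters just before the final check character identify the place of birth. The sketch below only extracts that segment. It is an illustration, not part of this project: the function name is hypothetical, the sample fiscal code is made up around A401 (the code for Ariccia), and neither the check character nor the "omocodia" variants (where digits are replaced by letters) are handled.

```typescript
// Hypothetical helper: returns the place-of-birth segment of a fiscal code.
// Assumes the standard 16-character layout; no checksum validation and no
// handling of "omocodia" variants, where digits are replaced by letters.
function extractBelfioreCode(fiscalCode: string): string {
  const normalized = fiscalCode.trim().toUpperCase();
  if (normalized.length !== 16) {
    throw new Error(`Expected 16 characters, got ${normalized.length}`);
  }
  // Characters 12-15 (1-based) hold the Belfiore code: one letter, three digits.
  const code = normalized.slice(11, 15);
  if (!/^[A-Z]\d{3}$/.test(code)) {
    throw new Error(`Unexpected Belfiore segment: ${code}`);
  }
  return code;
}

// Made-up fiscal code for illustration only (the last character is a placeholder).
console.log(extractBelfioreCode("RSSMRA80A01A401X")); // A401
```

Going the other way, from a city name to its code, is the lookup this API is meant to make easy.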

The project uses:

Setup of the project

  1. I like to use nodeenv to manage my Node.js projects, so:
  2. Create and activate a Node.js virtual environment with the LTS version of Node.js (currently node:18.16.1):
cd belfiore-search  
nodeenv  -n lts .nvenv  
source .nvenv/bin/activate  
  3. To build the project locally, run the following commands in order:
npm install
npm run build  
npm run vitest  
npm run setup

The last command ingests the documents from the CSV into the database. Once it completes (the operation takes about 2 seconds after updating Orama to the latest version), the database is persisted in ./comuni.msp

  4. Now you can run a local HTTP server and perform full-text searches with npm run start (npm run restart is the preferred way, as it also runs some checks on the latest release of the original dataset).

  5. WARNING! CORS is deliberately not enforced :)
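
The first half of the ingestion step, turning CSV text into plain records, can be pictured roughly as follows. This is a minimal sketch and not the project's actual setup script: the function names are hypothetical, and it assumes a comma delimiter, double-quote quoting, a header in the first row, no newlines inside fields, and a file that may start with a UTF-8 byte-order mark.

```typescript
// Minimal CSV-to-records sketch (assumptions: comma delimiter, double-quote
// quoting, header in the first row, no embedded newlines inside fields).
type CsvRecord = Record<string, string>;

function splitCsvLine(line: string, delimiter = ","): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // doubled quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

function parseCsv(text: string, delimiter = ","): CsvRecord[] {
  // Strip a leading byte-order mark so the first header name stays clean.
  const clean = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
  const lines = clean.split(/\r?\n/).filter((l) => l.length > 0);
  if (lines.length === 0) return [];
  const headers = splitCsvLine(lines[0], delimiter);
  return lines.slice(1).map((line) => {
    const values = splitCsvLine(line, delimiter);
    const record: CsvRecord = {};
    headers.forEach((header, idx) => {
      record[header] = values[idx] ?? ""; // missing trailing fields become ""
    });
    return record;
  });
}

// Tiny made-up excerpt using column names from the dataset.
const sampleCsv =
  '\uFEFFID,CODCATASTALE,DENOMINAZIONE_IT,SIGLAPROVINCIA\n438,A401,"ARICCIA",RM\n';
console.log(parseCsv(sampleCsv));
```

The resulting array of string-valued records is the kind of input that can then be handed to the search engine in bulk.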

Samples

GET http://localhost:3000/

{
  "message": "You search with the following params. If you pass both of them, the second is used as a filter on the results",
  "extra": " Optional params 'limit' (default is 10) and 'offset' (default is 0)",
  "params": [
    "DENOMINAZIONE_IT",
    "SIGLAPROVINCIA"
  ]
}

GET http://localhost:3000/?DENOMINAZIONE_IT=ARICC&SIGLAPROVINCIA=RM

{
  "elapsed": {
    "raw": 3515375,
    "formatted": "3ms"
  },
  "hits": [
    {
      "id": "37899817-5378",
      "score": 5.860792320596953,
      "document": {
        "ID": "438",
        "DATAISTITUZIONE": "1935-03-07",
        "DATACESSAZIONE": "9999-12-31",
        "CODISTAT": "058009",
        "CODCATASTALE": "A401",
        "DENOMINAZIONE_IT": "ARICCIA",
        "DENOMTRASLITTERATA": "ARICCIA",
        "ALTRADENOMINAZIONE": "",
        "ALTRADENOMTRASLITTERATA": "",
        "ID_PROVINCIA": "58",
        "IDPROVINCIAISTAT": "058",
        "IDREGIONE": "12",
        "IDPREFETTURA": "RM",
        "STATO": "A",
        "SIGLAPROVINCIA": "RM",
        "FONTE": "",
        "DATAULTIMOAGG": "2016-06-17",
        "COD_DENOM": ""
      }
    },
    {
      "id": "37899817-5371",
      "score": 5.860792320596953,
      "document": {
        "ID": "17567",
        "DATAISTITUZIONE": "1871-01-15",
        "DATACESSAZIONE": "1935-03-06",
        "CODISTAT": "058009",
        "CODCATASTALE": "A401",
        "DENOMINAZIONE_IT": "ARICCIA",
        "DENOMTRASLITTERATA": "ARICCIA",
        "ALTRADENOMINAZIONE": "",
        "ALTRADENOMTRASLITTERATA": "",
        "ID_PROVINCIA": "58",
        "IDPROVINCIAISTAT": "058",
        "IDREGIONE": "12",
        "IDPREFETTURA": "",
        "STATO": "C",
        "SIGLAPROVINCIA": "RM",
        "FONTE": "",
        "DATAULTIMOAGG": "2016-06-17",
        "COD_DENOM": ""
      }
    }
  ],
  "count": 2
}
  • The limit and offset params come from Orama and are used to paginate the results. When they are not passed, they default to 10 and 0 respectively.
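
As a rough client-side illustration of the requests above, the sketch below builds a query URL from the documented params and derives offset from a page number. The helper names are hypothetical, and it assumes offset counts the results to skip rather than pages.

```typescript
// Hypothetical client-side helper; the parameter names are the ones listed
// in the API's own help message.
interface SearchParams {
  DENOMINAZIONE_IT?: string;
  SIGLAPROVINCIA?: string;
  limit?: number;
  offset?: number;
}

function buildSearchUrl(baseUrl: string, params: SearchParams): string {
  const keys: (keyof SearchParams)[] = [
    "DENOMINAZIONE_IT",
    "SIGLAPROVINCIA",
    "limit",
    "offset",
  ];
  const pairs: string[] = [];
  for (const key of keys) {
    const value = params[key];
    // Skip params that were not provided or are empty.
    if (value !== undefined && value !== "") {
      pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
    }
  }
  return pairs.length > 0 ? `${baseUrl}?${pairs.join("&")}` : baseUrl;
}

// Offset for a zero-based page index (assumption: offset = results to skip).
function pageOffset(page: number, pageSize: number): number {
  return page * pageSize;
}

const pageSize = 10;
const thirdPageUrl = buildSearchUrl("http://localhost:3000/", {
  DENOMINAZIONE_IT: "ARICC",
  SIGLAPROVINCIA: "RM",
  limit: pageSize,
  offset: pageOffset(2, pageSize),
});
console.log(thirdPageUrl);
// http://localhost:3000/?DENOMINAZIONE_IT=ARICC&SIGLAPROVINCIA=RM&limit=10&offset=20
```

The resulting string can be passed to any HTTP client, for example curl or fetch.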

Info about the dataset

Meaning of the fields, in Italian.

A curiosity

If you search for a city by name, for example with the DENOMINAZIONE_IT param, you may get a "historical view" of the city, because several similar documents can exist for different intervals in the past:

...  
  "DATAISTITUZIONE": "1871-01-15",  
  "DATACESSAZIONE": "1935-03-06",  
...  

A document whose DATACESSAZIONE is in the future, on the other hand, represents the current state of the city:

...  
  "DATAISTITUZIONE": "1937-10-26",
  "DATACESSAZIONE": "9999-12-31", 
...  

Results are sorted by DATACESSAZIONE in descending order.
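
The distinction between historical and current documents can be sketched on the client side as follows. This is only an illustration: the type and function names are hypothetical, the reference date is an arbitrary example, and the records reuse the two Ariccia hits from the sample response.

```typescript
// Hypothetical subset of the document fields relevant to validity intervals.
interface CityRecord {
  DENOMINAZIONE_IT: string;
  DATAISTITUZIONE: string; // ISO date, e.g. "1871-01-15"
  DATACESSAZIONE: string; // ISO date; "9999-12-31" marks an open-ended record
}

// ISO yyyy-mm-dd strings order chronologically when compared as plain strings.
function isCurrent(record: CityRecord, today: string): boolean {
  return record.DATACESSAZIONE >= today;
}

// Returns a new array sorted by DATACESSAZIONE, most recent first.
function sortByCessationDesc(records: CityRecord[]): CityRecord[] {
  return [...records].sort((a, b) => {
    if (a.DATACESSAZIONE < b.DATACESSAZIONE) return 1;
    if (a.DATACESSAZIONE > b.DATACESSAZIONE) return -1;
    return 0;
  });
}

const ariccia: CityRecord[] = [
  { DENOMINAZIONE_IT: "ARICCIA", DATAISTITUZIONE: "1871-01-15", DATACESSAZIONE: "1935-03-06" },
  { DENOMINAZIONE_IT: "ARICCIA", DATAISTITUZIONE: "1935-03-07", DATACESSAZIONE: "9999-12-31" },
];

const referenceDate = "2024-01-01"; // arbitrary example date
const sorted = sortByCessationDesc(ariccia);
const currentOnly = sorted.filter((r) => isCurrent(r, referenceDate));
console.log(sorted.map((r) => r.DATACESSAZIONE)); // [ '9999-12-31', '1935-03-06' ]
console.log(currentOnly.length); // 1
```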

Dockerizing

You will find the latest Docker image on Docker Hub 🐳. If you want to build it locally, you can run:

Build image (multiplatform linux)

docker buildx build --platform=linux/amd64,linux/arm64 . -t giufus/belfiore-search

⚠ WARNING: a note for me

No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load.

Build image (local)

docker build . -t giufus/belfiore-search

Run container

docker run -p 3000:3000 -d giufus/belfiore-search

To do

  • implementation of a performance test
  • add details about CF algorithm
  • add CI/CD (and hopefully a free hosting service)
