
csv-detective's Introduction

CSV Detective

This is a package to automatically detect column content in tabular files. The script reads either the whole file or the first few rows and checks, for each column, whether its content matches one of several content types. This is currently done through regex and string comparison.

Currently supported file types: csv, xls, xlsx, ods.

You can also directly feed the URL of a remote file (from data.gouv.fr for instance).

How To?

Install the package

You need to have Python >= 3.7 installed. We recommend using a virtual environment.

pip install csv-detective

Detect some columns

Say you have a tabular file located at file_path. This is how you could use csv_detective:

# Import the csv_detective package
from csv_detective.explore_csv import routine
import os  # for this example only

# Replace with your file path
file_path = os.path.join('.', 'tests', 'code_postaux_v201410.csv')

# Open your file and run csv_detective
inspection_results = routine(
    file_path,  # or file URL
    num_rows=-1,  # -1 analyzes every line of the file; set a positive number to analyze only that many lines
    save_results=False,  # Default False. If True, the output is saved in the same directory as the analyzed file, under the same name with a .json extension
    output_profile=True,  # Default False. If True, the returned dict contains a "profile" property giving the profile (min, max, mean, tops...) of every column of your csv
    output_schema=True,  # Default False. If True, the returned dict contains a "schema" property holding a basic [tableschema](https://specs.frictionlessdata.io/table-schema/) of your file. This can be used to validate other csv files that should share the same structure.
)

So What Do You Get?

Output

The program creates a Python dictionary with the following information:

{
    "encoding": "windows-1252",  # Encoding detected
    "separator": ";",  # Detected CSV separator
    "header_row_idx": 0,  # Index of the header (aka how many lines to skip to get it)
    "headers": ['code commune INSEE', 'nom de la commune', 'code postal', "libellé d'acheminement"],  # Header row
    "total_lines": 42,  # Number of rows (excluding header)
    "nb_duplicates": 0,  # Number of exact duplicates in rows
    "heading_columns": 0,  # Number of heading columns
    "trailing_columns": 0,  # Number of trailing columns
    "categorical": ['Code commune'],  # Columns that contain fewer than 25 different values (arbitrary threshold)
    "columns": {  # Reconciles the detections made from the label and from the content of each column
        "Code commune": {
            "python_type": "string",
            "format": "code_commune_insee",
            "score": 1.25
        },
    },
    "columns_labels": {  # Detection based on the column headers
        "Code commune": {
            "python_type": "string",
            "format": "code_commune_insee",
            "score": 0.5
        },
    },
    "columns_fields": {  # Detection based on the column contents
        "Code commune": {
            "python_type": "string",
            "format": "code_commune_insee",
            "score": 1.0
        },
    },
    "profile": {
      "column_name" : {
        "min": 1, # only int and float
        "max": 12, # only int and float
        "mean": 5, # only int and float
        "std": 5, # only int and float
        "tops": [  # 10 most frequent values in the column
          "xxx",
          "yyy",
          "..."
        ],
        "nb_distinct": 67, # number of distinct values
        "nb_missing_values": 102 # number of empty cells in the column
      }
    },
    "schema": { # TableSchema of the file if `output_schema` was set to `True`
      "$schema": "https://frictionlessdata.io/schemas/table-schema.json",
      "name": "",
      "title": "",
      "description": "",
      "countryCode": "FR",
      "homepage": "",
      "path": "https://github.com/datagouv/csv-detective",
      "resources": [],
      "sources": [
        {"title": "Spécification Tableschema", "path": "https://specs.frictionlessdata.io/table-schema"},
        {"title": "schema.data.gouv.fr", "path": "https://schema.data.gouv.fr"}
      ],
      "created": "2023-02-10",
      "lastModified": "2023-02-10",
      "version": "0.0.1",
      "contributors": [
        {"title": "Table schema bot", "email": "[email protected]", "organisation": "data.gouv.fr", "role": "author"}
      ],
      "fields": [
        {
          "name": "Code commune",
          "description": "Le code INSEE de la commune",
          "example": "23150",
          "type": "string",
          "formatFR": "code_commune_insee",
          "constraints": {
            "required": False,
            "pattern": "^([013-9]\\d|2[AB1-9])\\d{3}$",
          }
        }
      ]
    }
}
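
As a minimal sketch of how the returned dictionary might be consumed, the snippet below walks over the "columns" property and keeps the columns whose overall score reaches a cutoff. The sample_result dict is hand-written to mimic the structure shown above (it is not real output), and confident_formats, the "nom" column and the 0.8 cutoff are hypothetical names and values chosen for illustration.

```python
# Hand-written stand-in for the dict returned by routine(); not real output.
sample_result = {
    "columns": {
        "Code commune": {
            "python_type": "string",
            "format": "code_commune_insee",
            "score": 1.0,
        },
        "nom": {  # hypothetical second column with a weaker detection
            "python_type": "string",
            "format": "commune",
            "score": 0.6,
        },
    },
}


def confident_formats(result, min_score=0.8):
    """Return {column name: format} for columns scoring at least min_score."""
    return {
        name: detection["format"]
        for name, detection in result.get("columns", {}).items()
        if detection["score"] >= min_score
    }


print(confident_formats(sample_result))
```

Lowering min_score keeps more columns at the price of less reliable detections; the right cutoff depends on your data.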

The output slightly differs depending on the file format:

  • csv files have encoding and separator
  • xls, xlsx, ods files have engine and sheet_name

What Formats Can Be Detected

Includes:

  • Communes, departments, regions, countries
  • Commune codes, postal codes, department codes, ISO country codes
  • CSP codes, CSP descriptions, SIREN
  • E-mails, URLs, French phone numbers
  • Years, dates, French days of the week
  • UUIDs, Mongo ObjectIds

Format detection and scoring

For each column, 3 scores are computed for each format. The higher the score, the more likely the format:

  • the field score based on the values contained in the column (0.0 to 1.0).
  • the label score based on the header of the column (0.0 to 1.0).
  • the overall score, computed as field_score * (1 + label_score/2) (0.0 to 1.5).

The overall score computation aims to give more weight to the column contents while still leveraging the column header.
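
The formula above can be sketched in a few lines. This is an illustration of the arithmetic only, not the package's own code: overall_score is a hypothetical helper name, and the range check is an assumption added to mirror the 0.0 to 1.0 bounds stated above.

```python
def overall_score(field_score: float, label_score: float) -> float:
    """Combine the content-based and header-based scores of one format."""
    for name, value in (("field_score", field_score), ("label_score", label_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    # The header can boost the content score by at most 50%.
    return field_score * (1 + label_score / 2)


print(overall_score(1.0, 0.5))  # 1.25: full content match, partial header match
print(overall_score(0.0, 1.0))  # 0.0: a matching header alone detects nothing
print(overall_score(1.0, 1.0))  # 1.5: upper bound
```

Because the field score is a multiplier, a column whose values never match a format scores 0 for it whatever its header says, while a matching header can only raise an existing content-based score.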

limited_output - Select the output mode you want for the JSON report

This option selects the output mode of the report. To use it, pass a limited_output argument to the routine function. It has two possible values:

  • limited_output=True (the default): the report contains only the column formats detected above a pre-selected threshold proportion in the data. This is the standard output (see the example in the 'Output' section above). Only the format with the highest score is present for each column.
  • limited_output=False: the report lists every candidate format for each input column, each with a value matching the proportion of that format found in the data. With this report, users can tune their own detection rules around a specific threshold and get a better view of the detection quality for each column. The results can also easily be turned into a dataframe (column types in columns, column names in rows) for analysis and testing.
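
The exact layout of the limited_output=False report is not shown here, so the sketch below assumes a hypothetical shape: a dict mapping each column name to a {format: score} dict. Under that assumption, it shows how a custom threshold could be applied to the full list of candidates. The full_report values and the pick_formats helper are invented for illustration.

```python
# Hypothetical full report: column name -> {candidate format: score}.
# The real structure returned with limited_output=False may differ.
full_report = {
    "code_insee": {"code_commune_insee": 0.98, "code_postal": 0.91, "int": 0.85},
    "nom": {"commune": 0.62, "departement": 0.05},
}


def pick_formats(report, threshold):
    """Keep, for each column, the best-scoring format if it reaches threshold."""
    picked = {}
    for column, scores in report.items():
        if not scores:
            continue  # no candidate format for this column
        best_format = max(scores, key=scores.get)
        if scores[best_format] >= threshold:
            picked[column] = best_format
    return picked


print(pick_formats(full_report, threshold=0.9))
print(pick_formats(full_report, threshold=0.5))
```

A strict threshold keeps only the near-certain detections, while a looser one also accepts columns with noisier content.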

Improvement suggestions

  • Smarter refactors
  • Improve performance
  • Improve testing structure to make modular searches (search only for cities for example)
  • Test other ways to load and process data (pandas alternatives)
  • Make differentiated pre-processing (no lower case for country codes for example)
  • Give a sense of probability in the prediction
  • Add more and more detection modules...

Related ideas:

  • store column names to train a model based on column names (as a possible pre-screen)
  • normalising data based on column prediction
  • entity resolution (good luck...)

Why Could This Be of Any Use?

Organisations such as data.gouv.fr aggregate huge amounts of un-normalised data. Performing cross-examination across datasets can be difficult. This tool could help enrich the datasets' metadata and make it easier to link them together.

udata-hydra is a crawler that checks, analyzes (using csv-detective) and APIfies all tabular files from data.gouv.fr.

An early version of this analysis of all resources on data.gouv.fr can be found here.

Release

The release process uses bumpr.

pip install -r requirements-build.txt

Process

  1. bumpr will handle bumping the version according to your command (patch, minor, major)
  2. It will update the CHANGELOG according to the new version being published
  3. It will push a tag with the given version to GitHub
  4. CircleCI will pick up this tag, build the package and publish it to PyPI
  5. bumpr will have everything ready for the next version (version, changelog...)

Dry run

bumpr -d -v

Release

This will release a patch version:

bumpr -v

See bumpr options for minor and major:

$ bumpr -h
usage: bumpr [-h] [--version] [-v] [-c CONFIG] [-d] [-st] [-b | -pr] [-M] [-m] [-p]
             [-s SUFFIX] [-u] [-pM] [-pm] [-pp] [-ps PREPARE_SUFFIX] [-pu]
             [--vcs {git,hg}] [-nc] [-P] [-nP]
             [file] [files ...]

[...]

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         Verbose output
  -c CONFIG, --config CONFIG
                        Specify a configuration file
  -d, --dryrun          Do not write anything and display a diff
  -st, --skip-tests     Skip tests
  -b, --bump            Only perform the bump
  -pr, --prepare        Only perform the prepare

bump:
  -M, --major           Bump major version
  -m, --minor           Bump minor version
  -p, --patch           Bump patch version
  -s SUFFIX, --suffix SUFFIX
                        Set suffix
  -u, --unsuffix        Unset suffix

[...]

csv-detective's People

Contributors

pierlou, geoffreyaldebert, leobouloc, cquest, sixtedemaupeou, rob192, abulte, vincentetalab, alexiseidelman, maudetes, anthonyauffret


csv-detective's Issues

Recode all tests

The tests depend heavily on previous versions and do not work well with the current one.
==> Clean tests need to be written for the repo.

json as python_type?

While not a primary type (same as #43), it would be nice to know that a column contains some json.

Performance issue with csv-detective

When applying the csv-detective routine (with num_rows=-1) to the datasets catalog (~100 MB), the total run time is about 160 seconds.

Most of this time (~96%) is spent in the "Testing columns" step.

Verbose logs in detail
INFO:root:Detecting encoding
INFO:root:Detected encoding: "UTF-8" in 0.213s (confidence: 99%)
INFO:root:Detecting separator
INFO:root:Detected separator: ";" in 0.0s
INFO:root:Detecting headers
INFO:root:Detected headers in 0.0s
INFO:root:Detecting heading columns
INFO:root:No heading column detected in 0.0s
INFO:root:Detecting trailing columns
INFO:root:No trailing column detected in 0.0s
INFO:root:Parsing table
WARNING:root:Table parsed successfully in 2.613s
INFO:root:Detecting categorical columns
INFO:root:Detected 6 categorical columns out of 30 in 0.658s

INFO:root:Testing columns to get types
CRITICAL:root:  - Done with type "date" in 21.878s (1/47)
INFO:root:      - Done with type "year" in 0.305s (2/47)
INFO:root:      - Done with type "email" in 0.389s (3/47)
INFO:root:      - Done with type "mongo_object_id" in 0.418s (4/47)
INFO:root:      - Done with type "uuid" in 0.41s (5/47)
INFO:root:      - Done with type "url" in 0.335s (6/47)
INFO:root:      - Done with type "iso_country_code_alpha2" in 0.308s (7/47)
INFO:root:      - Done with type "iso_country_code_alpha3" in 0.35s (8/47)
INFO:root:      - Done with type "iso_country_code_numeric" in 0.324s (9/47)
INFO:root:      - Done with type "jour_de_la_semaine" in 0.353s (10/47)
INFO:root:      - Done with type "csp_insee" in 0.33s (11/47)
INFO:root:      - Done with type "tel_fr" in 0.357s (12/47)
INFO:root:      - Done with type "siren" in 0.348s (13/47)
INFO:root:      - Done with type "code_csp_insee" in 0.313s (14/47)
INFO:root:      - Done with type "sexe" in 0.286s (15/47)
CRITICAL:root:  - Done with type "pays" in 17.903s (16/47)
INFO:root:      - Done with type "code_departement" in 0.407s (17/47)
CRITICAL:root:  - Done with type "adresse" in 18.212s (18/47)
INFO:root:      - Done with type "code_commune_insee" in 0.363s (19/47)
CRITICAL:root:  - Done with type "commune" in 20.625s (20/47)
INFO:root:      - Done with type "region" in 0.647s (21/47)
INFO:root:      - Done with type "code_postal" in 0.587s (22/47)
CRITICAL:root:  - Done with type "departement" in 22.128s (23/47)
INFO:root:      - Done with type "uai" in 0.495s (24/47)
INFO:root:      - Done with type "siret" in 0.569s (25/47)
CRITICAL:root:  - Done with type "latitude_wgs" in 3.878s (26/47)
CRITICAL:root:  - Done with type "longitude_wgs" in 5.02s (27/47)
INFO:root:      - Done with type "latlon_wgs" in 0.406s (28/47)
INFO:root:      - Done with type "json_geojson" in 0.579s (29/47)
INFO:root:      - Done with type "code_fantoir" in 0.438s (30/47)
INFO:root:      - Done with type "insee_ape700" in 0.388s (31/47)
INFO:root:      - Done with type "datetime_iso" in 0.451s (32/47)
INFO:root:      - Done with type "datetime_rfc822" in 0.402s (33/47)
CRITICAL:root:  - Done with type "latitude_wgs_fr_metropole" in 3.489s (34/47)
CRITICAL:root:  - Done with type "longitude_wgs_fr_metropole" in 3.126s (35/47)
INFO:root:      - Done with type "code_region" in 0.347s (36/47)
INFO:root:      - Done with type "booleen" in 0.404s (37/47)
INFO:root:      - Done with type "twitter" in 0.357s (38/47)
WARNING:root:   - Done with type "float" in 1.248s (39/47)
WARNING:root:   - Done with type "int" in 1.056s (40/47)
INFO:root:      - Done with type "json" in 0.433s (41/47)
CRITICAL:root:  - Done with type "latitude_l93" in 3.56s (42/47)
CRITICAL:root:  - Done with type "longitude_l93" in 3.231s (43/47)
CRITICAL:root:  - Done with type "insee_canton" in 19.299s (44/47)
INFO:root:      - Done with type "date_fr" in 0.347s (45/47)
INFO:root:      - Done with type "code_waldec" in 0.494s (46/47)
INFO:root:      - Done with type "code_rna" in 0.44s (47/47)
CRITICAL:root:Done testing columns in 158.045s

INFO:root:Testing labels to get types
INFO:root:      - Done with type "adresse" in 0.002s (1/48)
INFO:root:      - Done with type "code_commune_insee" in 0.002s (2/48)
INFO:root:      - Done with type "code_departement" in 0.002s (3/48)
INFO:root:      - Done with type "code_fantoir" in 0.002s (4/48)
INFO:root:      - Done with type "code_postal" in 0.003s (5/48)
INFO:root:      - Done with type "code_region" in 0.002s (6/48)
INFO:root:      - Done with type "commune" in 0.002s (7/48)
INFO:root:      - Done with type "departement" in 0.003s (8/48)
INFO:root:      - Done with type "insee_canton" in 0.003s (9/48)
INFO:root:      - Done with type "latitude_l93" in 0.003s (10/48)
INFO:root:      - Done with type "latitude_wgs_fr_metropole" in 0.003s (11/48)
INFO:root:      - Done with type "longitude_l93" in 0.003s (12/48)
INFO:root:      - Done with type "longitude_wgs_fr_metropole" in 0.002s (13/48)
INFO:root:      - Done with type "pays" in 0.003s (14/48)
INFO:root:      - Done with type "region" in 0.002s (15/48)
INFO:root:      - Done with type "code_csp_insee" in 0.002s (16/48)
INFO:root:      - Done with type "code_rna" in 0.002s (17/48)
INFO:root:      - Done with type "code_waldec" in 0.002s (18/48)
INFO:root:      - Done with type "csp_insee" in 0.002s (19/48)
INFO:root:      - Done with type "date_fr" in 0.002s (20/48)
INFO:root:      - Done with type "insee_ape700" in 0.002s (21/48)
INFO:root:      - Done with type "sexe" in 0.002s (22/48)
INFO:root:      - Done with type "siren" in 0.004s (23/48)
INFO:root:      - Done with type "siret" in 0.004s (24/48)
INFO:root:      - Done with type "tel_fr" in 0.003s (25/48)
INFO:root:      - Done with type "uai" in 0.002s (26/48)
INFO:root:      - Done with type "jour_de_la_semaine" in 0.002s (27/48)
INFO:root:      - Done with type "mois_de_annee" in 0.002s (28/48)
INFO:root:      - Done with type "iso_country_code_alpha2" in 0.003s (29/48)
INFO:root:      - Done with type "iso_country_code_alpha3" in 0.002s (30/48)
INFO:root:      - Done with type "iso_country_code_numeric" in 0.002s (31/48)
INFO:root:      - Done with type "json_geojson" in 0.002s (32/48)
INFO:root:      - Done with type "latitude_wgs" in 0.003s (33/48)
INFO:root:      - Done with type "latlon_wgs" in 0.004s (34/48)
INFO:root:      - Done with type "longitude_wgs" in 0.003s (35/48)
INFO:root:      - Done with type "booleen" in 0.002s (36/48)
INFO:root:      - Done with type "email" in 0.003s (37/48)
INFO:root:      - Done with type "mongo_object_id" in 0.003s (38/48)
INFO:root:      - Done with type "uuid" in 0.002s (39/48)
INFO:root:      - Done with type "float" in 0.002s (40/48)
INFO:root:      - Done with type "int" in 0.002s (41/48)
INFO:root:      - Done with type "money" in 0.002s (42/48)
INFO:root:      - Done with type "twitter" in 0.002s (43/48)
INFO:root:      - Done with type "url" in 0.003s (44/48)
INFO:root:      - Done with type "date" in 0.003s (45/48)
INFO:root:      - Done with type "datetime_iso" in 0.003s (46/48)
INFO:root:      - Done with type "datetime_rfc822" in 0.003s (47/48)
INFO:root:      - Done with type "year" in 0.002s (48/48)
INFO:root:Done testing labels in 0.133s
INFO:root:Creating profile
WARNING:root:Created profile in 2.445s
CRITICAL:root:Routine completed in 164.138s

This ends up timing out in hydra workers, making csv parsing fail: https://errors.data.gouv.fr/organizations/sentry/issues/129487/events/fced5f3fae964450b7d249efa9a35f96/?project=2&referrer=issue-list&statsPeriod=14d
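
To see which detection tests dominate, the timings can be pulled out of a verbose log like the one above. The sketch below is only an illustration: log_excerpt holds a few lines copied from that log, slowest_types is a hypothetical helper, and the regular expression assumes the 'Done with type "..." in ...s' wording shown in the log.

```python
import re

# A few lines copied from the verbose log above.
log_excerpt = '''
CRITICAL:root:  - Done with type "date" in 21.878s (1/47)
INFO:root:      - Done with type "year" in 0.305s (2/47)
CRITICAL:root:  - Done with type "pays" in 17.903s (16/47)
CRITICAL:root:  - Done with type "departement" in 22.128s (23/47)
INFO:root:      - Done with type "uai" in 0.495s (24/47)
'''

# Captures the type name and its duration in seconds.
TIMING = re.compile(r'Done with type "(?P<type>[^"]+)" in (?P<seconds>[\d.]+)s')


def slowest_types(log_text, top=3):
    """Return the `top` (type, seconds) pairs with the longest run time."""
    timings = [(m["type"], float(m["seconds"])) for m in TIMING.finditer(log_text)]
    return sorted(timings, key=lambda item: item[1], reverse=True)[:top]


for type_name, seconds in slowest_types(log_excerpt):
    print(f"{type_name}: {seconds:.3f}s")
```

In the log above, six column tests (date, pays, adresse, commune, departement and insee_canton) add up to about 120 of the 158 seconds spent testing columns. Note that on the full log each type appears twice, once for the column tests and once for the label tests, so the two sections should be parsed separately.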

FutureWarning when analysing CSV

When calling routine on https://static.data.gouv.fr/resources/adresses-au-format-bal-lacarre/20200319-151130/20200319-bal-216402974.csv

/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(
/Users/alexandre/Developer/Etalab/udata-hydra/.venv/lib/python3.9/site-packages/csv_detective/utils.py:64: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  return_table.loc[key] = table.apply(lambda serie: test_col_val(

Wrong type and header line detection

  • header_row_idx should be 1 (there are two duplicate header lines)
  • NUMCOM and NUMDEP should not be detected as int (Corsica forever)

http://data.caf.fr/dataset/f6411f07-10bf-4f13-b4fb-8d30ba9328b5/resource/94a182c4-19c8-4d3a-987c-187a49756365/download/txcouvglo2014.csv

[:~] $ head /Users/alexandre/Downloads/txcouvglo2014.csv
NUMCOM;NOMCOM;NUMDEP;NOMDEP;NUMEPCI;NOMEPCI;TXCOUVGLO_COM_2014;TXCOUVGLO_DEP_2014;TXCOUVGLO_EPCI_2014
NUMCOM;NOMCOM;NUMDEP;NOMDEP;NUMEPCI;NOMEPCI;TXCOUVGLO_COM_2014;TXCOUVGLO_DEP_2014;TXCOUVGLO_EPCI_2014
01001;L'ABERGEMENT-CLEMENCIAT;01;AIN;200035210;CC CHALARONNE CENTRE;41.7;65.2;72.9
01002;L'ABERGEMENT-DE-VAREY;01;AIN;240100883;CC DE LA PLAINE DE L'AIN;34.1;65.2;75.2
01004;AMBERIEU-EN-BUGEY;01;AIN;240100883;CC DE LA PLAINE DE L'AIN;61.8;65.2;75.2
01005;AMBERIEUX-EN-DOMBES;01;AIN;200042497;CC DOMBES SAONE VALLEE;73.6;65.2;77.8
01006;AMBLEON;01;AIN;200040350;CC BUGEY SUD;93.1;65.2;52.4
01007;AMBRONAY;01;AIN;240100883;CC DE LA PLAINE DE L'AIN;51.4;65.2;75.2
01008;AMBUTRIX;01;AIN;240100883;CC DE LA PLAINE DE L'AIN;92;65.2;75.2
01009;ANDERT-ET-CONDON;01;AIN;200040350;CC BUGEY SUD;34.2;65.2;52.4
[:~] $ tail /Users/alexandre/Downloads/txcouvglo2014.csv
97415;SAINT-PAUL;974;LA REUNION;249740101;CA TERRITOIRE DE LA COTE OUEST (TCO);33.2;25.8;29
97416;SAINT-PIERRE;974;LA REUNION;249740077;CA CIVIS (COMMUNAUTE INTERCOMMUNALE DES VILLES SOLIDAIRES);34.5;25.8;25.8
97417;SAINT-PHILIPPE;974;LA REUNION;249740085;CA DU SUD;16.9;25.8;18.4
97418;SAINTE-MARIE;974;LA REUNION;249740119;CA INTERCOMMUNALE DU NORD DE LA REUNION (CINOR);32;25.8;31.1
97419;SAINTE-ROSE;974;LA REUNION;249740093;CA INTERCOMMUNALE DE LA REUNION EST (CIREST);17.2;25.8;20
97420;SAINTE-SUZANNE;974;LA REUNION;249740119;CA INTERCOMMUNALE DU NORD DE LA REUNION (CINOR);28.1;25.8;31.1
97421;SALAZIE;974;LA REUNION;249740093;CA INTERCOMMUNALE DE LA REUNION EST (CIREST);17.7;25.8;20
97422;LE TAMPON;974;LA REUNION;249740085;CA DU SUD;20.3;25.8;18.4
97423;LES TROIS-BASSINS;974;LA REUNION;249740101;CA TERRITOIRE DE LA COTE OUEST (TCO);14.3;25.8;29
97424;CILAOS;974;LA REUNION;249740077;CA CIVIS (COMMUNAUTE INTERCOMMUNALE DES VILLES SOLIDAIRES);9.1;25.8;25.8
[:~] $ grep "2A" /Users/alexandre/Downloads/txcouvglo2014.csv
2A001;AFA;2A;CORSE DU SUD;242010056;CA DU PAYS AJACCIEN;32.6;35.8;27.6
2A004;AJACCIO;2A;CORSE DU SUD;242010056;CA DU PAYS AJACCIEN;29.5;35.8;27.6
2A006;ALATA;2A;CORSE DU SUD;242010056;CA DU PAYS AJACCIEN;20.8;35.8;27.6
{
   "header":[
      "NUMCOM",
      "NOMCOM",
      "NUMDEP",
      "NOMDEP",
      "NUMEPCI",
      "NOMEPCI",
      "TXCOUVGLO_COM_2014",
      "TXCOUVGLO_DEP_2014",
      "TXCOUVGLO_EPCI_2014"
   ],
   "columns":{
      "NOMCOM":{
         "score":1.0,
         "format":"commune",
         "python_type":"string"
      },
      "NOMDEP":{
         "score":1.0,
         "format":"departement",
         "python_type":"string"
      },
      "NUMCOM":{
         "score":1.0,
         "format":"int",
         "python_type":"int"
      },
      "NUMDEP":{
         "score":1.0,
         "format":"int",
         "python_type":"int"
      },
      "NOMEPCI":{
         "score":1.0,
         "format":"string",
         "python_type":"string"
      },
      "NUMEPCI":{
         "score":1.0,
         "format":"siren",
         "python_type":"string"
      },
      "TXCOUVGLO_COM_2014":{
         "score":1.0,
         "format":"float",
         "python_type":"float"
      },
      "TXCOUVGLO_DEP_2014":{
         "score":1.0,
         "format":"float",
         "python_type":"float"
      },
      "TXCOUVGLO_EPCI_2014":{
         "score":1.0,
         "format":"float",
         "python_type":"float"
      }
   },
   "formats":{
      "int":[
         "NUMCOM",
         "NUMDEP"
      ],
      "float":[
         "TXCOUVGLO_COM_2014",
         "TXCOUVGLO_DEP_2014",
         "TXCOUVGLO_EPCI_2014"
      ],
      "siren":[
         "NUMEPCI"
      ],
      "string":[
         "NOMEPCI"
      ],
      "commune":[
         "NOMCOM"
      ],
      "departement":[
         "NOMDEP"
      ]
   },
   "encoding":"ISO-8859-1",
   "separator":";",
   "continuous":[
      "TXCOUVGLO_DEP_2014",
      "TXCOUVGLO_EPCI_2014"
   ],
   "categorical":[
      
   ],
   "total_lines":36636,
   "columns_fields":{
      "NOMCOM":{
         "score":1.0,
         "format":"commune",
         "python_type":"string"
      },
      "NOMDEP":{
         "score":1.0,
         "format":"departement",
         "python_type":"string"
      },
      "NUMCOM":{
         "score":1.0,
         "format":"code_commune_insee",
         "python_type":"string"
      },
      "NUMDEP":{
         "score":1.0,
         "format":"code_departement",
         "python_type":"string"
      },
      "NOMEPCI":{
         "score":1.0,
         "format":"string",
         "python_type":"string"
      },
      "NUMEPCI":{
         "score":1.0,
         "format":"siren",
         "python_type":"string"
      },
      "TXCOUVGLO_COM_2014":{
         "score":1.0,
         "format":"float",
         "python_type":"float"
      },
      "TXCOUVGLO_DEP_2014":{
         "score":0.9183673469387755,
         "format":"latitude_wgs",
         "python_type":"float"
      },
      "TXCOUVGLO_EPCI_2014":{
         "score":0.9387755102040817,
         "format":"longitude_wgs",
         "python_type":"float"
      }
   },
   "columns_labels":{
      "NOMCOM":{
         "score":1.0,
         "format":"string",
         "python_type":"string"
      },
      "NOMDEP":{
         "score":1.0,
         "format":"string",
         "python_type":"string"
      },
      "NUMCOM":{
         "score":1.0,
         "format":"string",
         "python_type":"string"
      },
      "NUMDEP":{
         "score":1.0,
         "format":"string",
         "python_type":"string"
      },
      "NOMEPCI":{
         "score":1.0,
         "format":"string",
         "python_type":"string"
      },
      "NUMEPCI":{
         "score":1.0,
         "format":"string",
         "python_type":"string"
      },
      "TXCOUVGLO_COM_2014":{
         "score":0.5,
         "format":"code_commune_insee",
         "python_type":"string"
      },
      "TXCOUVGLO_DEP_2014":{
         "score":0.5,
         "format":"code_departement",
         "python_type":"string"
      },
      "TXCOUVGLO_EPCI_2014":{
         "score":1.0,
         "format":"string",
         "python_type":"string"
      }
   },
   "header_row_idx":0,
   "heading_columns":0,
   "trailing_columns":0
}
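
The two expectations in this report can be sketched as follows. This is only an illustration, not csv-detective's actual detection code: `last_header_row_idx` and `is_int_column` are hypothetical helper names, and the rule "a leading zero means the value is a code, not a number" is an assumption made for this example.

```python
import csv


def last_header_row_idx(lines, sep=";"):
    """Index of the last leading row that is identical to the first row."""
    rows = list(csv.reader(lines, delimiter=sep))
    idx = 0
    while idx + 1 < len(rows) and rows[idx + 1] == rows[0]:
        idx += 1
    return idx


def is_int_column(values):
    """Accept a column as int only if every non-empty value is a plain integer.

    Assumption for this sketch: values with a leading zero ("01001") or with
    letters ("2A001") are identifiers and must stay strings.
    """
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return False
    for v in non_empty:
        digits = v[1:] if v.startswith("-") else v
        if not (digits.isascii() and digits.isdigit()):
            return False
        if len(digits) > 1 and digits.startswith("0"):
            return False
    return True


# A few lines taken from the file shown above (columns truncated).
sample = [
    "NUMCOM;NOMCOM;NUMDEP",
    "NUMCOM;NOMCOM;NUMDEP",
    "01001;L'ABERGEMENT-CLEMENCIAT;01",
    "2A001;AFA;2A",
]
header_row_idx = last_header_row_idx(sample)
data = list(csv.reader(sample[header_row_idx + 1:], delimiter=";"))
numcom = [row[0] for row in data]
numdep = [row[2] for row in data]
print(header_row_idx, is_int_column(numcom), is_int_column(numdep))
```

Note that the int check only helps if the analyzed rows include a Corsican commune or a code with a leading zero that has been read as text; when only the first few rows are sampled and parsed as numbers, the same misdetection can still occur.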

APE code mistaken for Fantoir code

For example, in the file etablissements-du-domaine-sanitaire-et-social-en-france-2020.csv, the APE code is detected as a Fantoir code.
A potential solution would be to add detection of APE codes.
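
A minimal sketch of what such a detection could look like, assuming an APE (NAF rev. 2) code is written as two digits, an optional dot, two digits and one letter (for example 62.01Z or 6201Z). The names `APE_PATTERN` and `is_ape_code` are hypothetical and not part of csv-detective.

```python
import re

# Assumed shape of an APE code: 2 digits, optional dot, 2 digits, 1 letter.
APE_PATTERN = re.compile(r"[0-9]{2}\.?[0-9]{2}[A-Z]")


def is_ape_code(value: str) -> bool:
    """Return True if the whole value matches the assumed APE code shape."""
    return APE_PATTERN.fullmatch(value.strip().upper()) is not None


print(is_ape_code("86.10Z"), is_ape_code("6201z"), is_ape_code("A123B"))
```

Because this check is purely about shape, some values could plausibly match both formats, so the column label would probably still be needed to decide between the two.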

Add max length of column

[Maybe not : CHARACTER VARYING instead]
Useful for SQL uploads later on, to set the VARCHAR max length (we could use TEXT, but it is impossible to set an index on a TEXT column).
It could go in the profile section:
{'tops': [
{'count': 2772, 'value': 'TTTTTTTT'},
{'count': 780, 'value': 'XXXXXXXXXX'}],
'nb_distinct': 10,
'nb_missing_values': 28,
'max_length': 12}
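
As a rough sketch, the value could be computed as below. `max_length` is a hypothetical helper, and treating `None` and empty strings as missing values is an assumption of this example.

```python
def max_length(values):
    """Length of the longest non-missing value, or 0 if there is none."""
    return max(
        (len(str(v)) for v in values if v is not None and str(v) != ""),
        default=0,
    )


column = ["TTTTTTTT", "XXXXXXXXXX", None, ""]
profile = {"max_length": max_length(column)}
print(profile)
```

The result counts characters, not bytes, which is what a VARCHAR(n) limit refers to in most SQL engines but is worth checking for the target database.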

Numpy subdependency version incompatibility

Installing csv-detective 0.7.1, which the hydra project depends on, pulls in pandas 1.5.3. That pandas version depends on Numpy>=1.23.2, so Numpy 2.0.1 is installed automatically as well.

When running csv-detective in hydra tests, we get the following Numpy error:
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
It seems this error is related to an incompatible behaviour of Numpy >=2 with the current pandas code in csv-detective. More information here.

Three possible solutions, in order of preference:

  1. Try to run csv-detective and hydra with pandas==2.2.2 (latest pandas as of 2024/07/30). If the code works, pin csv-detective dependencies to pandas<2.3.0,>=2.2.0 so that pandas is pinned to 2.2.x
  2. If we run into the same error with pandas==2.2.2, fix the current code in csv-detective to be compatible with Numpy >= 2
  3. If for some reason that's too difficult, pin csv-detective dependencies to pandas<2.2.0,>=2.1.4 while we figure this out, since pandas 2.1.4 has added the limitation for Numpy to be <2.

requests==2.32.0 dependency is yanked (csv-detective 0.7.2.dev800)

Since requests 2.32.0 has been yanked (https://pypi.org/project/requests/#history), csv-detective 0.7.2.dev800 cannot be installed with some package managers.

For example, when installing the latest hydra (which uses csv-detective 0.7.2.dev800) with the rye package manager, we get this error:

No solution found when resolving dependencies:
  ╰─▶ Because requests==2.32.0 was yanked (reason: Yanked due to conflicts with CVE-2024-35195
      mitigation) and csv-detective==0.7.2.dev800 depends on requests==2.32.0, we can conclude that
      csv-detective==0.7.2.dev800 cannot be used.
      And because you require csv-detective==0.7.2.dev800, we can conclude that the requirements are
      unsatisfiable.

Solution: depend on the latest release, 2.32.3, or on a version range (>=2.32.0 AND <=2.33.0) instead of the exact pin.


date or datetime as python_type?

Would it make sense to output date or datetime as python_type for date_der_maj in the following example?

{
   "encoding":"UTF-8-SIG",
   "separator":";",
   "header_row_idx":0,
   "header":[
      "cle_interop",
      "uid_adresse",
      "voie_nom",
      "numero",
      "suffixe",
      "commune_nom",
      "position",
      "x",
      "y",
      "long",
      "lat",
      "source",
      "date_der_maj",
      "refparc",
      "voie_nom_eu",
      "complement"
   ],
   "total_lines":82,
   "heading_columns":0,
   "trailing_columns":0,
   "continuous":[
      "x",
      "y",
      "long",
      "lat"
   ],
   "categorical":[
      "uid_adresse",
      "suffixe",
      "commune_nom",
      "position",
      "source",
      "complement"
   ],
   "columns_fields":{
      "cle_interop":{
         "python_type":"float",
         "format":"float",
         "score":1.0
      },
      "uid_adresse":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "voie_nom":{
         "python_type":"string",
         "format":"adresse",
         "score":1.0
      },
      "numero":{
         "python_type":"int",
         "format":"int",
         "score":1.0
      },
      "suffixe":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "commune_nom":{
         "python_type":"string",
         "format":"commune",
         "score":1.0
      },
      "position":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "x":{
         "python_type":"float",
         "format":"longitude_l93",
         "score":0.9795918367346939
      },
      "y":{
         "python_type":"float",
         "format":"latitude_l93",
         "score":1.0
      },
      "long":{
         "python_type":"float",
         "format":"latitude_wgs",
         "score":1.0
      },
      "lat":{
         "python_type":"float",
         "format":"longitude_wgs",
         "score":1.0
      },
      "source":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "date_der_maj":{
         "python_type":"string",
         "format":"date",
         "score":1.0
      },
      "refparc":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "voie_nom_eu":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "complement":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      }
   },
   "columns_labels":{
      "cle_interop":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "uid_adresse":{
         "python_type":"string",
         "format":"adresse",
         "score":0.5
      },
      "voie_nom":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "numero":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "suffixe":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "commune_nom":{
         "python_type":"string",
         "format":"commune",
         "score":0.5
      },
      "position":{
         "python_type":"string",
         "format":"latlon_wgs",
         "score":1.0
      },
      "x":{
         "python_type":"float",
         "format":"longitude_wgs_fr_metropole",
         "score":1.0
      },
      "y":{
         "python_type":"float",
         "format":"latitude_wgs_fr_metropole",
         "score":1.0
      },
      "long":{
         "python_type":"float",
         "format":"longitude_wgs_fr_metropole",
         "score":1.0
      },
      "lat":{
         "python_type":"float",
         "format":"latitude_wgs_fr_metropole",
         "score":1.0
      },
      "source":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "date_der_maj":{
         "python_type":"string",
         "format":"date",
         "score":1.0
      },
      "refparc":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "voie_nom_eu":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "complement":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      }
   },
   "columns":{
      "cle_interop":{
         "python_type":"float",
         "format":"float",
         "score":1.0
      },
      "uid_adresse":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "voie_nom":{
         "python_type":"string",
         "format":"adresse",
         "score":1.0
      },
      "numero":{
         "python_type":"int",
         "format":"int",
         "score":1.0
      },
      "suffixe":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "commune_nom":{
         "python_type":"string",
         "format":"commune",
         "score":1.25
      },
      "position":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "x":{
         "python_type":"float",
         "format":"longitude_l93",
         "score":1.4693877551020407
      },
      "y":{
         "python_type":"float",
         "format":"latitude_l93",
         "score":1.5
      },
      "long":{
         "python_type":"float",
         "format":"longitude_wgs_fr_metropole",
         "score":1.5
      },
      "lat":{
         "python_type":"float",
         "format":"latitude_wgs_fr_metropole",
         "score":1.5
      },
      "source":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "date_der_maj":{
         "python_type":"string",
         "format":"date",
         "score":1.5
      },
      "refparc":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "voie_nom_eu":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      },
      "complement":{
         "python_type":"string",
         "format":"string",
         "score":1.0
      }
   },
   "formats":{
      "float":[
         "cle_interop"
      ],
      "string":[
         "uid_adresse",
         "suffixe",
         "position",
         "source",
         "refparc",
         "voie_nom_eu",
         "complement"
      ],
      "adresse":[
         "voie_nom"
      ],
      "int":[
         "numero"
      ],
      "commune":[
         "commune_nom"
      ],
      "longitude_l93":[
         "x"
      ],
      "latitude_l93":[
         "y"
      ],
      "longitude_wgs_fr_metropole":[
         "long"
      ],
      "latitude_wgs_fr_metropole":[
         "lat"
      ],
      "date":[
         "date_der_maj"
      ]
   }
}
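
To illustrate what a date or datetime python_type would allow downstream, here is a minimal sketch of a consumer casting values from the reported type. `cast_value` is a hypothetical helper, "date" and "datetime" are not values csv-detective currently returns, and the example assumes ISO 8601 strings (the actual format of date_der_maj is not shown above).

```python
from datetime import date, datetime


def cast_value(value: str, python_type: str):
    """Cast a raw CSV string according to a reported python_type."""
    if python_type == "date":
        return date.fromisoformat(value)  # assumes YYYY-MM-DD
    if python_type == "datetime":
        return datetime.fromisoformat(value)  # assumes ISO 8601
    if python_type == "int":
        return int(value)
    if python_type == "float":
        return float(value)
    return value  # "string" and anything unknown stay untouched


print(cast_value("2021-03-15", "date"))
print(cast_value("2021-03-15", "string"))
```

With python_type left as "string", the consumer has to inspect format as well to know the column holds dates; a dedicated type would make that a single lookup.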
