Code Monkey home page Code Monkey logo

herding-cats's Introduction

Herding-CATs ๐Ÿˆโ€โฌ›

Purpose

The aim of this project is simple, create a basic python library to explore and interact with open data catalogues.

Easily access data from several CKAN data sources with planned integrations for:

  • Polars
  • Pandas
  • DuckDB
  • MotherDuck
  • S3
  • Google Big Query
  • Maybe a few others

This will improve and speed up how people access and use data from open data catalogues.

Current Default Open Data Catalogues

Herding-CATs supports the following catalogues by default

CKAN

Catalogue Name Website Catalogue API Endpoint Definition
London Datastore https://data.london.gov.uk CKAN
Subak Data Catalogue https://data.subak.org CKAN
Gov Open Data https://www.data.gov.uk CKAN
Humanitarian Data Exchange https://data.humdata.org CKAN

TBC

Catalogue Name Website Catalogue API Endpoint Definition
Bristol Open Data https://opendata.bristol.gov.uk TBC
Data Mill North https://datamillnorth.org TBC
Icebreaker One https://ib1.org TBC

Basic usage examples:

# Example usage 0: List all available data packages in the catalogue
if __name__ == "__main__":
    with CkanCatSession("data.london.gov.uk") as session:
        explore = CkanCatExplorer(session)
        package_list =  explore.package_list()
        pprint(package_list)
# Example usage 1: Basic Search: Look for packages with a basic search term
if __name__ == "__main__":
    with CkanCatSession("data.london.gov.uk") as session:
        explore = CkanCatExplorer(session)
        census_package =  explore.package_search_json(search_query="census")
        pprint(census_package)
# Example usage 2: List packages and show package info
if __name__ == "__main__":
    with CkanCatSession("data.london.gov.uk") as session:
        explore = ckanCatExplorer(session)
        packlage_list = explore.package_list_json()
        boundary_info = explore.package_show_info_json('2011-boundary-files')
        pprint(show_info)
# Example usage 3: Condensed package info view with resource info - either packed or unpacked
if __name__ == "__main__":
    with CatSession("data.london.gov.uk") as session:
        explorer = CatExplorer(session)

        condensed_results = explorer.package_search_condense_dataframe_packed("police", 'polars')
        print(condensed_results)

        condensed_results = explorer.package_search_condense_dataframe_unpacked("police", 'polars')
        print(condensed_results)
# Example usage 4: Find the data you want, and then load it into a polars df for further processing
if __name__ == "__main__":
    with CkanCatSession("data.london.gov.uk") as session:
        explore = CkanCatExplorer(session)
        all_packages = explore.package_list_dictionary()
        data = all_packages.get("violence-reduction-unit")
        info = explore.package_show_info_json(data)
        dl_link = explore.extract_resource_url(info, "VRU Q1 2023-24 Dataset")

    analyser = CkanCatAnalyser()
    df = analyser.polars_data_loader(dl_link)
    print(df)
    # Use it like a normal Polars DF from here

# This works for different data catalogues as well
if __name__ == "__main__":
    with CkanCatSession("HUMANITARIAN") as session:
        explore = CkanCatExplorer(session)
        all_packages = explore.package_list_dictionary()
        data = all_packages.get("cameroon-humanitarian-needs")
        info = explore.package_show_info_json(data)
        dl_link = explore.extract_resource_url(info, "cmr_hpc_needs_2024")

    analyser = CkanCatAnalyser()
    df = analyser.polars_data_loader(dl_link)
    print(df)
# Example usage 5: Find the data you want, and then load it into a local duckdb for further processing
if __name__ == "__main__":
    with CkanCatSession("humanitarian") as session:
        explore = CkanCatExplorer(session)
        all_packages = explore.package_list_dictionary()
        data = all_packages.get("cameroon-humanitarian-needs")
        info = explore.package_show_info_json(data)
        dl_link = explore.extract_resource_url(info, "cmr_hpc_needs_2024")

    analyser = CkanCatAnalyser()
    df = analyser.duckdb_data_loader_persist(dl_link, "test", "test_table")
    print(df)

herding-cats's People

Contributors

chriscarlon avatar

Watchers

 avatar  avatar

herding-cats's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.