
Comments (5)

cosmin-marginean commented on August 21, 2024

Some of my initial thoughts on BODS-to-RDF integration and some challenges to consider.

  1. I'm assuming that OpenOwnership will host the RDF format for download "atomically", correct? That is, an RDF-format register dataset will be available for each published BODS JSON register dataset.
  2. The conversion is a long-running process (hours), and its run time is expected to grow with the register size.
  3. When integrating this, we should consider also providing the RDF format for individual registers, not just the combined register (#11).
  4. The conversion code at BODS-RDF (https://github.com/blueanvil/bods-rdf) is written in Kotlin (JVM), so there are several ways to integrate it, each with different implications:
    • 4.1 Integrate the code as a library in a processing pipeline running on the JVM. This would require JVM development and JVM processes in the OpenOwnership pipeline.
    • 4.2 Run the Gradle build to produce .ttl files from BODS data in JSONL format. This only requires JVM 11+ to be available in the stack.
    • 4.3 Rewrite the conversion in one of the languages Flatterer uses and integrate it there. As these appear to be Python/Rust, we won't be able to assist with the implementation, so it would need someone experienced in these languages (we'll obviously assist with the conceptual elements). However, I'd assume this would be the preferred/sane approach?
  5. The RDF vocabularies should probably be generated and provided as deliverables together with the RDF dataset. This is a one-off task per BODS schema release that can easily be done with Gradle/JVM (Blue Anvil can do this periodically). Alternatively, it can be integrated into one of the options above.
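To make option 4.3 more concrete, here is a minimal Python sketch of what a streaming conversion might look like. It is not a port of the BODS-RDF/kbods code and does not reproduce its vocabulary: the namespace URIs, class names and property names are hypothetical placeholders, and only a handful of BODS 0.2 fields (`statementID`, `statementType`, `name`, `names[].fullName`, `subject`, `interestedParty`) are mapped. It emits N-Triples, a line-based subset of Turtle, so the output is also valid as a .ttl file. Processing one JSONL line at a time keeps memory use flat as the register grows.

```python
import json
from typing import Iterable, Iterator

# Hypothetical namespaces for illustration only; the real BODS-RDF/kbods
# vocabulary URIs are likely different.
BODS = "http://example.org/bods/vocab#"
DATA = "http://example.org/bods/statement/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Maps BODS 0.2 statementType values to hypothetical RDF class names.
CLASSES = {
    "entityStatement": "Entity",
    "personStatement": "Person",
    "ownershipOrControlStatement": "OwnershipOrControlStatement",
}


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def iri(value: str) -> str:
    # Assumes statementIDs contain only IRI-safe characters.
    return f"<{value}>"


def statement_triples(stmt: dict) -> Iterator[str]:
    """Yield N-Triples lines for one BODS statement (a subset of fields only)."""
    subject = iri(DATA + stmt["statementID"])
    cls = CLASSES.get(stmt.get("statementType", ""))
    if cls:
        yield f"{subject} {iri(RDF_TYPE)} {iri(BODS + cls)} ."
    if "name" in stmt:  # entity statements carry a single name
        yield f"{subject} {iri(BODS + 'name')} {literal(stmt['name'])} ."
    for name in stmt.get("names", []):  # person statements carry a names list
        if "fullName" in name:
            yield f"{subject} {iri(BODS + 'name')} {literal(name['fullName'])} ."
    # Ownership-or-control statements link to other statements by ID.
    target = stmt.get("subject", {}).get("describedByEntityStatement")
    if target:
        yield f"{subject} {iri(BODS + 'subject')} {iri(DATA + target)} ."
    party = stmt.get("interestedParty", {})
    for key in ("describedByPersonStatement", "describedByEntityStatement"):
        if key in party:
            yield f"{subject} {iri(BODS + 'interestedParty')} {iri(DATA + party[key])} ."


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Stream JSONL input to N-Triples output, one statement at a time."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield from statement_triples(json.loads(line))
```

Usage would be along the lines of iterating `convert(open("statements.jsonl"))` and writing each triple plus a newline to an output .ttl file (file names hypothetical). Because each statement is handled independently, the same function could be run per register to produce the individual-register files mentioned in point 3.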

from bodsdata.

StephenAbbott commented on August 21, 2024

Thanks @cosmin-marginean for the comprehensive feedback. I'm just back from holidays and catching up with updates. I'm due to work with our team on updates to the data analysis tools in August and will be in touch as soon as possible.


StephenAbbott commented on August 21, 2024

@StephenAbbott to speak to @ScatteredInk about this work - https://github.com/cosmin-marginean/kbods - by @cosmin-marginean


StephenAbbott commented on August 21, 2024

Bear in mind the related discussion in openownership/data-standard#121.


StephenAbbott commented on August 21, 2024

From @cosmin-marginean:

There is a Downloads section here with information on all BODS RDF datasets: https://github.com/cosmin-marginean/kbods/tree/main/kbods-rdf

I'm exporting these when I get the chance (roughly once a month) and am happy to host them in my S3 for now, so feel free to link to them.

I also have a short bash script that produces them, if you ever want to include this in the register pipeline on your side. It takes a couple of hours to run, though, and needs about 50 GB of disk space.
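Since the export needs roughly 50 GB of free disk space and runs for hours, a pipeline step could check free space before starting rather than failing part-way through. Below is a minimal Python sketch using the standard library's `shutil.disk_usage`; the function name is hypothetical, the 50 GB default is taken from the comment above and interpreted as decimal gigabytes, and the export script itself is not invoked here.

```python
import shutil

# Assumption: "about 50GBs" is read as 50 decimal gigabytes (50 * 10**9 bytes).
REQUIRED_GB = 50


def has_enough_space(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9
```

A pipeline would call `has_enough_space("/path/to/workdir")` (path hypothetical) and abort early if it returns False.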

