Comments (5)
Some of my initial thoughts on BODS-to-RDF integration and some challenges to consider.
1. I'm assuming that OpenOwnership will provide and host the RDF format for download "atomically", correct? (i.e. an RDF-format register dataset will be available for each published BODS JSON register dataset)
2. The conversion is a long-running process (hours), and the run time is expected to grow with the register size.
3. When integrating this, we should also consider providing the RDF format for individual registers, not just the combined register (#11).
4. The conversion code at BODS-RDF (https://github.com/blueanvil/bods-rdf) is written in Kotlin (JVM), so there are several ways to integrate it, each with its own implications:
   - 4.1 Integrate the code as a library in a processing pipeline running on the JVM. This will require JVM coding and JVM processes in the OpenOwnership pipeline.
   - 4.2 Run the Gradle build to produce .ttl files for BODS data from the JSONL format. This only requires JVM 11+ to be available in the stack.
   - 4.3 Rewrite this in one of the Flatterer languages and integrate it there. As this seems to be Python/Rust, we won't be able to assist with the implementation, so we'd need someone with experience in these languages (we'll obviously assist with the conceptual elements). However, I'd assume this would be the preferred/sane approach?
5. The RDF vocabularies should probably be generated and provided as deliverables together with the RDF dataset. This is a one-off task per BODS schema release that can be done simply with Gradle/JVM (Blue Anvil can do that periodically). Alternatively, it can be integrated with one of the options above.
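To make the rewrite option (4.3) more concrete, here is a minimal Python sketch of what a streaming JSONL-to-RDF step might look like. It is not the BODS-RDF implementation: the namespace IRIs and predicate names are hypothetical placeholders, only a handful of fields are mapped, and the field names (`statementID`, `statementType`, `name`, `subject`, `interestedParty`) are assumed to follow the BODS 0.2 layout. It emits N-Triples, which is also valid Turtle, so the output could be saved as a .ttl file.

```python
import json
from urllib.parse import quote

# Hypothetical namespaces, for illustration only; the real BODS RDF
# vocabulary and resource IRIs may differ.
VOCAB = "http://example.org/bods/vocab#"
RES = "http://example.org/bods/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value):
    """Escape a string as an N-Triples literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def resource(statement_id):
    """Build an IRI for a statement ID (percent-encoded to keep the IRI valid)."""
    return f"<{RES}{quote(statement_id, safe='')}>"


def statement_to_triples(stmt):
    """Map one BODS statement (a dict) to a list of N-Triples lines."""
    subject = resource(stmt["statementID"])
    triples = [f"{subject} <{RDF_TYPE}> <{VOCAB}{stmt['statementType']}> ."]
    if isinstance(stmt.get("name"), str):
        triples.append(f"{subject} <{VOCAB}name> {literal(stmt['name'])} .")
    # Ownership-or-control statements reference other statements by ID.
    for field in ("subject", "interestedParty"):
        ref = stmt.get(field) or {}
        for key in ("describedByEntityStatement", "describedByPersonStatement"):
            if key in ref:
                triples.append(f"{subject} <{VOCAB}{field}> {resource(ref[key])} .")
    return triples


def convert(lines, out):
    """Stream JSONL lines to N-Triples; returns the number of statements read."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the input
        for triple in statement_to_triples(json.loads(line)):
            out.write(triple + "\n")
        count += 1
    return count
```

Because `convert` takes any iterable of lines and any writable object, it could be called with an open (or gzip-opened) JSONL file and an output file handle, which keeps memory use flat regardless of register size.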
from bodsdata.
Thanks @cosmin-marginean for the comprehensive feedback. I'm just back from holidays and catching up with updates. I'm due to work with our team on updates to the data analysis tools in August and will be in touch as soon as possible.
@StephenAbbott to speak to @ScatteredInk about this work - https://github.com/cosmin-marginean/kbods - by @cosmin-marginean
Bear in mind related discussion openownership/data-standard#121
From @cosmin-marginean:
There is a Downloads section here which contains info on all BODS RDF datasets: https://github.com/cosmin-marginean/kbods/tree/main/kbods-rdf
I export these when I get a chance (once a month or so) and am happy to host them in my S3 for now, so feel free to link to them.
I also have a short bash script to produce them if you ever want to include these in the registry pipeline on your side (it takes a couple of hours to run, though, and needs about 50 GB of disk space).
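If the export were wired into the registry pipeline, a pre-flight check on free disk space could fail fast instead of dying a couple of hours into the run. A minimal sketch, assuming the roughly 50 GB figure above; the function name and the default threshold are hypothetical, not part of the existing script:

```python
import shutil


def has_enough_space(path, required_gb=50):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3
```

A pipeline stage could call `has_enough_space(output_dir)` before launching the export and skip or warn when it returns False.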