Comments (5)
Have we considered doing something smarter with the incremental package_search: ordering by metadata_modified and paging until we reach a record we have already updated? While this may not solve the issue, it would minimise the time a record is missing, and reduce the chance of the problem occurring at all because less time is spent querying.
This should enable more frequent increments (say hourly), coupled with a daily full refresh?
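A minimal sketch of the idea above, assuming packages carry ISO 8601 metadata_modified strings and that pages are requested newest first. The fetch_page callable and make_fake_source helper are hypothetical stand-ins for a real package_search call, not part of magda or CKAN:

```python
from typing import Callable, Dict, Iterator, List

Package = Dict[str, str]
# Hypothetical fetcher: (sort, start, rows) -> one page of packages.
FetchPage = Callable[[str, int, int], List[Package]]


def incremental_changes(
    fetch_page: FetchPage, last_synced: str, rows: int = 100
) -> Iterator[Package]:
    """Yield packages modified after last_synced, newest first."""
    start = 0
    while True:
        page = fetch_page("metadata_modified desc", start, rows)
        if not page:
            return  # ran out of records
        for pkg in page:
            # ISO 8601 timestamps in one format compare correctly as strings.
            if pkg["metadata_modified"] <= last_synced:
                return  # reached a record we already have; stop paging
            yield pkg
        start += rows


def make_fake_source(packages: List[Package]) -> FetchPage:
    """In-memory stand-in for the remote API, for trying the loop out."""
    def fetch_page(sort: str, start: int, rows: int) -> List[Package]:
        field, direction = sort.split()
        ordered = sorted(
            packages, key=lambda p: p[field], reverse=(direction == "desc")
        )
        return ordered[start:start + rows]
    return fetch_page
```

Because paging stops at the first already-seen record, an hourly run usually touches only a page or two, which is what keeps the query window short.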
from magda.
yeah that's a good idea for doing incremental updates. I believe CKAN also has an API for getting modified packages since a date. But any kind of incremental update requires us to be able to do a full crawl and know that we got everything that was last modified before a certain date (the beginning or end of the crawl, let's say, depending on how the initial crawl works). So this issue is about making sure the initial crawl is actually reliable.
Although maybe what you're suggesting is that we make a full crawl reliable by doing it naively, and then following it up with incremental one(s) until we don't see any more changes. That could work, and might be simpler than the page-with-overlap thing I described.
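The "naive full crawl, then incrementals until nothing changes" approach could look roughly like this. It is only a sketch: full_crawl and changed_since are hypothetical callables, and it assumes that anything the full crawl missed shows up as modified after the crawl started:

```python
from typing import Callable, Dict, List

Package = Dict[str, str]


def crawl_until_stable(
    full_crawl: Callable[[], List[Package]],
    changed_since: Callable[[str], List[Package]],
    crawl_started: str,
    max_passes: int = 10,
) -> Dict[str, Package]:
    """Run one full crawl, then incremental passes until one finds no changes."""
    records = {p["id"]: p for p in full_crawl()}
    watermark = crawl_started
    for _ in range(max_passes):
        # Assumed contract: returns packages modified strictly after watermark.
        changes = changed_since(watermark)
        if not changes:
            return records  # a pass with no changes means we have caught up
        for pkg in changes:
            records[pkg["id"]] = pkg
        # Advance the watermark to the newest change seen so far.
        watermark = max(p["metadata_modified"] for p in changes)
    raise RuntimeError("source kept changing; crawl did not converge")
```

The max_passes cap is a design choice of this sketch so that a busy source cannot keep the loop running indefinitely.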
yeah - if a full crawl is always followed by an incremental it should fix any updates. The same logic may also be applied to "large" multi-page incrementals for surety?
Fixed by 6a546f6 using the simple approach of asking package_search to sort results by metadata_created asc. The order shouldn't be affected by updates, and new datasets should appear at the end.
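For illustration, the paged requests for such a crawl might be built as below. The sort, start and rows parameters and the /api/3/action/package_search path are CKAN's; the base URL and helper names are hypothetical, and this is not the code from 6a546f6:

```python
from typing import List
from urllib.parse import urlencode


def package_search_url(base_url: str, start: int, rows: int = 100) -> str:
    """Build one package_search request with a stable, creation-time ordering."""
    params = {
        # Creation time does not change when a dataset is edited, so pages
        # should not shift under the crawler while it is running.
        "sort": "metadata_created asc",
        "start": start,
        "rows": rows,
    }
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


def crawl_urls(base_url: str, total: int, rows: int = 100) -> List[str]:
    """List the page URLs needed to cover `total` datasets."""
    return [package_search_url(base_url, s, rows) for s in range(0, total, rows)]
```

Datasets created during the crawl land on the last page, so rechecking the reported count after the final request would be one way to pick them up.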