plazi / lycophron
Batch uploader to Zenodo
License: Creative Commons Zero v1.0 Universal
OK, so this is a summary of what I discussed with @slint on Skype today. That discussion was a catch-up on what we previously discussed at the last Arcadia Sprint meeting at CERN (Feb/2020). This is a joint Plazi-Zenodo effort that aims at a complete redo of Lycophron, in order to deliver a tool that can handle any use case, not only specific ones, with better performance and reliability.
- Pandas Dataframes;
- a CLI (Click), accepting commands such as upload, update, publish, delete, and uninstall, with some parameters to turn sandbox mode on/off, define the export file, edit sensitive information (e.g. the API token), and so on;
- .env (python-dotenv) to keep sensitive information and other eventual parameters of the tool;
- (Pydantic, Marshmallow);
- Celery for 'multithreading';
The first step is building the Zenodo communication module; the next step would be implementing the first commands for the CLI.
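To make the command layout a bit more concrete, here is a minimal sketch of the CLI surface. It uses the standard library's argparse as a stand-in for Click so that it runs without extra dependencies; the option names `--sandbox` and `--export-file` are assumptions for illustration, not decided names.

```python
import argparse

# Command names taken from the list above.
COMMANDS = ("upload", "update", "publish", "delete", "uninstall")


def build_parser():
    """Build a parser with one subcommand per planned command."""
    parser = argparse.ArgumentParser(prog="lycophron")
    # Hypothetical global options; the real names are still to be decided.
    parser.add_argument("--sandbox", action="store_true",
                        help="talk to the Zenodo sandbox instead of production")
    parser.add_argument("--export-file", default=None,
                        help="path of the export file")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        subparsers.add_parser(name, help=f"{name} records")
    return parser


args = build_parser().parse_args(["--sandbox", "upload"])
print(args.command, args.sandbox)
```

With Click the same shape would probably be a command group with one function per command, but that part is left open here.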
Tomorrow I'll work on setting labels, milestones and creating templates for issues, and the README (at least the skeleton).
What do you think, @slint ?
Cheers!
Originally posted by @alejandromumo in #30 (comment)
From the comments and TODOs in the code in #16, we can extract some follow-up issues to address later on.
Depends on new input from MfA folks: plazi/arcadia-project#234
Currently there are two issues related to DOIs:
Hi Donat and @flsimoes ,
Last week Manuel finished the Sandbox upload of the ~230-record bat collection from the Google Sheet that Felipe and Juliana shared at the last Arcadia sprint.
Some pending action items before we go ahead with uploading to production:
Let me know if you have any questions. I’ll be off next week on holidays, but we can re-route to the rest of the team so they can help.
Cheers,
Alex
PS: I couldn’t find Juliana’s email address, so feel free to forward this or add her in the loop.
@punkish @tcatapano @slint
Here are some more files that could be the missing program files in the repo. Can you please check them out?
https://drive.google.com/drive/folders/1-Bm1ihJtZdTLOd35iXd-TFd60aKATYZJ
{<item_id>:<identifier>:<is_bidirectional>}, e.g. {specimen001:doi:true}
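A minimal sketch of how a placeholder in this form could be parsed. It assumes the item id contains no colon, that the flag is literally `true` or `false`, and that anything in between is the identifier part; the `Link` type and the `parse_links` name are made up for this example.

```python
import re
from typing import NamedTuple


class Link(NamedTuple):
    """One parsed {<item_id>:<identifier>:<is_bidirectional>} placeholder."""
    item_id: str
    identifier: str
    is_bidirectional: bool


# First field: no colons or braces. Middle field: anything but braces.
# Last field: the literal true/false flag.
_LINK_RE = re.compile(r"\{([^:{}]+):([^{}]+):(true|false)\}")


def parse_links(text):
    """Return every placeholder found in text, in order of appearance."""
    return [
        Link(m.group(1), m.group(2), m.group(3) == "true")
        for m in _LINK_RE.finditer(text)
    ]


links = parse_links("see {specimen001:doi:true} and {fig2:doi:false}")
print(links)
```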
get a new version of the bat files by Monday https://drive.google.com/drive/folders/13kZvDDCUq4ueNleQbiVx09B7v0C3EkqV
this is the complete list of bat publications used for data extraction https://docs.google.com/spreadsheets/d/1y5uBKvyzDQgQUtyRHvn5f20IOA7LoMho/edit#gid=966826746
@alejandromumo @slint Where can I find a template XLS to be used to upload articles to BLR?
In the BiCIKL TNA projects we need this XLS to hand out to the awardees, so that they can add their publications in a format that saves us time when we then upload them.
Thanks for a link.
Donat
I guess this is only needed for local development, when calling localhost. We should probably use a dev flag to enable/disable this?
if self.config["ENV"] == "development":
    client.session.verify = False
Originally posted by @jrcastro2 in #32 (comment)
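One way the dev flag could look, as a sketch only: verification stays on unless the environment is development and an explicit opt-out is set. The `SKIP_TLS_VERIFY` key and the `tls_verify` helper are hypothetical names, not existing configuration.

```python
def tls_verify(config):
    """Return the value to assign to the HTTP session's verify attribute.

    Verification is only switched off when both conditions hold, so a
    production config can never disable it by accident.
    """
    is_dev = config.get("ENV") == "development"
    skip_requested = bool(config.get("SKIP_TLS_VERIFY", False))  # hypothetical flag
    return not (is_dev and skip_requested)


print(tls_verify({"ENV": "development", "SKIP_TLS_VERIFY": True}))
print(tls_verify({"ENV": "production", "SKIP_TLS_VERIFY": True}))
```

The call site would then read `client.session.verify = tls_verify(self.config)`.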
@slint @jhpoelen @lnielsen
Here is a draft of a CSV for the lycophron upload to Zenodo.
https://docs.google.com/spreadsheets/d/1f-_6MFzObIBlxeCaEtHD5ZRF0Kwj0zKq_BiPvbEyYSg/edit#gid=0
Alex, can you please have a look at it and let me know? Also, maybe you could indicate in a color which fields are required. I tried to mark some, based on the * in the upload form.
I am not sure how to add multiple contributors or keywords with line breaks in a single field. When I save the sheet as CSV and open the CSV file again, it no longer matches the XLS.
What do you recommend when we have the bibliographic reference article as a string, such as Nature 541: 136-138 and authors as string?
Do we need to parse this out in a first round, or just add?
Thanks
Donat
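On the line-break question, a small sketch of how a multi-line cell behaves when the CSV is written and read with Python's csv module: the cell is quoted on write, so the line breaks come back intact. The column names and values are made up for the example, and spreadsheet applications may export differently.

```python
import csv
import io

# Hypothetical row: one cell holds three keywords separated by line breaks.
row = {"title": "Example record", "keywords": "bats\nChiroptera\nviruses"}

# newline="" leaves line endings untouched, as the csv module expects.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)  # the multi-line cell is quoted automatically

buffer.seek(0)
read_back = next(csv.DictReader(buffer))
keywords = read_back["keywords"].split("\n")
print(keywords)
```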
The current data model supports a "status" for each record, which lets the user follow the current progress of their uploads.
The application supports it, but there is currently an error when accessing the database inside a Celery task:
File "/Users/alejandromumo/.virtualenvs/lycophron/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1964, in _exec_single_context
self.dialect.do_execute(
File "/Users/alejandromumo/.virtualenvs/lycophron/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 747, in do_execute
cursor.execute(statement, parameters)
MemoryError
[2023-03-27 15:36:41,853: ERROR/MainProcess] Pool callback raised exception: MemoryError('Process got: ')
Traceback (most recent call last):
File "/Users/alejandromumo/.virtualenvs/lycophron/lib/python3.9/site-packages/billiard/pool.py", line 1796, in safe_apply_callback
fun(*args, **kwargs)
File "/Users/alejandromumo/.virtualenvs/lycophron/lib/python3.9/site-packages/celery/worker/request.py", line 730, in on_success
return self.on_failure(retval, return_ok=True)
File "/Users/alejandromumo/.virtualenvs/lycophron/lib/python3.9/site-packages/celery/worker/request.py", line 545, in on_failure
raise MemoryError(f'Process got: {exc}')
MemoryError: Process got:
It seems that the engine (SQLite) fails while executing the cursor to fetch the data. Celery then returns a MemoryError.
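This is not a diagnosis of the error above. One thing that may be worth ruling out is a database connection being shared between the parent process and forked workers, which SQLite does not support. The sketch below opens a short-lived connection inside each call instead, using the standard library's sqlite3 directly rather than SQLAlchemy; the table name, column names, and status values are assumptions.

```python
import os
import sqlite3
import tempfile

SCHEMA = (
    "CREATE TABLE IF NOT EXISTS record_status "
    "(record_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
)


def set_status(db_path, record_id, status):
    """Store the status of one record using a connection owned by this call."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(SCHEMA)
            conn.execute(
                "INSERT OR REPLACE INTO record_status (record_id, status) "
                "VALUES (?, ?)",
                (record_id, status),
            )
    finally:
        conn.close()


def get_status(db_path, record_id):
    """Return the stored status, or None if the record is unknown."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(SCHEMA)
        row = conn.execute(
            "SELECT status FROM record_status WHERE record_id = ?",
            (record_id,),
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "lycophron.db")
    set_status(db_path, "record-1", "queued")
    set_status(db_path, "record-1", "published")
    final_status = get_status(db_path, "record-1")
    missing_status = get_status(db_path, "record-2")
print(final_status, missing_status)
```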
Hi Alex,
We recently discussed, by email, how to update custom metadata fields, and that raised a couple of questions for me, especially because we're designing this tool to be universally used, not exclusively by our domain.
Take our universe of custom metadata fields as an example. We have some fields that will always have a single value (most of the DwC-based ones) and some others that will have multiple values, like locations in treatments, or the OBO ones. For the ones with a single value, the idea of using a relational database (like a spreadsheet, to an extent) as input would work perfectly fine: we can take the value in that specific column/row and replace it on the server. But once you start thinking about the custom metadata fields with multiple values, we need to know the value to be changed, not only the new value. Then I have the following questions:
I've some ideas in mind, but I'll let you start the brainstorm here.
Thanks!
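To illustrate the multi-value case described above, a minimal sketch: updating one entry of a multi-valued field needs the old value as well as the new one. The function name and the sample locations are made up for the example, and this is only one of several possible conventions.

```python
def replace_value(values, old, new):
    """Return a copy of values with the first occurrence of old replaced by new.

    Raises ValueError when old is not present, so a stale spreadsheet row
    cannot silently overwrite the wrong entry.
    """
    updated = list(values)
    index = updated.index(old)  # raises ValueError if old is missing
    updated[index] = new
    return updated


locations = ["Brazil", "Peru", "Colombia"]
print(replace_value(locations, "Peru", "Ecuador"))
```

For a single-valued field, the column/row value alone is enough, which is why the spreadsheet model works there without the extra "old value" input.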
We can integrate the following bullets into the main docs of the CSV fields:
- doi field:
- subjects.subject: the cell value contains “new-line” separated keywords
- creators.*: following the “new-line” separated convention, these have been “tabularized”. In the example there are two authors: Nils Schlüter (affiliation: Museum für Naturkunde, ORCID: 0000-0002-5699-3684) and John Smith (affiliation: CERN, ORCID: none)
- Identifier fields can be looked up at the following endpoints (use the ?q=<search term> query string parameter to narrow down results):
  - resource_type.id: https://zenodo.org/api/vocabularies/resourcetypes?size=1000
  - creators.affiliations.id: https://zenodo.org/api/affiliations
  - rights.id: https://zenodo.org/api/vocabularies/licenses?size=1000
  - contributors.role.id: https://zenodo.org/api/vocabularies/contributorsroles?size=1000
  - languages.id: https://zenodo.org/api/vocabularies/languages?size=1000
  - related_identifiers.relation_type.id: https://zenodo.org/api/vocabularies/relationtypes?size=1000
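A rough sketch of how the “new-line” separated convention could be read back, plus a helper that builds a vocabulary lookup URL with the ?q= parameter. The exact column names under creators.* (`creators.name`, `creators.affiliation`, `creators.orcid`) and the use of an empty line for a missing ORCID are assumptions, as is combining `size` and `q` in one request.

```python
from urllib.parse import urlencode


def split_cell(cell):
    """Split a multi-value cell on line breaks; an empty cell gives no values."""
    return cell.split("\n") if cell else []


def creators_from_row(row):
    """Zip the tabularized creators.* columns into one dict per creator."""
    names = split_cell(row.get("creators.name", ""))
    affiliations = split_cell(row.get("creators.affiliation", ""))
    orcids = split_cell(row.get("creators.orcid", ""))
    creators = []
    for i, name in enumerate(names):
        creators.append({
            "name": name,
            "affiliation": affiliations[i] if i < len(affiliations) else "",
            # An empty line is treated as "no ORCID" (assumption).
            "orcid": (orcids[i] if i < len(orcids) else "") or None,
        })
    return creators


def vocabulary_url(name, query=None, size=1000):
    """Build a vocabulary lookup URL, optionally narrowed with ?q=."""
    params = {"size": size}
    if query:
        params["q"] = query
    return f"https://zenodo.org/api/vocabularies/{name}?{urlencode(params)}"


row = {
    "subjects.subject": "bats\nChiroptera",
    "creators.name": "Nils Schlüter\nJohn Smith",
    "creators.affiliation": "Museum für Naturkunde\nCERN",
    "creators.orcid": "0000-0002-5699-3684\n",
}
keywords = split_cell(row["subjects.subject"])
creators = creators_from_row(row)
print(keywords, creators, vocabulary_url("languages", "eng"))
```

Note that the affiliations endpoint listed above lives at /api/affiliations rather than under /api/vocabularies, so `vocabulary_url` does not cover it.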