heardlibrary / linked-data
Documentation and Data related to the Linked Data and Wikidata Working Groups
License: GNU General Public License v3.0
@jbaskauf The csv-metadata.json schema will work as it is currently being generated, but it would make sense to move the UUID columns ahead of the generic value columns, since the directionality of the edges those columns generate is
item qID -> statement UUID -> value
Currently, it reads more like this:
statement UUID -> value
item qID -> statement UUID
See 8c85101 for how I'd like for it to be ordered.
Hi, it seems you have reinvented the wheel a bit in https://github.com/HeardLibrary/linked-data/blob/master/vanderbot/vanderbot.py
I highly recommend rewriting it using https://www.wikidata.org/wiki/Wikidata:Wikidata:Tools/WikibaseIntegrator :)
Testing reveals that the snak JSON for a commons image looks like this:
"P18": [
    {
        "mainsnak": {
            "snaktype": "value",
            "property": "P18",
            "hash": "c99db2661b192fd22f5dff1a532b1d6eb9433ef9",
            "datavalue": {
                "value": "Ace The Wonder Dog.jpg",
                "type": "string"
            },
            "datatype": "commonsMedia"
        },
        "type": "statement",
        "id": "Q15397819$0d43e90b-4963-b88f-49c4-fc42afe5b606",
        "rank": "normal"
    }
]
When queried via SPARQL, the returned value is:
http://commons.wikimedia.org/wiki/Special:FilePath/Ace%20The%20Wonder%20Dog.jpg
I think @eshook2010 probably already has notes on this, but we should get something put in this repo.
I've started a list of APIs here, but it needs to be expanded and fleshed out.
When creating new records, the response JSON gave this, with an error message:
Processing row: 27 Label: Julia Pim Reis new record
Write confirmation: {'entity': {'labels': {'en': {'language': 'en', 'value': 'Julia Pim Reis'}}, 'descriptions': {'en': {'language': 'en', 'value': 'biodiversity software developer'}}, 'aliases': {}, 'sitelinks': {}, 'claims': {'P108': [{'mainsnak': {'snaktype': 'value', 'property': 'P108', 'hash': 'a1db1fbb4ae38348b212f40c4b89752a0aaffacc', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 233098, 'id': 'Q233098'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}, 'type': 'statement', 'id': 'Q99580299$2CCF45A6-04DF-48B8-A6F4-248982C93FBC', 'rank': 'normal', 'references': [{'hash': '5c2b16da511ee084aa935c5b011d7e8c21187b41', 'snaks': {'P854': [{'snaktype': 'value', 'property': 'P854', 'hash': 'bb37fc08d6164b52ebb76f807e6c547a227f725d', 'datavalue': {'value': 'https://www.linkedin.com/in/juliapimreis/', 'type': 'string'}, 'datatype': 'url'}], 'P813': [{'snaktype': 'value', 'property': 'P813', 'hash': '5d20b450b1aa6c2cd42bb1d1b137f4d841b595e1', 'datavalue': {'value': {'time': '+2020-09-24T00:00:00Z', 'timezone': 0, 'before': 0, 'after': 0, 'precision': 11, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}, 'type': 'time'}, 'datatype': 'time'}]}, 'snaks-order': ['P854', 'P813']}]}], 'P31': [{'mainsnak': {'snaktype': 'value', 'property': 'P31', 'hash': 'ad7d38a03cdd40cdc373de0dc4e7b7fcbccb31d9', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 5, 'id': 'Q5'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}, 'type': 'statement', 'id': 'Q99580299$7E46DB32-9BED-4CDD-8DC6-AFF05ECA69D6', 'rank': 'normal'}], 'P21': [{'mainsnak': {'snaktype': 'value', 'property': 'P21', 'hash': '5760796ff6ebc63aae12cdcbf509b07ebf0bd201', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 6581072, 'id': 'Q6581072'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}, 'type': 'statement', 'id': 'Q99580299$5EC83EC8-B4E2-41CD-8EE3-60D38F403A3B', 'rank': 'normal'}]}, 'id': 'Q99580299', 'type': 'item', 'lastrevid': 1281714167}, 
'success': 1}
No reference in the response JSON matched with the reference for statement: Q99580299 P108 Q233098
Reference {'refHashColumn': 'employerReferenceHash', 'refPropList': ['P854', 'P813'], 'refValueColumnList': ['employerReferenceSourceUrl', 'employerReferenceRetrieved'], 'refEntityOrLiteral': ['literal', 'value'], 'refTypeList': ['url', 'time'], 'refValueTypeList': ['string', 'time']}
The error message is generated on line 1315, where it is noted that the condition causing it should never occur. This needs to be debugged by recording which instance of setting referenceMatch = False was the one that triggered the error. It would probably also be good to record the value of responseReference during the loop that sets it to False. I'm thinking that the break in line 1299 isn't really killing the loop and that it's continuing after there is a match to the correct reference. The reason I think so is that the value is getting correctly set in the table.
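The suspicion about the break is plausible because in Python, break exits only the innermost loop. Here is a minimal sketch of the pattern; the names referenceMatch and responseReference come from this issue, but the loop structure is a hypothetical illustration, not VanderBot's actual code:

```python
# Minimal demonstration that "break" exits only the innermost loop.
# Without the for/else + outer break, the outer loop keeps iterating
# after a match, which could let a later iteration reset the flag.
def match_reference(response_references, target_props):
    reference_match = False
    for response_reference in response_references:  # outer loop over references
        for prop in response_reference:             # inner loop over property snaks
            if prop in target_props:
                reference_match = True
                break  # exits only this inner loop
        else:
            continue  # inner loop finished without a match; keep searching
        break  # propagate the inner break so the outer loop stops too
    return reference_match

print(match_reference([['P854', 'P813'], ['P31']], {'P813'}))  # True
```

If vanderbot.py relies on a bare break inside a nested loop, logging the iteration index just before each referenceMatch = False assignment should show whether the outer loop continues past the matching reference.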
Here is the situation:
Generally, when a property column like employer_startDate has a value type of Date chosen from the dropdown, the csv-metadata.json output file needs to change (in a qualifier example) from
{
    "titles": "employer_startDate",
    "name": "employer_startDate",
    "datatype": "dateTime",
    "aboutUrl": "http://www.wikidata.org/entity/statement/{qid}-{employer_uuid}",
    "propertyUrl": "http://www.wikidata.org/prop/qualifier/P580"
},
to
{
    "titles": "employer_startDate_rand",
    "name": "employer_startDate_rand",
    "datatype": "string",
    "aboutUrl": "http://www.wikidata.org/entity/statement/{qid}-{employer_uuid}",
    "propertyUrl": "http://www.wikidata.org/prop/qualifier/value/P580",
    "valueUrl": "http://example.com/.well-known/genid/{employer_startDate_rand}"
},
{
    "titles": "employer_startDate_val",
    "name": "employer_startDate_val",
    "datatype": "dateTime",
    "aboutUrl": "http://example.com/.well-known/genid/{employer_startDate_rand}",
    "propertyUrl": "http://wikiba.se/ontology#timeValue"
},
{
    "titles": "employer_startDate_prec",
    "name": "employer_startDate_prec",
    "datatype": "integer",
    "aboutUrl": "http://example.com/.well-known/genid/{employer_startDate_rand}",
    "propertyUrl": "http://wikiba.se/ontology#timePrecision"
},
The CSV header needs to change from
...,employer_startDate,...
to
...,employer_startDate_rand,employer_startDate_val,employer_startDate_prec,...
Before, the property linking to the direct value had a namespace like http://www.wikidata.org/prop/x/, where x was statement, reference, or qualifier. Now, the property linking to the value node needs to have a namespace like http://www.wikidata.org/prop/x/value/, where x is still statement, reference, or qualifier. The links from the value node to the time value and time precision are the same regardless of the property link. The IRI pattern for the value node identifier is always the same, a blank node Skolem IRI: http://example.com/.well-known/genid/{propertyName_rand}, where propertyName_rand is the property name from the form with _rand appended.
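To make the change concrete, here is a minimal sketch of generating the three replacement column objects for a date-valued qualifier. The function name and signature are hypothetical, not VanderBot's actual generator code; the URL patterns are taken from the example above:

```python
# Sketch: build the three csv-metadata.json column objects that replace a
# single dateTime column when a Date value type is chosen. Illustrative
# helper, not VanderBot's actual code.
def date_value_columns(prop_name: str, pid: str, uuid_column: str) -> list:
    # Skolem IRI pattern for the blank value node (example.com per the schema above)
    genid = 'http://example.com/.well-known/genid/{' + prop_name + '_rand}'
    statement = 'http://www.wikidata.org/entity/statement/{qid}-{' + uuid_column + '}'
    return [
        {   # random identifier column linking the statement to the value node
            'titles': prop_name + '_rand',
            'name': prop_name + '_rand',
            'datatype': 'string',
            'aboutUrl': statement,
            'propertyUrl': 'http://www.wikidata.org/prop/qualifier/value/' + pid,
            'valueUrl': genid
        },
        {   # the time value itself, hung off the value node
            'titles': prop_name + '_val',
            'name': prop_name + '_val',
            'datatype': 'dateTime',
            'aboutUrl': genid,
            'propertyUrl': 'http://wikiba.se/ontology#timeValue'
        },
        {   # the time precision integer, also hung off the value node
            'titles': prop_name + '_prec',
            'name': prop_name + '_prec',
            'datatype': 'integer',
            'aboutUrl': genid,
            'propertyUrl': 'http://wikiba.se/ontology#timePrecision'
        }
    ]

cols = date_value_columns('employer_startDate', 'P580', 'employer_uuid')
```

The _rand column carries the Skolem identifier, so the _val and _prec columns can all point their aboutUrl at the same generated value node.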
In these cases, the record was successfully written to the API and the rest of the metadata (except for the associated references) was written to the CSV file. However, if the CSV is used to write again, duplicate claims will be made, since the UUIDs and hashes aren't recorded in the table. So they have to be manually copied out of the returned JSON and pasted into the CSV.
Q111821575
transformed image URL:
http://commons.wikimedia.org/wiki/Special:FilePath/%22Accusing%20Finger%20of%20Conscience...God%20and%20Conscience%20Witness%20Every%20Action...The%20Authorities%20Ask%20That%20You%20Save%20Fats...Reli%20-%20NARA%20-%20512560.jpg
API value:
Accusing Finger of Conscience...God and Conscience Witness Every Action...The Authorities Ask That You Save Fats...Reli - NARA - 512560.jpg
Q111821677
transformed image URL:
http://commons.wikimedia.org/wiki/Special:FilePath/Henry%20Dunant%20apocalypse%20diagram%20.JPG
API value:
Henry Dunant apocalypse diagram.JPG
In this case it looks like the API stripped off a trailing space before the file extension.
Q111822239
transformed image URL:
http://commons.wikimedia.org/wiki/Special:FilePath/Simon%20Bening%20%28Flemish%20-%20Villagers%20on%20Their%20Way%20to%20Church%20-%20Google%20Art%20Project.jpg
API value:
Simon Bening - Villagers on Their Way to Church - Google Art Project.jpg
Pywikibot has throttling built deeply into the library, and I don't see a simple way to reduce the throttling time. It may be that it can't be easily reduced, and perhaps a different module would be necessary.
I'm exploring this software and looking to possibly utilize it.
However, before I begin, I'd like to know what the LICENSE for this project is.
linked-data/publications/apis.md, line 1 in bf68e00
@jbaskauf Sorry, the example I gave you to look at for qualifier statements was an old one where I was expressing the statement IRIs incorrectly. They are supposed to have the item Q ID, followed by a dash, prepended to the UUIDs. You can see the correction in this diff: 573b3b8
This change makes the IRI the same as it is in all of the other cases where the statement is the subject of a triple.
It isn't clear to me what the best place is to store the data that we have scraped so that we can clean and disambiguate it. Some options are:
I didn't list Wikibase itself because before data can be put into it, we need to get past the identifier and data model issues. Eventually we would like for the data to live in a Wikibase instance, but it's going to have to be cleaned a lot first.