wikibaseintegrator's Introduction

Wikibase Integrator

Wikibase Integrator is a Python package whose purpose is to manipulate data present on a Wikibase instance (like Wikidata).

Breaking changes in v0.12

The WikibaseIntegrator core was completely rewritten in v0.12, which led to some important changes.

It offers a new object-oriented approach, better code readability and support for Property, Lexeme and MediaInfo entities (in addition to Item).

If you want to stay on v0.11.x, you can put this line in your requirements.txt:

wikibaseintegrator~=0.11.3

WikibaseIntegrator / WikidataIntegrator

WikibaseIntegrator (wbi) is a fork of WikidataIntegrator (wdi) that focuses on improved compatibility with Wikibase and on adding missing functionality. The main differences between the two libraries are:

  • A complete rewrite of the library with a more object-oriented architecture, allowing easy interaction, data validation and extended functionality
  • Support for reading and writing Lexeme, MediaInfo and Property entities (in addition to Item)
  • Python 3.8 to 3.12 support, validated with unit tests
  • Type hints for arguments and return values, checked with the mypy static type checker
  • An OAuth 2.0 login method
  • Support for the logging module

But WikibaseIntegrator lacks the "fastrun" functionality implemented in WikidataIntegrator.

Documentation

(Basic) documentation generated from the Python source code is available on the Read the Docs website.

Jupyter notebooks

You can find some sample code (adding an entity, a lexeme, etc.) in the Jupyter notebook directory of the repository.

Common use cases
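
The snippets below are fragments from the Jupyter notebooks. As a rough sketch (the exact setup in each notebook may differ), they assume imports and a WikibaseIntegrator instance along these lines:

from wikibaseintegrator import WikibaseIntegrator, datatypes
from wikibaseintegrator.models.qualifiers import Qualifiers
from wikibaseintegrator.models.references import Reference, References
from wikibaseintegrator.models.senses import Sense
from wikibaseintegrator.models.forms import Form

# Entity objects such as `entity` or `lexeme` are obtained with
# wbi.item.get()/wbi.item.new(), wbi.lexeme.get(), etc.
wbi = WikibaseIntegrator()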

Read an existing entity

From import_entity.ipynb

entity = wbi.item.get('Q582')

Start a new entity

From item_create_new.ipynb

entity = wbi.item.new()

Write an entity to the instance

From import_entity.ipynb

entity.write()

Add labels

From item_create_new.ipynb

entity.labels.set('en', 'New item')
entity.labels.set('fr', 'Nouvel élément')

Get label value

From item_get.ipynb

entity.labels.get('en').value

Add aliases

From item_create_new.ipynb

entity.aliases.set('en', 'Item')
entity.aliases.set('fr', 'Élément')

Add descriptions

From item_create_new.ipynb

entity.descriptions.set('en', 'A freshly created element')
entity.descriptions.set('fr', 'Un élément fraichement créé')

Add a simple claim

From item_create_new.ipynb

claim_time = datatypes.Time(prop_nr='P74', time='now')

entity.claims.add(claim_time)

Get claim value

From item_get.ipynb

entity.claims.get('P2048')[0].mainsnak.datavalue['value']['amount']

Manipulate claim, add a qualifier

From item_create_new.ipynb

qualifiers = Qualifiers()
qualifiers.add(datatypes.String(prop_nr='P828', value='Item qualifier'))

claim_string = datatypes.String(prop_nr='P31533', value='A String property', qualifiers=qualifiers)
entity.claims.add(claim_string)

Manipulate claim, add references

From item_create_new.ipynb

references = References()
reference1 = Reference()
reference1.add(datatypes.String(prop_nr='P828', value='Item string reference'))

reference2 = Reference()
reference2.add(datatypes.String(prop_nr='P828', value='Another item string reference'))

references.add(reference1)
references.add(reference2)

new_claim_string = datatypes.String(prop_nr='P31533', value='A String property', references=references)
entity.claims.add(new_claim_string)

Get lemma on lexeme

lexeme.lemmas.get(language='fr')

Set lemma on lexeme

From lexeme_update.ipynb

lexeme.lemmas.set(language='fr', value='réponse')

Add gloss to a sense on lexeme

From lexeme_write.ipynb

sense = Sense()
sense.glosses.set(language='en', value='English gloss')
sense.glosses.set(language='fr', value='French gloss')
claim = datatypes.String(prop_nr='P828', value="Create a string claim for sense")
sense.claims.add(claim)
lexeme.senses.add(sense)

Add form to a lexeme

From lexeme_write.ipynb

form = Form()
form.representations.set(language='en', value='English form representation')
form.representations.set(language='fr', value='French form representation')
claim = datatypes.String(prop_nr='P828', value="Create a string claim for form")
form.claims.add(claim)
lexeme.forms.add(form)

Other projects

Here is a list of different projects that use the library:

Installation

The easiest way to install WikibaseIntegrator is with the pip package manager. WikibaseIntegrator supports Python 3.8 and above. If pip runs under Python 2, installation will fail with an error about missing dependencies.

python -m pip install wikibaseintegrator

You can also clone the repository and install it from source, either with administrator rights or inside a virtualenv:

git clone https://github.com/LeMyst/WikibaseIntegrator.git

cd WikibaseIntegrator

python -m pip install --upgrade pip setuptools

python -m pip install .

You can also use Poetry:

python -m pip install --upgrade poetry

python -m poetry install

To check that the installation is correct, launch a Python console and run the following code (which will retrieve the Wikidata element for Human):

from wikibaseintegrator import WikibaseIntegrator

wbi = WikibaseIntegrator()
my_first_wikidata_item = wbi.item.get(entity_id='Q5')

# to check successful installation and retrieval of the data, you can print the json representation of the item
print(my_first_wikidata_item.get_json())

Using a Wikibase instance

WikibaseIntegrator uses Wikidata as its default endpoint. To use another Wikibase instance instead, you can override the wbi_config module.

For example, for a Wikibase instance installed with wikibase-docker, add this to the top of your script:

from wikibaseintegrator.wbi_config import config as wbi_config

wbi_config['MEDIAWIKI_API_URL'] = 'http://localhost/api.php'
wbi_config['SPARQL_ENDPOINT_URL'] = 'http://localhost:8834/proxy/wdqs/bigdata/namespace/wdq/sparql'
wbi_config['WIKIBASE_URL'] = 'http://wikibase.svc'

You can find more default settings in the file wbi_config.py.

Wikimedia Foundation User-Agent policy

If you interact with a Wikibase instance hosted by the Wikimedia Foundation (like Wikidata, Wikimedia Commons, etc.), it is highly advised to follow the User-Agent policy, which you can find on the User-Agent policy page of the Wikimedia Meta-Wiki.

You can set a complementary User-Agent by modifying the variable wbi_config['USER_AGENT'] in wbi_config.

For example, with your library name and contact information:

from wikibaseintegrator.wbi_config import config as wbi_config

wbi_config['USER_AGENT'] = 'MyWikibaseBot/1.0 (https://www.wikidata.org/wiki/User:MyUsername)'

The Core Parts

WikibaseIntegrator can be used in two modes: a normal mode, which updates each item individually, and a fast run mode, which preloads some data locally and only updates items whose new data differs from what is already in Wikidata. The latter mode allows for great speedups when tens of thousands of Wikidata elements need to be checked for updates but only a small number will actually be changed, a situation typically encountered when synchronising Wikidata with an external resource.

Entity manipulation

WikibaseIntegrator supports the manipulation of Item, Property, Lexeme and MediaInfo entities through these classes:

  • wikibaseintegrator.entities.item.Item
  • wikibaseintegrator.entities.property.Property
  • wikibaseintegrator.entities.lexeme.Lexeme
  • wikibaseintegrator.entities.mediainfo.MediaInfo

Features:

  • Loading a Wikibase entity based on its Wikibase entity ID.
  • All Wikibase data types are implemented (and some data types implemented by extensions).
  • Full access to the entire Wikibase entity in the form of a JSON dict representation.
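
As a minimal sketch (Q42, P31 and L99 are Wikidata identifiers used only for illustration), entities are read and created through the corresponding attribute of a WikibaseIntegrator instance:

from wikibaseintegrator import WikibaseIntegrator

wbi = WikibaseIntegrator()

item = wbi.item.get('Q42')        # wikibaseintegrator.entities.item.Item
prop = wbi.property.get('P31')    # wikibaseintegrator.entities.property.Property
lexeme = wbi.lexeme.get('L99')    # wikibaseintegrator.entities.lexeme.Lexeme

# Full access to the underlying JSON representation
print(item.get_json())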

wbi_login

wbi_login provides the login functionality and also stores the required cookies and edit tokens (for security reasons, every MediaWiki edit requires an edit token). There are multiple methods to log in:

  • wbi_login.OAuth2(consumer_token, consumer_secret) (recommended)
  • wbi_login.OAuth1(consumer_token, consumer_secret, access_token, access_secret)
  • wbi_login.Clientlogin(user, password)
  • wbi_login.Login(user, password)

More parameters are available. If you want to authenticate against an instance other than Wikidata, you can set mediawiki_api_url, mediawiki_rest_url or mediawiki_index_url. Read the documentation for more information.
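
For example, a sketch for logging in to a self-hosted Wikibase instance (the URL is a placeholder):

from wikibaseintegrator import wbi_login

login_instance = wbi_login.Login(user='<bot user name>', password='<bot password>',
                                 mediawiki_api_url='https://wikibase.example.org/w/api.php')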

Login using OAuth1 or OAuth2

OAuth is the authentication method recommended by the MediaWiki developers. It can be used to authenticate a bot or to use WBI as a backend for an application.

As a bot

If you want to use WBI with a bot account, you should use OAuth as an Owner-only consumer. This allows you to authenticate without the "continue oauth" step.

The first step is to request a new OAuth consumer on your MediaWiki instance on the page "Special:OAuthConsumerRegistration". The "Owner-only" option (or "This consumer is for use only by ...") has to be checked and the correct version of the OAuth protocol must be selected (OAuth 2.0). You will get a consumer token and a consumer secret (and an access token and an access secret if you chose OAuth 1.0a). For a Wikimedia instance (like Wikidata), you need to use the Meta-Wiki website.

Example if you use OAuth 2.0:

from wikibaseintegrator import wbi_login

login_instance = wbi_login.OAuth2(consumer_token='<your_client_app_key>', consumer_secret='<your_client_app_secret>')

Example if you use OAuth 1.0a:

from wikibaseintegrator import wbi_login

login_instance = wbi_login.OAuth1(consumer_token='<your_consumer_key>', consumer_secret='<your_consumer_secret>',
                                  access_token='<your_access_token>', access_secret='<your_access_secret>')

To impersonate a user (OAuth 1.0a)

If WBI is used as a backend for a web application, the application must use OAuth for authentication. WBI supports this: you just need to specify the consumer token and consumer secret when instantiating wbi_login.OAuth1. Unlike login by username and password, OAuth is a two-step process, as the user must manually confirm the OAuth login. This means that the wbi_login.OAuth1.continue_oauth() method must be called after creating the wbi_login.OAuth1 instance.

Example:

from wikibaseintegrator import wbi_login

login_instance = wbi_login.OAuth1(consumer_token='<your_consumer_key>', consumer_secret='<your_consumer_secret>')
login_instance.continue_oauth(oauth_callback_data='<the_callback_url_returned>')

The wbi_login.OAuth1.continue_oauth() method will either prompt the user for a callback URL (normal bot execution) or accept it as a parameter. When WBI is used as a backend for a web application, the callback can therefore provide the authentication information directly to the backend, so no copy and paste of the callback URL is needed.

Login with a bot password

It's good practice to use a bot password instead of your plain username and password; this allows you to limit the permissions given to the bot.

from wikibaseintegrator import wbi_login

login_instance = wbi_login.Login(user='<bot user name>', password='<bot password>')

Login with a username and a password

If you want to log in with your user account, you can use the "clientlogin" authentication method. This method is not recommended.

from wikibaseintegrator import wbi_login

login_instance = wbi_login.Clientlogin(user='<user name>', password='<password>')

Wikibase Data Types

Currently, Wikibase supports 17 different data types. They are represented as their own classes in the wikibaseintegrator.datatypes namespace. Each data type has its own peculiarities, which means that some of them require special parameters (e.g. GlobeCoordinate).

The data types currently implemented:

  • CommonsMedia
  • ExternalID
  • Form
  • GeoShape
  • GlobeCoordinate
  • Item
  • Lexeme
  • Math
  • MonolingualText
  • MusicalNotation
  • Property
  • Quantity
  • Sense
  • String
  • TabularData
  • Time
  • URL

Two additional data types are also implemented but require the installation of the corresponding MediaWiki extension to work properly:

For details on how to create values (i.e. instances) of these data types, please (for now) consult the docstrings in the source code or the documentation website. Note that these data type instances hold the values and, if specified, data type instances for references and qualifiers.
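
As a rough sketch (the property numbers below are Wikidata's and only for illustration), simple data types take a value and a property number, while others need extra parameters:

from wikibaseintegrator import datatypes
from wikibaseintegrator.wbi_enums import WikibaseDatePrecision

# A plain string value: only a value and a property number are needed
claim_string = datatypes.String(prop_nr='P828', value='A string value')

# Globe coordinates require latitude, longitude and a precision
claim_coordinate = datatypes.GlobeCoordinate(prop_nr='P625', latitude=48.8588, longitude=2.3200, precision=0.0001)

# A point in time with an explicit precision
claim_time = datatypes.Time(prop_nr='P577', time='+2023-01-01T00:00:00Z', precision=WikibaseDatePrecision.DAY)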

Structured Data on Commons

WikibaseIntegrator supports SDC (Structured Data on Commons) to update a media file hosted on Wikimedia Commons.

Retrieve data

from wikibaseintegrator import WikibaseIntegrator

wbi = WikibaseIntegrator()
media = wbi.mediainfo.get('M16431477')

# Retrieve the first "depicts" (P180) claim
print(media.claims.get('P180')[0].mainsnak.datavalue['value']['id'])

Write data

from wikibaseintegrator import WikibaseIntegrator
from wikibaseintegrator.datatypes import Item

wbi = WikibaseIntegrator()
media = wbi.mediainfo.get('M16431477')

# Add the "depicts" (P180) claim
media.claims.add(Item(prop_nr='P180', value='Q3146211'))

media.write()

More than Wikibase

WikibaseIntegrator natively supports some extensions:

Helper Methods

Use MediaWiki API

The method wbi_helpers.mediawiki_api_call_helper() allows you to make MediaWiki API POST calls. It takes a mandatory data dictionary (data) and several optional parameters, such as a login object of type wbi_login.Login, a mediawiki_api_url string if the MediaWiki instance is not Wikidata, a user_agent string to set a custom HTTP User-Agent header, and an allow_anonymous boolean to allow unauthenticated calls.

Example:

Retrieve the last 10 revisions of the Wikidata element Q2 (Earth):

from wikibaseintegrator import wbi_helpers

data = {
    'action': 'query',
    'prop': 'revisions',
    'titles': 'Q2',
    'rvlimit': 10
}

print(wbi_helpers.mediawiki_api_call_helper(data=data, allow_anonymous=True))

Execute SPARQL queries

The method wbi_helpers.execute_sparql_query() allows you to execute SPARQL queries without hassle. It takes the query string (query), optional prefixes (prefix) if you do not want to use the standard Wikidata prefixes, the endpoint URL (endpoint), and a user agent for the HTTP header sent to the SPARQL server (user_agent). The latter is very useful to let the operators of the endpoint know who you are, especially if you execute many queries on the endpoint. This allows the operators to contact you (e.g. specify an email address, or the URL of your bot's code repository).
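
For example, a minimal sketch querying the default (Wikidata) endpoint:

from wikibaseintegrator import wbi_helpers

query = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q146 .  # instance of: house cat
}
LIMIT 5
"""

results = wbi_helpers.execute_sparql_query(query=query)
for binding in results['results']['bindings']:
    print(binding['item']['value'])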

Wikibase search entities

The method wbi_helpers.search_entities() allows for string searches in a Wikibase instance. This means that labels, descriptions and aliases can be searched for a string of interest. The method takes the search string (search_string), an optional server (mediawiki_api_url, in case the Wikibase instance used is not Wikidata), an optional user_agent, an optional max_results (default 500), an optional language (default 'en'), and an optional dict_id_label flag to return a dict of item id and label as the result.
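
For example, a minimal sketch against Wikidata:

from wikibaseintegrator import wbi_helpers

# Returns a list of matching entity IDs, e.g. ['Q2', ...]
results = wbi_helpers.search_entities(search_string='Earth', language='en', max_results=5)
print(results)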

Merge Wikibase items

Sometimes, Wikibase items need to be merged. An API call exists for that, and wbi_helpers implements a corresponding method. wbi_helpers.merge_items() takes five arguments:

  • the QID of the item which should be merged into another item (from_id)
  • the QID of the item the first item should be merged into (to_id)
  • a login object of type wbi_login.Login to provide the API call with the required authentication information
  • a boolean if the changes need to be marked as made by a bot (is_bot)
  • a flag for ignoring merge conflicts (ignore_conflicts), which performs a partial merge for all statements that do not conflict. This should generally be avoided because it leaves a crippled item in Wikibase. Any potential conflicts should be resolved before merging.
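
A minimal sketch (the two item IDs are placeholders):

from wikibaseintegrator import wbi_login, wbi_helpers

login_instance = wbi_login.OAuth2(consumer_token='<consumer_token>', consumer_secret='<consumer_secret>')

# Merge the first item into the second one
wbi_helpers.merge_items(from_id='Q11111', to_id='Q22222', login=login_instance, is_bot=True)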

Examples (in "normal" mode)

In order to create a minimal bot based on WikibaseIntegrator, two things are required:

  • A datatype object containing a value.
  • An entity object (Item/Property/Lexeme/...) which takes the data, performs the checks and writes the changes.

An optional Login object can be used to authenticate against the Wikibase instance.

Create a new Item

from wikibaseintegrator import wbi_login, WikibaseIntegrator
from wikibaseintegrator.datatypes import ExternalID
from wikibaseintegrator.wbi_config import config as wbi_config

wbi_config['USER_AGENT'] = 'MyWikibaseBot/1.0 (https://www.wikidata.org/wiki/User:MyUsername)'

# login object
login_instance = wbi_login.OAuth2(consumer_token='<consumer_token>', consumer_secret='<consumer_secret>')

wbi = WikibaseIntegrator(login=login_instance)

# data type object, e.g. for a NCBI gene entrez ID
entrez_gene_id = ExternalID(value='<some_entrez_id>', prop_nr='P351')

# data goes into a list, because multiple data objects can be provided at once
data = [entrez_gene_id]

# Create a new item
item = wbi.item.new()

# Set an English label
item.labels.set(language='en', value='Newly created item')

# Set a French description
item.descriptions.set(language='fr', value='Une description un peu longue')

item.claims.add(data)
item.write()

Modify an existing item

from wikibaseintegrator import wbi_login, WikibaseIntegrator
from wikibaseintegrator.datatypes import ExternalID
from wikibaseintegrator.wbi_enums import ActionIfExists
from wikibaseintegrator.wbi_config import config as wbi_config

wbi_config['USER_AGENT'] = 'MyWikibaseBot/1.0 (https://www.wikidata.org/wiki/User:MyUsername)'

# login object
login_instance = wbi_login.OAuth2(consumer_token='<consumer_token>', consumer_secret='<consumer_secret>')

wbi = WikibaseIntegrator(login=login_instance)

# data type object, e.g. for a NCBI gene entrez ID
entrez_gene_id = ExternalID(value='<some_entrez_id>', prop_nr='P351')

# data goes into a list, because multiple data objects can be provided at once
data = [entrez_gene_id]

# Search and then edit an Item
item = wbi.item.get(entity_id='Q141806')

# Set an English label, but don't modify it if there is already an entry
item.labels.set(language='en', value='An updated item', action_if_exists=ActionIfExists.KEEP)

# Set a French description and replace the existing one
item.descriptions.set(language='fr', value='Une description un peu longue', action_if_exists=ActionIfExists.REPLACE_ALL)

item.claims.add(data)
item.write()

A bot for Mass Import

This enhanced version of the previous bot puts the data creation and the write into a for loop, allowing mass creation or modification of items.

from wikibaseintegrator import WikibaseIntegrator, wbi_login
from wikibaseintegrator.datatypes import ExternalID, Item, String, Time
from wikibaseintegrator.wbi_config import config as wbi_config
from wikibaseintegrator.wbi_enums import WikibaseDatePrecision

wbi_config['USER_AGENT'] = 'MyWikibaseBot/1.0 (https://www.wikidata.org/wiki/User:MyUsername)'

# login object
login_instance = wbi_login.OAuth2(consumer_token='<consumer_token>', consumer_secret='<consumer_secret>')

# We have raw data that should be written to Wikidata: two human NCBI Entrez gene IDs mapped to two Ensembl transcript IDs
raw_data = {
    '50943': 'ENST00000376197',
    '1029': 'ENST00000498124'
}

wbi = WikibaseIntegrator(login=login_instance)

for entrez_id, ensembl in raw_data.items():
    # add some references
    references = [
        [
            Item(value='Q20641742', prop_nr='P248'),
            Time(time='+2020-02-08T00:00:00Z', prop_nr='P813', precision=WikibaseDatePrecision.DAY),
            ExternalID(value='1017', prop_nr='P351')
        ]
    ]

    # data type object
    entrez_gene_id = String(value=entrez_id, prop_nr='P351', references=references)
    ensembl_transcript_id = String(value=ensembl, prop_nr='P704', references=references)

    # data goes into a list, because multiple data objects can be provided at once
    data = [entrez_gene_id, ensembl_transcript_id]

    # Create a new item and add the claims to it
    item = wbi.item.new()
    item.claims.add(data)
    item.write()

Examples (in "fast run" mode)

In order to use the fast run mode, you need to know the property/value combination which determines the data corpus you would like to operate on. E.g. for operating on human genes, you need to know that P351 is the NCBI Entrez Gene ID and you also need to know that you are dealing with humans, best represented by the found in taxon property (P703) with the value Q15978631 for Homo sapiens.

IMPORTANT: In order for the fast run mode to work, the data you provide in the constructor must contain at least one unique value/id only present on one Wikidata element, e.g. an NCBI entrez gene ID, Uniprot ID, etc. Usually, these would be the same unique core properties used for defining domains in wbi_core, e.g. for genes, proteins, drugs or your custom domains.

Below is the normal mode example from above, slightly modified to meet the requirements of the fast run mode. To enable it, the entity's init_fastrun() method takes a base_filter parameter: a list of datatype objects describing the properties to filter on and, where an item value is part of the filter, the corresponding QID. If a property is matched on the property number alone (e.g. an external identifier), no value is given. For the above example, the filter looks like this:

from wikibaseintegrator.datatypes import ExternalID, Item

fast_run_base_filter = [ExternalID(prop_nr='P351'), Item(prop_nr='P703', value='Q15978631')]

The full example:

from wikibaseintegrator import WikibaseIntegrator, wbi_login
from wikibaseintegrator.datatypes import ExternalID, Item, String, Time
from wikibaseintegrator.wbi_enums import WikibaseDatePrecision

# login object
login = wbi_login.OAuth2(consumer_token='<consumer_token>', consumer_secret='<consumer_secret>')

fast_run_base_filter = [ExternalID(prop_nr='P351'), Item(prop_nr='P703', value='Q15978631')]
fast_run = True

# We have raw data that should be written to Wikidata: two human NCBI Entrez gene IDs mapped to two Ensembl transcript IDs
# You can iterate over any data source as long as you can map the values to Wikidata properties.
raw_data = {
    '50943': 'ENST00000376197',
    '1029': 'ENST00000498124'
}

for entrez_id, ensembl in raw_data.items():
    # add some references
    references = [
        [
            Item(value='Q20641742', prop_nr='P248')
        ],
        [
            Time(time='+2020-02-08T00:00:00Z', prop_nr='P813', precision=WikibaseDatePrecision.DAY),
            ExternalID(value='1017', prop_nr='P351')
        ]
    ]

    # data type object
    entrez_gene_id = String(value=entrez_id, prop_nr='P351', references=references)
    ensembl_transcript_id = String(value=ensembl, prop_nr='P704', references=references)

    # data goes into a list, because multiple data objects can be provided at once
    data = [entrez_gene_id, ensembl_transcript_id]

    # Create a new item and add the claims to it
    wb_item = WikibaseIntegrator(login=login).item.new()
    wb_item.add_claims(claims=data)
    wb_item.init_fastrun(base_filter=fast_run_base_filter)
    wb_item.write()

Note: Fastrun mode checks for equality of property/value pairs, qualifiers (not including qualifier attributes), labels, aliases and description, but it ignores references by default! References can be checked in fast run mode by setting use_refs to True.
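
A sketch, assuming use_refs is accepted by init_fastrun() alongside the base filter:

wb_item.init_fastrun(base_filter=fast_run_base_filter, use_refs=True)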

Debugging

You can enable debugging by adding this piece of code to the top of your project:

import logging

logging.basicConfig(level=logging.DEBUG)

wikibaseintegrator's People

Contributors

andrawaag, andrewsu, cclauss, cthoyt, daniel-mietchen, danifdezalvarez, dependabot-preview[bot], dependabot[bot], dpriskorn, e-gor, eloiferrer, floatingpurr, guyfawcus, jleong-ndn, johnsamuelwrites, konstin, lemyst, luguenth, pdehaye, pgrond, physikerwelt, putmantime, sebotic, stuppie, tarrow, vrandezo

wikibaseintegrator's Issues

mediawiki_api_call_helper example results in an error

from wikibaseintegrator import wbi_core

query = {
    'action': 'query',
    'prop': 'revisions',
    'titles': 'Q2',
    'rvlimit': 10
}

print(wbi_core.FunctionsEngine.mediawiki_api_call_helper(query, allow_anonymous=True))
->
Traceback (most recent call last):
  File "/home/egil/src/python/descriptionbot/test.py", line 10, in <module>
    print(wbi_core.FunctionsEngine.mediawiki_api_call_helper(query, allow_anonymous=True))
  File "/usr/lib/python3.9/site-packages/wikibaseintegrator/wbi_core.py", line 1130, in mediawiki_api_call_helper
    return FunctionsEngine.mediawiki_api_call('POST', mediawiki_api_url, login_session, data=data, headers=headers, max_retries=max_retries,
  File "/usr/lib/python3.9/site-packages/wikibaseintegrator/wbi_core.py", line 965, in mediawiki_api_call
    json_data = response.json()
  File "/usr/lib/python3.9/site-packages/requests/models.py", line 900, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.9/site-packages/simplejson/__init__.py", line 525, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/site-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib/python3.9/site-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

GUI

@LeMyst, there is a growing demand for GUI in WikibaseIntegrator in both Wikidata & Wikibase communities. What do you think about making a simple prototype for that?

WBI is unable to set a description

installed from pip v0.10

The correct API is wbsetdescription, but I get an error suggesting that WBI uses another API when setting a description:

Traceback (most recent call last):
  File "/home/egil/src/python/descriptionbot/descriptionbot/main.py", line 88, in <module>
    main()
  File "/home/egil/src/python/descriptionbot/descriptionbot/main.py", line 76, in main
    result = item.write(
  File "/usr/lib/python3.9/site-packages/wikibaseintegrator/wbi_core.py", line 632, in write
    json_data = FunctionsEngine.mediawiki_api_call_helper(data=payload, login=login, max_retries=max_retries, retry_after=retry_after,
  File "/usr/lib/python3.9/site-packages/wikibaseintegrator/wbi_core.py", line 1130, in mediawiki_api_call_helper
    return FunctionsEngine.mediawiki_api_call('POST', mediawiki_api_url, login_session, data=data, headers=headers, max_retries=max_retries,
  File "/usr/lib/python3.9/site-packages/wikibaseintegrator/wbi_core.py", line 996, in mediawiki_api_call
    raise MWApiError(response.json() if response else dict())
wikibaseintegrator.wbi_core.MWApiError: {'error': {'code': 'permissiondenied', 'info': 'You do not have the permissions needed to carry out this action.', 'messages': [{'name': 'wikibase-api-permissiondenied', 'parameters': [], 'html': {'*': 'You do not have the permissions needed to carry out this action.'}}, {'name': 'badaccess-groups', 'parameters': ['*', 1], 'html': {'*': 'Den åtgärd du har begärt kan enbart utföras av användare i gruppen: *.'}}], '*': 'See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1342'}

The code I used was:

#!/usr/bin/env python3
# import warnings
# warnings.simplefilter(action='ignore', category=FutureWarning)
import gettext
import logging
from urllib.parse import quote

import requests
from wikibaseintegrator import wbi_core, wbi_login
from pprint import pprint

import config
# Import util first
# from modules import util
from modules import loglevel

# Settings
_ = gettext.gettext

#
# Functions
#
            
def main():
    logger = logging.getLogger(__name__)
    if config.loglevel is None:
        # Set loglevel
        loglevel.set_loglevel()
    logger.setLevel(config.loglevel)
    logger.level = logger.getEffectiveLevel()
    # file_handler = logging.FileHandler("europarl.log")
    # logger.addHandler(file_handler)

    engine = wbi_core.FunctionsEngine()
    params = {
        'action': 'query',
        'list': 'search',
        'format': 'json',
        'utf8': '1',
        # All scientific articles without any description
        'srsearch': "haswbstatement:P31=Q13442814 -hasdescription:*",
    }
    pprint(params)
    # data = engine.mediawiki_api_call(
    #     "GET",
    #     mediawiki_api_url="https://wikidata.org/w/api.php",
    #     data=params
    # )
    # pprint(data)
    data = requests.get(
        url="https://www.wikidata.org/w/api.php",
        params=params
    )
    # pprint(data.json())
    qids = [] # type: str
    for result in data.json()["query"]["search"]:
        qids.append(result["title"])
    pprint(qids)
    if len(qids) > 0:
        print("Logging in with Wikibase Integrator")
        config.login_instance = wbi_login.Login(
            user=config.username, pwd=config.password
        )
    count = 0
    for qid in qids:
        if count == 1:
            quit(0)
        item = wbi_core.ItemEngine(item_id=qid)
        print(type(item))
        item.set_description(
                "scientific article",
                lang="en",
                if_exists="KEEP"
        )
        print(type(item))
        result = item.write(
            config.login_instance,
            edit_summary=(
                "Added usage example " +
                "with [[Wikidata:Tools/DescriptionBot]]"
            )
        )
        print(result)
        print(f"{config.wd_prefix}{qid}")
        count += 1

if __name__ == "__main__":
    main()

Improve wbi_core.Url url schemes validator

WBI does not support the python version in PAWS

@PAWS:~$ pip install git+https://github.com/LeMyst/WikibaseIntegrator.git
Collecting git+https://github.com/LeMyst/WikibaseIntegrator.git
  Cloning https://github.com/LeMyst/WikibaseIntegrator.git to /tmp/pip-req-build-_eo5g689
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
ERROR: Package 'wikibaseintegrator' requires a different Python: 3.6.9 not in '<3.10,>=3.7'
WARNING: You are using pip version 20.2; however, version 21.0.1 is available.
You should consider upgrading via the '/srv/paws/bin/python3.6 -m pip install --upgrade pip' command.

This is not really a bug of WBI, but it is nice to know so we should document it in the README IMO.
See https://phabricator.wikimedia.org/T265957

Bug on write, WBI invented a sense when trying to add a statement

installed from pip yesterday
It has worked fine writing data to a few thousand lexemes yesterday, but I got this error now:

Error while writing to the Wikibase instance
Traceback (most recent call last):
  File "/home/egil/src/python/lexsaob/lexsaob.py", line 166, in <module>
    result = item.write(
  File "/usr/lib/python3.9/site-packages/wikibaseintegrator/wbi_core.py", line 1008, in write
    raise MWApiError(json_data)
wikibaseintegrator.wbi_core.MWApiError: {'error': {'code': 'modification-failed', 'info': '[[Lexeme:L47647#S2|L47647-S2]] not found', 'messages': [{'name': 'wikibase-validator-no-such-entity', 'parameters': ['[[Lexeme:L47647#S2|L47647-S2]]'], 'html': {'*': '<a href="/wiki/Lexeme:L47647#S2" title="Lexeme:L47647">L47647-S2</a> hittades inte'}}], '*': 'See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1283'}

The error from Wikibase makes sense because WBI invented a second sense that does not exist in WD, in the claim representation:

{'claims': {'P5185': [{'mainsnak': {'snaktype': 'value', 'property': 'P5185', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 1305037, 'id': 'Q1305037'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}, 'type': 'statement', 'rank': 'normal', 'qualifiers': {}, 'qualifiers-order': [], 'references': [], 'id': 'L47647$C6135942-6978-498C-A737-E1C1847F516D'}], 'P5831': [{'mainsnak': {'snaktype': 'value', 'property': 'P5831', 'datavalue': {'value': {'text': 'Införsel av läroböcker på albanska och ungerska för användning i Vojvodina har godkänts.', 'language': 'sv'}, 'type': 'monolingualtext'}, 'datatype': 'monolingualtext'}, 'type': 'statement', 'rank': 'normal', 'qualifiers': {'P5830': [{'snaktype': 'value', 'property': 'P5830', 'datavalue': {'value': {'entity-type': 'form', 'id': 'L47647-F1'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-form', 'hash': 'fd2d2c70393ea7e3b081ad40ffffddd0d1ef0a43'}], 'P6072': [{'snaktype': 'value', 'property': 'P6072', 'datavalue': {'value': {'entity-type': 'sense', 'id': 'L47647-S2'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-sense', 'hash': '7016488e0dfd88e2470e8497bccf92bbaaf52dbf'}], 'P6191': [{'snaktype': 'value', 'property': 'P6191', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 104597585, 'id': 'Q104597585'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item', 'hash': '612d94ccfbed96431fa36ed922f4a8614da5e0d1'}]}, 'qualifiers-order': ['P5830', 'P6072', 'P6191'], 'references': [{'snaks': {'P248': [{'snaktype': 'value', 'property': 'P248', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 5412081, 'id': 'Q5412081'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}], 'P813': [{'snaktype': 'value', 'property': 'P813', 'datavalue': {'value': {'time': '+2021-01-30T00:00:00Z', 'timezone': 0, 'before': 0, 'after': 0, 'precision': 11, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}, 'type': 'time'}, 'datatype': 'time'}], 'P577': [{'snaktype': 'value', 'property': 'P577', 'datavalue': {'value': {'time': '+2012-05-12T00:00:00Z', 'timezone': 0, 'before': 0, 'after': 0, 'precision': 11, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}, 'type': 'time'}, 'datatype': 'time'}], 'P854': [{'snaktype': 'value', 'property': 'P854', 'datavalue': {'value': 'http://www.statmt.org/europarl/v7/sv-en.tgz', 'type': 'string'}, 'datatype': 'url'}], 'P7793': [{'snaktype': 'value', 'property': 'P7793', 'datavalue': {'value': 'europarl-v7.sv-en.sv', 'type': 'string'}, 'datatype': 'string'}], 'P7421': [{'snaktype': 'value', 'property': 'P7421', 'datavalue': {'value': '833630', 'type': 'string'}, 'datatype': 'string'}], 'P3865': [{'snaktype': 'value', 'property': 'P3865', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 47461344, 'id': 'Q47461344'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}]}, 'snaks-order': ['P248', 'P813', 'P577', 'P854', 'P7793', 'P7421', 'P3865'], 'hash': '89565efd99bc0c3e3a449119e2f30cb69f233663'}], 'id': 'L47647$56CA4914-1618-4BFC-A81D-F822E953B688'}], 'P1343': [{'mainsnak': {'snaktype': 'value', 'property': 'P1343', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 1935308, 'id': 'Q1935308'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}, 'type': 'statement', 'rank': 'normal', 'qualifiers': {}, 'qualifiers-order': [], 'references': []}], 'P8478': [{'mainsnak': {'snaktype': 'value', 'property': 'P8478', 'datavalue': {'value': 'U_0275-0054.vx9Q', 'type': 'string'}, 'datatype': 'external-id'}, 'type': 'statement', 
'rank': 'normal', 'qualifiers': {}, 'qualifiers-order': [], 'references': []}]}, 'sitelinks': {}}

Url of lexeme: https://www.wikidata.org/wiki/Lexeme:L47647
Code running: https://github.com/dpriskorn/LexSAOB/blob/e6fd4246cc92b15550501004c32a3ef974facaa6/lexsaob.py#L166
Json of the lexeme:

{
  "entities": {
    "L47647": {
      "pageid": 63851792,
      "ns": 146,
      "title": "Lexeme:L47647",
      "lastrevid": 1351433049,
      "modified": "2021-01-30T21:28:15Z",
      "type": "lexeme",
      "id": "L47647",
      "lemmas": {
        "sv": {
          "language": "sv",
          "value": "ungerska"
        }
      },
      "lexicalCategory": "Q1084",
      "language": "Q9027",
      "claims": {
        "P5185": [
          {
            "mainsnak": {
              "snaktype": "value",
              "property": "P5185",
              "datavalue": {
                "value": {
                  "entity-type": "item",
                  "numeric-id": 1305037,
                  "id": "Q1305037"
                },
                "type": "wikibase-entityid"
              },
              "datatype": "wikibase-item"
            },
            "type": "statement",
            "id": "L47647$C6135942-6978-498C-A737-E1C1847F516D",
            "rank": "normal"
          }
        ],
        "P5831": [
          {
            "mainsnak": {
              "snaktype": "value",
              "property": "P5831",
              "datavalue": {
                "value": {
                  "text": "Införsel av läroböcker på albanska och ungerska för användning i Vojvodina har godkänts.",
                  "language": "sv"
                },
                "type": "monolingualtext"
              },
              "datatype": "monolingualtext"
            },
            "type": "statement",
            "qualifiers": {
              "P5830": [
                {
                  "snaktype": "value",
                  "property": "P5830",
                  "hash": "fd2d2c70393ea7e3b081ad40ffffddd0d1ef0a43",
                  "datavalue": {
                    "value": {
                      "entity-type": "form",
                      "id": "L47647-F1"
                    },
                    "type": "wikibase-entityid"
                  },
                  "datatype": "wikibase-form"
                }
              ],
              "P6072": [
                {
                  "snaktype": "value",
                  "property": "P6072",
                  "hash": "7016488e0dfd88e2470e8497bccf92bbaaf52dbf",
                  "datavalue": {
                    "value": {
                      "entity-type": "sense",
                      "id": "L47647-S2"
                    },
                    "type": "wikibase-entityid"
                  },
                  "datatype": "wikibase-sense"
                }
              ],
              "P6191": [
                {
                  "snaktype": "value",
                  "property": "P6191",
                  "hash": "612d94ccfbed96431fa36ed922f4a8614da5e0d1",
                  "datavalue": {
                    "value": {
                      "entity-type": "item",
                      "numeric-id": 104597585,
                      "id": "Q104597585"
                    },
                    "type": "wikibase-entityid"
                  },
                  "datatype": "wikibase-item"
                }
              ]
            },
            "qualifiers-order": [
              "P5830",
              "P6072",
              "P6191"
            ],
            "id": "L47647$56CA4914-1618-4BFC-A81D-F822E953B688",
            "rank": "normal",
            "references": [
              {
                "hash": "89565efd99bc0c3e3a449119e2f30cb69f233663",
                "snaks": {
                  "P248": [
                    {
                      "snaktype": "value",
                      "property": "P248",
                      "datavalue": {
                        "value": {
                          "entity-type": "item",
                          "numeric-id": 5412081,
                          "id": "Q5412081"
                        },
                        "type": "wikibase-entityid"
                      },
                      "datatype": "wikibase-item"
                    }
                  ],
                  "P813": [
                    {
                      "snaktype": "value",
                      "property": "P813",
                      "datavalue": {
                        "value": {
                          "time": "+2021-01-30T00:00:00Z",
                          "timezone": 0,
                          "before": 0,
                          "after": 0,
                          "precision": 11,
                          "calendarmodel": "http://www.wikidata.org/entity/Q1985727"
                        },
                        "type": "time"
                      },
                      "datatype": "time"
                    }
                  ],
                  "P577": [
                    {
                      "snaktype": "value",
                      "property": "P577",
                      "datavalue": {
                        "value": {
                          "time": "+2012-05-12T00:00:00Z",
                          "timezone": 0,
                          "before": 0,
                          "after": 0,
                          "precision": 11,
                          "calendarmodel": "http://www.wikidata.org/entity/Q1985727"
                        },
                        "type": "time"
                      },
                      "datatype": "time"
                    }
                  ],
                  "P854": [
                    {
                      "snaktype": "value",
                      "property": "P854",
                      "datavalue": {
                        "value": "http://www.statmt.org/europarl/v7/sv-en.tgz",
                        "type": "string"
                      },
                      "datatype": "url"
                    }
                  ],
                  "P7793": [
                    {
                      "snaktype": "value",
                      "property": "P7793",
                      "datavalue": {
                        "value": "europarl-v7.sv-en.sv",
                        "type": "string"
                      },
                      "datatype": "string"
                    }
                  ],
                  "P7421": [
                    {
                      "snaktype": "value",
                      "property": "P7421",
                      "datavalue": {
                        "value": "833630",
                        "type": "string"
                      },
                      "datatype": "string"
                    }
                  ],
                  "P3865": [
                    {
                      "snaktype": "value",
                      "property": "P3865",
                      "datavalue": {
                        "value": {
                          "entity-type": "item",
                          "numeric-id": 47461344,
                          "id": "Q47461344"
                        },
                        "type": "wikibase-entityid"
                      },
                      "datatype": "wikibase-item"
                    }
                  ]
                },
                "snaks-order": [
                  "P248",
                  "P813",
                  "P577",
                  "P854",
                  "P7793",
                  "P7421",
                  "P3865"
                ]
              }
            ]
          }
        ]
      },
      "forms": [
        {
          "id": "L47647-F1",
          "representations": {
            "sv": {
              "language": "sv",
              "value": "ungerska"
            }
          },
          "grammaticalFeatures": [
            "Q110786",
            "Q131105",
            "Q53997857"
          ],
          "claims": []
        },
        {
          "id": "L47647-F2",
          "representations": {
            "sv": {
              "language": "sv",
              "value": "ungerskan"
            }
          },
          "grammaticalFeatures": [
            "Q110786",
            "Q131105",
            "Q53997851"
          ],
          "claims": []
        },
        {
          "id": "L47647-F3",
          "representations": {
            "sv": {
              "language": "sv",
              "value": "ungerskas"
            }
          },
          "grammaticalFeatures": [
            "Q110786",
            "Q146233",
            "Q53997857"
          ],
          "claims": []
        },
        {
          "id": "L47647-F4",
          "representations": {
            "sv": {
              "language": "sv",
              "value": "ungerskans"
            }
          },
          "grammaticalFeatures": [
            "Q110786",
            "Q146233",
            "Q53997851"
          ],
          "claims": []
        }
      ],
      "senses": [
        {
          "id": "L47647-S1",
          "glosses": {
            "sv": {
              "language": "sv",
              "value": "ett finskt-ugriskt språk som talas i Ungern"
            }
          },
          "claims": {
            "P5137": [
              {
                "mainsnak": {
                  "snaktype": "value",
                  "property": "P5137",
                  "datavalue": {
                    "value": {
                      "entity-type": "item",
                      "numeric-id": 9067,
                      "id": "Q9067"
                    },
                    "type": "wikibase-entityid"
                  },
                  "datatype": "wikibase-item"
                },
                "type": "statement",
                "id": "L47647-S1$772e8001-46ec-95c1-2215-ab3531b77186",
                "rank": "normal"
              }
            ]
          }
        }
      ]
    }
  }
}

Invalid CSRF token when using bot password

Error while writing to the Wikibase instance
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    wd_item.write(login_instance)
  File "/home/user/WikibaseIntegrator/wikibaseintegrator/wbi_core.py", line 673, in write
    raise MWApiError(json_data)
wikibaseintegrator.wbi_core.MWApiError: {'error': {'code': 'badtoken', 'info': 'Invalid CSRF token.', '*': 'See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1377'}

core_props fails when prop is an ItemID (bug)

When using ItemID property as a core_prop in ItemEngine, the function __select_item() fails because of this line, where get_sparql_value() returns an int for the self.value of an ItemID, so replace() fails.

Is there another way to use an ItemID property in a core_props? Or is this a bug? Thanks a lot :)
cc: @annaelleduff

add a reference, following the 'A Minimal Bot for Mass Import'

Hi there,

following the example A Minimal Bot for Mass Import on the project page, I can't add references through wd_item = wbi_core.ItemEngine(data=data, references=references)

The import works but without the references: TypeError: __init__() got an unexpected keyword argument 'references'

Unclear purpose of fast run-mode

In the readme the purpose of this mode is not explained. Why does it exist? Does it run significantly faster than the normal mode?

Add language config option

In multiple functions we have a default language set to 'en'.

It would be better to have a wbi_config option to set the default language, defaulting to 'en'.

Default unit is a wikidata entity

The default unit ('1') is Q199 when using the Wikibase or Wikidata SPARQL service.

fast_run doesn't handle this case and shows a difference (Q199 != 1)

Example of adding a claim to an item with a reference

Hi, I would like to add a claim to a lexeme with a reference. I saw no example of how to do that with this library in the readme. Could you add a working example of that?
I will try playing with it and see if I can get it to work, but I don't really know if the reference should be put in the value for example or where to put it.

Bug: class quantity does not throw error when quantity param is not set

then wbi passes on bad data and I get a mwapi error instead
Traceback (most recent call last):
  File "/home/egil/src/python/lexuse/swedish.py", line 57, in <module>
    main()
  File "/home/egil/src/python/lexuse/swedish.py", line 53, in main
    util.process_lexeme_data(results)
  File "/home/egil/src/python/lexuse/util.py", line 805, in process_lexeme_data
    continue
  File "/home/egil/src/python/lexuse/util.py", line 712, in process_result
    print("Presenting sentence " +
  File "/home/egil/src/python/lexuse/util.py", line 603, in present_sentence
    sense_id = selected_sense["sense_id"]
  File "/home/egil/src/python/lexuse/util.py", line 389, in add_usage_example
    print("Logging in with Wikibase Integrator")
  File "/usr/lib/python3.9/site-packages/wikibaseintegrator/wbi_core.py", line 1008, in write
    raise MWApiError(json_data)
wikibaseintegrator.wbi_core.MWApiError: {'error': {'code': 'modification-failed', 'info': 'Bad value type quantity, expected string', 'messages': [{'name': 'wikibase-validator-bad-value-type', 'parameters': ['quantity', 'string'], 'html': {'*': 'Bad value type quantity, expected string'}}], '*': 'See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.'}, 'servedby': 'mw1289'}

Add clear()

Add clear() function to use the clear parameter from wbeditentity

Recommend the user to set a custom user-agent

According to this WMF policy https://meta.wikimedia.org/wiki/User-Agent_policy

If you run a bot, please send a User-Agent header identifying the bot with an identifier that isn't going to be confused with many other bots, and supplying some way of contacting you (e.g. a userpage on the local wiki, a userpage on a related wiki using interwiki linking syntax, a URI for a relevant external website, or an email address)

The generic format is <client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.

I think WBI should mandate a custom user-agent when working against WMF endpoints.

Update description?

A Python module integrating the MediaWiki API and the Wikibase SPARQL endpoint
=> mention wikibase api?
