nichtich / wikidata-taxonomy

command-line tool to extract taxonomies from Wikidata

Home Page: https://www.npmjs.org/package/wikidata-taxonomy

License: MIT License

JavaScript 95.13% Makefile 0.48% Python 4.38%

wikidata cli coli-conc jskos library

wikidata-taxonomy's Introduction

Wikidata-Taxonomy

Command-line tool and library to extract taxonomies from Wikidata.

Installation

wikidata-taxonomy requires Node.js version 6 or later.

Install globally to make command wdtaxonomy accessible from your shell $PATH:

$ npm install -g wikidata-taxonomy

Installation and usage as module and in web applications is described below.

Usage

This module provides the command wdtaxonomy. By default, a usage help is printed:

$ wdtaxonomy


  Usage: wdtaxonomy [options] <id>

  extract taxonomies from Wikidata


  Options:

    -V, --version                           output the version number
    -b, --brief                             omit counting instances and sites
    -c, --children                          get direct subclasses only
    -C, --color                             enforce color output
    -d, --descr                             include item descriptions
    -e, --sparql-endpoint <url>             customize the SPARQL endpoint
    -f, --format <txt|csv|tsv|json|ndjson>  output format
    -i, --instances                         include instances
    -I, --no-instancecount                  omit counting instances
    -j, --json                              use JSON output format
    -l, --lang <lang>                       specify the language to use
    -L, --no-labels                         omit all labels
    -m, --mappings <ids>                    mapping properties (e.g. P1709)
    -n, --no-colors                         disable color output
    -o, --output <file>                     write result to a file
    -P, --property <id>                     hierarchy property (e.g. P279)
    -R, --prune <criteria>                  prune hierarchies (e.g. mappings)
    -p, --post                              use HTTP POST to disable caching
    -r, --reverse                           get superclasses instead
    -s, --sparql                            print SPARQL query and exit
    -S, --no-sitecount                      omit counting sites
    -t, --total                             count total number of instances
    -u, --user <name>                       user to the SPARQL endpoint
    -U, --uris                              show full URIs in output formats
    -v, --verbose                           make the output more verbose
    -w, --password <string>                 password to the SPARQL endpoint
    -h, --help                              output usage information

The first argument needs to be a Wikidata identifier to be used as root of the taxonomy. For instance, extract a taxonomy of planets (Q634):

$ wdtaxonomy Q634

To look up an identifier by label, use wikidata-cli (e.g. wd id planet or wd f planet).

The extracted taxonomy by default is based on statements using the property "subclass of" (P279) or "subproperty of" (P1647). Taxonomy extraction and output can be controlled by several options. Option --sparql (or -s) prints the underlying SPARQL queries instead of executing them.

Examples

Direct subclasses of planet (Q634) with description and mappings:

$ wdtaxonomy Q634 -c -d -m =

The hierarchy properties used to build taxonomies, by default P279 ("subclass of") and P31 ("instance of"), can be changed with option --property (-P).

Members of (P463) the European Union (Q458):

$ wdtaxonomy Q458 -P P463

Members of (P463) the European Union (Q458) and number of its citizens in Wikidata (P27):

$ wdtaxonomy Q458 -P 463/27

Wikiversity (Q370) editions mapped to their homepage URL (P856):

$ wdtaxonomy Q370 -i -m P856

Biological taxonomy of mammals (Q7377):

$ wdtaxonomy Q7377 -P P171 --brief

Property constraints (Q21502402) with number of properties that have each constraint:

$ wdtaxonomy Q21502402 -P 279,2302

Because Wikidata is not a strict ontology, subproperties are not factored in. For instance, the following query for parts of the European Union does not include its members, although P463 is a subproperty of P361.

Parts of (P361) the European Union (Q458):

$ wdtaxonomy Q458 -P P361

A taxonomy of subproperties can be queried like taxonomies of items. The hierarchy property is set to P1647 ("subproperty of") by default:

$ wdtaxonomy P361
$ wdtaxonomy P361 -P P1647  # equivalent

Subproperties of "part of" (P361) and which of them have an inverse property (P1696):

$ wdtaxonomy P361 -P P1647/P1696

Inverse properties are not factored in either, so queries like these do not necessarily return the same results:

What hand (Q33767) is part of (P361):

$ wdtaxonomy Q33767 -P 361 -r

What parts the hand (Q33767) has (P527):

$ wdtaxonomy Q33767 -P 527

Options

Query options

brief (`-b`)

Don't count instances and sites. Same as -S/--no-sitecount and -I/--no-instancecount.

children (`-c`)

Get direct subclasses only

descr (`-d`)

Include item descriptions

sparql-endpoint (`-e`)

SPARQL endpoint to query (default: https://query.wikidata.org/sparql)

instances (`-i`)

Include instances

no-instancecount (`-I`)

Don't count number of instances

lang (`-l`)

Language to get labels in (default: en)

no-labels (`-L`)

Omit all labels. This allows for querying larger taxonomies (several thousands of classes), especially if combined with option --brief.

mappings (`-m`)

Look up mappings based on given comma-separated properties such as P1709 (equivalent class). The following keywords can be used as shortcuts:

equal or =: equivalent property (P1628), equivalent class (P1709), and exact match (P2888)
broader: external superproperty (P2235)
narrower: narrower external class (P3950), external subproperty (P2236)
class: properties for mapping classes
property: properties for mapping properties
all: all properties for ontology mapping (instances of Q30249126)

reverse (`-r`)

Get superclasses instead of subclasses up to the root

no-sitecount (`-S`)

Don't count number of sites

total (`-t`)

Count total (transitive) number of instances, including instances of subclasses

post (`-p`)

Use HTTP POST to disable caching

sparql (`-s`)

Don't actually perform a query but print SPARQL query and exit

user (`-u`)

User to the SPARQL endpoint

password (`-w`)

Password to the SPARQL endpoint

Output options

color (`-C`)

enable color output if it's disabled (e.g. when output is piped or written to a file)

format (`-f`)

Output format

json (`-j`)

Use JSON output format. Same as --format json but shorter.

no-colors (`-n`)

disable color output

output (`-o`)

write result to a file given by name

prune (`-R`)

Prune the hierarchy to the entries that meet any of the given criteria, plus their broader concepts and all top concepts:

mappings: has mappings
sites: has sites
instances: has instances
occurences: has sites or instances

Multiple criteria can be combined as alternatives with comma.
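The pruning rule described above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the function prune, the predicate argument, and the ex: URIs are hypothetical, and the concepts are assumed to be shaped like the JSKOS concepts of the JSON output format (fields uri, broader, mappings).

```javascript
// Hypothetical sketch: keep concepts matching a criterion, their broader
// concepts (transitively), and all top concepts.
function prune (concepts, topUris, matches) {
  const byUri = new Map(concepts.map(c => [c.uri, c]))
  const keep = new Set(topUris) // top concepts are always kept

  const keepWithAncestors = uri => {
    if (keep.has(uri) || !byUri.has(uri)) return // already kept or outside the taxonomy
    keep.add(uri)
    for (const parent of byUri.get(uri).broader || []) keepWithAncestors(parent.uri)
  }

  for (const concept of concepts) {
    if (matches(concept)) keepWithAncestors(concept.uri)
  }
  return concepts.filter(c => keep.has(c.uri))
}

// Made-up sample hierarchy: root > a > c and root > b; only c has a mapping.
const sample = [
  { uri: 'ex:root' },
  { uri: 'ex:a', broader: [{ uri: 'ex:root' }] },
  { uri: 'ex:b', broader: [{ uri: 'ex:root' }] },
  { uri: 'ex:c', broader: [{ uri: 'ex:a' }], mappings: [{}] }
]

// Criterion "mappings": the concept has at least one mapping.
const hasMappings = c => (c.mappings || []).length > 0
console.log(prune(sample, ['ex:root'], hasMappings).map(c => c.uri))
```

In this sample the branch ex:b is dropped, while ex:a survives only because it is a broader concept of ex:c.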

uris (`-U`)

Show full URIs in output formats, e.g. http://www.wikidata.org/entity/Q1 instead of Q1

verbose (`-v`)

Show verbose error messages

Output formats

Text format

By default, the taxonomy is printed in "text" format with colored Unicode characters:

$ wdtaxonomy Q17362350

planet of the Solar System (Q17362350) •2 ↑
├──outer planet (Q30014) •25 ×4 ↑↑
└──inner planets (Q3504248) •8 ×4 ↑↑

The output contains item labels, Wikidata identifiers, the number of Wikimedia sites connected to each item (indicated by a bullet character "•"), the number of instances (property P31, indicated by a multiplication sign "×"), and an upwards arrow ("↑") for each additional superclass.

Option "--instances" (or "-i") explicitly includes instances:

$ wdtaxonomy -i Q17362350

planet of the Solar System (Q17362350) •2 ↑
├──outer planet (Q30014) •25 ↑↑
|   -Saturn (Q193)
|   -Jupiter (Q319)
|   -Uranus (Q324)
|   -Neptune (Q332)
└──inner planets (Q3504248) •8 ↑↑
    -Earth (Q2)
    -Mars (Q111)
    -Mercury (Q308)
    -Venus (Q313)

Classes that occur at multiple places in the taxonomy (multihierarchy) are marked like in the following example:

$ wdtaxonomy Q634

planet (Q634) •202 ×7 ↑
├──extrasolar planet (Q44559) •88 ×833 ↑
|  ├──circumbinary planet (Q205901) •15 ×10
|  ├──super-Earth (Q327757) •32 ×46 ↑
...
├──terrestrial planet (Q128207) •70 ×7
|  ╞══super-Earth (Q327757) •32 ×46 ↑ …
...

JSON format

Option --format json serializes the taxonomy as a JSON object. The format follows the specification of JSKOS Concept Schemes:

{
  "type": [ "http://www.w3.org/2004/02/skos/core#ConceptScheme" ],
  "modified": "2017-11-06T10:25:54.966Z",
  "license": [
    {
      "uri": "http://creativecommons.org/publicdomain/zero/1.0/",
      "notation": [ "CC0" ]
    }
  ],
  "languages": [ "en" ],
  "topConcepts": [
    { "uri": "http://www.wikidata.org/entity/Q17362350" }
  ],
  "concepts": [ ]
}

Field concepts contains an array of all extracted Wikidata entities (usually classes and instances) as JSKOS Concepts:

{
  "uri": "http://www.wikidata.org/entity/Q17362350",
  "notation": [ "Q17362350" ],
  "prefLabel": {
    "en": "planet of the Solar System"
  },
  "scopeNote": {
    "en": [ "inner and outer planets of our solar system" ]
  },
  "broader": [
    { "uri": "http://www.wikidata.org/entity/Q634" }
  ],
  "narrower": [
    { "uri": "http://www.wikidata.org/entity/Q30014" },
    { "uri": "http://www.wikidata.org/entity/Q3504248" }
  ]
}
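As a rough illustration of how these records fit together, the following sketch indexes concepts by URI and follows the narrower links to print an indented outline. The helper outline is hypothetical and not part of the library; the sample data is abridged from the examples in this document.

```javascript
// Sample concepts abridged from the examples in this document.
const concepts = [
  {
    uri: 'http://www.wikidata.org/entity/Q17362350',
    notation: ['Q17362350'],
    prefLabel: { en: 'planet of the Solar System' },
    narrower: [
      { uri: 'http://www.wikidata.org/entity/Q30014' },
      { uri: 'http://www.wikidata.org/entity/Q3504248' }
    ]
  },
  {
    uri: 'http://www.wikidata.org/entity/Q30014',
    notation: ['Q30014'],
    prefLabel: { en: 'outer planet' }
  },
  {
    uri: 'http://www.wikidata.org/entity/Q3504248',
    notation: ['Q3504248'],
    prefLabel: { en: 'inner planets' }
  }
]

// Hypothetical helper: walk "narrower" links starting at a root URI.
function outline (concepts, rootUri, lang = 'en') {
  const byUri = new Map(concepts.map(c => [c.uri, c]))
  const lines = []
  const visit = (uri, depth, path) => {
    const concept = byUri.get(uri)
    if (!concept || path.has(uri)) return // skip unknown URIs and cycles
    const label = (concept.prefLabel || {})[lang] || '???'
    lines.push(`${'  '.repeat(depth)}${label} (${concept.notation[0]})`)
    const next = new Set(path).add(uri)
    for (const child of concept.narrower || []) visit(child.uri, depth + 1, next)
  }
  visit(rootUri, 0, new Set())
  return lines
}

console.log(outline(concepts, 'http://www.wikidata.org/entity/Q17362350').join('\n'))
```

A concept that is narrower to several parents is visited once per parent, which mirrors the repeated entries of a multihierarchy in the text format.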

Instances (option --instances) are linked via field subjectOf the same way as field broader and narrower.

The number of instances and sites, if counted, is given as an array of JSKOS Concept Occurrences in field occurrences, each identified by subfield relation:

{
  "uri": "http://www.wikidata.org/entity/Q30014",
  "notation": [ "Q30014" ],
  "prefLabel": {
    "en": "outer planet of the Solar system"
  },
  "occurrences": [
    {
      "relation": "http://www.wikidata.org/entity/P31",
      "count": 4
    },
    {
      "relation": "http://schema.org/about",
      "count": 25
    }
  ]
}
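A small sketch of reading these counts back, assuming the two relation URIs shown in the example (P31 for instances, schema:about for sites). The helper countFor is hypothetical, not a library function.

```javascript
// Relation URIs as they appear in the example record above.
const INSTANCE_OF = 'http://www.wikidata.org/entity/P31'
const SITE_LINK = 'http://schema.org/about'

// Hypothetical helper: return the count for one relation,
// or undefined if it was not counted (e.g. with option --brief).
function countFor (concept, relation) {
  const hit = (concept.occurrences || []).find(o => o.relation === relation)
  return hit ? hit.count : undefined
}

const outerPlanet = {
  uri: 'http://www.wikidata.org/entity/Q30014',
  occurrences: [
    { relation: INSTANCE_OF, count: 4 },
    { relation: SITE_LINK, count: 25 }
  ]
}

console.log('instances:', countFor(outerPlanet, INSTANCE_OF))
console.log('sites:', countFor(outerPlanet, SITE_LINK))
```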

Mappings (option --mappings) are stored in field mappings as array of JSKOS Concept Mappings:

[
  {
    "from": {
      "memberSet": [
        { "uri": "http://www.wikidata.org/entity/Q634" }
      ]
    },
    "to": {
      "memberSet": [
        { "uri": "http://dbpedia.org/ontology/Planet" }
      ]
    },
    "type": [
      "http://www.w3.org/2004/02/skos/core#exactMatch",
      "http://www.w3.org/2002/07/owl#equivalentClass",
      "http://www.wikidata.org/entity/P1709"
    ]
  }
]

The mapping type is given in field type with the Wikidata property URI as last array element and the SKOS mapping relation URI as first.
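The convention of first and last array element can be unpacked as in the following sketch. The helper describeMapping is hypothetical; the sample record is the mapping shown above.

```javascript
// Hypothetical helper: flatten a JSKOS mapping into plain URIs.
function describeMapping (mapping) {
  const types = mapping.type || []
  return {
    from: mapping.from.memberSet.map(c => c.uri),
    to: mapping.to.memberSet.map(c => c.uri),
    skosRelation: types[0], // SKOS mapping relation comes first
    wikidataProperty: types[types.length - 1] // Wikidata property comes last
  }
}

const mapping = {
  from: { memberSet: [{ uri: 'http://www.wikidata.org/entity/Q634' }] },
  to: { memberSet: [{ uri: 'http://dbpedia.org/ontology/Planet' }] },
  type: [
    'http://www.w3.org/2004/02/skos/core#exactMatch',
    'http://www.w3.org/2002/07/owl#equivalentClass',
    'http://www.wikidata.org/entity/P1709'
  ]
}

console.log(describeMapping(mapping))
```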

NDJSON format

Option --format ndjson serializes JSON field concepts with one record per line. The order of records is the same as in txt, json, and csv format but each concept is only included once.
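Reading such output back is a matter of parsing each non-empty line as JSON. The following sketch assumes the whole output fits in memory as one string; parseNdjson is a hypothetical helper and the two sample records are abridged.

```javascript
// Hypothetical helper: parse newline-delimited JSON into an array of records.
function parseNdjson (text) {
  return text
    .split('\n')
    .filter(line => line.trim() !== '') // ignore blank lines, e.g. the trailing one
    .map(line => JSON.parse(line))
}

// Abridged sample, one concept per line.
const ndjson = [
  '{"uri":"http://www.wikidata.org/entity/Q30014","notation":["Q30014"]}',
  '{"uri":"http://www.wikidata.org/entity/Q3504248","notation":["Q3504248"]}'
].join('\n') + '\n'

const records = parseNdjson(ndjson)
console.log(records.map(r => r.notation[0]).join(' '))
```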

CSV and TSV format

CSV and TSV format are optimized for comparing differences over time. Each output row consists of six fields:

level in the hierarchy indicated by zero or more "-" (default) or "=" characters (multihierarchy).
id of the item. Items on the same level are sorted by their id.
label of the item. Language can be selected with option --lang. The label in csv format is quoted.
sites: number of connected sites (Wikipedia and related project editions). Larger numbers may indicate more established concepts.
instances: number of direct instances (property P31).
parents outside of the hierarchy, indicated by zero or more "^" characters.

For instance the CSV output for Q634 would be like this:

$ wdtaxonomy -f csv Q634

level,id,label,sites,instances,parents
,Q634,"planet",196,7,^
-,Q44559,"extrasolar planet",81,833,^
--,Q205901,"circumbinary planet",14,10,
--,Q327757,"super-Earth",32,46,
...
-,Q128207,"terrestrial planet",67,7,
==,Q327757,"super-Earth",32,46,
...

In this example there are 196 Wikipedia editions or other sites with an article about planets, and seven Wikidata items are direct instances of planet. At the end of the line, "^" indicates that "planet" has one superclass. In the next row, "extrasolar planet" (Q44559) is a subclass of planet with another superclass, indicated by "^". Both "circumbinary planet" and "super-Earth" are subclasses of "extrasolar planet". The latter also occurs as a subclass of "terrestrial planet", where it is marked by "==" instead of "--".
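A row of this CSV output can be turned back into structured data with a sketch like the following. The helper parseRow is hypothetical, and it assumes that only the label is quoted and that embedded quotes are doubled as in RFC 4180, which may not hold for every release of the tool.

```javascript
// Hypothetical parser for one data row: level,id,"label",sites,instances,parents
function parseRow (line) {
  const match = /^([-=]*),([QP]\d+),"((?:[^"]|"")*)",(\d*),(\d*),(\^*)$/.exec(line)
  if (!match) return null // header row or unexpected layout
  const [, level, id, label, sites, instances, parents] = match
  return {
    depth: level.length, // number of "-" or "=" characters
    repeated: level.startsWith('='), // "=" marks a repeated occurrence (multihierarchy)
    id,
    label: label.replace(/""/g, '"'),
    sites: sites === '' ? null : Number(sites),
    instances: instances === '' ? null : Number(instances),
    parents: parents.length // superclasses outside of the hierarchy
  }
}

console.log(parseRow(',Q634,"planet",196,7,^'))
console.log(parseRow('==,Q327757,"super-Earth",32,46,'))
```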

Usage as module

Add wikidata-taxonomy as dependency to your package.json:

$ npm install wikidata-taxonomy --save

The library provides:

queryTaxonomy(id, options) returns a promise with a taxonomy extracted from Wikidata as JSKOS Concept Scheme. See JSON format of the command line client for documentation.
```
const { queryTaxonomy } = require('wikidata-taxonomy')

// query options: French labels, skip counting instances and sites
const options = { lang: 'fr', brief: true }

queryTaxonomy('Q634', options)
  .then(taxonomy => {
    // print identifier and French label of every extracted concept
    taxonomy.concepts.forEach(concept => {
      const qid = concept.notation[0]
      const label = (concept.prefLabel || {}).fr || '???'
      console.log('%s %s', qid, label)
    })
  })
  .catch(error => console.error('E', error))
```
The options are roughly equivalent to the command-line query options:
- boolean flags brief, children, description, labels, total, instances, instancecount, sitecount, reverse, post
- SPARQL endpoint configuration with endpoint, user, password
- language tag language or lang
- array property (set to ['P279', 'P31'] by default)
- array or string mappings

serializeTaxonomy contains serializers to be called with a taxonomy, an output stream, and optional configuration:

const { serializeTaxonomy } = require('wikidata-taxonomy')

// serialize taxonomy to stream
serializeTaxonomy.csv(taxonomy, process.stdout)
serializeTaxonomy.txt(taxonomy, process.stdout, {colors: true}) // FIXME
serializeTaxonomy.json(taxonomy, process.stdout)
serializeTaxonomy.ndjson(taxonomy, process.stdout)

Usage in web applications

Experimental support for using this library in web applications is provided by file wikidata-taxonomy.js in directory dist. The gh-pages branch contains a sample application, also available at http://jakobvoss.de/wikidata-taxonomy/.

Requires wikidata-sdk and an HTTP client library. The latter can be attached to window.requestPromise (before wikidata-taxonomy is loaded). Axios is detected by default.

<html>
  <head>
    <script src="https://unpkg.com/wikidata-sdk/dist/wikidata-sdk.min.js"></script>
    <script src="https://unpkg.com/axios/dist/axios.min.js"></script>
    <script src="https://unpkg.com/wikidata-taxonomy/dist/wikidata-taxonomy.min.js"></script>
  </head>
  <body>
    ...
  </body>
</html>

Release notes

Release notes are listed in file CHANGES.md in the source code repository.

wikidata-taxonomy's Issues

Sane error message if item not found

$ wdtaxonomy Q3399
Cannot read property 'endpoint' of undefined

should give a meaningful error message instead

json output format is no valid JSKOS

because concepts must be an array instead of an object.

Tool not working

Just tried the tool after some period of time.
Unfortunately, wikidata-taxonomy fails with an error:

> wdtaxonomy Q634 -v
Error: SPARQL request failed
    at XXXXX\npm\node_modules\wikidata-taxonomy\lib\query.js:32:13
    at processTicksAndRejections (node:internal/process/task_queues:96:5)

Wrong language codes if querying multiple languages

As commented in #39, multiple languages can be queried e.g.

 wdtaxonomy -l en,de Q2516517

This should better be documented but there is also a bug in the language codes of JSON output e.g.

 wdtaxonomy -l en,de Q2516517 -j

...
      "prefLabel": {
        "en,de": "Verkehrsdidaktik"
      }
...

"SPARQL request failed" when running "wdtaxonomy Q35120"

I want to get the whole taxonomy of wikidata.
Is there any other method I can use to achieve that goal?

--sparql-endpoint / -e for http://localhost:8989/bigdata/sparql (custom Wikibase instance) returns data from Wikidata

Running Wikibase locally, I can generate results via curl:

curl http://localhost:8989/bigdata/sparql?SELECT%20DISTINCT%3Fp%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D

But trying to query the same endpoint with wikidata-taxonomy returns data from Wikidata instead:

node wdtaxonomy.js Q3 --sparql-endpoint http://localhost:8989/bigdata/sparql 
life (Q3) •188 ↑↑↑
├──extraterrestrial life (Q181508) •81 ×1 ↑
│  ├──life on Mars (Q601319) •34 ×1
│  ├──Martian (Q913850) •25 ×4
│  ├──Life on Titan (Q2591050) •15
│  └──extraterrestrial intelligence (Q15107669) •7
├──personal life (Q2867027) •20
└──human life (Q19771042) •3

I get the same result if I install wikidata-taxonomy globally with npm install -g

It's late night so I'll toss a theory: does it implicitly depend on properties such as P279 existing in the target endpoint, and it falls back to Wikidata if the query to the specified endpoint doesn't return the expected data?

Bug in txt format: duplicated classes

e.g. try wdtaxonomy Q2623243 -m class. Maybe like a missing DISTINCT clause?

Add has-parts-of-class and topic's category

http://www.wikidata.org/wiki/Property:P2670 and http://www.wikidata.org/wiki/Property:P910

SPARQL request failed

previously wdtaxonomy worked perfectly
wdtaxonomy -V
0.6.6

recently I upgraded to Node.js v19.6.0
now when I run, for example:
wdtaxonomy -c Q35120

I see:
SPARQL request failed

Have I made an error (forgetting something since the last time I successfully used wdtaxonomy)?
Is some dependency causing this error?
Is there a work-around?
Do you need more information from me to debug this issue?

Thanks for your help here.
/jay

refactor as node module to be used in other projects

The command line script should be a wrapper to a module

results shown as ??? (three question marks)

Sometimes the output is shown as ??? (three question marks), the given wikidata id when looked up in wikidata website does not exist, but instances and other information exist.

Is this a bug?

For example,

wdtaxonomy Q2516517

Returns

transport sciences (Q2516517) •2 ↑
├──intelligent transportation system (Q508378) •23
├──transport economics (Q660564) •9 ↑
├──transport engineering (Q775325) •22 ↑
│  └──traffic engineering (Q1640676) •13
│     └──Technology of rail vehicles (Q2234610) •2 ↑
├──transportation geography (Q795612) •19 ↑
├──transport planning (Q1034047) •16 ×2 ↑
├──??? (Q1230796) •1 ↑
├──??? (Q1308085) •3 ↑↑
├──traffic psychology (Q1362446) •12 ↑
├──transport law (Q1996243) •8 ↑
├──effects of the automobile on societies (Q2215004) •3
├──??? (Q2516123) •2 ↑
├──traffic education (Q2516186) •2 ↑
├──Timeline of transportation technology (Q2516265) •5 ↑
├──??? (Q2516343) •1 ↑
├──??? (Q2516344) •1 ↑
├──??? (Q2516371) •1 ↑
├──??? (Q2516390) •1 ↑
├──??? (Q2516430) •1 ↑
├──transport ecology (Q2516529) •1 ↑
├──??? (Q20820139) •1
└──??? (Q20850681) •1 ↑

manpage

Should be generated from README.md with option --man. Also required for Debian packaging?

Add command to make an item an instance/subclass

wdtaxonomy --isa Qparent Qchild
wdtaxonomy --broader Qparent Qchild

Both should check whether an instance-of/subclass-of relation exists to skip or modify (the latter requires maxlath/wikibase-edit#2). An item should never be both instance and subclass of the same other item.

show instances and items neither class nor instance

If an item is no class but an instance, show with the class it belongs to.

Limit taxonomy to a given number of levels

Difficult to do in SPARQL, maybe repeated queries, level by level?

Use parent as root

For instance

wdtaxonomy Q522190^

would result in

wdtaxonomy Q863247

because Q863247 is the only parent of Q522190.

However #8 may make this feature not necessary.

Include usage count in property taxonomies

Seems to be a costly operation, unless there is a special Blazegraph service to get property usage count.

colour coding of types

Very handy tool :)

I just tried this and found that the colour coding for types is pretty dark being dark blue:

It might be my settings, etc, but why not make this colour somewhat brighter? Perhaps white? The green of the Q-numbers is much more readable.

Add JSKOS as output format

Get superclasses instead of subclasses

With option --reverse

Support extraction of property taxonomies

e.g. wdtaxonomy P2561 should use property P1647 (subproperty of) to extract a taxonomy.

CSV quoting escaping is incorrect

Just tried latest release that has quoting and found that it escapes a " using \". According to the RFC this is wrong, it should use "".
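For reference, RFC 4180 style quoting wraps the field in double quotes and doubles every embedded double quote. A minimal sketch follows; csvQuote is a hypothetical helper name, not a function of this library.

```javascript
// Hypothetical helper: quote a CSV field the RFC 4180 way.
function csvQuote (value) {
  // wrap in double quotes and escape embedded quotes by doubling them
  return '"' + String(value).replace(/"/g, '""') + '"'
}

console.log(csvQuote('super-Earth'))
console.log(csvQuote('JTL-E .500 S&W Magnum 12"'))
```

Backslashes are left untouched because they have no special meaning in this CSV dialect.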

Reference tree view at Wikidata query service

Tree view is similar, have a look at it and compare

Show classes of an instance with option --reverse

Before/instead of checking whether a non-class exists, classes should be queried which the item is instance of.

Add option -q to include qualifiers

Add option to include the URL

This is a command-line tool, but with iTerm it's possible to Cmd-click on a URL and have it open in a browser. This would be very handy when scanning a taxonomy list and needing to open a few items in the list quickly.

Is there any current means to achieve something like this?

Create web application

Similar to (and maybe based on) https://github.com/AngryLoki/wikidata-graph-builder

likely requires #6

Add option to exclude all labels

This should speed up and avoid timeouts for very large hierarchies. Output in tree format could be like this:

http://www.wikidata.org/entity/Q634 •202 ×5 ↑
├──http://www.wikidata.org/entity/Q44559 •88 ×2961 ↑

in JSON format there would be no prefLabel and notation.

One-letter lowercase options available: -a, -g, -j, -k,-x, -y, -z.

Maybe -L, --no-labels?

By the way, the full class hierarchy contains more than 2 million statements, so getting all of it would probably still not be possible.

Calculate transitive instance/site counts (for option -c)

Implemented in 0.3.1 (option --total), better documentation needed.

Sparql request failed.

Simple message after this commandline:

 node wdtaxonomy.js Q35120 --format json

When I ran it with --sparql, I got this query:

SELECT ?item ?itemLabel ?broader ?parents ?instances ?sites
WHERE {
    {
        SELECT ?item (count(distinct ?parent) as ?parents) {
            ?item wdt:P279* wd:Q35120
            OPTIONAL { ?item wdt:P279 ?parent }
        } GROUP BY ?item
    }
    {
        SELECT ?item (count(distinct ?element) as ?instances) {
            ?item wdt:P279* wd:Q35120
            OPTIONAL { ?element wdt:P31 ?item }
        } GROUP BY ?item
    }
    {
        SELECT ?item (count(distinct ?site) as ?sites) {
            ?item wdt:P279* wd:Q35120
            OPTIONAL { ?site schema:about ?item }
        } GROUP BY ?item
    }
    OPTIONAL { ?item wdt:P279 ?broader }
    SERVICE wikibase:label {
        bd:serviceParam wikibase:language "en" .
    }

and then it stopped. Not sure what I am doing wrong. The chosen root is "entity" which appears to be at least one important root (perhaps the root?) at Wikidata

feature request

The reverse switch (-r) takes the query identifier and creates a reverse tree. In other words, the query is on line 1, the superclass is on line 2, etc.
Would it be possible to provide an extra switch that would change the order of the 'reverse' switch? In other words, line 1 would be the root term (e.g. entity) and the tree would be constructed from the root to the query term.
What is the reasoning for this?
In order to see a large class structure, one might run a children (subclass) query from, for example, level 5 in the class tree. That query produces the target result where the 'top' term is the target and the children are listed below.
Now I would like to integrate the subclass results with the superclasses for the search term. But the superclass query result is not in the same form/shape as the subclass query results.
I have to do significant editing to reorder the superclass query to fit with the subclass query.
A switch -rt (reverse top) would allow the simple combination of a superclass query and a subclass query.

Thanks for creating a really useful tool.

Prune taxonomy for ontology alignment (as a grep)

The "grep external ontology" have many applications, see one example here.

The problem of simple grep is with intermediate branches...
Example of wdtaxonomy -m P1709 Q732577 | grep schema.org:

╞══news article (Q5707594) •4 ×15727 ↑ … = http://schema.org/NewsArticle
│  │  ├──atlas (Q162827) •70 ×51 ↑ = http://schema.org/Atlas
├──report (Q10870555) •30 ×7908 = http://schema.org/Report

The real branch for atlas is not news article:

├──educational material (Q6006020) •2 ×7
   ├──reference work (Q13136) •31 ×191 ↑↑
        ├──atlas (Q162827) •70 ×51 ↑ = http://schema.org/Atlas

sparqlRequest bug involving wdk.simplifySparqlResults?

great module!
If you have an example where wdk.simplifySparqlResults crashes as suggested by your comment, I would be happy to have a look at it :)

CSV output should be quoted

I was pulling a CSV output of the product taxonomy tree, which is quite large. It failed to parse as CSV because labels that include a quote character are not quoted themselves, so the first item to fail was:

-----,Q6109076,JTL-E .500 S&W Magnum 12",1,0,

Would it make sense for all labels to be quoted by default?

An OR list in the --mappings option

The "OR list" have many applications, see one example here or this analytic query...

Suggestion: use comma-separated list as P1709,P2888 to recognize an "OR list".

Example:

wdtaxonomy -m P1709 Q33999 |grep schema.org is the default command, but returns empty.
wdtaxonomy -m P1709,P2888 Q33999 |grep schema.org as default command is better, will return something.

"getPrefLabel is not a function"

I get this output running wdtaxonomy -f csv Q634 on node v9.1.0. It works fine on v6.10.0.

additional output format to paste into Wikidata

e.g. {{Q'|Q634}} ...

comment

this is an amazing tool!
thank you for developing.

document wdmappings script

Read options from config file

Get items by label instead of identifier

See wikidata-cli for how to implement

Distinguish mapping types in output

The current (0.5.0) output does not distinguish mapping types because they are all stored as identifier.

implement color output

see tee for an example. The screenshot https://commons.wikimedia.org/wiki/File:Wdtaxonomy-example.png should be updated afterwards

Include related links on request

e.g. see also is used to link properties. The list of related link properties needs to be specified by an additional option (a/g/k/x/y/z?)

Refactor internal data structure to better align with JSKOS

It would be nice to allow serializing arbitrary JSKOS data sets as tree, so factor out serialization modules. This requires to:

add language tag to label and description
use URIs instead of plain Q../P.. ids
decide how to store instances (JSKOS subjectOf / foaf:topic)
decide how to store fields with number of sites and instances

At least the last likely requires extension of JSKOS to handle usage statistics (number of records indexed with some concept in a given database).

getting "SPARQL request failed" for every call

I've been using wdtaxonomy (v 0.6.6) happily for many months on my macbook running 10.14.5. Starting yesterday, every call I make (e.g., "wdtaxonomy -c Q5") produces an immediate "SPARQL request failed" message.

I tried capturing the SPARQL queries with --sparql and pasting them into the Wikidata query service web page, and they work. I also tried passing the standard query service URL with --sparql-endpoint and that did not help. I tried uninstalling and then installing again, which did not fix the problem.

Might it be due to this: https://lists.wikimedia.org/pipermail/wikidata/2019-June/013161.html ?

Any suggestions?

suggestion to sort items

Big or medium outputs need some ordering (e.g. order by prefLabel) for better usability. Perhaps an "order-by" option.

Example: wdtaxonomy -l pt-BR,pt,es,en -P P31 Q485258 generated an unordered list.

serializeTaxonomy txt error

serializeTaxonomy.txt(taxonomy, process.stdout, { colors: true });

has error

TypeError: Cannot read property 'delimiter' of undefined

I believe the error traces to (wikidata-taxonomy/lib/serialize-txt.js:24:25) where it is looking for env.chalk but none is specified.

Add dot output format

Requires https://www.npmjs.com/package/graphlib-dot. Maybe better factor out to new module jskos-writers?

new command to analyze common properties and statements

See https://lucaswerkmeister.github.io/wikidata-ontology-explorer/, e.g.

SELECT ?property ?propertyLabel ?count WITH {
  SELECT ?property (COUNT(DISTINCT ?statement) AS ?count) WHERE {
    ?item wdt:P279* wd:Q6423319 ;
          ?p ?statement.
    ?property a wikibase:Property;
              wikibase:claim ?p.
    FILTER(?property != wd:P279)
  }
  GROUP BY ?property
  ORDER BY DESC(?count)
  LIMIT 15
} AS %results WHERE {
  INCLUDE %results.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?count)

could be applied to either classes or to instances (-i) with typical properties (-a) or typical statements (-A). Output formats: default (colore), csv, json
