The author-disambiguator from arthurpsmith

When using the filter, it would be good to have an option to apply it to the potential authors as well

e.g. when using
https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=Li%20Li&filter=wdt%3AP921+wd%3AQ2092064
to look for papers by "Li Li" about hemophilia A, it would be useful to surface or highlight in the "potential author items" section all those who have already published on the same topic.

Link external identifiers

e.g. when a DOI, PMID, PMCID, arXiv ID etc. are listed for an entry, they should be linked to facilitate checking (e.g. affiliations).

Suggestion: handle multiple authors simultaneously

Suggestion from Tony Catapano at WikiCite meeting: allow entry of multiple authors for match, when they are typically co-authors:

find all papers they are both (all) on
generate QuickStatements to replace both (all) author string records with the author items

Author lists should be ordered

All displayed author lists should be ordered by author number. If there are gaps in the author numbers, that should somehow be indicated (perhaps with [xx .. xx]?).
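
A minimal sketch of the ordering and gap marking, in Python for illustration only (the function name and the input shape, a list of (author number, name) pairs, are assumptions and not the tool's actual code):

```python
def format_author_list(authors):
    """Return display lines ordered by author number, marking missing numbers.

    `authors` is a list of (ordinal, name) pairs in any order.
    """
    ordered = sorted(authors, key=lambda pair: pair[0])
    lines = []
    expected = 1  # next author number we expect to see
    for ordinal, name in ordered:
        if ordinal > expected:
            first_missing, last_missing = expected, ordinal - 1
            if first_missing == last_missing:
                lines.append(f"[{first_missing}]")
            else:
                lines.append(f"[{first_missing} .. {last_missing}]")
        lines.append(f"{ordinal}. {name}")
        expected = max(expected, ordinal + 1)
    return lines
```

With authors numbered 1, 2 and 5, this would show a single "[3 .. 4]" marker between the second and the fifth author.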

Improve "fuzzy" search

From User:Jura on Wikidata - BTW, if one types a full name (first middle last name), fuzzy search seems to find people without the middle name, but not those where the middle name is limited to its initial. Maybe these should also be found when starting from first+last name.
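
One possible way to cover the middle-initial case, sketched in Python under the assumption that the search can be run over a list of generated name variants (`name_variants` is a hypothetical helper, not something the tool is known to have):

```python
def name_variants(full_name):
    """Generate search variants of a name: full, middle initials, no middle name."""
    parts = full_name.split()
    if len(parts) < 3:
        return [full_name]
    first, middles, last = parts[0], parts[1:-1], parts[-1]
    candidates = [
        " ".join(parts),                                   # as typed
        " ".join([first] + [m[0] + "." for m in middles] + [last]),  # initials with dots
        " ".join([first] + [m[0] for m in middles] + [last]),        # initials without dots
        f"{first} {last}",                                 # middle name dropped
    ]
    # Remove duplicates while keeping the order above.
    seen = set()
    variants = []
    for candidate in candidates:
        if candidate not in seen:
            seen.add(candidate)
            variants.append(candidate)
    return variants
```

Going the other way (from first+last name to forms with an unknown middle initial) cannot be done by enumeration and would need some kind of pattern or prefix match instead.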

Article lists should be ordered by date

The default ordering should be based on publication date (oldest first?)

Build a variant for topic disambiguation

The general concept of the tool to

help reconcile strings with items
convert a successful reconciliation into QuickStatements edits

is in principle applicable for disambiguating things other than authors as well, e.g. organizations or titles of works, venues or events.

Doing this for titles would be especially interesting, as that would help with topic tagging.

For some further inspiration, see https://www.wikidata.org/w/index.php?title=User:Daniel_Mietchen/Wikidata_lists/Long_words_in_work_titles&oldid=837291802 .

Look into how SourceMD does it

https://tools.wmflabs.org/sourcemd/new_resolve_authors.php?doit=Look+for+author&name=Jane+Brown

Handle cases where multiple authors have been linked with same name

Example: https://www.wikidata.org/wiki/Q56917927 - Wang Jun/Jun Wang

Where we have an ORCID and a DOI, check ORCID db?

There should probably be something automatic to pull in public ORCID data on authorship... anyway this needs to be looked into somehow.

Under "Potential author items", display ORCID/other identifiers from Wikidata item if available

Potentially saves lots of clicks and confusion in "Li Li" style scenarios.

Radio-button for selection of author sometimes not working?

If there's text in the Q item field the form seems to select that even if the radio button for that field is not checked?!

Add link from the tool to the code repo here

So that people can find the repo more easily to provide comment or other contributions.

A. Brett Stephenson not found!

Probably the first name as an initial breaks something!

Make use of "stated as" (P1932) statements to find potential matches

So for something like
https://tools.wmflabs.org/author-disambiguator/?fuzzy=0&name=Collins%20WE ,
list all the identified authors for which their name has been stated as "Collins WE" at least once.
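
A sketch of the kind of query this would need, built in Python for illustration. It assumes the standard Wikidata statement prefixes (p:, ps:, pq:) and that the "stated as" value sits as a P1932 qualifier on the P50 statement; `stated_as_query` is a hypothetical name:

```python
def stated_as_query(name_string):
    """Build a SPARQL query for author items whose name was stated as `name_string`."""
    # Minimal escaping so the string is safe inside a double-quoted literal.
    escaped = name_string.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?author WHERE {\n"
        "  ?work p:P50 ?statement .\n"
        "  ?statement ps:P50 ?author ;\n"
        f'             pq:P1932 "{escaped}" .\n'
        "}"
    )
```

For the example above, `stated_as_query("Collins WE")` yields a query that matches every author item with at least one P50 statement qualified with "Collins WE".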

Handle case when author item already linked to article

When there's an author string and the author item associated with it is already linked to the article, the data associated with the author string statement (stated as, references, etc.) should be moved to the P50 statement, and the author string statement should be deleted.

Next to the (currently red) topic links, add link to Scholia's /missing page for the topic.

e.g. the "hemophilia A" link in https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=Li%20Li&filter=wdt%3AP921+wd%3AQ2092064 (as per #31) would link to https://tools.wmflabs.org/scholia/topic/Q2092064/missing , which links back to the disambiguator.

Making such roundtripping simple (for multiple Scholia/Author Disambiguator combinations) would probably be a prerequisite for getting larger numbers of people to use these disambiguation workflows.

Georgia O'Keeffe (and anything with a ') breaks SPARQL query

Looks like some filtering needed...
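
A sketch of the escaping that would be needed before a name is placed in a query, in Python for illustration only (`sparql_literal` is a hypothetical helper; the escape sequences are the ones SPARQL 1.1 allows inside string literals):

```python
# Characters that must be escaped inside a SPARQL string literal.
_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "'": "\\'",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def sparql_literal(text, lang=None):
    """Return `text` as a double-quoted SPARQL literal, optionally language-tagged."""
    escaped = "".join(_ESCAPES.get(ch, ch) for ch in text)
    literal = f'"{escaped}"'
    return f"{literal}@{lang}" if lang else literal
```

Escaping rather than stripping the apostrophe keeps the name intact, so labels such as "Georgia O'Keeffe" can still be matched exactly.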

Li Li causing a SPARQL failure now??

Just using "Li Li" now gives:
Warning: assert(): SPARQL query failed: SELECT DISTINCT ?q { VALUES ?name { "Li Li"@en "Li Li"@de "Li Li"@fr "Li Li"@es "Li Li"@nl } . ?q (rdfs:label|skos:altLabel) ?name ; wdt:P31/wdt:P279* wd:Q16334295 . } #TOOL: legacy code failed in /Library/WebServer/Documents/disambiguator/magnustools/ToolforgeCommon.php on line 427

??

In author view, provide option not to display co-authors

.. or only the three around the target author.

That could help address the issues with many-authored papers.

Link journals to some useful page

Right now, the journal is just given as a string, but it is another unit of curation, so it should be linked somewhere useful.
This could be simply the corresponding /missing page in Scholia (example) or something else.

Things look odd if one potential author is a redirect

Things seem to display correctly, but the redirected one looks like a duplicate of what it redirects to. Probably should be some display indication of the issue.

Work with other author identifiers

From User:Jura on Wikidata -
Maybe the tool could also check if VIAF is present (and suggest its addition). If you just check for a single one, that might be the most useful one. There are obviously a few other (non-library ones) likely to be found on such author items (notably Scopus, Researchgate, even Linkedout).

Middle name strings are automatically shortened

Currently,
https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=International%20Commission%20On%20Zoological%20Nomenclature
cannot be used to generate QuickStatements to link the papers to Q1071346,
and the suggested string for the ORCID part creates shortened "middle names" C O Z as well,
presumably due to 7690d3d .

Articles with matching "author" statements should be included in clustering

Right now articles with no matching "author string" statements are not listed or included in the clustering. If an article would be in a cluster with other articles that have already been processed, that would be a strong indicator that the author should be mapped to the Q value in the already-processed articles. Articles with "author string" or "author" statements should both be clustered consistently.

replace wikidatatools in article fetch to improve memory performance

For papers with large numbers of authors (like Q21481859) wikidatatools sometimes(?) doesn't even seem to fetch the data with the WikidataItemList load function. In any case it appears to be using much more memory than necessary; a rewrite of some sort to improve this is needed.

Add an option to (pre) filter by way of a SPARQL query

Mostly useful for those cases which give lots of results when unfiltered (see e.g. #9), but can also be useful for smaller sets.

All qualifiers should be copied from author name string to author entry

There may be additional qualifiers (for example for affiliation) besides "series ordinal" that should be copied from the author name string entry to the author - we should just copy all of them, and add in the "stated as" qualifier from the string value.

Bug: Li Li uses too much memory

Error when trying this author name:
Fatal error: Allowed memory size of 1572864000 bytes exhausted (tried to allocate 64 bytes) in /data/project/author-disambiguator/public_html/magnustools/wikidata.php on line 331

Any way to fix this???

Under "Potential author items", link to Reasonator instead of Wikidata item directly

That way, it is simpler to get an overview of the things that are useful for disambiguation.

Link individual papers in Misc section

Since the "Misc" grouping is just whatever didn't fit into a cluster, we don't expect it to match a single author; it would be better to show for each individual paper which of the possible authors it matches in this section.

"Potential author items" section is missing for author with many many-author papers

example: https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=Stephen%20J%20Chanock

Journal displays twice sometimes

Example -
https://www.wikidata.org/wiki/Q21558717
shows Physical Review Letters twice ??

Remove old "author string" statements

After adding the new author statements, the old author string statements should be removed to avoid duplication of the same information on article records

Potential author list is limited to 10 items!

The API query used limits the "Potential Authors" list to only 10, and doesn't warn when that limit is hit! We should increase to at least 50 and add a warning if more may be found.
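
A sketch of the limit and the warning check, in Python for illustration. It assumes the lookup goes through the Wikidata `wbsearchentities` API module, which this issue does not confirm; that module caps `limit` at 50 for ordinary requests and returns a `search-continue` key when more results exist. The helper names are hypothetical:

```python
MAX_LIMIT = 50  # upper bound accepted by wbsearchentities for ordinary requests


def search_params(name, limit=MAX_LIMIT, offset=0):
    """Build request parameters for one page of entity search results."""
    return {
        "action": "wbsearchentities",
        "search": name,
        "language": "en",
        "type": "item",
        "limit": min(limit, MAX_LIMIT),
        "continue": offset,
        "format": "json",
    }


def more_results_available(response):
    """True if the decoded JSON response says further results exist."""
    return "search-continue" in response
```

When `more_results_available` returns true, the page could either show a warning or fetch the next page using the returned offset.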

Further trouble with many-author papers, even before conversion

Lots of errors in
https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=Sandra%20S.%20Padula&filter=wdt%3AP2093+%22Sandra%20S.%20Padula%22&limit=100

Work with author pages in Wikisource(s)

From User:billinghurst - any person who has an author page at one of the Wikisources should be considered worthy of being a hit in the tool's search results. Many of the people writing at the Wikisources will not be traditional "authors", though they will be writers in the sense of explorers, military officers, politicians, scientists, journalists, etc. Also, without knowing the exact scope of your tool, I would like to flag a page like s:Littell's Living Age/Volume 135 as an example of a ToC for a journal, of which there is a large range of other samples that may be of interest. A number of these will have red links, and many will have solutions for red links, as we have done a lot of work identifying these writers over time.

One-click submission to Quickstatements

The results page should include the QS commands in a form text box that can be submitted to QuickStatements directly by clicking a button, rather than requiring cut and paste.

Separate form to create new author item from form to update articles

The way it works right now is a little confusing, these should probably be two separate forms.

Preserve references in added P50 statements

When author statements are added on an article based on the author string statements, they should preserve all the qualifiers and references from the original "author string" entry (and add "stated as" with the string value as well).
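
A sketch of what the generated commands could look like, in Python for illustration. It assumes QuickStatements V1 syntax (tab-separated fields, qualifiers as extra property/value pairs, reference properties written with an S prefix, a leading "-" to remove a statement); `to_quickstatements` and its argument shapes are hypothetical:

```python
def to_quickstatements(article, author_item, name_string, qualifiers=(), references=()):
    """Return [add_command, remove_command] for one author string statement.

    `qualifiers` and `references` are (property, value) pairs whose values are
    already in QuickStatements value syntax, e.g. ("P1545", '"3"').
    `name_string` is assumed not to contain double quotes.
    """
    fields = [article, "P50", author_item]
    for prop, value in qualifiers:           # e.g. P1545 (series ordinal)
        fields += [prop, value]
    fields += ["P1932", f'"{name_string}"']  # "stated as" from the string value
    for prop, value in references:           # P248 becomes S248, and so on
        fields += ["S" + prop[1:], value]
    add_command = "\t".join(fields)
    remove_command = "\t".join(["-" + article, "P2093", f'"{name_string}"'])
    return [add_command, remove_command]
```

Emitting the removal of the P2093 statement right after the P50 addition keeps the two steps together, so the string statement is only dropped when its replacement has been written.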

Compress author lists when thousands of authors

It's not very useful to have a list of thousands of author names. If the author count is above a certain number (20?) we should do the following:

Display the first 10 authors
Display 2 authors before and 2 authors after a matched author name string
Replace remaining author entries with ellipses "..."
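
The three steps above can be sketched as follows, in Python for illustration only (the function name, the parameter names, and the use of list positions for matched authors are assumptions):

```python
def compress_authors(authors, matched, threshold=20, head=10, context=2):
    """Shorten a long author list for display.

    `authors` is the ordered list of names; `matched` holds the list positions
    (0-based) of entries that match the searched author name string.
    """
    if len(authors) <= threshold:
        return list(authors)
    keep = set(range(min(head, len(authors))))  # first `head` authors
    for i in matched:                           # window around each match
        keep.update(range(max(0, i - context), min(len(authors), i + context + 1)))
    shown = []
    previous = -1
    for i in sorted(keep):
        if i > previous + 1:
            shown.append("...")                 # something was skipped
        shown.append(authors[i])
        previous = i
    if previous < len(authors) - 1:
        shown.append("...")                     # tail was skipped
    return shown
```

For a 30-author paper with a match at the 20th author, this would display authors 1 to 10, an ellipsis, authors 18 to 22, and a closing ellipsis.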

Improve clustering algorithm

Publication date, journal, affiliation if available, main subject if available, should be accounted for in the clustering analysis (right now it just compares author lists)

Conversion sometimes works only partly

e.g. https://tools.wmflabs.org/author-disambiguator/?name=Ph.+Schwemling&doit=Look+for+author&limit=50&filter=wdt%3AP2093+%22Ph.+Schwemling%22
resulted in just one item being edited, despite 50 having been marked.

Problems with accented characters?

Not sure what's going on, but the string matching is not working for a search on 'François M. Peeters' - the first few articles don't highlight the matching name in the author list, despite being retrieved in the search!
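
One possible cause, which is only a guess and has not been checked against the tool, is that the search string and the stored author string use different Unicode normalization forms ("ç" as one code point versus "c" plus a combining cedilla). A Python sketch of a comparison that would be robust to that (`same_name` is a hypothetical helper):

```python
import unicodedata


def same_name(a, b):
    """Compare two name strings after Unicode normalization and case folding."""
    def normalize(s):
        return unicodedata.normalize("NFC", s).casefold()
    return normalize(a) == normalize(b)
```

If the highlighting step compares raw bytes while the search itself normalizes, that would explain articles being retrieved but not highlighted.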

Allow more precise clustering based on neighboring author names

Current clustering works well in some circumstances, but it does not help with some cases with common name strings. More precise partitioning based on the exact preceding and following author name strings may help.
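
A sketch of the partitioning idea, in Python for illustration (the function name and the input shape, a mapping from article ID to its ordered author name strings, are assumptions):

```python
from collections import defaultdict


def cluster_by_neighbors(articles, target):
    """Group articles by the exact names just before and after `target`.

    `articles` maps an article ID to its ordered list of author name strings.
    Returns a dict keyed by (preceding_name, following_name); None marks the
    start or end of the author list.
    """
    clusters = defaultdict(list)
    for article_id, names in articles.items():
        for i, name in enumerate(names):
            if name == target:
                before = names[i - 1] if i > 0 else None
                after = names[i + 1] if i + 1 < len(names) else None
                clusters[(before, after)].append(article_id)
    return dict(clusters)
```

An article in which the target string occurs twice lands in two clusters, one per occurrence, which may also be useful when the same string belongs to two different people on one paper.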

precise clustering page doesn't use stated as values for already-resolved authors

Allow selecting one author name when several match the author string search

Example - S. Bhattacharya - appears twice in many high energy physics collaboration papers, one from Brown University and one from the Saha Institute in Kolkata.

On author string pages, include lists for authors cited to and from

That list can easily grow very long, so providing a useful way to filter (e.g. by selecting a substring) would be useful.

The idea here is that people often cite papers of their own, so for something like
https://tools.wmflabs.org/author-disambiguator/?fuzzy=0&name=Collins%20WE ,
citations to or from papers with that string would be expected to bring up some people named Collins as identified authors, which could then serve as another starting point for identifying authors.

limit should be preserved when clicking coauthor search links

If you click one of the "common names" at the bottom it defaults back to 500 article limit - probably not a good idea for names on multi-thousand-author papers

For authors with lots of papers to reconcile, the conversion to QuickStatements does not work

E.g. for https://tools.wmflabs.org/author-disambiguator/?doit=Look+for+author&name=PLOS%20ONE%20Staff .

In similar cases before, I could work around by just including one of the groups suggested by the tool, but here, it suggests only one group (which makes sense), and while I suppose I could click away manually a certain number of those tick boxes, I don't know what number that would need to be, and I do not want to try to find out manually.

One solution is likely to increase memory (just like in #9), another would be pre-filtering, as per #17, and yet another would be to limit batches to a certain number of publications at a time, as suggested in option 1 in this comment in #9.
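
The batching option can be sketched as follows, in Python for illustration (the helper name and the default batch size of 50 are assumptions; a suitable size would have to be found by testing against the memory limit):

```python
def batches(commands, size=50):
    """Yield successive slices of `commands`, each with at most `size` entries."""
    for start in range(0, len(commands), size):
        yield commands[start:start + size]
```

Each batch could then be converted and submitted on its own, so the user never has to guess how many tick boxes to clear.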
