
recitation-bot's People

Contributors

daniel-mietchen, klortho, notconfusing, wrought

recitation-bot's Issues

No more than 10 categories per article in Wikisource header template

Add re-upload option to web interface

To benefit from ongoing bug fixes, it would be useful if the web interface offered the option to re-upload (a) text, (b) media files, or (c) both.
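The three choices can be mapped onto upload flags. The sketch below assumes a hypothetical form field `reupload` with the values `''`, `text`, `media` or `both`. Neither that field nor the `ReuploadOptions` type exists in the bot today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReuploadOptions:
    """Hypothetical flags saying what to re-upload for an already imported article."""
    text: bool = False
    media: bool = False


# Hypothetical form values: '' (no re-upload), 'text', 'media' or 'both'.
_CHOICES = {
    "": ReuploadOptions(),
    "text": ReuploadOptions(text=True),
    "media": ReuploadOptions(media=True),
    "both": ReuploadOptions(text=True, media=True),
}


def parse_reupload(choice):
    """Map the value of the hypothetical `reupload` form field to upload flags."""
    try:
        return _CHOICES[choice]
    except KeyError:
        raise ValueError("unknown re-upload option: %r" % (choice,)) from None


print(parse_reupload("both"))  # ReuploadOptions(text=True, media=True)
```

Unknown values raise an error instead of silently doing nothing. A broken form then shows up quickly.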

Do not import tables as both table and figure

In some journals (e.g. PLOS), tables are made available as image files in addition to tabular format. If the tabular version exists, we should always use it and not embed the image in the Wikisource text.

I am open to the idea of importing the image file nonetheless, as it may sometimes be useful in Wikipedia articles, but my suggested default setting would be to ignore those image files entirely.

Sample case:

https://commons.wikimedia.org/w/index.php?title=File:Tracking-Marsupial-Evolution-Using-Archaic-Genomic-Retroposon-Insertions-pbio.1000436.t001.jpg&oldid=126539852 .
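The suggested default is to ignore table images whenever a tabular version exists. It could be implemented by checking each `<table-wrap>` in the article XML. This sketch assumes JATS-style markup in which both versions sit in the same `<table-wrap>`, e.g. inside `<alternatives>`. The function name and sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def table_images_to_skip(article_xml):
    """Return hrefs of table images that duplicate an available tabular version.

    Assumes a <table-wrap> may hold both a <graphic> (the table as an image)
    and a <table> (the table as markup).
    """
    root = ET.fromstring(article_xml)
    skip = []
    for wrap in root.iter("table-wrap"):
        if wrap.find(".//table") is None:
            continue  # only an image exists, so keep it
        for graphic in wrap.iter("graphic"):
            href = graphic.get(XLINK_HREF)
            if href:
                skip.append(href)
    return skip


sample = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <table-wrap id="t001">
    <alternatives>
      <graphic xlink:href="pbio.1000436.t001"/>
      <table><tr><td>1</td></tr></table>
    </alternatives>
  </table-wrap>
  <table-wrap id="t002"><graphic xlink:href="pbio.1000436.t002"/></table-wrap>
</article>"""

print(table_images_to_skip(sample))  # ['pbio.1000436.t001']
```

The optional "import the image anyway" behaviour could then be a flag that bypasses this list.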

Sync with OAMI

Do not embed supplementary files that were not uploaded

Category for uploads

Add file description from <title> and <caption> elements in PMC XML

The current file descriptions (e.g. "Media belonging to article 10.1371/journal.pbio.1000436 which is cited on Wikipedia, and automatically imported." in
https://commons.wikimedia.org/w/index.php?title=File:Tracking-Marsupial-Evolution-Using-Archaic-Genomic-Retroposon-Insertions-pbio.1000436.g002.jpg&oldid=126539861 )
are not very helpful.

The corresponding code should thus be replaced with that in OAMI. Sample upload from PLOS:
https://commons.wikimedia.org/wiki/File:Messages-Do-Diffuse-Faster-than-Messengers-Reconciling-Disparate-Estimates-of-the-Morphogen-Bicoid-pcbi.1003629.s006.ogv .
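The issue asks for OAMI's code to be reused. What follows is not that code. It only illustrates pulling a description from `<caption>` in PMC XML with ElementTree. It assumes JATS elements (`fig`, `table-wrap`, `supplementary-material`) whose `<caption>` contains a `<title>` and `<p>` paragraphs. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _flat_text(element):
    """All text inside `element` (nested markup included), whitespace-normalized."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def media_descriptions(article_xml):
    """Map element ids to descriptions built from <caption><title> and <caption><p>."""
    root = ET.fromstring(article_xml)
    descriptions = {}
    for tag in ("fig", "table-wrap", "supplementary-material"):
        for element in root.iter(tag):
            caption = element.find("caption")
            if caption is None:
                continue
            parts = [_flat_text(caption.find("title"))]
            parts += [_flat_text(para) for para in caption.findall("p")]
            descriptions[element.get("id")] = " ".join(part for part in parts if part)
    return descriptions


sample = """<article><fig id="g002"><label>Figure 2</label><caption>
<title>Example figure title.</title><p>Caption with <italic>inline</italic> markup.</p>
</caption></fig></article>"""
print(media_descriptions(sample))
# {'g002': 'Example figure title. Caption with inline markup.'}
```

Inline markup such as `<italic>` is flattened to plain text here. Converting it to wikitext would be a further step.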

Detect duplicates
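The issue does not say which duplicates are meant. For media files, one possible approach assumes duplicates are byte-identical. It hashes the file and asks the MediaWiki API whether a file with that SHA-1 already exists, via `list=allimages` with its `aisha1` parameter. The helper names are hypothetical.

```python
import hashlib
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def sha1_hex(data):
    """SHA-1 hex digest of the raw file bytes."""
    return hashlib.sha1(data).hexdigest()


def duplicate_query_url(data, api=COMMONS_API):
    """Build an API query listing existing files whose content hash matches `data`."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex(data),
        "format": "json",
    }
    return api + "?" + urlencode(params)


print(sha1_hex(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

A non-empty `query.allimages` list in the response would mean the file is already on the wiki. The bot could then skip the upload.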

Strive for complete paper titles at Wikisource

There is a maximum length for page titles at MediaWiki - 255 bytes according to https://www.mediawiki.org/wiki/Manual:Page_table#page_title . At the OAMI, we have opted to take the first 100 characters of a paper title before we append portions of the DOI.
This has worked fine so far.

For Wikisource, though, this is not the best approach, and I think we should try to fit as much of the article title as possible into the page name. Example:
https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access/Programmatic_import_from_PubMed_Central/A_cladistically_based_reinterpretation_of_the_taxonomy_of_two_Afrotropical_tenebrionid_genera_Ectateus_Koch_1956_and_Selinus_Mulsant_%26_Rey_1853_%28 .
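Fitting "as much of the title as possible" means counting UTF-8 bytes, not characters. The cut must also never split a multi-byte character. This sketch assumes the namespace prefix (here `Wikisource:`) does not count toward the 255 bytes, since MediaWiki stores it separately in `page_namespace`. The `prefix` argument stands for whatever subpage path the bot uses. Backing off to the last complete word is a design choice, not something the issue asks for.

```python
MAX_TITLE_BYTES = 255  # MediaWiki page_title limit, in bytes


def fit_title(prefix, title, max_bytes=MAX_TITLE_BYTES):
    """Append as much of `title` to `prefix` as fits in `max_bytes` UTF-8 bytes.

    Cuts on a character boundary and, if a word would be split, backs off to
    the last space. Word boundaries are detected on spaces only.
    """
    budget = max_bytes - len(prefix.encode("utf-8"))
    if budget <= 0:
        raise ValueError("prefix alone exceeds the title limit")
    encoded = title.encode("utf-8")
    if len(encoded) <= budget:
        return prefix + title
    # errors="ignore" drops a trailing partial multi-byte sequence
    cut = encoded[:budget].decode("utf-8", errors="ignore")
    rest = title[len(cut):]
    if rest and not rest[0].isspace() and " " in cut:
        cut = cut.rsplit(" ", 1)[0]  # avoid ending in the middle of a word
    return prefix + cut.rstrip()


prefix = "WikiProject Open Access/Programmatic import from PubMed Central/"
print(fit_title(prefix, "A cladistically based reinterpretation of the taxonomy"))
```

Suffixes such as DOI fragments could be handled by subtracting their byte length from `max_bytes` first.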

Set up Commons template for re-citation bot

Uploads by re-citation bot (example: https://commons.wikimedia.org/wiki/File:A-New-Basal-Hadrosauroid-Dinosaur-%28Dinosauria-Ornithopoda%29-with-Transitional-Features-from-the-Late-pone.0098821.g015.jpg ) should be marked by a template
https://commons.wikimedia.org/w/index.php?title=Template:Recitation-bot
to be modeled after
https://commons.wikimedia.org/wiki/Template:Open_Access_Media_Importer .
The latter template has been used in earlier uploads by VIAF bot and re-citation bot and should be replaced by the new one.
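A sketch of file-page wikitext that carries the new template and of swapping the old template in earlier uploads. It assumes `{{Recitation-bot}}` takes no parameters. It also assumes earlier pages use `{{Open Access Media Importer}}` without parameters. The placement of the bot template and the function names are illustrative only. `{{Information}}`, `{{int:filedesc}}` and `{{int:license-header}}` are standard on Commons. The license template has to be derived from the article itself.

```python
import re


def file_description_page(description, doi, author, license_template):
    """Wikitext for a Commons file page marked with {{Recitation-bot}}."""
    return "\n".join([
        "=={{int:filedesc}}==",
        "{{Information",
        "|description=" + description,
        "|source=https://doi.org/" + doi,
        "|author=" + author,
        "}}",
        "",
        "=={{int:license-header}}==",
        "{{" + license_template + "}}",  # e.g. "cc-by-4.0", taken from the article
        "{{Recitation-bot}}",
    ])


# Assumes earlier uploads use the old template without parameters;
# MediaWiki treats the first letter and spaces/underscores interchangeably.
_OLD_TEMPLATE = re.compile(r"\{\{\s*[Oo]pen[ _]Access[ _]Media[ _]Importer\s*\}\}")


def replace_old_template(wikitext):
    """Swap {{Open Access Media Importer}} for {{Recitation-bot}} in a file page."""
    return _OLD_TEMPLATE.sub("{{Recitation-bot}}", wikitext)
```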

License statement missing

Change attribution template in uploads to Commons by VIAFbot

Implement testing

Abstractly, we need to do the following to implement testing:

  • supply test XML articles along with the repo
  • create a hook or otherwise toggle bot parameters to use test endpoint(s), etc.
    • specifically, point to test Wikipedia for everything (this should just work, especially for pywikibot)
  • include fixture data (especially for deque)
  • write tests
  • test locally
  • also use Travis CI (https://travis-ci.org/)
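The "toggle bot parameters" step could be a single switch that decides which site the bot talks to. This sketch assumes a hypothetical environment variable `RECITATION_BOT_TEST`, which the bot does not currently read. The result is a `(code, family)` pair of the kind accepted by `pywikibot.Site(code, fam)`. In pywikibot terms, test.wikipedia.org is code `test` in family `wikipedia`.

```python
import os

# Hypothetical switch: the bot does not currently read this variable.
TEST_FLAG = "RECITATION_BOT_TEST"

PRODUCTION_SITES = {
    "wikisource": ("en", "wikisource"),
    "commons": ("commons", "commons"),
}
TEST_SITE = ("test", "wikipedia")  # test.wikipedia.org


def site_args(target, environ=os.environ):
    """Return the (code, family) pair to pass to pywikibot.Site for `target`."""
    if environ.get(TEST_FLAG) == "1":
        return TEST_SITE  # point everything at test Wikipedia
    return PRODUCTION_SITES[target]


print(site_args("commons", {}))                       # ('commons', 'commons')
print(site_args("commons", {"RECITATION_BOT_TEST": "1"}))  # ('test', 'wikipedia')
```

In the bot, this would be used as `pywikibot.Site(*site_args("commons"))`. A CI run would then only need to set the variable.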

Force re-upload of images for equations/tables

Once the upload of images for equations/tables to Wikisource works, we will need another checkbox in the web form with the option to force re-upload of these, perhaps separately for tables and figures.

Equation uploads to Wikisource should go into their own category

Files like the one at
https://en.wikisource.org/w/index.php?title=File:Neurobiological-Models-of-Two-Choice-Decision-Making-Can-Be-Reduced-to-a-One-Dimensional-Nonlinear-pcbi.1000046.e061.jpg&oldid=5068812
have multiple categories assigned to them, none of which is particularly helpful.

I thus propose to drop these article-level keywords entirely for equation images and to put them into maintenance categories instead, along the lines of
https://en.wikisource.org/wiki/Category:Equations_uploaded_with_reCitation_Bot
and
https://en.wikisource.org/wiki/Category:Equations_uploaded_with_reCitation_Bot_and_needing_category_review
.

Note that the current category names use a different spelling for the bot than the bot's user name suggests.
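This sketch shows how the proposed rule could look when categories are chosen for an upload. It assumes a hypothetical `kind` label per file. It also assumes equation images should go into both maintenance categories named above, which the issue leaves open. Category names keep their current "reCitation Bot" spelling.

```python
# Maintenance categories named in the issue (note the "reCitation Bot" spelling).
EQUATION_CATEGORY = "Equations uploaded with reCitation Bot"
EQUATION_REVIEW_CATEGORY = (
    "Equations uploaded with reCitation Bot and needing category review"
)


def categories_for_upload(kind, article_keywords):
    """Pick categories for a file uploaded to Wikisource.

    Equation images get only maintenance categories; other files keep the
    article-level keywords.
    """
    if kind == "equation":
        return [EQUATION_CATEGORY, EQUATION_REVIEW_CATEGORY]
    return list(article_keywords)


def category_wikitext(categories):
    """Render category links, one per line."""
    return "\n".join("[[Category:%s]]" % name for name in categories)


print(category_wikitext(categories_for_upload("equation", ["Neuroscience"])))
```

If the bot's user-name spelling is adopted later, only the two constants need to change.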

DOI upload is broken

Not sure what the problem is precisely (cf. #35), but the last ca. 10 upload attempts all went nowhere.

pywikibot config required for tests

We need to provide a pywikibot config for testing purposes. Without one, test collection fails:

=========================================================== ERRORS ============================================================
_______________________________________ ERROR collecting tests/test_journal_article.py ________________________________________
tests/test_journal_article.py:1: in <module>
>   from recitationbot import journal_article
recitationbot/journal_article.py:11: in <module>
>   import pywikibot
env/local/lib/python2.7/site-packages/pywikibot-2.0b1-py2.7.egg/pywikibot/__init__.py:30: in <module>
>   from pywikibot import config2 as config
env/local/lib/python2.7/site-packages/pywikibot-2.0b1-py2.7.egg/pywikibot/config2.py:162: in <module>
>   _base_dir = _get_base_dir()
env/local/lib/python2.7/site-packages/pywikibot-2.0b1-py2.7.egg/pywikibot/config2.py:158: in _get_base_dir
>           raise RuntimeError(exc_text)
E           RuntimeError: No user-config.py found in directory '/home/wrought/.pywikibot'.
E             Please check that user-config.py is stored in the correct location.
E             Directory where user-config.py is searched is determined as follows:
E           
E               Return the directory in which user-specific information is stored.
E           
E               This is determined in the following order -
E               1.  If the script was called with a -dir: argument, use the directory
E                   provided in this argument
E               2.  If the user has a PYWIKIBOT2_DIR environment variable, use the value
E                   of it
E               3.  Use (and if necessary create) a 'pywikibot' folder under
E                   'Application Data' or 'AppData\Roaming' (Windows) or
E                   '.pywikibot' directory (Unix and similar) under the user's home
E                   directory.
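As the traceback explains, pywikibot looks for `user-config.py` in the directory named by `PYWIKIBOT2_DIR`. So tests can write a throwaway config and set that variable before `pywikibot` is first imported, e.g. at the top of a pytest `conftest.py`. This is a sketch: the username is a placeholder. Later pywikibot releases read `PYWIKIBOT_DIR` instead, so both are set.

```python
import os
import tempfile

# Minimal user-config.py pointing at test Wikipedia; the username is a placeholder.
USER_CONFIG = """\
family = 'wikipedia'
mylang = 'test'
usernames['wikipedia']['test'] = 'ExampleTestBot'
"""


def make_test_config_dir():
    """Write a throwaway user-config.py and point pywikibot at it.

    Must run before `import pywikibot`, because pywikibot's config module
    looks for the file at import time.
    """
    config_dir = tempfile.mkdtemp(prefix="pywikibot-test-")
    with open(os.path.join(config_dir, "user-config.py"), "w") as handle:
        handle.write(USER_CONFIG)
    os.environ["PYWIKIBOT2_DIR"] = config_dir  # name used by pywikibot 2.0
    os.environ["PYWIKIBOT_DIR"] = config_dir   # name used by later releases
    return config_dir
```

Calling `make_test_config_dir()` at module level in `tests/conftest.py` runs before pytest imports `tests/test_journal_article.py`. That should avoid the `RuntimeError`.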

Public Data Formats

It makes sense to generate a comprehensive link dump based on the BEACON format: https://de.wikipedia.org/wiki/Wikipedia:BEACON/Format#Daten-Zeilen
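A BEACON dump is plain text: `#`-prefixed header fields followed by one data line per identifier. This sketch writes a `#FORMAT` and a `#PREFIX` header and then one `DOI|count` line per DOI. It assumes the simple identifier-plus-count data-line form; the exact field rules should be checked against the linked format page. The input mapping (DOI to number of citing Wikipedia articles) is hypothetical.

```python
def beacon_dump(citation_counts, prefix="https://doi.org/"):
    """Serialize {doi: count} as BEACON text with one 'DOI|count' line per DOI."""
    lines = ["#FORMAT: BEACON", "#PREFIX: " + prefix]
    for doi in sorted(citation_counts):
        lines.append("%s|%d" % (doi, citation_counts[doi]))
    return "\n".join(lines) + "\n"


print(beacon_dump({"10.1234/example-b": 1, "10.1234/example-a": 3}), end="")
```

Sorting the DOIs keeps successive dumps diff-friendly.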

However, we need to consider some more use cases for export formats, namely:

  • For a given DOI, which Wikipedia articles cite it? (And what's the total number of citations for this DOI?)

For this, and probably other use cases, we just want to serve JSON from a public URL endpoint. This should be straightforward with Python.
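A minimal sketch of such an endpoint as a plain WSGI application using only the standard library. `CITING_ARTICLES` is a hypothetical in-memory index with placeholder data; the bot would need to fill it from wherever it stores citations. "total" here counts citing articles, which may differ from the number of individual citations if an article cites a DOI more than once.

```python
import json
from urllib.parse import parse_qs

# Hypothetical index with placeholder data: DOI -> titles of citing Wikipedia articles.
CITING_ARTICLES = {
    "10.1234/example": ["Example article A", "Example article B"],
}


def app(environ, start_response):
    """WSGI app answering GET /?doi=<doi> with the citing articles as JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    doi = query.get("doi", [""])[0]
    if not doi:
        status, payload = "400 Bad Request", {"error": "missing doi parameter"}
    else:
        articles = CITING_ARTICLES.get(doi, [])
        status = "200 OK"
        payload = {"doi": doi, "articles": articles, "total": len(articles)}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]
```

For a local try-out, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` serves it at `http://localhost:8000/?doi=10.1234/example`. In production the same callable could run under any WSGI server.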
