The catalog from comses

implementation

What base platform/technology should we use to develop the web tools?

Drupal
Django / Flask / something Python
Java/Scala/Groovy
Something else associated with the diging devo evo group

Other language app frameworks would need to use https://drupal.org/project/rest_auth or something similar to auth with and delegate user / permissions management to Drupal

Other resources:

autocomplete empty string error

Exception

TypeError: reduce() of empty sequence with no initial value
(3 additional frame(s) were not displayed)
...
  File "rest_framework/views.py", line 451, in dispatch
    response = self.handle_exception(exc)
  File "rest_framework/views.py", line 448, in dispatch
    response = handler(request, *args, **kwargs)
  File "catalog/core/views.py", line 271, in get
    sqs = SearchQuerySet().autocomplete(name=request.GET.get('q', '')).models(Sponsor)
  File "haystack/query.py", line 463, in autocomplete
    return clone.filter(six.moves.reduce(operator.__and__, query_bits))
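
The traceback points at reduce() being handed an empty sequence when q is blank. A minimal sketch of a guard in the view, assuming that returning no suggestions for a blank query is acceptable. The name autocomplete_or_empty and the run_autocomplete callable are hypothetical; the callable stands in for the SearchQuerySet().autocomplete(...) call so the example runs without Haystack.

```python
def autocomplete_or_empty(raw_query, run_autocomplete):
    """Return autocomplete results, or an empty list for a blank query.

    `run_autocomplete` is a hypothetical stand-in for the real search call;
    it is only invoked when there is something to search for.
    """
    query = (raw_query or "").strip()
    if not query:
        # Nothing to search for, so skip the search backend entirely.
        return []
    return run_autocomplete(query)
```

In the view this would wrap request.GET.get('q', ''), so the search backend is never reached with an empty string.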

show audit log, who did what

status changes
notes added
fields modified

curator form should display context-specific validation errors

When an invalid URL or email address is entered the form just says "Something went wrong, please verify form data". Instead it should display the DRF error message at the form input field location.

https://catalog.comses.net/publication/2027/

support citation graphs

design a data model for a given publication's citations to create citation graphs
consider graph database like neo4j or orientdb for the citation graph
get seed data from web of science results that marco downloads as an initial import, consider importing publication data directly from web of science (if possible)

assigned curator info

list assigned curators and recent actions taken
show assigned curator in publication detail page

management command to generate one-off data out file

Create a django management command to generate a data out file of publications with the following data:

Publication Year, Lead Author, Publication Title, Journal Name, codeurl, docs, platform, sponsor1, sponsor2, ..., sponsorN

Note that for the multi valued sponsor field we'll need to keep track of the max number of sponsors N that we find in the data and create N sponsor fields for them. They will be mostly empty for those publications that only have a single sponsor for example.
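
A rough sketch of the sponsor-padding idea, independent of Django so it can be tried in isolation. It assumes each publication has already been reduced to a dict holding the fixed columns plus a sponsors list; that shape, and the name write_publications_csv, are made up for illustration and are not the catalog's actual model API.

```python
import csv
import io

FIXED_COLUMNS = ["Publication Year", "Lead Author", "Publication Title",
                 "Journal Name", "codeurl", "docs", "platform"]


def write_publications_csv(publications, out):
    """Write one row per publication, padding sponsors to the widest row.

    `publications` is a list of dicts with the fixed columns plus a
    `sponsors` list (hypothetical shape, used only in this sketch).
    Returns the header row that was written.
    """
    # First pass: find N, the largest number of sponsors on any publication.
    max_sponsors = max((len(p["sponsors"]) for p in publications), default=0)
    header = FIXED_COLUMNS + ["sponsor%d" % i for i in range(1, max_sponsors + 1)]
    writer = csv.writer(out)
    writer.writerow(header)
    # Second pass: emit rows, filling unused sponsor columns with blanks.
    for pub in publications:
        sponsors = list(pub["sponsors"])
        padding = [""] * (max_sponsors - len(sponsors))
        writer.writerow([pub.get(col, "") for col in FIXED_COLUMNS] + sponsors + padding)
    return header
```

Inside a management command the same function could be called from handle() with an open file instead of an io.StringIO buffer.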

zotero import: add api key Authorization header

If settings.ZOTERO_API_KEY is not None, add Authorization: Bearer <settings.ZOTERO_API_KEY> to the http header on each request
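
Something along these lines would do it. The helper name zotero_headers is hypothetical, and the key is passed in as an argument here so the sketch runs without Django settings; in the import command it would be settings.ZOTERO_API_KEY.

```python
def zotero_headers(api_key=None):
    """Build the HTTP headers for a Zotero request.

    The Authorization header is only added when a key is configured, so
    public group libraries keep working without one.
    """
    headers = {}
    if api_key is not None:
        headers["Authorization"] = "Bearer %s" % api_key
    return headers
```

The returned dict can then be passed as the headers of each request the import makes.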

zotero import dies on bad input

I'm not sure if the collection syntax is correct. Another issue is that the tool says it is pulling 614 publications, but it should only be pulling 544 from https://www.zotero.org/groups/workbench-cml/items/collectionKey/7DQ82DZ3

alllee% ./manage.py zotero_import --group=289063 --collection=7DQ82DZ3
Starting to import data from Zotero. Hang tight, this may take a while.
Number of Publications to import: 614
Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/home/alllee/.virtualenvs/catalog/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 338, in execute_from_command_line
    utility.execute()
  File "/home/alllee/.virtualenvs/catalog/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 330, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/alllee/.virtualenvs/catalog/local/lib/python2.7/site-packages/django/core/management/base.py", line 390, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/alllee/.virtualenvs/catalog/local/lib/python2.7/site-packages/django/core/management/base.py", line 441, in execute
    output = self.handle(*args, **options)
  File "/home/alllee/work/comses/catalog/catalog/core/management/commands/zotero_import.py", line 235, in handle
    self.generate_entry(json_data)
  File "/home/alllee/work/comses/catalog/catalog/core/management/commands/zotero_import.py", line 214, in generate_entry
    note = self.create_note(item['data'], item['meta'])
  File "/home/alllee/work/comses/catalog/catalog/core/management/commands/zotero_import.py", line 197, in create_note
    item.added_by = self.get_user(meta)
  File "/home/alllee/work/comses/catalog/catalog/core/management/commands/zotero_import.py", line 50, in get_user
    first_name, last_name = meta_data['createdByUser']['name'].strip().split(' ')
ValueError: need more than 1 value to unpack
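
The ValueError means the createdByUser name had no space in it, so split(' ') returned a single element. A possible replacement for the unpacking in get_user, sketched as a standalone function in Python 3 syntax. The name split_name is hypothetical, and putting every word after the first into the last name is an assumption, not something the importer currently defines.

```python
def split_name(full_name):
    """Split a display name into (first_name, last_name) without raising.

    One-word names (e.g. a bare username) get an empty last name; any
    words after the first are kept together as the last name.
    """
    parts = (full_name or "").split()
    if not parts:
        return "", ""
    if len(parts) == 1:
        return parts[0], ""
    return parts[0], " ".join(parts[1:])
```

With this, get_user could call split_name(meta_data['createdByUser']['name']) and handle the empty last name however the User lookup needs.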

connect to zotero.org group library

https://www.zotero.org/groups/comses-cml

curator workflow

Followups:

refactor binding between Django models and django rest framework serializers in views.py
sort curator workflow untagged publications by Journal
add corresponding author name
add scholar.google.com URL link from curator detail
add status for irrelevant or N/A when the thing is not a model

zotero import failing

Try running ./manage.py zotero_import --group=289063 --collection=7DQ82DZ3

It first fails because lxml.html cannot be found, which is easily fixed by changing the import to: from lxml import html

It then fails on what appear to be empty notes

add AORML to visual relationships category

Work on syncing data between Zotero and local Database

invalid contact_author_name in modified_data_text

modified data isn't working properly for contact_author_name: see https://catalog.comses.net/publication/633/curate/

django haystack not compatible with solr > 5.0

currently pinned at 4.10.4, upgrade when django-haystack releases support for solr 5

deploy: uwsgi not getting restarted properly by supervisor

restarting supervisord causes uwsgi to break horribly until a manual kill -9 on the parent process is issued. Need to fix so fab deploy will work properly

data cleaning issues

Sponsor has fields like Ministry de of Ecology, Energy, Sustainable Development and Sea and

ACA Challenge Grants in Biodiversity, Foothills Research Institute-Chisholm-Dogrib Fire initiative grants, Sundre Forest Products LTD., Canon National Parks Science Scholarship for the Americas, University of Alberta and Parks Canada, NSERC

or

Alberta Innovates Technology Futures, Portland State University, Arizona State University, Uppsala University, and University of Cincinnati.

We can't always split by , here because in some cases the comma is separating multiple values and in others it is part of a singular name.

Travis reporting 404 not found error for solr

Failed to add documents to Solr: [Reason: Error 404 Not Found]
Try this: http://localhost:8983/solr/admin/cores

URLize code url if present

zotero import not setting status as UNTAGGED

--collection=3HM4CEW5

add a way for assigned curator to view past work

~~replace SearchView usage with generic SearchView, see http://django-haystack.readthedocs.org/en/latest/views_and_forms.html#new-django-class-based-views~~
~~push urls.py search view config into views.py~~
facet publications by status

Database Model

determine how to store and organize data imported from zotero.

create invitation email template model

name
text
date created
last modified
creator

upgrade djangorestframework pagination api

Looks like django rest framework 3.1 has backwards incompatible API changes to pagination:

http://www.django-rest-framework.org/api-guide/pagination/#pagination

add some basic dashboard data

list of N most recently edited items with links to the items
N most recently author edited items, e.g., author updated codeurl
Out of N publications in our catalog, M publications have a valid code URL. Provide link to the search filter that will pull up all of the publications missing a valid code URL.

Develop a wizard to send out email invites to authors of the publications.

http://vadimg.com/twitter-bootstrap-wizard-example/
http://www.panopta.com/static/bootstrap-wizard-plugin/demo/demo.html
http://ct-freebies.herokuapp.com/wizard-demo-register?

sentry not reporting errors on prod

add creator to displayed notes

Tags in Note Class

As the curators are already attaching notes via the Note class, we can use it for curators' comments, either by merging notes into Publication instead of keeping a separate class, or at least by getting rid of the tags field in the Note class, since AFAIK we won't be using it for anything.

fix unicode decoding issues with Sponsor and other modified fields

Unicode input in Sponsor field, e.g., Conacyt México in Publication 2019 causing PublicationSerializer.modified_data_text to croak

dynamic geographical visualization

It would be cool if we could create an interactive visualization that correlates publications with geographical locations and highlights which areas of the world are archiving their models, searchable by things like

Funding agency (sponsor)
Journal
might be more, something to discuss with Marco

consider replacing KO with something better

zotero import: pull by collection

Look into integrating pyzotero to manage access to zotero. We need to pull by collection and use API keys, this looks like a fairly clean API to zotero.

http://pyzotero.readthedocs.org/en/latest/#retrieving-collections

flatten Publication fields

remove inheritance, merge Book/Thesis/Report/JournalArticle fields into Publication and add a type field to distinguish between them

Search and Indexing

Using Django-haystack
http://django-haystack.readthedocs.org/en/latest/tutorial.html

Sample application
http://www.alexanderinteractive.com/blog/2012/08/getting-started-with-solr-and-django/
https://github.com/broderboy/django-solr-demo

google scholar link

add author names as additional filter using author: lastname

curator search: add search and ordering filter by Journal

add autocompletion to the search box

create workflow between metadata curator & author

candidate set of publications (pulled in via zotero import, harvested from other online aggregation feeds or direct sources, manual creation)
partition publications into sets (status = complete, pending, new, ...)
metadata curator selects publication, fills in some fields (at minimum verify contact author) and then clicks "preview/send email button" to send a templated email request to the author(s)

add asu library publication search

flag for review

add simple button to flag pub for further review

refactor django rest framework serializers

refactor the binding between Django models and django rest framework serializers

can probably use inheritance or mixins to reduce duplication in PublicationSerializers
see if we can manage the model to serializer bindings with less duplication

fix pagination on search results

search results not displaying pagination

Work on syncing data between database and search index

Signal processors might be the way to go about it.
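
For reference, django-haystack selects the signal processor through a single setting. The fragment below enables the realtime processor that ships with Haystack, which updates the index on every model save and delete. Whether realtime updates are acceptable for this catalog's write volume is an open question, so treat this as a starting point.

```python
# settings.py fragment: have Haystack update the search index whenever an
# indexed model is saved or deleted, instead of relying on periodic
# ./manage.py update_index runs.
HAYSTACK_SIGNAL_PROCESSOR = "haystack.signals.RealtimeSignalProcessor"
```

If per-save indexing turns out to be too slow, a custom subclass of haystack.signals.BaseSignalProcessor can be named here instead.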

add visual status indicator for deleted notes

strikethrough or something similar

simplified form workflow

Adjust the dashboard to show publications assigned to you (Publication.assigned_curator). Should provide a link that takes the user to a search filter page that filters by assigned curator AND Publication.Status.UNTAGGED
Add simplified form for entering the following data:

Sponsor
Docs
Code URL
Contact email
Platform
Tags
Status change buttons (mark as incomplete, mark as completed, flag for further review)

Change search lists to link to the short form, and have a button to edit all publication details in that short form.

Add publication info to the exceptions in zotero_import

Along with the exception getting logged add the publication information like publication title, id etc.
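
One way this could look, sketched outside the management command so it runs on its own. The item['data'] layout follows the traceback in the import issue; the key and title field names, the import_items name, and the generate_entry callable are assumptions for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def import_items(items, generate_entry):
    """Import each item, logging the publication's identity when one fails.

    Returns the keys of the items that failed so the caller can report them
    instead of the whole import dying on the first bad record.
    """
    failed = []
    for item in items:
        data = item.get("data", {})
        try:
            generate_entry(item)
        except Exception:
            # logger.exception records the traceback along with the message.
            logger.exception("Failed to import publication key=%s title=%r",
                             data.get("key"), data.get("title"))
            failed.append(data.get("key"))
    return failed
```

This keeps the traceback in the log while tying it to a specific publication, and lets the rest of the collection import continue.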

saving invalid publications returns 400 bad request

ids: 1081 1082 1083 1251
