catalog's Issues

implementation

What base platform/technology should we use to develop the web tools?

  • Drupal
  • Django / Flask / something Python
  • Java/Scala/Groovy
  • Something else associated with the diging devo evo group

Application frameworks in other languages would need to use https://drupal.org/project/rest_auth or something similar to authenticate against Drupal and delegate user and permissions management to it.

Other resources:

autocomplete empty string error

Exception

TypeError: reduce() of empty sequence with no initial value
(3 additional frame(s) were not displayed)
...
  File "rest_framework/views.py", line 451, in dispatch
    response = self.handle_exception(exc)
  File "rest_framework/views.py", line 448, in dispatch
    response = handler(request, *args, **kwargs)
  File "catalog/core/views.py", line 271, in get
    sqs = SearchQuerySet().autocomplete(name=request.GET.get('q', '')).models(Sponsor)
  File "haystack/query.py", line 463, in autocomplete
    return clone.filter(six.moves.reduce(operator.__and__, query_bits))
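
The traceback suggests that reduce() receives an empty list of query bits when q is an empty string. A minimal sketch of one possible guard follows, in plain Python without haystack. The helper names combine_query_bits and autocomplete_terms are hypothetical and not part of haystack or this codebase; the idea is only that the view could return an empty result early instead of calling autocomplete() with no terms.

```python
import operator
from functools import reduce


def autocomplete_terms(raw_query):
    """Split a raw ?q= value into non-empty terms (hypothetical helper)."""
    return [term for term in raw_query.split() if term]


def combine_query_bits(query_bits):
    """AND the query fragments together (hypothetical helper).

    Returns None when there is nothing to combine, because
    reduce() on an empty sequence with no initial value raises TypeError.
    """
    if not query_bits:
        return None
    return reduce(operator.and_, query_bits)
```

With a guard like this, the view can check for an empty term list and respond with no suggestions before touching the search backend.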

support citation graphs

  • design a data model for a given publication's citations to create citation graphs
  • consider a graph database like Neo4j or OrientDB for the citation graph
  • get seed data from the Web of Science results that Marco downloads as an initial import; consider importing publication data directly from Web of Science (if possible)

assigned curator info

  • list assigned curators and recent actions taken
  • show assigned curator in publication detail page

management command to generate one-off data out file

Create a django management command to generate a data out file of publications with the following data:

Publication Year, Lead Author, Publication Title, Journal Name, codeurl, docs, platform, sponsor1, sponsor2, ..., sponsorN

Note that for the multi-valued sponsor field we'll need to keep track of the maximum number of sponsors N that we find in the data and create N sponsor columns. Most of these columns will be empty for publications with fewer sponsors, for example those with only a single sponsor.
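
A minimal sketch of the column padding described above, assuming Python 3 and publications represented as plain dicts with a "sponsors" list. Both the dict shape and the function names are assumptions for illustration; the real command would read from the Publication model.

```python
import csv

BASE_FIELDS = ["Publication Year", "Lead Author", "Publication Title",
               "Journal Name", "codeurl", "docs", "platform"]


def build_rows(publications):
    """Return a header row plus one row per publication, padded to N sponsors."""
    # N is the largest sponsor count seen in the data (0 for no publications).
    max_sponsors = max((len(p.get("sponsors", [])) for p in publications),
                       default=0)
    header = BASE_FIELDS + ["sponsor%d" % i for i in range(1, max_sponsors + 1)]
    rows = [header]
    for p in publications:
        sponsors = list(p.get("sponsors", []))
        # Pad with empty strings so every row has the same number of columns.
        padding = [""] * (max_sponsors - len(sponsors))
        rows.append([p.get(field, "") for field in BASE_FIELDS] + sponsors + padding)
    return rows


def write_csv(publications, stream):
    """Write the padded rows to any file-like object."""
    csv.writer(stream).writerows(build_rows(publications))
```

Computing N in a first pass keeps the writer simple, at the cost of holding the publication list in memory, which should be fine for a one-off export.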

zotero import dies on bad input

I'm not sure if the collection syntax is correct - another issue is that the tool says it is pulling 614 publications but it should only be pulling 544 from https://www.zotero.org/groups/workbench-cml/items/collectionKey/7DQ82DZ3

alllee% ./manage.py zotero_import --group=289063 --collection=7DQ82DZ3
Starting to import data from Zotero. Hang tight, this may take a while.
Number of Publications to import: 614
Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/home/alllee/.virtualenvs/catalog/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 338, in execute_from_command_line
    utility.execute()
  File "/home/alllee/.virtualenvs/catalog/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 330, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/alllee/.virtualenvs/catalog/local/lib/python2.7/site-packages/django/core/management/base.py", line 390, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/alllee/.virtualenvs/catalog/local/lib/python2.7/site-packages/django/core/management/base.py", line 441, in execute
    output = self.handle(*args, **options)
  File "/home/alllee/work/comses/catalog/catalog/core/management/commands/zotero_import.py", line 235, in handle
    self.generate_entry(json_data)
  File "/home/alllee/work/comses/catalog/catalog/core/management/commands/zotero_import.py", line 214, in generate_entry
    note = self.create_note(item['data'], item['meta'])
  File "/home/alllee/work/comses/catalog/catalog/core/management/commands/zotero_import.py", line 197, in create_note
    item.added_by = self.get_user(meta)
  File "/home/alllee/work/comses/catalog/catalog/core/management/commands/zotero_import.py", line 50, in get_user
    first_name, last_name = meta_data['createdByUser']['name'].strip().split(' ')
ValueError: need more than 1 value to unpack
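
The ValueError comes from unpacking the result of split(' ') into exactly two names, which fails when a Zotero display name is a single word (and would also fail for three or more words). A minimal sketch of a more tolerant split follows; split_display_name is a hypothetical helper, not the existing get_user code.

```python
def split_display_name(name):
    """Split a display name into (first_name, last_name) without raising.

    Single-word names get an empty last name; everything after the
    first word is treated as the last name.
    """
    parts = name.strip().split(None, 1)  # split on any whitespace, at most once
    if not parts:
        return "", ""
    if len(parts) == 1:
        return parts[0], ""
    return parts[0], parts[1]
```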

curator workflow

Followups:

  • refactor binding between Django models and django rest framework serializers in views.py
  • sort curator workflow untagged publications by Journal
  • add corresponding author name
  • add scholar.google.com URL link from curator detail
  • add status for irrelevant or N/A when the thing is not a model

zotero import failing

Try running ./manage.py zotero_import --group=289063 --collection=7DQ82DZ3

It first fails because lxml.html cannot be found, which is easily fixed by changing the import to: from lxml import html

It then fails on what appear to be empty notes.

data cleaning issues

Sponsor has fields like Ministry de of Ecology, Energy, Sustainable Development and Sea and

ACA Challenge Grants in Biodiversity, Foothills Research Institute-Chisholm-Dogrib Fire initiative grants, Sundre Forest Products LTD., Canon National Parks Science Scholarship for the Americas, University of Alberta and Parks Canada, NSERC

or

Alberta Innovates Technology Futures, Portland State University, Arizona State University, Uppsala University, and University of Cincinnati.

We can't always split on commas here, because in some cases the comma separates multiple values and in others it is part of a single name.
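
One possible approach, sketched below under the assumption that curators can maintain a list of known sponsor names that contain commas: protect those names with placeholders, split the rest on commas, then restore them. The function name and the known_names list are hypothetical, and this does not resolve ambiguous "and" joins or trailing periods.

```python
def split_sponsors(raw, known_names):
    """Split a raw sponsor string on commas, keeping known names intact."""
    protected = raw
    restore = {}
    # Replace longer names first so overlapping shorter names do not break them.
    for index, name in enumerate(sorted(known_names, key=len, reverse=True)):
        token = "\x00{}\x00".format(index)
        if name in protected:
            protected = protected.replace(name, token)
            restore[token] = name
    sponsors = []
    for piece in protected.split(","):
        piece = piece.strip()
        # Drop the "and" that precedes the last item of a comma-separated list.
        if piece.startswith("and "):
            piece = piece[4:].strip()
        for token, name in restore.items():
            piece = piece.replace(token, name)
        if piece:
            sponsors.append(piece)
    return sponsors
```

Anything not covered by the known-names list would still need manual review by a curator.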

Database Model

determine how to store and organize data imported from zotero.

add some basic dashboard data

  • list of N most recently edited items with links to the items
  • N most recently author edited items, e.g., author updated codeurl
  • Out of N publications in our catalog, M publications have a valid code URL. Provide link to the search filter that will pull up all of the publications missing a valid code URL.

Tags in Note Class

As the curators are already attaching notes via the Note class, we can use it for curator comments, either by merging Note into Publication instead of keeping a separate class, or at least by removing the tags field from the Note class since, AFAIK, we won't be using it for anything.

dynamic geographical visualization

It would be cool if we could create an interactive visualization that correlates publications with geographical locations and highlights which areas of the world are archiving their models, searchable by things like

  • Funding agency (sponsor)
  • Journal
  • might be more, something to discuss with Marco

flatten Publication fields

remove inheritance, merge Book/Thesis/Report/JournalArticle fields into Publication and add a type field to distinguish between them

create workflow between metadata curator & author

  • candidate set of publications (pulled in via zotero import, harvested from other online aggregation feeds or direct sources, manual creation)
  • partition publications into sets (status = complete, pending, new, ...)
  • metadata curator selects publication, fills in some fields (at minimum verify contact author) and then clicks "preview/send email button" to send a templated email request to the author(s)

refactor django rest framework serializers

refactor the binding between Django models and django rest framework serializers

  • can probably use inheritance or mixins to reduce duplication in PublicationSerializers
  • see if we can manage the model-to-serializer bindings with less duplication

simplified form workflow

  1. Adjust the dashboard to show publications assigned to you (Publication.assigned_curator). Should provide a link that takes the user to a search filter page that filters by assigned curator AND Publication.Status.UNTAGGED
  2. Add simplified form for entering the following data:
  • Sponsor
  • Docs
  • Code URL
  • Contact email
  • Platform
  • Tags
  • Status change buttons (mark as incomplete, mark as completed, flag for further review)

Change search lists to link to the short form, and have a button to edit all publication details in that short form.
