
aztecretrieval's People

Contributors

akhahaha, singularity9971, kevinsheu, bleakley, vincekyi, dependabot[bot]

Watchers

Giuseppe M. Mazzeo, James Cloos, Patrick Tan, Wei-Ting Chen, MahatiKay, Ravi Jayanthi, Allison Ko, defibull, Yichao Zhou

Forkers

peichao

aztecretrieval's Issues

Document Code

Make sure your scripts are well documented (comments and function explanations). Documentation will be stored in the wiki of the GitHub repo.

Fix Tag bug

Some of the tags are stored as a single string that looks like "[tag1, tag2, tag3]". Parse each such string so that the tags are separated into an array: ['tag1', 'tag2', 'tag3'].
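A minimal sketch of the parsing step, assuming the tags are plain comma-separated strings with optional surrounding brackets and quotes. The function name `parse_tags` is hypothetical and not taken from the repository.

```python
def parse_tags(raw):
    """Split a bracketed tag string such as "[tag1, tag2, tag3]" into a list."""
    if isinstance(raw, list):
        # Already a list: only tidy up each entry.
        return [str(tag).strip() for tag in raw if str(tag).strip()]

    text = raw.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]  # drop the surrounding brackets

    tags = []
    for part in text.split(","):
        # Remove whitespace and any straight quotes around the tag.
        tag = part.strip().strip("'\"").strip()
        if tag:
            tags.append(tag)
    return tags
```

Tags that themselves contain commas would be split incorrectly by this approach, so it only fits data where that cannot happen.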

Use GitHub API

Use the GitHub API to extract programmatic information about the tool.
Metadata includes: maintainers (name, GitHub username, email), programming language, version (version number and date), license, and number of forks/pulls.
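As a rough illustration, the sketch below picks a few of these fields out of a repository object as returned by the GitHub REST API (`GET /repos/{owner}/{repo}`). It assumes the JSON has already been fetched and decoded into a dict; the function name `summarize_repo` and the output keys are hypothetical. Version, maintainer email, and pull counts are not part of that single response and would need further requests, so they are left out here.

```python
def summarize_repo(repo):
    """Pick a few fields of interest out of a GitHub repository JSON object."""
    # "license" and "owner" can be null in the API response, hence the "or {}".
    license_info = repo.get("license") or {}
    owner = repo.get("owner") or {}
    return {
        "owner": owner.get("login"),
        "language": repo.get("language"),
        "license": license_info.get("spdx_id"),
        "forks": repo.get("forks_count"),
    }
```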

Input extracted data into Solr

Write a new script that takes a JSON file containing the extracted publication data and pushes that data into the Solr database.
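One possible shape for such a script, as a sketch only. It assumes the JSON file holds either one document or a list of documents, and that Solr's JSON update handler is reachable at `<solr_url>/<core>/update`. The helper names, the base URL, and the core name are placeholders, not values from this project.

```python
import json
import urllib.request


def load_publications(path):
    """Read the extracted data and always return a list of documents."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return data if isinstance(data, list) else [data]


def build_solr_update(solr_url, core, docs):
    """Build (but do not send) a POST request for Solr's JSON update handler."""
    url = f"{solr_url.rstrip('/')}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a matter of passing the request to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_solr_update("http://localhost:8983/solr", "tools", docs))`, where the URL and core name are again assumptions.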

Create All-in-one script for pipeline

Create an all-in-one script that downloads the PDFs, extracts metadata from each PDF using GROBID, enriches it using APIs, and inserts it into Solr. Each component should be modularized: (1) get papers from the journal (downloading PDFs if needed), (2) classify the publication, (3) extract metadata from the PDF and enrich it, (4) insert the metadata into Solr.
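The modular layout could be sketched as a driver that receives each stage as a callable, so that stages can be swapped or tested separately. Everything here is an assumption for illustration: the name `run_pipeline`, the stage signatures, and in particular the idea that classification returns a boolean deciding whether a paper is kept.

```python
def run_pipeline(journal, fetch, classify, extract, insert):
    """Run the four stages in order and return the records that were inserted."""
    inserted = []
    for paper in fetch(journal):      # 1. get papers (download PDFs if needed)
        if not classify(paper):       # 2. classify; skip papers that do not qualify
            continue
        record = extract(paper)       # 3. extract metadata and enrich it
        insert(record)                # 4. insert the metadata into Solr
        inserted.append(record)
    return inserted
```

Passing the stages in as arguments keeps the driver free of any knowledge about GROBID or Solr, which live entirely in the stage functions.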

Parse Funding

Grobid extracts the funding information, but only as whole sentences. Parse each sentence to get the agency and the grant number.
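A deliberately simple sketch of one way to approach this, assuming agencies can be matched against a known list and that grant numbers are upper-case alphanumeric tokens containing both letters and digits. The agency list, the token pattern, and the function name `parse_funding` are all illustrative assumptions. Real funding sentences vary widely and would need a more robust approach.

```python
import re

# Illustrative list only; a real list would be much longer.
KNOWN_AGENCIES = (
    "National Institutes of Health",
    "National Science Foundation",
)

# Upper-case letters and digits, optionally joined by hyphens.
TOKEN = re.compile(r"\b[A-Z0-9]+(?:-[A-Z0-9]+)*\b")


def parse_funding(sentence):
    """Return the agencies and grant-like tokens found in one funding sentence."""
    agencies = [name for name in KNOWN_AGENCIES if name in sentence]
    grants = [
        token
        for token in TOKEN.findall(sentence)
        # Keep only tokens that mix letters and digits, as grant numbers often do.
        if any(ch.isdigit() for ch in token) and any(ch.isalpha() for ch in token)
    ]
    return {"agencies": agencies, "grants": grants}
```

For instance, `parse_funding("Supported by the National Science Foundation under grant ABC-1234567.")` uses a made-up grant number and yields one agency and one grant token.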

Set Up Environment

As a developer, I would like to set up this project so that I can run and test it locally.

Add Citation Metrics to Tools

Using the CrossRef API (and/or Altmetrics), retrieve the number of citations for each tool that has a DOI and boost its ranking accordingly in Solr.
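A small sketch of the two pieces involved, assuming the response of `https://api.crossref.org/works/<DOI>` has already been fetched and decoded. The `is-referenced-by-count` field comes from the CrossRef works response. The logarithmic boost formula and both function names are assumptions chosen for illustration, not the project's ranking rule.

```python
import math


def citation_count(crossref_response):
    """Read the citation count out of a decoded CrossRef works response."""
    message = crossref_response.get("message") or {}
    return int(message.get("is-referenced-by-count") or 0)


def citation_boost(count):
    """Map a citation count to a multiplicative boost that grows slowly."""
    # log10 dampens the effect so heavily cited tools do not swamp relevance.
    return 1.0 + math.log10(1 + max(count, 0))
```

The resulting value could be stored in a numeric field on each document and used in a Solr boost function at query time.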

Extract Metadata from Publication

Using Grobid, extract metadata from the PDF.
Metadata includes the tool's name, description (abstract), links, source code links, technologies, grant/funding information, authors, and affiliations.

Streamline Extraction Process

Given a list of PMIDs, extract information for each one using the PubMed API, Grobid, and the GitHub API.

The PubMed API should give you the DOI, which can be used to download the PDF for Grobid.
Be sure to look for a GitHub link in the PDF; if there is one, use the GitHub API to extract info.

Put it into a JSON object that looks like this:
{
  "pubmed": {...},
  "grobid": {...},
  "github": {...}
}
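The link lookup and the final record could be sketched as follows. This assumes the PDF text is already available as a string and that the three sources have been collected into dicts; `find_github_repo` and `build_record` are hypothetical names.

```python
import re

# Matches links of the form https://github.com/<owner>/<repo>.
GITHUB_LINK = re.compile(r"https?://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)")


def find_github_repo(text):
    """Return (owner, repo) for the first GitHub link in the text, or None."""
    match = GITHUB_LINK.search(text)
    if match is None:
        return None
    owner, repo = match.groups()
    # A link at the end of a sentence picks up the trailing period.
    return owner, repo.rstrip(".")


def build_record(pubmed, grobid, github=None):
    """Combine the three sources into one record; github stays empty if no link was found."""
    return {"pubmed": pubmed, "grobid": grobid, "github": github or {}}
```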

Missing Descriptions

Some tools are missing descriptions; try to fill in the descriptions with the abstract.
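A minimal sketch of the fallback, assuming each tool is a dict with optional `description` and `abstract` keys. Those key names and the function name `fill_description` are assumptions.

```python
def fill_description(tool):
    """Return a copy of the tool whose empty description is filled from the abstract."""
    filled = dict(tool)  # leave the caller's dict untouched
    description = (filled.get("description") or "").strip()
    if not description:
        abstract = (filled.get("abstract") or "").strip()
        if abstract:
            filled["description"] = abstract
    return filled
```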
