ucla-bd2k / aztecretrieval
Bioinformatics repository crawler and updater
Make sure your scripts are well documented (comments, function explanation). Documentation will be stored in the Wiki of the Github repo.
Some of the tags are stored as a single string that looks like "[tag1, tag2, tag3]". Parse each such string so that the tags are separated into an array of tags: ['tag1', 'tag2', 'tag3'].
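A minimal sketch of this parsing step, assuming the tags arrive as one bracketed, comma-separated string. The function name `parse_tags` is made up for illustration, and real data may need extra handling (for example, tags that themselves contain commas).

```python
def parse_tags(raw):
    """Split a string such as "[tag1, tag2, tag3]" into a list of tag strings."""
    # Drop surrounding whitespace and one pair of enclosing brackets, if present.
    text = raw.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    # Split on commas, then trim whitespace and stray quotes from each piece.
    tags = [piece.strip().strip("'\"") for piece in text.split(",")]
    # Discard empty entries left by trailing commas or an empty list.
    return [tag for tag in tags if tag]


print(parse_tags("[tag1, tag2, tag3]"))  # ['tag1', 'tag2', 'tag3']
```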
Use the Github API to extract the programmatic information about the tool.
Metadata includes: Maintainers (name, github username, email), Programming Language, Version (Version number and date), License, and number of forks/pulls.
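One possible way to shape this metadata, sketched under assumptions: the inputs are already-decoded JSON responses from the Github REST API (`GET /repos/{owner}/{repo}`, `GET /repos/{owner}/{repo}/contributors`, and `GET /repos/{owner}/{repo}/releases/latest`). The function name `summarize_repo` and the output keys are hypothetical. Maintainer names and emails would need an extra `GET /users/{username}` call per contributor, and pull request counts an extra call to the pulls endpoint; neither is covered here.

```python
def summarize_repo(repo, contributors=(), latest_release=None):
    """Collect tool metadata from decoded Github API responses.

    repo           -- dict from GET /repos/{owner}/{repo}
    contributors   -- list of dicts from GET /repos/{owner}/{repo}/contributors
    latest_release -- dict from GET /repos/{owner}/{repo}/releases/latest, or None
    """
    # "license" is null in the API response when Github detects no license.
    license_info = repo.get("license") or {}
    release = latest_release or {}
    return {
        # Contributor logins stand in for maintainers (an assumption).
        "maintainers": [person.get("login") for person in contributors],
        "language": repo.get("language"),
        "version": release.get("tag_name"),
        "version_date": release.get("published_at"),
        "license": license_info.get("spdx_id") or license_info.get("name"),
        "forks": repo.get("forks_count"),
    }
```

Keeping the function free of network calls makes it easy to test against saved API responses.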
Write a new script that takes in a JSON file containing extracted publication data and pushes that data into the Solr database.
Create an all-in-one script that downloads the PDFs, extracts metadata from each PDF using GROBID, enriches it using APIs, and inserts it into Solr. Each component should be modularized: (1) get papers from the journal (download PDFs if needed), (2) classify the publication, (3) extract metadata from the PDF and enrich it, (4) insert the metadata into Solr.
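A rough sketch of such a script, using only the standard library and Solr's JSON update handler (a JSON array of documents posted to `/<core>/update`). The base URL, the core name, and the function names `build_solr_update` and `push_json_file` are assumptions for illustration, as is the idea that the input file holds either one document or a list of documents.

```python
import json
import urllib.request


def build_solr_update(solr_url, core, docs):
    """Build a POST request that adds `docs` to a Solr core and commits."""
    url = f"{solr_url.rstrip('/')}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_json_file(path, solr_url, core):
    """Load publication records from a JSON file and send them to Solr."""
    with open(path, encoding="utf-8") as handle:
        docs = json.load(handle)
    # Accept a single record as well as a list of records.
    if isinstance(docs, dict):
        docs = [docs]
    request = build_solr_update(solr_url, core, docs)
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call might look like `push_json_file("publications.json", "http://localhost:8983/solr", "aztec")`, where the file name and core name are placeholders.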
Grobid extracts funding information, but only as free-text sentences. Parse each sentence to get the agency and grant number.
Make sure that Solr is also compatible with the Aztec web application.
Download 5000 publications from the Bioinformatics journal.
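One heuristic way to approach this is with regular expressions. The sketch below assumes sentences of the form "supported by <agency> grant <number>"; the patterns, the function name `parse_funding`, and the sample sentence (including its grant number) are invented for illustration and will miss many real phrasings.

```python
import re

# An identifier that follows "grant"/"award" and contains at least one digit.
GRANT_RE = re.compile(
    r"\b(?:grants?|awards?)\s*(?:numbers?|nos?\.?|#)?\s*:?\s*"
    r"([A-Z0-9][A-Z0-9/-]*\d[A-Z0-9/-]*)",
    re.IGNORECASE,
)

# Text after "supported by"/"funded by", up to a delimiter or keyword.
AGENCY_RE = re.compile(
    r"\b(?:supported|funded)\s+(?:in\s+part\s+)?by\s+(?:the\s+)?(.+?)\s*"
    r"(?:\(|,|\bgrants?\b|\bawards?\b|\bunder\b|\.|$)",
    re.IGNORECASE,
)


def parse_funding(sentence):
    """Return the first agency and all grant numbers found in a sentence."""
    agency_match = AGENCY_RE.search(sentence)
    return {
        "agency": agency_match.group(1) if agency_match else None,
        "grant_numbers": GRANT_RE.findall(sentence),
    }


# Made-up example sentence, not taken from a real publication.
print(parse_funding(
    "This work was supported by the National Institutes of Health grant R01GM123456."
))
```

Sentences that name several agencies would need to be split first, or handled by a different method.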
Given a PDF, classify the publication as tool or nontool using the classifier.
As a developer, I would like to set up this project so that I can run and test it locally.
Platform information indicates whether the tool is compatible with a particular OS.
Using the CrossRef API (and/or Altmetrics), retrieve the number of citations for each tool (for those that have a DOI) and boost its ranking accordingly in Solr.
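As a sketch of the citation step: the CrossRef REST API returns a work record at `https://api.crossref.org/works/{doi}`, and its `message` object carries an `is-referenced-by-count` field. The logarithmic boost formula and the function names below are assumptions; how the boost value is applied in Solr depends on the schema and query handler, which are not described here.

```python
import math


def citation_count(crossref_response):
    """Read the citation count from a decoded CrossRef /works/{doi} response."""
    message = crossref_response.get("message", {})
    return int(message.get("is-referenced-by-count", 0))


def citation_boost(count):
    """Map a citation count to a boost factor of at least 1.0.

    Uses 1 + log10(1 + count) so that heavily cited tools rank higher
    without swamping text relevance (an assumed design choice).
    """
    return 1.0 + math.log10(1 + max(count, 0))
```

With this formula an uncited tool keeps a neutral factor of 1.0, and each tenfold increase in citations adds roughly 1 to the factor.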
Using Grobid, extract metadata from the PDF.
Metadata includes the tool's name, description (abstract), links, source code links, technologies, grant/funding information, authors, and affiliations.
Using the modified Grobid script, extract metadata from the publication.
Using the Pubmed API, retrieve the following information:
PMID (Pubmed ID), Journal Name, DOI, affiliation, Author, Publication title.
Example: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=27402908&rettype=xml
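A sketch of how the efetch XML could be turned into the fields listed above, using the standard library. The element paths follow the PubMed article XML layout (`PubmedArticle`, `MedlineCitation`, `PubmedData`); the function names and the output keys are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmids):
    """Build an efetch URL for one or more PMIDs, as in the example above."""
    query = {"db": "pubmed", "id": ",".join(pmids), "rettype": "xml"}
    return EFETCH_URL + "?" + urllib.parse.urlencode(query)


def parse_pubmed_xml(xml_text):
    """Extract PMID, journal, title, DOI, authors and affiliations."""
    records = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        # The DOI is one of several typed identifiers in PubmedData.
        doi = None
        for article_id in article.findall("PubmedData/ArticleIdList/ArticleId"):
            if article_id.get("IdType") == "doi":
                doi = article_id.text
                break
        authors = []
        for author in article.findall("MedlineCitation/Article/AuthorList/Author"):
            parts = [author.findtext("ForeName"), author.findtext("LastName")]
            authors.append({
                "name": " ".join(part for part in parts if part),
                "affiliations": [
                    aff.text for aff in author.findall("AffiliationInfo/Affiliation")
                ],
            })
        # Titles may contain inline markup, so join all nested text.
        title = article.find("MedlineCitation/Article/ArticleTitle")
        records.append({
            "pmid": article.findtext("MedlineCitation/PMID"),
            "journal": article.findtext("MedlineCitation/Article/Journal/Title"),
            "title": "".join(title.itertext()) if title is not None else None,
            "doi": doi,
            "authors": authors,
        })
    return records
```

The XML text itself can be fetched with any HTTP client from the URL that `efetch_url` returns.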
Given a list of PMIDs, extract information for each one using the Pubmed API, Grobid, and the Github API.
The Pubmed API should give you the DOI, which can be used to download the PDF for Grobid.
Be sure to look for the Github link in the PDF; if there is a link, then use the Github API to extract info.
Put it into a JSON that looks like this:
{
pubmed:{...},
grobid: {...},
github: {...}
}
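The combination step might be sketched as follows, assuming the Pubmed and Grobid results are already available as dicts and the PDF text as a string. The regular expression, the function names, and the choice of an empty object for a missing Github section are assumptions.

```python
import json
import re

# Matches links of the form https://github.com/<owner>/<repo>.
GITHUB_RE = re.compile(r"https?://github\.com/([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)")


def find_github_repo(text):
    """Return (owner, repo) for the first Github link in the text, or None."""
    match = GITHUB_RE.search(text)
    if match is None:
        return None
    owner, repo = match.group(1), match.group(2)
    # Trim a sentence-ending period and an optional ".git" suffix.
    repo = repo.rstrip(".")
    if repo.endswith(".git"):
        repo = repo[:-4]
    return owner, repo


def build_record(pubmed, grobid, github=None):
    """Assemble the combined record in the layout shown above."""
    return {"pubmed": pubmed, "grobid": grobid, "github": github or {}}


record = build_record({"pmid": "27402908"}, {}, None)
print(json.dumps(record, indent=2))
```

When `find_github_repo` returns an owner and repository name, those can be passed to the Github API to fill in the `github` section.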
Some tools are missing descriptions; try to fill in the descriptions with the abstract.
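A small sketch of this fallback, assuming each tool is a dict with `description` and `abstract` keys (both key names are hypothetical and should be matched to the actual Solr fields).

```python
def fill_missing_description(tool):
    """Return the tool with its description filled from the abstract if empty."""
    description = (tool.get("description") or "").strip()
    abstract = (tool.get("abstract") or "").strip()
    # Leave the record alone if it has a description or no abstract to use.
    if description or not abstract:
        return tool
    updated = dict(tool)  # copy so the caller's record is not mutated
    updated["description"] = abstract
    return updated
```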