Hi I would like to write a edirect query to extract number of public

Get publication counts per gene per year about edirectcookbook HOT 6 OPEN

ncbi-hackathons commented on August 23, 2024

Get publication counts per gene per year

from edirectcookbook.

Comments (6)

vkkodali commented on August 23, 2024

You may want to take a look at the gene2pubmed.gz file to see if you can use data from there. For a given list of taxids, you can get a list of all PMIDs associated with each GeneID. From there, you can probably join the PDAT for each PMID.
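
A minimal sketch of that approach, assuming gene2pubmed.gz has been downloaded and decompressed into a tab-separated file with the columns tax_id, GeneID and PubMed_ID. It also assumes you have separately built a two-column table of PMID and publication year. Both function names and the PMID-to-year table are hypothetical, made up for illustration:

```shell
# Sketch only. Assumes a tab-separated gene2pubmed file with the columns
# tax_id, GeneID, PubMed_ID and a header line that starts with '#'.

# Print "GeneID<TAB>PMID" for every row that belongs to the given taxid.
# usage: gene_pmids_for_taxid TAXID GENE2PUBMED_FILE
gene_pmids_for_taxid() {
  awk -F '\t' -v taxid="$1" '$1 !~ /^#/ && $1 == taxid { print $2 "\t" $3 }' "$2"
}

# Join GeneID/PMID pairs (read from stdin) against a PMID-to-year table
# (hypothetical file: "PMID<TAB>year", must not be empty) and print
# "GeneID<TAB>year<TAB>count", sorted by gene and then by year.
# usage: ... | count_per_gene_year PMID_YEAR_FILE
count_per_gene_year() {
  awk -F '\t' '
    NR == FNR { year[$1] = $2; next }        # first file: PMID -> year
    ($2 in year) { n[$1 "\t" year[$2]]++ }   # stdin: GeneID, PMID
    END { for (k in n) print k "\t" n[k] }
  ' "$1" - | sort -t "$(printf '\t')" -k1,1n -k2,2n
}
```

Usage would look like `gene_pmids_for_taxid 3702 gene2pubmed | count_per_gene_year pmid_year.tsv`. PMIDs that are missing from the year table are silently skipped, so the counts are only as complete as that table.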

sanyalab commented on August 23, 2024

I have gotten this far. For an example gene ID (816394) in Arabidopsis thaliana (txid3702), I can get the count of all the PubMed articles linked to this gene:

esearch -db gene -query "txid3702[Organism:exp] AND 816394[UID]" | elink -target pubmed

The next step is to download the articles in XML or docsum format and filter them by publication date [PDAT]. That is the strategy I am using. I tried the following command, but it failed with the error "Too many requests":

esearch -db gene -query "txid3702[Organism:exp] AND 816394[UID]" | elink -target pubmed | efetch -format xml | xtract -pattern PubmedArticle -element PubDate

I don't know how to get around this. Thanks for the help
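
Once the publication years are available one per line, the per-year tally itself is just sort and uniq. A minimal sketch follows; the function name count_by_year is made up for illustration. The commented EDirect pipeline is an untested assumption: it supposes every record carries a PubDate/Year element, which does not hold for records that only have a MedlineDate.

```shell
# Read one publication year per line on stdin and print "year<TAB>count",
# ordered by year. Blank lines are ignored.
count_by_year() {
  grep -v '^[[:space:]]*$' | sort -n | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Possible use with EDirect (assumption; needs network access):
#   esearch -db gene -query "txid3702[Organism:exp] AND 816394[UID]" |
#     elink -target pubmed |
#     efetch -format xml |
#     xtract -pattern PubmedArticle -element PubDate/Year |
#     count_by_year
```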

vkkodali commented on August 23, 2024

the error was "Too many requests"

Are you using the eUtils API keys?

sanyalab commented on August 23, 2024

Hi vkkodali

The first part of the error looks like this:

429 Too Many Requests
No do_post output returned from 'https://eutils.be-md.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&query_key=3&WebEnv=NCID_1_19870438_130.14.18.97_9001_1553967351_471391598_0MetA0_S_MegaStore&rettype=text&retmode=text&retstart=0&retmax=100&edirect=7.40&tool=edirect&email=sanyalab@lxjh218'
Result of do_post http request is
$VAR1 = bless( {
                 '_protocol' => 'HTTP/1.1',
                 '_content' => '{"error":"API rate limit exceeded","api-key":"170.54.61.190","count":"4","limit":"3"}',
                 '_rc' => 429,
                 '_headers' => bless( {
                                        'connection' => 'close',
                                        'x-ratelimit-limit' => '3',
                                        'date' => 'Sat, 30 Mar 2019 17:35:51 GMT',
                                        'vary' => 'Accept-Encoding',
                                        'client-peer' => '130.14.29.110:443',

After that, the output is truncated: the query should return 154 articles, but I get only 54. Thanks for the help.

vkkodali commented on August 23, 2024

You need to create an API key as described in the 'How do I get a key?' section here. After that, either run the following command before executing esearch or, for a more permanent fix, add it to your .bashrc file:

export NCBI_API_KEY='abcdef1234567890'
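
To avoid starting a long pipeline without the key, a small guard can check the variable first. This is only a sketch: the function name have_ncbi_api_key is hypothetical, and the "3 per second" figure in the message is the limit reported in the error output earlier in this thread.

```shell
# Return 0 if NCBI_API_KEY is set and non-empty, 1 otherwise.
# The warning goes to stderr so it does not pollute piped output.
have_ncbi_api_key() {
  if [ -z "${NCBI_API_KEY:-}" ]; then
    echo "NCBI_API_KEY is not set; requests are limited to 3 per second" >&2
    return 1
  fi
  return 0
}

# Example: refuse to run a long job without a key.
#   have_ncbi_api_key || exit 1
```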

sanyalab commented on August 23, 2024

It worked!!! Thanks a bunch vkkodali.

One unrelated question. I download specific EST and cDNA datasets from NCBI every quarter, using a combination of epost and efetch. I sometimes hit this issue there too, and I rerun after waiting 250 seconds. Exporting the API key should take care of this as well, right?

Thanks for your help
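
The manual "wait and rerun" step described above could be scripted as a retry loop. A minimal sketch, assuming the wrapped command signals failure through a non-zero exit status (which may not hold for every EDirect tool or version); the function name retry_with_delay is made up for illustration:

```shell
# Run a command and retry it on failure, sleeping between attempts.
# Plain POSIX sh, so the helper variables below are global, not local.
# usage: retry_with_delay MAX_TRIES DELAY_SECONDS command [args...]
retry_with_delay() {
  max_tries=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_tries" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example (hypothetical script name): up to 3 tries, 250 seconds apart.
#   retry_with_delay 3 250 ./download_est_batch.sh
```

Because the whole command is rerun, this only suits steps that are safe to repeat from scratch.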
