
The history of publishing delays

Here, we explore the history of scientific publishing delays. The findings from the analysis are discussed in a blog post by Daniel Himmelstein and were featured in Nature News.

Delays are calculated from publisher-deposited PubMed history dates. Only journal articles published between 1960 and 2015 are included. Specifically, two delay types are calculated:

  • acceptance delay: the number of days from receipt to acceptance
  • publication delay: the number of days from acceptance to online publication
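Both delay types are differences between two history dates, in days. A minimal sketch, assuming the dates are already parsed into `datetime.date` objects and that a missing date (common in older records) should produce no delay rather than an error. The helper name `days_between` and the sample dates are hypothetical, and which PubMed history field counts as "online publication" is not specified here.

```python
from datetime import date
from typing import Optional


def days_between(start: Optional[date], end: Optional[date]) -> Optional[int]:
    """Return whole days from start to end, or None if either date is missing."""
    if start is None or end is None:
        return None
    return (end - start).days


# Hypothetical history dates for one article
received = date(2014, 3, 1)
accepted = date(2014, 5, 10)
published_online = date(2014, 6, 2)

acceptance_delay = days_between(received, accepted)           # 70
publication_delay = days_between(accepted, published_online)  # 23
print(acceptance_delay, publication_delay)
```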

Execution and notebooks

To re-execute the analysis, run the following notebooks in order:

  1. eutilities.ipynb (python): Use NCBI's E-utilities API to retrieve the list of relevant PubMed IDs using ESearch and the article summaries using ESummary.
  2. process-esummary.ipynb (python): Extract history dates from the ESummary XML output.
  3. extract-delays.ipynb (R): Calculate acceptance and publication delays from the PubMed history dates.
  4. process-nlm-catalog.ipynb (python): Download and process the NLM Catalog, which contains the journals indexed by PubMed.
  5. visualize-history.ipynb (R): Plot historical delays and export several TSV summaries of the dataset.
  6. webapp.ipynb (python): Create JSON files used to initialize the select2 journal selector for the blog post.
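The first step queries ESearch for PubMed IDs. A minimal sketch of building such a request URL, assuming the standard E-utilities endpoint and its `db`, `term`, `retmax` and `retstart` parameters (the last two page through large result sets). The query string below is a hypothetical stand-in; the exact search term used by eutilities.ipynb is not given here. The network call itself is omitted so the sketch runs offline; the URL could be fetched with `urllib.request.urlopen`.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 10000, retstart: int = 0) -> str:
    """Build a PubMed ESearch URL for one page of results."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": retmax,      # how many IDs to return in this page
        "retstart": retstart,  # offset of the first ID in this page
    }
    return ESEARCH_URL + "?" + urlencode(params)


# Hypothetical query: journal articles with publication dates from 1960 to 2015
term = "journal article[pt] AND 1960:2015[dp]"
url = build_esearch_url(term)
print(url)
```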

Datasets

The following data files are generated during execution:

  1. pubmed-journals.tsv: a dataframe of the NLM Catalog (journals in PubMed)
  2. history-dates.tsv.bz2: a dataframe with all history dates extracted from the PubMed XML
  3. delays.tsv.gz: a dataframe of all acceptance and publication delays
  4. journal-summaries.tsv: a dataframe summarizing delays for each journal
  5. yearly-summaries.tsv: a dataframe summarizing delays for each year
  6. yearly-percentiles.tsv: a dataframe of delay percentiles for each year
  7. slopes.tsv: a dataframe of journal-specific delay slopes (Δ days of delay per year)
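The slopes in slopes.tsv are described only as the change in days of delay per year. One way to estimate such a slope is an ordinary least-squares fit of delay on publication year within each journal. The sketch below assumes exactly that; the R notebook may use a different model. The record layout `(journal, year, delay)` and the journal names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Sequence, Tuple


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Least-squares slope of ys on xs, or None if xs has no spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # e.g. every article comes from a single year
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def journal_slopes(
    records: Iterable[Tuple[str, int, float]]
) -> Dict[str, Optional[float]]:
    """Map each journal to its estimated change in delay (days) per year."""
    by_journal = defaultdict(lambda: ([], []))
    for journal, year, delay in records:
        years, delays = by_journal[journal]
        years.append(year)
        delays.append(delay)
    return {j: ols_slope(ys, ds) for j, (ys, ds) in by_journal.items()}


# Hypothetical (journal, publication year, acceptance delay in days) records
records = [
    ("Journal A", 2010, 100), ("Journal A", 2011, 110), ("Journal A", 2012, 120),
    ("Journal B", 2010, 60), ("Journal B", 2012, 50),
]
print(journal_slopes(records))  # {'Journal A': 10.0, 'Journal B': -5.0}
```

A positive slope means that journal's delays grew over time; a journal observed in only one year gets `None` instead of a division-by-zero error.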

The following data files, generated by eutilities.ipynb, are excluded from the repository because of their large size:

  1. download/esearch_journal-articles_1960-2015.tsv.gz with the list of relevant PubMed IDs
  2. download/esummary_journal-articles_1960-2015.xml.bz2 with combined XML output from the ESummary API queries

These files, along with several of the other files listed above, are available via figshare.
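Because the combined ESummary output is a large bzip2-compressed XML file, the history-date extraction (process-esummary.ipynb) can stream it rather than load it whole. A minimal sketch, assuming the ESummary 1.0 layout in which each `DocSum` holds an `Id` and an `Item Name="History"` list of dated `Item` children, and assuming the combined file is a single well-formed XML document. The function name `iter_history_dates` and the one-record sample are hypothetical.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def iter_history_dates(fileobj):
    """Yield (pubmed_id, {history_type: date_string}) for each DocSum element."""
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag != "DocSum":
            continue
        pmid = elem.findtext("Id")
        history = {
            item.get("Name"): item.text
            for item in elem.iterfind("Item[@Name='History']/Item")
        }
        yield pmid, history
        elem.clear()  # release the parsed subtree to keep memory bounded


# In-memory stand-in for download/esummary_journal-articles_1960-2015.xml.bz2
sample = b"""<eSummaryResult><DocSum><Id>123</Id>
<Item Name="History" Type="List">
<Item Name="received" Type="Date">2014/03/01 00:00</Item>
<Item Name="accepted" Type="Date">2014/05/10 00:00</Item>
</Item></DocSum></eSummaryResult>"""

with bz2.open(io.BytesIO(bz2.compress(sample)), "rb") as f:
    for pmid, history in iter_history_dates(f):
        print(pmid, history)
```

For the real file, `bz2.open` would be given the downloaded path instead of the in-memory buffer.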


Issues

Quick acceptance times (< 5 days) for some respiratory journals

Hey Daniel. In the data, I find some journals with papers that have very short acceptance times, such as one or two days. Do you have an explanation for that? Could these be invited articles or article types other than original articles, such as editorials and correspondence? Andreas

Outdated or buggy journal abbreviations data

I'm not sure whether the data is outdated or there is a bug, but some journals have outdated or invalid(?) ISO abbreviations. Example from pubmed/J_Medline.txt:

JournalTitle: The New England journal of medicine
MedAbbr: N Engl J Med
ISSN (Print): 0028-4793
ISSN (Online): 1533-4406
IsoAbbr: N Engl J Med
NlmId: 0255562

Notice that the ISO and Med abbreviations are the same, but in dhimmel/delays they differ: N. Engl. J. Med. (ISO) vs. N Engl J Med (Med) (note the dots).
