Altmetric Analysis

[TOC]

Summary

We have data on Altmetric attention scores for over 200,000 articles from 7 idiotypic journals.

  • bioRxiv
  • Cell
  • NEJM
  • Nature
  • PLOS ONE
  • PNAS
  • Science

We are curious whether there are any interesting patterns in Altmetric scores relating to the genders of the articles' authors.

This analysis genderizes authors for those 200,000 articles, then models Altmetric scores as a function of author gender, year, month, journal and number of authors.

Folder structure

Scripts

  • AltmetricExtractor_092719BB.py
    • Reads the raw data from Altmetric.com.
    • Extracts articles from the 7 journals of interest (listed above)
    • Extracts fields of interest: article ID, authors, journal, type of article, DOI, publication date, Altmetric score at the following intervals post-publication: 1-day, 2-day, 3-day, 4-day, 5-day, 6-day, 1-week, 1-month, 3-months, 6-months, 1-year, all-time
    • Writes a tsv file for each journal in AltmetricData/
  • GenderizeAuthors_020420JF.R
    • Reads the extracted data from AltmetricData/
    • Cleans the data (removes articles with missing data)
    • Subsets the data to years 2011-2018 (inclusive)
      • 2011 is when Altmetric started keeping score, and by then many of the social media platforms on which the score is based were in widespread use
      • The analysis was done at the end of 2019; restricting the data to years before 2019 allowed us to use 1-year scores for articles published in 2018
    • Cleans the author variable for each article, identifies each author's first name, and genderizes the names using genderizeR
    • Creates new variables with gender statistics: number of female/male/unknown gender authors, proportion of female/male/unknown gender authors, gender & probability of first author, gender & probability of last author
    • Merges gender variables with original data
    • Writes a tsv file for each journal in GenderizedData/
  • AltmetricScoreModeling_040720JF.R
    • Reads the genderized data from GenderizedData/
    • Does some basic data exploration
    • Separates data into:
      • Binary: all articles, coded by whether the 1-year Altmetric score is 0 or > 0. This split is needed because so many articles have a score of 0 after 1 year (66% based on data exploration)
      • Linear: subset of all articles with a 1-year Altmetric score > 0
      • bioRxiv was only started in 2015, so its time series is shorter than that of the other journals; it was therefore coded separately (binary_bioRxiv, linear_bioRxiv, binary_allOtherJournals, linear_allOtherJournals)
    • Runs models:
      • Logistic model: whether or not an article obtains a score > 0 ~ interaction between publication year, journal and first/last author gender, publication month, proportion of female authors and total number of authors
      • Linear model: log(score) ~ interaction between publication year, journal and first/last author gender, publication month, proportion of female authors and total number of authors
      • The fitted models are saved in Models/
    • Does standard model checking via plots
    • Does post-hoc testing via multiple comparisons
    • Creates the plots for the paper in Figures/
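
The data separation described for the modeling script can be sketched in a few lines. This is a minimal Python illustration, not the authors' R code: the field names (`journal`, `year`, `score_1y`) and the helper `split_for_models` are hypothetical, and the sample rows are invented purely to show the mechanics.

```python
import math

# Invented sample rows; the real TSV column names are not given in this README.
articles = [
    {"journal": "Nature", "year": 2015, "score_1y": 0.0},
    {"journal": "bioRxiv", "year": 2016, "score_1y": 12.5},
    {"journal": "PNAS", "year": 2010, "score_1y": 3.0},  # outside 2011-2018, dropped
    {"journal": "Cell", "year": 2018, "score_1y": 40.0},
]


def split_for_models(rows, first_year=2011, last_year=2018):
    """Split articles into binary and linear datasets, with bioRxiv kept apart."""
    out = {
        "binary_bioRxiv": [],
        "linear_bioRxiv": [],
        "binary_allOtherJournals": [],
        "linear_allOtherJournals": [],
    }
    for row in rows:
        # Keep only the publication years used in the analysis (inclusive).
        if not first_year <= row["year"] <= last_year:
            continue
        group = "bioRxiv" if row["journal"] == "bioRxiv" else "allOtherJournals"
        has_score = row["score_1y"] > 0
        # Binary dataset: every article, with a 0/1 outcome for the logistic model.
        out["binary_" + group].append({**row, "has_score": int(has_score)})
        # Linear dataset: only articles with a positive score, on the log scale.
        if has_score:
            out["linear_" + group].append(
                {**row, "log_score": math.log(row["score_1y"])}
            )
    return out


datasets = split_for_models(articles)
```

Restricting the linear dataset to positive scores is what makes `log(score)` well defined, which is why the zero scores are handled by the separate logistic model.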

AltmetricData

This folder contains the raw data as extracted by the script AltmetricExtractor_092719BB.py. There is one tsv file for each journal. We cannot share the raw data; please contact Altmetric.com.

GenderizedData

This folder contains the Altmetric data as processed in the script GenderizeAuthors_020420JF.R. It contains the original data, plus additional variables describing the authors' genders. There is one tsv file for each journal.
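
As a rough illustration of the per-article gender variables, the sketch below derives counts, proportions, and first/last author gender from a list of per-author gender calls. It is written in Python rather than R, and the function name `gender_summary`, the output keys, and the use of `None` for unknown gender are assumptions made for this example; they are not the column names used in the actual files.

```python
def gender_summary(genders, probabilities):
    """Summarise per-author gender calls for one article.

    genders: list of "female", "male", or None (unknown), in author order.
    probabilities: matching list of call probabilities (None when unknown).
    """
    n = len(genders)
    counts = {
        "female": sum(g == "female" for g in genders),
        "male": sum(g == "male" for g in genders),
        "unknown": sum(g is None for g in genders),
    }
    summary = {"n_authors": n}
    for label, count in counts.items():
        summary["n_" + label] = count
        summary["prop_" + label] = count / n if n else None
    # First and last author are taken from the author order.
    summary["first_author_gender"] = genders[0] if n else None
    summary["first_author_prob"] = probabilities[0] if n else None
    summary["last_author_gender"] = genders[-1] if n else None
    summary["last_author_prob"] = probabilities[-1] if n else None
    return summary


# Invented example: four authors, the second of unknown gender.
example = gender_summary(
    ["female", None, "male", "male"],
    [0.98, None, 0.91, 0.75],
)
```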

Models

This folder contains the R objects holding the models fitted in AltmetricScoreModeling_040720JF.R. There are four RDS files: the logistic model for bioRxiv, the logistic model for all journals except bioRxiv, the linear model for bioRxiv, and the linear model for all journals except bioRxiv.

Figures

This folder contains the figures generated in AltmetricScoreModeling_040720JF.R.

Sensitivity analysis

We did a sensitivity analysis to see the impact of genderizing accuracy on our results.
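
This README does not describe how the sensitivity analysis was carried out. One common approach, shown here only as an assumption, is to treat gender calls below a probability threshold as unknown and check how a summary statistic changes as the threshold varies. The function names, thresholds, and sample values below are all hypothetical.

```python
def apply_threshold(genders, probabilities, threshold):
    """Return gender calls with any call below `threshold` set to None (unknown)."""
    return [
        g if g is not None and p is not None and p >= threshold else None
        for g, p in zip(genders, probabilities)
    ]


def prop_female_by_threshold(genders, probabilities, thresholds=(0.5, 0.7, 0.9)):
    """Proportion of female authors among the retained calls, per threshold."""
    results = {}
    for t in thresholds:
        kept = [g for g in apply_threshold(genders, probabilities, t) if g is not None]
        results[t] = sum(g == "female" for g in kept) / len(kept) if kept else None
    return results


# Invented gender calls and probabilities for four authors.
genders = ["female", "male", "male", "female"]
probs = [0.95, 0.92, 0.75, 0.55]
sensitivity = prop_female_by_threshold(genders, probs)
```

If the statistic of interest moves little across thresholds, the results are less likely to be driven by low-confidence gender calls.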

Contributors

  • jafortin
  • bjarnebartlett
