Encomium

This codebase merges publishing data from several sources to analyze institutional citation patterns. It is being developed for a project analyzing BTAA (Big Ten Academic Alliance) publishing trends.

Installation

Clone this gem's repository locally.

Usage

  1. After cloning the repo, set up the input data directory as described in Input Data Directory Structure below
  2. Save a copy of the example config file files.yml.example as files.yml [1]
  3. Run the rake task: rake build
  4. Run the rake task: rake db_tables

The first task (rake build) merges all the data and writes a pair of summary CSV files for review. The second task (rake db_tables) uses the same data to produce TSV files intended to be loaded into a relational database. These files follow a highly normalized relational schema, and their table and column names use the Ruby on Rails naming conventions:

  • field names are lowercase
  • multi-word field names use snake_case
  • table names (here, the TSV file names) are pluralized nouns
  • primary keys are always named id
  • foreign keys are always named {singular_form_of_pluralized_source_table_name}_id
  • many-to-many (join) tables use foreign keys named as above, and their names join the two table names in alphabetical order with an underscore _ (i.e., in snake_case)
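
The following small sketch makes the foreign-key and join-table rules concrete. The helpers singularize, foreign_key_for and join_table_name are hypothetical, and so are the example table names. The singularization is deliberately naive; Rails' ActiveSupport inflector handles many more irregular plurals.

```ruby
# Naive singularization (an assumption for illustration only):
# "categories" -> "category", "journals" -> "journal".
def singularize(table_name)
  if table_name.end_with?("ies")
    table_name.delete_suffix("ies") + "y"
  else
    table_name.delete_suffix("s")
  end
end

# Foreign key column pointing at a pluralized table: {singular}_id.
def foreign_key_for(table_name)
  "#{singularize(table_name)}_id"
end

# Join table name: the table names sorted alphabetically, joined with "_".
def join_table_name(*table_names)
  table_names.sort.join("_")
end

puts foreign_key_for("institutions")         # prints institution_id
puts join_table_name("journals", "articles") # prints articles_journals
puts foreign_key_for("categories")           # prints category_id
```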

To keep the files small, timestamp fields are not included.
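
Below is a minimal sketch of writing one table in this style with Ruby's CSV library and a tab separator. It follows the conventions above: an integer id primary key, snake_case column names and no timestamp columns. It assumes two further points that the README does not state: ids are assigned sequentially from 1, and id is the first column. The helper write_tsv_table and the journals/title/issn names are hypothetical and are not the gem's actual schema.

```ruby
require "csv"

# Write one table as TSV. `columns` lists the non-id column names
# (snake_case); each row is a Hash keyed by those names.
# No created_at/updated_at columns are written.
def write_tsv_table(path, columns, rows)
  CSV.open(path, "w", col_sep: "\t") do |tsv|
    tsv << ["id", *columns]
    # Sequential integer primary keys starting at 1 (an assumption).
    rows.each.with_index(1) do |row, id|
      tsv << [id, *row.values_at(*columns)]
    end
  end
end
```

Calling write_tsv_table("journals.tsv", %w[title issn], rows) would produce a journals.tsv whose header line is id, title and issn separated by tabs.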

[1] Leave the original files.yml.example in place, because it is tracked in the repository.
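
If you script the setup, step 2 can be done so that the tracked example is never moved and an already edited files.yml is never overwritten. The sketch below does only that. ensure_local_config is a hypothetical helper, and the settings you then edit inside files.yml are not covered here.

```ruby
require "fileutils"

# Copy files.yml.example to files.yml in `dir`, leaving the example in place
# and skipping the copy if a local files.yml already exists.
def ensure_local_config(dir = Dir.pwd)
  example = File.join(dir, "files.yml.example")
  target = File.join(dir, "files.yml")
  FileUtils.cp(example, target) unless File.exist?(target)
  target
end
```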

Input Data Directory Structure

Choose a base directory on your file system and create the following sub-folders:

base_directory/
  +- articles/
     +- inst_1/
     +- inst_2/
     +- ...
  +- cited-articles/
  +- COUNTER/
     +- inst_1/
     +- inst_2/
     +- ...
  +- MARC/
  +- output/
  +- wos-journals/
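
The layout above can also be created from Ruby with FileUtils.mkdir_p. In this sketch create_input_dirs is a hypothetical helper. The institution codes are the examples used elsewhere in this README (uw, osu, mn), so substitute your own.

```ruby
require "fileutils"

# Create the input directory layout under base_dir. articles/ and COUNTER/
# get one sub-directory per institution code; the other folders are flat.
def create_input_dirs(base_dir, institution_codes)
  %w[articles COUNTER].each do |per_institution|
    institution_codes.each do |code|
      FileUtils.mkdir_p(File.join(base_dir, per_institution, code))
    end
  end
  %w[cited-articles MARC output wos-journals].each do |flat|
    FileUtils.mkdir_p(File.join(base_dir, flat))
  end
end
```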

articles: Web of Science (WOS) JSON files for articles authored by the institutions being studied. All article data files should be contained in sub-directories named with the institution codes (e.g., uw, osu, mn).
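
Because the institution code is simply the sub-directory name, a reader of this folder can recover the codes from the file system. The sketch below shows this. It assumes the WOS files use a .json extension, and article_files_by_institution is a hypothetical name.

```ruby
# Map each institution code (a sub-directory of articles_dir) to its
# sorted list of *.json files. Loose files directly in articles_dir are skipped.
def article_files_by_institution(articles_dir)
  Dir.children(articles_dir).sort.each_with_object({}) do |code, files|
    institution_dir = File.join(articles_dir, code)
    next unless File.directory?(institution_dir)

    files[code] = Dir.glob(File.join(institution_dir, "*.json")).sort
  end
end
```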

cited-articles: WOS JSON files for the articles cited by the articles in the articles directory.

COUNTER: COUNTER data for the institutions being studied. All COUNTER data files should be contained in sub-directories named with the institution codes (e.g., uw, osu, mn). COUNTER data must be CSV files containing the following columns (other columns are ignored):

  • Journal Title (optional)
  • Publisher (optional)
  • ISSN
  • eISSN (optional)
  • Date in the format YYYY-MM-DD (assumed to be month-level granularity only)
  • Uses (SUM)
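
The sketch below reads one such file with Ruby's CSV library. It keeps only the columns listed above and collapses each date to the first day of its month to reflect the month-level granularity. read_counter_rows and its hash keys are hypothetical. Stripping thousands separators from Uses (SUM) is an assumption about how some exports format numbers.

```ruby
require "csv"
require "date"

# Read one COUNTER CSV file into hashes. Columns not listed in the README
# are ignored; optional columns that are absent come back as nil.
def read_counter_rows(path)
  CSV.read(path, headers: true).map do |row|
    date = Date.strptime(row["Date"], "%Y-%m-%d")
    {
      title: row["Journal Title"],
      publisher: row["Publisher"],
      issn: row["ISSN"],
      eissn: row["eISSN"],
      # Month-level granularity: every date becomes the 1st of its month.
      month: Date.new(date.year, date.month, 1),
      # Tolerate values such as "1,204" (an assumption about the exports).
      uses: row["Uses (SUM)"].to_s.delete(",").to_i
    }
  end
end
```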

MARC: any files containing binary MARC records, used primarily to identify Library of Congress (LC) Classification numbers.
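
The ruby-marc gem provides a complete MARC reader. The sketch below instead uses only Ruby's standard library to illustrate how LC classification numbers can be pulled out of binary (ISO 2709) MARC records. It makes two assumptions: the classification is taken from subfield $a of field 050 (Library of Congress Call Number), and the default directory entry layout (leader positions 20-23 = "4500") applies. Local call numbers in other fields such as 090 are ignored. read_marc_records and lc_classifications are hypothetical helpers, not the gem's code.

```ruby
FIELD_TERMINATOR = "\x1E"
SUBFIELD_DELIMITER = "\x1F"

# Split a binary MARC stream into raw records; the first five bytes of
# each 24-byte leader give the total record length.
def read_marc_records(io)
  records = []
  while (leader = io.read(24)) && leader.bytesize == 24
    records << leader + io.read(leader[0, 5].to_i - 24).to_s
  end
  records
end

# Return the $a values of every 050 field in one raw record.
def lc_classifications(record)
  base_address = record[12, 5].to_i             # start of the variable fields
  directory = record[24...(base_address - 1)]    # drop the directory terminator
  directory.scan(/.{12}/m).flat_map do |entry|   # tag(3) length(4) start(5)
    next [] unless entry.start_with?("050")

    length = entry[3, 4].to_i
    start = entry[7, 5].to_i
    data = record[base_address + start, length].chomp(FIELD_TERMINATOR)
    data.split(SUBFIELD_DELIMITER).drop(1)       # drop the two indicators
        .select { |subfield| subfield.start_with?("a") }
        .map { |subfield| subfield[1..] }
  end
end
```

Open MARC files in binary mode so that the byte offsets in the directory line up, for example File.open(path, "rb") { |f| read_marc_records(f) }.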

output: this directory is populated when the Rake tasks are run. It will contain all processed data: some files are intermediate representations used during processing, and some are final output.

wos-journals: CSV files downloaded from Clarivate's website. These files determine which journals are included in the final analysis files. All other data files are matched against these title lists by ISSN.
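
The sketch below shows matching by ISSN against the title lists. It assumes ISSNs may arrive with or without the hyphen and with a lowercase check character x, so both sides are normalized before comparison. The check digit itself is not validated. normalize_issn and match_by_issn are hypothetical names, and rows are assumed to be hashes with :issn and :eissn keys.

```ruby
require "set"

# Normalize an ISSN to NNNN-NNNC (C may be X); return nil if malformed.
def normalize_issn(value)
  chars = value.to_s.upcase.delete("^0-9X")
  return nil unless chars.match?(/\A\d{7}[\dX]\z/)

  "#{chars[0, 4]}-#{chars[4, 4]}"
end

# Keep rows whose print ISSN or eISSN appears in the WOS journal list.
def match_by_issn(rows, wos_issns)
  allowed = wos_issns.filter_map { |issn| normalize_issn(issn) }.to_set
  rows.select do |row|
    [row[:issn], row[:eissn]].any? do |issn|
      normalized = normalize_issn(issn)
      normalized && allowed.include?(normalized)
    end
  end
end
```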

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Encomium project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.
