
fairdata's Introduction

Research data management

General information

This repository contains information about research data management and reproducible code, as well as Jupyter notebooks for creating metadata schemes. It has four main subfolders, each focused on one topic:
  • Electronic Lab Notebook
  • Reproducible code
  • Data documentation
  • Open Access Publishing

This repository will be updated frequently with new information and guidelines.

Data package

A data package is a collection of all digital parts of a research project, including data, code, and texts (protocols, reports, questionnaires, metadata). The collection is organized so that reproducing all results is straightforward.

When creating a data package, you should keep the following three principles in mind:
  • Files should be organized in a conventional folder structure
  • Data, methods, and output should be clearly separated
  • The computational environment should be specified
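
One way to specify the computational environment is to record the Python version and the versions of the packages a project depends on. The sketch below is a minimal illustration using only the standard library; the function name `describe_environment` is hypothetical and not part of this repository.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return the Python version and the installed version of each package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {"python": platform.python_version(), "packages": versions}


# Package names taken from the software requirements listed in this README.
print(json.dumps(describe_environment(["pandas", "chardet"]), indent=2))
```

The printed JSON could be saved next to the code, for example in a readme or metadata file, so that others can rebuild a comparable environment.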

When working on your data package, you can follow the steps below:

  • Think about a good folder structure
  • Create the folder structure (main directory and subdirectories)
  • Add all files needed for reproducing the results of the project
  • Keep the data package as clean and easy to use as possible (add readme files to important folders)
  • Have a peer check the data package and see if it works correctly
  • When you have finished the data package, place it in your data archive or repository
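
The folder-creation steps above can be sketched in a few lines of Python. The layout in `SUBDIRECTORIES` and the function name `create_data_package` are assumptions made for illustration; adapt them to the conventions of your own field.

```python
from pathlib import Path

# Hypothetical layout that separates data, methods (code), and output.
SUBDIRECTORIES = ["data/raw", "data/processed", "code", "docs", "output"]


def create_data_package(root, subdirectories=SUBDIRECTORIES):
    """Create the main directory and the listed subdirectories, each with a readme stub."""
    root = Path(root)
    created = []
    for relative in [""] + list(subdirectories):
        folder = root / relative  # "" refers to the main directory itself
        folder.mkdir(parents=True, exist_ok=True)
        readme = folder / "README.md"
        if not readme.exists():  # never overwrite an existing readme
            readme.write_text(
                f"# {folder.name}\n\nDescribe the contents of this folder.\n",
                encoding="utf-8",
            )
        created.append(folder)
    return created
```

Calling `create_data_package("my_project")` would create the main directory `my_project` with the subfolders listed above; existing folders and readme files are left untouched.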

Finally, you can use the following checklist to see if you included all relevant files:

  • Readme file(s)
  • Metadata file
  • Documentation file
  • Raw data
  • Processed data
  • Protocols
  • Methodologies
  • Scripts
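
The checklist can also be checked automatically. The sketch below assumes a hypothetical mapping from each checklist item to a glob pattern inside the data package; the patterns are illustrative and depend entirely on the folder structure you chose.

```python
from pathlib import Path

# Hypothetical mapping from checklist item to a glob pattern inside the package.
CHECKLIST = {
    "Readme file(s)": "**/README*",
    "Metadata file": "metadata.*",
    "Documentation file": "docs/*",
    "Raw data": "data/raw/*",
    "Processed data": "data/processed/*",
    "Protocols": "docs/protocols/*",
    "Methodologies": "docs/methodologies/*",
    "Scripts": "code/*",
}


def check_package(root, checklist=CHECKLIST):
    """Return {item: bool}, True when at least one path matches the item's pattern."""
    root = Path(root)
    return {item: any(root.glob(pattern)) for item, pattern in checklist.items()}
```

Items that map to `False` point to parts of the package that are missing or stored in an unexpected location.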

Data documentation

Metadata schemes

This is the testing environment for developing a workflow to build and test metadata documentation based on the FAIR principles. We include several metadata schemes, among them the DataCite schema. Furthermore, it is possible to check your documentation against a predefined set of keywords for a specific research area.

DataCite

DataCite provides persistent identifiers (DOIs) for research data and other research outputs. The DataCite Metadata Schema defines the properties used to describe these resources.
For more information, visit the DataCite website.

Software requirements

  1. Python (version 3.8.5)
  2. Pandas (version 1.1.3)
  3. chardet (version 3.0.4)

Test the FAIRness of your data

To test the FAIRness of your data, first load the document you want to test:
import pandas as pd
working_document = 'DataCiteExample.txt'
data_to_check = pd.read_csv(working_document, sep='\t')
# create_json_object is provided by this repository
data_json = create_json_object(data_to_check)

After you have successfully loaded your document, the first test checks whether you use a FAIR file format:

fairness_file=CheckFairification(file_name='DataCiteExample.txt')
fairness_file.score_fairness_data_type()
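
To illustrate the idea behind a file format check, the sketch below classifies a file by its extension. It is not the repository's implementation of `score_fairness_data_type`; the function name and the two format lists are assumptions chosen for illustration and are far from exhaustive.

```python
from pathlib import Path

# Illustrative, non-exhaustive lists; assumptions for this sketch only.
OPEN_FORMATS = {".csv", ".tsv", ".txt", ".json", ".xml"}
PROPRIETARY_FORMATS = {".xls", ".doc", ".sav"}


def classify_file_format(file_name):
    """Classify a file as 'open', 'proprietary', or 'unknown' by its extension."""
    suffix = Path(file_name).suffix.lower()
    if suffix in OPEN_FORMATS:
        return "open"
    if suffix in PROPRIETARY_FORMATS:
        return "proprietary"
    return "unknown"


print(classify_file_format("DataCiteExample.txt"))  # open
```

An extension only hints at the format; a more thorough check would also inspect the file contents, for example its encoding.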

The second test checks whether your document contains the required fields defined by an XML schema.
Here, we test whether our document meets the DataCite standards:

check_data=CheckDataValidity(Data=data_json,type='DataCite')
check_data.set_the_mandatory_fields()
check_data.check_required_fields()
print('Test 1: check if all required fields are present: %s' % check_data.required_fields)
check_data.check_for_na()
print('Test 2: json object contains no NA values: %s ' % check_data.na_checked)
check_data.check_format_of_date()
print('Test 3: format date check: %s' % check_data.correct_format)
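# Assumes the three result attributes are truthy when the corresponding test passed.
all_passed = check_data.required_fields and check_data.na_checked and check_data.correct_format
print('Overall the metadata %s the test.%s' % ('passed' if all_passed else 'failed', '' if all_passed else ' Please revise your documentation.'))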

The test output can be used to improve your documentation before you upload your dataset to a repository or archive.
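
For readers who want to see what such checks involve, here is a minimal, self-contained sketch of the three tests. It is not the repository's `CheckDataValidity` class. The six mandatory properties are those of the DataCite Metadata Schema 4.x, but the key spelling, the list of NA placeholders, the `YYYY-MM-DD` date format, and the example record (including its placeholder identifier) are assumptions.

```python
import re
from datetime import datetime

# Mandatory DataCite 4.x properties; the key spelling is an assumption.
MANDATORY_FIELDS = ["identifier", "creator", "title",
                    "publisher", "publicationYear", "resourceType"]
NA_VALUES = {"", "na", "n/a", "nan", "none"}  # assumed NA placeholders


def missing_fields(record, required=MANDATORY_FIELDS):
    """Return the required fields that are absent from the record."""
    return [field for field in required if field not in record]


def na_fields(record):
    """Return the fields whose value is None or a typical NA placeholder."""
    return [key for key, value in record.items()
            if value is None or str(value).strip().lower() in NA_VALUES]


def is_iso_date(value):
    """Return True if value is a valid calendar date written as YYYY-MM-DD."""
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


# Made-up example record; the identifier is a placeholder, not a real DOI.
record = {"identifier": "10.1234/example", "creator": "Doe, Jane",
          "title": "Example dataset", "publisher": "NA",
          "publicationYear": "2022", "date": "2022-03-01"}
print(missing_fields(record))       # ['resourceType']
print(na_fields(record))            # ['publisher']
print(is_iso_date(record["date"]))  # True
```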

Software license

Copyright (c) 2022, TJM Kuijpers
All rights reserved.

This source code is licensed under the BSD-3-Clause license.
