Research data management

General information

This repository contains information about research data management and reproducible code, as well as Jupyter notebooks for creating metadata schemes. It has four main subfolders, each focused on one topic:

Electronic Lab Notebook
Reproducible code
Data documentation
Open Access Publishing

This repository will be updated frequently with new information and guidelines.

Data package

A data package is a collection of all digital parts of a research project, including data, code, and texts (protocols, reports, questionnaires, metadata). The collection is organized so that reproducing all results is straightforward.

When creating a data package, you should keep the following three principles in mind:

Files should be organized in a conventional folder structure
Data, methods, and output should be clearly separated
The computational environment should be specified

When working on your data package, you can follow the steps below:

Think about a good folder structure
Create the folder structure (main directory and subdirectories)
Add all files needed for reproducing the results of the project
Keep the data package clean and easy to use (add readme files to important folders)
Have a peer check the data package and see if it works correctly
When you have finished the data package, place it in your data archive or repository
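
The first steps above (choosing and creating a folder structure, and adding readme files) can be sketched in a few lines of Python. This is a minimal sketch, not a layout prescribed by this repository: the folder names in `LAYOUT` and the function name `create_data_package` are hypothetical and should be adapted to your own project.

```python
from pathlib import Path

# Hypothetical layout: folder names and descriptions are illustrative only.
LAYOUT = {
    "data/raw": "Original, unmodified data files.",
    "data/processed": "Data derived from data/raw by the code in code/.",
    "code": "Scripts and notebooks that produce the results.",
    "docs": "Protocols, reports, questionnaires, and metadata.",
    "output": "Figures and tables generated by the code.",
}


def create_data_package(root):
    """Create the folder skeleton under root and add a readme to each folder."""
    root = Path(root)
    created = []
    for relative, description in LAYOUT.items():
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        readme = folder / "README.txt"
        if not readme.exists():  # never overwrite an existing readme
            readme.write_text(description + "\n", encoding="utf-8")
        created.append(folder)
    return created
```

Calling `create_data_package("my_project")` creates the folders under `my_project` and returns their paths. Keeping data, code, and output in separate top-level folders follows the second principle listed above.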

Finally, you can use the following checklist to see if you included all relevant files:

Data documentation

Metadata schemes

This is the testing environment for developing a workflow to build and test metadata documentation based on the FAIR principles. We include several metadata schemes, among them the DataCite schema. Furthermore, you can check your documentation against a predefined set of keywords for a specific research area.
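
To illustrate the keyword check, here is a minimal sketch that reports which keywords from a predefined set occur in a documentation text. The keyword set and the function name `check_keywords` are hypothetical and do not come from this repository; the sketch only handles single-word keywords.

```python
import re

# Hypothetical keyword set for an illustrative research area.
KEYWORDS = {"temperature", "humidity", "sensor", "calibration"}


def check_keywords(text, keywords):
    """Return (found, missing) keyword sets for a documentation text."""
    # Split the lower-cased text into alphanumeric words
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    found = {k for k in keywords if k.lower() in words}
    return found, set(keywords) - found


found, missing = check_keywords(
    "Sensor calibration was done at room temperature.", KEYWORDS
)
print("found:", sorted(found))
print("missing:", sorted(missing))
```

The `missing` set points to terms you may still want to cover in your documentation.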

DataCite

DataCite provides persistent identifiers (DOIs) for research data and other research outputs; the DataCite schema defines the metadata that describes them.
For more information, visit the DataCite website.

Software requirements

Python (version 3.8.5)
Pandas (version 1.1.3)
chardet (version 3.0.4)
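
To see whether your environment matches the versions listed above, you can compare them with the installed packages. The sketch below uses the standard-library module `importlib.metadata` (available from Python 3.8); the helper name `compare_versions` is hypothetical and not part of this repository.

```python
import sys
from importlib import metadata

# Package versions listed under "Software requirements"
REQUIRED = {"pandas": "1.1.3", "chardet": "3.0.4"}


def compare_versions(required, lookup=metadata.version):
    """Map each package to (wanted, installed, matches); installed is None if absent."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, (wanted, installed, ok) in compare_versions(REQUIRED).items():
        print(name, "wanted", wanted, "installed", installed, "match", ok)
```

A mismatch does not necessarily mean the notebooks will fail; it only shows where your environment differs from the one the code was tested with.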

Test the FAIRness of your data

To start and test the FAIRness of your data, load the document you want to test:

import pandas as pd

# create_json_object is assumed to come from this repository's code
working_document = 'DataCiteExample.txt'
data_to_check = pd.read_csv(working_document, sep='\t')
data_to_json = create_json_object(data_to_check)

After you have successfully loaded your document, the first test tries to identify whether you use a FAIR file format:

fairness_file=CheckFairification(file_name='DataCiteExample.txt')
fairness_file.score_fairness_data_type()
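
The internals of CheckFairification are not shown here. As a rough illustration of what a file-format check can look like, the sketch below tests a file extension against a small list of open, text-based formats. Both the list and the function name `uses_open_format` are assumptions made for this example, not the repository's scoring rules.

```python
from pathlib import Path

# Assumption: an illustrative, incomplete list of open, text-based formats.
OPEN_FORMATS = {".csv", ".tsv", ".txt", ".json", ".xml"}


def uses_open_format(file_name):
    """Return True if the file extension is in the illustrative open-format list."""
    return Path(file_name).suffix.lower() in OPEN_FORMATS


print(uses_open_format("DataCiteExample.txt"))
print(uses_open_format("results.xlsx"))
```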

The second test identifies whether your document contains the required fields based on an XML schema.
Here, we test whether our document meets the DataCite standards:

# Validate the JSON object created above against the DataCite fields
check_data = CheckDataValidity(Data=data_to_json, type='DataCite')
check_data.set_the_mandatory_fields()
# Test 1: required fields
check_data.check_required_fields()
print('Test 1: check if all required fields are present: %s' % check_data.required_fields)
# Test 2: NA values
check_data.check_for_na()
print('Test 2: json object contains no NA values: %s ' % check_data.na_checked)
# Test 3: date format
check_data.check_format_of_date()
print('Test 3: format date check: %s' % check_data.correct_format)
# Summary line; as written it reports the required-fields result
print('Overall the metadata %s the test. Please revise your documentation.' % check_data.required_fields)

The test output can be used to improve your documentation before you upload your dataset to a repository or archive.
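
The three tests can also be illustrated without the repository's classes. The sketch below runs comparable checks on a flat metadata dictionary. It assumes the mandatory fields are the six mandatory properties of the DataCite Metadata Schema 4.x and that the date check concerns a four-digit PublicationYear; the function name `validate` and the record layout are hypothetical, and CheckDataValidity may work differently.

```python
import re

# Assumption: the six mandatory properties of DataCite Metadata Schema 4.x.
MANDATORY = [
    "Identifier", "Creator", "Title",
    "Publisher", "PublicationYear", "ResourceType",
]


def validate(record):
    """Run three simple checks on a flat metadata dict and return the results."""
    # Check 1: every mandatory field is present as a key
    missing = [field for field in MANDATORY if field not in record]
    # Check 2: no value is None, empty, or an NA placeholder
    empty = [
        key for key, value in record.items()
        if value is None or str(value).strip().upper() in {"", "NA", "NAN"}
    ]
    # Check 3: PublicationYear is written as four digits (YYYY)
    year = str(record.get("PublicationYear", ""))
    return {
        "required_fields": not missing,
        "no_na": not empty,
        "year_format": re.fullmatch(r"[0-9]{4}", year) is not None,
    }


# Placeholder record; all values are made up for illustration
example = {
    "Identifier": "10.1234/example",
    "Creator": "Doe, Jane",
    "Title": "Example dataset",
    "Publisher": "Example Repository",
    "PublicationYear": "2020",
    "ResourceType": "Dataset",
}
print(validate(example))
```

Each entry in the returned dictionary is True when the corresponding check passes, so a single False shows which part of the documentation needs revision.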

Software license

This source code is licensed under the BSD-3-Clause license.

