data's Introduction

A workshop about data

"...tidy datasets are all alike, but every messy dataset is messy in its own way" H. Wickham

Data is at the core of scientific activity. Ecologists manage larger and more complex datasets every day. There is an increasing trend to share data with collaborators, to combine different datasets to synthesize existing knowledge, and to deposit data after publication to ensure reproducibility. However, there is relatively little guidance on how to manage and share datasets efficiently. Understanding how a dataset is structured and obtaining the input format required by your statistical software often takes more time than the analysis itself.

This workshop will go through the data life cycle: planning and collecting data, entering and storing data in electronic formats, cleaning and manipulating data, and exploring it before analysis.

The workshop consists of three sections. First, we will discuss data in a question/answer forum: How do you clean data? Do you use metadata? Who owns your data? We will also talk a bit about how to use tidy data. The tidy data concept is based on storing variables in columns and observations in rows, with a single type of experimental unit per dataset. While this may seem trivial, in my experience it is not. Second, we will work through a practical example of how to manipulate data in R (reshape, the dplyr package, and regular expressions). Third, we will see how to explore your data before analyzing it (also in R, following Zuur et al. 2009 MEE). If, and only if, there is interest, I can explain Git as a way to manage your entire workflow.
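As a rough illustration of the tidy data idea described above (variables in columns, observations in rows), here is a minimal sketch in base R. The table, the column names (`site`, `count_2019`, `count_2020`), and the values are all made up for this example and are not taken from the workshop materials. The sketch uses a regular expression to find the columns that hide a variable (the year) in their names, then stacks them into one row per observation.

```r
# Hypothetical messy table: the variable "year" is hidden in the column names.
wide <- data.frame(
  site = c("A", "B"),
  count_2019 = c(12, 7),
  count_2020 = c(15, 9)
)

# Regular expression: select the columns whose names start with "count_".
value_cols <- grep("^count_", names(wide), value = TRUE)

# Build one block of rows per year column, then stack the blocks.
long <- do.call(rbind, lapply(value_cols, function(col) {
  data.frame(
    site = wide$site,
    year = as.integer(sub("^count_", "", col)),  # strip the prefix, keep the year
    count = wide[[col]]
  )
}))

# One row per site and year, with site, year, and count as columns.
print(long)
```

Base R is used here only to keep the sketch free of dependencies; dedicated reshaping packages can express the same step more compactly.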

  • Find a messy dataset (and how to clean it) in the example folder

    • NEW: dirty_data.csv has been uploaded.
    • NEW: a script to clean it has been added.
  • Find the slides used in the workshop in the data.md file

    • Riikka's and Vesna's presentations added as PDFs
  • Find code for following Zuur et al. (and more)

    • Data_exploration.R (and associated data inside the example folder)
  • This workshop is based on previous experiences and the following key references:

    - Tidy data paper

    - About data management: 10 basic rules and a few more tips for data management

    - About Git

    - Other resources: DataOne; Prometheusresearch; Data is being lost; Practicaldatamanagement blog.
