CaseForOpenTraitData

In the first week of September, contributors to the TRY trait database are meeting to discuss the future of TRY. Currently, TRY includes over 3 million trait records for about 69,000 plant species, with contributions from 372 participants from 179 scientific institutes worldwide. This collaborative effort is a fantastic achievement that can rightly be celebrated. At the same time, many people both within and outside the TRY community have concerns about the model of data sharing currently adopted by TRY. One of the issues slated for discussion at the September meeting is whether TRY should move to a more open access model of data distribution.

The purpose of this document is to outline the potential benefits of moving to a fully open access model of data sharing within ecology, and share ideas on how this could be achieved. We understand that such a shift might pose significant infrastructural and political challenges. Yet, we strongly believe that an open model is in the public's best interests and would vastly increase the ability of trait-based research to solve key questions in ecology, evolution and conservation. Thus, while recognising the great things TRY has achieved, we encourage the members of TRY to take the next step, by moving towards an open model of data contribution and usage. We are certain any challenges that arise in moving to an open model could be overcome, if the community were to agree on the vision of an open model.

Contributions to this document are welcome, but should be constructive, recognising all the positive energy and effort that has gone into creating TRY, while also providing a vision for what it might be. The aim is to make a persuasive case for open trait data. Even if the organisers of TRY decide not to move towards an open access model, this document may help inspire others to take on that challenge.

Many people have thought and written about open access models. Thus the purpose of this document is simply to summarise the different arguments and point to relevant literature.

Key references:

  • White, E.P. et al. Nine simple ways to make it easier to (re)use your data. PeerJ PrePrints, 1, e7v2. DOI: 10.7287/peerj.preprints.7v2
  • Poisot, T., Mounce, R. & Gravel, D. (2013) Moving toward a sustainable ecological science: don’t let data go to waste! DOI: 10.6084/m9.figshare.693745
  • Duke, C.S. & Porter, J.H. (2013) The Ethics of Data Sharing and Reuse in Biology. BioScience, 63, 483–489. DOI: 10.1525/bio.2013.63.6.10
  • Costello, M.J., Michener, W.K., Gahegan, M., Zhang, Z.-Q. & Bourne, P.E. (2013) Biodiversity data should be published, cited, and peer reviewed. Trends in Ecology & Evolution, 28, 454–461. DOI: 10.1016/j.tree.2013.05.002
  • Lathrop, R.H., Rost, B., ISCB Membership, ISCB Executive Committee, ISCB Board of Directors & ISCB Public Affairs Committee. (2011) ISCB Public Policy Statement on Open Access to Scientific and Technical Research Literature. PLoS Comput Biol, 7, e1002014. DOI: 10.1371/journal.pcbi.1002014
  • Piwowar, H.A., Day, R.S. & Fridsma, D.B. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE, 2, e308. DOI: 10.1371/journal.pone.0000308
  • Piwowar, H.A., Vision, T.J. & Whitlock, M.C. (2011) Data archiving is a good investment. Nature, 473, 285–285. DOI: 10.1038/473285a

Key reasons why we should support a fully open model of data sharing in ecology

Accelerating science

We are all united by a joint goal: to discover key facts about the natural world and address the challenges that face us in a changing world. Experience has shown that when a field becomes open, it moves forward very quickly.

Examples?

Transparency and reproducibility

There is a big move towards reproducibility in science. This requires data to be archived with the paper, which is not possible under the current TRY model.

Keeping up with the move to open data in other fields

Ecology journals will soon require data archiving, as is already the case for many evolutionary journals. This won't be possible under the current model of data access provided in TRY, so users of TRY may be prevented from publishing articles.

The moral arguments

Public money = public data

As research scientists we are funded by public money, so ultimately our responsibility is to the public.

Built on generosity of others

Many trait compilations were built from data that were made freely available in the literature. In the same spirit, the compilations should also be open access.

This can be likened to the GNU General Public License: the work is free to use, provided that products derived from it are also free to use.

Tit-for-tat

Trait ecologists routinely use a whole suite of open data resources, including:

  • genetic sequence data on GenBank
  • climate data
  • soils data
  • distribution data, from sources such as GBIF
  • phylogenetic trees: Tree of Life, etc.
  • data from
  • remote sensing data from NASA

In addition, ecologists benefit from a wide range of open tools:

  • statistical packages, especially R

All of these resources are provided free and allow scientists to achieve far more than might otherwise be possible. Ecological data has the same potential. It remains unclear why trait data should be considered any different.

Closed access models have a corrosive effect on the field

In the early 2000s, many trait datasets were provided in appendices to papers. Examples

This type of activity is discouraged under the current model of data sharing in TRY. If people perceive a benefit in keeping data closed (in the form of co-authorship on papers), they may be less likely to share data.

Moreover, closed data models, where co-authorship is offered to data contributors, increase competitiveness in the field and reinforce the drive towards high publication numbers. This is something almost all ecologists dislike, yet many are actively making the situation worse.

Common concerns about open access models

Openness is bad for my career

Examples to counter this

  • Amy Zanne's wood density database is the most downloaded resource on Dryad, and it has had very positive effects on her career.

That data was hard to collect

Yes, but a lot of data is lost due to changing technology.

Need to protect poor PhD students

Which license to use

Alternative ways of providing open access data

Here we describe some of the alternative models that might be considered for an open-access model of data sharing.

Centralised databases

Examples:

  • GenBank
  • climate data
  • climate modellers

Requires substantial and reliable ongoing institutional support.

Distributed model

EcoData Retriever

Places to publish data: figshare, dryad.
