CaseForOpenTraitData

In the first week of September, contributors to the TRY trait database are meeting to discuss the future of TRY. Currently, TRY includes over 3 million trait records for about 69000 plant species, with contributions from 372 participants from 179 scientific institutes worldwide. This collaborative effort is a fantastic achievement that can rightly be celebrated. At the same time, many people both within and outside the TRY community have concerns about the model of data sharing currently adopted by TRY. One of the issues slated for discussion at the September meeting is whether TRY should move to a more open access model of data distribution.

The purpose of this document is to outline the potential benefits of moving to a fully open access model of data sharing within ecology, and share ideas on how this could be achieved. We understand that such a shift might pose significant infrastructural and political challenges. Yet, we strongly believe that an open model is in the public's best interests and would vastly increase the ability of trait-based research to solve key questions in ecology, evolution and conservation. Thus, while recognising the great things TRY has achieved, we encourage the members of TRY to take the next step, by moving towards an open model of data contribution and usage. We are certain any challenges that arise in moving to an open model could be overcome, if the community were to agree on the vision of an open model.

Contributions to this document are welcome, but should be constructive, recognising all the positive energy and effort that has gone into creating TRY, while also providing a vision for what it might be. The aim is to make a persuasive case for open trait data. Even if the organisers of TRY decide not to move towards an open access model, this document may help inspire others to take on that challenge.

Many people have thought and written about open access models. Thus the purpose of this document is simply to summarise the different arguments and point to relevant literature.

Key references:

White, E.P. et al. (2013) Nine simple ways to make it easier to (re)use your data. PeerJ PrePrints, 1, e7v2. DOI: 10.7287/peerj.preprints.7v2
Poisot, T., Mounce, R. & Gravel, D. (2013) Moving toward a sustainable ecological science: don’t let data go to waste! DOI: 10.6084/m9.figshare.693745
Duke, C.S. & Porter, J.H. (2013) The Ethics of Data Sharing and Reuse in Biology. BioScience, 63, 483–489. DOI: 10.1525/bio.2013.63.6.10
Costello, M.J., Michener, W.K., Gahegan, M., Zhang, Z.-Q. & Bourne, P.E. (2013) Biodiversity data should be published, cited, and peer reviewed. Trends in Ecology & Evolution, 28, 454–461. DOI: 10.1016/j.tree.2013.05.002
Lathrop, R.H., Rost, B., ISCB Membership, ISCB Executive Committee, ISCB Board of Directors & ISCB Public Affairs Committee. (2011) ISCB Public Policy Statement on Open Access to Scientific and Technical Research Literature. PLoS Comput Biol, 7, e1002014. DOI: 10.1371/journal.pcbi.1002014
Piwowar, H.A., Day, R.S. & Fridsma, D.B. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE, 2, e308. DOI: 10.1371/journal.pone.0000308
Piwowar, H.A., Vision, T.J. & Whitlock, M.C. (2011) Data archiving is a good investment. Nature, 473, 285–285. DOI: 10.1038/473285a

Key reasons why we should support a fully open model of data sharing in ecology

Accelerating science

We are all united by a joint goal: to discover key facts about the natural world and address the challenges that face us in a changing world. Experience has shown that when a field becomes open, it moves forward very quickly.

Examples?

Great TED talk: Open science now! Accelerate science = share data, code, ideas
??

Transparency and reproducibility

There is a big move towards reproducibility in science. Reproducibility requires data to be archived with the paper, which is not possible under the current TRY model.

Keeping up with the move to open data in other fields

Ecology journals will soon require data archiving, as is already the case for many evolutionary journals. This won't be possible under the current model of data access provided in TRY, so users of TRY may be prevented from publishing articles.

The moral arguments

Public money = public data

As research scientists we are funded by public money, so ultimately our responsibility is to the public.

Built on generosity of others

Many trait compilations were built from data that was made freely available in the literature. In this good spirit, the compilations should also be open access.

This can be likened to the GNU General Public License: the data is free to use, provided the resulting products are also free to use.

Tit-for-tat

Trait ecologists routinely use a whole suite of open data resources, including:

genetic sequence data on GenBank
climate data
soils data
distribution data, from sources such as GBIF
phylogenetic trees: tree of life, etc
data from
remote sensing data from NASA

In addition, ecologists benefit from a wide range of open tools:

statistical packages, especially R

All of these resources are provided free and allow scientists to achieve far more than might otherwise be possible. Ecological data has the same potential. It remains unclear why trait data should be considered any differently.

Closed access models have a corrosive effect on the field

In the early 2000s, many trait datasets were provided in appendices to papers. Examples:

Falster et al 2005

This type of activity is discouraged under the current model of data sharing in TRY. If people perceive a benefit in keeping data closed (in the form of co-authorship on papers), they may be less likely to share data.

Moreover, closed data models, where co-authorship is offered to data contributors, increase competitiveness in the field and drive people towards high publication numbers. This is something almost all ecologists dislike, yet many are actively making the situation worse.

Common concerns about open access models

Openness is bad for my career

Examples to counter this

Amy Zanne's wood density database is the most downloaded resource on Dryad (datadryad.org). It has had very positive effects on her career.

That data was hard to collect

Yes, but a lot of data is lost over time due to changing technology.

Need to protect poor PhD students

Which license to use

Alternative ways of providing open access data

Here we describe some of the alternative models that might be considered for an open-access model of data sharing.

Centralised databases

Examples:

GenBank
climate data
climate modellers

Requires substantial and reliable ongoing institutional support.

Distributed model

EcoData Retriever

Places to publish data: figshare, Dryad.
