

Prov-a-thon: Practical Tools for Reproducible Science

Provenance information links datasets to the software and analysis code that created them and used them in research. It allows users to trace new and ongoing uses of data, and provides rich information about the origins of data that ultimately supports reproducible research workflows. Prov-a-thon is a two-day workshop designed to advance practical approaches to incorporating provenance information into tools and workflows that are useful in earth, environmental, and archaeological research domains.

Goals:

  • Learn about Whole Tale/DataONE/other provenance tools and reproducibility
  • Add provenance data for rich datasets into DataONE
  • Build interest among data creators/submitters in adding provenance data
  • Organize efforts on reproducibility training and evangelization in archaeology and environmental science
  • Stimulate coordination of the development/use of provenance and reproducibility tools

Background reading

Lowndes, J. S. S., B. D. Best, C. Scarborough, J. C. Afflerbach, M. R. Frazier, C. C. O’Hara, N. Jiang, and B. S. Halpern. 2017. Our path to better science in less time using open data science tools. Nature Ecology & Evolution 1:0160.

Marwick, B. 2017. Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. Journal of Archaeological Method and Theory 24:424–450.

McPhillips, T., Song, T., Kolisnik, T., Aulenbach, S., Belhajjame, K., Bocinsky, R.K., Cao, Y., Cheney, J., Chirigati, F., Dey, S. and Freire, J., 2015. YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts. International Journal of Digital Curation, 10(1), pp.298-313.

Cao, Y., Jones, C., Cuevas-Vicenttín, V., Jones, M.B., Ludäscher, B., McPhillips, T., Missier, P., Schwalm, C., Slaughter, P., Vieglais, D. and Walker, L., 2016, June. DataONE: A Data Federation with Provenance Support. (extended preprint) In International Provenance and Annotation Workshop (pp. 230-234). Springer.

Ludäscher B, Chard K, Gaffney N, Jones M, Nabrzyski J, Stodden V, Turk M, Capturing the "Whole Tale" of Computational Research: Reproducibility in Computing Environments, Science Gateways Workshop, San Diego, 2016.

Resources links

Participants

  • Jamie Afflerbach
  • Kyle Bocinsky
  • Carl Boettiger
  • Emory Boose
  • Amber Budden
  • Peter Darch
  • Matt Harris
  • Linh Hoang
  • Xiaoliang Jiang
  • Chris Jones
  • Matt Jones
  • Eric Kansa
  • Josh London
  • Julie Lowndes
  • Bertram Ludäscher
  • Hui Lyu
  • Ben Marwick
  • Paulina Przystupa
  • Peter Slaughter
  • Pratik Srivastava
  • Dave Vieglais

Agenda

Day 1: Thursday, August 31, 2017

0715 - 0800 Breakfast

0800 - 1000 Welcome and Overviews (Room: Tamaya ABC)

  • 0800 - 0825 Overview of DataONE (Bill Michener, DataONE)

  • 0825 - 0845 Overview of Provenance (Bertram Ludäscher, UIUC)

    • Different notions and uses of provenance, reproducibility
  • 0845 - 0945 Overview of the Status of Provenance Tools (Matt Jones, NCEAS)

  • 0945 - 1000 Goals of Prov-a-thon (Dave Vieglais, DataONE)

1000 - 1030 Break

1030 - 1200 Introductions and Lightning Talks (Room: Eagle AB)

  • 1030 - 1050 Around the room introductions (Amber Budden, DataONE)

  • 1050 - 1150 Lightning talks: Provenance and Reproducible Workflows (Kyle Bocinsky, Whole Tale)

    • Kyle Bocinsky, Ben Marwick, Paulina Przystupa, Matt Harris, Eric Kansa, Carl Boettiger, Emory Boose, Josh London, Jamie Afflerbach, Julie Lowndes, Peter Darch (confirmed)
  • 1150 - 1200 Agenda review (Matt Jones)

1200 - 1300 Lunch

  • Poster session featuring summer internships related to provenance (DataONE, Whole Tale)
    • Xiaoliang Jiang, Linh Hoang, Hui Lyu, Pratik Srivastava

1300 - 1445 Provenance Tools I (Room: Eagle AB)

  • 1300 - 1400 Intro to the DataONE R provenance tools (Matt)

    • R libraries: dataone, datapack, recordr
  • 1400 - 1445 Intro to YesWorkflow (Bertram)

    • YW modeling exercise (Bertram)

1445 - 1515 Break

1515 - 1700 Provenance Tools II (Room: Eagle AB)

  • 1515 - 1700 Intro to the Whole Tale web tool (Bertram & Matt)

    • Hands-on with the WT tool, including importing data from DataONE

Day 2: Friday, September 1, 2017

0715 - 0800 Breakfast

0800 - 1000 Breakout Groups: Archaeology (Room: Eagle A), Environmental Science (Room: Eagle B)

  • Environmental Science (Jones)

    • Breakout agenda planning

    • Hands-on provenance metadata writing activities, troubleshooting, usability

    • Identify future development directions (DataONE/YW/WT/rrtools/others?)

    • Discussion of barriers to reproducibility in environmental sciences

    • Planning for advocacy for reproducible research approaches in environmental science

  • Archaeology (Bocinsky)

    • Hands-on with WT/rrtools/Open Context/DataONE: building tales

    • Discussion of barriers to reproducibility in archaeology (generalizable to other disciplines; ideas below)

      • Lack of training in computational methods/reproducibility

      • Persistence of data hoarding/siloing

      • Data sensitivity & archaeological looting

      • Few “sticks” from journals/funding agencies/professional societies

      • Few “carrots” from journals/peers/tenure committees/funding agencies

  • Archaeology Goals (Bocinsky):

    • Tool assessment/usability feedback (YW/WT)

    • Identify future development directions (DataONE/YW/WT/rrtools/others?)

    • Create provenance records (DataONE)

    • Identify ways to promote reproducibility in the communities

    • Identify next steps/plans for further collaboration

    • How To Do Archaeological Science Using R book status update (Ben/Matt/Paulina/Kyle)

    • Intro and feedback on rrtools package (Ben)

    • Plan further advocacy for reproducibility in Archaeology (ideas below)

      • SAA committees (publications/curriculum)

      • SAA Events/forums/workshops?

      • Collaborations with open journals?

        • White paper on data access and reproducibility for archaeology journals?

1000 - 1030 Break

1030 - 1200 Continued Breakout Sessions

1200 - 1300 Lunch

1300 - 1445 Continued Breakout Sessions

1445 - 1515 Break

1515 - 1700 Plenary: Reproducibility and Provenance for Science

  • Report back from breakout groups (10 minutes each)

  • Moderated Discussion (Kyle & Matt):

    • Evangelism and advocacy for reproducible research in general

    • The tool landscape supporting reproducible research

  • Next steps

    • Roadmap for DataONE and Whole Tale tool development

      • How can the working group(s) help?
    • How to get buy-in from data contributors at the user level

    • How to build a community pulling towards the same reproducible research goals

    • What do we collectively want to do next?

  • Conclusion, establish report back mechanism

Day 3: Saturday, September 2, 2017

0730 - 0930 Breakfast: Reflections, follow-up
