adl-benchmarks-index

Introduction

This repository maintains a list of commonly agreed-upon benchmark analysis tasks that can be used to exemplify, test, and compare different languages and approaches used for analysis. It also lists public data files on which the benchmarks can be run and the repositories that contain actual implementations of them.

Functionality benchmarks

  1. Plot the ETmiss of all events.
  2. Plot the pT of all jets.
  3. Plot the pT of jets with |η| < 1.
  4. Plot the ETmiss of events that have at least two jets with pT > 40 GeV.
  5. Plot the ETmiss of events that have an opposite-charge muon pair with an invariant mass between 60 and 120 GeV.
  6. For events with at least three jets, plot the pT of the trijet four-momentum that has the invariant mass closest to 172.5 GeV in each event and plot the maximum b-tagging discriminant value among the jets in this trijet.
  7. Plot the scalar sum in each event of the pT of jets with pT > 30 GeV that are not within 0.4 in ΔR of any light lepton with pT > 10 GeV.
  8. For events with at least three light leptons and a same-flavor opposite-charge light lepton pair, find such a pair that has the invariant mass closest to 91.2 GeV in each event and plot the transverse mass of the system consisting of the missing transverse momentum and the highest-pT light lepton not in this pair.
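
As an illustration of how such a task translates into code, the following is a minimal pure-Python sketch of the event selection in benchmark 5. It is not taken from any of the listed implementations. The event layout (a dict with a `met` value and a list of `muons` carrying `pt`, `eta`, `phi`, `mass`, and `charge`) and all function names are hypothetical, and treating the 60 to 120 GeV window as exclusive at both ends is an assumption.

```python
import math
from itertools import combinations

MUON_MASS = 0.105658  # GeV


def four_vector(pt, eta, phi, mass):
    """Convert (pT, eta, phi, mass) into Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(a, b):
    """Invariant mass of the sum of two (E, px, py, pz) four-vectors."""
    energy, px, py, pz = (x + y for x, y in zip(a, b))
    m2 = energy * energy - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding


def has_dimuon_in_window(muons, low=60.0, high=120.0):
    """True if any opposite-charge muon pair has low < mass < high (GeV)."""
    for m1, m2 in combinations(muons, 2):
        if m1["charge"] * m2["charge"] >= 0:
            continue  # keep opposite-charge pairs only
        v1 = four_vector(m1["pt"], m1["eta"], m1["phi"], m1["mass"])
        v2 = four_vector(m2["pt"], m2["eta"], m2["phi"], m2["mass"])
        if low < invariant_mass(v1, v2) < high:
            return True
    return False


def select_met(events):
    """Collect the ETmiss of every event that passes the dimuon selection."""
    return [ev["met"] for ev in events if has_dimuon_in_window(ev["muons"])]


def muon(pt, eta, phi, charge):
    """Hypothetical helper that builds one muon record."""
    return {"pt": pt, "eta": eta, "phi": phi, "mass": MUON_MASS, "charge": charge}


# Made-up toy events, for illustration only.
example_events = [
    {"met": 30.0, "muons": [muon(45.6, 0.0, 0.0, +1), muon(45.6, 0.0, math.pi, -1)]},
    {"met": 12.0, "muons": [muon(45.6, 0.0, 0.0, +1), muon(45.6, 0.0, math.pi, +1)]},
    {"met": 55.0, "muons": [muon(10.0, 0.0, 0.0, +1), muon(10.0, 0.0, math.pi, -1)]},
    {"met": 8.0, "muons": []},
]

met_values = select_met(example_events)
```

Only the first toy event passes: the second has a same-charge pair, the third has a pair mass of about 20 GeV, and the fourth has no muons. The values in `met_values` are what would then be filled into the histogram.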

For the motivations behind these benchmarks, see motivation.md. For a technical reference of the terms used in the benchmarks, see reference.md.
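
Two of the terms used in the benchmarks, ΔR (benchmark 7) and transverse mass (benchmark 8), can be sketched in a few lines of Python. The formulas below are the conventional ones, with the transverse mass written in the massless-lepton approximation; they are assumptions for illustration and are not copied from reference.md, which remains the authoritative definition. All function names are hypothetical.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation: sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def transverse_mass(lep_pt, lep_phi, met, met_phi):
    """Transverse mass of a lepton + ETmiss system, neglecting the lepton mass:
    mT = sqrt(2 * pT * ETmiss * (1 - cos(d_phi)))."""
    dphi = delta_phi(lep_phi, met_phi)
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))


def is_isolated_from(jet, leptons, max_dr=0.4):
    """True if the jet is not within max_dr of any of the given leptons."""
    return all(
        delta_r(jet["eta"], jet["phi"], lep["eta"], lep["phi"]) >= max_dr
        for lep in leptons
    )
```

Wrapping the azimuthal difference matters because two objects at φ = 3.0 and φ = −3.0 are close to each other on the detector, not six radians apart.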

Input data files

Language implementations

| Repository | Language | Description |
|---|---|---|
| opendata-benchmarks | RDataFrame | RDataFrame is a component of ROOT that provides a high-level interface for analyzing TTrees and other data formats. Each task is solved with a simpler syntax useful in interpreted ROOT macros as well as a fully compiled C++ syntax for best performance. |
| nail | NAIL (Natural Analysis Implementation Language) | |
| groot | Go | Part of the Go-HEP project, groot is a pure Go package that provides read/write access to ROOT files. |
| coffea | Python + Numpy | Coffea builds on numpy and awkward-array for columnar data analysis in Python. |
| bamboo | Python + RDataFrame | The bamboo analysis framework provides a high-level Python interface to RDataFrame (technically an embedded domain-specific language). |
| Rumble | JSONiq (an XQuery dialect for JSON data) | Most data in ROOT files can be exposed in the JSON data model and can thus be processed by JSONiq. This implementation is targeted to be run on Rumble, a JSONiq implementation on top of Spark, but could be run by any other JSONiq processor. |
| BigQuery | BigQuery's dialect of SQL | SQL is arguably the most widespread language for querying structured data. Since SQL:1999, it supports arrays and structured types and is thus, in principle, suited for typical HEP analyses, though not many implementations support these features. BigQuery's dialect is based on SQL:2011, supports the mentioned features, and has a few additional language constructs that make queries more concise. |
| PrestoDB | PrestoDB's dialect of SQL | Like BigQuery, Presto has some support for arrays and structured types; however, it has only limited support for nested queries and a more verbose syntax than BigQuery. |
| Amazon Athena | Athena's dialect of SQL | Athena is a fully managed Query-as-a-Service system based on PrestoDB with attractive scalability and pricing but a few more limitations than Presto (most importantly, no support for user-defined functions). |
| SQL++ (AsterixDB) | SQL++ | AsterixDB is a Big Data platform specialized for semi-structured data. Its query language is thus designed to deal with nested data intuitively. |

Adding new benchmarks, data, or implementations

  • Additional benchmarks or public data files can be suggested as GitHub issues on this project to start a discussion within the HSF Data Analysis Working Group community.
  • Suggested modifications to the layout of this repository are also welcome as new GitHub issues.
  • If you would like to add a repository with a new implementation of the benchmarks, go ahead and submit a pull request with the proposed changes.

Contributors

eguiraud, ingomueller-net, masonproffitt, mat-adamec, pieterdavid, sbinet
