DataKnots.jl

DataKnots is a Julia library for querying data with an extensible, practical and coherent algebra of query combinators.

DataKnots is designed to let data analysts and other accidental programmers query and analyze complex structured data.

Showcase

Let's take some Chicago public data and convert it into a DataKnot.

using DataKnots, CSV

employee_csv_file = """
    name,department,position,salary
    "JEFFERY A","POLICE","SERGEANT",101442
    "NANCY A","POLICE","POLICE OFFICER",80016
    "JAMES A","FIRE","FIRE ENGINEER-EMT",103350
    "DANIEL A","FIRE","FIRE FIGHTER-EMT",95484
    "BRENDA B","OEMC","TRAFFIC CONTROL AIDE",64392
    """ |> IOBuffer |> CSV.File

chicago = DataKnot(:employee => employee_csv_file)

We can then query this data to return employees whose salary is greater than their department's average.

using Statistics: mean

@query chicago begin
    employee
    group(department)
    keep(avg_salary => mean(employee.salary))
    employee
    filter(salary > avg_salary)
end
#=>
  │ employee                                         │
  │ name       department  position           salary │
──┼──────────────────────────────────────────────────┼
1 │ JAMES A    FIRE        FIRE ENGINEER-EMT  103350 │
2 │ JEFFERY A  POLICE      SERGEANT           101442 │
=#

In this example, nouns, such as employee, department and salary, are query primitives. The verbs, such as group, keep, mean and filter, are query combinators. Query expressions, such as group(department), are built from existing queries by applying these combinators.

Queries can also be constructed in pure Julia code, without using macros. The query above can be written equivalently as:

using Statistics: mean

chicago[It.employee >>
        Group(It.department) >>
        Keep(:avg_salary => mean.(It.employee.salary)) >>
        It.employee >>
        Filter(It.salary .> It.avg_salary)]
#=>
  │ employee                                         │
  │ name       department  position           salary │
──┼──────────────────────────────────────────────────┼
1 │ JAMES A    FIRE        FIRE ENGINEER-EMT  103350 │
2 │ JEFFERY A  POLICE      SERGEANT           101442 │
=#
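
Because queries written this way are ordinary Julia values, the pipeline above can also be split into named parts and recombined with `>>`. The following is a minimal sketch under that assumption; the names `ByDepartment`, `WithAverage` and `AboveAverage` are hypothetical and chosen here only for illustration.

```julia
using DataKnots, CSV
using Statistics: mean

# Same sample data as in the showcase.
employee_csv_file = """
    name,department,position,salary
    "JEFFERY A","POLICE","SERGEANT",101442
    "NANCY A","POLICE","POLICE OFFICER",80016
    "JAMES A","FIRE","FIRE ENGINEER-EMT",103350
    "DANIEL A","FIRE","FIRE FIGHTER-EMT",95484
    "BRENDA B","OEMC","TRAFFIC CONTROL AIDE",64392
    """ |> IOBuffer |> CSV.File

chicago = DataKnot(:employee => employee_csv_file)

# Hypothetical names for the three stages of the showcase query.
ByDepartment = It.employee >> Group(It.department)
WithAverage  = Keep(:avg_salary => mean.(It.employee.salary))
AboveAverage = It.employee >> Filter(It.salary .> It.avg_salary)

# Composing the named stages yields the same pipeline as the single expression.
chicago[ByDepartment >> WithAverage >> AboveAverage]
```

Each stage can be inspected on its own, for example with `chicago[ByDepartment]`, which helps when a longer query is built up step by step.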

Objectives

DataKnots implements an algebraic query interface of Query Combinators. This algebra’s elements, or queries, represent relationships among class entities and data types. This algebra’s operations, or combinators, are applied to construct query expressions.

We seek to prove that this query algebra has significant advantages over the state of the art:

  • DataKnots is a practical alternative to SQL with a declarative syntax; this makes it suitable for use by domain experts.

  • DataKnots' data model handles nested and recursive structures (unlike DataFrames or SQL); this makes it suitable for working with CSV, JSON, XML, and SQL databases.

  • DataKnots has a formal semantic model based upon monadic composition; this makes it easy to reason about the structure and interpretation of queries.

  • DataKnots is a combinator algebra (like XPath but unlike LINQ or SQL); this makes it easier to assemble queries dynamically.

  • DataKnots is fully extensible with Julia; this makes it possible to specialize it into various domain-specific query languages.
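
As an illustration of assembling queries dynamically, the sketch below folds a list of filter conditions onto a base query using the composition operator `>>`. The `conditions` list is hypothetical and stands in for criteria that might be collected at run time; the data is the sample from the showcase.

```julia
using DataKnots, CSV

# Same sample data as in the showcase.
employee_csv_file = """
    name,department,position,salary
    "JEFFERY A","POLICE","SERGEANT",101442
    "NANCY A","POLICE","POLICE OFFICER",80016
    "JAMES A","FIRE","FIRE ENGINEER-EMT",103350
    "DANIEL A","FIRE","FIRE FIGHTER-EMT",95484
    "BRENDA B","OEMC","TRAFFIC CONTROL AIDE",64392
    """ |> IOBuffer |> CSV.File

chicago = DataKnot(:employee => employee_csv_file)

# Hypothetical list of conditions, e.g. gathered from user input.
conditions = [
    Filter(It.department .== "FIRE"),
    Filter(It.salary .> 100000),
]

# Start from the base query and append each condition with `>>`.
query = foldl(>>, conditions; init = It.employee)

chicago[query]
```

Since the filters are applied in sequence, the result keeps only the employees that satisfy every condition in the list. With this sample data, that is the single FIRE employee earning more than 100000.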

Support

At this time, while we welcome feedback and contributions, DataKnots is not yet usable for general audiences.

Our development chat is currently hosted on Gitter: https://gitter.im/rbt-lang/rbt-proto

Current documentation can be found at: https://rbt-lang.github.io/DataKnots.jl/stable/
