Code Monkey home page Code Monkey logo

activecube's Introduction

Activecube: Multi-Dimensional Queries with Rails

Activecube is the library to make multi-dimensional queries to data warehouse, such as:

Cube.slice(
    date: cube.dimensions[:date][:date].format('%Y-%m'),
    currency: cube.dimensions[:currency][:symbol]
).measure(:count).
when(cube.selectors[:currency].in('USD','EUR')).to_sql

Cube, dimensions, metrics and selectors are defined in the Model, similary to ActiveRecord.

Activecube uses Rails ActiveRecord in implementation.

In particular, you have to define all tables, used in Activecube, as ActiveRecord tables.

Installation

Add this line to your application's Gemfile:

gem 'activecube'

And then execute:

$ bundle

Or install it yourself as:

$ gem install activecube

Usage

Basic steps to use ActiveCube are:

  1. Define your database schema in models, as you do with Rails
  2. Setup connection properties to data warehouse in config/database.yml. You can use multiple connections if you use Rails 6 or higher
  3. Define cubes in models, sub-classed from Activecube::Base. Look spec/models/test/transfers_cube.rb as example
  4. Make queries to the cubes

Check spec/cases/activecube_spec.rb for more examples.

Cube definition

Cube defined using the following attributes:

  • table specifies, which physical database tables can be considered to query
    table TransfersCurrency
    table TransfersFrom
    table TransfersTo
  • dimension specifies classes used for slicing the cube
    dimension date: Dimension::Date,
              currency: Dimension::Currency

Or

    dimension date: Dimension::Date
    dimension currency: Dimension::Currency
  • metric specifies which results expected from the cube queries
    metric amount: Metric::Amount,
           count: Metric::Count

Or

    metric amount: Metric::Amount
    metric count: Metric::Count
  • selector is a set of expressions, which can filter results
    selector currency: CurrencySelector,
             transfer_from: TransferFromSelector,
             transfer_to: TransferToSelector

Table definition

Tables are defined as regular active records, with additional optional attribute 'index':

index 'currency_id', cardinality: 4

which means that the table has an index onm currency_id field, with average number of different entries of 10,000 ( 10^4). This creates a hint for optimizer to build queries.

Indexes can span multiple fields, as

index ['currency_id','date'], cardinality: 6

Note, that if you created combined index in database, you most probable will need to define all indexed combinations, for example:

index ['currency_id'], cardinality: 4
index ['currency_id','date'], cardinality: 6

You can require using index in some cases. If required: true added, the table will be used only in case when this field is used in query metric, dimension or selector.

index ['currency_id'], cardinality: 4, required: true

Query language

You use the cube class to create and execute queries.

Queries can be expressed as Arel query, SQL or executed against the database, returning results.

The methods used to contruct the query:

  • slice defines which dimensions slices the results
  • measure defines what to measure
  • when defines which selectors to apply
  • desc, asc, offset, limit are for ordering and limiting result set

After the query contructed, the following methods can be applied:

  • to_sql to generate String SQL query from cube query
  • to_query to generate Arel query
  • query to execute query and return ResultSet

Managing Connections

You can control the connection used to construct and execute query by ActiveRecord standard API:

ApplicationRecord.connected_to(database: :data_warehouse) do
      cube = My::TransfersCube
      cube.slice(
              date: cube.dimensions[:date][:date].format('%Y-%m'),
              currency: cube.dimensions[:currency][:symbol]
      ).measure(:count).query
    end

will query using data_warehouse configuraton.

Alternatively you can use the method provided by activecube. It will make the connection for the model or abstract class, which is super class for your models:

My::TransfersCube.connected_to(database: :data_warehouse) do |cube|
      cube.slice(
              date: cube.dimensions[:date][:date].format('%Y-%m'),
              currency: cube.dimensions[:currency][:symbol]
      ).measure(:count).query
    end

How it works

When you construct and execute cube query with any outcome ( sql, Arel query or ResultSet), the same sequence of operations happen:

  1. Cube is collecting the query into a set of objects from the chain method call;
  2. Query is matched against the physical tables, the tables are selected that can serve the query or its part. For example, one table can provide one set of metrics, and the other can provide remaining;
  3. If possible, the variant is selected from all possible options, which uses indexes with the most cardinality
  4. Query is constructed using Arel SQL engine ( included in ActiveRecord ) using selected tables, and possibly joins
  5. If requested, the query is converted to sql ( using Arel visitor ) or executed with database connection

Optimization

The optimization on step #3 try to minimize the total cost of execution:

Formula min max)

where

Formula cost

Optimization is done using the algorithm, which checks possible combinations of metrics and tables.

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

RSPec tests

To run tests, you need clickhouse server installation. Tests use database 'test' that have to be created in clickhouse as:

CREATE DATABASE test;

Check credentials for connection in spec/spec_helper.rb file. By default clickhouse must reside on "clickhouse" server name, port 8123 with the default user access open.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/bitquery/activecube. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Activecube project’s codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.

activecube's People

Contributors

astudnev avatar ivanov17andrey avatar wittyjudge avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Forkers

jbwl bunsdev

activecube's Issues

undefined method `connected_to' for MyCube:Module

Hello, this is a very interesting project. Trying to implement it in my Rails 6 App isn't straightforward. Is there any other documentation than the Spec example? I am having a hard time making it work outside of the example app.

The specific problem I'm running into is:

undefined method connected_to' for MyCube:Module` when trying to query my cube. I'm not sure if my Cube class needs to be inside a module?

It would be great if you could extend the ReadMe to show the setup (what files go where) in a Rails app. Thank you for any help.

Run rake spec error

spec/spec_helper.rb config:

ActiveRecord::Base.configurations = HashWithIndifferentAccess.new(
    default: {
      adapter: 'clickhouse',
      host: 'localhost',
      port: 8123,
      database: 'test',
      username: 'default',
      password: nil
    }
  )

  ActiveRecord::Base.establish_connection(:default)
end

run "sudo rake spec"
return error:

Failures:

  1) Activecube context executes in context
     Failure/Error: ActiveRecord::MigrationContext.new(MIGRATIONS_PATH, ActiveRecord::Base.connection.schema_migration).up
     
     ArgumentError:
       wrong number of arguments (given 3, expected 2)
     # /var/lib/gems/3.0.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_definitions.rb:260:in `initialize'
     # /var/lib/gems/3.0.0/bundler/gems/clickhouse-activerecord-7d638fee3548/lib/active_record/connection_adapters/clickhouse/schema_statements.rb:95:in `new'
     # /var/lib/gems/3.0.0/bundler/gems/clickhouse-activerecord-7d638fee3548/lib/active_record/connection_adapters/clickhouse/schema_statements.rb:95:in `create_table_definition'
     # /var/lib/gems/3.0.0/gems/activerecord-6.0.2.1/lib/active_record/connection_adapters/abstract/schema_statements.rb:295:in `create_table'
     # /var/lib/gems/3.0.0/gems/activerecord-6.0.2.1/lib/active_record/schema_migration.rb:33:in `create_table'
     # /var/lib/gems/3.0.0/gems/activerecord-6.0.2.1/lib/active_record/migration.rb:1206:in `initialize'
     # /var/lib/gems/3.0.0/gems/activerecord-6.0.2.1/lib/active_record/migration.rb:1061:in `new'
     # /var/lib/gems/3.0.0/gems/activerecord-6.0.2.1/lib/active_record/migration.rb:1061:in `up'
     # ./spec/cases/activecube_spec.rb:3:in `block (2 levels) in <top (required)>'

Clarify Managing Connections in Readme

I am using this interesting project in a Rails app. But I'm not sure why the "Managing Connections" section in the Readme:
https://github.com/bitquery/activecube#managing-connections is necessary.
If I implement it like described it will hard-select the Cube's connection for my rails app and subsequent queries to another model using a different connection and / or database (i.e. with a MySQL connection) will fail with "Table does not exist".

The Cube query works without explicitly setting the connection because the connection is already determined in the cube's table via ActiveRecord - so I was wondering if I'm missing something important or why setting the connection is necessary as shown in the Readme.

Thank you for the help and clarification!

ORDER BY WITH FILL supported for date histogram queries?

I could not find an answer in the docs for this question - when I create a Query with ActiveCube for a date histogram, I would like to use the very convenient option to fill empty days in the result. Clickhouse offers this: https://clickhouse.com/docs/en/sql-reference/statements/select/order-by#order-by-expr-with-fill-modifier

Is it possible now to add WITH FILL to the Cube's ORDER BY statement? If not, I would like to request this feature.

Thank you for this great library!

undefined method `limit_by`

I am trying to apply a limit_by to my cube. I'm having a hard time figuring out the arguments to pass.
My dimension is called product_id, and I want to limit it to 10 results:

I tried:

cube = cube.limit_by(['product_id'],[10])
and
cube = cube.limit_by(['product_id',10])
Both fail with:
undefined method limit_by for #<Arel::SelectManager>

Looking at the source, I also tried
cube = cube.limit_by([{ each: ['productid'],limit: [10]}])
Which also fails with
undefined method limit_by for #<Arel::SelectManager>

Could you please give an example how to use the limit_by option?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.