Code Monkey home page Code Monkey logo

purplequery's Introduction

PurpleQuery

Overview

PurpleQuery is a partial implementation of BigQuery for testing.  All table creation and querying happens in-memory. The parser is a straightforward translation of the published BigQuery syntax, and the backend is based on Pandas.  The provided API client is a drop-in replacement for the Google Python BigQuery client.

PurpleQuery also integrates smoothly with Verily's analysis-py-utils BigQuery wrapper, which provides a simplified API for interacting with BigQuery.

PurpleQuery is compatible with Python 2 and 3.

Installation

pip install git+https://github.com/verilylifesciences/purplequery.git

Supported Features

PurpleQuery supports

  • WITH ... SELECT [DISTINCT] [EXCEPT] [REPLACE] ... FROM ... JOIN ... WHERE ... GROUP BY ... ORDER BY ... LIMIT ...
  • INNER, OUTER, CROSS joins
  • UNION ALL
  • Analytic functions
  • Arithmetic expressions

Functions including

  • ARRAY
  • ARRAY_AGG
  • CASE
  • CAST
  • CONCAT
  • COUNT
  • EXISTS
  • EXTRACT
  • IF
  • (NOT) IN
  • IS (NOT) NULL
  • MAX
  • MIN
  • MOD
  • NOT
  • ROW_NUMBER
  • STRUCT
  • SUM
  • TIMESTAMP

Unsupported features

  • UNNEST columns (UNNEST array expressions are supported)
  • many functions
  • set operations besides UNION ALL
  • UPDATE and other mutation operations in queries (these are supported via the Python API)
  • Window identifiers and window frames for analytic function calls

Usage

from google.cloud.bigquery import Dataset, DatasetReference, Table, TableReference
from google.cloud.bigquery.job import QueryJobConfig
from google.cloud.bigquery.schema import SchemaField
from purplequery.client import Client

bq_client = Client('my_project')

dataset_ref = DatasetReference('my_project', 'dataset1')
bq_client.create_dataset(Dataset(dataset_ref))
table = Table(TableReference(dataset_ref, 'table1'),
              [SchemaField(name='num', field_type='INTEGER'),
               SchemaField(name='ID', field_type='STRING')])
bq_client.create_table(table)
bq_client.insert_rows(table, [{'num': 1, 'ID': 'first'},
                              {'num': 2, 'ID': 'second'}])
job = bq_client.query('SELECT * FROM `my_project.dataset1.table1`',
                      QueryJobConfig())
rows = [list(row.values()) for row in job.result()]

print rows
[[1, 'first'], [2, 'second']]

Usage with bigquery_wrapper

To use with the Verily BigQuery wrapper, use the alternate_bq_client_class parameter to use PurpleQuery instead, and the self.bq_client object available to test methods will use PurpleQuery instead of the Google BigQuery client.

from verily.bigquery_wrapper import bq_test_case
from client import Client as FakeClient

class MyTest(bq_test_case.BQTestCase):

    @classmethod
    def setUpClass(cls, use_mocks=False):
        super(Mytest, cls).setUpClass(
            use_mocks=use_mocks,
            alternate_bq_client_class=FakeClient)

    ...

License

See the license

Contributing

See instructions for contributing to this project

Troubleshooting

PurpleQuery is not a complete implementation of BigQuery. For (rare) cases where an SQL statement is not understood by PurpleQuery's parser, the result will be an exception Could not fully parse query, followed by the tokens following the section that could be parsed. Typically the beginning of this section will be the syntactic element that PurpleQuery does not understand. See grammar.py for the grammar understood; feel free to extend it!

NotImplementedError is raised when a syntactic element is parsed but is not implemented in the backend, e.g. a function. See evaluatable_node.py to add a new function or other expression element.

Running tests

Clone the repository:

git clone https://github.com/verilylifesciences/purplequery.git

Build the docker container:

docker build -t purplequery purplequery/

Run the tests in the docker container:

docker run --rm -ti -v `pwd`/purplequery:/pq -w /pq purplequery /pq/run_tests.sh

purplequery's People

Contributors

calbach avatar ebreck avatar laurendukes avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.