
compassql's Introduction

CompassQL

CompassQL is a visualization query language that powers chart specifications and recommendations in Voyager 2.

As described in our vision paper and the Voyager 2 paper, a CompassQL query is a JSON object that contains the following components:

  • Specification (spec) for describing a collection of queried visualizations. This spec's syntax follows a structure similar to Vega-Lite's single view specification. However, a CompassQL spec can contain enumeration specifiers (or wildcards) that describe properties to be enumerated.[1]

  • Grouping/Nesting method names (groupBy and nest) for grouping queried visualizations into groups or hierarchical groups.

  • Ranking method names (orderBy and chooseBy) for ordering queried visualizations or choosing a top visualization from the collection.

  • Config (config) for customizing query parameters.

Internally, the CompassQL engine contains a collection of constraints for enumerating a set of candidate visualizations based on the input specification, as well as methods for grouping and ranking visualizations.
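A minimal sketch of that enumerate, filter, and rank flow, assuming a toy candidate that varies only in its mark. The types, the single constraint, and the score table are hypothetical stand-ins, not CompassQL's engine or its effectiveness model.

```typescript
// Toy candidate (assumption): only the mark is enumerated; field types are fixed.
interface Candidate {
  mark: string;
  xType: "quantitative" | "ordinal";
  yType: "quantitative" | "ordinal";
}

// A constraint returns true when a candidate should be kept.
type CandidateConstraint = (c: Candidate) => boolean;

// Illustrative constraint (hypothetical): skip area marks when y is ordinal.
const constraints: CandidateConstraint[] = [
  (c) => !(c.mark === "area" && c.yType === "ordinal"),
];

// Toy scores standing in for an effectiveness model (hypothetical values).
const markScore: Record<string, number> = { bar: 3, tick: 2, point: 1, area: 0 };

// Enumerate: expand the mark wildcard into one candidate per mark.
function enumerateMarks(base: Omit<Candidate, "mark">, marks: string[]): Candidate[] {
  return marks.map((mark) => ({ ...base, mark }));
}

// Filter with every constraint, then rank by score and keep the best candidate.
function chooseTop(candidates: Candidate[]): Candidate | undefined {
  const valid = candidates.filter((c) => constraints.every((keep) => keep(c)));
  const score = (c: Candidate): number => markScore[c.mark] ?? 0;
  return valid.sort((a, b) => score(b) - score(a))[0];
}

// Mirrors the example query: x = mean(Horsepower), y = Cylinders (ordinal).
const best = chooseTop(
  enumerateMarks({ xType: "quantitative", yType: "ordinal" }, ["point", "tick", "bar", "area"]),
);
```

Keeping constraints as plain predicates means new rules can be appended without touching the enumeration or ranking steps.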

For example, the following CompassQL query has a single wildcard for the mark property. The system automatically enumerates different marks and chooses the top visualization based on the effectiveness score.

{
  "spec": {
    "data": {"url": "data/cars.json"},
    "mark": "?",
    "encodings": [
      {
        "channel": "x",
        "aggregate": "mean",
        "field": "Horsepower",
        "type": "quantitative"
      },{
        "channel": "y",
        "field": "Cylinders",
        "type": "ordinal"
      }
    ]
  },
  "chooseBy": "effectiveness"
}

The examples/specs directory contains a number of example CompassQL queries.

To understand more about the structure of a CompassQL Query, look at the Query interface declaration.

  • A query's spec property implements the SpecQuery interface, which follows the same structure as Vega-Lite's UnitSpec (single view specification). However, most of SpecQuery's properties have a -Query suffix to indicate that an instance is a query that can contain wildcards describing a collection of specifications.
  • Since encoding channels can themselves be wildcards, Vega-Lite's encoding object is flattened into encodings, an array of Encoding objects in CompassQL's spec.
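To illustrate the flattened, wildcard-capable shape described above, here is a trimmed-down typing of the example query. These interfaces are hypothetical simplifications (the real SpecQuery and EncodingQuery declarations have many more properties and a richer wildcard type), and countWildcards is an illustrative helper, not a CompassQL function.

```typescript
// Simplified, hypothetical query types. "?" is the short wildcard form; the
// Wildcard alias only documents intent, since string already includes "?".
type Wildcard = "?";

interface EncodingQuery {
  channel: string | Wildcard;
  field?: string | Wildcard;
  type?: string | Wildcard;
  aggregate?: string | Wildcard;
  bin?: boolean | Wildcard;
}

interface SpecQuery {
  data?: { url: string };
  mark: string | Wildcard;
  encodings: EncodingQuery[]; // flattened: one entry per encoding, channel included
}

interface Query {
  spec: SpecQuery;
  chooseBy?: string;
}

// Count wildcard properties to see how many dimensions the enumeration must cover.
function countWildcards(spec: SpecQuery): number {
  let count = spec.mark === "?" ? 1 : 0;
  for (const enc of spec.encodings) {
    for (const key of Object.keys(enc) as Array<keyof EncodingQuery>) {
      if (enc[key] === "?") count++;
    }
  }
  return count;
}

// The example query from above, typed against the sketch.
const query: Query = {
  spec: {
    data: { url: "data/cars.json" },
    mark: "?",
    encodings: [
      { channel: "x", aggregate: "mean", field: "Horsepower", type: "quantitative" },
      { channel: "y", field: "Cylinders", type: "ordinal" },
    ],
  },
  chooseBy: "effectiveness",
};
```

Flattening encodings into an array is what lets the channel itself be a wildcard: with Vega-Lite's channel-keyed object, the channel would have to be known up front.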

Usage

Given a row-based array of data objects, here are the steps to use CompassQL:

  1. Specify a query config (or use an empty object to use the default configs):
var opt = {}; // Use all default query configs

For all query configuration properties, see src/config.ts.

  2. Build a data schema:
var schema = cql.schema.build(data);

Here, data is a row-based array of data objects in which each object represents a row in the data table (e.g., [{"a": 1, "b": 2}, {"a": 2, "b": 3}]).

You can reuse the same schema for querying the same dataset multiple times.

  3. Specify a query. For example, this is a query for automatically selecting a mark:
var query = {
  spec: {
    data: { url: "node_modules/vega-datasets/data/cars.json" },
    mark: "?",
    encodings: [
      {
        channel: "x",
        aggregate: "mean",
        field: "Horsepower",
        type: "quantitative",
      },
      {
        channel: "y",
        field: "Cylinders",
        type: "ordinal",
      },
    ],
  },
  chooseBy: "effectiveness",
};
  4. Execute a CompassQL query:
var output = cql.recommend(query, schema);
var result = output.result; // recommendation result

The result object is an instance of SpecQueryModelGroup (ResultGroup<SpecQueryModel>), which is the root of the output ordered tree. Its items property can be either an array of SpecQueryModel or an array of SpecQueryModelGroup (for hierarchical groupings).

A SpecQueryModel is a class instance that wraps a SpecQuery with helper methods. Note that all spec query models in the result are fully enumerated, so no wildcards remain.

  5. Convert the SpecQueryModel instances in the tree into Vega-Lite specs using SpecQueryModel's toSpec() method and the mapLeaves method:
var vlTree = cql.result.mapLeaves(result, function (item) {
  return item.toSpec();
});
  6. Now you can use the result. In this case, the tree has only two levels (the root and its leaves), so we can get the top visualization by accessing the first item:

var topVlSpec = vlTree.items[0];

For the full source code, see index.html.
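To show what mapLeaves does to the result tree, and how to reach the top item when the tree has more than two levels, here is a self-contained sketch. The ResultGroup shape is a simplified assumption (a name plus items that are leaves or nested groups), and mapLeaves/firstLeaf below are illustrative re-implementations, not CompassQL's own code.

```typescript
// Simplified stand-in for ResultGroup<T> (an assumption, not CompassQL's exact type):
// each item is either a leaf or a nested group.
interface ResultGroup<T> {
  name: string;
  items: Array<T | ResultGroup<T>>;
}

// Treat anything with an `items` array as a nested group. (A leaf type that
// itself has an `items` array would be misclassified by this check.)
function isGroup<T>(item: T | ResultGroup<T>): item is ResultGroup<T> {
  return typeof item === "object" && item !== null && Array.isArray((item as ResultGroup<T>).items);
}

// Rebuild the tree with the same grouping, applying fn to every leaf.
function mapLeaves<T, U>(group: ResultGroup<T>, fn: (item: T) => U): ResultGroup<U> {
  return {
    name: group.name,
    items: group.items.map((item) => (isGroup(item) ? mapLeaves(item, fn) : fn(item))),
  };
}

// Follow the first item at each level down to a leaf; with ordered groups,
// that is the top-ranked visualization even when groups are nested.
function firstLeaf<T>(group: ResultGroup<T>): T | undefined {
  if (group.items.length === 0) return undefined;
  const first = group.items[0];
  return isGroup(first) ? firstLeaf(first) : first;
}

// Usage: toy leaves standing in for SpecQueryModel instances.
const tree: ResultGroup<{ mark: string }> = {
  name: "root",
  items: [{ name: "group-1", items: [{ mark: "bar" }, { mark: "tick" }] }, { mark: "point" }],
};
const marks = mapLeaves(tree, (leaf) => leaf.mark);
```

With a hierarchical grouping, items[0] at the root is a group rather than a spec, which is why a descent like firstLeaf is needed instead of a single index.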

Note for Developers

  • The root file of our project is src/cql.ts, which defines the top-level namespace cql for the compiled files. Other files under src/ reflect the namespace structure: all methods for cql.xxx are in either src/xxx.ts or src/xxx/xxx.ts. For example, cql.util.* methods are in src/util.ts, and cql.query is in src/query/query.ts.

  • TODO: constraints

    • List in Vy2 paper supplement.
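The namespace-to-file convention in the first note is mechanical, so it can be written down as a function. This is a hypothetical helper that only encodes the stated src/xxx.ts or src/xxx/xxx.ts rule; it does not inspect the real repository.

```typescript
// Hypothetical helper encoding the convention: cql.xxx lives in
// src/xxx.ts or src/xxx/xxx.ts. It does not look at the actual file system.
function candidateSourceFiles(namespace: string): string[] {
  const parts = namespace.split(".");
  if (parts[0] !== "cql" || parts.length < 2) {
    throw new Error(`Expected a cql.<name> namespace, got "${namespace}"`);
  }
  const moduleName = parts[1]; // e.g. "util" for cql.util.*
  return [`src/${moduleName}.ts`, `src/${moduleName}/${moduleName}.ts`];
}
```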

Development Instructions

You can install dependencies with:

yarn install

You can use npm commands such as the following:

npm run build
npm run lint
npm run test
npm run cover       // see test coverage (see coverage/lcov-report/index.html)
npm run watch       // watcher that builds, lints, and tests
npm run test-debug  // useful for debugging unit tests with VS Code
npm run clean       // useful for wiping out JS files created from another branch

(See package.json for the full list of commands.)

To play with the latest CompassQL in the Vega Editor, use the cql-vl3 branch in kanitw's fork, which has been updated to use Vega-Lite 3, Vega 5, and CompassQL ^0.21.1. (For CompassQL 0.7 or older, use the compassql branch, which uses Vega-Lite 1.x.)

Make sure to link CompassQL to the editor:

cd COMPASSQL_DIR
npm link

cd VEGA_EDITOR_DIR
npm run vendor -- -l compassql

(You might want to link your local version of Vega-Lite as well.)

Main API

The main method is cql.recommend, which is in src/recommend.ts.

Directory Structure

  • examples - Example CompassQL queries
    • examples/specs – All JSON files for CompassQL queries
    • examples/cql-examples.json - A JSON file listing all CompassQL examples that should be shown in the Vega Editor.
  • src/ - Main source code directory.
    • src/cql.ts is the root file for CompassQL codebase that exports the global cql object. Other files under src/ reflect namespace structure.
    • All interfaces for CompassQL syntax should be declared at the top level of the src/ folder.
  • test/ - Code for unit testing. The structure of test/ mirrors the directory structure of src/. For example, test/constraint/ contains tests for the files inside src/constraint/.
  • typings/ - TypeScript typing declaration for dependencies. Some of them are downloaded from the TypeStrong community.

Pro-Tip

  • When you add a new source file to the project, don't forget to add it to the files list in tsconfig.json.

compassql's People

Contributors

akshatsh, dependabot-preview[bot], dependabot[bot], domoritz, donghaoren, espressoroaster, felixcodes, haldenl, jstcki, kanitw, leibatt, light-and-salt, mattwchun, oigewan, p42-ai[bot], peter-gy, rileychang, ssharif6, tafsiri, vlandham, yhoonkim


compassql's Issues

MVP for Enumerate

  • enumerate answers based on input CompassQL query
    • check if the constraint is enabled (in the option)
    • generate fields -- read from schema
  • support two types of constraints
    • encoding constraint (constraint for a single encoding mapping)
    • spec constraint (constraint that involves multiple encoding mappings or involves relationship between mark and encoding)
  • determine order in a way that automatically adding count still works
    • noRepeatedField --> '*'
  • Remember which field we assign for later reference
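The two constraint kinds listed above differ in scope: one inspects a single encoding mapping, the other the whole spec. A sketch of that split, where the interfaces, constraint names, and rules are hypothetical illustrations rather than CompassQL's actual constraints:

```typescript
// Hypothetical, simplified shapes (not CompassQL's real interfaces).
interface EncodingMapping {
  channel: string;
  field: string;
  type: "quantitative" | "ordinal" | "nominal" | "temporal";
  bin?: boolean;
}

interface VisSpec {
  mark: string;
  encodings: EncodingMapping[];
}

// Encoding constraint: judges one mapping in isolation.
interface EncodingConstraint {
  name: string;
  satisfy: (enc: EncodingMapping) => boolean;
}

// Spec constraint: may relate several mappings to each other or to the mark.
interface SpecConstraint {
  name: string;
  satisfy: (spec: VisSpec) => boolean;
}

const encodingConstraints: EncodingConstraint[] = [
  // Illustrative rule: only quantitative fields are binned.
  { name: "binOnlyQuantitative", satisfy: (e) => !e.bin || e.type === "quantitative" },
];

const specConstraints: SpecConstraint[] = [
  // Illustrative rule: each channel is used at most once.
  {
    name: "noRepeatedChannel",
    satisfy: (s) => new Set(s.encodings.map((e) => e.channel)).size === s.encodings.length,
  },
];

// A spec is valid when every mapping passes every encoding constraint
// and the whole spec passes every spec constraint.
function isValid(spec: VisSpec): boolean {
  return (
    spec.encodings.every((e) => encodingConstraints.every((c) => c.satisfy(e))) &&
    specConstraints.every((c) => c.satisfy(spec))
  );
}
```

Separating the two lets the enumerator run encoding constraints as soon as a single mapping is filled in, before the rest of the spec is known.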

Missing Constraints

  • channelsSupportRoles
  • omitShapeWithBin (channel supports role?)
  • omitShapeWithTimeDimension (channel supports role?)
  • omitBarWithSize
  • omitRawBar/Area

Revise old compass constraints

Not sure if we should add the following

  • maxCardinalityForAutoAddOrdinal #70
  • alwaysAddHistogram
  • consistentAutoQ -- if the aggregate for all Q fields is "*", give all of them the same level of aggregation. (We already have omitRawContinuousFieldForAggregatePlot.)

Add missing core tests

enumerator.test.ts

For each of these properties:

  • aggregate
  • timeUnit
  • field
  • type
  1. Write a test that enumerates all valid values:
  • aggregate
  • timeUnit
  • field
  • type

Hint: set config.verbose = true.

  2. Write a test that enumerates both valid and invalid values (and checks that the output contains only valid values):
  • aggregate
    • To see relevant constraints, look at constraints/{spec|encoding}.ts
      • look at properties of each constraint
      • look at a few ones that contain Property.AGGREGATE

(LATER)

  1. Write a test that enumerates all valid values:
  • bin -- bin is the most complicated; ping me and I'll explain it
  2. Write a test that enumerates both valid and invalid values (and checks that the output contains only valid values):
  • To see relevant constraints, look at constraints/{spec|encoding}.ts
  • timeUnit
  • field
  • type

Other Files

Run npm run cover and see coverage report -- add more tests for uncovered constraints

Refactor Bin to Support Bin Parameter

Currently in EncodingQuery, it's

bin?: boolean | EnumSpec<boolean> | ShortEnumSpec;

However, bin can have parameters too, and I don't want to mix booleans and objects here.

So I'm thinking

bin?: BinQuery

with the following interface

interface BinQuery {
  enable: boolean | EnumSpec<boolean> | ShortEnumSpec;
  maxbins: number | EnumSpec<number> | ShortEnumSpec;
  // ... other params
}

Any thoughts? @domoritz
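If the BinQuery shape is adopted, existing queries could be migrated by normalizing the old boolean/wildcard value when it is read. A sketch under assumptions: EnumSpec and ShortEnumSpec are simplified stand-ins, maxbins is made optional so old queries still fit, and normalizeBin is a hypothetical helper, not existing CompassQL code.

```typescript
// Simplified stand-ins for CompassQL's wildcard types (assumptions).
type ShortEnumSpec = "?";
interface EnumSpec<T> {
  values?: T[];
}

// Shape proposed above; maxbins is optional in this sketch so old queries still fit.
interface BinQuery {
  enable: boolean | EnumSpec<boolean> | ShortEnumSpec;
  maxbins?: number | EnumSpec<number> | ShortEnumSpec;
}

// The current EncodingQuery form.
type LegacyBin = boolean | EnumSpec<boolean> | ShortEnumSpec;

// Hypothetical migration helper: accept either form and always return a BinQuery.
function normalizeBin(bin: LegacyBin | BinQuery | undefined): BinQuery | undefined {
  if (bin === undefined) return undefined;
  if (typeof bin === "object" && "enable" in bin) return bin; // already a BinQuery
  return { enable: bin }; // wrap true/false, "?" or an EnumSpec<boolean>
}
```

Normalizing at the boundary keeps the rest of the enumerator working with a single object shape, which is the mixing the proposal wants to avoid.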

Data-driven occlusion test

Right now we simply assume that aggregate plots have no occlusion while raw plots do, which is not always correct.
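A data-driven alternative is to measure overlap from the data itself. The sketch below is an assumption, not an existing CompassQL constraint: it estimates occlusion for a raw scatterplot as the fraction of rows whose (x, y) pair repeats an earlier row, since exact duplicates are drawn on top of each other.

```typescript
type Row = Record<string, string | number>;

// Fraction of rows that land exactly on a position already occupied by an earlier row.
// Only exact duplicates are counted; near-overlaps would need pixel-level binning.
function occlusionRatio(data: Row[], xField: string, yField: string): number {
  if (data.length === 0) return 0;
  const seen = new Set<string>();
  let hidden = 0;
  for (const row of data) {
    const key = JSON.stringify([row[xField], row[yField]]);
    if (seen.has(key)) hidden++;
    else seen.add(key);
  }
  return hidden / data.length;
}
```

A constraint or ranking term could then compare this ratio against a threshold instead of assuming that every raw plot occludes.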

Enumerate Stack

  • Stack
  • Stack constraint (don't enumerate non-summing aggregate for stack)
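The stack constraint could look roughly like this. It assumes only sum and count count as "summing" aggregates, because stacking adds the segments together; the function name is hypothetical.

```typescript
// Aggregates whose stacked segments add up to a meaningful total (assumed list).
const SUMMATIVE_AGGREGATES = ["sum", "count"];

// Hypothetical constraint: allow stacking only for summing aggregates.
// Unaggregated (raw) values are allowed, since stacking simply sums them.
function allowsStack(aggregate: string | undefined): boolean {
  return aggregate === undefined || SUMMATIVE_AGGREGATES.indexOf(aggregate) !== -1;
}

// When the aggregate is a wildcard, drop non-summing options before enumerating.
const stackableAggregates = ["mean", "sum", "median", "count", "max"].filter((agg) => allowsStack(agg));
```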

Add JSON schema

  • Generate JSON Schema for CompassQL schema

Look at this line in Vega-Lite
https://github.com/vega/vega-lite/blob/master/package.json#L35

Do the same for Query.

  • Add Tests to validate all examples

In Vega-Lite, we have a test that validates all example specs so that both their input and output validate against the JSON schema.

  • Validates input CompassQL query (each example json files)
  • For each example query, run the query method in query.ts and check the output: convert each SpecQueryModel in the output into a Vega-Lite spec (call .toSpec()) and validate the Vega-Lite output.

Make sure that the example test is excluded from test coverage.
(See Vega-Lite's package.json)

This spec generates duplicated output

{
  "mark": {
    "mode": "pick/enum",
    "values": [""]
  },
  "encodings": [
    {
      "channel": "x",
      "field": "Cylinders",
      "type": "quantitative"
    },{
      "autoCount": true
    }
  ]
}

Deal with text table.

In older Compass, we added a few hacks for recommending text tables.

With the new label and tile, we need to revise how we deal with this.

Add statistical profiling

  • 1D
  • 2D
  • Need to think what to add

Refactor constraints

Specs

  • hasAppropriateGraphicTypeForMark
  • omitRawBarLineArea
  • omitRawTable

Distinguish high-cardinality strings from nominal fields

Fields with too high cardinality take up a lot of space and can be slow to render.

  • add a flag isKeyLike (or some better name) to schema

We might want to consider a few options:

  • distinguish between categories (low cardinality) and text (high cardinality), as they serve different purposes in data analysis anyway.
    • Check if the cardinality is above X% (50%?) of the overall data count and above a minimum threshold (e.g., 40)

Maybe check whether the cardinality is above ~80% of the overall data count, or use some similar criterion.

  • Add a constraint that excludes fields with too high cardinality from being added automatically.
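The thresholds floated above translate into a small predicate. This sketch uses the first option (above 50% of rows and above 40 distinct values) as defaults; the function name follows the tentative isKeyLike flag, and the field cardinalities in the usage are made up for a 400-row table.

```typescript
// Hypothetical predicate for the proposed isKeyLike schema flag. Defaults follow the
// first option above (above 50% of rows and above 40 distinct values).
function isKeyLike(
  cardinality: number,
  rowCount: number,
  ratio = 0.5, // pass 0.8 to try the "~80%" variant instead
  minCardinality = 40,
): boolean {
  if (rowCount === 0) return false;
  return cardinality / rowCount > ratio && cardinality > minCardinality;
}

// A constraint could then skip key-like fields when adding fields automatically.
const autoAddable = [
  { field: "Name", cardinality: 300 },
  { field: "Origin", cardinality: 3 },
].filter((f) => !isKeyLike(f.cardinality, 400));
```

Requiring both conditions keeps small tables from flagging every field: with 40 rows, a field with 30 distinct values exceeds the ratio but not the minimum.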

Split generate.ts into two files

Right now the enumerator code lives in generate.ts.
However, this makes generate.test.ts unduly long.

Therefore, we should extract enumerator.ts from generate.ts.

Constraint propertyPrecedence

  1. Prevent duplicate output if autoCount comes after channel in propertyPrecedence

Basically, whenever autoCount is false, we shouldn't even assign it to a channel.

We have to either add logic to prevent autoCount from coming after channel in propertyPrecedence,
or make answerSet in generate a real set to prevent duplication.

  2. Prevent nested property output from coming before its parent
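The second fix for item 1, turning answerSet into a real set, can be approximated by keying each answer on a canonical string. A sketch assuming answers are plain JSON-like objects; canonicalKey and dedupe are hypothetical helpers, and keys are sorted so that property order doesn't make equal answers look distinct.

```typescript
// Serialize with sorted object keys so {a, b} and {b, a} produce the same key.
// Properties whose value is undefined are skipped, matching JSON.stringify.
function canonicalKey(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalKey).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .filter((k) => obj[k] !== undefined)
      .map((k) => `${JSON.stringify(k)}:${canonicalKey(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Keep the first occurrence of each distinct answer, preserving order.
function dedupe<T>(answers: T[]): T[] {
  const seen = new Set<string>();
  return answers.filter((a) => {
    const key = canonicalKey(a);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

This treats the symptom; fixing propertyPrecedence avoids generating the duplicates in the first place.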

Syntax for nested grouping

Nested grouping is very important for understanding structure / debugging output results.
(I'm currently flooded by transposes of the visualizations.)

Therefore we need a good syntax for nested grouping.

Suppose I want a hierarchical grouping that first groups by dataQueryKey and then by encodingKey.

  • For each subgroup (by encodingKey), I want to order the subgroup's items by rankFn1.
  • For each group (by dataQueryKey), I want to order the group's items (which are subgroups based on encodingKey) by rankFn2.
  • Finally, I want to order groups by rankFn3.

For example, rankingFn1 = rankingFn2 = "effectiveness", while rankFn3 can be some data enumeration order. The ranking function ranks groups by calculating the score of the top item in each list.

Suppose

spec = {
    "data": {"url": "data/cars.json"},
    "mark": "?",
    "encodings": [
      {
        "channel": "?",
        "field": "Cylinders",
        "type": "ordinal"
      },{
        "channel": "?",
        "bin": "?",
        "aggregate": "?",
        "field": "Horsepower",
        "type": "quantitative"
      }
    ]
  }

Here are a few alternative queries:

a) Nested version

{
  spec: spec,
  group/groupings: {
    // In this case, we definitely start with the top-level grouping key.
    by: 'dataKey',
    // if we want one output for each group, we can replace this orderItemBy with chooseBy
    orderItemBy: 'rankingFn2',
    subgroup/subgroupings: {
      by: 'encodingKey',
      orderItemBy: 'rankingFn1'
    }
  },
  orderBy: 'rankingFn3'
}

b) Array-based

{
  spec: spec, 
  // should the first one be the top-level one or the subgroup one -- currently it's the subgroup one
  group/groupings: [{ 

     groupBy: 'encodingKey',
     // if we want one output for each group, we can replace this orderItemBy with chooseBy
     orderItemBy: 'rankingFn1'  
  },{
     groupBy: 'dataKey',
     orderItemBy: 'rankingFn2'  
  }],
  orderBy: 'rankingFn3'   // or orderGroupBy?
}

@jheer @domoritz any preference for a. or b. (or other options) / minor wordings?

I am not married to either of these yet. Other ideas are welcome.
I'm leaning toward the nested version because it seems clearer which one is the top-level grouping.
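To make the semantics of option (a) concrete, here is a sketch of two-level grouping in which groups are ranked by the score of their top item, as described above. The item shape, the key names, and the single score reused at every level (including rankFn3) are assumptions; none of this is CompassQL API.

```typescript
// Hypothetical item: a visualization with its two grouping keys and a score.
interface Item {
  dataKey: string;
  encodingKey: string;
  score: number; // stand-in for a ranking function such as "effectiveness"
}

interface Group<T> {
  key: string;
  items: T[];
}

// Group items by key; groups appear in first-seen order and are never empty.
function groupBy<T>(items: T[], keyOf: (item: T) => string): Group<T>[] {
  const groups = new Map<string, T[]>();
  for (const item of items) {
    const key = keyOf(item);
    const members = groups.get(key);
    if (members) members.push(item);
    else groups.set(key, [item]);
  }
  return Array.from(groups, ([key, members]) => ({ key, items: members }));
}

// Option (a): group by dataKey, then subgroup by encodingKey.
function nestedGroups(items: Item[]): Group<Group<Item>>[] {
  const groups = groupBy(items, (i) => i.dataKey).map((g) => {
    // rankFn1: order each subgroup's items by score, best first.
    const subgroups = groupBy(g.items, (i) => i.encodingKey).map((s) => ({
      key: s.key,
      items: [...s.items].sort((a, b) => b.score - a.score),
    }));
    // rankFn2: order subgroups by the score of their top item.
    subgroups.sort((a, b) => b.items[0].score - a.items[0].score);
    return { key: g.key, items: subgroups };
  });
  // rankFn3 (here also score-based): order groups by their top subgroup's top item.
  groups.sort((a, b) => b.items[0].items[0].score - a.items[0].items[0].score);
  return groups;
}
```

Ranking inner levels first means each group's top item is already known when the outer level is sorted, which matches the "score of the top item" rule.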

Improve Ranking

  • Channel, Cardinality
  • Penalize over encoding

Test

  • TxT
  • TxQ
  • QxT > Q

Enumerate Scale Properties

Scale

Background

  • Look at the description and changes of #27 to see the infrastructure for adding a nested property (bin.maxbins). Note that I might have missed something in the description; if so, you'll notice the problem as you debug.

1st step Scale.type

  • add scale.type (one PR)
    • understand what scale.type means from Vega-Lite docs
    • Add stuff like in #27
    • spec constraints (add to spec.ts)
      • omitBarAreaForLogScale -- don't use bar and area mark for log scale.
    • encoding constraints
      • dataTypeMatchesScaleType -- look at
      • omitBinForLogScale (originally vega/compass#151)
    • Add Example Query to examples/
    • add test for enumerate
    • add test for generate
    • add test for all new constraints
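The two log-scale constraints named above reduce to simple predicates. A sketch with simplified, hypothetical types; the real constraints in spec.ts and encoding.ts would operate on query models and have to handle wildcards.

```typescript
type ScaleType = "linear" | "log" | "pow" | "ordinal" | "time";

// Hypothetical, simplified encoding shape.
interface Enc {
  bin?: boolean;
  scaleType?: ScaleType;
}

// Spec constraint: don't use bar or area marks with a log scale,
// since bars and areas extend from zero and log(0) is undefined.
function omitBarAreaForLogScale(mark: string, encodings: Enc[]): boolean {
  const usesLog = encodings.some((e) => e.scaleType === "log");
  return !(usesLog && (mark === "bar" || mark === "area"));
}

// Encoding constraint: don't bin a field whose scale is log.
function omitBinForLogScale(enc: Enc): boolean {
  return !(enc.bin === true && enc.scaleType === "log");
}
```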

Scale.*

Repeat the process for other scale properties (one PR for each)

  • add ones that are required by other tasks
    • type
      • clamp: Q, T
      • exponent: pow
      • round: Q, T
        • accept types of values depending on scale type
    • zero --> zero doesn't play well with [ ScaleType.ORDINAL, LOG, TIME, UTC]. I don't think I'm missing anything else...
    • bandSize
      • #93
      • bandSize must be at least 0
    • range
      • #101
      • values must contain two or more values.
    • domain
    • round
    • clamp
      • must have a continuous domain (quantitative and time types only)
    • nice
      • similar to clamp: quantitative and time only.
    • exponent
    • useRawDomain
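Several of the notes above are type-dependent validity rules. This sketch encodes a few of them: clamp and nice only with quantitative or time fields, zero not on ordinal, log, time, or utc scales, bandSize at least 0, and range with two or more values. The property names follow the notes; the function and types are hypothetical.

```typescript
type FieldType = "quantitative" | "temporal" | "ordinal" | "nominal";
type ScaleType = "linear" | "log" | "pow" | "ordinal" | "time" | "utc";

// Hypothetical subset of scale properties from the notes above.
interface ScaleQuery {
  type: ScaleType;
  clamp?: boolean;
  nice?: boolean;
  zero?: boolean;
  bandSize?: number;
  range?: unknown[];
}

// Return the names of the rules a scale breaks (an empty array means valid).
function scaleViolations(fieldType: FieldType, scale: ScaleQuery): string[] {
  const violations: string[] = [];
  const continuous = fieldType === "quantitative" || fieldType === "temporal";
  // clamp and nice need a continuous (quantitative or time) domain.
  if (scale.clamp !== undefined && !continuous) violations.push("clamp");
  if (scale.nice !== undefined && !continuous) violations.push("nice");
  // Including zero doesn't fit ordinal, log, time, or utc scales.
  if (scale.zero === true && ["ordinal", "log", "time", "utc"].indexOf(scale.type) !== -1) {
    violations.push("zero");
  }
  if (scale.bandSize !== undefined && scale.bandSize < 0) violations.push("bandSize");
  if (scale.range !== undefined && scale.range.length < 2) violations.push("range");
  return violations;
}
```

Returning rule names rather than a boolean makes it easier to see which constraint pruned a candidate when config.verbose-style debugging is needed.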

--- LATER ---

  • padding
    • works with channel.x, channel.y --> uses pixels
    • ??? padding (0, 1) for rangeBands ??? -- LATER

Refactor / Additional Test

  • Extract and test hasRequiredPropertyAsEnumSpec in satisfy of EncodingConstraintModel and SpecConstraintModel

Replicating Compass

Gen

  • aggregate.test.ts
  • encodings.test.ts

Run npm run cover and see coverage report -- add more tests for uncovered constraints

Don't bin a Q field and add autoCount if there are already dimensions in the spec

For example,

{
  "spec": {
    "data": {"url": "data/cars.json"},
    "mark": "?",
    "encodings": [
      {
        "channel": "?",
        "field": "Cylinders",
        "type": "nominal"
      },{
        "channel": "?",
        "field": "Origin",
        "type": "ordinal"
      },{
        "channel": "?",
        "bin": "?",
        "aggregate": "?",
        "field": "Acceleration",
        "type": "quantitative"
      }
    ]
  },
  "groupBy": "data",
  "config": {
    "autoAddCount": true
  }
}

has this group, Cylinders,n|Origin,o|bin(Acceleration,q)|count(*,q), which contains a visualization like this one:


{
  "data": {
    "url": "data/cars.json"
  },
  "mark": "point",
  "encoding": {
    "y": {
      "field": "Cylinders",
      "type": "nominal"
    },
    "x": {
      "field": "Origin",
      "type": "ordinal"
    },
    "row": {
      "bin": true,
      "field": "Acceleration",
      "type": "quantitative"
    },
    "size": {
      "aggregate": "count",
      "field": "*",
      "type": "quantitative"
    }
  }
}
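The rule in this issue's title could be checked on each enumerated spec. This sketch is hypothetical and assumes nominal, ordinal, and binned fields act as dimensions; it rejects a spec that bins a quantitative field and adds count when another, non-binned dimension is already present. Whether that is exactly the intended rule is an assumption.

```typescript
// Hypothetical, simplified encoding shape.
interface Enc {
  field: string;
  type: "quantitative" | "ordinal" | "nominal" | "temporal";
  bin?: boolean;
  aggregate?: string;
}

// Assumption: categorical (nominal/ordinal) or binned encodings act as dimensions.
function isDimension(e: Enc): boolean {
  return e.type === "nominal" || e.type === "ordinal" || e.bin === true;
}

// Hypothetical constraint: reject "bin a Q field + auto-added count"
// when the spec already has another, non-binned dimension.
function allowsBinWithAutoCount(encodings: Enc[]): boolean {
  const hasCount = encodings.some((e) => e.aggregate === "count");
  const hasBinnedQ = encodings.some((e) => e.bin === true && e.type === "quantitative");
  const hasOtherDimension = encodings.some((e) => isDimension(e) && e.bin !== true);
  return !(hasCount && hasBinnedQ && hasOtherDimension);
}
```

Applied to the group above (Cylinders, Origin, bin(Acceleration), count), this returns false, while a plain histogram of Acceleration stays allowed.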

Cardinality Based Constraints

  • determine input format for cardinality in the schema
  • maxCardinalityForFacets
  • maxCardinalityForColor
  • maxCardinalityForShape
  • minCardinalityForBin
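Once the schema records cardinality, the four checks above might be expressed per channel as follows. The numeric limits are placeholders for illustration, not values CompassQL uses, and minCardinalityForBin is read here as "only bin fields with enough distinct values".

```typescript
interface FieldSchema {
  field: string;
  cardinality: number; // number of distinct values
}

// Placeholder limits chosen for illustration (assumptions, not CompassQL defaults).
const limits = {
  maxCardinalityForFacets: 20,
  maxCardinalityForColor: 20,
  maxCardinalityForShape: 6,
  minCardinalityForBin: 15,
};

// Hypothetical check combining the four constraints above.
function satisfiesCardinality(channel: string, schema: FieldSchema, binned: boolean): boolean {
  const c = schema.cardinality;
  // For a binned field only the bin check applies here: the channel receives bins, not raw values.
  if (binned) return c >= limits.minCardinalityForBin;
  switch (channel) {
    case "row":
    case "column":
      return c <= limits.maxCardinalityForFacets;
    case "color":
      return c <= limits.maxCardinalityForColor;
    case "shape":
      return c <= limits.maxCardinalityForShape;
    default:
      return true;
  }
}
```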

Refactor

  • Consistent Variable Name
    • encodingQ => encQ
    • property => prop
  • EnumSpecIndex.timeunit => timeUnit

cc: @ZeningQu

Aggregate plot where a facet is the only group-by should be rated worse

e.g.,

{
  "data": {
    "url": "data/cars.json"
  },
  "mark": "point",
  "encoding": {
    "row": {
      "field": "Cylinders",
      "type": "nominal"
    },
    "x": {
      "aggregate": "mean",
      "bin": false,
      "field": "Horsepower",
      "type": "quantitative"
    },
    "y": {
      "aggregate": "mean",
      "bin": false,
      "field": "Acceleration",
      "type": "quantitative"
    }
  }
}
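One way to rate such plots worse is a ranking penalty that fires when every group-by field of an aggregate plot sits on a facet channel. The penalty value, the group-by rule, and the function are hypothetical; CompassQL's effectiveness ranking is more involved.

```typescript
interface Enc {
  channel: string;
  field: string;
  type: "quantitative" | "ordinal" | "nominal" | "temporal";
  aggregate?: string;
  bin?: boolean;
}

const FACET_CHANNELS = ["row", "column"];

// Assumption: unaggregated categorical, temporal, or binned fields are the group-by fields.
function isGroupBy(e: Enc): boolean {
  return e.aggregate === undefined && (e.bin === true || e.type !== "quantitative");
}

// Hypothetical ranking penalty: an aggregate plot whose group-by fields all sit on facets.
function facetOnlyGroupByPenalty(encodings: Enc[], penalty = 1): number {
  const isAggregatePlot = encodings.some((e) => e.aggregate !== undefined);
  const groupBys = encodings.filter(isGroupBy);
  const onlyFacets =
    groupBys.length > 0 && groupBys.every((e) => FACET_CHANNELS.indexOf(e.channel) !== -1);
  return isAggregatePlot && onlyFacets ? penalty : 0;
}
```

For the spec above, Cylinders on row is the only group-by, so the penalty applies; moving Cylinders to y removes it.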
