Mosaic: An Extensible Framework for Linking Databases and Interactive Views

  • 📈 Explore massive datasets
    Visualize, select, and filter datasets with millions or billions of records.
  • 🚀 Flexible deployment
    Build data-driven web apps, or interact with data directly in Jupyter notebooks.
  • 🛠️ Interoperable & extensible
    Create new components that seamlessly integrate across selections and datasets.
  • 🦆 Powered by DuckDB
    Mosaic pushes computation to DuckDB, both server-side and in your browser via WebAssembly.

Mosaic is an extensible architecture for linking data visualizations, tables, input widgets, and other data-driven components, leveraging a backing database for scalable processing of both static and interactive views. With Mosaic, you can visualize and explore millions and even billions of data points at interactive rates.

The key idea is to have interface components "publish" their data needs as declarative queries that can be managed, optimized, and cross-filtered by a coordinator that proxies access to DuckDB.
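
As a rough illustration of this idea, the TypeScript sketch below models a coordinator in a few dozen lines. Everything in it is hypothetical and simplified: MiniCoordinator, SqlHistogram, the flights table, and the mock database function are invented for illustration and are not how mosaic-core is implemented. Each client publishes a SQL query as a function of a filter predicate. The coordinator caches results by SQL text and, when one client updates its selection clause, re-queries every other client using the combined clauses of all clients except the querying client's own (crossfilter-style resolution).

```typescript
// Hypothetical, simplified model of the coordinator idea; NOT the mosaic-core API.

/** A client publishes its data needs as a SQL query, given an optional filter. */
interface Client {
  query(filter: string | null): string;
  queryResult(rows: unknown[]): void;
}

/** Stand-in for DuckDB: takes SQL text, returns rows. */
type Database = (sql: string) => unknown[];

class MiniCoordinator {
  private db: Database;
  private clients: Client[] = [];
  private clauses = new Map<Client, string>(); // selection clause per source client
  private cache = new Map<string, unknown[]>(); // query results keyed by SQL text

  constructor(db: Database) {
    this.db = db;
  }

  connect(client: Client): void {
    this.clients.push(client);
    this.requestQuery(client);
  }

  /** Crossfilter resolution: AND together every clause except the client's own. */
  filterFor(client: Client): string | null {
    const preds: string[] = [];
    this.clauses.forEach((predicate, source) => {
      if (source !== client) preds.push(`(${predicate})`);
    });
    return preds.length > 0 ? preds.join(' AND ') : null;
  }

  /** Set (or clear, with null) a client's clause, then re-query all other clients. */
  update(source: Client, predicate: string | null): void {
    if (predicate === null) this.clauses.delete(source);
    else this.clauses.set(source, predicate);
    for (const client of this.clients) {
      if (client !== source) this.requestQuery(client);
    }
  }

  private requestQuery(client: Client): void {
    const sql = client.query(this.filterFor(client));
    let rows = this.cache.get(sql);
    if (rows === undefined) {
      rows = this.db(sql); // only hit the database on a cache miss
      this.cache.set(sql, rows);
    }
    client.queryResult(rows);
  }
}

/** Hypothetical client: counts rows per value of one column of a "flights" table. */
class SqlHistogram implements Client {
  column: string;
  lastSql = '';
  rows: unknown[] = [];

  constructor(column: string) {
    this.column = column;
  }

  query(filter: string | null): string {
    const where = filter ? ` WHERE ${filter}` : '';
    this.lastSql = `SELECT ${this.column}, COUNT(*) AS n FROM flights${where} GROUP BY ${this.column}`;
    return this.lastSql;
  }

  queryResult(rows: unknown[]): void {
    this.rows = rows;
  }
}

// Usage with a mock database that echoes the SQL it receives.
let dbCalls = 0;
const coord = new MiniCoordinator(sql => {
  dbCalls++;
  return [{ sql }];
});
const delay = new SqlHistogram('delay');
const distance = new SqlHistogram('distance');
coord.connect(delay);
coord.connect(distance);

coord.update(delay, 'delay BETWEEN 0 AND 30'); // e.g., a brush on the delay histogram
const filteredSql = distance.lastSql; // distance is now filtered by the delay brush
coord.update(delay, null); // clearing the brush re-issues an already-cached query
console.log(filteredSql, dbCalls);
```

Because the querying client's own clause is excluded, a brushed view keeps showing its full distribution while the other views update, and the cache lets a cleared brush reuse earlier results instead of querying the database again.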

Learn more about Mosaic at the documentation site, or read the Mosaic research paper.

If referencing Mosaic, please use the following citation:

@article{heer2024mosaic,
  title={Mosaic: An Architecture for Scalable \& Interoperable Data Views},
  author={Heer, Jeffrey and Moritz, Dominik},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  year={2024},
  volume={30},
  number={1},
  pages={436-446},
  doi={10.1109/TVCG.2023.3327189}
}

Repository Structure

This repository contains a set of related packages.

Note: For convenience, the vgplot package re-exports much of the mosaic-core, mosaic-sql, mosaic-plot, and mosaic-inputs packages. For most applications, it is sufficient to either import @uwdata/vgplot alone or in conjunction with @uwdata/mosaic-spec.

Core Components

  • mosaic-core: The core Mosaic components: a central coordinator, parameters and selections for linking scalar values or query predicates (respectively) across Mosaic clients, and filter groups with optimized index management. The Mosaic coordinator can send queries either over the network to a backing server (socket and rest clients) or to a client-side DuckDB-WASM instance (wasm client).
  • mosaic-sql: An API for convenient construction and analysis of SQL queries. Query objects then coerce to SQL query strings.
  • mosaic-inputs: Standalone data-driven components such as input menus, text search boxes, and sortable, load-on-scroll data tables.
  • mosaic-plot: An interactive grammar of graphics implemented on top of Observable Plot. Marks (plot layers) serve as individual Mosaic clients. These marks can push data processing (binning, hex binning, regression) and optimizations (such as M4 for line/area charts) down to the database. This package also provides interactors for linked selection, filtering, and highlighting using Mosaic Params and Selections.
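
M4 reduces the data behind a line or area chart to at most four points per pixel column: the points with the smallest and largest x (first and last) and the points with the smallest and largest y. Mosaic pushes this reduction down to the database; the TypeScript sketch below instead applies the same rule in memory, purely to show what the reduction keeps. The m4 function and Point type are illustrative names, not part of mosaic-plot, and the sketch assumes every x lies within [x0, x1].

```typescript
interface Point {
  x: number;
  y: number;
}

/**
 * M4 reduction (in-memory sketch): bucket points into `width` pixel columns
 * over [x0, x1] and keep, per column, the first, last, min-y, and max-y points.
 * At most 4 * width points remain.
 */
function m4(points: Point[], width: number, x0: number, x1: number): Point[] {
  type Bucket = { first: Point; last: Point; min: Point; max: Point };
  const buckets = new Map<number, Bucket>();
  const scale = width / (x1 - x0);
  for (const p of points) {
    // Pixel column index, clamped so that x === x1 lands in the last column.
    const col = Math.min(width - 1, Math.floor((p.x - x0) * scale));
    const b = buckets.get(col);
    if (b === undefined) {
      buckets.set(col, { first: p, last: p, min: p, max: p });
    } else {
      if (p.x < b.first.x) b.first = p;
      if (p.x >= b.last.x) b.last = p;
      if (p.y < b.min.y) b.min = p;
      if (p.y > b.max.y) b.max = p;
    }
  }
  // Deduplicate representatives (one point may play several roles), sort by x for drawing.
  const out = new Set<Point>();
  buckets.forEach(b => {
    out.add(b.first);
    out.add(b.min);
    out.add(b.max);
    out.add(b.last);
  });
  return Array.from(out).sort((a, b) => a.x - b.x);
}

// Usage: reduce 10,000 samples of a sine wave for a 100-pixel-wide chart.
const pts: Point[] = [];
for (let i = 0; i < 10000; i++) pts.push({ x: i, y: Math.sin(i / 100) });
const reduced = m4(pts, 100, 0, 10000);
console.log(pts.length, '->', reduced.length);
```

Keeping the extreme y values per column preserves the vertical extent that a rasterized line covers in that column, while the first and last points preserve the connecting segments between neighboring columns.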

Applications

  • vgplot: A visualization grammar API for building interactive Mosaic-powered visualizations and dashboards. This package provides convenient, composable methods that combine multiple Mosaic packages (core, inputs, plot, etc.) in an integrated API. This API re-exports much of the mosaic-core, mosaic-sql, mosaic-plot, and mosaic-inputs packages, enabling use in a stand-alone fashion.
  • mosaic-spec: Declarative specification of Mosaic-powered applications as JSON or YAML files. This package provides a parser and code generation framework for reading specifications in a JSON format and generating live Mosaic visualizations and dashboards using the vgplot API.
  • duckdb-server: A Python-based server that runs a local DuckDB instance and supports queries over WebSockets or HTTP, returning data in either Apache Arrow or JSON format.
  • widget: A Jupyter widget for Mosaic. Given a declarative specification, it generates web-based visualizations while leveraging DuckDB in the Jupyter kernel. Create interactive Mosaic plots over Pandas and Polars data frames or DuckDB connections.
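
To make the duckdb-server description concrete, the TypeScript sketch below builds and sends an HTTP query. The protocol details are assumptions, not confirmed here: a POST to http://localhost:3000/ whose JSON body is { type, sql }, where type is 'arrow', 'json', or 'exec'. The helper names buildQueryRequest and queryJson are hypothetical; consult the duckdb-server documentation for the actual endpoint and message format.

```typescript
// ASSUMED protocol: POST a JSON body { type, sql } to the server; `type`
// selects the response format. Port and body shape are assumptions.
type QueryType = 'arrow' | 'json' | 'exec';

interface ServerRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

/** Hypothetical helper: describe the HTTP request for one SQL query. */
function buildQueryRequest(
  sql: string,
  type: QueryType = 'json',
  uri = 'http://localhost:3000/'
): ServerRequest {
  return {
    url: uri,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type, sql }),
  };
}

/** Hypothetical helper: run a query and parse a JSON response. */
async function queryJson(sql: string): Promise<unknown> {
  const req = buildQueryRequest(sql, 'json');
  const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
  if (!res.ok) throw new Error(`Server error ${res.status}`);
  return res.json();
}

// Inspect the request without contacting a server.
console.log(buildQueryRequest('SELECT 42 AS answer').body);
```

For 'arrow' responses, the body would instead be Arrow IPC bytes, which could be decoded with tableFromIPC from the apache-arrow package rather than with res.json().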

Miscellaneous

  • mosaic-duckdb: A Promise-based Node.js API to DuckDB, along with a data server that supports transfer of Apache Arrow and JSON data over either WebSockets or HTTP. Due to persistent quality issues involving the Node.js DuckDB client and Arrow extension, we recommend using the Python-based duckdb-server package instead. However, we retain this package for both backwards compatibility and potential future use as quality improves.
  • vega-example: A proof-of-concept example integrating Vega-Lite with Mosaic for data management and cross-view linking.

Build and Usage Instructions

To build and develop Mosaic locally:

  • Clone https://github.com/uwdata/mosaic.
  • Run npm i to install dependencies.
  • Run npm test to run the test suite.
  • Run npm run build to build client-side bundles.

To run local interactive examples:

  • Run npm run dev to launch a local web server and view examples. By default, the examples use DuckDB-WASM in the browser. For greater performance, launch and connect to a local DuckDB server as described below.

To launch a local DuckDB server:

  • Install hatch, if not already present.
  • Run npm run server to launch the duckdb-server. This runs the server in development mode, so the server will restart if you change its code.

To use Mosaic with DuckDB Python in Jupyter Notebooks:

To use Mosaic with DuckDB-WASM in Observable Notebooks:

