zeno's People

Contributors

cabreraalex, dependabot[bot], erica-w-fu, gitter-badger, joshjzhou, ks-create, neubig, sparkier, stevenyh3, tianqi-wu, willeppy, xnought

zeno's Issues

Parallelize expensive operations

Primarily:

Data processing/Model running with caching:

  • Preprocessing
  • Transformations
  • Model inference

DataFrame operations:

  • Data slicing
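
A minimal sketch of how a cached, parallel map over instances could look. The names `cached_parallel_map` and `_cache` are hypothetical and not part of Zeno; the in-memory dict stands in for whatever cache the backend actually uses. Threads are used here for simplicity; CPU-bound steps would more likely need `ProcessPoolExecutor`.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory cache keyed by (step name, item hash); not Zeno's cache.
_cache = {}


def _key(step_name, item):
    """Build a stable cache key from the step name and a JSON-serializable item."""
    payload = json.dumps(item, sort_keys=True).encode("utf-8")
    return step_name, hashlib.sha256(payload).hexdigest()


def cached_parallel_map(step_name, fn, items, max_workers=4):
    """Apply fn to every item in parallel, skipping items that are already cached."""
    keys = [_key(step_name, item) for item in items]

    # Collect each uncached item once, even if it appears several times.
    missing = {}
    for k, item in zip(keys, items):
        if k not in _cache and k not in missing:
            missing[k] = item

    # Only the uncached items are sent to the worker pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fn, missing.values()))
    _cache.update(zip(missing.keys(), results))

    # Results come back in the same order as the input items.
    return [_cache[k] for k in keys]
```

Calling it twice with overlapping inputs only runs `fn` on the items that were not seen before.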

create insights view for report tab overview

Users should be shown some automated "insights" about their slices/reports when they first land on the reports tab. This can include new failures, regressions, anomalies, etc. in tabular and graphical form.

Filtering in Discovery

  • The filtering does not get reset when filter nodes are removed
  • Lasso select filtering is not possible
  • Metrics are not recomputed on filtered space from lasso

tabular view

create a view for tabular data.

Maybe consider making a default table view for Zeno generally that shows the values for all columns? idk.

modular instance views

The least generalizable piece of Zeno is the instance view.

It would be great to make this modular, where people can install e.g. zeno-pointcloud and zeno then has an option to take in pointcloud data and visualize it without changing the zeno code itself.
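
One possible shape for this, sketched as a small view registry. `register_view`, `render`, and the `"pointcloud"` data type are all hypothetical names; a plugin package such as zeno-pointcloud could call `register_view` when it is imported. A real plugin system might instead discover installed packages through Python entry points.

```python
# Hypothetical registry mapping a data type name to a render function.
_VIEWS = {}


def register_view(data_type):
    """Decorator that registers a render function for the given data type."""
    def decorator(fn):
        _VIEWS[data_type] = fn
        return fn
    return decorator


def render(data_type, instance):
    """Render an instance with whichever view is registered for its data type."""
    try:
        view = _VIEWS[data_type]
    except KeyError:
        raise ValueError(f"no instance view installed for {data_type!r}") from None
    return view(instance)


# What a plugin package might contain; the core code above stays unchanged.
@register_view("pointcloud")
def render_pointcloud(instance):
    return f"<pointcloud with {len(instance['points'])} points>"
```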

support multimodal models

the Python API breaks down substantially when trying to use a multimodal model. Need to re-design and update the API to directly support multimodal models.

Projection is Slow on Start and Generally Bad

  • The first projection takes minutes to start, but subsequent calls are fast (is parametric UMAP JIT-compiling functions on the first call?)
  • The results are not very good (modify the parametric UMAP or switch to ScVis or another method)

Metrics/slices dependent on other slices/transforms

For example, what if I want to find all images with dogs, and then filter for the dogs that are "red". The first slicing function has to indicate the bounding boxes the second slicing function should focus on.
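
A rough sketch of the dog example, assuming each instance carries a list of boxes with `label` and `color` fields (a made-up data layout, not Zeno's). The first slicer passes the matching boxes along under a hypothetical `focus` key so the second slicer only inspects those.

```python
def find_dogs(instances):
    """First slicer: keep instances with a 'dog' box and record those boxes."""
    out = []
    for inst in instances:
        boxes = [b for b in inst["boxes"] if b["label"] == "dog"]
        if boxes:
            out.append({"id": inst["id"], "focus": boxes})
    return out


def red_only(focused):
    """Second slicer: look only at the boxes handed over by the first slicer."""
    return [
        f["id"]
        for f in focused
        if any(b["color"] == "red" for b in f["focus"])
    ]


instances = [
    {"id": 0, "boxes": [{"label": "dog", "color": "red"}]},
    {"id": 1, "boxes": [{"label": "dog", "color": "black"},
                        {"label": "ball", "color": "red"}]},
    {"id": 2, "boxes": [{"label": "cat", "color": "red"}]},
]
red_dog_ids = red_only(find_dogs(instances))
```

Instance 1 contains a red ball next to a black dog, so it is excluded: the second slicer never sees boxes the first one did not select.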

slow startup

Initial running of Zeno is super slow, even before running any computation. We should figure out why.

Labeler Breaks Backend

  • When creating the polygon on an intermediate node in the pipeline, it errors out
  • When creating one with the same name as an existing one, it breaks the frontend and backend

preserve preferences in sample view

Since we re-render the whole sample view each time we change the table, preferences are reset, such as whether to show the mask or just the ground truth for segmentation.

We should find a way to preserve these settings across re-renders of the sample view.

Scatter issues in Discovery

  • Rotating messes up the region labeler polygon
  • Filtering to new spaces and reprojection is jarring (maybe tween the points, at least for reprojection?)

group predicates for slice creation

Currently filters only support a single flat chain of joins, e.g. (A AND B OR C AND D).
We need to be able to group predicates to make more complex filters, such as (A AND C) OR !(D).
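
A minimal sketch of grouped predicates as a nested expression tree. The tuple encoding (`"pred"`, `"and"`, `"or"`, `"not"`) and the column names are assumptions for illustration, not Zeno's filter format.

```python
def evaluate(expr, row):
    """Recursively evaluate a nested predicate expression against one row."""
    op = expr[0]
    if op == "pred":
        # Leaf node: ("pred", column name, test function on the cell value).
        _, column, test = expr
        return bool(test(row[column]))
    if op == "not":
        return not evaluate(expr[1], row)
    if op == "and":
        return all(evaluate(e, row) for e in expr[1:])
    if op == "or":
        return any(evaluate(e, row) for e in expr[1:])
    raise ValueError(f"unknown operator: {op!r}")


# (A AND C) OR !(D), with hypothetical metadata columns.
A = ("pred", "width", lambda v: v > 100)
C = ("pred", "label", lambda v: v == "dog")
D = ("pred", "blurry", lambda v: v)
expr = ("or", ("and", A, C), ("not", D))

rows = [
    {"width": 200, "label": "dog", "blurry": True},
    {"width": 50, "label": "dog", "blurry": True},
    {"width": 50, "label": "cat", "blurry": False},
]
selected = [evaluate(expr, row) for row in rows]
```

Because groups nest, the UI would only need to let users wrap a set of predicates in a group and pick the join for that group.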

scalability issues

Generally keeping track of issues that will start to come up as we scale to larger datasets:

  • Vega binning for histograms is quite slow, especially if done N times across all metadata
  • Calculating metrics on the fly for every interaction/slice can be slow
  • Requesting metrics using lists of IDXs can be slow if IDX lists go into millions
  • Running projections, e.g. UMAP, can be painfully slow; we likely need approximations or smart caching

Batch load data for memory limits

We can't load all the data into memory at once, e.g. 100,000 images. It has to be batched for both preprocessing and prediction. Also, remove the data from the slice API; everything should be metadata.
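
A small sketch of batched loading, so that only one batch of data is in memory at a time. `run_in_batches`, `load`, and `predict` are hypothetical placeholders for the user's loading and model functions.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield successive lists of at most batch_size items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_in_batches(paths, load, predict, batch_size=64):
    """Load and predict one batch at a time; only the outputs are accumulated."""
    outputs = []
    for batch_paths in batched(paths, batch_size):
        data = [load(p) for p in batch_paths]  # only this batch is held in memory
        outputs.extend(predict(data))
    return outputs
```

The same `batched` helper could wrap the preprocessing step, since it works on any iterable.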

jupyter notebook interface

Instantiate Zeno, pass in functions, and run it - will open new page with UI.

This will require lots of thinking about how to structure the API - do we pass functions in? try to read the notebook itself?

support text and datetime metadata columns

Some metadata columns may be dates or text. We should have a cell visualization for them that provides some overview and filtering.

  • text - regex filtering, with example results and a count of results
  • datetime - calendar selection
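
A sketch of what the backend side of these two filters might compute, using only the standard library. The function names and the returned dictionary layout are assumptions; the calendar selection is modeled as an inclusive start and end date.

```python
import re
from datetime import date


def filter_text(values, pattern, max_examples=3):
    """Return the match count, matching indices, and a few example matches."""
    regex = re.compile(pattern)
    matches = [i for i, v in enumerate(values) if regex.search(v)]
    return {
        "count": len(matches),
        "indices": matches,
        "examples": [values[i] for i in matches[:max_examples]],
    }


def filter_dates(values, start, end):
    """Return indices of dates inside the inclusive [start, end] range."""
    return [i for i, v in enumerate(values) if start <= v <= end]


captions = ["red dog", "blue cat", "red fox", "dog"]
text_result = filter_text(captions, r"^red")

taken = [date(2022, 1, 5), date(2022, 3, 1), date(2022, 2, 10)]
date_result = filter_dates(taken, date(2022, 1, 1), date(2022, 2, 28))
```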

grid/confusion matrix view

Either in the exploration or analysis tab, let people create a grid view that crosses slices like a confusion matrix to see metrics at different intersections.
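
A minimal sketch of the computation behind such a grid: every row slice is crossed with every column slice and the metric is evaluated on the intersection. Slices are modeled as plain predicate functions and `slice_grid` is a hypothetical name.

```python
def slice_grid(instances, row_slices, col_slices, metric):
    """Compute a metric for the intersection of every row and column slice."""
    grid = {}
    for r_name, r_fn in row_slices.items():
        for c_name, c_fn in col_slices.items():
            cell = [x for x in instances if r_fn(x) and c_fn(x)]
            # Empty intersections get None instead of a misleading metric value.
            grid[(r_name, c_name)] = metric(cell) if cell else None
    return grid


def accuracy(cell):
    return sum(x["correct"] for x in cell) / len(cell)


instances = [
    {"size": "small", "label": "dog", "correct": True},
    {"size": "small", "label": "dog", "correct": False},
    {"size": "large", "label": "cat", "correct": True},
]
rows = {
    "small": lambda x: x["size"] == "small",
    "large": lambda x: x["size"] == "large",
}
cols = {
    "dog": lambda x: x["label"] == "dog",
    "cat": lambda x: x["label"] == "cat",
}
grid = slice_grid(instances, rows, cols, accuracy)
```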

Support creating ad-hoc slices in UI

Users should be able to select a group of instances and create a new slice. We should also support some views that help people expand their slices such as nearest neighbors and embeddings.

fix slice editing

two main bugs:

  • allow names to be changed -> requires deleting old slice from map and adding new one
  • update metric when slice edited -> requires deleting result and requesting again
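
The two fixes could be sketched roughly as follows, assuming slices live in a dict keyed by name and cached results in a dict keyed by `(slice_name, metric_name)`. Both layouts and both function names are assumptions, not the actual store.

```python
def rename_slice(slices, results, old_name, new_name):
    """Rename a slice: delete the old map entry and add it under the new name."""
    if new_name in slices:
        raise ValueError(f"slice {new_name!r} already exists")
    slices[new_name] = slices.pop(old_name)
    # The instances are unchanged, so cached results move to the new name.
    for key in [k for k in results if k[0] == old_name]:
        results[(new_name, key[1])] = results.pop(key)


def edit_slice(slices, results, name, new_predicate):
    """Change a slice definition and drop its stale results."""
    slices[name] = new_predicate
    # Deleting the cached results forces them to be requested again.
    for key in [k for k in results if k[0] == name]:
        del results[key]
```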
