The motion from dm4ml

Create `motion test` command

Users should be allowed to write tests for application logic

Prune records that correspond to duplicated records

Whenever duplicating a record, the old record typically isn't written to. So there are null-valued records that exist. Figure out how to cleanly dispose of them.

Cache features when there are dependent models (3+ models)

Log trigger executions in a separate table

Think about sanity checking LLM output

Regenerating model outputs when pushing a new model version

Right now, after fit is called, model outputs aren't regenerated. We need to do this.

Get rid of relational model

See if fashion pipeline can be effectively rewritten with one schema instead of 2

Spec out train-once and run views

There are 2 views I am thinking of:

train-once
run

For each view, describe the operations a user can run in the UI

--

Explore

Goal: enable developers to build the pipeline, with quick iteration on models and the ability to slice and dice intermediates to inspect the data as they please. Success: the time it takes to build a pipeline is less than half the time it takes without Motion.

Key aspects: runs training ops only once, using the minimum number of data points specified. Checkpoints inputs and outputs for every transform. Allows the user to run each transform, as well as the whole pipeline.

3 types of cells:

type: feature or label type creation
transform: create a transform, set dependencies, can execute it
free: forks/copies state and allows users to perform any read operations on intermediate state. This is kind of like a Jupyter cell/free-for-all.

Test

Goal: simulate deployment on a batch of data. Understand performance as it would be once deployed. Success: when the user deploys the pipeline to prod, they don't see a big performance drop soon after.

Key aspects: auto-runs retraining whenever models are getting stale. Users cannot add type or transform cells here. They simply run the pipeline on the ids they care about. They can also evaluate a function to measure performance.

2 types of cells:

free: same as above
evaluator: takes predictions and labels, and computes an evaluation metric. Can be either a function or a class with state (to do incremental maintenance).
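
A minimal sketch of the two evaluator shapes described above, assuming accuracy as the metric. The names `accuracy`, `RunningAccuracy`, and `update` are hypothetical and are not part of Motion's API; they only illustrate the difference between a stateless function and a class that maintains its metric incrementally.

```python
from typing import Sequence


def accuracy(predictions: Sequence, labels: Sequence) -> float:
    """Stateless evaluator: fraction of predictions equal to their labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(predictions) == 0:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(predictions)


class RunningAccuracy:
    """Stateful evaluator: keeps running counts so each new batch is
    folded in without rescanning earlier batches."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, predictions: Sequence, labels: Sequence) -> float:
        """Fold one batch into the running counts and return the new value."""
        if len(predictions) != len(labels):
            raise ValueError("predictions and labels must have the same length")
        self.correct += sum(p == y for p, y in zip(predictions, labels))
        self.total += len(predictions)
        return self.value

    @property
    def value(self) -> float:
        return self.correct / self.total if self.total else 0.0
```

The stateful version is the one that would suit a test view that processes ids batch by batch, since the metric stays current after each batch.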

Swap out backends

Try redis and polars

Write query method to avoid a join

Currently we have a call like:

results = store.con.execute(
    "SELECT fashion.query.query, fashion.query.text_suggestion, "
    "fashion.catalog.permalink, fashion.catalog.img_url, fashion.query.img_score "
    "FROM fashion.query "
    "JOIN fashion.catalog ON fashion.query.img_id = fashion.catalog.id "
    "WHERE fashion.query.img_id IS NOT NULL AND fashion.query.query_id = ? "
    "ORDER BY fashion.query.img_score ASC",
    (query_id,),
).fetchdf()

Handle DB migrations seamlessly

We want to be able to support schema migrations, for example a user adding a new key to a relation. Ideally the user does not have to run a separate command; Motion should automatically detect when the stored schema doesn't match the declared one, and migrate it to match.

It would be good to prompt the user to confirm they want to migrate the schema.
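
One way the detect-then-confirm flow could look, as a sketch only. It assumes schemas can be represented as column-name to SQL-type mappings and handles just the "new key added" case; `plan_migration`, `confirm_and_apply`, and the `execute` callback are hypothetical names, not existing Motion functions.

```python
from typing import Callable, Dict, List


def plan_migration(
    table: str, existing: Dict[str, str], declared: Dict[str, str]
) -> List[str]:
    """Return ALTER TABLE statements that add declared columns missing
    from the existing table. Type changes are rejected, not migrated."""
    statements = []
    for column, sql_type in declared.items():
        if column not in existing:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")
        elif existing[column].upper() != sql_type.upper():
            raise ValueError(
                f"column {column!r} changed type from {existing[column]} to {sql_type}"
            )
    return statements


def confirm_and_apply(
    statements: List[str],
    execute: Callable[[str], object],
    ask: Callable[[str], str] = input,
) -> bool:
    """Show the pending statements, ask the user, and run them on 'y'.
    Returns True only if the migration was applied."""
    if not statements:
        return False
    print("Pending schema changes:")
    for statement in statements:
        print("  " + statement)
    if ask("Apply migration? [y/N] ").strip().lower() != "y":
        return False
    for statement in statements:
        execute(statement)
    return True
```

Here `execute` would be whatever runs SQL against the store (for instance a bound connection method), and `ask` is injectable so the prompt can be bypassed in tests.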

Design run view in Figma

Change backend to pyarrow

Create explore/train-once view in web app

Components:

Documentation

Requirements.txt
Schema documentation
Store methods (common ones one will use)
Transform class
end to end example (including how to log feedback)

Don't need to make it official yet.

Handle complex types in getter/setters

Sample app: personalized news Q&A

Triggers:

scraper: scrapes recent headlines
chatbot: puts recent headlines in prompt and pings LLM, along with user's interests

Think about deployment

shouldFit should not return context and true/false?

Modify example project to fit current motion framework

The example project is currently outdated.

Change all `id` to `identifier`

Figure out how to store data

Persist DB

Move the image embedding computation into fit instead of transform in Retrieval

Make motion a python library

Get rid of derived_id check that probably slows down the DB

Incorporating feedback

Figure out how to take in new labels or feedback from the user. Also figure out how to create an evaluator.

Design train-once view in Figma

Refactor Retrieval in fashion application

Make setter method handle many keys and values

Write `get` method to access DB

Basic get method
Get method with guardrail (e.g., can't peek into future)
Copy method (takes in id, doesn't execute triggers)
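
The three behaviors listed above can be sketched against a toy in-memory store. Everything here is an assumption made for illustration: the `ToyStore` class, the integer `created_at` timestamps, the `as_of` guardrail parameter, and the `triggers` list are hypothetical and do not describe how Motion stores data.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


class ToyStore:
    """In-memory stand-in for a store with put/get/copy."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = {}
        self._created_at: Dict[str, int] = {}
        self.triggers: List[Callable[[str], None]] = []

    def put(self, identifier: str, created_at: int, **values: Any) -> None:
        """Write a record and fire every registered trigger."""
        self._rows[identifier] = dict(values)
        self._created_at[identifier] = created_at
        for trigger in self.triggers:
            trigger(identifier)

    def get(
        self, identifier: str, keys: Iterable[str], as_of: Optional[int] = None
    ) -> Dict[str, Any]:
        """Basic get; with as_of set, refuse records created after that time."""
        if identifier not in self._rows:
            raise KeyError(identifier)
        if as_of is not None and self._created_at[identifier] > as_of:
            raise ValueError(f"{identifier!r} was created after as_of={as_of}")
        row = self._rows[identifier]
        return {key: row.get(key) for key in keys}

    def copy(self, identifier: str, new_identifier: str, created_at: int) -> None:
        """Duplicate a record under a new identifier without firing triggers."""
        self._rows[new_identifier] = dict(self._rows[identifier])
        self._created_at[new_identifier] = created_at
```

Passing `as_of` is what keeps a transform from reading records that did not exist yet at the time it is simulating.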

Change `schema.py` from `dataclass` to `pydantic`

Helps with validation and removes the need for Python 3.10

Write fashion pipeline

Write components
Write scrapers
Write streamlit dashboard
Write fine tuning

[EPIC] Generative Fashion

Use a generative image model like stable diffusion to:

generate for a user's prompt

Needs fine-tuning on fashion images. We can use our catalog and fine-tune on text_suggestion, image pairs that the user likes.

Create abstract batch inference method in the transform class

Productionization

Handle authentication and allow for nginx and/or gunicorn serving.

Fail gracefully if cron triggers fail

Right now the thread will stop if there's a failure in a cron thread. Make it fail gracefully.
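
A minimal sketch of one way to keep a cron thread alive across failures, using only the standard library: catch the exception inside the loop, log it, and wait for the next tick. The `run_cron` function and its parameters are hypothetical; how Motion schedules cron triggers is not shown in this issue.

```python
import logging
import threading
from typing import Callable, Optional

logger = logging.getLogger("cron")


def run_cron(
    task: Callable[[], None],
    interval_seconds: float,
    stop_event: threading.Event,
    max_runs: Optional[int] = None,
) -> int:
    """Call task repeatedly until stop_event is set (or max_runs is hit).
    Exceptions are logged instead of ending the loop.
    Returns the number of failed runs."""
    failures = 0
    runs = 0
    while not stop_event.is_set() and (max_runs is None or runs < max_runs):
        try:
            task()
        except Exception:
            failures += 1
            logger.exception("cron task failed; will retry on the next tick")
        runs += 1
        # wait() doubles as the sleep and returns early if stop is requested
        stop_event.wait(interval_seconds)
    return failures
```

It would typically be started with `threading.Thread(target=run_cron, args=(task, 60.0, stop_event), daemon=True).start()`, and stopped by calling `stop_event.set()`.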

[EPIC] Create representation of someone's closet

Steps:

User uploads camera roll + photo of themselves
Model prunes the set of images down to those that show outfits of the user
Find k most "worn" items for the user and store this
Find similar clothing items to what users already have

Models:

Face recognition to filter camera roll for pics of user
Outfit segmentation to find most "worn" items
Retrieval model to find neighboring outfits (scraped from online catalog) that the user doesn't own

First pass w/o fine-tuning

Allow users to force fit methods to run synchronously

Default can be async
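
A sketch of how an async-by-default fit with an opt-in synchronous mode might be wired up with `concurrent.futures`. The `run_fit` helper and its `sync` flag are hypothetical names chosen for illustration, not an existing Motion interface.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable

# One worker so that fits for the same model never overlap (an assumption).
_executor = ThreadPoolExecutor(max_workers=1)


def run_fit(fit: Callable[..., Any], *args: Any, sync: bool = False) -> Future:
    """Submit fit to a background thread. With sync=True, block until it
    finishes, re-raising any exception that fit raised."""
    future = _executor.submit(fit, *args)
    if sync:
        future.result()
    return future
```

Returning the `Future` in both modes lets callers that started asynchronously still wait on the result later.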

Save model state when motion stops

It looks like the on-disk version of duckdb is too slow. We may want to save the data directly in arrow tables and use duckdb to query. When a session shuts down, we need to:

persist data/table information
persist trigger state

Unify set and setMany

Clean up example fashion application

Rename component to transform
Rename fit and transform methods to fit and infer
Make a file for each transform

Document the transform lifecycle

Write basic documentation for transform lifecycle
What should be async? Fit? How to coordinate?
