Comments (30)
Thanks @Suor. I don't think a user can do this in dvc now even with custom templates since only a single series is expected. It has come up before and makes sense as a useful feature.
There are a couple ways I can imagine achieving this within dvc:
- Have training and validation loss in separate files and allow dvc plots diff between the two (or more) files. See iterative/dvc#5808 for a discussion/proposal on that.
- Have training and validation loss in the same file, supporting more than one y-axis field, and adding a template for multi-series plots.
Both sound potentially useful. A couple of reasons I'd probably prioritize the first approach:
- It seems easier to implement quickly since it's just adding an option to diff between file paths instead of revisions.
- DVCLive is currently set up to have a single series per plots file and to separate training and validation into different paths.
cc @pared
from studio-support.
Hi everyone! My team is also used to seeing the losses of both training and test in the same graph for each iteration (or epoch). I did some tests with current releases, and I believe the problem is with Studio, because running dvc plots show properly shows the plots in the local browser. Here is what I've done so far.
This is my custom template file:
multi_loss.json:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"values": "<DVC_METRIC_DATA>"
},
"title": "<DVC_METRIC_TITLE>",
"width": 300,
"height": 300,
"mark": {
"type": "line",
"point": {
"filled": false,
"fill": "white"
}
},
"encoding": {
"x": {
"field": "<DVC_METRIC_X>",
"type": "quantitative",
"title": "<DVC_METRIC_X_LABEL>"
},
"y": {
"field": "<DVC_METRIC_Y>",
"type": "quantitative",
"title": "<DVC_METRIC_Y_LABEL>",
"scale": {
"zero": false
}
},
"color": {
"field": "stage",
"type": "nominal",
"legend": {"disable": false},
"scale": {}
}
}
}
This is my plot definition in dvc.yaml
- plots/losses.csv:
cache: false
title: Train/Test losses
template: multi_loss
x: epoch
y: loss
And this is my sample csv file:
stage,epoch,loss
train,1,4.7
train,2,3.5
train,3,2.2
train,4,2.1
train,5,1.1
train,6,1.0
train,7,0.4
test,1,14.7
test,2,13.5
test,3,12.2
test,4,12.1
test,5,11.1
test,6,11.0
test,7,8.4
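For anyone producing a file in this shape from a training loop, here is a minimal sketch of writing per-epoch losses in the long format above (one row per stage and epoch). The helper names to_long_rows and write_csv are made up for illustration and are not part of DVC or DVCLive.

```python
import csv
import io


def to_long_rows(losses_by_stage):
    """Flatten {stage: [loss for epoch 1, 2, ...]} into stage/epoch/loss rows."""
    rows = []
    for stage, losses in losses_by_stage.items():
        for epoch, loss in enumerate(losses, start=1):
            rows.append({"stage": stage, "epoch": epoch, "loss": loss})
    return rows


def write_csv(rows, fileobj):
    """Write the rows with the same header as the sample file: stage,epoch,loss."""
    writer = csv.DictWriter(
        fileobj, fieldnames=["stage", "epoch", "loss"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Small demonstration using the first three epochs of the sample data.
buffer = io.StringIO()
write_csv(to_long_rows({"train": [4.7, 3.5, 2.2], "test": [14.7, 13.5, 12.2]}), buffer)
print(buffer.getvalue())
```

Keeping the series name in its own column is what lets the template's color encoding split the lines by stage.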
This configuration shows the graph properly in both the dvc plots show output and the Vega editor.
However, the problem is, the studio wants to group the plots by revision and overrides two keys in the template. Here is what vega editor shows when I click on "Open in Vega Editor" from the studio:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"values": [
{"loss": "4.7", "epoch": "1", "stage": "train", "rev": "ab8f6b3"},
{"loss": "3.5", "epoch": "2", "stage": "train", "rev": "ab8f6b3"},
{"loss": "2.2", "epoch": "3", "stage": "train", "rev": "ab8f6b3"},
{"loss": "2.1", "epoch": "4", "stage": "train", "rev": "ab8f6b3"},
{"loss": "1.1", "epoch": "5", "stage": "train", "rev": "ab8f6b3"},
{"loss": "1.0", "epoch": "6", "stage": "train", "rev": "ab8f6b3"},
{"loss": "0.4", "epoch": "7", "stage": "train", "rev": "ab8f6b3"},
{"loss": "14.7", "epoch": "1", "stage": "test", "rev": "ab8f6b3"},
{"loss": "13.5", "epoch": "2", "stage": "test", "rev": "ab8f6b3"},
{"loss": "12.2", "epoch": "3", "stage": "test", "rev": "ab8f6b3"},
{"loss": "12.1", "epoch": "4", "stage": "test", "rev": "ab8f6b3"},
{"loss": "11.1", "epoch": "5", "stage": "test", "rev": "ab8f6b3"},
{"loss": "11.0", "epoch": "6", "stage": "test", "rev": "ab8f6b3"},
{"loss": "8.4", "epoch": "7", "stage": "test", "rev": "ab8f6b3"}
]
},
"title": "Train/Test losses",
"width": "container",
"height": 200,
"mark": {"type": "line", "point": {"filled": false, "fill": "white"}},
"encoding": {
"x": {"field": "epoch", "type": "quantitative", "title": "epoch"},
"y": {
"field": "loss",
"type": "quantitative",
"title": "loss",
"scale": {"zero": false}
},
"color": {
"field": "stage",
"type": "nominal",
"legend": {"disable": true},
"scale": {"domain": ["ab8f6b3"], "range": ["#13adc7"]}
}
},
"padding": {"bottom": 5, "left": 5, "right": 5, "top": 5}
}
It overrides the legend and scale keys in the color section, and because of that Vega shows only the rows where ${stage} == "ab8f6b3". If I change the stage of some rows to ab8f6b3, Vega plots those rows.
I think we need a way to tell the studio that I don't want to compare these plots between revisions.
from studio-support.
For reference, see this proposal from @Suor in iterative/dvc#5980 (reply in thread):
plots:
- train_vs_val:
x: epoch
y: loss
data: [train_loss.csv, val_loss.csv]
- val_f1.csv:
x: epoch
y: [f1_class_0, f1_class_1]
y_label: f1
# data is absent, using key value: val_f1.csv
# Separately plot since we have TWO plots, even though with the same data file
- scores_acc:
x: epoch
y: acc
data: scores.csv
- scores_auc:
x: epoch
y: auc
data: scores.csv
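To make the first entry of the proposal concrete, here is a rough sketch of what combining data: [train_loss.csv, val_loss.csv] could mean in practice: read each file and tag every row with the file it came from, so a template can color by that field. This is only an assumption about the semantics, not how DVC implements it; combine_sources and the "source" field name are hypothetical.

```python
import csv
import io


def combine_sources(sources, x, y):
    """Merge several CSV sources into one list of rows tagged by origin.

    sources maps a name (e.g. a file path) to an open file-like object.
    Values are kept as strings, exactly as csv.DictReader yields them.
    """
    combined = []
    for name, fileobj in sources.items():
        for row in csv.DictReader(fileobj):
            combined.append({x: row[x], y: row[y], "source": name})
    return combined


# In-memory stand-ins for the two files named in the proposal.
train = io.StringIO("epoch,loss\n1,4.7\n2,3.5\n")
val = io.StringIO("epoch,loss\n1,14.7\n2,13.5\n")
rows = combine_sources({"train_loss.csv": train, "val_loss.csv": val}, "epoch", "loss")
print(rows)
```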
from studio-support.
Thanks everyone! I couldn't try the fix @dberenbaum suggested. I'll write back as soon as I can.
from studio-support.
Yes, perfect timing since @pared and I were just discussing it when this user posted! @pared is starting work on it now and will share the plans with @Suor and @shcheklein soon. Let us know if there's anyone else to include for initial feedback.
from studio-support.
Update. It has been implemented on the DVC side and we are looking now into this on the Studio side.
For docs, please see the "Combine multiple data sources" section here: https://dvc.org/doc/user-guide/visualizing-plots
from studio-support.
@dberenbaum @shcheklein @tapadipti Since now top-level plots are supported by Studio, can this issue be closed?
from studio-support.
@jorgeorpinel do we officially support such scenario in DVC? If yes maybe we should add this to docs.
from studio-support.
This seems like a common enough case. Making it easier in dvc also makes sense, i.e. providing a more generic template and some extra dvc plots modify options. What do you think @dberenbaum? Maybe even adding a possibility to use custom keys within plot props to be passed to the template and rendered there. This would enable users to write their own custom templates that are generic and reusable.
from studio-support.
No obvious way to save this type of plot configuration for future use since config is currently tied to a path.
There is one obvious way: make the notion of a plot independent and add a separate entry in dvc.yaml for plots, referring to data files, templates and props there for each plot. This was briefly discussed when we implemented plots initially, but it was easier to implement attaching props to data files. It was also argued that that approach is more intuitive and closer to how people operate.
No guarantees that the plots config (what if they specify different templates, x-axes, etc.?) or underlying data are compatible.
Same as now. We show error when trying to plot both in DVC and Studio. This is also complicated by the fact that data file might change over time, i.e. some columns may disappear or be renamed or props changed, which also means even if props and columns are consistent within a commit they might not be across them.
Unclear how diffing between revisions would work if there are already multiple series on the plot.
We can use facets either by revision or by y. Alternatively, use different line styles. But we come into the territory of combinatorial explosions, variability and user preferences here.
Any command line syntax solution, which avoids saving to dvc.yaml, won't show up in Studio. This will mean we would need to invent our own UI and store things ourselves, while in command line people will need to use bash scripts or history to replot things.
from studio-support.
For the record one more user was asking about this feature:
i want to plot loss and val_loss data in same graph on dvc studio. how do i command plots modify?
cc @dberenbaum - after the images if we have capacity, let's try to think together if we should improve this on the DVC side first or do a custom wizard (potentially with an ability to save its state back into repo) on the Studio side. I think there were some good suggestions on the DVC end and they didn't look too heavy.
Prioritizing this since plots are p1 for us at the moment.
from studio-support.
@jorgeorpinel do we officially support such scenario in DVC? If yes maybe we should add this to docs.
all the data, i.e. all CSV columns, are passed to vega, so if you hardcode field names there then you can do anything
Sorry for the very late reply on that, but I also have the impression it's currently possible with a custom template. Can you confirm @pared? If so, it would definitely be nice to have an advanced example in https://dvc.org/doc/command-reference/plots if you guys want to contribute a draft! Probably not essential though, especially as this discussion is ongoing and there may be a better way in the near future.
from studio-support.
@cagdasbas we have pushed a fix to Studio, so now you should be able to see plots with multiple metrics using the approach suggested by @dberenbaum (your workaround + facet). Hope this helps while we are working on this feature.
@Suor good suggestion, however I am not sure how we will merge plots from multiple selected commits if they don't have a rev field. Let's discuss it internally.
from studio-support.
@jorgeorpinel
It seems to be possible, though I think that this won't be an issue after iterative/dvc#5980
from studio-support.
As far as I understand, you can achieve it with custom templates within DVC. You'll need to write a little Vega JSON, or probably copy the default one and add something there. Once you have it, have assigned it to your data file with dvc plots modify --template, and have saved it to git, both dvc plots show and Studio will show it like this. You might need to hardcode the field name(s) for the y axes into the template though, so your custom template would be of limited reuse.
from studio-support.
Yes, you are definitely right @Suor. I was just wondering whether, if this is going to be the main UI for DVC, it might be necessary to find an easier way to achieve this, especially if you intend to move users over from other popular tools such as MLflow that allow it.
I can also add that using custom templates could end up in a lot of boilerplate too... you would have to move the same code over and over between repositories so it might be better to enable this feature here.
from studio-support.
Plots in Studio are in an early phase now, we basically show whatever DVC shows. A question we haven't resolved yet is how far we want to move away from DVC and what types of things we should add here as opposed to both here and into DVC. Plots in DVC are also evolving.
from studio-support.
Thinking of this, some template stored on Studio side - provided by platform or by user or generated via UI - linked to any CSV or JSON or other datafile is a valid use case on its own.
from studio-support.
Plots in Studio are in an early phase now, we basically show whatever DVC shows. A question we haven't resolved yet is how far we want to move away from DVC and what types of things we should add here as opposed to both here and into DVC. Plots in DVC are also evolving.
I see, I was not conscious of this debate.
Thinking of this, some template stored on Studio side - provided by platform or by user or generated via UI - linked to any CSV or JSON or other datafile is a valid use case on its own.
Yes that would be a plausible solution for keeping studio and DVC "synchronized". Maybe it will make things a bit difficult in the future, I am thinking about the difficulty of handling all those automatically generated files for all plausible combinations... Anyway I get your point, it seems a larger discussion is needed here
from studio-support.
I don't think a user can do this in dvc now even with custom templates since only a single series is expected
As far as I can see, all the data, i.e. all CSV columns, are passed to vega, so if you hardcode field names there then you can do anything. If data comes from several files then it's not possible though, since the data file being the plot is part of how even dvc.yaml stores it. So option 2 is way easier to implement, I believe.
from studio-support.
We could do both or figure out what makes more sense for users. Do they want to be able to plot across files or within one file, and which is a better UI?
Different files
- UI might look like dvc plots diff --no-index model1_roc.tsv rev:model2_roc.tsv.
- Natural for something like training and validation data that might be stored separately.
- No obvious way to save this type of plot configuration for future use since config is currently tied to a path.
- No guarantees that the plots config (what if they specify different templates, x-axes, etc.?) or underlying data are compatible.
Same file
- UI might look like dvc plots modify -y train -y val (feel free to suggest something different).
- Natural for something like multiclass roc plots that would be stored in one dataset.
- Unclear how diffing between revisions would work if there are already multiple series on the plot.
Combined approach
@pared has suggested a syntax like dvc plots show -y file.csv -y rev1:file.csv -y rev2:file.csv.
- Covers both scenarios.
- Need to verify how this works (I think the column names are missing unless I'm misunderstanding).
- Like comparing different files, it's unclear if there's a way to save this type of plot config.
- Might add complexity for simple scenarios.
from studio-support.
Any command line syntax solution, which avoids saving to dvc.yaml, won't show up in Studio. This will mean we would need to invent our own UI and store things ourselves, while in command line people will need to use bash scripts or history to replot things.
Also, that does not seem to make too much sense from the DVC perspective. I mean, that's the point of version control: to save things for later use.
The problem here is that, on one hand, we would like DVC commands to provide tight integration with git and revisions, so that we can easily compare some assets (that was the initial driving force behind plots, and hence the behaviour of diffing only files with the same name), and now we would like to compare different files from different revisions. The latter concept does not fit well with the former.
make a notion of a plot independent and make a separate entry in dvc.yaml
If we want to satisfy both ideas, that seems to be the only way - maybe we should store just plot configuration and require user to provide data for particular revisions:files when they use plots?
from studio-support.
now we would like to compare different files from different revisions
We may be getting ahead of ourselves here. I haven't yet heard of (nor can I think of) a use case where comparing different files from different revisions is actually needed. Doing one or the other may be sufficient.
make a notion of a plot independent and make a separate entry in dvc.yaml
Having a plots section at the top level of dvc.yaml might happen, but I think the keys are still likely to be file paths for now. If we want to fully decouple plots configuration from file paths altogether, I'm not sure exactly how that should look or whether it's worthwhile. It's probably a separate discussion that goes beyond combining metrics.
DVC could add support for both diffing between files and showing multi-column plots within a file.
Diffing between file paths:
# dvc.yaml
plots:
- train_loss.csv:
x: epoch
y: loss
- val_loss.csv:
x: epoch
y: loss
- dvc plots diff --no-index train_loss.csv val_loss.csv plots a diff just like comparing revisions.
- --no-index is not an intuitive name, so open to other suggestions even though it would break git consistency.
- Throw an error if configs don't match.
- Plotting this in Studio doesn't seem much different to me than existing diff plots.
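The "throw an error if configs don't match" step could look roughly like the sketch below. It assumes each plot's properties are available as a plain dict, keyed by file path as in the dvc.yaml above. check_compatible and the set of compared keys are hypothetical, not existing DVC behavior.

```python
def check_compatible(configs, keys=("template", "x", "y")):
    """Raise ValueError if the per-file plot configs disagree on any key.

    configs maps a file path to its plot properties, e.g.
    {"train_loss.csv": {"x": "epoch", "y": "loss"}, ...}.
    A key missing from a config is treated as None.
    """
    items = list(configs.items())
    first_path, first_cfg = items[0]
    for path, cfg in items[1:]:
        for key in keys:
            if cfg.get(key) != first_cfg.get(key):
                raise ValueError(
                    f"{first_path} and {path} disagree on {key!r}: "
                    f"{first_cfg.get(key)!r} != {cfg.get(key)!r}"
                )


# The two entries from the dvc.yaml above are compatible, so this passes silently.
check_compatible(
    {
        "train_loss.csv": {"x": "epoch", "y": "loss"},
        "val_loss.csv": {"x": "epoch", "y": "loss"},
    }
)
```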
Plotting multiple columns within a file:
# dvc.yaml
plots:
- loss.csv:
template: multiline
x: epoch
y:
- train
- val
- dvc plots show plots both lines on the same plot using https://vega.github.io/vega-lite/docs/repeat.html.
- dvc plots diff makes a facet grid of the plots (similar to confusion matrix).
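As an alternative to repeat, the wide file could also be reshaped into long form before plotting, which is what Vega-Lite's fold transform does inside a spec. A minimal sketch of that reshaping is below. The fold function and the "series" and "value" field names are illustrative assumptions and not part of the proposal.

```python
def fold(rows, x, y_fields, key="series", value="value"):
    """Turn one row per x with several y columns into one row per (x, series)."""
    folded = []
    for row in rows:
        for field in y_fields:
            folded.append({x: row[x], key: field, value: row[field]})
    return folded


# loss.csv from the example, with one column per series.
wide = [
    {"epoch": 1, "train": 4.7, "val": 14.7},
    {"epoch": 2, "train": 3.5, "val": 13.5},
]
for row in fold(wide, "epoch", ["train", "val"]):
    print(row)
```

With the data in this shape, a single line mark colored by the series field draws both lines, the same way the stage column does in the custom template discussed earlier in the thread.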
from studio-support.
So I guess we are discussing versatility vs user experience here. We move targeting data from file_name to column_name. The question is whether there will come a time when someone wants to compare val_loss with train_loss. Then we will be back to discussing a very generic approach, which now is not even dvc plot diff revision:file_path revision2:file_path but even dvc plot revision:file_path:column revision2:file_path2:column2
from studio-support.
For the record, we got one more request for this:
https://discord.com/channels/485586884165107732/841856466897469441/892320977323712553
Brief summary, read the whole post for the details:
Hi everyone! I searched a little but couldn't find anything so wanted to ask. I want to create a custom vega template to see multiple lines in a single graph but it seems that studio doesn't allow it. What I want to do is see both training and validation loss on a single graph. I've created a custom template and it both works on vega online editor and dvc plot show shows it properly. But studio appends two key to the template and it messes up the graph.
from studio-support.
@cagdasbas Thanks for the detailed info! This is a nice workaround for getting training and validation onto the same plot. We hope to make this easier than needing a custom template in the future, but glad dvc plots show is at least working for you.
If you need a quick fix, I think you could adjust your template to add a facet, like:
{
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
"data": {
"values": "<DVC_METRIC_DATA>"
},
"title": "<DVC_METRIC_TITLE>",
"facet": {
"field": "rev",
"type": "nominal"
},
"spec": {
"width": 300,
"height": 300,
"mark": {
"type": "line",
"point": {
"filled": false,
"fill": "white"
}
},
"encoding": {
"x": {
"field": "<DVC_METRIC_X>",
"type": "quantitative",
"title": "<DVC_METRIC_X_LABEL>"
},
"y": {
"field": "<DVC_METRIC_Y>",
"type": "quantitative",
"title": "<DVC_METRIC_Y_LABEL>",
"scale": {
"zero": false
}
},
"color": {
"field": "stage",
"type": "nominal",
"legend": {"disable": false},
"scale": {}
}
}
}
}
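The change from the original template to this one is mechanical: the view-level keys move under "spec" and a "facet" on rev is added. A small sketch of that rewrite, in case someone wants to convert an existing template programmatically, is below. facet_by and VIEW_KEYS are hypothetical helpers, and only the keys used in this template are handled.

```python
import copy

# Keys that describe a single view and therefore move under "spec".
VIEW_KEYS = ("width", "height", "mark", "encoding")


def facet_by(template, field="rev"):
    """Return a copy of a single-view Vega-Lite template faceted by `field`."""
    faceted = copy.deepcopy(template)
    inner = {key: faceted.pop(key) for key in VIEW_KEYS if key in faceted}
    faceted["facet"] = {"field": field, "type": "nominal"}
    faceted["spec"] = inner
    return faceted


flat = {
    "title": "<DVC_METRIC_TITLE>",
    "width": 300,
    "height": 300,
    "mark": {"type": "line"},
    "encoding": {"color": {"field": "stage", "type": "nominal"}},
}
print(facet_by(flat))
```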
from studio-support.
@ssachkovskaya probably switching off color rewriting if the field there is not rev should work here. And probably a good idea overall. I.e. we don't mess with a template unless it is what we expect.
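A rough sketch of that guard, to make the idea concrete: only rewrite the color scale and legend when the template colors by rev, and return anything else untouched. This is not Studio's actual code. apply_rev_colors and the rev_colors mapping are made up for illustration, and the lookup covers only a top-level encoding, not one nested under "spec" in a faceted template.

```python
import copy


def apply_rev_colors(spec, rev_colors):
    """Pin revision colors only if the template's color field is "rev".

    rev_colors maps a revision to a color, e.g. {"ab8f6b3": "#13adc7"}.
    Templates that color by another field (such as "stage") are returned as-is.
    """
    color = spec.get("encoding", {}).get("color")
    if not color or color.get("field") != "rev":
        return spec
    rewritten = copy.deepcopy(spec)
    rewritten["encoding"]["color"]["scale"] = {
        "domain": list(rev_colors.keys()),
        "range": list(rev_colors.values()),
    }
    rewritten["encoding"]["color"]["legend"] = {"disable": True}
    return rewritten


by_stage = {"encoding": {"color": {"field": "stage", "legend": {"disable": False}, "scale": {}}}}
print(apply_rev_colors(by_stage, {"ab8f6b3": "#13adc7"}) == by_stage)
```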
from studio-support.
One more request from the user:
I really enjoy using dvc, and for me, there is one thing that might improve its notoriety and its popularity in the community, and I really really want to know if it is already inside or in question as a dev improvement on the stack : You might wonder what is all about ! Take a look at this picture, it is a really common picture in ml, but not in dvc nor dvc studio, I guess. Will DVC plots command accept two columns against formally precision of labels and/or title ? Perhaps it is already possible, but not in the doc, I guess. Please leave me a comment.
https://discordapp.com/channels/485586884165107732/563406153334128681/927839356654346262
@dberenbaum @pared are there plans to implement the proposal?
from studio-support.
@dberenbaum @pared could you also include me in the plan/discussion for this. Thanks.
from studio-support.
@tapadipti I am currently working on that, what would you like to know?
from studio-support.