
Comments (30)

dberenbaum avatar dberenbaum commented on June 8, 2024 4

Thanks @Suor. I don't think a user can do this in dvc now even with custom templates since only a single series is expected. It has come up before and makes sense as a useful feature.

There are a couple ways I can imagine achieving this within dvc:

  1. Have training and validation loss in separate files and allow dvc plots diff between the two (or more) files. See iterative/dvc#5808 for a discussion/proposal on that.
  2. Have training and validation loss in the same file, support more than one y-axis field, and add a template for multi-series plots.

Both sound potentially useful. A couple of reasons I'd probably prioritize the first approach:

  1. It seems easier to implement quickly since it's just adding an option to diff between file paths instead of revisions.
  2. DVCLive is currently set up to have a single series per plots file and to separate training and validation into different paths.

cc @pared

from studio-support.

cagdasbas avatar cagdasbas commented on June 8, 2024 4

Hi everyone! My team is also used to seeing the training and test losses in the same graph for each iteration (or epoch). I did some tests with current releases, and I believe the problem is with Studio, because running dvc plots show displays the plots properly in the local browser. Here is what I've done so far.

This is my custom template file:

multi_loss.json:
{
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {
        "values": "<DVC_METRIC_DATA>"
    },
    "title": "<DVC_METRIC_TITLE>",
    "width": 300,
    "height": 300,
    "mark": {
        "type": "line",
        "point": {
            "filled": false,
            "fill": "white"
        }
    },
    "encoding": {
        "x": {
            "field": "<DVC_METRIC_X>",
            "type": "quantitative",
            "title": "<DVC_METRIC_X_LABEL>"
        },
        "y": {
            "field": "<DVC_METRIC_Y>",
            "type": "quantitative",
            "title": "<DVC_METRIC_Y_LABEL>",
            "scale": {
                "zero": false
            }
        },
        "color": {
            "field": "stage",
            "type": "nominal",
            "legend": {"disable": false},
            "scale": {}
        }
    }
}

This is my plot definition in dvc.yaml

      - plots/losses.csv:
          cache: false
          title: Train/Test losses
          template: multi_loss
          x: epoch
          y: loss

And this is my sample csv file:

stage,epoch,loss
train,1,4.7
train,2,3.5
train,3,2.2
train,4,2.1
train,5,1.1
train,6,1.0
train,7,0.4
test,1,14.7
test,2,13.5
test,3,12.2
test,4,12.1
test,5,11.1
test,6,11.0
test,7,8.4

This configuration shows the graph properly both in the dvc plots show output and in the Vega editor:
image
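
For readers who want to produce a file in the same long format as the CSV above, here is a minimal sketch. The write_losses helper and the in-memory buffer are assumptions for illustration; only the stage, epoch and loss column names come from the example.

```python
import csv
import io


def write_losses(out, losses_by_stage):
    """Write one row per (stage, epoch) pair in the long format shown above.

    `losses_by_stage` maps a stage name to its per-epoch losses, with epochs
    numbered from 1. This helper is hypothetical, not part of DVC.
    """
    writer = csv.DictWriter(
        out, fieldnames=["stage", "epoch", "loss"], lineterminator="\n"
    )
    writer.writeheader()
    for stage, losses in losses_by_stage.items():
        for epoch, loss in enumerate(losses, start=1):
            writer.writerow({"stage": stage, "epoch": epoch, "loss": loss})


buffer = io.StringIO()  # stands in for open("plots/losses.csv", "w", newline="")
write_losses(buffer, {"train": [4.7, 3.5], "test": [14.7, 13.5]})
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping one row per stage and epoch is what lets a single color encoding on the stage field split the data into two lines.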

However, the problem is that Studio wants to group the plots by revision and overrides two keys in the template. Here is what the Vega editor shows when I click on "Open in Vega Editor" from Studio:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {
    "values": [
      {"loss": "4.7", "epoch": "1", "stage": "train", "rev": "ab8f6b3"},
      {"loss": "3.5", "epoch": "2", "stage": "train", "rev": "ab8f6b3"},
      {"loss": "2.2", "epoch": "3", "stage": "train", "rev": "ab8f6b3"},
      {"loss": "2.1", "epoch": "4", "stage": "train", "rev": "ab8f6b3"},
      {"loss": "1.1", "epoch": "5", "stage": "train", "rev": "ab8f6b3"},
      {"loss": "1.0", "epoch": "6", "stage": "train", "rev": "ab8f6b3"},
      {"loss": "0.4", "epoch": "7", "stage": "train", "rev": "ab8f6b3"},
      {"loss": "14.7", "epoch": "1", "stage": "test", "rev": "ab8f6b3"},
      {"loss": "13.5", "epoch": "2", "stage": "test", "rev": "ab8f6b3"},
      {"loss": "12.2", "epoch": "3", "stage": "test", "rev": "ab8f6b3"},
      {"loss": "12.1", "epoch": "4", "stage": "test", "rev": "ab8f6b3"},
      {"loss": "11.1", "epoch": "5", "stage": "test", "rev": "ab8f6b3"},
      {"loss": "11.0", "epoch": "6", "stage": "test", "rev": "ab8f6b3"},
      {"loss": "8.4", "epoch": "7", "stage": "test", "rev": "ab8f6b3"}
    ]
  },
  "title": "Train/Test losses",
  "width": "container",
  "height": 200,
  "mark": {"type": "line", "point": {"filled": false, "fill": "white"}},
  "encoding": {
    "x": {"field": "epoch", "type": "quantitative", "title": "epoch"},
    "y": {
      "field": "loss",
      "type": "quantitative",
      "title": "loss",
      "scale": {"zero": false}
    },
    "color": {
      "field": "stage",
      "type": "nominal",
      "legend": {"disable": true},
      "scale": {"domain": ["ab8f6b3"], "range": ["#13adc7"]}
    }
  },
  "padding": {"bottom": 5, "left": 5, "right": 5, "top": 5}
}

The studio plot is:
image

It overrides the legend and scale keys in the color section, and because of that Vega shows only the rows where stage == "ab8f6b3". If I change the stage of some rows to ab8f6b3, Vega plots those rows.
image
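
The effect described above can be imitated with a small sketch. It only mimics the observed behavior (rows whose color value is outside the scale domain are not drawn); it is not Vega's actual scale code, and the rows_with_color helper is hypothetical.

```python
# Two of the rows Studio passes to the template (taken from the spec above).
rows = [
    {"loss": "4.7", "epoch": "1", "stage": "train", "rev": "ab8f6b3"},
    {"loss": "14.7", "epoch": "1", "stage": "test", "rev": "ab8f6b3"},
]

# Color encoding after the override: the field is still "stage",
# but the scale domain now lists revisions instead of stages.
color = {
    "field": "stage",
    "scale": {"domain": ["ab8f6b3"], "range": ["#13adc7"]},
}


def rows_with_color(rows, color):
    """Return the rows whose color-field value falls inside the scale domain."""
    domain = set(color["scale"]["domain"])
    return [row for row in rows if row[color["field"]] in domain]


visible = rows_with_color(rows, color)

# Renaming a stage to the revision hash makes that row match the domain again,
# which is the experiment described in the comment.
relabelled = [dict(rows[0], stage="ab8f6b3"), rows[1]]
visible_after_relabel = rows_with_color(relabelled, color)
print(len(visible), len(visible_after_relabel))
```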

I think we need a way to tell Studio that I don't want to compare these plots between revisions.


dberenbaum avatar dberenbaum commented on June 8, 2024 3

For reference, see this proposal from @Suor in iterative/dvc#5980 (reply in thread):

plots:
- train_vs_val:
    x: epoch
    y: loss
    data: [train_loss.csv, val_loss.csv]


- val_f1.csv:
    x: epoch
    y: [f1_class_0, f1_class_1]
    y_label: f1
    # data is absent, using key value: val_f1.csv

# Separately plot since we have TWO plots, even though with the same data file
- scores_acc:
    x: epoch
    y: acc
    data: scores.csv
- scores_auc:
    x: epoch
    y: auc
    data: scores.csv
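
To make the data: [train_loss.csv, val_loss.csv] idea concrete, here is a rough sketch of how rows from several files could be merged into one dataset with a label per source. The combine_sources function and the "source" field name are assumptions for illustration, not DVC's implementation; the CSV contents are inlined so the example runs on its own.

```python
import csv
import io


def combine_sources(sources, label_field="source"):
    """Merge rows from several CSV texts, tagging each row with its source name."""
    combined = []
    for name, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            combined.append({**row, label_field: name})
    return combined


# Inlined stand-ins for the two files named in the proposal.
sources = {
    "train_loss.csv": "epoch,loss\n1,4.7\n2,3.5\n",
    "val_loss.csv": "epoch,loss\n1,14.7\n2,13.5\n",
}
rows = combine_sources(sources)
for row in rows:
    print(row)
```

A template could then use the label field for color, the same way the custom template earlier in the thread uses stage.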


cagdasbas avatar cagdasbas commented on June 8, 2024 3

Thanks everyone! I couldn't try the fix @dberenbaum suggested. I'll write back as soon as I can.


dberenbaum avatar dberenbaum commented on June 8, 2024 3

Yes, perfect timing since @pared and I were just discussing it when this user posted! @pared is starting work on it now and will share the plans with @Suor and @shcheklein soon. Let us know if there's anyone else to include for initial feedback.


shcheklein avatar shcheklein commented on June 8, 2024 3

Update. It has been implemented on the DVC side and we are looking now into this on the Studio side.

For docs, please see the "Combine multiple data sources" section here: https://dvc.org/doc/user-guide/visualizing-plots


ssachkovskaya avatar ssachkovskaya commented on June 8, 2024 2

@dberenbaum @shcheklein @tapadipti Now that top-level plots are supported by Studio, can this issue be closed?


Suor avatar Suor commented on June 8, 2024 1

@jorgeorpinel do we officially support such a scenario in DVC? If yes, maybe we should add this to the docs.


Suor avatar Suor commented on June 8, 2024 1

This seems like a common enough case. Making it easier in dvc also makes sense, i.e. providing a more generic template and some extra dvc plots modify options. What do you think @dberenbaum? Maybe even adding a possibility to use custom keys within plot props to be passed to template and rendered there - this is to enable users writing their own custom templates, which are generic/reusable.


Suor avatar Suor commented on June 8, 2024 1

No obvious way to save this type of plot configuration for future use since config is currently tied to a path.

There is one obvious way: make the notion of a plot independent and add a separate entry in dvc.yaml for plots, referring to data files, templates and props there for each plot. This was briefly discussed when we implemented plots initially, but attaching props to data files was easier to implement, and it was also argued that this approach is more intuitive and closer to how people operate.

No guarantees that the plots config (what if they specify different templates, x-axes, etc.?) or underlying data are compatible.

Same as now. We show an error when trying to plot, both in DVC and in Studio. This is also complicated by the fact that a data file might change over time, i.e. some columns may disappear or be renamed, or props may change, which means that even if props and columns are consistent within a commit they might not be consistent across commits.

Unclear how diffing between revisions would work if there are already multiple series on the plot.

We can use facets, either by revision or by y. Alternatively, we can use different line styles. But here we enter the territory of combinatorial explosion, variability and user preferences.

Any command line syntax solution which avoids saving to dvc.yaml won't show up in Studio. This means we would need to invent our own UI and store things ourselves, while on the command line people would need to use bash scripts or history to replot things.


shcheklein avatar shcheklein commented on June 8, 2024 1

For the record one more user was asking about this feature:

i want to plot loss and val_loss data in same graph on dvc studio. how do i command plots modify?

cc @dberenbaum - after the images, if we have capacity, let's try to think together about whether we should improve this on the DVC side first or do a custom wizard (potentially with an ability to save its state back into the repo) on the Studio side. I think there were some good suggestions on the DVC end and they didn't look too heavy.

Prioritizing this, since plots are P1 for us at the moment.


jorgeorpinel avatar jorgeorpinel commented on June 8, 2024 1

@jorgeorpinel do we officially support such scenario in DVC? If yes maybe we should add this to docs.
all the data, i.e. all CSV columns, are passed to vega, so if you hardcode field names there then you can do anything

Sorry for the very late reply on that, but I also have the impression it's currently possible with a custom template. Can you confirm, @pared? If so, it would definitely be nice to have an advanced example in https://dvc.org/doc/command-reference/plots if you guys want to contribute a draft! Probably not essential though, especially as this discussion is ongoing and there may be a better way in the near future.


ssachkovskaya avatar ssachkovskaya commented on June 8, 2024 1

@cagdasbas we have pushed a fix to Studio, so now you should be able to see plots with multiple metrics using the approach suggested by @dberenbaum (your workaround + facet). Hope this helps while we are working on this feature.

@Suor good suggestion, however I am not sure how we will merge plots from multiple selected commits if they don't have a rev field. Let's discuss it internally.


pared avatar pared commented on June 8, 2024 1

@jorgeorpinel
It seems to be possible, though I think that this won't be an issue after iterative/dvc#5980


Suor avatar Suor commented on June 8, 2024

As far as I understand, you can achieve this with custom templates within DVC. You'll need to write a little Vega JSON, or probably copy the default template and add something to it. Once you have it, assign it to your data file with dvc plots modify --template and commit it to git; both dvc plots show and Studio will then show it like this. You might need to hardcode the field name(s) for the y axes into the template though, so your custom template would be of limited reuse.


mmeendez8 avatar mmeendez8 commented on June 8, 2024

Yes, you are definitely right @Suor. I was just wondering: if this is going to be the main UI for DVC, it might be necessary to find an easier way to achieve this, especially if you intend to move users over from other popular tools such as MLflow that allow it.

I can also add that using custom templates could end up in a lot of boilerplate too... you would have to move the same code over and over between repositories so it might be better to enable this feature here.


Suor avatar Suor commented on June 8, 2024

Plots in Studio are in an early phase now; we basically show whatever DVC shows. The question we haven't resolved yet is how far we want to move away from DVC, and what types of things we should add here as opposed to both here and in DVC. Plots in DVC are also evolving.


Suor avatar Suor commented on June 8, 2024

Thinking of this, some template stored on Studio side - provided by platform or by user or generated via UI - linked to any CSV or JSON or other datafile is a valid use case on its own.


mmeendez8 avatar mmeendez8 commented on June 8, 2024

Plots in Studio are in an early phase now; we basically show whatever DVC shows. The question we haven't resolved yet is how far we want to move away from DVC, and what types of things we should add here as opposed to both here and in DVC. Plots in DVC are also evolving.

I see, I was not aware of this debate.

Thinking of this, some template stored on Studio side - provided by platform or by user or generated via UI - linked to any CSV or JSON or other datafile is a valid use case on its own.

Yes that would be a plausible solution for keeping studio and DVC "synchronized". Maybe it will make things a bit difficult in the future, I am thinking about the difficulty of handling all those automatically generated files for all plausible combinations... Anyway I get your point, it seems a larger discussion is needed here


Suor avatar Suor commented on June 8, 2024

I don't think a user can do this in dvc now even with custom templates since only a single series is expected

As far as I can see, all the data, i.e. all CSV columns, are passed to Vega, so if you hardcode field names there you can do anything. If the data comes from several files, then it's not possible though, since the data file identifying the plot is part of how even dvc.yaml stores it. So option 2 is way easier to implement, I believe.


dberenbaum avatar dberenbaum commented on June 8, 2024

We could do both or figure out what makes more sense for users. Do they want to be able to plot across files or within one file, and which is a better UI?

Different files

  • UI might look like dvc plots diff --no-index model1_roc.tsv rev:model2_roc.tsv.
  • Natural for something like training and validation data that might be stored separately.
  • No obvious way to save this type of plot configuration for future use since config is currently tied to a path.
  • No guarantees that the plots config (what if they specify different templates, x-axes, etc.?) or underlying data are compatible.

Same file

  • UI might look like dvc plots modify -y train -y val (feel free to suggest something different).
  • Natural for something like multiclass roc plots that would be stored in one dataset.
  • Unclear how diffing between revisions would work if there are already multiple series on the plot.

Combined approach

@pared has suggested a syntax like dvc plots show -y file.csv -y rev1:file.csv -y rev2:file.csv.

  • Covers both scenarios.
  • Need to verify how this works (I think the column names are missing unless I'm misunderstanding).
  • Like comparing different files, it's unclear if there's a way to save this type of plot config.
  • Might add complexity for simple scenarios.


pared avatar pared commented on June 8, 2024

Any command line syntax solution which avoids saving to dvc.yaml won't show up in Studio. This means we would need to invent our own UI and store things ourselves, while on the command line people would need to use bash scripts or history to replot things.

Also, that does not seem to make much sense from the DVC perspective. I mean, that's the point of version control: to save things for later use.

The problem here is that, on one hand, we would like DVC commands to provide tight integration with git and revisions, so that we can easily compare some assets (that was the initial driving force behind plots, hence the behaviour of diffing only files with the same name), and on the other hand we would now like to compare different files from different revisions. The latter concept does not fit well with the former.

make a notion of a plot independent and make a separate entry in dvc.yaml

If we want to satisfy both ideas, that seems to be the only way. Maybe we should store just the plot configuration and require the user to provide data for particular revisions:files when they use plots?


dberenbaum avatar dberenbaum commented on June 8, 2024

now we would like to compare different files from different revisions

We may be getting ahead of ourselves here. I haven't yet heard of (nor can I think of) a use case where comparing different files from different revisions is actually needed. Doing one or the other may be sufficient.


make a notion of a plot independent and make a separate entry in dvc.yaml

Having a plots section at the top level of dvc.yaml might happen, but I think the keys are still likely to be file paths for now. If we want to fully decouple plots configuration from file paths altogether, I'm not sure exactly how that should look or whether it's worthwhile. It's probably a separate discussion that goes beyond combining metrics.


DVC could add support for both diffing between files and showing multi-column plots within a file.

Diffing between file paths:

# dvc.yaml
plots:
- train_loss.csv:
    x: epoch
    y: loss
- val_loss.csv:
    x: epoch
    y: loss
  • dvc plots diff --no-index train_loss.csv val_loss.csv plots a diff just like comparing revisions.
  • --no-index is not an intuitive name, so open to other suggestions even though it would break git consistency.
  • Throw an error if configs don't match.
  • Plotting this in Studio doesn't seem much different to me than existing diff plots.
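
The "throw an error if configs don't match" rule could look roughly like the sketch below. The check_compatible function and the choice of compared keys are assumptions for illustration; the two configs mirror the dvc.yaml entries above.

```python
def check_compatible(configs, keys=("x", "y", "template")):
    """Raise ValueError if the plot configs disagree on any of `keys`.

    `configs` maps a file path to its plot properties. Hypothetical helper,
    not DVC's actual validation.
    """
    names = list(configs)
    first = configs[names[0]]
    for name in names[1:]:
        for key in keys:
            if configs[name].get(key) != first.get(key):
                raise ValueError(
                    f"{name} and {names[0]} disagree on {key!r}: "
                    f"{configs[name].get(key)!r} != {first.get(key)!r}"
                )


matching = {
    "train_loss.csv": {"x": "epoch", "y": "loss"},
    "val_loss.csv": {"x": "epoch", "y": "loss"},
}
check_compatible(matching)  # passes silently

mismatched = {
    "train_loss.csv": {"x": "epoch", "y": "loss"},
    "val_loss.csv": {"x": "step", "y": "loss"},
}
try:
    check_compatible(mismatched)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```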

Plotting multiple columns within a file:

# dvc.yaml
plots:
- loss.csv:
    template: multiline
    x: epoch
    y:
    - train
    - val
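
One way a plotting layer might feed such a wide file to a template that expects one series value per row is to reshape it first. The sketch below assumes a loss.csv with epoch, train and val columns; the melt helper and the output field names (stage, loss) are hypothetical and chosen to match the custom template earlier in the thread.

```python
import csv
import io


def melt(rows, x, ys, series_field="stage", value_field="loss"):
    """Turn one wide row per x value into one long row per (x, series) pair."""
    long_rows = []
    for row in rows:
        for y in ys:
            long_rows.append({x: row[x], series_field: y, value_field: row[y]})
    return long_rows


# Inlined stand-in for loss.csv with one column per series.
wide_text = "epoch,train,val\n1,4.7,14.7\n2,3.5,13.5\n"
wide_rows = list(csv.DictReader(io.StringIO(wide_text)))
long_rows = melt(wide_rows, x="epoch", ys=["train", "val"])
for row in long_rows:
    print(row)
```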


pared avatar pared commented on June 8, 2024

So I guess we are discussing versatility vs user experience here. We move targeting data from file_name to column_name. The question is whether there will come a time when someone wants to compare val_loss with train_loss. Then we will be back to discussing a very generic approach, which is now not even dvc plots diff revision:file_path revision2:file_path but dvc plots revision:file_path:column revision2:file_path2:column2.
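
To illustrate what the fully generic revision:file_path:column target would involve, here is a minimal parsing sketch. The syntax is only the one floated in this comment, not an existing DVC feature, and the Target type and parse_target function are hypothetical. It assumes paths contain no colons.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    rev: Optional[str]
    path: str
    column: Optional[str]


def parse_target(text):
    """Parse 'path', 'rev:path' or 'rev:path:column' into a Target."""
    parts = text.split(":")
    if len(parts) == 1:
        return Target(None, parts[0], None)
    if len(parts) == 2:
        return Target(parts[0], parts[1], None)
    if len(parts) == 3:
        return Target(parts[0], parts[1], parts[2])
    raise ValueError(f"too many ':' separators in target {text!r}")


full = parse_target("rev1:train_loss.csv:loss")
rev_only = parse_target("rev2:val_loss.csv")
bare = parse_target("loss.csv")
print(full, rev_only, bare)
```

Even this small parser shows the added surface area: three optional parts per target, which is the versatility vs user experience trade-off mentioned above.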


shcheklein avatar shcheklein commented on June 8, 2024

For the record, we got one more request for this:

https://discord.com/channels/485586884165107732/841856466897469441/892320977323712553

Brief summary, read the whole post for the details:

Hi everyone! I searched a little but couldn't find anything so wanted to ask. I want to create a custom vega template to see multiple lines in a single graph but it seems that studio doesn't allow it. What I want to do is see both training and validation loss on a single graph. I've created a custom template and it both works on vega online editor and dvc plot show shows it properly. But studio appends two key to the template and it messes up the graph.


dberenbaum avatar dberenbaum commented on June 8, 2024

@cagdasbas Thanks for the detailed info! This is a nice workaround for getting training and validation onto the same plot. We hope to make this easier than needing a custom template in the future, but I'm glad dvc plots show is at least working for you.

If you need a quick fix, I think you could adjust your template to add a facet, like:

{
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {
        "values": "<DVC_METRIC_DATA>"
    },
    "title": "<DVC_METRIC_TITLE>",
    "facet": {
        "field": "rev",
        "type": "nominal"
    },
    "spec": {
        "width": 300,
        "height": 300,
        "mark": {
            "type": "line",
            "point": {
                "filled": false,
                "fill": "white"
            }
        },
        "encoding": {
            "x": {
                "field": "<DVC_METRIC_X>",
                "type": "quantitative",
                "title": "<DVC_METRIC_X_LABEL>"
            },
            "y": {
                "field": "<DVC_METRIC_Y>",
                "type": "quantitative",
                "title": "<DVC_METRIC_Y_LABEL>",
                "scale": {
                    "zero": false
                }
            },
            "color": {
                "field": "stage",
                "type": "nominal",
                "legend": {"disable": false},
                "scale": {}
            }
        }
    }
}
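
The same change can be applied to an existing template programmatically. The sketch below moves everything except $schema, data and title under "spec" and adds the facet, which is the transformation shown above; the facet_by helper is hypothetical and the template is shortened for brevity.

```python
import copy
import json


def facet_by(template, field="rev"):
    """Wrap a single-view Vega-Lite template in a facet on `field`."""
    spec = copy.deepcopy(template)  # leave the caller's template untouched
    top_level = {
        key: spec.pop(key) for key in ("$schema", "data", "title") if key in spec
    }
    return {**top_level, "facet": {"field": field, "type": "nominal"}, "spec": spec}


template = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {"values": "<DVC_METRIC_DATA>"},
    "title": "<DVC_METRIC_TITLE>",
    "width": 300,
    "height": 300,
    "mark": {"type": "line"},
    "encoding": {"color": {"field": "stage", "type": "nominal"}},
}
faceted = facet_by(template)
print(json.dumps(faceted, indent=4))
```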


Suor avatar Suor commented on June 8, 2024

@ssachkovskaya probably switching off color rewriting if the field there is not rev should work here. And probably a good idea overall. I.e. we don't mess with a template unless it is what we expect.
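
A rough sketch of that rule: rewrite the color scale only when the template colors by rev, and otherwise return the spec unchanged. The apply_rev_colors function and its rev_colors argument are hypothetical, not Studio's code, and the sketch only looks at a top-level encoding (not one nested under a facet's spec).

```python
import copy


def apply_rev_colors(spec, rev_colors):
    """Set the color scale from `rev_colors` only if the spec colors by rev."""
    color = spec.get("encoding", {}).get("color")
    if not color or color.get("field") != "rev":
        return spec  # not the template shape we expect, so leave it alone
    updated = copy.deepcopy(spec)
    updated["encoding"]["color"]["scale"] = {
        "domain": list(rev_colors),
        "range": list(rev_colors.values()),
    }
    return updated


rev_colors = {"ab8f6b3": "#13adc7"}

by_stage = {"encoding": {"color": {"field": "stage", "scale": {}}}}
by_rev = {"encoding": {"color": {"field": "rev", "scale": {}}}}

untouched = apply_rev_colors(by_stage, rev_colors)
rewritten = apply_rev_colors(by_rev, rev_colors)
print(untouched["encoding"]["color"]["scale"])
print(rewritten["encoding"]["color"]["scale"])
```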


shcheklein avatar shcheklein commented on June 8, 2024

One more request from the user:

I really enjoy using dvc, and for me, there is one thing that might improve its notoriety and its popularity in the community, and I really really want to know if it is already inside or in question as a dev improvement on the stack : You might wonder what is all about ! Take a look at this picture, it is a really common picture in ml, but not in dvc nor dvc studio, I guess. Will DVC plots command accept two columns against formally precision of labels and/or title ? Perhaps it is already possible, but not in the doc, I guess. Please leave me a comment.

https://discordapp.com/channels/485586884165107732/563406153334128681/927839356654346262

@dberenbaum @pared are there plans to implement the proposal?


tapadipti avatar tapadipti commented on June 8, 2024

@dberenbaum @pared could you also include me in the plan/discussion for this. Thanks.


pared avatar pared commented on June 8, 2024

@tapadipti I am currently working on that, what would you like to know?

