iterative / studio-support
❓ DVC Studio Issues, Question, and Discussions
Home Page: https://studio.iterative.ai
I was delighted by the new update that hid commits that do not change metrics compared to the default branch.
Unfortunately, cosmetic commits and so on are still shown! I'd suggest showing commits that change metrics compared to the previous commit, instead of compared to the head of the default branch! This would declutter the UI a LOT.
Another option would be to only show branch heads by default, and expand them on request. Right now only expanded branches have any metrics shown.
E.g., I often create a new feature, train a model to see if it works, and then need to change a bunch of stuff to make CI/CD and unit tests work.
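The suggestion of comparing each commit with the commit before it, rather than with the head of the default branch, could be sketched roughly as follows. This is only an illustration: the `(commit id, metrics)` shape, the function name, and the sample values are hypothetical and say nothing about how Studio stores or filters commits internally.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shape: (commit id, metrics recorded at that commit).
Commit = Tuple[str, Dict[str, float]]


def commits_with_metric_changes(history: List[Commit]) -> List[Commit]:
    """Keep only commits whose metrics differ from the previous commit.

    `history` is ordered oldest to newest; the first commit is always kept.
    """
    kept: List[Commit] = []
    previous_metrics: Optional[Dict[str, float]] = None
    for commit_id, metrics in history:
        if previous_metrics is None or metrics != previous_metrics:
            kept.append((commit_id, metrics))
        previous_metrics = metrics
    return kept


# Made-up history: two commits change metrics, two are cosmetic.
history = [
    ("a1", {"acc": 0.80}),
    ("b2", {"acc": 0.80}),  # e.g. a README fix
    ("c3", {"acc": 0.85}),
    ("d4", {"acc": 0.85}),  # e.g. a CI fix
]
visible = commits_with_metric_changes(history)
visible_ids = [commit_id for commit_id, _ in visible]
```

With this rule the cosmetic commits `b2` and `d4` drop out of the list even though their metrics differ from the default branch head.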
I'd like to use CML / Studio on a vanilla Git repo that uses Gerrit for code review
After a force import of this repo, https://github.com/iterative/yolov5, I can see an error saying that authentication to Azure Blob Storage wasn't successful.
But Studio doesn't seem to recognise that credentials are necessary (there is no hint in the settings), and when I add the credentials manually, the same error is shown again after Force Import.
Checked this in dev environment too.
When working on some deep learning problems, it's common to have experiments running for hours (or even days). Some alternatives to Studio (e.g. wandb or mlflow) have some sort of "view" for individual experiments that is updated as the training goes on.
Within the "iterative ecosystem" there is the possibility of using dvclive with DVC in order to "see a plot for metrics logged during the model training".
I think it would be a good feature to have something similar (or even that same .html) integrated inside Studio.
The user can't open the view settings page after setting the custom set of chosen columns.
The issue was created from the Papercups support channel, intended to be closed after the fix is shipped.
Steps to reproduce:
Error 400: Request failed message shows up in red in the top right.
CML already supports comments that link to public tensorboards on tensorboard.dev, and they have an open issue for self-hosted tensorboards: iterative/cml#607.
It would be nice to provide:
I just made a commit to my GitHub repository and tried to run a new experiment in Studio.
I have set up a GitHub Action which will run a CML workflow on a self-hosted server. By running a new experiment in Studio, I expect that it will make a commit to my GitHub repository and trigger the Action to run the training process. But then I ran into a problem.
The screenshot below describes my operations.
I guess the problem occurs when committing to GitHub, because my GitHub repository didn't receive any update.
Maybe I should set up a token somewhere to provide access to my private repository?
Any suggestions are welcome!
How is access set up for organizations with GitHub? Adding this as we had a question from a prospect: "If we have SAML/SSO setup with our GitHub Enterprise Organization, do we also need SAML/SSO setup for DVC Studio, or is access based on repo access within GitHub?"
Related #20.
After the initial problem, the user has experienced the dangling branch with an experiment that is not even present in the repository. Force import is not helping the situation. It might be related to the GitLab runner logic handling after the experiment run, but it needs to be researched more.
The bug occurs when DVC Studio tries to show a branch that has no unique commits that differ from the branch it was created from.
Let me give you an example:
So it looks like "Some branch" is nested in Master and all commits of "Some branch" belong to Master. And DVC Studio shows it that way.
But this isn't correct. "Some branch" is a separate branch as far as Git is concerned, but DVC Studio treats it as a commit of Master. Therefore the search and filter features of DVC Studio don't work correctly, because they can't operate on a branch that is recognized as a commit.
Hi,
I can't currently see any of my repos. If I click on Configure Git integration settings (see below), I'm directed to this page, but when I click on the Configure button, the link returns a 404 response.
Any idea what the issue is / why the link does not work? (I have granted Iterative Studio access to my GitHub account.)
Thank you.
I have a very large number of metrics that I filter down to 12 using the tracking scope (those that contain the word weighted):
However, sometimes I see columns in the view which are not selected in the tracking scope.
I remove them in the columns menu, but after every PR they come back and I need to remove them again.
If there are a lot of branches which have nothing related to DVC, the View becomes very polluted. If I'm not mistaken, you can't do anything useful or get any useful information from these branches (though maybe having .dvc in the default repo branch allows tracking metrics in files in other branches; I don't know about this).
To get an example, import this repo: https://github.com/iterative/yolov5
Or see the View https://studio.iterative.ai/user/aguschin/views/yolov5-ypd0c4rbtj
It has only one branch which has .dvc and dvc.yaml and it is shown at the bottom.
My guess is that it is related to https://github.com/iterative/viewer/issues/1372
The situation in which this happened: there was a large repo to which DVC was applied. To try out DVC, one specific branch was created.
Also, if there are situations in which these branches could be useful, we can hide this option in Settings.
When a data file has been imported/updated using the --rev option of dvc import / dvc update, a rev subfield is added to the .dvc file (see Example: Importing and updating fixed revisions).
I think that, in cases where it exists, showing or allowing to show the value of rev could be more useful than size, which is the one currently displayed in the "column view".
I have a private view that tracks a project with > 200 tracked metrics/params. For obvious reasons, the table does not show all of these metrics/data/params in the table, but there appears to be a bug with updating the "Tracking Scope" settings in this situation. I have tried numerous times to adjust the settings to <10 values, or even clear out all tracking scope, but every time the view updates/re-imports after clicking save, it still comes back trying to display all of the possible values (> 200) again and shows the warning to reduce scope.
When Parameters have numerical values, they are still recognized as strings.
This causes the sorting result not to match what I would expect:
The capture comes from applying sorting to the max_features param in the Studio Demo Project:
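The effect described above can be reproduced outside Studio with a few lines of Python. This is a minimal sketch: the sample `max_features` values and the `numeric_sort_key` helper are made up for illustration, and it does not claim to show how Studio sorts columns internally.

```python
def numeric_sort_key(value: str):
    """Order numeric-looking strings by value; keep other strings after them."""
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value)


# Hypothetical parameter values, stored as strings.
max_features = ["200", "1000", "50", "1500"]

as_strings = sorted(max_features)                        # lexicographic order
as_numbers = sorted(max_features, key=numeric_sort_key)  # numeric order

print(as_strings)  # ['1000', '1500', '200', '50']
print(as_numbers)  # ['50', '200', '1000', '1500']
```

Sorting the raw strings compares them character by character, which is why "1000" ends up before "200".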
After a recent release, a React error is being thrown on certain views:
https://reactjs.org/docs/error-decoder.html/?invariant=31&args%5B%5D=object%20with%20keys%20%7Bdir%2C%20target%2C%20feature_names%7D - indicating we're incorrectly processing some data.
Currently we show all commits in Studio regardless of the changes in anything we track (parameters, metrics, outputs), but it is a common case when there are a lot of commits which aren't related to any changes in ML stuff. To name a few: changes in CI workflow files, changing documentation, editing configuration files which aren't related to ML model (.gitignore for example), etc, etc.
It may be the case that having a cleaner commit history, showing only the commits which introduce changes, will be more convenient for the user. As we already have "Delta mode" this is not very important, but I suppose there could be situations in which such an option would be convenient.
related to #15
If you have no .dvc in the master branch, Studio won't parse it, for example:
Suggestion: add an option to select some branch to be "default", probably in the repo settings in Studio.
Motivation: This could be inconvenient if you already have a repo without DVC and are now trying out DVC in some other branch, to later create a PR. Of course, as a workaround, you could fork the repo and add DVC in the master branch, but in this case you lose all of the CI/CD setup (runners, variables, etc.), and if you want to use DVC in CI, you will need to deal with setting up CI/CD in your own repo, which could be a waste of time.
For me it's unclear how frequent this scenario is, but it definitely exists. Feel free to add +1 and leave a comment if you need this functionality, or suggest how to deal with this more gracefully.
Currently, when I create multiple views for the same repository, they all get the same name.
The use case where different names might be beneficial is when I need to share my experiments with different stakeholders across the board:
I am using DVC Studio to track metrics of my project. The outputs of the model training stage are
When I run the training with the full dataset, DVC Studio breaks and does not display the confusion matrix, and some of the metrics are missing too (screenshot attached). The UI says that one big file failed to parse. Everything shows up in GitHub and CML though.
When I limit the size of the dataset, everything works as expected.
I suspect that the issue is that the confusion matrix input file and the test set predictions are too big for DVC Studio (they are about 10 - 20 MB, around 100-200k lines at most), so Studio does not load them.
Feature request:
Keep up the great work, I love your products! :)
BR,
András K
If I change mandatory columns in my project, the change has no effect, and I get the same UI as before the changes. On the other hand, if I delete the view and create a new one with different mandatory fields, it works.
The user has reported that it is not possible to save changes in the Tracking scope. The attempt is followed by an error notification.
A user reported a problem with plots not being able to load a custom template of theirs. Have taken a preliminary look, but couldn't figure out a cause immediately. Will continue investigating tomorrow.
Meta:
Having the 'Failed experiment' section on the board all the time is kind of annoying, and over time the number of failed experiments keeps growing. It would be nice to have a 'delete' button to remove them.
I currently manage different models (for example small, medium and large). The way that I do this is by having a branch for each model type. Since these models serve different purposes (accuracy/runtime tradeoff), I need to maintain top-performing models for each of these types. It would be great to have a mode where DVC Studio only displays the HEAD of each branch to get a quick comparison across different branches. This differs from the current view, which displays the last 3 commits from each branch. Ideally this would also be a toggleable viewing mode.
I'm having an issue with plots: when I click on "Show plots" I get the error "An error has occurred: Field 'Metric' does not exist in provided data."
Is it something I should fix on my end or is it a known issue?
When I try to open a GitLab view, the following error is shown:
Something went wrong
c[p] is undefined
The view should be displayed.
Error instead of a view.
Import only personal projects.
I'm not sure how this would be implemented or if it's possible, but it would be great to have git notes show up in the UI. Ideally they would be editable as well. Documentation of git notes can be found here. This would make tracking experiments through Studio a more holistic experience. Git notes have some advantages over tags (limited in length and formatting) and commit messages (immutable).
If you have a number of filters set up, sometimes it is nice to be able to turn off a certain filter but at the moment you need to remove it so you lose it completely. It would be nice to be able to disable the filter without deleting it. Thanks!
I have a plots csv file which creates a confusion matrix. It plots correctly when I use dvc plots show ./confmat.csv locally. My dvc yaml looks like:
plots:
- ./confmat.csv:
    cache: false
    template: confusion
    x: actual
    y: predicted
However, it does not plot correctly in dvc studio, all I get is an empty plot:
Thanks.
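For readers trying to reproduce this report, the sketch below writes a CSV in the shape the config above implies: one row per prediction, with `actual` and `predicted` columns matching the `x` and `y` fields. The class labels and the use of an in-memory buffer are assumptions for illustration; the reporter's real `confmat.csv` is not shown in the issue.

```python
import csv
import io

# Hypothetical (actual, predicted) label pairs, one per test sample.
predictions = [
    ("cat", "cat"),
    ("cat", "dog"),
    ("dog", "dog"),
    ("dog", "dog"),
]

# Write to an in-memory buffer; a real run would open ./confmat.csv instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["actual", "predicted"])  # header names match x: and y: above
writer.writerows(predictions)

confmat_csv = buffer.getvalue()
print(confmat_csv)
```

A small file like this is a convenient way to check whether an empty plot comes from the data format or from the size of the file.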
The issue happens on the view page and results in the full page error.
Studio currently seems to support only AWS S3. Support for other S3-compatible APIs (like MinIO) would be awesome.
The current Trends view is not very friendly to use when multiple metrics are selected and used to generate the Trends.
There is not much room to compare and visualize, and I find myself constantly scrolling.
I think that it would be nice to expand Trends in order to have a UX similar to the current state of the Show Plots view.
Background use case:
I have a dataset that gets updated every ~day. The dataset has ~20 classes and I generate a metric for each class (counting the current number of instances in the dataset). I find Trends really useful to visualize the evolution of the dataset, but the current view has some limitations (described above).
If a user runs experiments through the dvc exp run workflow, they may push experiments to GitHub or another git server (via dvc exp push). It would be great to see the results of these experiments in Studio.
This may be useful in several scenarios:
Studio already supports Gitlab.com, but it would be very helpful if it supported publicly accessible but self-hosted GitLab instances as well!
Hello, we have a DVC repository with 2 remotes and intentionally no default remote. We need that because we have a data residency constraint by region.
Would it be possible to specify multiple data remotes per project?
It would be nice to have support for multiline text field for experiment configuration.
For example if I need to pass several elements to config in a list, I must write them in one tiny line which is not very convenient.
Here's an example:
callbacks:
  - callback: 'TensorBoard'
    args:
      log_dir: '../logs'
  - callback: 'ModelCheckpoint'
    args:
      filepath: 'data/checkpoints/ckpt_{epoch:03d}'
      save_best_only: True
      monitor: 'val_output_1_map'
It would be nice if the link to the associated repo opened in a new tab, so that you don't lose your DVC Studio page.
It would be interesting to get support for other Git repo providers such as Azure.
I really look forward to trying Studio!
In the Add a view page I can't find some of the repositories that I have on GitLab. I tried to search for the missing repositories using the search bar at the top of the page, but I didn't find any of them.
I'm the owner of the missing GitLab repositories, so I don't think it is a permission issue.
In the Add a view page, all the repositories from GitLab should be displayed.
Only a few repositories from GitLab are shown in the Add a view page.
In my GitLab account I have access to a lot of projects. Maybe the issue is caused by a limit in the GitLab API: I saw in the GitLab API documentation that the API uses a pagination system (https://docs.gitlab.com/ee/api/#pagination).
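If pagination is indeed the cause, the symptom would look like the sketch below: a client that reads only the first page sees a subset of projects, while one that follows the next-page indicator sees all of them. The example assumes GitLab's offset pagination exposes the next page number in an `x-next-page` response header that is empty on the last page. The fetcher, the project names, and the page contents are fake stand-ins, not Studio's actual code or real API responses.

```python
from typing import Callable, Dict, List, Tuple

# A page fetcher takes a page number and returns (items, response headers).
PageFetcher = Callable[[int], Tuple[List[str], Dict[str, str]]]


def fetch_all_projects(fetch_page: PageFetcher) -> List[str]:
    """Follow the next-page header until it is empty, collecting every item."""
    projects: List[str] = []
    page = 1
    while True:
        items, headers = fetch_page(page)
        projects.extend(items)
        next_page = headers.get("x-next-page", "")
        if not next_page:
            break
        page = int(next_page)
    return projects


# Fake paginated API used only for this illustration.
FAKE_PAGES = {
    1: ["group/a", "group/b"],
    2: ["group/c", "group/d"],
    3: ["group/e"],
}


def fake_fetch_page(page: int) -> Tuple[List[str], Dict[str, str]]:
    next_page = str(page + 1) if (page + 1) in FAKE_PAGES else ""
    return FAKE_PAGES[page], {"x-next-page": next_page}


first_page_only, _ = fake_fetch_page(1)
all_projects = fetch_all_projects(fake_fetch_page)
```

Here `first_page_only` holds two projects and `all_projects` holds five, which matches the "only a few repositories are shown" behaviour described above.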
In #24 we discovered that GitHub Enterprise Cloud has an IP allow-list feature. Administrators of organizations can configure an allow list of IP addresses that can access resources in their organization.
We currently do not publish such a list of IP addresses. Publishing one and supporting it requires a commitment from Iterative + has operational challenges and costs.
This issue exists to gauge interest in publishing such a list. Please leave a 👍 or ❤️ reaction in case you would like this.
Hi,
When using the comparison mode between two commits with a long list of metrics, I've noticed that scrolling makes the column headers disappear. I have to keep scrolling up to remind myself which columns are for which commit. It would be nice to have them float over the data, like in the main view showing commit histories.
When a new commit is added, a popup shows up at the top of the page asking to refresh the view. This popup causes the bottom scrollbar to be pushed off the bottom of the screen. This means that you first have to scroll down in order to scroll left or right on the metrics panel. This is in contrast to the "Review tracking scope" popup at the top that does not push the bar off the bottom. I think the desired behaviour is to have the view resize to accommodate the popup rather than requiring scrolling.
Given that:
DVC Studio uses your regular CI/CD setup (e.g. GitHub Actions) to run the experiments.
I think it would be nice to display the status of the CI/CD workflows in order to be able to better monitor the experiments directly from Studio.