
Comments (5)

MateoLostanlen commented on September 1, 2024

@Akilditu has started a Dataset repo; I guess some of this stuff should move there.

from pyro-vision.

TekayaNidham commented on September 1, 2024

Good idea, that would work


frgfm commented on September 1, 2024

Hey @MateoLostanlen @blenzi @x0s @Akilditu,

While refactoring the datasets module, I ended up wondering whether some things should be removed because they don't align with the repo's purpose. Generally speaking, the purpose of the datasets module in this repo is to make each dataset's source accessible (using processed annotations if need be) and to offer a torchvision dataset for it. I believe some wildfire-related features are not aligned with this (considering that, for now, the wildfire source is not even publicly accessible). Some details below:

openfire

  • the URL needs to be moved to a release attachment

video_utils

  • FireLabeler: for me, this helps to structure the dataset before exposing it publicly, but not after. In my opinion, this should be removed from the library.
  • FrameExtractor: this is really helpful, but we could make it much better. First, do we want something that does this statically (creating image files from video files) or dynamically (used by the dataloader)? Second, whatever the answer to the first question is, we need to make it compatible with other video datasets (perhaps there are interesting things we could use in https://github.com/pytorch/vision/blob/master/torchvision/datasets/video_utils.py). I would argue that, considering its name, the parameters of the constructor should be sampling parameters (frames / sec and max_frames, for instance), and its call argument should be a video file (or an already-read video). What do you think?
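To make the FrameExtractor proposal concrete, here is a minimal sketch of the suggested interface, assuming the video has already been read into a sequence of frames. The class name `FrameSampler`, the `native_fps` argument and the `indices` helper are hypothetical and not part of the current library; only `max_frames` and the frames-per-second idea come from the comment above.

```python
from typing import List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


class FrameSampler:
    """Hypothetical sampler: sampling parameters go in the constructor,
    the video itself is only passed at call time."""

    def __init__(self, frames_per_sec: float, max_frames: Optional[int] = None) -> None:
        if frames_per_sec <= 0:
            raise ValueError("frames_per_sec must be positive")
        self.frames_per_sec = frames_per_sec
        self.max_frames = max_frames

    def indices(self, num_frames: int, native_fps: float) -> List[int]:
        """Indices of the frames to keep for a video of `num_frames` frames."""
        # Distance (in native frames) between two kept frames;
        # clamped to 1 so we never sample faster than the video itself.
        step = max(native_fps / self.frames_per_sec, 1.0)
        picked: List[int] = []
        position = 0.0
        while int(position) < num_frames:
            # Stop early once the optional frame budget is reached
            if self.max_frames is not None and len(picked) >= self.max_frames:
                break
            picked.append(int(position))
            position += step
        return picked

    def __call__(self, video: Sequence[Frame], native_fps: float) -> List[Frame]:
        """Sample frames from an already-read video (any indexable sequence)."""
        return [video[i] for i in self.indices(len(video), native_fps)]
```

With this shape, the same object could serve both the static case (iterate over the result and write image files) and the dynamic case (call it from a dataset's `__getitem__`), since nothing in it depends on how the video was decoded.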

utils

  • with some regex, we could refactor or remove the name + extension resolution
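As an illustration of the regex idea, here is a minimal sketch that splits a file name into its name and extension. The function `split_name_ext` is hypothetical; the behaviour of the existing utility may differ, so this only shows what a regex-based version could look like.

```python
import re
from typing import Optional, Tuple

# Lazy name, followed by an optional final ".ext" that contains
# no dot or path separator.
_NAME_EXT = re.compile(r"(?P<name>.+?)(?P<ext>\.[^./\\]+)?")


def split_name_ext(filename: str) -> Tuple[str, Optional[str]]:
    """Return (name, extension); extension is None when there is none."""
    match = _NAME_EXT.fullmatch(filename)
    if match is None:
        raise ValueError(f"cannot parse file name: {filename!r}")
    return match.group("name"), match.group("ext")
```

Only the last extension is split off, so a name like `archive.tar.gz` keeps `.tar` in the name part.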

wildfire

In this module, I believe we need to make some decisions now so that users won't have to make them later. First, decide on criteria, based on metadata, to discard invalid samples (frames); all the remaining frames make up the dataset (imbalanced perhaps, but still). Next, we need to select the sampling (none, origin_proportion, positive_ratio), and finally the train/val/test split (this last part can be done in the same fashion as with sklearn). In the short term:

  • WildFireDataset: the class gives access to a non-processed entity. First, we need to decide (or split) whether this is an image dataset or a video dataset. Assuming we go with images, is it using image files or video files (sampled into frames)? Additionally, we give access to "target_names", which should be used to decide whether we keep or reject samples before making the dataset public. We could set a confidence threshold and discard video samples that do not meet it.
  • WildFireSplitter: rereading the code, this should be a function. Additionally, I'm starting to think that sklearn utilities could do the same job considering the nature of the split.
  • computeSubset: this should be refactored and integrated into the Dataset object, to be used with the sampling argument in the constructor
  • ExhaustSplitStrategy: can someone remind me of its purpose?
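To illustrate the filtering and splitting steps discussed above, here is a minimal standard-library sketch. The `"confidence"` metadata key and both function names are assumptions made for the example, not the current API; in practice, `sklearn.model_selection.train_test_split` could replace the split function.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple


def filter_by_confidence(
    samples: Sequence[Dict[str, Any]], threshold: float
) -> List[Dict[str, Any]]:
    """Keep only samples whose (hypothetical) "confidence" metadata meets the threshold."""
    return [sample for sample in samples if sample["confidence"] >= threshold]


def split_indices(
    num_samples: int, val_ratio: float, test_ratio: float, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle sample indices and split them into train/val/test subsets."""
    if val_ratio < 0 or test_ratio < 0 or val_ratio + test_ratio >= 1:
        raise ValueError("ratios must be non-negative and sum to less than 1")
    indices = list(range(num_samples))
    # A dedicated, seeded generator keeps the split reproducible
    random.Random(seed).shuffle(indices)
    num_val = int(num_samples * val_ratio)
    num_test = int(num_samples * test_ratio)
    val = indices[:num_val]
    test = indices[num_val:num_val + num_test]
    train = indices[num_val + num_test:]
    return train, val, test
```

Written as plain functions, neither step needs a dedicated splitter class, which is in line with the remark on WildFireSplitter.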

I do believe it would bring much more value to upload a clean subset of the dataset, similar to the one from @MateoLostanlen, so that we have a stable, user-friendly and shareable dataset. What do you think?


frgfm commented on September 1, 2024

Any feedback, @MateoLostanlen @blenzi @fe51 @Akilditu? :)


frgfm commented on September 1, 2024

Closed by #136 & #138

