VisualNexus

VisualNexus is an open-source training pipeline designed to facilitate the segmentation and labeling of visual datasets using a single model. It allows you to generate richly detailed labeled datasets for downstream fine-tuning of models. With VisualNexus, you can streamline the process of data preparation and enhance the efficiency of training computer vision models.

Getting Started

To get started with VisualNexus, please follow the introductory instructions below; more detailed, per-class instructions are provided further down.

  1. Clone the VisualNexus repository:
git clone https://github.com/kyegomez/VisualNexus.git
cd VisualNexus
  2. Install the required dependencies:
pip install -r requirements.txt
  3. Change into the models directory:
cd models
  4. Enter your dataset in either sag_img.py or sag_video.py, then run the script:
python3 sag_img.py
  5. The labeled dataset will be stored in the specified output folder, ready for downstream fine-tuning.

Key Features

  • Seamless segmentation and labeling of visual datasets.
  • Efficient data preprocessing and augmentation techniques.
  • Support for multiple input data formats.
  • Richly detailed labeled datasets for downstream model training.
  • Easy integration with existing computer vision workflows.
  • A share-with-friends feature to spread awareness across social media platforms.

Architecture

VisualNexus employs a simple yet effective architecture to perform segmentation and labeling tasks. The pipeline consists of the following steps:

  1. Data Input: VisualNexus accepts input visual datasets in a variety of formats, including images, videos, and other compatible data formats.

  2. Segmentation and Labeling: The dataset is processed using a pre-trained segmentation model, which accurately identifies and segments objects or regions of interest within the visuals. This step produces richly detailed labels for each data instance.

  3. Dataset Storage: The labeled dataset is stored in a structured format, ensuring the preservation of the original data along with the corresponding segmentation and labeling information. This format facilitates easy integration with downstream model training processes (a hypothetical record layout is sketched after this list).

  4. Downstream Fine-tuning: The generated labeled dataset can be seamlessly used for fine-tuning other computer vision models. The labeled data provides valuable annotations, enabling the models to learn from high-quality labeled examples and improve their performance.
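
This README does not pin down the exact storage schema, so the following is only a rough sketch of what a stored record for step 3 might look like, assuming a JSON-lines layout. Every field name and path below is an illustrative assumption, not the actual output of sag_img.py or sag_video.py.

import json
from pathlib import Path

# Hypothetical record layout (assumption): one JSON line per source file,
# pairing the preserved original data with its segmentation masks and labels.
# The real schema is whatever sag_img.py / sag_video.py write and may differ.
record = {
    "source_path": "data/images/0001.png",              # original input, preserved
    "masks": ["masks/0001_0.png", "masks/0001_1.png"],   # one mask file per segmented region
    "labels": ["person", "bicycle"],                     # one label per mask
}

out_file = Path("output/labeled_dataset.jsonl")
out_file.parent.mkdir(parents=True, exist_ok=True)
with out_file.open("a") as f:
    f.write(json.dumps(record) + "\n")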

Detailed Instructions:

For Images Only

If you are looking to use VisualNexus for images only, the SAG_IMG class is suitable for this purpose. Follow these steps (a consolidated example follows the list):

  1. Import the SAG_IMG class from visualnexus.py:
from visualnexus import SAG_IMG
  2. Create an instance of SAG_IMG with the necessary parameters. These parameters include the paths to your images, model parameters, and optional settings. Below is an example:
image_file_paths = load_hf_dataset(dataset_name="your_dataset") # Replace "your_dataset" with your dataset's name
img_seg = SAG_IMG(image_file_paths)
  3. Call the segment method on the SAG_IMG instance to perform the segmentation:
img_seg.segment()
  4. If you want to save the resulting dataset, call the create_dataset method:
img_seg.create_dataset()
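
Putting these steps together, here is a minimal end-to-end sketch. This README does not show where load_hf_dataset comes from, so the example below passes a plain list of local image paths instead; whether SAG_IMG accepts such a list directly is an assumption.

from visualnexus import SAG_IMG

# Assumption: a plain list of local image paths can stand in for the
# load_hf_dataset(...) call shown above, whose import path is not given here.
image_file_paths = ["data/images/0001.png", "data/images/0002.png"]

img_seg = SAG_IMG(image_file_paths)  # model parameters and optional settings go here
img_seg.segment()                    # run segmentation and labeling over the images
img_seg.create_dataset()             # save the labeled dataset for downstream fine-tuning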

For Videos Only

If you are dealing with videos, the SAG_VID class can be used. Follow these steps (a consolidated example follows the list):

  1. Import the SAG_VID class from visualnexus.py:
from visualnexus import SAG_VID
  2. Create an instance of SAG_VID with the necessary parameters. These parameters include the paths to your videos, model parameters, and optional settings. Below is an example:
video_file_paths = load_hf_dataset(dataset_name="your_dataset") # Replace "your_dataset" with your dataset's name
vid_seg = SAG_VID(video_file_paths)
  3. Call the segment method on the SAG_VID instance to perform the segmentation:
vid_seg.segment()
  4. If you want to save the resulting dataset, call the create_dataset method:
vid_seg.create_dataset()
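
As with images, the video workflow condenses to a few lines. This sketch assumes SAG_VID accepts a plain list of local video paths in place of the load_hf_dataset output, mirroring the image example above.

from visualnexus import SAG_VID

# Assumption: a plain list of local video paths stands in for load_hf_dataset(...).
video_file_paths = ["data/videos/clip_0001.mp4", "data/videos/clip_0002.mp4"]

vid_seg = SAG_VID(video_file_paths)  # model parameters and optional settings go here
vid_seg.segment()                    # segment every video in the list
vid_seg.create_dataset()             # save the labeled video dataset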

For Both Images and Videos

If your dataset contains both images and videos, the SAG_MEDIA class can be used. Follow these steps (a consolidated example follows the list):

  1. Import the SAG_MEDIA class from visualnexus.py:
from visualnexus import SAG_MEDIA
  2. Create an instance of SAG_MEDIA with the necessary parameters. These parameters include the paths to your images and videos, model parameters, and optional settings. Below is an example:
image_file_paths = load_hf_dataset(dataset_name="your_dataset") # Replace "your_dataset" with your dataset's name
video_file_paths = load_hf_dataset(dataset_name="your_dataset") # Replace "your_dataset" with your dataset's name
media_seg = SAG_MEDIA(image_file_paths, video_file_paths)
  3. Call the segment method on the SAG_MEDIA instance to perform the segmentation:
media_seg.segment()
  4. If you want to save the resulting dataset, call the create_dataset method:
media_seg.create_dataset()
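
For mixed datasets, the same pattern applies with both path lists passed to SAG_MEDIA. As above, passing plain local path lists rather than the load_hf_dataset output is an assumption.

from visualnexus import SAG_MEDIA

# Assumption: plain local path lists work in place of load_hf_dataset(...).
image_file_paths = ["data/images/0001.png"]
video_file_paths = ["data/videos/clip_0001.mp4"]

media_seg = SAG_MEDIA(image_file_paths, video_file_paths)
media_seg.segment()         # segment both the images and the videos
media_seg.create_dataset()  # save the combined labeled dataset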

Now your dataset is ready for downstream tasks like model fine-tuning. Please note that you will need to replace "your_dataset" with the actual name of your dataset when loading the images and videos.

For more details, please refer to the documentation in the visualnexus.py script. The script is designed to be flexible and customizable, so feel free to modify it according to your needs.

Share with Friends

Help us spread awareness about VisualNexus by sharing it on various social media platforms:

Let's share VisualNexus and help the community benefit from this open-source training pipeline for visual dataset segmentation and labeling!

Contributing

Contributions are welcome! If you have any ideas, suggestions, or bug reports, please feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License.

Acknowledgements

We would like to express our gratitude to the open-source community for their invaluable contributions and inspiration.

Contact

Email kye at [email protected]

2 Pipelines

2 potential pipelines, what do you think? We should make one for robotic datasets

  • Infinigen -> Segment Anything Video -> very rich and detailed dataset.

  • Segment Anything for image and/or video -> iterate over the dataset and segment -> very rich, detailed, and structured dataset for pretraining?

  • Video pipeline: video instruction dataset -> Segment Anything Video -> Orca-like "explain your reasoning" instruction-tuning creation -> super-SOTA visual pretraining dataset for vision models.

  • From a Discord chat in LAION with Christoph:

*"This looks wonderful
reclaimer — Today at 7:33 AM
We could make like an multi modal dataset with images or video dataset like Orca’s where the model explains its multi modal reasoning 
Christoph — Today at 10:13 AM
Yes, exactly. Or we take lavar captions and Feed them in GPT 4 to reason based on the Text about the Image contents, the causes, consequeces.... 😉
reclaimer — Today at 11:41 AM
We could also sprinkle in video segment anything to show the model the segmentations with detailed reasoning explanations"

Roadmap

  • Create documentation for all classes

  • Create examples for Robotic Vision, Medical Imaging, Autonomous Driving, and Satellite Imagery

  • Update examples

visualnexus's Issues

Request to add to FastSAM's README.md

Hey @kyegomez,

I'm the co-author of FastSAM. Thank you very much for your promising pipeline and for choosing FastSAM to help with it!

Would you like to link to FastSAM's README.md to show everyone this promising application?

For the potential pipelines, I support the second one

2 Pipelines
2 potential pipelines, what do you think? We should make one for robotic datasets

1. Infinigen -> Segment Anything Video -> Very Rich and Detailed Dataset.

2. Segment Anything for Image and or Video -> Iterate over Dataset and segment-> Very Rich and Detailed Dataset structured dataset for pretraining?

I support option 2. Segment Anything (SAM) from Facebook has shown its power in image tasks, and fine-tuning and acceleration of SAM are the two main directions for its future. Creating more and more good-quality datasets is therefore valuable work, built on fine-tuning SAM combined with hand labeling. Future labeling pipelines will be based on a cycle of the fine-tune-SAM workflow, with hand labeling shifting toward fixing the edges of the labels created by SAM.
