datasettag's Introduction

Dataset Tag

Dataset Tag is a free and open-source tool designed to make the process of image captioning for machine learning datasets a breeze. No longer will you ever dread the thought of captioning a dataset for training a LoRA or a DreamBooth model! The software makes the process of tagging the various elements in an image semi-automatic, ensuring accurate and consistent captions, crucial for the quality of the trained artificial neural network.

Features

  • Predefined Categories: Includes various categories such as:
    • Type: photo, illustration, drawing, portrait, render, anime, etc
    • Subject: man, woman, mountain, trees, forest, fantasy scene, cityscape, etc.
    • Shot: full body shot, cowboy shot, medium shot, medium close-up shot, close-up shot, extreme close-up shot, etc
    • Perspective: from above, from below, from front, from behind, from side, three-quarters view, rear three-quarters view, overhead, forced perspective, upside down, etc
    • Pose: laying, sitting, standing, leaning, walking, running, jumping, posing, etc
    • Location: on couch, on chair, in front of mirror, at desk, on street, etc
    • Action: eating, reading, resting, playing, etc
    • Gaze: looking at viewer, looking up, looking down, looking sideways, three-quarter gaze, rear three-quarter gaze, looking sideways and upwards, etc
    • Mouth: open mouth, closed mouth, slightly open mouth, etc
    • Mouth Action: smirk, slight smile, smile, laughing, grinning, etc
    • Hair: long hair, short red hair, curly blond hair, etc
    • Limbs: bent knee, crossed legs, arms raised above head, arms extended sideways, left palm on forehead, right arm on belly, holding books, etc
    • Subject Description: white hat, blue shirt, silver necklace, sunglasses, pink shoes, silver bracelet, green jacket, etc
    • Scenery: indoors, outdoors, etc
    • Scene Description: flowers wallpaper, chair, table, lamp, beach, sand, water, shore, etc
    • Lighting: sunset, strong shadows, warm orange light, night, etc
    • Miscellaneous: any tag that does not fit in any of the above categories
  • Tags in any category can be removed or updated, and new tags can be added.
  • Thumbnail Previews: Browse through your dataset images with thumbnail previews, for easy access.
  • Interactive Tagging: Click on an image and select from pre-existing tags across categories, in order to automatically form a standardized caption.
  • Consistent Formatting: Captions are always arranged in the same category order, for consistency.
  • Memory Feature: Remembers both the tags in each category and the tags used for each image, allowing for easy edits later.
  • Replicability: It offers the option of copying the trigger words and/or the selected tags using the system's cache memory, and pasting them onto other selected images (easy copy/paste of tags among images).
  • Draggable Panel Columns: Each column has a grid splitter, which can be used to resize them in any ratio you desire.
  • Theme: Light and Dark theme support.
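
As a rough illustration of the Consistent Formatting feature, the sketch below assembles a caption in a fixed category order, regardless of the order in which the tags were selected. It is written in Python for brevity and is not the application's actual code (Dataset Tag itself is a .NET application). The `build_caption` function, the ", " separator, and placing the trigger word first are all assumptions made for the example.

```python
# Hypothetical sketch: not taken from the Dataset Tag source code.
# Category names follow the list above; the ", " separator is an assumption.
CATEGORY_ORDER = [
    "Type", "Subject", "Shot", "Perspective", "Pose", "Location", "Action",
    "Gaze", "Mouth", "Mouth Action", "Hair", "Limbs", "Subject Description",
    "Scenery", "Scene Description", "Lighting", "Miscellaneous",
]


def build_caption(selected, trigger_word=""):
    """Join the trigger word (if any) and the selected tags in category order.

    `selected` maps a category name to the list of tags chosen for it.
    """
    trigger = trigger_word.strip()
    # An empty trigger word is skipped, so the caption never starts with ", ".
    parts = [trigger] if trigger else []
    for category in CATEGORY_ORDER:
        parts.extend(selected.get(category, []))
    return ", ".join(parts)


if __name__ == "__main__":
    # Tags are given out of order on purpose; the output order is fixed.
    tags = {
        "Lighting": ["sunset"],
        "Subject": ["woman"],
        "Type": ["photo"],
        "Shot": ["close-up shot"],
    }
    print(build_caption(tags, trigger_word="mychar"))
    print(build_caption(tags))
```

With the tags above, the first call yields `mychar, photo, woman, close-up shot, sunset`, and the second yields the same caption without the leading trigger word.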

Compiling from Source

This step is optional, since there are releases that can be used as-is. For those interested in compiling Dataset Tag from the source code, here is a step-by-step guide to get you started. The process requires a basic understanding of software development tools and the command line.

Prerequisites

Before you begin, ensure that you have the following installed on your system:

  1. .NET 7.0 SDK: Dataset Tag is built using .NET 7.0. Download and install the .NET 7.0 SDK from Microsoft's official .NET download page.
  2. Git: To clone the repository, you'll need Git installed. You can download it from git-scm.com.

Cloning the Repository

  1. Open a terminal or command prompt.
  2. Clone the Dataset Tag repository using Git:
git clone https://github.com/BinaryAlley/DatasetTag

Building the Application

  1. Navigate to the cloned repository's directory:
cd DatasetTag/src
  2. Build the application using the .NET CLI:
dotnet build

Running the Application from Source

After successfully building the application, you can run it directly from the source:

dotnet run --project DatasetTag/DatasetTag.csproj

Packaging for Distribution

To create a distributable package of the application, run the following command:

cd DatasetTag
dotnet publish -c Release -r [Runtime Identifier]

Replace [Runtime Identifier] with your target platform's runtime identifier, such as win-x64, linux-x64, or osx-x64.

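The published output, including the executable file, should be located in the bin\Release\net7.0\[Runtime Identifier]\publish directory, relative to the project directory from which the command was run (src/DatasetTag inside the cloned repository), not the repository root.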
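
If you are unsure which runtime identifier applies to the machine you are building on, the following Python sketch may help. The `guess_rid` helper is hypothetical and only covers the three identifiers named above plus `osx-arm64` (the identifier for Apple Silicon Macs); anything else returns `None` instead of a guess.

```python
import platform

# Only a handful of common platforms are covered; this table is illustrative,
# not a complete list of .NET runtime identifiers.
_RIDS = {
    ("windows", "x64"): "win-x64",
    ("linux", "x64"): "linux-x64",
    ("darwin", "x64"): "osx-x64",
    ("darwin", "arm64"): "osx-arm64",
}

# platform.machine() reports different names for the same CPU architecture.
_ARCH_ALIASES = {
    "amd64": "x64",
    "x86_64": "x64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def guess_rid(system=None, machine=None):
    """Return a likely runtime identifier for this machine, or None if unknown."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    arch = _ARCH_ALIASES.get(machine)
    return _RIDS.get((system, arch))


if __name__ == "__main__":
    rid = guess_rid()
    if rid is None:
        print("Could not determine a runtime identifier for this platform.")
    else:
        print(f"dotnet publish -c Release -r {rid}")
```

The script only prints a suggested command; it does not run `dotnet` itself.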

Running the Application

Dataset Tag was built using Avalonia UI, so it is cross-platform and can be run on Linux, Windows, and macOS. The process varies slightly depending on the operating system.

Linux, macOS

  1. Make sure you have the .NET 7.0 runtime installed. The process can differ based on your particular distro.
  2. Open a terminal and navigate to the directory where the application is located.
  3. Run the application using this command:
    dotnet DatasetTag.dll
  4. Depending on your Linux distribution, you might need to install additional dependencies. For instance, if you encounter issues related to libSkiaSharp, you might need to install specific libraries like libice6, libsm6, and libfontconfig1. These dependencies are required for SkiaSharp, which is used by Avalonia for rendering:
     sudo apt install libice6
     sudo apt install libsm6
     sudo apt install libfontconfig1
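
Before installing anything, you can check whether the three libraries are already present. The Python sketch below uses the standard library's `ctypes.util.find_library`, which relies on system tools such as `ldconfig` and may not detect libraries in non-standard locations. The `missing_packages` helper is hypothetical, and the mapping from library names to Debian/Ubuntu package names simply mirrors the apt commands above.

```python
from ctypes.util import find_library

# find_library takes the name without the "lib" prefix or version suffix.
# Keys are library names; values are the apt package names used above.
REQUIRED = {
    "ICE": "libice6",
    "SM": "libsm6",
    "fontconfig": "libfontconfig1",
}


def missing_packages(lookup=find_library):
    """Return the apt package names whose shared library was not found."""
    return [pkg for lib, pkg in REQUIRED.items() if lookup(lib) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Consider installing:", " ".join(missing))
    else:
        print("All three libraries were found.")
```

The `lookup` parameter exists only so the function can be exercised without depending on what is installed on the current machine.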

Windows

  1. Navigate to the directory where the application is located.
  2. Double click the DatasetTag.exe file to run the application.

Usage

  1. Launch the application.
  2. For the Input Path field, browse and select the directory containing your image dataset, or manually paste the path in the text box.
  3. Press the Refresh button. At this point, the previews of the images in the specified input directory will be displayed.
  4. Click on any thumbnail to select that particular image. A larger preview should be displayed in the left column of the window.
  5. Write a trigger word, if you are training a character. This is very important, as it will be the word associated with the character you are trying to train (it will become the word you use to "activate" the character in your prompts). This is NOT necessary when training a style, as styles are supposed to be applied every time you use the model or LoRA, without a trigger word.
  6. Start clicking on any tags in any category that describe the details of the selected image. Order does not matter, as the software will always arrange them in the same category order. Some categories only allow you to select a single tag (example: Shot - you cannot have an image where the subject is in both close-up and full-body shot).
  7. If some category does not contain a tag that you want, you can type that tag in the corresponding text field, and then click Add, or press Enter.
  8. If there are elements that you feel do not belong to any category, you can add them using the Miscellaneous category.
  9. If there are tags that you want to remove from the selected caption tags, or even tags that you want to remove from the predefined categories, you can check the Remove tags checkbox, or hold the Control key (does not work on Linux). While the key is held or the checkbox is checked, each tag will display a close icon, which you can click to remove that tag. Beware, this action cannot be undone.
  10. If you want to edit a particular tag, double-click it instead of single-clicking. This will make that tag enter edit mode, where you can change its text. When the focus is moved outside its input field, it will exit edit mode, and the changes will be saved.
  11. While adding tags, the software will automatically create the caption text from the trigger word (if any) and the selected tags.
  12. When you are satisfied, click on the Save button. This will create a text file with the same name as the selected image, containing the caption. The file will be saved in the same directory as the image, and it will overwrite an existing caption file!
  13. The software also creates a file named captions.json in the image dataset directory. This file is the one that enables the program to "remember" what each image had as assigned tags and categories. You can remove it if you wish, but the program will then treat an image as uncaptioned the next time you select it.
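
The last two steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the application's code: the `save_caption` function is hypothetical, the caption file name is assumed to be the image name with its extension replaced by `.txt`, and the layout of the "memory" file (image file name, then category, then tags) is a guess. To avoid clobbering a real captions.json, the sketch writes to a differently named file by default.

```python
import json
from pathlib import Path


def save_caption(image_path, caption, tags_by_category,
                 memory_name="captions_sketch.json"):
    """Write the caption next to the image and record the tags used for it.

    The JSON layout here is an assumption for illustration only; the real
    captions.json written by Dataset Tag may be structured differently.
    """
    image = Path(image_path)

    # Assumed naming: image.png -> image.txt. An existing file is overwritten.
    caption_file = image.with_suffix(".txt")
    caption_file.write_text(caption, encoding="utf-8")

    # Load the existing memory file, if any, and update this image's entry.
    memory_file = image.parent / memory_name
    memory = {}
    if memory_file.exists():
        memory = json.loads(memory_file.read_text(encoding="utf-8"))
    memory[image.name] = tags_by_category
    memory_file.write_text(json.dumps(memory, indent=2), encoding="utf-8")

    return caption_file
```

Deleting the memory file in this sketch has the same effect described in step 13: the caption text files remain, but the per-image tag selections can no longer be restored.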

Contributing

This software welcomes community contributions. All forms of input, be it code, bug reports, or feature suggestions, are appreciated.

Acknowledgments and Credits

Captioning tag categories idea

The idea for this software started to take shape after reading this Reddit post on captioning datasets. Unfortunately, the user that wrote it deleted their account, so I can only credit them by referencing the original post.

Icon

The application's icon was taken from OnlineWebFonts. The Close icon was taken from OnlineWebFonts. Both are licensed under CC BY 4.0.

Application type

The application was built using Avalonia UI, for easy cross-platform deployment.

Image thumbnails

The library used for displaying the thumbnails is SixLabors ImageSharp.

I greatly appreciate the creators and contributors for providing these assets, packages and technologies.

License

This project is licensed under the GPLv3.0. See the LICENSE file for details.

datasettag's Issues

Executable?

Hi I am able to start the application from command line but I don't find the DatasetTag.exe in any folder? Where is it supposed to be ? Thanks

Copy paste tags from image to image

Hey, I just found this helpful tool and i am more than pleased with how it works. Finally a tagger with categories that I can use as a checklist 👍

Only thing which would be an awesome enhancement:
The ability to copy the tags from a selected image and then select another image and paste the copied tags. This would speed up tagging with similar images a lot.

Also a small quality of life change would be to adjust the background colors to a more neutral dark grey. Doesn't need to be a fully dark mode, but tagging 1000+ images with these bright colors in the background (the buttons are totally fine) will give me eyestrain 😄

Otherwise keep up the good work!

Unable to start on Linux

Unable to start the app on Linux, specifically PopOS (based on Ubuntu).

Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at DatasetTag.MainWindow.PopulateDefaultCategories() in /home/<<user>>/src/DatasetTag/src/DatasetTag/MainWindow.axaml.cs:line 427
   at DatasetTag.MainWindow.MainWindow_Opened(Object sender, EventArgs e) in /home/<<user>>/src/DatasetTag/src/DatasetTag/MainWindow.axaml.cs:line 1335
   at Avalonia.Controls.TopLevel.OnOpened(EventArgs e) in /_/src/Avalonia.Controls/TopLevel.cs:line 494
   at Avalonia.Controls.WindowBase.OnOpened(EventArgs e) in /_/src/Avalonia.Controls/WindowBase.cs:line 199
   at Avalonia.Controls.Window.ShowCore(Window parent) in /_/src/Avalonia.Controls/Window.cs:line 675
   at Avalonia.Controls.Window.Show() in /_/src/Avalonia.Controls/Window.cs:line 599
   at Avalonia.Controls.ApplicationLifetimes.ClassicDesktopStyleApplicationLifetime.ShowMainWindow() in /_/src/Avalonia.Controls/ApplicationLifetimes/ClassicDesktopStyleApplicationLifetime.cs:line 129
   at Avalonia.Controls.ApplicationLifetimes.ClassicDesktopStyleApplicationLifetime.Start(String[] args) in /_/src/Avalonia.Controls/ApplicationLifetimes/ClassicDesktopStyleApplicationLifetime.cs:line 118
   at Avalonia.ClassicDesktopStyleApplicationLifetimeExtensions.StartWithClassicDesktopLifetime[T](T builder, String[] args, ShutdownMode shutdownMode) in /_/src/Avalonia.Controls/ApplicationLifetimes/ClassicDesktopStyleApplicationLifetime.cs:line 209
   at DatasetTag.Program.Main(String[] args) in /home/<<user>>/src/DatasetTag/src/DatasetTag/Program.cs:line 12
Aborted (core dumped)

If I were to guess from my own development on dotnet for linux I would assume this is something like Linux doesn't give the window resolution like Windows does so you have to set some defaults, but that is just a wild shot in the dark.

[Request] The user should be able to train without a trigger word for style training

For styles, it's generally better not to use a trigger word (or specify photo/painting), as you want it to assume that everything should look like it does in your style, every time. Same goes for certain elements in a concept, like if it should always show the full body, you don't want to tag "full body", because you don't want to give the lora a choice in doing that. Currently, when you try to simply type a description, it won't let you save it because you're missing a trigger word.

Part of this should also ensure that any regular tags or miscellaneous tags don't have a space and comma at the beginning of the prompt. It should just start with whatever tag you put in there, miscellaneous or regular.

Thanks for considering this! It would make the tool useful for more training approaches, and I don't think it would take anything away from those who use it as currently designed.

EDIT: I made the change, and it seems to be working okay? I guess I should make a branch? I've never done this before.

Unable to start on MacOS

An error occurred when running on MacOS.

  • Mac mini 2023
    • Apple M2
    • Ventura 13.6.1
$ dotnet DatasetTag.dll
Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at DatasetTag.MainWindow.HandleWindowStateChanged(WindowState state) in A:\work\DatasetTag\src\DatasetTag\MainWindow.axaml.cs:line 1261
   at Avalonia.Native.WindowImpl.WindowEvents.Avalonia.Native.Interop.IAvnWindowEvents.WindowStateChanged(AvnWindowState state) in /_/src/Avalonia.Native/WindowImpl.cs:line 63
   at Avalonia.Native.Interop.Impl.__MicroComIAvnWindowEventsVTable.WindowStateChanged(Void* this, AvnWindowState state) in /_/src/Avalonia.Native/Interop.Generated.cs:line 4248
--- End of stack trace from previous location ---
   at Avalonia.Native.PlatformThreadingInterface.RunLoop(CancellationToken cancellationToken) in /_/src/Avalonia.Native/PlatformThreadingInterface.cs:line 59
   at Avalonia.Threading.Dispatcher.MainLoop(CancellationToken cancellationToken) in /_/src/Avalonia.Base/Threading/Dispatcher.cs:line 61
   at Avalonia.Controls.ApplicationLifetimes.ClassicDesktopStyleApplicationLifetime.Start(String[] args) in /_/src/Avalonia.Controls/ApplicationLifetimes/ClassicDesktopStyleApplicationLifetime.cs:line 120
   at Avalonia.ClassicDesktopStyleApplicationLifetimeExtensions.StartWithClassicDesktopLifetime[T](T builder, String[] args, ShutdownMode shutdownMode) in /_/src/Avalonia.Controls/ApplicationLifetimes/ClassicDesktopStyleApplicationLifetime.cs:line 209
   at DatasetTag.Program.Main(String[] args) in A:\work\DatasetTag\src\DatasetTag\Program.cs:line 12
zsh: abort      dotnet DatasetTag.dll

LLM integration?

First of all, thanks for making and sharing this tagging tool. I was also familiar with that Reddit post on tagging. It offers great insights. Your tool is very helpful. I wonder if you are planning to integrate an LLM to help populate the categories based on the target images, and then use your tool to further organize each caption. Something like https://github.com/jiayev/GPT4V-Image-Captioner ?
