Dataset Tag

Dataset Tag is a free and open-source tool that makes image captioning for machine learning datasets a breeze. No longer will you dread captioning a dataset for training a LoRA or a DreamBooth model! The software makes tagging the various elements of an image semi-automatic, which keeps captions accurate and consistent, both of which are crucial to the quality of the trained neural network.

Features

Predefined Categories: Includes various categories such as:
- Type: photo, illustration, drawing, portrait, render, anime, etc.
- Subject: man, woman, mountain, trees, forest, fantasy scene, cityscape, etc.
- Shot: full body shot, cowboy shot, medium shot, medium close-up shot, close-up shot, extreme close-up shot, etc.
- Perspective: from above, from below, from front, from behind, from side, three-quarters view, rear three-quarters view, overhead, forced perspective, upside down, etc.
- Pose: laying, sitting, standing, leaning, walking, running, jumping, posing, etc.
- Location: on couch, on chair, in front of mirror, at desk, on street, etc.
- Action: eating, reading, resting, playing, etc.
- Gaze: looking at viewer, looking up, looking down, looking sideways, three-quarter gaze, rear three-quarter gaze, looking sideways and upwards, etc.
- Mouth: open mouth, closed mouth, slightly open mouth, etc.
- Mouth Action: smirk, slight smile, smile, laughing, grinning, etc.
- Hair: long hair, short red hair, curly blond hair, etc.
- Limbs: bent knee, crossed legs, arms raised above head, arms extended sideways, left palm on forehead, right arm on belly, holding books, etc.
- Subject Description: white hat, blue shirt, silver necklace, sunglasses, pink shoes, silver bracelet, green jacket, etc.
- Scenery: indoors, outdoors, etc.
- Scene Description: flowers wallpaper, chair, table, lamp, beach, sand, water, shore, etc.
- Lighting: sunset, strong shadows, warm orange light, night, etc.
- Miscellaneous: any tag that does not fit in any of the above categories
In any category, existing tags can be removed or updated, and new tags can be added.
Thumbnail Previews: Browse through your dataset images with thumbnail previews, for easy access.
Interactive Tagging: Click on an image and select from pre-existing tags across categories, in order to automatically form a standardized caption.
Consistent Formatting: Captions are always arranged in the same category order, for consistency.
Memory Feature: Remembers both the tags in each category and the tags used for each image, allowing for easy edits later.
Replicability: It offers the option of copying the trigger word and/or the selected tags via the system clipboard and pasting them onto other selected images (easy copy/paste of tags among images).
Draggable Panel Columns: Each column has a grid splitter, which can be used to resize the columns to any ratio you desire.
Theme: Light and Dark theme support.
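
The Consistent Formatting feature above can be illustrated with a short Python sketch. This is not Dataset Tag's actual code (the application is a .NET program); the abbreviated `CATEGORY_ORDER` list, the `build_caption` function, and the comma separator are all assumptions made for illustration.

```python
# Hypothetical sketch: the names and the ", " separator are assumptions,
# not taken from the Dataset Tag source code.
CATEGORY_ORDER = ["Type", "Subject", "Shot", "Perspective", "Pose", "Lighting", "Miscellaneous"]


def build_caption(selected, trigger_word=""):
    """Join tags in fixed category order, whatever order they were clicked in."""
    parts = [trigger_word] if trigger_word else []
    for category in CATEGORY_ORDER:
        parts.extend(selected.get(category, []))
    return ", ".join(parts)


# Tags selected in arbitrary order still produce the same caption.
clicked = {"Lighting": ["sunset"], "Type": ["photo"], "Subject": ["woman"]}
print(build_caption(clicked, trigger_word="mychar"))
```

Because the loop walks the categories rather than the click history, two images tagged in a different order end up with identically structured captions.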

Compiling from Source

This step is optional, since there are releases that can be used as-is. For those interested in compiling Dataset Tag from the source code, here's a step-by-step guide to get you started. The process requires a basic understanding of software development tools and the command line.

Prerequisites

Before you begin, ensure that you have the following installed on your system:

.NET 7.0 SDK: Dataset Tag is built using .NET 7.0. Download and install the .NET 7.0 SDK from Microsoft's official .NET download page.
Git: To clone the repository, you'll need Git installed. You can download it from git-scm.com.

Cloning the Repository

Open a terminal or command prompt.
Clone the Dataset Tag repository using Git:

git clone https://github.com/BinaryAlley/DatasetTag

Building the Application

Navigate to the cloned repository's directory:

cd DatasetTag/src

Build the application using the .NET CLI:

dotnet build

Running the Application from Source

After successfully building the application, you can run it directly from the source:

dotnet run --project DatasetTag/DatasetTag.csproj

Packaging for Distribution

To create a distributable package of the application, run the following commands:

cd DatasetTag
dotnet publish -c Release -r [Runtime Identifier]

Replace [Runtime Identifier] with your target platform's runtime identifier, such as win-x64, linux-x64, or osx-x64.
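
If you are unsure which runtime identifier matches your machine, a small Python helper along these lines can suggest one. It is only a sketch: the `runtime_identifier` function and its lookup tables are hypothetical, and they cover just a few common operating system and architecture combinations.

```python
import platform

# Hypothetical helper: maps an OS name and CPU architecture to a .NET runtime
# identifier (RID). Only a few common combinations are covered.
_OS = {"Windows": "win", "Linux": "linux", "Darwin": "osx"}
_ARCH = {"x86_64": "x64", "AMD64": "x64", "arm64": "arm64", "aarch64": "arm64"}


def runtime_identifier(system=None, machine=None):
    """Return a RID such as 'linux-x64'; defaults to the current machine."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        return f"{_OS[system]}-{_ARCH[machine]}"
    except KeyError:
        raise ValueError(f"no RID mapping for {system}/{machine}") from None


# Example with explicit values; call runtime_identifier() with no arguments
# to detect the current machine instead.
print(f"dotnet publish -c Release -r {runtime_identifier('Linux', 'x86_64')}")
```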

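The executable file should be located in the bin/Release/net7.0/[Runtime Identifier] directory, relative to the project directory (src/DatasetTag inside the cloned repository). With the default settings, dotnet publish places the distributable files in the publish subdirectory of that directory.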

Running the Application

Dataset Tag was built using Avalonia UI, so it is cross-platform and can be run on Linux, Windows, and macOS. The process varies slightly depending on the operating system.

Linux, macOS

Make sure you have the .NET 7.0 runtime installed. The installation process can differ depending on your particular distro.
Open a terminal and navigate to the directory where the application is located.
Run the application using this command:
```
dotnet DatasetTag.dll
```
Depending on your Linux distribution, you might need to install additional dependencies. For instance, if you encounter issues related to libSkiaSharp, you might need to install specific libraries like libice6, libsm6, and libfontconfig1. These dependencies are required for SkiaSharp, which is used by Avalonia for rendering:
```
sudo apt install libice6
sudo apt install libsm6
sudo apt install libfontconfig1
```

Windows

Navigate to the directory where the application is located.
Double-click the DatasetTag.exe file to run the application.

Usage

Launch the application.
For the Input Path field, browse and select the directory containing your image dataset, or manually paste the path in the text box.
Press the Refresh button. The previews of the images in the specified input directory will then be displayed.
Click on any thumbnail to select that particular image. A larger preview should be displayed in the left column of the window.
Write a trigger word, if you are training a character. This is very important, as it will be the word associated with the character you are trying to train (it will become the word you use to "activate" the character in your prompts). This is NOT necessary when training a style, as styles are supposed to be applied every time you use the model or LoRA, without a trigger word.
Start clicking on any tags in any category that describe the details of the selected image. The order in which you click does not matter, as the software always arranges the tags in the same category order. Some categories only allow a single tag to be selected (example: Shot - you cannot have an image where the subject is in both a close-up and a full-body shot).
If some category does not contain a tag that you want, you can type that tag in the corresponding text field, and then click Add, or press Enter.
If there are elements that you feel do not belong to any category, you can add them using the Miscellaneous category.
If there are tags that you want to remove from the selected caption tags, or even from the predefined categories, check the Remove tags checkbox or hold the Control key (the key does not work on Linux). While the key is held or the checkbox is checked, each tag displays a close icon, which you can click to remove that tag. Beware: this action cannot be undone.
If you want to edit a particular tag, double-click it instead of single-clicking. The tag then enters edit mode, where you can change its text. When the focus moves outside its input field, it exits edit mode and the changes are saved.
While adding tags, the software will automatically create the caption text from the trigger word (if any) and the selected tags.
When you are satisfied, click on the Save button. This will create a text file with the same name as the selected image, containing the caption. The file will be saved in the same directory as the image, and it will overwrite an existing caption file!
The software also creates a file named captions.json in the image dataset directory. This file is what enables the program to "remember" the tags and categories assigned to each image. You can delete it if you wish, but the program will then treat each image as uncaptioned the next time you select it.
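
The saving behavior described in the last two steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the application's real code: the `save_caption` function is hypothetical, and the layout of captions.json shown here (image file name mapped to categories and their tags) is a guess, since the actual file format is not documented above.

```python
import json
import tempfile
from pathlib import Path


def save_caption(image_path, caption, tags_by_category):
    """Write <image name>.txt next to the image and record the tags in captions.json."""
    image_path = Path(image_path)
    # The caption file shares the image's name and overwrites any existing file.
    caption_file = image_path.with_suffix(".txt")
    caption_file.write_text(caption, encoding="utf-8")

    # Assumed layout: {"<image file name>": {"<category>": ["tag", ...]}}.
    memory_file = image_path.parent / "captions.json"
    memory = json.loads(memory_file.read_text(encoding="utf-8")) if memory_file.exists() else {}
    memory[image_path.name] = tags_by_category
    memory_file.write_text(json.dumps(memory, indent=2), encoding="utf-8")
    return caption_file


# Demo in a temporary directory, so no real dataset is touched.
with tempfile.TemporaryDirectory() as dataset_dir:
    image = Path(dataset_dir) / "img001.png"
    saved = save_caption(image, "mychar, photo, woman", {"Type": ["photo"], "Subject": ["woman"]})
    print(saved.name, sorted(p.name for p in Path(dataset_dir).iterdir()))
```

Keeping the per-image tags in a single JSON file, separate from the plain-text captions, is what would let a tool like this reopen an image later and restore its selected tags by category.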

Contributing

This software welcomes community contributions. All forms of input, be it code, bug reports, or feature suggestions, are appreciated.

Acknowledgments and Credits

Captioning tag categories idea

The idea for this software started to take shape after reading this Reddit post on captioning datasets. Unfortunately, the user that wrote it deleted their account, so I can only credit them by referencing the original post.

Icon

The application's icon was taken from OnlineWebFonts. The Close icon was taken from OnlineWebFonts. Both are licensed under CC BY 4.0.

Application type

The application was built using Avalonia UI, for easy cross-platform deployment.

Images thumbnails

The library used for displaying the thumbnails is SixLabors ImageSharp.

I greatly appreciate the creators and contributors for providing these assets, packages and technologies.

License

This project is licensed under the GPLv3.0. See the LICENSE file for details.

Unable to start on Linux

Unable to start the app on Linux, specifically PopOS (based on Ubuntu).

Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at DatasetTag.MainWindow.PopulateDefaultCategories() in /home/<<user>>/src/DatasetTag/src/DatasetTag/MainWindow.axaml.cs:line 427
   at DatasetTag.MainWindow.MainWindow_Opened(Object sender, EventArgs e) in /home/<<user>>/src/DatasetTag/src/DatasetTag/MainWindow.axaml.cs:line 1335
   at Avalonia.Controls.TopLevel.OnOpened(EventArgs e) in /_/src/Avalonia.Controls/TopLevel.cs:line 494
   at Avalonia.Controls.WindowBase.OnOpened(EventArgs e) in /_/src/Avalonia.Controls/WindowBase.cs:line 199
   at Avalonia.Controls.Window.ShowCore(Window parent) in /_/src/Avalonia.Controls/Window.cs:line 675
   at Avalonia.Controls.Window.Show() in /_/src/Avalonia.Controls/Window.cs:line 599
   at Avalonia.Controls.ApplicationLifetimes.ClassicDesktopStyleApplicationLifetime.ShowMainWindow() in /_/src/Avalonia.Controls/ApplicationLifetimes/ClassicDesktopStyleApplicationLifetime.cs:line 129
   at Avalonia.Controls.ApplicationLifetimes.ClassicDesktopStyleApplicationLifetime.Start(String[] args) in /_/src/Avalonia.Controls/ApplicationLifetimes/ClassicDesktopStyleApplicationLifetime.cs:line 118
   at Avalonia.ClassicDesktopStyleApplicationLifetimeExtensions.StartWithClassicDesktopLifetime[T](T builder, String[] args, ShutdownMode shutdownMode) in /_/src/Avalonia.Controls/ApplicationLifetimes/ClassicDesktopStyleApplicationLifetime.cs:line 209
   at DatasetTag.Program.Main(String[] args) in /home/<<user>>/src/DatasetTag/src/DatasetTag/Program.cs:line 12
Aborted (core dumped)

If I were to guess from my own development with dotnet on Linux, I would assume that Linux doesn't report the window resolution the way Windows does, so you have to set some defaults, but that is just a wild shot in the dark.
