
taggui's Introduction

TagGUI

TagGUI icon

Cross-platform desktop application for quickly adding and editing image tags and captions, aimed towards creators of image datasets for generative AI models like Stable Diffusion.

TagGUI screenshot

Features

  • Keyboard-friendly interface for fast tagging
  • Tag autocomplete based on your own most-used tags
  • Integrated Stable Diffusion token counter
  • Automatic caption and tag generation with models including CogVLM, LLaVA, WD Tagger, and many more
  • Option to load auto-captioning models in 4-bit for reduced VRAM usage
  • Batch tag operations for renaming, deleting, and sorting tags
  • Advanced image list filtering

Installation

The easiest way to use the application is to download the latest release from the releases page. Choose the appropriate file for your operating system, extract it wherever you want, and run the executable file inside. You may have to install 7-Zip to extract the files if you don't have it on your system. No additional dependencies are required.

  • macOS users: There is no macOS release because building one requires a device running macOS, and I do not have one. You can still install and run the program manually (see below).
  • Linux users: You may need to install libxcb-cursor0. See this Stack Overflow answer.

Alternatively, you can install manually by cloning this repository and installing the dependencies in requirements.txt. Run taggui/run_gui.py to start the program. Python 3.11 is recommended, but Python 3.10 should also work.

Usage

Load the directory containing your images by clicking the Load Directory button in the center of the window (or File -> Load Directory). Tags are loaded from .txt files in the directory with the same names as the images. Any changes you make to the tags are also automatically saved to these .txt files.
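
For example, image.jpg is paired with image.txt in the same directory. A minimal sketch of this sidecar-file convention (illustrative code, not TagGUI's own, assuming the default comma separator):

from pathlib import Path

def read_tags(image_path: Path) -> list[str]:
    # Tags live in a .txt file with the same stem as the image.
    tag_path = image_path.with_suffix('.txt')
    if not tag_path.exists():
        return []
    return [tag.strip() for tag in tag_path.read_text(encoding='utf-8').split(',') if tag.strip()]

def write_tags(image_path: Path, tags: list[str]) -> None:
    image_path.with_suffix('.txt').write_text(', '.join(tags), encoding='utf-8')

print(read_tags(Path('image.jpg')))  # e.g. ['orange cat', 'windowsill']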

Automatic Captioning

Auto-captioner screenshot

In addition to manual tagging, you can automatically generate captions or tags for your images inside TagGUI. GPU generation requires a compatible NVIDIA GPU, and CPU generation is also supported.

To use the feature, select the images you want to caption in the image list, then select the captioning model you want to use in the Auto-Captioner pane. If you have a local directory containing previously downloaded models, you can set it in File -> Settings to include the models in the model list. Click the Start Auto-Captioning button to start captioning. You can select multiple images to batch generate captions for all of them. It can take up to several minutes to download and load a model when you first use it, but subsequent generations will be much faster.

Captioning parameters

Prompt: Instructions given to the captioning model. Prompt formats are handled automatically based on the selected model.

Start caption with: Generated captions will start with this text.

Remove tag separators in caption: If checked, tag separators (commas by default) will be removed from the generated captions.

Discourage from caption: Words or phrases that should not be present in the generated captions. You can separate multiple words or phrases with commas (,). For example, you can put appears,seems,possibly to prevent the model from using an uncertain tone in the captions. The words may still be generated due to limitations related to tokenization. You can escape commas with backslashes (\,).

Include in caption: Words or phrases that should be present somewhere in the generated captions. You can separate multiple words or phrases with commas (,). You can also allow the captioning model to choose from a group of words or phrases by separating them with |. For example, if you put cat,orange|white|black, the model will attempt to generate captions that contain the word cat and either orange, white, or black. It is not guaranteed that all of your specifications will be met. You can escape commas and pipes with backslashes (\, and \|).
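
For illustration, here is one way such a specification could be split on unescaped separators (a sketch in Python, not necessarily how TagGUI parses it):

import re

def split_unescaped(text: str, separator: str) -> list[str]:
    # Split on the separator unless it is preceded by a backslash,
    # then unescape the remaining escaped separators.
    parts = re.split(r'(?<!\\)' + re.escape(separator), text)
    return [part.replace('\\' + separator, separator) for part in parts]

phrases = split_unescaped('cat,orange|white|black', ',')
groups = [split_unescaped(phrase, '|') for phrase in phrases]
print(groups)  # [['cat'], ['orange', 'white', 'black']]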

Tags to exclude (WD Tagger models): Tags that should not be generated, separated by commas.

Many of the other generation parameters are described in the Hugging Face documentation.
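
These map onto keyword arguments of the transformers generate() method. A small self-contained illustration with a plain text model (the model and the values are only for demonstration; the captioning models accept the same arguments):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForCausalLM.from_pretrained('gpt2')
inputs = tokenizer('A photo of', return_tensors='pt')
output_ids = model.generate(
    **inputs,
    max_new_tokens=20,        # upper bound on generated length
    num_beams=3,              # beam search width
    do_sample=True,           # enable sampling
    temperature=0.7,          # sampling temperature
    top_p=0.9,                # nucleus sampling
    repetition_penalty=1.2,   # discourage repeated phrases
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))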

Advanced Image List Filtering

The basic functionality of filtering for images that contain a certain tag is available by clicking on the tag in the All Tags pane. In addition to this, you can construct more complex filters in the Filter Images box at the top of the Images pane.

The full documentation for the filter syntax follows below.

Filter criteria

These are the prefixes you can use to specify the filter criteria you want to apply:

  • tag:: Images that have the filter term as a tag
    • tag:cat will match images with the tag cat.
  • caption: Images that contain the filter term in the caption
    • The caption is the list of tags as a single string, as it appears in the .txt file.
    • caption:cat will match images that have cat anywhere in the caption. For example, images with the tag orange cat or the tag catastrophe.
  • name: Images that contain the filter term in the file name
    • name:cat will match images such as cat-1.jpg or large_cat.png.
  • path: Images that contain the filter term in the full file path
    • path:cat will match images such as C:\Users\cats\dog.jpg or /home/dogs/cat.jpg.
  • You can also use a filter term with no prefix to filter for images that contain the term in either the caption or the file path.
    • cat will match images containing cat in the caption or file path.

The following are prefixes for numeric filters. The operators = (== also works), !=, <, >, <=, and >= are used to specify the type of comparison.

  • tags: Images that have the specified number of tags
    • tags:=13 will match images that have exactly 13 tags.
    • tags:!=7 will match images that do not have exactly 7 tags (images with less than 7 tags or more than 7 tags).
  • chars: Images that have the specified number of characters in the caption
    • chars:<100 will match images that have less than 100 characters in the caption.
    • chars:>=30 will match images that have 30 or more characters in the caption.
  • tokens: Images that have the specified number of tokens in the caption
    • tokens:>75 will match images that have more than 75 tokens in the caption.
    • tokens:<=50 will match images that have 50 or fewer tokens in the caption.
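
The token counts presumably come from the integrated Stable Diffusion token counter mentioned in the features list; here is a sketch of counting tokens with the CLIP tokenizer used by Stable Diffusion 1.x (that TagGUI uses exactly this tokenizer is an assumption):

from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained('openai/clip-vit-large-patch14')
caption = 'orange cat, windowsill, sunlight'
# Subtract 2 to exclude the start-of-text and end-of-text special tokens.
token_count = len(tokenizer(caption).input_ids) - 2
print(token_count)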

Spaces and quotes

If the filter term contains spaces, you must enclose it in quotes (either single or double quotes). For example, to find images with the tag orange cat, you must use tag:"orange cat" or tag:'orange cat'. If you have both spaces and quotes in the filter term, you can escape the quotes with backslashes. For example, you can use tag:"orange \"cat\"" for the tag orange "cat". An alternative is to use different types of quotes for the outer and inner quotes, like so: tag:'orange "cat"'.

Wildcards

You can use the * character as a wildcard to match any number of any characters, and the ? character to match any single character. For example, tag:*cat will match images with tags like orange cat, large cat, and cat.
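
These are the same glob-style patterns handled by Python's fnmatch module, so you can test a pattern like this:

from fnmatch import fnmatchcase

# '*cat' matches anything ending in 'cat', so 'catastrophe' does not match.
for tag in ('orange cat', 'large cat', 'cat', 'catastrophe'):
    print(tag, fnmatchcase(tag, '*cat'))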

Combining filters

Logical operators can be used to combine multiple filters:

  • NOT: Images that do not match the filter
    • NOT tag:cat will match images that do not have the tag cat.
  • AND: Images that match both filters before and after the operator
    • tag:cat AND tag:orange will match images that have both the tag cat and the tag orange.
  • OR: Images that match either filter before or after the operator
    • tag:cat OR tag:dog will match images that have either the tag cat or the tag dog, or both.

The lowercase versions of these operators will also work: not, and, and or.

The operator precedence is NOT > AND > OR, so by default, NOT will be evaluated first, then AND, then OR. You can use parentheses to change this order. For example, in tag:cat AND (tag:orange OR tag:white), the OR will be evaluated first, matching images that have the tag cat and either the tag orange or the tag white. You can nest parentheses and operators to create arbitrarily complex filters.
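
requirements.txt pins pyparsing, which hints at how such a grammar can be built; here is a sketch (not TagGUI's actual grammar) that reproduces the NOT > AND > OR precedence:

import pyparsing as pp

# A filter term is a quoted string or a bare word without parentheses.
term = pp.QuotedString('"') | pp.QuotedString("'") | pp.Word(pp.printables, exclude_chars='()')
expression = pp.infix_notation(term, [
    (pp.CaselessKeyword('NOT'), 1, pp.OpAssoc.RIGHT),
    (pp.CaselessKeyword('AND'), 2, pp.OpAssoc.LEFT),
    (pp.CaselessKeyword('OR'), 2, pp.OpAssoc.LEFT),
])
result = expression.parse_string('tag:cat AND (tag:orange OR tag:white)')
print(result.as_list())  # nesting reflects the precedence and the parentheses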

Controls

  • ⭐ Previous / next image: Ctrl+Up / Down (just Up / Down also works in some cases)
  • Jump to the first untagged image: Ctrl+J
  • Focus the Filter Images box: Alt+F
  • Focus the Add Tag box: Alt+A
  • Focus the Image Tags list: Alt+I
  • Focus the Search Tags box: Alt+S
  • Focus the Start Auto-Captioning button: Alt+C

Images pane

  • First / last image: Home / End
  • Select multiple images: Hold Ctrl or Shift and click the images
  • Select all images: Ctrl+A
  • Invert selection: Ctrl+I
  • Right-clicking on an image will bring up the context menu, which includes actions such as copying and pasting tags and moving or copying selected images to another directory.

Image Tags pane

  • Add a tag: Type the tag into the Add Tag box and press Enter
  • ⭐ Add the first tag suggested by autocomplete: Ctrl+Enter
  • Add a tag to multiple images: Select the images in the image list and add the tag
  • Delete a tag: Select the tag and press Delete
  • Rename a tag: Double-click the tag, or select the tag and press F2
  • Reorder tags: Drag and drop the tags
  • Select multiple tags: Hold Ctrl or Shift and click the tags

All Tags pane

  • Show all images containing a tag: Select the tag (When Tag click action is set to Filter images for tag)
  • Add a tag to selected images: Click the tag (When Tag click action is set to Add tag to selected images)
  • Delete all instances of a tag: Select the tag and press Delete
  • Rename all instances of a tag: Double-click the tag, or select the tag and press F2

The Edit menu contains additional features for batch tag operations, such as Find and Replace (Ctrl+R) and Batch Reorder Tags (Ctrl+B).

taggui's People

Contributors

jhc13, yggdrasil75


taggui's Issues

Feature Request: Allow us to set default model folder for all models in the settings menu.

Is there a chance that we can:

  • Place the model folder location in the settings menu and have this setting saved, so that we don't have to manually enter the folder location in the models box on each launch?
  • Have TagGUI search that specific folder for the LLM models, similar to the way AUTOMATIC1111 does?
  • Allow us to download the LLM models to the corresponding folder listed in the settings menu by default?

Feature request: add extra button to insert the last tag again

Scrolling through a huge list of images, I want to repeatedly add the same tag.
E.g., in some pictures the person is wearing glasses, in others not.
Right now I'm scrolling through the list and Ctrl+clicking to select all the relevant images. Once that list is long enough (please see also #26), I create and add the tag.
Then I continue with the next images, but to give them the same tag I have to start typing again.

A good improvement would be a shortcut button that adds the last created (or selected?) tag to the images that are currently selected.

I'm getting this issue when using auto-captioning: Error while downloading from https://cdn-lfs-us-1.huggingface.co/repos/32/e4

Best model for tagging?

I'd like to hear whether anyone has done a comparative study of the best model, as well as the best prompt. Currently, my prompt "Write visual tags for the current image, comma-separated" does not always work as intended.

Feature request: undo last image selection

To give many images a new identical tag, I hold the Ctrl key and select image by image, and at the end I write the new tag name and accept that it will be applied to these many images.
But when I make a clicking mistake, probably forgetting to hold down the Ctrl key, all my work is lost and I have to do the tedious selection again.

=> Feature request: have a specialized undo button that just restores the last selection I made

Feature Request: Ban tokens/words

It would be very useful if we could create a list of banned tokens to keep the model from using low-confidence words like "seems", "appears", "possibly", "suggests", etc.

Help with install

I'm interested in this UI so I can finally use the BLIP-2 model, but I'm having this issue with the automatic installer. I'm using Python 3.10.6. I don't know why this happens:

Traceback (most recent call last):
  File "run_gui.py", line 6, in <module>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
  File "widgets\main_window.py", line 18, in <module>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
  File "widgets\blip_2_captioner.py", line 6, in <module>
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "PyInstaller\loader\pyimod02_importers.py", line 385, in exec_module
  File "torch\__init__.py", line 133, in <module>
    raise err
OSError: [WinError 126] The specified module could not be found. Error loading "C:\Users\enriq\Downloads\taggui-v1.2.0-windows\taggui-v1.2.0-windows\taggui\torch\lib\cufftw64_10.dll" or one of its dependencies

Packaged bitsandbytes does not work on Windows

Looks like the latest (1.9.0) Windows binary release does not include bitsandbytes with CUDA support on Windows.

There's no 4-bit checkbox in the 1.9.0 binary release, and trying to caption using llava-hf/llava-1.5-7b-hf on an RTX 4060 Ti 16 GB gives:

Captioning... (device: cuda:0)
Traceback (most recent call last):
File "widgets\auto_captioner.py", line 435, in run
File "torch\utils\_contextlib.py", line 115, in decorate_context
File "transformers\generation\utils.py", line 1834, in generate
File "transformers\generation\utils.py", line 3592, in beam_sample
File "transformers\generation\utils.py", line 2966, in _temporary_reorder_cache
File "transformers\models\llava\modeling_llava.py", line 528, in _reorder_cache
File "transformers\models\llama\modeling_llama.py", line 1288, in _reorder_cache
File "transformers\models\llama\modeling_llama.py", line 1288, in <genexpr>
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacity of 16.00 GiB of which 0 bytes is free.
Of the allocated memory 15.04 GiB is allocated by PyTorch, and 169.61 MiB is reserved by PyTorch but unallocated.
If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Got the 4-bit quantization of automatic captioning models to work on Windows by installing a CUDA build of PyTorch and a Windows build of bitsandbytes.

So something like this:

git clone https://github.com/jhc13/taggui
cd taggui
python -m venv venv
.\venv\Scripts\Activate.ps1
# pytorch with cuda support
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# scipy first for bitsandbytes from jllllll.github.io
pip install scipy
pip install bitsandbytes --index-url=https://jllllll.github.io/bitsandbytes-windows-webui
# all of requirements.txt except for bitsandbytes, scipy, torch:
pip install accelerate==0.25.0 imagesize==1.4.1 Pillow==10.2.0 pyparsing==3.1.1 PySide6==6.6.1 transformers==4.36.2

[Feature Request] Add tag filter to edit tag list

After using TagGUI for 2,000 images, I am facing an issue.
When I want to add a tag to the tag list, I am not sure whether it or some similar tag already exists.
It is pretty hard to check by going through a long tag list.

Second, I also suggest adding the ability to choose the image preview size in the left menu.
Sometimes it is too big, and it is easy to miss some images when scrolling.

[Feature Request] Dark Mode

Hello!

I would love it if we could get an option in the settings menu to enable dark mode. I saw some images on the main repo page with it enabled and it looked great. I’m up late and the regular mode is a bit bright for me.

Auto-captioner error: module 'tensorflow' has no attribute 'Tensor'

Tried using the auto-captioner feature for the first time. I kept the default settings, and after it downloaded a bunch of models, it shows this error in the GUI (nothing in the console). It won't let me copy/paste the whole error from the GUI, so I typed the relevant parts here:

Loading THUDM/cogagent-vqa-hf...
A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
...
... _backends.py, line 408, in is_appropriate_type
    return isinstance(tensor, (self.tf.Tensor, self.tf.Variable))
AttributeError: module 'tensorflow' has no attribute 'Tensor'

I have an RTX 3060 12GB, Windows 10.

If I can't get this working, where is the download directory so I can delete the 34 GB of models that were downloaded? Edit: found the models at C:\Users\<USER>\.cache\huggingface\hub\models--THUDM--cogagent-vqa-hf

What are NPZ files?

[Report has been formatted to make it more relevant to TagGUI devs/users]
I love your tool and think it is hands-down the best available for captioning. However, I had some issues running Kohya_SS with the caption files generated by the software.

Kohya_SS (I learned today) by default uses a format called '.caption', which I had never heard of, rather than '.txt'. I wanted you to be aware that for users of the console-run CLI, using caption files with the extension '*.txt' will still train, but with a 'non-terminal' error that results in trained models having no text encoding. The error is very small, and as a result I failed to notice it for 3 months...

Here's what it looks like, buried in the other console messages after running:

Using DreamBooth method.
prepare images.
found directory [to/dataset/folders]\img\100_traing_data_XL contains 88 image files
No caption file found for 88 images. Training will continue without captions for these images. If class token exists, it will be used.

This is what happens, even with text files for every image. I'm pretty sure this only occurs if you don't use the GUI, but I could be wrong.

For others who also use sd-scripts to train directly from the command line, you just add this to your other arguments.

--caption_extension="txt"

It was hard to find a working set of arguments, so here's mine, if anyone wants something that works with a 12 GB 3080. Just be sure to change the model dir, VAE dir, training dir, output dir, and logging dir, AND name it by changing '--output_name="Some_Waifu_XL"' to whatever the name of your model should be.

python.exe sdxl_train_network.py --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path=C:/SD/models/Stable-diffusion/dynavisionXLAllInOneStylized_release0557Bakedvae.safetensors --train_data_dir=C:/Users/[yourusername]/Pictures/Input/train/img --caption_extension="txt" --resolution=1024,1024 --output_dir=C:\\Users\\[yourusername]\\Pictures\\Input\\train\\model --logging_dir=C:\\Users\\[yourusername]\\Pictures\\Input\\train\\log --network_alpha=1 --save_model_as=safetensors --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=0.0004  --unet_lr=0.0004 --network_dim=8  --output_name="Some_Waifu_XL" --lr_scheduler_num_cycles="1" --cache_text_encoder_outputs  --no_half_vae --full_bf16 --learning_rate="0.0004"  --lr_scheduler="constant" --train_batch_size="1"  --max_train_steps="5100" --save_every_n_epochs="1"  --mixed_precision="bf16" --save_precision="bf16"  --cache_latents --cache_latents_to_disk  --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False --max_data_loader_n_workers="0" --bucket_reso_steps=64 --mem_eff_attn  --gradient_checkpointing  --xformers --bucket_no_upscale --network_train_unet_only --vae=C:/SD/models/VAE/fixFP16ErrorsSDXLLowerMemoryUse_v10.safetensors

I also get an error if I don't first delete the NPZ files created by TagGUI. I was just wondering if someone could explain how/why these files are created and what purpose they serve.
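
For context, .npz is NumPy's zipped-array archive format, and kohya's sd-scripts writes cached latents to .npz files next to the images when --cache_latents_to_disk is used (as in the arguments above), so the files likely come from the trainer rather than from TagGUI. You can inspect one like this (file name hypothetical):

import numpy as np

# List the arrays stored inside an .npz archive.
with np.load('some_image.npz') as archive:
    for name in archive.files:
        print(name, archive[name].shape, archive[name].dtype)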

Keep up the great work!

[Feature Request] Remove/Rename Tag from Filtered Image Selection Only

Hey, firstly, I deeply appreciate this tool; it's made my workflow so much smoother, and I'm grateful you've shared it. On that note, I have a question that I think is similar to #16, but my use case may be more specific, or the feature may even already exist and I just can't find it.

I love the ability to filter images by tag(s) on the fly and add new tags to them. I was wondering whether it is possible to rename or delete tags as well, but affecting only the filtered images? If not, would it be possible to add something similar? Thank you!

Model Request: Support for the new CogAgent visual language models

Hello!

Thank you so much for this wonderful app! I would love to see support for the new CogAgent visual language models.

cogagent-vqa-hf
cogagent-chat-hf

I downloaded these myself and tried linking to them in the settings, but I get the following error:

Unrecognized processing class in I:\Taggui\_taggui\models\cogagent-vqa-hf. Can't instantiate a processor, a tokenizer, an image processor or a feature extractor for this model. Make sure the repository contains the files of at least one of those processing classes.

Feature request/bug: Autocaption crashes on illegal file types.

If illegal/unsupported file types are present in a folder, auto-captioning will crash. Example:
image

When the auto-captioner gets to the zip file, it will crash.

You might add a filter that only attempts to load file extensions that are expected to work, like .jpg, .bmp, .webp, etc. Or you could use a try/except when loading with PIL to skip any files that fail to load, as sketched below.
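
A sketch of that try/except approach with Pillow (function name and structure are illustrative):

from pathlib import Path
from PIL import Image, UnidentifiedImageError

def loadable_images(directory: Path):
    # Yield only the paths that Pillow can open, skipping zip files
    # and other non-images instead of crashing on them.
    for path in sorted(directory.iterdir()):
        try:
            with Image.open(path) as image:
                image.verify()  # cheap header check without full decoding
            yield path
        except (UnidentifiedImageError, OSError):
            continue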

CogVLM doesn't seem to load correctly in 4-bit

Hi, thanks for your great work.

CogVLM doesn't seem to load correctly in 4-bit.

According to CogVLM's README
For INT4 quantization: 1 * RTX 3090(24G) (CogAgent takes ~ 12.6GB, CogVLM takes ~ 11GB)

But when I check "Load in 4-bit" and load CogVLM, it still takes up close to 32 GB of VRAM.

I think the problem may be here

You can check the code from CogVLM. They load the model in 4-bit like this.
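
The linked snippet isn't preserved in this copy, but a sketch of 4-bit loading through transformers' quantization config, along the lines of CogVLM's README (the model ID and keyword arguments here are assumptions), looks like:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    'THUDM/cogvlm-chat-hf',
    torch_dtype=torch.float16,
    trust_remote_code=True,  # CogVLM ships custom modeling code
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    low_cpu_mem_usage=True,
)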

Requirements should be Python 3.11 or higher + Linux/X11 comments

Python 3.11 introduced a new StrEnum that is not available in 3.10.

It is used in taggui, hence Python 3.10 cannot be used.
Python 3.10 is currently the default in quite a few distributions, so if it's a small change, this could make it easier to get it to run.
I've made a Python 3.11 virtualenv that seems to work fine, though I had to make sure libxcb-cursor0 was installed to get the X11 backend for Qt to work properly.
The error for this is very unobvious.
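
If Python 3.10 support is desired, one common workaround is a conditional import with a minimal stand-in (a sketch; a str/Enum mixin differs slightly from 3.11's StrEnum in how str() renders members):

import sys

if sys.version_info >= (3, 11):
    from enum import StrEnum
else:
    from enum import Enum

    class StrEnum(str, Enum):
        # Members compare and behave as strings, which covers most uses.
        pass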

Next to this, there might be some gentle fixes that would make getting it to run on Linux a bit easier:

  • No explanation of how to use virtualenv/pip3 for 'generic users'
  • Running 'taggui/run_gui.py' won't work; it requires you to start it as 'python3 taggui/run_gui.py', as the script doesn't include a #! line to declare the interpreter.
  • Comments about XCB/Wayland debugging (just a pointer to some other webpage) could be handy

For the rest, still checking out the tool. Thanks for the work already!

failed to execute script

Traceback (most recent call last):
  File "run_gui.py", line 24, in <module>
  File "run_gui.py", line 18, in run_gui
  File "widgets\main_window.py", line 153, in __init__
  File "widgets\main_window.py", line 530, in restore
  File "widgets\main_window.py", line 184, in load_directory
  File "models\image_list_model.py", line 100, in load_directory
  File "pathlib.py", line 1059, in read_text
  File "encodings\cp1252.py", line 23, in decode
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 116: character maps to <undefined>

BLIP captioning sometimes produces characters outside what this app will accept. The larger issue is that once a directory containing caption files with offending characters has produced this error, you can no longer open the app unless you rename or relocate the offending directory, which is fine unless you forget which directory you last attempted. Please provide a fix or workaround. Where does the app store the record of the last directory attempted?
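
As a workaround sketch: the traceback shows the caption file being read with the Windows cp1252 codec, so reading the files as UTF-8 with a fallback would avoid the crash (illustrative code, not the app's actual fix):

from pathlib import Path

def read_caption(path: Path) -> str:
    # BLIP output is typically UTF-8; replace any undecodable bytes
    # instead of crashing on them.
    try:
        return path.read_text(encoding='utf-8')
    except UnicodeDecodeError:
        return path.read_text(encoding='utf-8', errors='replace')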

How can I restore default preferences?

If I changed the advanced settings, is it possible to restore everything to the defaults? I tried deleting everything related to the installation and reinstalling, but it didn't help.

Support for .webp images

It would be nice to add support for opening folders that contain .webp images. Currently, when trying to open one, it doesn't open and gives no error message. When quitting and reopening the software, it throws an error window and quits the program. When reopening again, the software works, but it asks for a new folder.

Feature request: word wrap on tag editing

Captioned data:
image
Then on double-clicking to edit, it reverts to a single line:
image

This is very difficult to navigate when editing. I think it would be nice if it worked like the Prompt box, which has word wrap. I find myself wanting to tweak auto-captioned data slightly to correct errors, but when the text scrolls it is difficult to navigate, even though I know to use keyboard shortcuts such as Ctrl+arrow keys, Home, End, etc.

I think carriage returns are generally not used in captions? Shift+Enter doesn't seem to behave differently from the plain Enter key, so I don't think the application supports them in the caption data anyway. I would think word wrapping would not cause issues if it were to behave like the auto-caption Prompt input box, like below:

image

ERROR: torch-2.1.2+cu121-cp311-cp311-win_amd64.whl is not a supported wheel on this platform

I am getting this error when installing from requirements.txt. I am on Windows 10 with Python 3.10.9 and CUDA 11.8.

(venv) C:\Users\r\taggui>pip install -r requirements.txt
Ignoring bitsandbytes: markers 'platform_system != "Windows"' don't match your environment
Collecting bitsandbytes==0.41.2.post2 (from -r requirements.txt (line 11))
  Downloading https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl (152.7 MB)
     ---------------------------------------- 152.7/152.7 MB 36.4 MB/s eta 0:00:00
Ignoring torch: markers 'platform_system != "Windows"' don't match your environment
ERROR: torch-2.1.2+cu121-cp311-cp311-win_amd64.whl is not a supported wheel on this platform.

(venv) C:\Users\r\taggui>pip install https://download.pytorch.org/whl/cu121/torch-2.1.2%2Bcu121-cp311-cp311-win_amd64.whl
ERROR: torch-2.1.2+cu121-cp311-cp311-win_amd64.whl is not a supported wheel on this platform.

[Model Request] Anime Captioning model in WD14Tagger

Clear Image Filter Suggestion

I suggest clearing the Search Tags input when Clear Image Filter is clicked. Also, clear Search Tags after changing the directory.

Feature request: Copy tags from one image to another

Would be nice to be able to transfer tags from one image to another.
A common use case is tagging a set of images that are related to each other. Often many tags are common -- when I'm going down a list of images, I like to use the previous image's captions as a starting point.
I didn't find a way to do this in the GUI.

PySide6-6.6.2 error on start (python 3.11.8 on windows)

After updating to df23360

Startup error on Windows using latest requirements.txt (PySide6-6.6.2), working normally using PySide6-6.6.1:

PS C:\Users\user\projects\taggui> .\venv\scripts\activate.ps1
(venv) PS C:\Users\user\projects\taggui> git pull
Already up to date.
(venv) PS C:\Users\user\projects\taggui> pip install --upgrade -r requirements.txt
...
Successfully installed PySide6-6.6.2 PySide6-Addons-6.6.2 PySide6-Essentials-6.6.2 shiboken6-6.6.2
(venv) PS C:\Users\user\projects\taggui> python .\taggui\run_gui.py
Traceback (most recent call last):
  File "signature_bootstrap.py", line 77, in bootstrap
  File "signature_bootstrap.py", line 93, in find_incarnated_files
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\pathlib.py", line 871, in __new__
    self = cls._from_parts(args)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\pathlib.py", line 509, in _from_parts
    drv, root, parts = self._parse_args(args)
                       ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Local\Programs\Python\Python311\Lib\pathlib.py", line 493, in _parse_args
    a = os.fspath(a)
        ^^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not NoneType
Fatal Python error: could not initialize part 2
Python runtime state: initialized

Current thread 0x00002d98 (most recent call first):
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1233 in create_module
  File "<frozen importlib._bootstrap>", line 573 in module_from_spec
  File "<frozen importlib._bootstrap>", line 676 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1232 in _handle_fromlist
  File "C:\Users\user\projects\taggui\venv\Lib\site-packages\PySide6\__init__.py", line 64 in _setupQtDirectories
  File "C:\Users\user\projects\taggui\venv\Lib\site-packages\PySide6\__init__.py", line 124 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1147 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1126 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1176 in _find_and_load
  File "C:\Users\user\projects\taggui\taggui\run_gui.py", line 4 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, xxsubtype (total: 14)
(venv) PS C:\Users\user\projects\taggui> pip install --upgrade PySide6==6.6.1
...
Successfully installed PySide6-6.6.1 PySide6-Addons-6.6.1 PySide6-Essentials-6.6.1 shiboken6-6.6.1
(venv) PS C:\Users\user\projects\taggui> python .\taggui\run_gui.py
bin C:\Users\user\projects\taggui\venv\Lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
(venv) PS C:\Users\user\projects\taggui> # No errors with PySide6==6.6.1

[Feature Request] Multiple tag selection & exclusion

First of all, great job on the UI! It's really easy to use and allows for fast tagging, as advertised.

Now for my feature requests. It would be great to have the ability to select multiple tags for filtering, as in, select all the images that have a certain set of tags. It would also be nice to be able to do exclusions based on tags, that is, show all the images except the ones with a particular tag (or several tags).

Feature request: Vertical resize of Prompt field for autocaption.

image
The prompt window is pretty small by default and is difficult to navigate using the tiny scroll bar on the right. It can be expanded horizontally, but doing so isn't very space-efficient.

It would be nice to have a control to increase the vertical size of this box.

Bug: Images do not use EXIF rotation directive

image
On the left is the Windows Explorer thumbnail view; on the right is TagGUI.

These are images taken with a phone, and the way most phones store them is to save the pixels directly and then put a tag with the correct orientation in the EXIF data. Unfortunately, PIL ignores this by default, and you have to explicitly call a function from PIL's ImageOps to transpose it:

from PIL import Image, ImageOps
...
image = Image.open(path_name).convert('RGB')
image = ImageOps.exif_transpose(image)

Model Request: Moondream

Please add support for this model: https://github.com/vikhyat/moondream

An extra idea, which may or may not be feasible (I do not know), is speculative decoding using a smaller model like this: https://arxiv.org/abs/2310.07177

My experience with speculative decoding in LLMs, at least, is that it greatly speeds up inference time, and perhaps doing the same thing with CogVLM as the main model and Moondream as the speculative decoding model could speed up captioning of large datasets.

Feature request: change images with Page Up / Page Down as well

Going through a huge list of images while working on the tags, the arrow keys (cursor up and down) change the selected tag in the top-right window, as it has the focus.
To get to the next image, I have to click on the left, and then I can use the arrow keys to switch between images again.

It would be great if Page Up / Page Down always switched to the next image; that would save me multiple clicks per image!

[error] You shouldn't move a model when it is dispatched on multiple devices.

Loading BLIP-2 model...
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|##### | 1/2 [00:23<00:23, 23.80s/it]
Loading checkpoint shards: 100%|##########| 2/2 [00:35<00:00, 16.76s/it]
Loading checkpoint shards: 100%|##########| 2/2 [00:35<00:00, 17.82s/it]
You shouldn't move a model when it is dispatched on multiple devices.
Traceback (most recent call last):
  File "widgets\blip_2_captioner.py", line 316, in run
  File "accelerate\big_modeling.py", line 410, in wrapper
RuntimeError: You can't move a model that has some modules offloaded to cpu or disk.

I have selected "CPU" for blip 2 captioning, as my vram is 8GB. My system ram is 44GB so easily capable of handling blip2.

Feature request: wildcards in filter

It would be great if I could use wildcards in the tag filter to filter the images shown (a * would be enough; a complete regex might be overkill).

Unhandled exception

Traceback (most recent call last):
  File "run_gui.py", line 24, in <module>
  File "run_gui.py", line 18, in run_gui
  File "widgets\main_window.py", line 153, in __init__
  File "widgets\main_window.py", line 530, in restore
  File "widgets\main_window.py", line 184, in load_directory
  File "models\image_list_model.py", line 100, in load_directory
  File "pathlib.py", line 1059, in read_text
  File "encodings\cp1252.py", line 23, in decode
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 321: character maps to <undefined>

The above happens when using captionr to create BLIP captions. It appears to be related to characters used in the caption .txt files. If I manually edit the caption files to remove certain suspected words or punctuation, TagGUI will load. Is there a limit to how many files this app can handle? Are there ASCII or non-ASCII characters that do not work? I am pulling captions via BLIP from ViT-L-14/openai.
