Loading article...

SoloTagger v0.12: Easier Prompt Editing

  SoloTagger is an LLM-based image captioning tool.  

Ever since I built this little tool, SoloTagger, it has become my main tool for dataset annotation. Recently all of my dataset labeling work has been done with it.

To create high quality LoRAs, high quality dataset annotations are essential. Generating captions with a large language model is similar to generating images with one. If you want good results, a well designed prompt is necessary.

Different datasets and different goals require different tagging styles, which means different prompts. Over the past few days I have been tweaking and optimizing the prompts used by SoloTagger. During this process I ran into an annoying issue. In the previous version all prompts were stored in a JSON file, and manually editing JSON files is something I really hate.

So SoloTagger v0.12 was born. 😄

## What's Changed
- Added a prompt.txt file to store prompts.
- The config.json file now only keeps two parameters: API_URL and models.

Prompt Editing Guide

The structure of the prompt.txt file looks like this:
```
==Natural Language==
Describe the image content in natural language, in no more than 200 words.

==Tags==
Describe the content of the image using tags.

==Titre==
Veuillez décrire le contenu de l'image en détail.

==Заголовок==
Пожалуйста, подробно опишите содержание изображения.

==标题==
请描述图片内容

==タイトル==
画像の内容を詳しく説明してください。
```

You can store multiple prompts in the file. Each prompt starts with a title in the format ==Title==, followed by the actual prompt content.

The title is only used as a display label so users can select different prompts. It is not sent to the language model.
The title must be written exactly in this format: ==Title==.

Below the title is the prompt content that will be sent to the model. You can write these prompts based on your own needs. Different prompts should be separated by a blank line.

When editing prompts, you do not need to worry about characters or formatting. Before sending the prompt, SoloTagger will automatically escape line breaks, quotation marks, and other characters when necessary.

The prompt.txt file uses UTF-8 encoding, so it supports multiple languages.

For detailed instructions on using SoloTagger and installing and setting up LM Studio on Windows, please see the previous article:
https://sololo.xyz/article/24-solotagger-local-joycaption-beta-one-gguf-setup-on-windows-via-lm-studio

Usage Notes

I deployed JoyCaption beta one and Qwen3.5 0.8B, 4B, and 9B on my laptop, all in GGUF format.
After a few days of testing and comparison, here are some of my personal impressions:
- Qwen3.5 0.8B is the smallest model, so it is the fastest. The tagging quality is not bad, but it is slightly weaker than the larger models.

- JoyCaption beta one and Qwen3.5 9B both handle tagging tasks very well, and the speed is also quite good.

- Qwen3.5 follows prompts more reliably.

- When using longer prompts, around 500 words, Qwen3.5 9B produces better outputs and can even outperform JoyCaption.



Download: SoloTagger_v0.12.zip

You Might Also Like View All
Cover: SoloCropper: Efficient Human Image Cropping for Dataset Preparation
SoloCropper: Efficient Human Image Cropping for Dataset Preparation
Cover: Gemma 4: A Powerful New Weapon for SoloTagger
Gemma 4: A Powerful New Weapon for SoloTagger
Cover: Z Image LoRA Commissions Now Open
Z Image LoRA Commissions Now Open
Cover: Price Update for Model Orders
Price Update for Model Orders
Cover: SoloTagger v0.20: Improved Caption Output Quality
SoloTagger v0.20: Improved Caption Output Quality
Cover: SoloTagger: Local JoyCaption Beta One GGUF Setup on Windows via LM Studio
SoloTagger: Local JoyCaption Beta One GGUF Setup on Windows via LM Studio
Cover: Basic Workflow: FLUX.2-klein I2I v2.2  |  4-in-1 image editing
Basic Workflow: FLUX.2-klein I2I v2.2 | 4-in-1 image editing
Cover: Training Notes 2: Z Image LoRA - Phase Recap
Training Notes 2: Z Image LoRA - Phase Recap
Cover: The new version is coming. What surprises will it bring to the world this time?
The new version is coming. What surprises will it bring to the world this time?
Cover: Training Notes: My Take on Klein & Z Image
Training Notes: My Take on Klein & Z Image
Cover: Basic Workflow: Z Image T2I v1.3
Basic Workflow: Z Image T2I v1.3
Cover: Basic Workflow: FLUX.2-klein I2I v2.0  |  4-in-1 image editing
Basic Workflow: FLUX.2-klein I2I v2.0 | 4-in-1 image editing