
<span class="emp-dy"> SoloTagger is an LLM-based image captioning tool. </span>
Over the past couple of days I’ve been testing some of the smaller **Qwen3.5 models** on my laptop, including the **2B, 4B, and 9B versions**. I wanted to see whether these mini models could be useful for my workflow.
Initially, I planned to use them to tag datasets for LoRA training. After many attempts they technically worked, but the results were not very satisfying.
When it comes to image tagging, **JoyCaption** naturally comes to mind. So I casually put together this simple little tool; let's call it: <span class="emp-dg"> SoloTagger </span>.
Running JoyCaption locally on a laptop isn’t super fast, but it’s still acceptable. For my tagging needs, the speed is good enough.
First, **SoloTagger** is just a simple Python script with two basic functions:
1. **Send images and tagging instructions to JoyCaption.**
2. **Receive the output from JoyCaption and save it into a TXT file.**
SoloTagger itself does **not** run the model. It relies on a third-party LLM runtime, such as **LM Studio**, **Ollama**, or other model runners. You can think of these tools as a **“player”** for large language models.
In this guide, I use **LM Studio** as the example because it has a graphical interface, works well on Windows, and is easy for regular users to get started with.
Also, I always try to **keep things simple**. So SoloTagger does not include unnecessary third-party libraries just for convenience. It has **zero external dependencies**. As long as you have Python installed, you can run it.
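Those two functions boil down to very little code when the model runner speaks the OpenAI-compatible chat API (which LM Studio does). Here is a stdlib-only sketch; the helper names (`build_payload`, `caption_image`) are my own illustration, not necessarily how SoloTagger itself is written:

```python
# Stdlib-only sketch of the two core functions: send an image plus a
# prompt to an OpenAI-compatible endpoint, then save the reply as .txt.
import base64
import json
import urllib.request
from pathlib import Path

API_URL = "http://127.0.0.1:1234/v1/chat/completions"  # LM Studio default

def build_payload(image_bytes: bytes, prompt: str, model: str) -> dict:
    """Wrap the image (as a base64 data URL) and the tagging
    instructions in an OpenAI-style chat request."""
    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }

def caption_image(path: Path, prompt: str, model: str) -> str:
    """Send one image and return the generated caption."""
    payload = build_payload(path.read_bytes(), prompt, model)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]

# Saving uses the image's own name, so photo.jpg produces photo.txt:
# path.with_suffix(".txt").write_text(caption_image(path, prompt, model))
```

No `requests`, no SDK: `urllib.request` and `base64` from the standard library are enough, which is exactly why the script can stay dependency-free.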
There are many ways and tools to tag images, and JoyCaption itself can be used in different ways. SoloTagger was just a quick idea that I threw together as a small, simple tool.
If you want to run JoyCaption locally, you can use this method as a reference, or just download and use SoloTagger directly.
**I know that releasing a tagging tool at this point is rather outdated, not to mention how simple this one is. However, this post serves as a record of my own daily use and can also act as an introductory guide for beginners.**
## I. Basic Hardware Configuration and LM Studio Installation
### A. Hardware Configuration
Still using my old partner, the low-spec laptop. The basic configuration is as follows:
- **CPU**: Intel Core Ultra 5 225H
- **RAM**: 32GB
- **GPU**: NVIDIA GeForce RTX 5060 Laptop
- **VRAM**: 8GB
### B. Download and Install LM Studio
LM Studio official website: [https://lmstudio.ai](https://lmstudio.ai)
I downloaded the **Windows version 0.4.6**. The installation is straightforward. Just keep pressing **Next** until it finishes.
When you start it for the first time after installation, it will ask if you want to download a model. The default small model it suggests **doesn't seem very useful**, so you don't need to download it if you don't want to.
LM Studio uses a folder called <span class="emp-y">App home directory</span> to store models and related data. The default location is:
`C:\Users\XXX\.lmstudio`
Here **XXX** is your Windows username.
If you want to move it somewhere else, the simplest way is to create a Symbolic Link as follows:
a. Move the entire <span class="emp-y">.lmstudio</span> folder to the location you want.
**Note: do not change the folder name. The dot before _lmstudio_ must stay.**
b. Press <span class="emp-dg">Win</span> + <span class="emp-dr">R</span>, type <span class="emp-y">cmd</span>, and open the command prompt.
c. Run the following command:
```
mklink /D C:\Users\XXX\.lmstudio D:\abc\.lmstudio
```
Replace **XXX** with your Windows username.
Replace **D:\abc** with the location where you want to store the folder.
After that, start **LM Studio** and you should see an interface like this.

Click <span class="emp-dg">New Chat</span> to open a conversation window with the model. Then click <span class="emp-dg">Pick a model</span> at the bottom and choose one of the models you downloaded to start chatting.
No extra setup is required. Once the model is downloaded, you can use it right away. The experience is basically the same as using LLMs through a typical web interface.

## II. Using JoyCaption in LM Studio
### A. Download the JoyCaption Model
On my laptop, LM Studio stores models in this folder:
`.lmstudio\models\lmstudio-community`.
Your path should be similar.
You can create different subfolders here to organize models. Folder names can be anything that helps you recognize them. For example, I created these three folders:
- JoyCaption-beta-one-GGUF
- Qwen3.5-4B-GGUF
- Qwen3.5-9B-GGUF
The latest JoyCaption version is **Beta One**. For running on a laptop, the **GGUF** version is usually the best choice. Pick a GGUF model that fits your hardware. I currently use **llama-joycaption-beta-one-hf-llava.i1-Q6_K.gguf**.
Just search **joycaption beta one gguf** on HuggingFace and you will easily find the models. For example, these two repositories provide many different builds:
https://huggingface.co/Mungert/llama-joycaption-beta-one-hf-llava-GGUF/tree/main
https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-GGUF/tree/main
My suggestion is to choose a model slightly smaller than your available VRAM.
In general, smaller models run faster but produce lower quality results. If you want, you can try different versions and compare.
You also need to download an <span class="emp-y">mmproj</span> file. A simple way to think about it is that this file acts like the model’s **“eyes”**. Without it, the model cannot see images.
You can download the mmproj file from either of these repositories:
https://huggingface.co/JermemyHaschal/llama-joycaption-beta-one-hf-llava-gguf/tree/main
https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main
### B. Using the JoyCaption Model
Once you have downloaded the two files above, you can start using JoyCaption in LM Studio.
After loading the model, click the <span class="emp-dg">+</span> button in the lower right corner to send an image to the model.

If you click the <span class="emp-dg">+</span> button and don’t see the <span class="emp-dg">Attach image</span> option, the most likely reason is that the **mmproj file is missing or not working properly**.
JoyCaption is designed specifically for **image captioning**, so once you send it an image, it will generate a description automatically, even without any prompt.

If you have specific requirements for the output, you can provide instructions to **JoyCaption** in the chat just as you would with any other large language model.
Getting JoyCaption to produce descriptions that better match your needs is also similar to using other LLMs. You need to adjust and refine your prompts. This isn’t the focus of this article, so I won’t go into detail. There are plenty of useful references online. In my experience, **ChatGPT** and **Gemini** can help write good prompts and refine them based on your feedback.
At this point, JoyCaption is already fully set up and running locally. In fact, you don’t really need something like **SoloTagger** at all.
If you only want to send a few images to JoyCaption for testing, you can stop here. There’s no need to read further.
<br><br><br>
**However…**
<br><br><br>
If you often need to caption dozens or even hundreds of images with different requirements, and you want to generate **txt files with the same names as the images**, then a small tool like **SoloTagger** becomes useful to handle some dirty work. There are two main reasons, which correspond to the two functions mentioned earlier.
First, **LM Studio currently cannot directly create or edit files like VS Code**. The captions generated by JoyCaption have to be copied and pasted manually. Doing this by hand is tedious and error-prone.
Of course, if you are a wicked employer with workhorses like **OpenClaw** working for you, just ignore this.
Second, and more importantly, **sending multiple images to JoyCaption at once greatly increases VRAM usage**. On my entry-level laptop, it quickly becomes sluggish.
If you use **Qwen3.5-2B-GGUF**, **Qwen3.5-4B-GGUF**, or even smaller models, the speed is fine, but new and **more serious problems** arise: the model becomes forgetful and mixes things up. If you send it 10 images, it might only output 7 descriptions, and the order of the output could be jumbled. You would then have to check each description against the corresponding image one by one 😵💫.
**SoloTagger**, and many other captioning tools, help solve these two problems.
## III. SoloTagger Quick Guide
### A. Structure and Configuration
SoloTagger mainly consists of <span class="emp-y">one folder and two files</span>.
#### 1. Folder: img
Place the images you want to caption in this folder.
The generated **txt caption files** will also be saved here.
You can also change the path in the configuration if you prefer using another folder.
#### 2. Main File: SoloTagger.py
There are **at least three ways** to run it.
a. Run it from the command line:
```
python SoloTagger.py
```
No arguments are required.
b. Double-click the **SoloTagger.py** file.
If `.py` files are associated with Python on your system (the default when you install Python with the official installer), you can simply double-click the `.py` script to run it.
c. Create a simple **.bat file** to run it.
Personally, I think this is a bit unnecessary.
#### 3. Config File: config.json
To be honest, I hate manually editing JSON files, but I endured it to avoid introducing extra third-party libraries.
Open **config.json** and you will see something like this:
```
{
  "image_folders": [
    "img"
  ],
  "API_URL": "http://127.0.0.1:1234/v1/chat/completions",
  "models": {
    "1": "joycaption-beta-one",
    "2": "qwen3.5-9b",
    "3": "qwen3.5-4b"
  },
  "prompts": [
    {
      "title": "Natural Caption",
      "text": "Describe this image in a concise, natural paragraph. Focus on the main subject and the environmental atmosphere. Use descriptive but direct language. No artistic fluff, just the facts."
    },
    {
      "title": "TagGen Minimal",
      "text": "List only comma-separated tags. No sentences. Focus on: subject, pose, clothing, background, lighting, and style."
    }
  ]
}
```
The configuration file has **four sections**.
a. image_folders
This sets the path of the images to be captioned. The default location is the **img** folder inside the SoloTagger directory.
You can change it to another path, for example: `d:/abc/xyz/`.
Usually there is no need to modify it. If you do change it, be careful with the path separator: keep the forward slash **"/"** instead of the Windows default **"\"**. In JSON, the backslash is an escape character, so a raw Windows path breaks parsing (each `\` would have to be written as `\\`).
✅ **Correct:** `d:/abc/xyz/`
❌ **Incorrect:** `d:\abc\xyz\`
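A quick stdlib check shows what goes wrong: JSON treats the backslash as an escape character, so the forward-slash form parses cleanly while the raw Windows form raises an error.

```python
# Forward slashes are valid in JSON strings; raw backslashes are escapes.
import json

good = '{"path": "d:/abc/xyz/"}'
print(json.loads(good)["path"])  # d:/abc/xyz/

bad = '{"path": "d:\\abc\\xyz\\"}'  # the file would contain d:\abc\xyz\
try:
    json.loads(bad)
except json.JSONDecodeError as e:
    print("invalid JSON:", e)
```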
b. API_URL
The API address of the model you want to call.
See the next section for details.
c. models
The name of the model to use.
In this example, the values for both **b** (API_URL) and **c** (models) come from **LM Studio**.
You need to **start the Local Server in LM Studio** to get them.
See the image below for reference.

Follow the three steps in the image above in order:
1. Open the **Local Server** interface;
2. Start the service and load the model;
3. Then you will obtain the two items needed for the config file: **models** and **API_URL**.
The `http://127.0.0.1:1234/v1/chat/completions` in the config file is the **default API address used by LM Studio**.
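OpenAI-compatible servers like LM Studio also expose the list of available models at `/v1/models` on the same address, which is how a script can verify the selected model before sending any images. A sketch of such a pre-check (the function names are my own):

```python
# Derive the /v1/models listing endpoint from the chat/completions URL,
# then (when a server is running) look for the selected model in the list.
import json
import urllib.request

def models_endpoint(api_url: str) -> str:
    """http://host:port/v1/chat/completions -> http://host:port/v1/models"""
    return api_url.rsplit("/chat/completions", 1)[0] + "/models"

def model_available(api_url: str, target_model: str) -> bool:
    """Ask the server which models it has and check for target_model."""
    with urllib.request.urlopen(models_endpoint(api_url), timeout=5) as resp:
        data = json.load(resp)
    return target_model in [m["id"] for m in data.get("data", [])]
```

The `id` values returned by that endpoint are exactly the strings that belong in the **models** section of the config file.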
d. prompts
The prompt is what has the greatest impact on tagging quality; these are the instructions and rules given to the model. I have written two of the simplest examples: natural language description and tag-style description.
Different images and different purposes require different tagging standards, and everyone has their own specific requirements. Therefore, you can modify or add new prompts according to your own needs.
### B. SoloTagger Workflow
Even though I’ve rambled on quite a bit, using SoloTagger is actually extremely simple. Just three steps:
1. Run **LM Studio** and start the **Local Server**.
2. Put the images you want to tag into the target folder.
3. Double-click the file to run **SoloTagger**.
As long as the **LM Studio Local Server** is running, SoloTagger will work. Even if the model is not loaded yet, **LM Studio will automatically load the required model** when it receives the request.
During the process, you can also **see the progress in the LM Studio interface**.

### C. Common Errors
To make troubleshooting easier, the script includes several basic **error messages and hints**.
1. `Error: config file not found: {config_path}`
`config.json` cannot be found at startup.
2. `Error: failed to load config file as JSON: {e}`
`config.json` is not valid JSON, or it failed to load.
3. `Error: '{key}' must be a non-empty list in config.json`
`image_folders` or `prompts` is missing, not a list, or empty.
4. `Error: each item in 'image_folders' must be a non-empty string`
`image_folders` contains an empty string or a non-string value.
5. `Error: 'models' must be a non-empty mapping in config.json`
`models` is missing, not an object (dict), or empty.
6. `Error: 'API_URL' must be a non-empty string in config.json`
`API_URL` is missing, not a string, or empty.
7. `Error: image folder does not exist: {image_folder}`
The selected image folder does not exist.
8. `Error: Unable to access API_URL: {api_url}`
The API endpoint is unreachable (network error, invalid URL, etc.).
9. `Error: Selected model is unavailable: {target_model}`
During the pre-check, the model is not found in the `/models` list, or the server reports the model as unavailable.
10. `Error: No available images to process. All images already have .txt tags.`
Every image already has a corresponding `.txt` file.
11. `Server response: {body}`
When the model is unavailable, the script prints the raw error response from the server.
12. `Error processing {file}: HTTP {e.code} - {body}`
The API returned an HTTP error while processing a specific image.
13. `Error processing {file}: {e}`
A URL/network error or another unexpected exception occurred while processing an image.
14. `Available models: ...`
The selected model is not in the server’s available model list. The script will show the models that are currently available.
15. `Warning: No image files found in the folder.`
No `.png`, `.jpg`, `.jpeg`, or `.webp` image files were found in the target folder.
<hr>
**Download:** <a href="https://assets.sololo.xyz/articles/024/SoloTagger_v0.1.zip">SoloTagger_v0.1.zip</a>
I didn’t expect such a simple little tool to end up with such a long explanation. Maybe I’m just getting older and more talkative.😮💨
BTW, those **Qwen3.5 mini models** I mentioned at the beginning still haven’t shown much practical use for me so far.
<br><br>
What I'm thinking now is:
within a few years, everything from simple scripts like SoloTagger to much of the software people commonly use today will become obsolete.
In the near future, the way people use computers, phones, and other devices will undergo a massive change.
And in the foreseeable future, something as clunky and inelegant as a **computer** is something only silicon-based lifeforms would use...