Loading article...

SoloTagger: Local JoyCaption Beta One GGUF Setup on Windows via LM Studio

  SoloTagger is an LLM-based image captioning tool.  

Over the past couple of days I’ve been testing some of the smaller Qwen3.5 models on my laptop, including the 2B, 4B, and 9B versions. I wanted to see whether these mini models could be useful for my workflow.

Initially, I planned to use them to tag datasets for LoRA training. After many attempts they technically worked, but the results were not very satisfying.

When it comes to image tagging, JoyCaption naturally comes to mind. So I casually made this simple little tool, let's call it: SoloTagger .

Running JoyCaption locally on a laptop isn’t super fast, but it’s still acceptable. For my tagging needs, the speed is good enough.

First, SoloTagger is just a simple Python script with two basic functions:

1. Send images and tagging instructions to JoyCaption.

2. Receive the output from JoyCaption and save it into a TXT file.

SoloTagger itself does not run the model. It relies on a third-party LLM runtime, such as LM Studio, Ollama, or other model runners. You can think of these tools as a “player” for large language models.
In this guide, I use LM Studio as the example because it has a graphical interface, works well on Windows, and is easy for regular users to get started with.

Also, I always try to keep things simple. So SoloTagger does not include unnecessary third-party libraries just for convenience. It has zero external dependencies. As long as you have Python installed, you can run it.

There are many ways and tools to tag images, and JoyCaption itself can be used in different ways. SoloTagger was just a quick idea I had and threw together as a small, simple tool.
If you want to run JoyCaption locally, you can use this method as a reference, or just download and use SoloTagger directly.

I know that releasing a tagging tool now is quite outdated, not to mention how simple this tool is. However, this post serves as a record of my own daily use and can also act as an introductory guide for beginners.

## I. Basic Hardware Configuration and LM Studio Installation
### A. Hardware Configuration
Still using my old partner, the low-spec laptop. The basic configuration is as follows:
- CPU: Intel Core Ultra 5 225H

- RAM: 32GB

- GPU: NVIDIA GeForce RTX 5060 Laptop

- VRAM: 8GB

### B. Download and Install LM Studio
LM Studio official website: https://lmstudio.ai

I downloaded the Windows version 0.4.6. The installation is straightforward. Just keep pressing Next until it finishes.
When you start it for the first time after installation, it will ask if you want to download a model. The default small model it suggests doesn't seem very useful, so you don't need to download it if you don't want to.

LM Studio uses a folder called App home directory to store models and related data. The default location is:

C:\Users\XXX\.lmstudio

Here XXX is your Windows username.

If you want to move it somewhere else, the simplest way is to create a Symbolic Link as follows:
a. Move the entire .lmstudio folder to the location you want.
Note: do not change the folder name. The dot before lmstudio must stay.

b. Press Win + R, type cmd, and open the command prompt.

c. Run the following command:
``
mklink /D C:\Users\XXX\.lmstudio D:\abc\.lmstudio
``
Replace XXX with your Windows username.
Replace D:\abc with the location where you want to store the folder.

After that, start LM Studio and you should see an interface like this.
LM Studio Interface

Click New Chat to open a conversation window with the model. Then click Pick a model at the bottom and choose one of the models you downloaded to start chatting.

No extra setup is required. Once the model is downloaded, you can use it right away. The experience is basically the same as using LLMs through a typical web interface.
SelectModel

## II. Using JoyCaption in LM Studio
### A. Download the JoyCaption Model
On my laptop, LM Studio stores models in this folder:
.lmstudio\models\lmstudio-community.
Your path should be similar.
You can create different subfolders here to organize models. Folder names can be anything that helps you recognize them. For example, I created these three folders:

- JoyCaption-beta-one-GGUF

- Qwen3.5-4B-GGUF

- Qwen3.5-9B-GGUF

The latest JoyCaption version is Beta One. For running on a laptop, the GGUF version is usually the best choice. Pick a GGUF model that fits your hardware. I currently use llama-joycaption-beta-one-hf-llava.i1-Q6_K.gguf.

Just search joycaption beta one gguf on HuggingFace and you will easily find the models. For example, these two repositories provide many different builds:

https://huggingface.co/Mungert/llama-joycaption-beta-one-hf-llava-GGUF/tree/main
https://huggingface.co/mradermacher/llama-joycaption-beta-one-hf-llava-GGUF/tree/main

My suggestion is to choose a model slightly smaller than your available VRAM.
In general, smaller models run faster but produce lower quality results. If you want, you can try different versions and compare.

You also need to download an mmproj file. A simple way to think about it is that this file acts like the model’s “eyes”. Without it, the model cannot see images.
You can download the mmproj file from either of these repositories:
https://huggingface.co/JermemyHaschal/llama-joycaption-beta-one-hf-llava-gguf/tree/main
https://huggingface.co/concedo/llama-joycaption-beta-one-hf-llava-mmproj-gguf/tree/main

### B. Using the JoyCaption Model
Once you have downloaded the two files above, you can start using JoyCaption in LM Studio.
After loading the model, click the + button in the lower right corner to send an image to the model.
Send Image

If you click the + button and don’t see the Attach image option, the most likely reason is that the mmproj file is missing or not working properly.

JoyCaption is designed specifically for image captioning, so once you send it an image, it will generate a description automatically, even without any prompt.
JoyCaption Auto Caption

If you have specific requirements for the output, you can provide instructions to JoyCaption in the chat just as you would with any other large language model.
Getting JoyCaption to produce descriptions that better match your needs is also similar to using other LLMs. You need to adjust and refine your prompts. This isn’t the focus of this article, so I won’t go into detail. There are plenty of useful references online. In my experience, ChatGPT and Gemini can help write good prompts and refine them based on your feedback.

At this point, JoyCaption is already fully set up and running locally. In fact, you don’t really need something like SoloTagger at all.

If you only want to send a few images to JoyCaption for testing, you can stop here. There’s no need to read further.




However…



If you often need to caption dozens or even hundreds of images with different requirements, and you want to generate txt files with the same names as the images, then a small tool like SoloTagger becomes useful to handle some dirty work. There are two main reasons, which correspond to the two functions mentioned earlier.

First, LM Studio currently cannot directly create or edit files like VS Code. The captions generated by JoyCaption have to be copied and pasted manually. Doing this by hand is tedious and error-prone.
Of course, if you are a wicked employer with workhorses like OpenClaw working for you, just ignore this.

Second, and more importantly, sending multiple images to JoyCaption at once greatly increases VRAM usage. On my entry-level laptop, it quickly becomes sluggish.
If you use Qwen3.5-2B-GGUF, Qwen3.5-4B-GGUF, or even smaller models, the speed is fine, but new and more serious problems arise: the model becomes forgetful and mixes things up. If you send it 10 images, it might only output 7 descriptions, and the order of the output could be jumbled. You would then have to check each description against the corresponding image one by one 😵‍💫.

SoloTagger, and many other captioning tools, help solve these two problems.

III. SoloTagger Quick Guide

### A. Structure and Configuration
SoloTagger mainly consists of one folder and two files.
#### 1. Folder: img
Place the images you want to caption in this folder.
The generated txt caption files will also be saved here.
You can also change the path in the configuration if you prefer using another folder.
#### 2. Main File: SoloTagger.py
There are at least three ways to run it.

a. Run it from the command line:
``
python SoloTagger.py
``
No arguments are required.

b. Double-click the SoloTagger.py file.

If your Windows PATH environment variable is set correctly for Python, you can simply double-click the .py script to run it.

c. Create a simple .bat file to run it.

Personally, I think this is a bit unnecessary.
#### 3. Config File: config.json
To be honest, I hate manually editing JSON files, but I endured it to avoid introducing extra third-party libraries.
Open config.json and you will see something like this:
``
{
"image_folders": [
"img"
],
"API_URL": "http://127.0.0.1:1234/v1/chat/completions",
"models": {
"1": "joycaption-beta-one",
"2": "qwen3.5-9b",
"3": "qwen3.5-4b"
},
"prompts": [
{
"title": "Natural Caption",
"text": "Describe this image in a concise, natural paragraph. Focus on the main subject and the environmental atmosphere. Use descriptive but direct language. No artistic fluff, just the facts."
},
{
"title": "TagGen Minimal",
"text": "List only comma-separated tags. No sentences. Focus on: subject, pose, clothing, background, lighting, and style."
}
]
}
``

The configuration file has four sections.

a. image_folders

This sets the path of the images to be captioned. The default location is the img folder inside the SoloTagger directory.

You can change it to another path, for example: d:/abc/xyz/.

Usually there is no need to modify it. If you do change it, be careful with the path separator.
The “/” must not be replaced with the Windows default “\”.

Correct: d:/abc/xyz/
Incorrect: d:\abc\xyz\

b. API_URL

The API address of the model you want to call.
See the next section for details.

c. models

The name of the model to use.
In this example, the values for b and c come from LM Studio.
You need to start the Local Server in LM Studio to get them.
See the image below for reference.
LM Studio Local Server

Follow the three steps in the image above in order:
1. Open the Local Server interface;
2. Start the service; load the model;
3. Then you will obtain the two items needed for the config file: models and API_URL.

The http://127.0.0.1:1234/v1/chat/completions in the config file is the default API address used by LM Studio.

d. prompts

The prompt is what has the greatest impact on tagging quality; these are the instructions and rules given to the model. I have written two of the simplest examples: natural language description and tag-style description.

Different images and different purposes require different tagging standards, and everyone has their own specific requirements. Therefore, you can modify or add new prompts according to your own needs.

B. SoloTagger Workflow

Even though I’ve rambled on quite a bit, using SoloTagger is actually extremely simple. Just three steps:
1. Run LM Studio and start the Local Server.

2. Put the images you want to tag into the target folder.

3. Double-click the file to run SoloTagger.

As long as the LM Studio Local Server is running, SoloTagger will work. Even if the model is not loaded yet, LM Studio will automatically load the required model when it receives the request.

During the process, you can also see the progress in the LM Studio interface.

SoloTagger in Action

### C. Common Errors
To make troubleshooting easier, the script includes several basic error messages and hints.

1. Error: config file not found: {config_path}
config.json cannot be found at startup.

2. Error: failed to load config file as JSON: {e}
config.json is not valid JSON, or it failed to load.

3. Error: '{key}' must be a non-empty list in config.json
image_folders or prompts is missing, not a list, or empty.

4. Error: each item in 'image_folders' must be a non-empty string
image_folders contains an empty string or a non-string value.

5. Error: 'models' must be a non-empty mapping in config.json
models is missing, not an object (dict), or empty.

6. Error: 'API_URL' must be a non-empty string in config.json
API_URL is missing, not a string, or empty.

7. Error: image folder does not exist: {image_folder}
The selected image folder does not exist.

8. Error: Unable to access APIURL: {apiurl}
The API endpoint is unreachable (network error, invalid URL, etc.).

9. Error: Selected model is unavailable: {target_model}
During the pre-check, the model is not found in the /models list, or the server reports the model as unavailable.

10. Error: No available images to process. All images already have .txt tags.
Every image already has a corresponding .txt file.

11. Server response: {body}
When the model is unavailable, the script prints the raw error response from the server.

12. Error processing {file}: HTTP {e.code} - {body}
The API returned an HTTP error while processing a specific image.

13. Error processing {file}: {e}
A URL/network error or another unexpected exception occurred while processing an image.

14. Available models: ...
The selected model is not in the server’s available model list. The script will show the models that are currently available.

15. Warning: No image files found in the folder.
No .png, .jpg, .jpeg, or .webp image files were found in the target folder.


Download: SoloTagger_v0.1.zip

I didn’t expect such a simple little tool to end up with such a long explanation. Maybe I’m just getting older and more talkative.😮‍💨

BTW, those Qwen3.5 mini models I mentioned at the beginning still haven’t shown much practical use for me so far.



What I'm thinking now is:

within a few years, whether it's simple scripts like SoloTagger or many of the software people commonly use today, they will all become obsolete.

In the near future, the way people use computers, phones, and other devices will undergo a massive change.

And in the foreseeable future, something as clunky and inelegant as a computer is something only silicon-based lifeforms would use...

You Might Also Like View All
Cover: SoloCropper: Efficient Human Image Cropping for Dataset Preparation
SoloCropper: Efficient Human Image Cropping for Dataset Preparation
Cover: Gemma 4: A Powerful New Weapon for SoloTagger
Gemma 4: A Powerful New Weapon for SoloTagger
Cover: Z Image LoRA Commissions Now Open
Z Image LoRA Commissions Now Open
Cover: Price Update for Model Orders
Price Update for Model Orders
Cover: SoloTagger v0.20: Improved Caption Output Quality
SoloTagger v0.20: Improved Caption Output Quality
Cover: SoloTagger v0.12: Easier Prompt Editing
SoloTagger v0.12: Easier Prompt Editing
Cover: Basic Workflow: FLUX.2-klein I2I v2.2  |  4-in-1 image editing
Basic Workflow: FLUX.2-klein I2I v2.2 | 4-in-1 image editing
Cover: Training Notes 2: Z Image LoRA - Phase Recap
Training Notes 2: Z Image LoRA - Phase Recap
Cover: The new version is coming. What surprises will it bring to the world this time?
The new version is coming. What surprises will it bring to the world this time?
Cover: Training Notes: My Take on Klein & Z Image
Training Notes: My Take on Klein & Z Image
Cover: Basic Workflow: Z Image T2I v1.3
Basic Workflow: Z Image T2I v1.3
Cover: Basic Workflow: FLUX.2-klein I2I v2.0  |  4-in-1 image editing
Basic Workflow: FLUX.2-klein I2I v2.0 | 4-in-1 image editing
Table of Contents