
Recently, most of my time and energy have been focused on experimenting with Z Image LoRA. In the 20+ days since my last *<a href="https://sololo.xyz/article/20-training-notes-my-take-on-klein-z-image">Training Note</a>*, I’ve completed **more than 80** additional Z Image LoRA training runs. Besides my own trial-and-error exploration, I’ve also learned from experiences shared by others in the community. Here I’m putting together some of my recent observations and takeaways as a brief interim recap.
## Disclaimer
1. I’m **not a professional**, and this is **not a rigorous or scientific study**. It’s simply a record of my personal impressions and experiences.
2. Unless otherwise specified, all LoRAs mentioned below refer to real-person character LoRAs.
3. How good a LoRA is, or how closely it resembles the target person, is highly **subjective** and **personal**. People can have very different opinions, sometimes even completely opposite ones. That’s totally normal. If your view differs from mine on this, that’s perfectly fine. **You are right; there is no need to argue with me.**
**Note:** All images in this post can be opened in a new tab via right-click to view the <span class="emp-dg">original full-size version</span>.
---
## I. Training Tools
So far, the training tools I’ve used include **kohya-ss musubi-tuner**, **ostris ai-toolkit**, and **Nerogar OneTrainer**.
**musubi-tuner** is the tool I’ve used the most and for the longest time. When I trained FLUX1 LoRAs before, I was using kohya-ss sd-scripts. Because I was already familiar with that workflow, musubi-tuner became my first choice when I started training Z Image, and it’s still my preferred tool right now.
I like it not only because the training results are solid, but also because the setup and workflow are very simple. The author also provides clear and concise documentation that covers what you actually need. In terms of documentation quality, I think musubi-tuner does the best job among the three.
One drawback is that LoRAs trained with musubi-tuner need to **be converted to another format**; otherwise ComfyUI throws backend errors when loading them. Those errors do not seem to affect the actual output, but I’m not sure whether similar or more serious issues might appear in other clients. To avoid potential compatibility problems across different users and software, all the LoRAs I release are converted versions.
However, the conversion process can cause some loss in performance. For ZIT LoRAs in particular, conversion **can sometimes reduce image quality**. For custom client work, I provide both the converted and unconverted versions, and I recommend using the original unconverted version if it works properly in their setup.
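For anyone curious what such a conversion actually involves: broadly, it comes down to renaming the weight keys (and sometimes rescaling alphas) into the layout the target client expects. Below is a rough Python sketch of the idea; the key prefixes are hypothetical examples, not the exact names musubi-tuner or ComfyUI use.

```python
# Hypothetical sketch of a LoRA format conversion: load the weights,
# rename the key prefixes to what the target client expects, and save.
# The prefixes below are illustrative; real tools may also rescale alphas.
from safetensors.torch import load_file, save_file

def convert_lora(src_path: str, dst_path: str) -> None:
    state_dict = load_file(src_path)
    converted = {}
    for key, tensor in state_dict.items():
        new_key = key.replace("lora_unet_", "diffusion_model.")  # example mapping only
        converted[new_key] = tensor
    save_file(converted, dst_path)

convert_lora("lora_trained.safetensors", "lora_converted.safetensors")
```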
Because of the issue mentioned above, I currently train ZIT LoRAs using **ostris ai-toolkit**. That’s actually the only scenario where I use ai-toolkit right now, and the results have been quite good.
ai-toolkit seems to have little to no official documentation. Maybe it’s simply because the tool is straightforward enough that detailed docs are not really necessary.
The setup and workflow in ai-toolkit are very simple, and all configuration options can be written directly in a config file. In this respect, it’s more convenient than musubi-tuner. Although musubi-tuner also supports config files, most training parameters still need to be set through the command line, so I usually have to modify two places each time. I’m not sure whether musubi-tuner can put everything into a config file, and to be honest I haven’t looked into that part of the documentation very carefully.
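For reference, musubi-tuner keeps the dataset definition in a TOML file roughly like the sketch below, while the optimizer, learning rate, and similar training parameters go on the command line. I’m writing the field names from memory of the docs, so treat them as approximate and check the official documentation before copying anything.

```toml
# Approximate sketch of a musubi-tuner dataset config; paths are placeholders.
[general]
resolution = [1024, 1024]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true

[[datasets]]
image_directory = "/data/character/images"
cache_directory = "/data/character/cache"
num_repeats = 1
```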
I only started using **Nerogar OneTrainer** in the past few days, and so far I’ve used it exclusively for training ZIB LoRAs. From what I can tell, it does have some noticeable advantages specifically for ZIB LoRA training. You can see more details in the image comparison section below. I’ll need more time with it before forming a fuller opinion.
The official documentation for Nerogar OneTrainer looks quite comprehensive, which is definitely a plus, but I haven’t had the chance to study it in depth yet.
I use all three training tools through the **command line on Ubuntu**. For me, the command line is simply easier and more convenient.
Aside from musubi-tuner, which does not have an official GUI, I feel the graphical interfaces of the other two still have **plenty of room for improvement** in both installation and usability. Nerogar OneTrainer in particular is not very friendly for users running a headless system, although that might just be my own setup or lack of experience.
If a graphical interface is really necessary, I think a **Web UI** is the best cross-platform solution. Gradio or Streamlit would both be solid choices. Even better would be a dedicated custom Web UI paired with a simple, general-purpose web server.
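To show how little it would take, here is a bare-bones Gradio sketch of a launcher for a headless box. The training command is a placeholder, not any real tool’s CLI, and an actual UI would obviously need log streaming and parameter forms.

```python
# Minimal Gradio launcher sketch for a headless training box.
# "train.py --config ..." is a placeholder command, not any real tool's CLI.
import subprocess
import gradio as gr

def start_training(config_path: str) -> str:
    # Fire-and-forget; a real UI would stream stdout back to the page.
    proc = subprocess.Popen(["python", "train.py", "--config", config_path])
    return f"Started training (PID {proc.pid}) with {config_path}"

with gr.Blocks(title="LoRA Trainer") as demo:
    config = gr.Textbox(label="Path to config file")
    status = gr.Textbox(label="Status")
    gr.Button("Start").click(start_training, inputs=config, outputs=status)

# server_name="0.0.0.0" makes the UI reachable from another machine on the LAN.
demo.launch(server_name="0.0.0.0", server_port=7860)
```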
Previously, the official Z Image training tool, “Ztuner,” was said to be coming soon, but for some reason its release has been delayed. Maybe it will move forward after the Chinese New Year holiday period ends.
**If the official training tool cannot produce better results than the existing options, then in my opinion the issue would likely be with ZIB itself rather than the tools.**
## II. A Quick Take on Z Image Style LoRAs
When it comes to how closely the generated results resemble the target, users clearly hold different expectations for style LoRAs and character LoRAs. The standards for style LoRAs tend to be much more relaxed, and sometimes there isn’t even a clear reference to compare against. Even if the output does not closely match the training images, it can still be considered a strong and widely accepted LoRA.
**To some extent, this makes style LoRAs relatively easier to create.**
Both ZIB and ZIT perform very well when it comes to style LoRAs.
Most of the Z Image style LoRAs I’ve made have already been released for free on <a href="https://sololo.xyz">sololo.xyz</a> and on Civitai, and most of them include both ZIB and ZIT versions.
In most cases, ZIT LoRAs work best when the strength is reduced to **around 0.7**, which helps avoid an overly strong effect. ZIT tends to lean heavily toward realism, which often partially offsets more stylized or non-realistic elements, resulting in a softer, visually pleasing “mutated” style. This kind of offsetting effect also exists in FLUX1, where it is in fact even more pronounced than in ZIT.
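In ComfyUI this just means lowering the strength value on the LoRA loader node. For scripted use, a diffusers-style sketch might look like the following; the checkpoint path and prompt are placeholders, and whether Z Image’s diffusers integration supports adapter weights exactly this way is an assumption on my part.

```python
# Hedged diffusers-style sketch: load a style LoRA at ~0.7 strength.
# The model path is a placeholder; adapter support for Z Image is assumed.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "/path/to/z-image-turbo", torch_dtype=torch.bfloat16
).to("cuda")

pipe.load_lora_weights("/path/to/style_lora.safetensors", adapter_name="style")
pipe.set_adapters(["style"], adapter_weights=[0.7])  # ~0.7 avoids an overbaked look

image = pipe(prompt="a city street in thick watercolor strokes").images[0]
image.save("zit_style_test.png")
```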
ZIB clearly shows **stronger learning capability** than ZIT and can more easily produce results that are closer to the training material. In general, the ZIB versions of style LoRAs also tend to perform well when used with ZIT.
At the moment, my ZIB training settings are still more suited for character LoRAs. When I have time, I plan to further adjust the ZIB settings for style LoRAs so that they can better capture stylistic features while minimizing the learning of specific facial characteristics.
## III. ZIT Character LoRAs
I trained a number of ZIT character LoRAs, including a small set based on real people. All of them are available for free download on <a href="https://sololo.xyz">sololo.xyz</a>.
At this point, I’ve basically stopped training real-person LoRAs on ZIT for two main reasons. First, the similarity and stability of the results are not quite where I want them to be. Second, I’ve been using ai-toolkit for ZIT training, but for some reason I just don’t feel motivated to invest more time and effort into that workflow.
For now, I mainly use ai-toolkit with ZIT to train style LoRAs and virtual character LoRAs where strict likeness is not a priority, and the results have been quite good.
When I have time in the future, I might try testing how OneTrainer performs with ZIT.
## IV. LoRAs Used Below: Basic Info
In the comparison images below, the four LoRAs labeled **v2**, **v3**, **v4**, and **Third-party** are all trained on ZIB. Among them, v2, v3, and v4 were created by me, while the Third-party LoRA was made and publicly released by someone else.
The Third-party LoRA uses different training datasets and settings from mine. It is included here **not for judging quality, but simply as a reference within the corresponding training tool group**. <span class="emp-y">This post does not present any evaluation or judgment regarding the Third-party LoRA.</span>
v2, v3, and v4 have already been released for free on <a href="https://sololo.xyz">sololo.xyz</a>. Feel free to download and try them out.
v2: https://sololo.xyz/model/620-anna-kendrick-SoloLoRA
v3: https://sololo.xyz/model/621-anna-kendrick-SoloLoRA
v4: https://sololo.xyz/model/622-anna-kendrick-SoloLoRA
Other basic details are as follows:
<table style="width: 100%; text-align: center; ">
<thead>
<tr>
<th style="text-align: center; background-color: var(--bg-header); ">Item</th>
<th style="text-align: center; background-color: var(--db);">v2</th>
<th style="text-align: center; background-color: var(--db);">v3</th>
<th style="text-align: center; background-color: var(--dy);">v4</th>
<th style="text-align: center; background-color: var(--dy);">Third-party</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center; background-color: var(--bg-header); "><b>Training Tool</b></td>
<td colspan="2" style="text-align: center; background-color: var(--bg-span);">Kohya-ss Musubi-tuner</td>
<td colspan="2" style="text-align: center; background-color: var(--bg-span);">Nerogar OneTrainer</td>
</tr>
<tr>
<td style="text-align: center; background-color: var(--bg-header); "><b>Optimizer</b></td>
<td colspan="4" style="text-align: center; background-color: var(--bg-span);">Prodigy_adv</td>
</tr>
<tr>
<td style="text-align: center; background-color: var(--bg-header); "><b>File Size</b></td>
<td colspan="2" style="text-align: center; background-color: var(--bg-span);">33.4 MB</td>
<td colspan="2" style="text-align: center; background-color: var(--bg-span);">66.8 MB</td>
</tr>
</tbody>
</table>
v2, v3, and v4 were all trained on the same dataset, with roughly similar training settings. Because different training scripts use different configuration options, v4 could not be matched exactly to the settings used for the first two. The Third-party LoRA is said to have used the Prodigy_adv optimizer as well.
v2 uses the training settings I refined through repeated testing over the past period. It represents a fairly standard ZIB LoRA training workflow using kohya-ss musubi-tuner. At the moment, this is about as far as I can push the results. There are also several other ZIB LoRAs on <a href="https://sololo.xyz">sololo.xyz</a> that were trained with settings largely similar to v2.
v3 is something I’ve been experimenting with over the past couple of days. It’s still not very stable and shows some degree of overfitting. The failed example shown at the end of this post was trained using the same method as v3.
## V. How ZIB LoRAs Perform on ZIT
Here are the main takeaways:
1. <span class="emp-y">There is definitely some loss in performance.</span>
2. Increasing the strength moderately can noticeably offset that loss in quality.
3. **LoRAs trained with OneTrainer show better stability on ZIT.**
4. <span class="emp-dy">When the strength is set to 1, LoRAs trained with OneTrainer clearly perform better than those trained with musubi-tuner.</span> I’m not yet sure whether this difference comes from the training parameters or from the tools themselves.

**As shown clearly in the images above, performance on ZIB is noticeably better than on ZIT.** So even for ZIB LoRAs trained with OneTrainer, ZIB remains the best platform to use them on.
v2 went through repeated tuning and optimization over the past period, and both its performance and stability are quite solid.
v3 is clearly overfitted. This comes from the training approach itself, and it can likely be improved with some straightforward adjustments.
v4 also shows some overfitting due to relatively aggressive settings. When the strength reaches 1.5, the image quality breaks down noticeably.
The Third-party LoRA shows the most stable performance on ZIT, which suggests its parameter setup is more balanced than v4’s.
One more thing I’ve been consistently complaining about with ZIT is the <span class="emp-dr">"dirty face"</span> issue, which is also quite obvious in the images above. Of course, some people see it as added realism, but personally I just don’t like that kind of unclean look on faces.
## VI. Is OneTrainer Actually Better?
I’ve only been using OneTrainer for a short time, so aside from the fact that ZIB LoRAs trained with it seem to perform better on ZIT, I still can’t say for sure whether it performs better than musubi-tuner on ZIB itself. The training speed of the two is quite similar, and both offer a large number of configurable parameters; OneTrainer even seems to have more options overall. The moment I opened its long config file, I honestly felt a bit overwhelmed.
Whether the differences I’m seeing come from the parameters or from the tools themselves, the training results from OneTrainer and musubi-tuner do feel **noticeably different**. As for whether one is clearly better, I’ll need to spend more time using them before I can say.
In this set of images, the OneTrainer group appears to have a slight advantage over the musubi-tuner group.

<br>
In this set, the musubi-tuner group seems to perform slightly better.

<br>
For this set, honestly… I can’t really tell either.

<br>
If you stare at tons of images of the same person for long enough like I did, it honestly becomes really hard to tell which one is better. 😵‍💫🌀
The Third-party LoRA shows solid stability.
However, ZIB LoRAs currently have one obvious issue, which I’ll mention next.
## VII. Issues with Facial Consistency and Stability
The two most commonly criticized issues with ZIB LoRAs right now are, **first**, that the character **does not resemble** the target well enough, and **second**, that performance on ZIT is **not ideal**. Both of these ultimately come down to problems with facial consistency and stability. I’ve already discussed performance on ZIT above, so here I’ll focus on ZIB itself.
To start with, I personally have never encountered a situation in ZIB LoRA training where the model simply refuses to converge even after many steps. This may be related to the fact that I almost never use the **AdamW8bit** optimizer. However, I have run into the other two issues that many people have mentioned, and so far I have not found a solution for them. Those issues are:
### A. It Always Feels Like Something Is Still Missing
The generated results already look clearly similar to the target person, but it always feels like they are just a little bit off. This is a problem I still haven’t solved, and I also haven’t seen other people publicly release any LoRAs that solve it well.
If you look back at the comparison images above, you can probably notice that they all show this issue to some extent.
Of course, as I mentioned at the beginning, likeness is a highly subjective and personal judgment, so others may feel differently about it.
### B. Facial Instability
If we treat 90 points as a passing score, most results from the LoRA fall somewhere between 80 and 90. Occasionally they even drop below 80, and only in rare cases do they exceed 90.
This kind of instability is related to many factors, including the dataset, how well the training converges, the prompt, and whether the generated image is a close-up face, a half-body shot, or a full-body shot.
First, compared to FLUX1, ZIB is **more sensitive** to the **dataset**. A dataset that produced good training results on FLUX1 does not necessarily achieve the same success on ZIB.
When I reused some of my earlier datasets for ZIB training, especially those from when I first started training FLUX1 LoRAs, the results were clearly not very good.
Second, both ZIB and ZIT are quite **sensitive to prompts**, and certain prompts can cause the generated facial features to deviate significantly from the target person. Sometimes prompts like **“black hair”** or **“black eyes”** can push the face toward East Asian features. In those cases, adding terms such as **“Asian”** or **“Chinese”** at the beginning of the negative prompt can usually fix the issue.
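A concrete example of the kind of prompt pair I mean (“myface” stands in for whatever trigger word the LoRA uses):

```python
# Illustrative prompt pair only; "myface" is a made-up trigger word.
prompt = "photo of myface woman, black hair, black eyes, soft window light"
# Leading the negative prompt with the ethnicity terms usually pulls
# the face back toward the trained identity.
negative_prompt = "Asian, Chinese, lowres, blurry, deformed hands"
```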
However, there are also situations where it is unclear which part of the prompt is causing the problem, or what kind of negative prompt would resolve it. When that happens, the only practical solution is often to replace the entire prompt.
I think this is related to the fact that ZIB was **trained on a large amount of Chinese and East Asian data**, <span class="emp-dr">and this may be one of the core reasons behind many of the current LoRA issues.</span> In a way, this is quite normal. Similar situations have **happened before** with FLUX and SD, just in the opposite direction. I clearly remember that the original SD models simply could not generate natural-looking East Asian faces, and training LoRAs for East Asian subjects with FLUX1 was noticeably more difficult than for Western subjects.
<span class="emp-y">Because of this, I suspect that training LoRAs for East Asian subjects on ZIB should actually be easier.</span> I haven’t tested this yet, but I’m currently preparing several East Asian datasets and should be able to verify this soon. It’s even possible that the issue described in section A comes from this underlying bias. If that is the case, then even the official training tool may not be able to solve the problem, because it would be something fundamentally built into ZIB. If this guess turns out to be correct, then the only real solution might be a heavily modified derivative model, similar to what happened in the SD1.5 era with **ChilloutMix**.
Finally, when generating **full-body** images, facial features tend to become distorted. This is not a new issue. It has existed from SD to FLUX. I’ve found that optimizing the dataset can improve it to some extent, but it cannot be completely resolved. This may simply result from the core issue mentioned earlier being compounded by the fact that full-body compositions naturally weaken facial features.


These two image sets above clearly show the issue of facial instability. It is not related to which training tool was used.
## VIII. Likeness Is a Subjective Feeling
As I write this, I suddenly remembered an issue that isn’t really related to training itself, which is the subjective difference in judging likeness. Not only do different people have noticeably different opinions on whether the same LoRA resembles the target, but **even the same person** can judge the same LoRA differently at different times. I often find that after sleeping on it, when I look at a LoRA I made yesterday, it feels very different from when I first tested it. Sometimes it seemed unlike the target during testing, but the next day it seems closer, and other times the opposite happens.
Many of my LoRA themes or characters date back to the SD1.5 era, such as the one shown in the sample images above. On <a href="https://sololo.xyz">sololo.xyz</a>, you can see the same character’s SD1.5 TI, FLUX1 LoRA, and now the current v2, v3, and v4. I’ve noticed that my standards for likeness have gradually risen over time. The models themselves, from SD and FLUX1 to Z Image, have become more capable of realistic reproduction, and at the same time my own eye has changed since I first started experimenting a few years ago.
The TI I made back in the SD1.5 days seemed pretty good at the time, and the FLUX1 LoRAs also seemed decent back then. But looking at them today, most of them are not really impressive, and some do not even meet my current minimum standards.

Regardless of how my perception shifts, I always stick to two core principles when creating real-person LoRAs: first, it must be <span class="emp-dy">accurate</span>; second, it must be <span class="emp-dy">beautiful</span>.
As for what defines "accuracy" or "beauty," everyone has their own interpretation. There is no need to force a consensus on something so personal.
## IX. A Note on LoHa and LoKr
The terms **LoHa** and **LoKr** have been around for quite a while. It wasn’t until a few days ago, when musubi-tuner was updated to support LoHa and LoKr for ZIB, that I tried making a few LoKr files for the first time.
What I like most about LoKr is that it can make files **very small**. At first, because I wasn’t familiar with LoKr’s parameters, I tried using the same settings I would for a LoRA. The result was LoKr files only a few MB in size, but the generated images turned out much better than I expected. It immediately reminded me of the layered training I used with FLUX1 LoRAs. Back then, training only a few layers could still give good results, and I even made a few real-person LoRAs that were only a few MB in size, which worked well and were published on Civitai.
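The tiny files fall straight out of LoKr’s basic idea: the weight delta is built as a Kronecker product of much smaller factors instead of being stored densely. A toy illustration (real implementations typically also rank-decompose one of the factors, which I’m ignoring here):

```python
import torch

# Toy illustration of why LoKr files are small: a 4096x4096 weight delta
# expressed as the Kronecker product of two 64x64 factors.
A = torch.randn(64, 64)
B = torch.randn(64, 64)
delta_w = torch.kron(A, B)                 # shape: (4096, 4096)

dense_params = delta_w.numel()             # 16,777,216 if stored densely
factored_params = A.numel() + B.numel()    # 8,192 stored as two factors
print(delta_w.shape, dense_params, factored_params)
```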
However, aside from producing small files, LoKr hasn’t really impressed me in other ways. I haven’t found any other clear advantages, and in terms of likeness or stability it doesn’t show any obvious improvement over regular LoRAs. Considering these formats have been around for quite a while without becoming mainstream, the lack of obvious performance advantages is probably part of the reason. I’ll try experimenting more when I have time.
## X. Odd Problems in the v3 Series
As I mentioned earlier, v3 is something I’ve been experimenting with over the past few days, and it’s still unstable. I’ve tried a few other LoRAs using a similar approach, but I’ve run into a problem I haven’t solved yet: the sample images during training look better than the results you get when actually using the LoRA. See the images below for an example.

Seeing the sample images generated during training got me a little excited, and I thought the experiment had succeeded. I didn’t expect that when I actually used it in ComfyUI, the results would turn out like the messy one on the right. I tried adjusting various generation parameters repeatedly, but I still couldn’t get normal images.
In the past, the usual situation was that sample images during training looked worse than or about the same as the results in testing. This complete reversal is the first time I’ve encountered it. I know part of the reason is overcooking, but that’s not the whole cause, and it’s not even the main reason. I’ll need to spend more time testing. If I can get it to work, it would be a small breakthrough in my ZIB real-person LoRA training. For now, it’s something I can see but not quite reach.
OK, that’s a lot of rambling for now. Thanks to everyone who took the time to read through it.
I’ll continue my **_Training Notes_** next time when I have new discoveries or ideas.
Thank you! 😊🙏