
Recently, most of my time and energy have been focused on experimenting with Z Image LoRA. In the 20+ days since my last *<a href="https://sololo.xyz/article/20-training-notes-my-take-on-klein-z-image">Training Note</a>*, I’ve completed **more than 80** additional Z Image LoRA training runs. Besides my own trial-and-error exploration, I’ve also learned from experiences shared by others in the community. Here I’m putting together some of my recent observations and takeaways as a brief interim recap.
## Disclaimer
1. I’m **not a professional**, and this is **not a rigorous or scientific study**. It’s simply a record of my personal impressions and experiences.
2. Unless otherwise specified, all LoRAs mentioned below refer to real-person character LoRAs.
3. How good a LoRA is, or how closely it resembles the target person, is highly **subjective** and **personal**. People can have very different opinions, sometimes even completely opposite ones. That’s totally normal. If your view differs from mine on this, that’s perfectly fine. **You are right; there is no need to argue with me.**
**Note:** All images in this post can be opened in a new tab via right-click to view the <span class="emp-dg">original full-size version</span>.
---
## I. Training Tools
So far, the training tools I’ve used include **kohya-ss musubi-tuner**, **ostris ai-toolkit**, and **Nerogar OneTrainer**.
**musubi-tuner** is the tool I’ve used the most and for the longest time. When I trained FLUX1 LoRAs before, I was using kohya-ss sd-scripts. Because I was already familiar with that workflow, musubi-tuner became my first choice when I started training Z Image, and it’s still my preferred tool right now.
I like it not only because the training results are solid, but also because the setup and workflow are very simple. The author also provides clear and concise documentation that covers what you actually need. In terms of documentation quality, I think musubi-tuner does the best job among the three.
One drawback is that LoRAs trained with musubi-tuner need to **be converted to another format**; otherwise ComfyUI throws backend errors when loading them. Those errors do not seem to affect the actual output, but I’m not sure whether similar or more serious issues might appear in other clients. To avoid potential compatibility problems across different users and software, all the LoRAs I release are converted versions.
However, the conversion process can cause some loss in performance. For ZIT LoRAs in particular, conversion **can sometimes reduce image quality**. For custom client work, I provide both the converted and unconverted versions, and I recommend using the original unconverted version if it works properly in their setup.
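For anyone curious what such a conversion actually involves: broadly, it comes down to renaming the weight keys (and sometimes rescaling alphas) into the layout the target client expects. Below is a rough Python sketch of the idea; the key prefixes are hypothetical examples, not the exact names musubi-tuner or ComfyUI use.

```python
# Hypothetical sketch of a LoRA format conversion: load the weights,
# rename the key prefixes to what the target client expects, and save.
# The prefixes below are illustrative; real tools may also rescale alphas.
from safetensors.torch import load_file, save_file

def convert_lora(src_path: str, dst_path: str) -> None:
    state_dict = load_file(src_path)
    converted = {}
    for key, tensor in state_dict.items():
        new_key = key.replace("lora_unet_", "diffusion_model.")  # example mapping only
        converted[new_key] = tensor
    save_file(converted, dst_path)

convert_lora("lora_trained.safetensors", "lora_converted.safetensors")
```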
Because of the issue mentioned above, I currently train ZIT LoRAs using **ostris ai-toolkit**. That’s actually the only scenario where I use ai-toolkit right now, and the results have been quite good.
ai-toolkit seems to have little to no official documentation. Maybe it’s simply because the tool is straightforward enough that detailed docs are not really necessary.
The setup and workflow in ai-toolkit are very simple, and all configuration options can be written directly in a config file. In this respect, it’s more convenient than musubi-tuner. Although musubi-tuner also supports config files, most training parameters still need to be set through the command line, so I usually have to modify two places each time. I’m not sure whether musubi-tuner can put everything into a config file, and to be honest I haven’t looked into that part of the documentation very carefully.
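For reference, musubi-tuner keeps the dataset definition in a TOML file roughly like the sketch below, while the optimizer, learning rate, and similar training parameters go on the command line. I’m writing the field names from memory of the docs, so treat them as approximate and check the official documentation before copying anything.

```toml
# Approximate sketch of a musubi-tuner dataset config; paths are placeholders.
[general]
resolution = [1024, 1024]
caption_extension = ".txt"
batch_size = 1
enable_bucket = true

[[datasets]]
image_directory = "/data/character/images"
cache_directory = "/data/character/cache"
num_repeats = 1
```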
I only started using **Nerogar OneTrainer** in the past few days, and so far I’ve used it exclusively for training ZIB LoRAs. From what I can tell, it does have some noticeable advantages specifically for ZIB LoRA training. You can see more details in the image comparison section below. I’ll need more time with it before forming a fuller opinion.
The official documentation for Nerogar OneTrainer looks quite comprehensive, which is definitely a plus, but I haven’t had the chance to study it in depth yet.
I use all three training tools through the **command line on Ubuntu**. For me, the command line is simply easier and more convenient.
Aside from musubi-tuner, which does not have an official GUI, I feel the graphical interfaces of the other two still have **plenty of room for improvement** in both installation and usability. Nerogar OneTrainer in particular is not very friendly for users running a headless system, although that might just be my own setup or lack of experience.
If a graphical interface is really necessary, I think a **Web UI** is the best cross-platform solution. Gradio or Streamlit would both be solid choices. Even better would be a dedicated custom Web UI paired with a simple, general-purpose web server.
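To show how little it would take, here is a bare-bones Gradio sketch of a launcher for a headless box. The training command is a placeholder, not any real tool’s CLI, and an actual UI would obviously need log streaming and parameter forms.

```python
# Minimal Gradio launcher sketch for a headless training box.
# "train.py --config ..." is a placeholder command, not any real tool's CLI.
import subprocess
import gradio as gr

def start_training(config_path: str) -> str:
    # Fire-and-forget; a real UI would stream stdout back to the page.
    proc = subprocess.Popen(["python", "train.py", "--config", config_path])
    return f"Started training (PID {proc.pid}) with {config_path}"

with gr.Blocks(title="LoRA Trainer") as demo:
    config = gr.Textbox(label="Path to config file")
    status = gr.Textbox(label="Status")
    gr.Button("Start").click(start_training, inputs=config, outputs=status)

# server_name="0.0.0.0" makes the UI reachable from another machine on the LAN.
demo.launch(server_name="0.0.0.0", server_port=7860)
```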
Previously, the official Z Image training tool, “Ztuner,” was said to be coming soon, but for some reason its release has been delayed. Maybe it will move forward after the Chinese New Year holiday period ends.
**If the official training tool cannot produce better results than the existing options, then in my opinion the issue would likely be with ZIB itself rather than the tools.**
## II. A Quick Take on Z Image Style LoRAs
When it comes to how closely the generated results resemble the target, users clearly hold different expectations for style LoRAs and character LoRAs. The standards for style LoRAs tend to be much more relaxed, and sometimes there isn’t even a clear reference to compare against. Even if the output does not closely match the training images, it can still be considered a strong and widely accepted LoRA.
**To some extent, this makes style LoRAs relatively easier to create.**
Both ZIB and ZIT perform very well when it comes to style LoRAs.
Most of the Z Image style LoRAs I’ve made have already been released for free on <a href="https://sololo.xyz">sololo.xyz</a> and on Civitai, and most of them include both ZIB and ZIT versions.
In most cases, ZIT LoRAs work best when the strength is reduced to **around 0.7**, which helps avoid an overly strong effect. ZIT tends to lean heavily toward realism, which often partially offsets more stylized or non-realistic elements, resulting in a softer, visually pleasing “mutated” style. This kind of offsetting effect also exists in FLUX1, where it is in fact even more pronounced than in ZIT.
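In ComfyUI this just means lowering the strength value on the LoRA loader node. For scripted use, a diffusers-style sketch might look like the following; the checkpoint path and prompt are placeholders, and whether Z Image’s diffusers integration supports adapter weights exactly this way is an assumption on my part.

```python
# Hedged diffusers-style sketch: load a style LoRA at ~0.7 strength.
# The model path is a placeholder; adapter support for Z Image is assumed.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "/path/to/z-image-turbo", torch_dtype=torch.bfloat16
).to("cuda")

pipe.load_lora_weights("/path/to/style_lora.safetensors", adapter_name="style")
pipe.set_adapters(["style"], adapter_weights=[0.7])  # ~0.7 avoids an overbaked look

image = pipe(prompt="a city street in thick watercolor strokes").images[0]
image.save("zit_style_test.png")
```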
ZIB clearly shows **stronger learning capability** than ZIT and can more easily produce results that are closer to the training material. In general, the ZIB versions of style LoRAs also tend to perform well when used with ZIT.
At the moment, my ZIB training settings are still more suited for character LoRAs. When I have time, I plan to further adjust the ZIB settings for style LoRAs so that they can better capture stylistic features while minimizing the learning of specific facial characteristics.
## III. ZIT Character LoRAs
I trained a number of ZIT character LoRAs, including a small set based on real people. All of them are available for free download on <a href="https://sololo.xyz">sololo.xyz</a>.
At this point, I’ve basically stopped training real-person LoRAs on ZIT for two main reasons. First, the similarity and stability of the results are not quite where I want them to be. Second, I’ve been using ai-toolkit for ZIT training, but for some reason I just don’t feel motivated to invest more time and effort into that workflow.
For now, I mainly use ai-toolkit with ZIT to train style LoRAs and virtual character LoRAs where strict likeness is not a priority, and the results have been quite good.
When I have time in the future, I might try testing how OneTrainer performs with ZIT.
## IV. LoRAs Used Below: Basic Info
In the comparison images below, the four LoRAs labeled **v2**, **v3**, **v4**, and **Third-party** are all trained on ZIB. Among them, v2, v3, and v4 were created by me, while the Third-party LoRA was made and publicly released by someone else.
The Third-party LoRA uses different training datasets and settings from mine. It is included here **not for judging quality, but simply as a reference within the corresponding training tool group**. <span class="emp-y">This post does not present any evaluation or judgment regarding the Third-party LoRA.</span>
v2, v3, and v4 have already been released for free on <a href="https://sololo.xyz">sololo.xyz</a>. Feel free to download and try them out.
v2: https://sololo.xyz/model/620-anna-kendrick-SoloLoRA
v3: https://sololo.xyz/model/621-anna-kendrick-SoloLoRA
v4: https://sololo.xyz/model/622-anna-kendrick-SoloLoRA
Other basic details are as follows:
<table style="width: 100%; text-align: center; ">
<thead>
<tr>
<th style="text-align: center; background-color: var(--bg-header); ">Item</th>
<th style="text-align: center; background-color: var(--db);">v2</th>
<th style="text-align: center; background-color: var(--db);">v3</th>
<th style="text-align: center; background-color: var(--dy);">v4</th>
<th style="text-align: center; background-color: var(--dy);">Third-party</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center; background-color: var(--bg-header); "><b>Training Tool</b></td>
<td colspan="2" style="text-align: center; background-color: var(--bg-span);">Kohya-ss Musubi-tuner</td>
<td colspan="2" style="text-align: center; background-color: var(--bg-span);">Nerogar OneTrainer</td>
</tr>
<tr>
<td style="text-align: center; background-color: var(--bg-header); "><b>Optimizer</b></td>
<td colspan="4" style="text-align: center; background-color: var(--bg-span);">Prodigy_adv</td>
</tr>
<tr>
<td style="text-align: center; background-color: var(--bg-header); "><b>File Size</b></td>
<td colspan="2" style="text-align: center; background-color: var(--bg-span);">33.4 MB</td>
<td colspan="2" style="text-align: center; background-color: var(--bg-span);">66.8 MB</td>
</tr>
</tbody>
</table>
v2, v3, and v4 were all trained on the same dataset, with roughly similar training settings. Because different training scripts use different configuration options, v4 could not be matched exactly to the settings used for the first two. The Third-party LoRA is said to have used the Prodigy_adv optimizer as well.
v2 uses the training settings I refined through repeated testing over the past period. It represents a fairly standard ZIB LoRA training workflow using kohya-ss musubi-tuner. At the moment, this is about as far as I can push the results. There are also several other ZIB LoRAs on <a href="https://sololo.xyz">sololo.xyz</a> that were trained with settings largely similar to v2.
v3 is something I’ve been experimenting with over the past couple of days. It’s still not very stable and shows some degree of overfitting. The failed example shown at the end of this post was trained using the same method as v3.
## V. How ZIB LoRAs Perform on ZIT
Here are the main takeaways:
1. <span class="emp-y">There is definitely some loss in performance.</span>
2. Increasing the strength moderately can noticeably offset that loss in quality.
3. **LoRAs trained with OneTrainer show better stability on ZIT.**
4. <span class="emp-dy">When the strength is set to 1, LoRAs trained with OneTrainer clearly perform better than those trained with musubi-tuner.</span> I’m not yet sure whether this difference comes from the training parameters or from the tools themselves.

**As shown clearly in the images above, performance on ZIB is noticeably better than on ZIT.** So even for ZIB LoRAs trained with OneTrainer, ZIB remains the best platform to use them on.
v2 went through repeated tuning and optimization over the past period, and both its performance and stability are quite solid.
v3 is clearly overfitted. This comes from the training approach itself, and it can likely be improved with some straightforward adjustments.
v4 also shows some overfitting due to relatively aggressive settings. When the strength reaches 1.5, the image quality breaks down noticeably.
The Third-party LoRA shows the most stable performance on ZIT, which suggests its parameter setup is more balanced than v4’s.
One more thing I’ve been consistently complaining about with ZIT is the <span class="emp-dr">"dirty face"</span> issue, which is also quite obvious in the images above. Of course, some people see it as added realism, but personally I just don’t like that kind of unclean look on faces.
## VI. Is OneTrainer Actually Better?
I’ve only been using OneTrainer for a short time, so aside from the fact that ZIB LoRAs trained with it seem to perform better on ZIT, I still can’t say for sure whether it performs better than musubi-tuner on ZIB itself. The training speed of the two is quite similar, and both offer a large number of configurable parameters; OneTrainer even seems to have more options overall. The moment I opened its long config file, I honestly felt a bit overwhelmed.
Whether the differences I’m seeing come from the parameters or from the tools themselves, the training results from OneTrainer and musubi-tuner do feel **noticeably different**. As for whether one is clearly better, I’ll need to spend more time using them before I can say.
In this set of images, the OneTrainer group appears to have a slight advantage over the musubi-tuner group.

<br>
In this set, the musubi-tuner group seems to perform slightly better.

<br>
For this set, honestly… I can’t really tell either.

<br>
If you stare at tons of images of the same person for long enough like I did, it honestly becomes really hard to tell which one is better. 😵‍💫🌀
The Third-party LoRA shows solid stability.
However, ZIB LoRAs currently have one obvious issue, which I’ll mention next.
## VII. Issues with Facial Consistency and Stability
The two most commonly criticized issues with ZIB LoRAs right now are, **first**, that the character **does not resemble** the target well enough, and **second**, that performance on ZIT is **not ideal**. Both of these ultimately come down to problems with facial consistency and stability. I’ve already discussed performance on ZIT above, so here I’ll focus on ZIB itself.
To start with, I personally have never encountered a situation in ZIB LoRA training where the model simply refuses to converge even after many steps. This may be related to the fact that I almost never use the **AdamW8bit** optimizer. However, I have run into the other two issues that many people have mentioned, and so far I have not found a solution for them. Those issues are:
### A. It Always Feels Like Something Is Still Missing
The generated results already look clearly similar to the target person, but it always feels like they are just a little bit off. This is a problem I still haven’t solved, and I also haven’t seen other people publicly release any LoRAs that solve it well.
If you look back at the comparison images above, you can probably notice that they all show this issue to some extent.
Of course, as I mentioned at the beginning, likeness is a highly subjective and personal judgment, so others may feel differently about it.
### B. Facial Instability
If we treat 90 points as a passing score, most results from the LoRA fall somewhere between 80 and 90. Occasionally they even drop below 80, and only in rare cases do they exceed 90.
This kind of instability is related to many factors, including the dataset, how well the training converges, the prompt, and whether the generated image is a close-up face, a half-body shot, or a full-body shot.
First, compared to FLUX1, ZIB is **more sensitive** to the **dataset**. A dataset that produced good training results on FLUX1 does not necessarily achieve the same success on ZIB.
When I reused some of my earlier datasets for ZIB training, especially those from when I first started training FLUX1 LoRAs, the results were clearly not very good.
Second, both ZIB and ZIT are quite **sensitive to prompts**, and certain prompts can cause the generated facial features to deviate significantly from the target person. Sometimes prompts like **“black hair”** or **“black eyes”** can push the face toward East Asian features. In those cases, adding terms such as **“Asian”** or **“Chinese”** at the beginning of the negative prompt can usually fix the issue.
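A concrete example of the kind of prompt pair I mean (“myface” stands in for whatever trigger word the LoRA uses):

```python
# Illustrative prompt pair only; "myface" is a made-up trigger word.
prompt = "photo of myface woman, black hair, black eyes, soft window light"
# Leading the negative prompt with the ethnicity terms usually pulls
# the face back toward the trained identity.
negative_prompt = "Asian, Chinese, lowres, blurry, deformed hands"
```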
However, there are also situations where it is unclear which part of the prompt is causing the problem, or what kind of negative prompt would resolve it. When that happens, the only practical solution is often to replace the entire prompt.
I think this is related to the fact that ZIB was **trained on a large amount of Chinese and East Asian data**, <span class="emp-dr">and this may be one of the core reasons behind many of the current LoRA issues.</span> In a way, this is quite normal. Similar situations have **happened before** with FLUX and SD, just in the opposite direction. I clearly remember that the original SD models simply could not generate natural-looking East Asian faces, and training LoRAs for East Asian subjects with FLUX1 was noticeably more difficult than for Western subjects.
<span class="emp-y">Because of this, I suspect that training LoRAs for East Asian subjects on ZIB should actually be easier.</span> I haven’t tested this yet, but I’m currently preparing several East Asian datasets and should be able to verify this soon. It’s even possible that the issue described in section A comes from this underlying bias. If that is the case, then even the official training tool may not be able to solve the problem, because it would be something fundamentally built into ZIB. If this guess turns out to be correct, then the only real solution might be a heavily modified derivative model, similar to what happened in the SD1.5 era with **ChilloutMix**.
Finally, when generating **full-body** images, facial features tend to become distorted. This is not a new issue. It has existed from SD to FLUX. I’ve found that optimizing the dataset can improve it to some extent, but it cannot be completely resolved. This may simply result from the core issue mentioned earlier being compounded by the fact that full-body compositions naturally weaken facial features.


These two image sets above clearly show the issue of facial instability. It is not related to which training tool was used.
## VIII. Likeness Is a Subjective Feeling
As I write this, I suddenly remembered an issue that isn’t really related to training itself, which is the subjective difference in judging likeness. Not only do different people have noticeably different opinions on whether the same LoRA resembles the target, but **even the same person** can judge the same LoRA differently at different times. I often find that after sleeping on it, when I look at a LoRA I made yesterday, it feels very different from when I first tested it. Sometimes it seemed unlike the target during testing, but the next day it seems closer, and other times the opposite happens.
Many of my LoRA themes or characters date back to the SD1.5 era, such as the one shown in the sample images above. On <a href="https://sololo.xyz">sololo.xyz</a>, you can see the same character’s SD1.5 TI, FLUX1 LoRA, and now the current v2, v3, and v4. I’ve noticed that my standards for likeness have gradually risen over time. The models themselves, from SD and FLUX1 to Z Image, have become more capable of realistic reproduction, and at the same time my own eye has changed since I first started experimenting a few years ago.
The TI I made back in the SD1.5 days seemed pretty good at the time, and the FLUX1 LoRAs also seemed decent back then. But looking at them today, most of them are not really impressive, and some do not even meet my current minimum standards.

Regardless of how my perception shifts, I always stick to two core principles when creating real-person LoRAs: first, it must be <span class="emp-dy">accurate</span>; second, it must be <span class="emp-dy">beautiful</span>.
As for what defines "accuracy" or "beauty," everyone has their own interpretation. There is no need to force a consensus on something so personal.
## IX. A Note on LoHa and LoKr
The terms **LoHa** and **LoKr** have been around for quite a while. It wasn’t until a few days ago, when musubi-tuner was updated to support LoHa and LoKr for ZIB, that I tried making a few LoKr files for the first time.
What I like most about LoKr is that it can make files **very small**. At first, because I wasn’t familiar with LoKr’s parameters, I tried using the same settings I would for a LoRA. The result was LoKr files only a few MB in size, but the generated images turned out much better than I expected. It immediately reminded me of the layered training I used with FLUX1 LoRAs. Back then, training only a few layers could still give good results, and I even made a few real-person LoRAs that were only a few MB in size, which worked well and were published on Civitai.
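The tiny files fall straight out of LoKr’s basic idea: the weight delta is built as a Kronecker product of much smaller factors instead of being stored densely. A toy illustration (real implementations typically also rank-decompose one of the factors, which I’m ignoring here):

```python
import torch

# Toy illustration of why LoKr files are small: a 4096x4096 weight delta
# expressed as the Kronecker product of two 64x64 factors.
A = torch.randn(64, 64)
B = torch.randn(64, 64)
delta_w = torch.kron(A, B)                 # shape: (4096, 4096)

dense_params = delta_w.numel()             # 16,777,216 if stored densely
factored_params = A.numel() + B.numel()    # 8,192 stored as two factors
print(delta_w.shape, dense_params, factored_params)
```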
However, aside from producing small files, LoKr hasn’t really impressed me in other ways. I haven’t found any other clear advantages, and in terms of likeness or stability it doesn’t show any obvious improvement over regular LoRAs. Considering these formats have been around for quite a while without becoming mainstream, the lack of obvious performance advantages is probably part of the reason. I’ll try experimenting more when I have time.
## X. Odd Problems in the v3 Series
As I mentioned earlier, v3 is something I’ve been experimenting with over the past few days, and it’s still unstable. I’ve tried a few other LoRAs using a similar approach, but I’ve run into a problem I haven’t solved yet: the sample images during training look better than the results you get when actually using the LoRA. See the images below for an example.

Seeing the sample images generated during training got me a little excited, and I thought the experiment had succeeded. I didn’t expect that when I actually used it in ComfyUI, the results would turn out like the messy one on the right. I tried adjusting various generation parameters repeatedly, but I still couldn’t get normal images.
In the past, the usual situation was that sample images during training looked worse than or about the same as the results in testing. This complete reversal is the first time I’ve encountered it. I know part of the reason is overcooking, but that’s not the whole cause, and it’s not even the main reason. I’ll need to spend more time testing. If I can get it to work, it would be a small breakthrough in my ZIB real-person LoRA training. For now, it’s something I can see but not quite reach.
OK, that’s a lot of rambling for now. Thanks to everyone who took the time to read through it.
I’ll continue my **_Training Notes_** next time when I have new discoveries or ideas.
Thank you! 😊🙏