# WD LLM Caption CLI

A Python-based CLI tool and a simple Gradio GUI for captioning images
with [WD series](https://huggingface.co/SmilingWolf), [joy-caption-pre-alpha](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha), [Llama 3.2 Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
[Qwen2 VL Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [MiniCPM-V 2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
and [Florence-2](https://huggingface.co/microsoft/Florence-2-large) models.

<img alt="DEMO_GUI.png" src="DEMO/DEMO_GUI.png" width="700" />

## Introduction

This tool can make a caption with danbooru style tags or a natural language description.

### New Changes:

2024.10.19: Add an option to save WD tags and LLM captions in one file (only supported in CLI mode and GUI batch mode).

2024.10.18: Add Joy-Caption Alpha One, Joy-Caption Alpha Two, and Joy-Caption Alpha Two Llava support.
The GUI supports Joy-formatted prompt inputs (only for Joy-Caption Alpha Two and Joy-Caption Alpha Two Llava).

2024.10.13: Add Florence-2 support.
The LLM now uses its own default generation parameters when `--llm_temperature` and `--llm_max_tokens` are `0`.

2024.10.11: The GUI now uses Gradio 5. Add MiniCPM-V 2.6 support.

2024.10.09: Built as a wheel; you can now install this repo from PyPI.

```shell
# Install torch based on your GPU driver, e.g. for CUDA 12.4:
pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124
# Install via pip from PyPI
pip install wd-llm-caption
# For CUDA 11.8
pip install -U -r requirements_onnx_cu118.txt
# For CUDA 12.X
pip install -U -r requirements_onnx_cu12x.txt
# Run the CLI
wd-llm-caption --data_path your_data_path
# Run the GUI
wd-llm-caption-gui
```

2024.10.04: Add Qwen2 VL support.

2024.09.30: A simple GUI now runs through Gradio 😊

## Example

<img alt="DEMO_her.jpg" src="DEMO/DEMO_her.jpg" width="600" height="800" />

### Standalone Inference

place).

### Joy Caption models

| Model | Hugging Face Link | ModelScope Link |
|:---:|:---:|:---:|
| joy-caption-pre-alpha | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-pre-alpha) |
| Joy-Caption-Alpha-One | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-one) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-one) |
| Joy-Caption-Alpha-Two | [Hugging Face](https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/joy-caption-alpha-two) |
| Joy-Caption-Alpha-Two-Llava | [Hugging Face](https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/llama-joycaption-alpha-two-hf-llava) |
| siglip-so400m-patch14-384 (Google) | [Hugging Face](https://huggingface.co/google/siglip-so400m-patch14-384) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/siglip-so400m-patch14-384) |
| Meta-Llama-3.1-8B | [Hugging Face](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Meta-Llama-3.1-8B) |
| unsloth/Meta-Llama-3.1-8B-Instruct | [Hugging Face](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/unsloth-Meta-Llama-3.1-8B-Instruct) |
| Llama-3.1-8B-Lexi-Uncensored-V2 | [Hugging Face](https://huggingface.co/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2) | [ModelScope](https://www.modelscope.cn/models/fireicewolf/Llama-3.1-8B-Lexi-Uncensored-V2) |

### Llama 3.2 Vision Instruct models

python -m venv .venv

# Install torch based on your GPU driver, e.g. for CUDA 12.4:
pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124

# Base dependencies; models for inference will be downloaded via Python request libs.
# For WD Caption
if `queue`, all images will be captioned with the WD models first,
then all of them will be captioned with the joy models, with the WD captions included in the joy user prompt.
default is `sync`.

`--caption_extension`

extension of the caption file; default is `.txt`.
If `caption_method` is not `wd+llm`, it will be used as the WD or LLM caption file extension.

`--save_caption_together`

save WD tags and LLM captions in one file.

`--save_caption_together_seperator`

separator between the WD tags and LLM captions when they are saved in one file.

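As a rough illustration of what these two flags imply (a hypothetical sketch, not the repo's actual code; the helper name and defaults are made up):

```python
import tempfile
from pathlib import Path

def save_caption_together(image_path, wd_tags, llm_caption,
                          separator="|", extension=".txt"):
    # Hypothetical helper: join the WD tags and the LLM caption with the
    # configured separator (cf. --save_caption_together_seperator) and
    # write them to a single caption file next to the image.
    caption_file = Path(image_path).with_suffix(extension)
    caption_file.write_text(f"{wd_tags}{separator}{llm_caption}", encoding="utf-8")
    return caption_file

with tempfile.TemporaryDirectory() as tmp:
    img = Path(tmp) / "demo.jpg"
    out = save_caption_together(img, "1girl, solo, smile", "A smiling woman.")
    print(out.read_text(encoding="utf-8"))  # 1girl, solo, smile|A smiling woman.
```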
`--image_size`

resize images to a suitable size; default is `1024`.
load joy models on CPU.

`--llm_llm_dtype`

choose the joy LLM load dtype [`fp16`, `bf16`, `fp32`]; default is `fp16`.

`--llm_llm_qnt`

enable quantization for the joy LLM [`none`, `4bit`, `8bit`]; default is `none`.

`--llm_caption_extension`

extension of the LLM caption file; default is `.llmcaption`.

`--llm_read_wd_caption`

max tokens for LLM model output; default is `0`, which means the LLM uses its own default value.
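The zero-means-model-default behaviour described for `--llm_temperature` and `--llm_max_tokens` could be sketched like this (illustrative only; the helper name and the fallback values are invented for the example):

```python
def resolve_generation_params(temperature=0.0, max_tokens=0, model_defaults=None):
    # A value of 0 means "fall back to the model's own default generation
    # parameters"; the defaults below are invented for the example.
    model_defaults = model_defaults or {"temperature": 0.7, "max_tokens": 512}
    return {
        "temperature": temperature if temperature > 0 else model_defaults["temperature"],
        "max_tokens": max_tokens if max_tokens > 0 else model_defaults["max_tokens"],
    }

print(resolve_generation_params())  # both fall back to the model defaults
print(resolve_generation_params(temperature=0.3, max_tokens=128))  # user overrides win
```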

## Credits

Based on [SmilingWolf/wd-tagger models](https://huggingface.co/spaces/SmilingWolf/wd-tagger/blob/main/app.py), [fancyfeast/joy-caption models](https://huggingface.co/fancyfeast), [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct),
[Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), [openbmb/MiniCPM-V 2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6)
and [microsoft/Florence-2](https://huggingface.co/collections/microsoft/florence-6669f44df0d87d9c3bfb76de).
Without their work (👏👏), this repo wouldn't exist.