Getting consistent character portraits generated by SDXL has been a challenge... until now! ComfyUI IPAdapter Plus (dated 30 Dec 2023) now supports both IP-Adapter and IP-Adapter-FaceID (released 4 Jan 2024)!

I will only be using the SDXL models, i.e. ip-adapter-plus-face_sdxl_vit-h and IP-Adapter-FaceID-SDXL below. From the respective documentation:

IP-Adapter:

  • ip-adapter_sdxl.bin: use global image embedding from OpenCLIP-ViT-bigG-14 as condition
  • ip-adapter_sdxl_vit-h.bin: same as ip-adapter_sdxl, but use OpenCLIP-ViT-H-14
  • ip-adapter-plus_sdxl_vit-h.bin: use patch image embeddings from OpenCLIP-ViT-H-14 as condition, closer to the reference image than ip-adapter_sdxl and ip-adapter_sdxl_vit-h
  • ip-adapter-plus-face_sdxl_vit-h.bin: same as ip-adapter-plus_sdxl_vit-h, but use cropped face image as condition

IP-Adapter-FaceID:

  • IP-Adapter-FaceID-PlusV2: face ID embedding (for face ID) + controllable CLIP image embedding (for face structure)
  • IP-Adapter-FaceID-SDXL: An experimental SDXL version of IP-Adapter-FaceID

Installing

Before you begin, make sure you install ComfyUI IPAdapter Plus - I used ComfyUI Manager.

First, get the CLIP Vision ViT-H image encoder models:

And download the IPAdapter SDXL face model for CLIP Vision:
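For reference, these downloads can also be scripted. Here is a sketch using huggingface_hub - the repo paths and target filenames are my assumptions based on the h94/IP-Adapter model card, so verify them before running:

```python
# Sketch: fetch the ViT-H CLIP Vision encoder and the SDXL face model, then
# copy them into ComfyUI's folders under the names the loader nodes expect.
# Repo paths and target names are assumptions - check the h94/IP-Adapter card.
import shutil
from huggingface_hub import hf_hub_download

encoder = hf_hub_download("h94/IP-Adapter", "models/image_encoder/model.safetensors")
shutil.copy(encoder, "ComfyUI/models/clip_vision/CLIP-ViT-H-14-laion2B.safetensors")

face = hf_hub_download("h94/IP-Adapter", "sdxl_models/ip-adapter-plus-face_sdxl_vit-h.safetensors")
shutil.copy(face, "ComfyUI/models/ipadapter/ip-adapter-plus-face_sdxl_vit-h.safetensors")
```

Note these are large files (the ViT-H encoder alone is a couple of gigabytes), so manual download works just as well.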

Then, for FaceID, first install InsightFace. I will only describe the steps for ComfyUI Portable with Python 3.11, per instructions here:

  • Download InsightFace,
  • Then install it and the ONNX runtime for GPU and CPU respectively:
    python_embeded\python.exe -m pip install insightface-0.7.3-cp311-cp311-win_amd64.whl
    python_embeded\python.exe -m pip install onnxruntime-gpu onnxruntime
  • Note that:
    • the first time InsightFace is run, it is supposed to download a buffalo_l model. If you encounter issues, download it manually to ComfyUI/models/insightface/models.
    • some CUDA versions may not be compatible with the ONNX runtime; in that case, select the CPU provider instead.
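The CUDA-versus-CPU note above boils down to a provider fallback. Here is a purely illustrative sketch (my own code, not the extension's) of how a loader might prefer the CUDA ONNX provider and fall back to CPU:

```python
# Illustrative only: prefer the CUDA ONNX execution provider, fall back to CPU.
# In practice you would pass in the list from onnxruntime.get_available_providers().
def pick_provider(available):
    for name in ("CUDAExecutionProvider", "CPUExecutionProvider"):
        if name in available:
            return name
    raise RuntimeError("no usable ONNX execution provider")

# e.g. a machine whose CUDA version is incompatible with onnxruntime-gpu:
print(pick_provider(["AzureExecutionProvider", "CPUExecutionProvider"]))
```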

And download the IPAdapter FaceID models and LoRA for SDXL:

  • FaceID to ComfyUI/models/ipadapter (create this folder if necessary),
  • FaceID SDXL LoRA to ComfyUI/models/loras/.
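If you would rather script these downloads too, something along these lines should work - a sketch using huggingface_hub, where the repo id and filenames are my reading of the h94/IP-Adapter-FaceID model card (verify before running):

```python
# Sketch: fetch the SDXL FaceID weights and LoRA into the ComfyUI folders.
# Repo id and filenames are assumptions from the h94/IP-Adapter-FaceID card.
from huggingface_hub import hf_hub_download

hf_hub_download("h94/IP-Adapter-FaceID", "ip-adapter-faceid_sdxl.bin",
                local_dir="ComfyUI/models/ipadapter")
hf_hub_download("h94/IP-Adapter-FaceID", "ip-adapter-faceid_sdxl_lora.safetensors",
                local_dir="ComfyUI/models/loras")
```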

Remember to restart ComfyUI!

Workflow

A pretty low-effort workflow is all that is required:

  • Start by loading an image and then passing it through PrepImageForInsightFace to resize and crop the image as required by the FaceID model.
  • Then load the required models - use IPAdapterModelLoader to load the ip-adapter-faceid_sdxl.bin model, the CLIP Vision model CLIP-ViT-H-14-laion2B.safetensors, and InsightFace (since I have an Nvidia card, I use CUDA).
  • As usual, load the SDXL model, but pass it through the ip-adapter-faceid_sdxl_lora.safetensors LoRA first. I tested with and without the LoRA (by bypassing it) and saw no difference. No idea why!
  • Then all of the above become inputs (ipadapter, clip_vision, insightface, image, and model) to the IPAdapterApplyFaceID node - I don’t know what I am doing, so I can’t explain the parameters...
  • Use the clip output to do the usual SDXL clip text encoding for the positive and negative prompts.
  • Finally, I use the KSamplerAdvanced node with the model from the IPAdapterApplyFaceID node, the positive and negative conditioning, and a 1024x1024 empty latent image as inputs.
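To make the prep step above concrete, here is my own back-of-the-envelope illustration (not PrepImageForInsightFace's actual code) of the center-square-crop-and-resize arithmetic a face-prep node performs:

```python
# Illustration only: compute a centered square crop plus the scale factor that
# maps the crop to a target detector resolution (my sketch, not the node's code).
def center_square_crop(width, height, target=640):
    """Return (left, top, side, scale) for the centered square crop."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, side, target / side

# landscape 1024x768 image: crop the middle 768x768, then scale it to 640x640
print(center_square_crop(1024, 768))
```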

IP-Adapter-FaceID Examples

Here are three outputs using generative AI images from my previous posts - the first two were from “Stability AI Control LoRAs” and the third from “SDXL Revision workflow in ComfyUI” (from 19 and 20 Aug 2023 respectively). No parameters were changed in all three runs:

Example 1: IPAdapter Plus for ComfyUI with IP-Adapter with FaceID using SDXL model
Example 2: IPAdapter Plus for ComfyUI with IP-Adapter with FaceID using SDXL model
Example 3: IPAdapter Plus for ComfyUI with IP-Adapter with FaceID using SDXL model

The portraits look sufficiently consistent to me - even the hair color matches! What do you think?

IP-Adapter Face Example

To compare the standard IP-Adapter face model against the IP-Adapter-FaceID results above:

  • Change the IPAdapterModelLoader to load ip-adapter-plus-face_sdxl_vit-h.safetensors,
  • Use PrepImageForClipVision instead, and
  • Remove the LoRA by bypassing it.

No other parameters were changed. Here is the output for the first example - I think this is even better, which do you prefer?

Example 4: IPAdapter Plus for ComfyUI with IP-Adapter for CLIPVision using SDXL Face model

(Generative AI research never ceases to amaze - with the “openness” of SDXL and ComfyUI, crazy-smart people have made so many improvements in such a short time!)