I never tried generating video clips or animations with SDXL before, simply because my GPU is not powerful enough. But after testing out the LCM LoRA for SDXL yesterday, I thought I’d try the SDXL LCM LoRA with Hotshot-XL, which is something akin to AnimateDiff.

I mainly followed these two guides: ComfyUI SDXL Animation Guide Using Hotshot-XL, and ComfyUI AnimateDiff Guide/Workflows Including Prompt Scheduling by Inner_Reflections_AI.

Installing

I am using ComfyUI with Dr. Lt. Data’s ComfyUI-Manager.

If you haven’t installed the latter, and you are using ComfyUI Portable and have Git installed, just run these two commands in the ComfyUI Portable folder and then restart ComfyUI:

.\python_embeded\python.exe -s -m pip install gitpython
.\python_embeded\python.exe -c "import git; git.Repo.clone_from('https://github.com/ltdrdata/ComfyUI-Manager', './ComfyUI/custom_nodes/ComfyUI-Manager')"
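
The first command installs GitPython into the portable Python environment; the second uses it to clone the ComfyUI-Manager repository into ComfyUI’s custom_nodes folder.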

Using the ComfyUI Manager, install the AnimateDiff-Evolved and VideoHelperSuite custom nodes, both by Jedrzej Kosinski.

Then download either the Hotshot-XL motion model hotshotxl_mm_v1.pth or the alternative hsxl_temporal_layers.f16.safetensors into ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\models.
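
If you prefer to script the download, here is a minimal sketch using huggingface_hub. I am assuming the file is hosted in the hotshotco/Hotshot-XL repository on Hugging Face, so verify the repo and filename before running it:

from huggingface_hub import hf_hub_download

# Assumption: the temporal layers file lives in the hotshotco/Hotshot-XL repo;
# check the actual hosting repo and filename before relying on this.
hf_hub_download(
    repo_id="hotshotco/Hotshot-XL",
    filename="hsxl_temporal_layers.f16.safetensors",
    local_dir=r"ComfyUI\custom_nodes\ComfyUI-AnimateDiff-Evolved\models",
)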

ComfyUI Workflow

I successfully managed to generate a few video clips with the flow below. Admittedly I am not sure what to expect, as I do not know what types of “motion keywords” the Hotshot-XL model understands. The Hotshot-XL documentation states that it was “trained to generate 1 second GIFs at 8 FPS.”

SDXL LCM LoRA with Hotshot-XL motion model

Here is the flow and a few points to note:

  • First, load SDXL and then the LCM LoRA as usual,
  • And create the usual positive and negative prompts with CLIPTextEncodeSDXL and CLIPTextEncode respectively,
  • Next, use the Uniform Context Options - I am unable to increase context_length above 8 (the clip length Hotshot-XL was trained on) due to memory limitations, so the video I get is a bit... disjointed,
  • Pass the options to the AnimateDiff Loader, which loads the Hotshot-XL model with the linear (HotshotXL/default) scheduler,
  • Create an empty latent image - Hotshot-XL was trained for certain aspect ratios, and here I am using 608 x 416,
  • Then, pass the model output to the usual KSampler with the lcm sampler and sgm_uniform scheduler - I used 8 steps; with too few steps the output turns fuzzy and speckled,
  • Finally, after saving the image, use the VideoCombine node to create a video file - I use the webp or webm format. A rough script-form equivalent of the image half of this flow follows below.
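
For comparison, the still-image half of this setup - SDXL plus the LCM LoRA at 8 steps - can be sketched outside ComfyUI with the diffusers library. This is a minimal, hedged sketch that leaves out the Hotshot-XL motion layers (diffusers does not bundle them), with a hypothetical prompt:

import torch
from diffusers import StableDiffusionXLPipeline, LCMScheduler

# Load SDXL, then attach the LCM LoRA - mirrors the first two nodes of the flow.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# The LCM scheduler plays the role of the lcm sampler in KSampler.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# 8 steps and a low guidance scale, matching the settings above;
# 608 x 416 is one of the aspect ratios Hotshot-XL was trained on.
image = pipe(
    prompt="a corgi running on a beach",    # hypothetical prompt
    negative_prompt="blurry, low quality",  # hypothetical negative prompt
    num_inference_steps=8,
    guidance_scale=1.0,
    width=608,
    height=416,
).images[0]
image.save("frame.png")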

I also tried AnimateDiff SDXL, specifically the beta mm_sdxl_v10_beta.ckpt model, but the flow above does not work for me.