CharForge: character consistency with one reference image

June 27, 2025 • Rishi Desai

Demo Available: Try CharForge yourself here.

I spent hundreds of hours exploring character consistency in image generation models, surveying the best methods from cutting-edge research to complex ComfyUI workflows. I synthesized this knowledge to build CharForge, the best method for generating character-consistent images from a single reference.

What’s Character Consistency?

Character consistency is an AI model’s ability to maintain a character’s distinct appearance across multiple images and scenarios. This includes maintaining a consistent face, body, and outfit, all of which are notoriously difficult for image generation models to get right.

There are two classes of approaches for achieving character consistency:

Inference-time generation, which uses a reference image during prompt-based generation.
LoRA training, which teaches the model a specific character identity from a dataset.

	Inference Time	LoRA Training
Closed-Source	GPT Image 1, RunwayML Gen-4, Midjourney	OpenArt
Open-Source	InfiniteYou, PuLID, ACE++	CharForge

Inference Time Generation

Closed-Source Models

The two best closed-source models are GPT Image 1 and RunwayML Gen-4. GPT’s native image generation can perform tasks that previously required multiple specialized models, such as reference-based character generation, pose manipulation, style transfer, and inpainting. Its prompt adherence is remarkable, far better than that of Flux.1-dev, but the face quality of real people suffers (see FaceEnhance for more details). GPT is also slow and often refuses harmless prompts.

Open-Source Models

There are no good open-source image-to-image models that accept reference character images. There is, however, a class of models that generates images while maintaining face consistency. The best inference-time model is InfiniteYou, followed by PuLID-Flux.1-dev. These are image-to-image models that take a face image and a text prompt as input.

The biggest weakness of these models is that they can’t reliably control hair, clothing, or multi-view consistency. The generated image often looks as if the face was copy-pasted onto the person. They also rely on InsightFace to extract facial embeddings. InsightFace struggles to detect the faces of anime and cartoon characters, so these models only work on photorealistic images of people. These are four images generated from the same face reference with InfiniteYou.

If you want high-quality, full-body character consistency, especially across multiple angles and scenarios, training a LoRA is the superior approach, and Flux.1-dev is the best open-source model for it.

Training a LoRA

Training a LoRA on Flux offers the best balance between efficiency and quality, and it outperforms all inference-time approaches on character consistency. To train a LoRA, we typically gather a dataset of 10-20 high-quality images of the character in various poses, lighting conditions, and expressions.
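To make the efficiency point concrete, here is a minimal sketch of why a LoRA is cheap to train: instead of updating a full weight matrix, it learns two small low-rank matrices whose product is added to the frozen weight. The function name `lora_param_counts`, the layer sizes, and the rank below are illustrative assumptions, not values taken from Flux or CharForge.

```python
from typing import Tuple


def lora_param_counts(d_in: int, d_out: int, rank: int) -> Tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    if rank <= 0 or rank > min(d_in, d_out):
        raise ValueError("rank must be between 1 and min(d_in, d_out)")
    full = d_in * d_out           # updating the dense weight matrix W directly
    lora = rank * (d_in + d_out)  # A is (rank x d_in), B is (d_out x rank)
    return full, lora


# Illustrative sizes only (assumed, not read from any real checkpoint).
full, lora = lora_param_counts(d_in=3072, d_out=3072, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

With these assumed sizes, the low-rank update trains roughly 1% of the parameters of the full matrix, which is why a small dataset and modest hardware can be enough.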

Gathering such a dataset is straightforward for popular characters and people, for whom diverse, high-quality images are easy to find online. But with AI-generated characters, we often have only one generated image. We can’t train a LoRA on a single image, so we need to generate a character sheet.

What’s a Character Sheet?

A character sheet is a curated collection of images depicting a character from multiple perspectives and in varied poses, lighting, and expressions. These images serve as the dataset for training a LoRA model to generate consistent representations of the character across various contexts.

The character sheet must contain diverse but consistent visual information about the character. This ensures that the trained model can accurately reproduce the character’s unique features, such as facial attributes, clothing, and accessories, across generated images.
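One way to reason about "diverse but consistent" is to label each image in the sheet and check which variations are still missing. The sketch below is only an illustration of that idea, not CharForge's actual pipeline: the `SheetImage` class, the `coverage_gaps` function, the file paths, and the attribute labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class SheetImage:
    """One labeled image in a character sheet (hypothetical schema)."""
    path: str
    view: str        # e.g. "front", "profile", "back"
    expression: str  # e.g. "neutral", "smiling"
    lighting: str    # e.g. "studio", "outdoor"


def coverage_gaps(
    images: List[SheetImage], required: Dict[str, Iterable[str]]
) -> Dict[str, Set[str]]:
    """Return, per attribute, the required values no image covers yet."""
    gaps: Dict[str, Set[str]] = {}
    for attr, wanted in required.items():
        seen = {getattr(img, attr) for img in images}
        missing = set(wanted) - seen
        if missing:
            gaps[attr] = missing
    return gaps


# Hypothetical sheet: three images, none showing the character from behind.
sheet = [
    SheetImage("img_01.png", "front", "neutral", "studio"),
    SheetImage("img_02.png", "profile", "smiling", "outdoor"),
    SheetImage("img_03.png", "front", "smiling", "studio"),
]
required = {
    "view": ["front", "profile", "back"],
    "expression": ["neutral", "smiling"],
    "lighting": ["studio", "outdoor"],
}
gaps = coverage_gaps(sheet, required)
print(gaps)
```

Here the only gap reported is the missing "back" view, which signals that the sheet needs another image before it covers every required perspective.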

CharForge

We only have one image of the character, so CharForge synthetically generates the entire character sheet.

More information on the character sheet generation process will be added soon!
