⚠️ Community-curated page based on public info. For official updates, visit GitHub.

Open-source from Meituan LongCat team

LongCat-Image
Redefines Chinese text-to-image quality

The first 6B lightweight model that truly understands Chinese characters.
Beautiful typography, commercial-grade editing, and blazing fast generation.

Create freely on consumer GPUs, no tricks required.

🚀 Try Live 📥 Download weights 🛠️ GitHub repo

6B

Parameters

8GB

Min VRAM

90%+

Chinese accuracy

~3s

Per image

No install needed

Live demo

Enter a prompt and see how LongCat-Image-Edit handles Chinese text-aware editing.

🤗

Official HF Space

Text-to-image, seconds per shot

✏️

Community Edit build

Inpaint, replace background, keep ID

📋 Product snapshot

LongCat-Image is an open-source text-to-image model built by Meituan. With a lightweight 6B DiT (Diffusion Transformer) architecture, it delivers visuals on par with or better than closed 20B+ models while staying friendly to creators and developers.

Why LongCat-Image

✦ Native Chinese support: No more garbled Chinese. From complex characters to vertical layouts, accuracy exceeds 90%.
✦ Commercial-grade editing: The LongCat-Image-Edit build enables precise inpainting. Swap backgrounds or outfits while keeping identity intact.
✦ Low barrier: Runs smoothly on 8GB GPUs like the RTX 3060. A built-in prompt refiner helps beginners get pro results in minutes.

Core capabilities

Four pillars designed for Chinese creators

🇨🇳

Best-in-class Chinese rendering

Finally, correct Chinese in text-to-image outputs.

• Dual text encoders cover 99% of common characters
• Horizontal, vertical, and stylized fonts render cleanly
• Mixed Chinese/English in the same frame

✨

Prompt rewrite assistant

Explain ideas in plain language—LongCat rewrites prompts for you.

• Lightweight LLM module for prompt polishing
• Turn ‘girl in the snow’ into studio-grade prompts
• Boosts first-try success rates for new users

🖌️

Commercial editing

LongCat-Image-Edit specializes in precise inpainting.

• ID lock: swap outfits and scenes while keeping the face
• Lighting-aware blending for seamless edits
• Natural-language commands like ‘replace background with palace red wall’

⚡

Extreme performance

6B params delivering 20B-level detail.

• RTX 4090: <2s per image
• RTX 3060 8GB: ~5s
• INT8 quantization supported
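To see why INT8 quantization matters on an 8GB card, here is a rough back-of-the-envelope calculation of weight memory for a 6B-parameter model. It is a sketch under simple assumptions: it counts only the transformer weights and ignores the text encoders, the VAE, and activations, so real usage will be higher. The helper name `weight_memory_gb` is made up for this illustration.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate weight-only footprint in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# 6B parameters, as quoted on this page.
for dtype in ("fp32", "bf16", "int8"):
    print(f"{dtype}: {weight_memory_gb(6e9, dtype):.1f} GB")
```

At bf16 the weights alone come to about 12 GB, which is more than an 8GB card holds, while INT8 brings them to about 6 GB. That suggests the 8GB figure relies on quantization or offloading, though the page does not say which.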

Technical overview

Under the hood of LongCat-Image

Architecture

DiT + Flow Matching

Modern diffusion transformer

Params

6B

Lightweight yet powerful

Sampler

ODE Solver

Efficient sampling

Default steps

Fast high-quality outputs

Text encoders

T5-XXL + CLIP

Understands complex semantics

VAE

Custom HiFi VAE

Sharper details, better fidelity
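For readers curious what "Flow Matching" plus an "ODE Solver" means in practice, the toy sketch below integrates a velocity field from noise toward a target with plain Euler steps. It is an illustration of the general technique only, not LongCat-Image's actual sampler: in a real model the velocity would be predicted by the DiT from the text embeddings, whereas here `velocity` is a hand-written stand-in for a single known target, and all names are hypothetical.

```python
import random


def velocity(x, t, target):
    """Toy velocity field for the straight path x_t = (1 - t) * noise + t * target.

    A trained model would predict this; here it is computed in closed form.
    """
    return [(m - xi) / (1.0 - t) for xi, m in zip(x, target)]


def euler_sample(target, num_steps=8, seed=0):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise at t = 0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays below 1, so (1 - t) is never zero
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


sample = euler_sample([1.0, -2.0, 0.5])
print(sample)
```

Because the path is a straight line, a handful of steps is enough to land on the target here. Fewer steps means fewer model evaluations, which is the usual reason flow-matching models are described as fast to sample.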

3 ways to get started

Pick the path that matches your workflow

Zero-install demo

Try everything in the browser, no setup required.

🤗 HuggingFace Space 🚀 ModelScope demo

Python local deploy

Great for developers or batch jobs.

pip install diffusers transformers accelerate

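import torch
from diffusers import LongCatImagePipeline  # needs a recent diffusers release; upgrade if this import fails

# Load the weights in bfloat16 and move the pipeline to the GPU.
pipe = LongCatImagePipeline.from_pretrained(
    "meituan-longcat/LongCat-Image",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Generate one image; the quoted phrase is the text to render in the picture.
prompt = "A ginger cat sitting beside a mooncake with 'Happy Mid-Autumn' written next to it"
image = pipe(prompt).images[0]
image.save("output.png")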
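For batch jobs, the same call can be wrapped in a small loop. The sketch below assumes only what the snippet above shows, namely that `pipe(prompt).images[0]` returns an object with a `save` method. The helpers `slugify` and `run_batch` are hypothetical names for this example, not part of any library.

```python
import re
from pathlib import Path


def slugify(prompt, max_len=40):
    """Turn a prompt into a short, filesystem-friendly name (keeps CJK characters)."""
    slug = re.sub(r"[^\w]+", "-", prompt.lower()).strip("-")
    return slug[:max_len].strip("-")


def run_batch(pipe, prompts, out_dir):
    """Generate one image per prompt and save it under out_dir; returns the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, prompt in enumerate(prompts):
        image = pipe(prompt).images[0]  # same call shape as the single-image snippet
        path = out_dir / f"{index:03d}_{slugify(prompt)}.png"
        image.save(path)
        saved.append(path)
    return saved


# Usage, assuming `pipe` was loaded as shown above:
# run_batch(pipe, ["A ginger cat beside a mooncake", "A red lantern poster"], "outputs")
```

Numbering the files keeps outputs in prompt order even when two prompts produce the same slug.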