Happy Horse 1.0 Technical Specifications

Apr 7, 2026

Model Architecture

15B Sandwich Transformer

A 40-layer unified sandwich Transformer that combines modality-specific layers with shared layers.
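As a rough illustration of a sandwich layout, the sketch below places modality-specific layers at both ends of the stack and shared layers in the middle. Only the total depth of 40 layers comes from this page. The 4/32/4 split, the placement of the layer types, and the names `sandwich_layout` and `edge` are assumptions made for the example.

```python
# Hypothetical layout: only the total depth (40 layers) is stated on this page.
# The 4/32/4 split and the placement of layer types are illustrative assumptions.
TOTAL_LAYERS = 40


def sandwich_layout(total=TOTAL_LAYERS, edge=4):
    """Return one label per layer: modality-specific edges around a shared core."""
    if edge < 0 or 2 * edge > total:
        raise ValueError("edge layers must fit inside the stack")
    shared = total - 2 * edge
    return (
        ["modality-specific"] * edge
        + ["shared"] * shared
        + ["modality-specific"] * edge
    )


layout = sandwich_layout()
print(len(layout), layout.count("shared"))
```

Changing `edge` shows how the same depth could be divided differently. The actual split is not given here.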

The 15-billion-parameter model uses single-stream self-attention to process video and audio as one unified sequence.

Unified Architecture

One model handles video and audio simultaneously. No separate pipelines, no cross-attention modules, no manual syncing required.
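A minimal sketch of the single-stream idea, under simplifying assumptions: video and audio tokens are concatenated into one sequence, and ordinary self-attention runs over all of them, so no separate cross-attention module is needed. The toy embeddings, the identity projections, and the function names are hypothetical and say nothing about the model's real layers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (toy example)."""
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for q in tokens:
        # Scaled dot-product score of this query against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        all_weights.append(weights)
        # Weighted sum of the value vectors (here the tokens themselves).
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs, all_weights


# Toy embeddings; real tokens would come from video and audio encoders.
video_tokens = [[1.0, 0.0], [0.5, 0.5]]
audio_tokens = [[0.0, 1.0]]

sequence = video_tokens + audio_tokens  # one unified sequence
outputs, weights = self_attention(sequence)

# Each video token places non-zero attention weight on the audio token (index 2).
print(weights[0][2] > 0.0)
```

Because both modalities share one sequence, every token can attend to every other token in a single pass.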

Performance Specifications

~38s to 1080p

Generation time on an H100 GPU. DMD-2 distillation enables rapid generation.

With MagiCompiler, generation is 1.2x faster.
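For a rough sense of scale, the arithmetic below applies the quoted 1.2x speedup to the quoted ~38 s figure. It assumes that the ~38 s number is measured without MagiCompiler and that "1.2x faster" means the time is divided by 1.2. Neither is stated explicitly on this page.

```python
BASE_SECONDS = 38.0  # approximate generation time quoted above
SPEEDUP = 1.2        # quoted MagiCompiler speedup factor


def accelerated_time(base_seconds, speedup):
    """Estimated time if a speedup factor divides the baseline time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return base_seconds / speedup


estimate = accelerated_time(BASE_SECONDS, SPEEDUP)
print(f"{estimate:.1f} s")  # prints "31.7 s"
```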

1080p Native Resolution

True 1080p output, not upscaled. Crisp details without artifacts.

5-10s Duration

Single-pass generation. Ideal for short-form content creation.

Audio Capabilities

Joint Video + Audio

One unified model generates dialogue, ambient sound, and foley, all synchronized with the video.

7-Language Lip Sync

Lip sync follows speech naturally in the following supported languages:

English
Mandarin
Cantonese
Japanese
Korean
German
French

Inference Optimization

8-Step DMD-2 Distillation

Fast, efficient inference in 8 sampling steps, with no classifier-free guidance (CFG) needed.
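To see why dropping CFG matters, the sketch below counts network evaluations per generated clip. Classifier-free guidance normally evaluates the network twice per step, once with the condition and once without. The 50-step CFG baseline is a hypothetical comparison point, not a figure from this page, and evaluation counts are only a proxy for wall-clock time.

```python
def network_evaluations(steps, uses_cfg):
    """Count forward passes: CFG needs a conditional and an unconditional pass per step."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    return steps * (2 if uses_cfg else 1)


distilled = network_evaluations(steps=8, uses_cfg=False)
baseline = network_evaluations(steps=50, uses_cfg=True)  # hypothetical baseline

print(distilled, baseline, baseline / distilled)  # prints "8 100 12.5"
```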

MagiCompiler Support

Inference runs 1.2x faster with MagiCompiler optimization.

Rankings

Category                    Rank   Elo Score
Text-to-Video (No Audio)    #1     1,375
Image-to-Video (No Audio)   #1     1,392

Lead over Seedance 2.0: 60+
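As a rough guide to what a 60-point lead means, the sketch below uses the standard Elo expected-score formula with a 400-point scale. Whether the leaderboard behind these rankings uses exactly this formula is an assumption. The function name `expected_win_rate` is hypothetical.

```python
def expected_win_rate(elo_lead, scale=400.0):
    """Standard Elo expected score for the higher-rated side, given its rating lead."""
    return 1.0 / (1.0 + 10.0 ** (-elo_lead / scale))


rate = expected_win_rate(60)
print(f"{rate:.1%}")
```

Under that assumption, a 60-point lead corresponds to the higher-rated model being preferred in roughly 58 to 59 percent of head-to-head comparisons.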

Open Source

The team has committed to releasing the full open-source package, including the base model, distilled versions, and inference code, by mid-2026.