Model Architecture
15B Sandwich Transformer
A 40-layer unified sandwich Transformer that combines modality-specific and shared layers in a single stack.
The 15-billion-parameter model uses single-stream self-attention to handle video and audio as one unified sequence.
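A minimal sketch of the layering idea in PyTorch, assuming the common reading of "sandwich": modality-specific layers at the entry and exit of the stack, with shared layers in between. The 8/24/8 depth split, dimensions, and block choice are illustrative assumptions, not the released configuration.

```python
import torch
import torch.nn as nn

def block(d_model: int = 1024, n_heads: int = 16) -> nn.Module:
    # Plain pre-norm Transformer encoder layer as a stand-in block.
    return nn.TransformerEncoderLayer(
        d_model, n_heads, dim_feedforward=4 * d_model,
        batch_first=True, norm_first=True,
    )

class SandwichStack(nn.Module):
    """Illustrative sandwich: per-modality layers wrap a shared trunk.

    The 8 / 24 / 8 split summing to 40 layers is an assumption for
    illustration; the real split is not given in the source.
    """

    def __init__(self, n_outer: int = 8, n_shared: int = 24):
        super().__init__()
        self.video_in = nn.ModuleList(block() for _ in range(n_outer))
        self.audio_in = nn.ModuleList(block() for _ in range(n_outer))
        self.shared = nn.ModuleList(block() for _ in range(n_shared))
        self.video_out = nn.ModuleList(block() for _ in range(n_outer))
        self.audio_out = nn.ModuleList(block() for _ in range(n_outer))

    def forward(self, v: torch.Tensor, a: torch.Tensor):
        n_v = v.shape[1]
        for lv, la in zip(self.video_in, self.audio_in):
            v, a = lv(v), la(a)            # modality-specific entry layers
        x = torch.cat([v, a], dim=1)       # pack into one unified sequence
        for layer in self.shared:
            x = layer(x)                   # shared joint self-attention
        v, a = x[:, :n_v], x[:, n_v:]      # split back into modalities
        for lv, la in zip(self.video_out, self.audio_out):
            v, a = lv(v), la(a)            # modality-specific exit layers
        return v, a
```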
Unified Architecture
One model handles video and audio simultaneously. No separate pipelines, no cross-attention modules, no manual syncing required.
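To make the single-stream claim concrete, here is a minimal sketch of joint self-attention over one packed video+audio sequence; shapes are made up, and the q/k/v projections are omitted for brevity:

```python
import torch
import torch.nn.functional as F

# Video and audio tokens packed into one stream: a single self-attention
# pass lets every token attend across both modalities, which is what
# makes a separate cross-attention module unnecessary.
video = torch.randn(1, 256, 64)   # (batch, video tokens, channels)
audio = torch.randn(1, 128, 64)   # (batch, audio tokens, channels)
x = torch.cat([video, audio], dim=1)

out = F.scaled_dot_product_attention(x, x, x)
assert out.shape == x.shape       # every token saw both modalities
```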
Performance Specifications
~38s to 1080p
Generation time on an H100 GPU; 8-step DMD-2 distillation enables this speed.
With MagiCompiler, generation is about 1.2x faster, roughly 32 s.
1080p Native Resolution
True 1080p output, not upscaled. Crisp details without artifacts.
5-10s Duration
Single-pass generation. Ideal for short-form content creation.
Audio Capabilities
Joint Video + Audio
One unified model generates dialogue, ambient sound, and foley, all synchronized with the video.
7 Languages Lip Sync
Lip sync that matches speech naturally:
| Language | Supported |
|---|---|
| English | ✅ |
| Mandarin | ✅ |
| Cantonese | ✅ |
| Japanese | ✅ |
| Korean | ✅ |
| German | ✅ |
| French | ✅ |
Inference Optimization
8-Step DMD-2 Distillation
Eight denoising steps per sample for fast, efficient inference. Classifier-free guidance (CFG) is distilled into the model, so no extra guidance passes are needed.
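A minimal sketch of what few-step sampling with a distilled generator looks like, assuming a standard denoise-then-renoise loop; the tiny model, noise schedule, and shapes are all illustrative stand-ins, not the real ones.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Dummy stand-in for the distilled generator (illustrative only)."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim)
        )

    def forward(self, x: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
        s = sigma * torch.ones(x.shape[0], 1)   # broadcast noise level
        return self.net(torch.cat([x, s], dim=-1))

@torch.no_grad()
def sample_8_step(model: nn.Module, shape, sigmas: torch.Tensor) -> torch.Tensor:
    # Each step is a single forward pass: the distilled student already
    # reflects the guided teacher, so there is no paired conditional /
    # unconditional evaluation as CFG would require.
    x = torch.randn(shape) * sigmas[0]
    for i, sigma in enumerate(sigmas):
        x0 = model(x, sigma)                               # predict clean sample
        if i + 1 < len(sigmas):
            x = x0 + torch.randn_like(x0) * sigmas[i + 1]  # re-noise to next level
        else:
            x = x0
    return x

# An 8-level noise schedule; values are illustrative, not the real ones.
sigmas = torch.tensor([80.0, 40.0, 20.0, 10.0, 5.0, 2.5, 1.2, 0.5])
sample = sample_8_step(TinyDenoiser(), (1, 64), sigmas)
```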
MagiCompiler Support
1.2x faster with MagiCompiler optimization.
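MagiCompiler's interface is not described here, so this sketch uses PyTorch's built-in `torch.compile` as a rough stand-in to show the general idea behind this kind of speedup: compile the model once, then reuse the optimized kernels on every denoising step.

```python
import torch
import torch.nn as nn

# torch.compile stands in for MagiCompiler, whose API is not public here.
net = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
net = torch.compile(net, mode="max-autotune")

x = torch.randn(8, 1024)
with torch.no_grad():
    y = net(x)   # first call compiles; later calls reuse the cached graph
```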
Rankings
| Category | Rank | Elo Score |
|---|---|---|
| Text-to-Video (No Audio) | #1 | 1,375 |
| Image-to-Video (No Audio) | #1 | 1,392 |
| Lead over Seedance 2.0 | — | +60 |
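For context, an Elo gap maps to an expected head-to-head win rate through the standard logistic formula; a 60-point lead corresponds to winning about 59% of pairwise comparisons:

$$E = \frac{1}{1 + 10^{-\Delta/400}}, \qquad \Delta = 60 \ \Rightarrow\ E \approx 0.586$$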
Open Source
The team has committed to releasing the full open-source package (base model, distilled versions, and inference code) by mid-2026.
