Fish Audio S2 — Pre-Release

The best text-to-speech system among both open-source and closed-source models.

Trained on 10M+ hours of audio across ~50 languages, S2 combines a Dual-AR architecture (Qwen3 backbone) with GRPO reinforcement learning alignment to produce natural, emotionally rich speech with fine-grained inline control.

Technical Report · Blog · Model · Playground

Model

Variant	Params	Codec	Output sample rate
S2-Pro	4B (slow AR) + 400M (fast AR)	ModifiedDAC, 10 codebooks, ~21 Hz	44.1 kHz
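
For a rough sense of scale, the codec figures in the table imply the token budget sketched below. This is back-of-the-envelope arithmetic from the ~21 Hz, 10-codebook, 44.1 kHz numbers only. The frame rate is quoted as approximate, so the results are estimates, and the function name `codec_budget` is made up for illustration.

```python
# Back-of-the-envelope numbers derived from the model table.
# The frame rate is quoted as "~21 Hz", so every result here is approximate.
FRAME_RATE_HZ = 21        # codec frames per second (approximate)
NUM_CODEBOOKS = 10        # 1 semantic + 9 residual codebooks
SAMPLE_RATE_HZ = 44_100   # output sample rate


def codec_budget(seconds: float) -> dict:
    """Estimate frame and token counts for a clip of the given length."""
    frames = round(seconds * FRAME_RATE_HZ)
    return {
        "frames": frames,
        "tokens": frames * NUM_CODEBOOKS,
        "samples_per_frame": SAMPLE_RATE_HZ // FRAME_RATE_HZ,
    }


print(codec_budget(10.0))
# {'frames': 210, 'tokens': 2100, 'samples_per_frame': 2100}
```

Under these assumptions, ten seconds of speech is about 210 frames, or 2,100 codec tokens, with each frame covering about 2,100 output samples.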

Highlights

Dual-AR: Slow AR (4B) predicts semantic codebook along time axis; Fast AR (400M) fills 9 residual codebooks per step
Inline Control: Free-form tags like [laugh], [whispers], [super happy] at word level
RL Alignment: GRPO with unified data-reward pipeline — same model for data filtering and RL reward
SGLang Streaming: RTF 0.195, TTFA ~100ms, 3000+ tokens/s on single H200
50+ Languages, multi-speaker (<|speaker:i|>), multi-turn, rapid voice cloning (10-30s reference)
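
To make the inline markup concrete, here is a minimal sketch of how bracket tags such as [laugh] and speaker markers such as <|speaker:0|> sit alongside plain text. The helper `split_markup` is hypothetical and not part of the Fish Audio codebase; the model consumes the raw string directly, and this parser only shows the structure of such a prompt.

```python
import re

# Hypothetical helper, not part of the Fish Audio codebase: it only illustrates
# how inline tags and speaker markers are interleaved with plain text.
TOKEN_RE = re.compile(r"<\|speaker:(\d+)\|>|\[([^\[\]]+)\]")


def split_markup(text: str) -> list[tuple[str, str]]:
    """Split text into ("speaker", id), ("tag", name) and ("text", words) parts."""
    parts = []
    pos = 0
    for m in TOKEN_RE.finditer(text):
        plain = text[pos:m.start()].strip()
        if plain:
            parts.append(("text", plain))
        if m.group(1) is not None:
            parts.append(("speaker", m.group(1)))
        else:
            parts.append(("tag", m.group(2)))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(("text", tail))
    return parts


line = "<|speaker:0|>I can't believe it [laugh] you did it! <|speaker:1|>[whispers] Keep it down."
for kind, value in split_markup(line):
    print(kind, value)
```

Because the tags are free-form, the pattern accepts any bracketed phrase, including multi-word ones like [super happy], rather than checking against a fixed list.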

What's Changed

Model & Inference

New Dual-AR architecture with Qwen3 backbone, replacing Fish-Speech v1.5
New ModifiedDAC audio codec (replaces Firefly/VQ-GAN)
Support fish_qwen3_omni checkpoint format (sharded safetensors) with backward compatibility
Fixed: torch.compile bugs, GPU memory leak, audio quality issues
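
The control flow of the Dual-AR design can be sketched as two nested loops: the slow AR advances one frame at a time, and the fast AR fills the nine residual codebooks within each frame. The sketch below is a toy, not the actual model code. Both step functions are random stand-ins for the 4B and 400M networks, and `CODEBOOK_SIZE` is an assumed value chosen only for illustration.

```python
import random

NUM_RESIDUAL = 9      # residual codebooks filled by the fast AR per step
CODEBOOK_SIZE = 1024  # assumed vocabulary size, for illustration only


def slow_ar_step(history: list[list[int]], rng: random.Random) -> int:
    """Stand-in for the slow AR: next semantic token given past frames."""
    return rng.randrange(CODEBOOK_SIZE)


def fast_ar_step(semantic: int, residuals: list[int], rng: random.Random) -> int:
    """Stand-in for the fast AR: next residual token within one frame."""
    return rng.randrange(CODEBOOK_SIZE)


def generate(num_frames: int, seed: int = 0) -> list[list[int]]:
    """Return num_frames frames, each holding 1 semantic + 9 residual tokens."""
    rng = random.Random(seed)
    frames: list[list[int]] = []
    for _ in range(num_frames):            # outer loop: time axis (slow AR)
        semantic = slow_ar_step(frames, rng)
        residuals: list[int] = []
        for _ in range(NUM_RESIDUAL):      # inner loop: codebook axis (fast AR)
            residuals.append(fast_ar_step(semantic, residuals, rng))
        frames.append([semantic] + residuals)
    return frames


frames = generate(num_frames=21)  # roughly one second of audio at ~21 Hz
print(len(frames), len(frames[0]))  # 21 10
```

The split keeps the expensive model on the short time axis (about 21 steps per second of audio), while the smaller model handles the nine extra predictions per frame.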

Docker & Deployment

Docker overhaul: multi-target builds, compose support, health checks, non-root user
SGLang server integration

API & Server

Reference voice management API (CRUD), multipart upload support
Various server bug fixes, /health endpoint
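
A deployment script can use the /health endpoint to wait for the server before sending requests. The sketch below assumes only that the endpoint answers a GET with HTTP 200 when the server is ready. The response body is not documented here, so it is ignored, and `wait_until_healthy` is a hypothetical helper name.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(base_url: str, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll GET {base_url}/health until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable or not ready yet; retry after a pause
        time.sleep(interval_s)
    return False
```

Call it with whatever host and port your deployment exposes, for example `wait_until_healthy("http://127.0.0.1:8080")`; that address is a placeholder, not a documented default.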

Finetune

Full finetune pipeline for S1/S2 (datasets, training, LoRA merge)
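
For readers unfamiliar with the LoRA merge step, the standard operation folds a low-rank adapter back into the base weight as W + (alpha / rank) * B A. The sketch below shows that textbook formula on a toy matrix in plain Python. It is not the repository's merge tool, whose implementation is not described in these notes, and the function names are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the usual LoRA merge."""
    scale = alpha / rank
    delta = matmul(b, a)  # B is (out x rank), A is (rank x in)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # rank x in
B = [[0.5], [0.0]]    # out x rank
print(merge_lora(W, A, B, alpha=2.0, rank=1))
# [[2.0, 2.0], [0.0, 1.0]]
```

After merging, the adapter matrices are no longer needed at inference time, since their effect is baked into the merged weight.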

Docs & Infra

README & MkDocs rewritten for S2 across 6 languages
License updated to Fish Audio Research License
Removed legacy code (Firefly VQ-GAN, SenseVoice, Fish Agent, old batch files)

31 May 12:15

leng-yue

v1.5.1

58046ea

V1.5.1 Latest

The last stable branch before the next model release.

25 Dec 02:53

leng-yue

v1.5.0

7902e40

V1.5.0

Fish Speech 1.5 release. Both inference and fine-tuning are complete.

29 Nov 06:36

leng-yue

v1.4.3

1359896

V1.4.3

The last stable release before 1.5.

25 Oct 07:15

leng-yue

v1.4.2

f8a57fb

V1.4.2

What's Changed

Add Audio Select to WebUI by @PoTaTo-Mika in #556
Fix cache max_seq_len by @AnyaCoder in #568
docs: Docker icon is missing in zh-cn README & ja README displays that it is in English & properer expression “简体中文” by @Octopus058 in #569
docs: Corrected the wrong expressions of supported languages in README by @Octopus058 in #574
Api json format by @AnyaCoder in #588
Update v1.4 readmes & samples by @AnyaCoder in #592
[chore] add docs for macos by @Tps-F in #544
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #599
chore: typo fix on post_api by @bjwswang in #605
feat: enable more workers in api.py by @AnyaCoder in #621
Fix broken remove_parameterization in firefly by @med1844 in #620
Fix dockerfile by @AnyaCoder in #622
Fix dockerfile for pyaudio by @AnyaCoder in #623
Update docs by @AnyaCoder in #626
Fix backend by @AnyaCoder in #627
Update docs by @AnyaCoder in #638