Laguna XS.2 và Laguna M.1: Poolside ship coding model open-weight đầu tiên

Có gì mới

Poolside ship hai model đầu tiên trong family Laguna, cùng agent runtime mà họ dùng nội bộ để train + operate agent.

Laguna M.1: model capable nhất, finished pre-training cuối năm ngoái
Laguna XS.2: model nhỏ hơn nhiều nhưng remarkably capable cho size, và là first open-weight release của Poolside

Cả hai free trong thời gian giới hạn qua API và OpenRouter. Laguna XS.2 weight available dưới Apache 2.0.

Vị trí của Poolside

Lab này lâu nay focus vào government + public sector — họ build model cho môi trường air-gapped và high-security. Đây là lần đầu họ ship cho cộng đồng wider, và là lần đầu ship open-weight.

Định hướng: agent capable hơn → coding capability + long-horizon task. Họ argue tool calling là transitional pattern, vì code là interface expressive hơn nhiều — agent viết và execute code có thể compose action, parallelize work, build ad-hoc system.

Spec model

Laguna M.1

225B total / 23B activated parameters (Mixture of Experts)
Train completely in-house, from scratch
30T token training
6.144 NVIDIA Hopper GPU interconnected

Benchmark vs các model lớn:

	Laguna M.1	Devstral 2 (123B dense)	GLM-4.7 (355B-A32B)	DeepSeek-V4-Flash (284B-A13B)	Qwen3.5 (397B-A17B)	Claude Sonnet 4.6
SWE-bench Verified	72.5	72.2	73.8	79.0	76.2	79.6
SWE-bench Multilingual	67.3	61.3	66.7	73.3	69.3	-
SWE-bench Pro	46.9	-	-	52.6	50.9	-
Terminal-Bench 2.0	40.7	32.6	41.0	56.9	52.5	59.1

Laguna XS.2 (open-weight)

33B total / 3B activated (MoE)
Apache 2.0, available trên Hugging Face + OpenRouter + Ollama
Day‑1 support TensorRT-LLM, có NVFP4 build cho Blackwell

	Laguna XS.2	Devstral Small 2 (24B)	Gemma 4 (31B)	Qwen3.5 (35B-A3B)	Qwen3.6 (35B-A3B)	Claude Haiku 4.5	GPT-5.4 Nano
SWE-bench Verified	68.2	68.0	52.0	69.2	73.4	73.3	-
SWE-bench Multilingual	62.4	55.7	51.7	60.3	67.2	-	-
SWE-bench Pro	44.5	-	35.7	44.6	49.5	39.5	52.4
Terminal-Bench 2.0	30.1	22.5	42.9	40.5	51.5	29.8	46.3

Reading benchmark: Laguna XS.2 ngang Qwen3.5 35B-A3B trên SWE-bench Verified và Pro, vượt Devstral Small 2. Nhưng Terminal-Bench 2.0 chỉ 30.1 — thua xa Claude Sonnet 4.6 (59.1) và DeepSeek-V4-Flash (56.9). Reading: Laguna mạnh ở patch-style task (SWE-bench), chưa phải shell agent generic (Terminal-Bench).

Architecture quyết định kỹ thuật

Data pipeline + AutoMixer

Total 30T token. Web data curation treat như joint optimization của quality + diversity — không chỉ keep top-quality (vì biased toward STEM/reasoning) mà retain mid- và lower-quality buckets để preserve diversity.

AutoMixer: framework để optimize data mixture. Mỗi run train ~60 proxy model trên data mix khác nhau, đo performance trên capability group (code, math, STEM, common sense), fit surrogate regressor để approximate tác động của tỷ lệ dataset đến downstream eval. Inspired by Olmix, MDE, RegMix.

Learning signal recovered: code performance driven mạnh bởi synthetic + curated code source, web data hurt nó. Math benefit từ diverse web math. STEM correlate với academic/educational text.

Synthetic data: ~13% final mix cho Laguna XS.2, ~4.4T+ synthetic token cho cả family. Spectrum giữa seed-heavy (reshape content qua format Q&A, list, dialogue) và pipeline-heavy (feature extraction + recomposition).

Muon optimizer

All training stage dùng distributed implementation của Muon. Initial pre-training ablation: cùng training loss với AdamW baseline trong 15% step ít hơn, evaluation uplift lớn ở final model, learning rate transfer được giữa các scale.

Muon naive có compute overhead lớn (Newton-Schulz orthogonalization). Implementation của Poolside assign mỗi parameter và gradient cho 1 rank, gather full param/grad trên rank đó, do Newton-Schulz, redistribute orthogonalized gradient shard về các rank khác. Overlapped batched communication với Newton-Schulz computation.

Kết quả: trong pre-training của Laguna M.1, overhead optimizer dưới 1% training step time.

Benefit thêm: Muon chỉ cần 1 state per parameter (vs AdamW 2) → checkpoint nhỏ hơn, save/load nhanh hơn.

Hash check chống silent data corruption

Update và compute replicated qua DDP rank → có periodic hash check trên model weight để assert mọi replica hold cùng weight. Mục đích chính: catch silent data corruption từ GPU defective (lỗi origin trong arithmetic logic + pipeline register, không được ECC bảo vệ như DRAM/SRAM). Cũng catch race condition + collective communication bug + replica divergence.

Async on-policy RL

Fully async online RL system, dùng agentic harness của họ inside training loop, chạy cross task end-to-end SE + terminal + tool-integrated reasoning thật.

Loop: trainer publish checkpoint mới → deploy lên inference cluster → actor pull task từ dataset, spin up sandboxed container, chạy production agent binary với fresh model → trajectory được score, filter, write vào Iceberg table → trainer consume continuous, produce next checkpoint.

Weight transfer custom qua GPUDirect RDMA: transfer hàng trăm GB weight trong vài giây. Cho Laguna M.1, BF16 weight transfer giữa training và inference trong <5s cross-node.

Token-in token-out actor: token ID preserved qua nhiều agentic turn trong cả trajectory, tránh re-tokenization mismatch — vấn đề common gây off-policy.

Dùng variant của CISPO algorithm cho off-policy stability. Run RL maintain stability nhiều ngày training, không cần technique bổ sung như entropy regularization.

Cách sử dụng

Free trial qua API trong thời gian giới hạn
OpenRouter: https://openrouter.ai/provider/poolside
Ollama (XS.2): https://ollama.com/library/laguna-xs.2
Pool agent harness (research preview): https://poolside.ai/get-started
Shimmer (Poolside vision cho future of building software): https://shimmer.run/
Higher rate limit / weights M.1 cho startup, viện, đại học: contact models@poolside.ai

Dev nên quan tâm vì…

Nếu deploy on-prem / air-gapped: Laguna XS.2 (Apache 2.0, NVFP4) là option mới ngoài Qwen / Llama — đặc biệt build cho high-security environment.
Nếu compare với Qwen3.5 35B-A3B: số liệu rất sát (SWE-bench Verified 68.2 vs 69.2). Khác biệt lớn nhất: Laguna có Day-1 TensorRT-LLM optimization và NVFP4 build cho Blackwell.
Nếu task là patch generation kiểu SWE-bench: Laguna XS.2 competitive. Nếu task là shell agent (Terminal-Bench style), Qwen3.6 hoặc Claude Haiku 4.5 vẫn lead — chưa thay được.
Nếu interested kỹ thuật train: Muon optimizer (15% fewer step), async on-policy RL với weight transfer <5s, AutoMixer cho data mix, hash check cho SDC — đều là pattern reusable cho team self-host training.

Laguna XS.2 và Laguna M.1: Poolside ship coding model open-weight đầu tiên

TL;DR

Có gì mới

Vị trí của Poolside

Spec model

Laguna M.1

Laguna XS.2 (open-weight)

Architecture quyết định kỹ thuật

Data pipeline + AutoMixer

Muon optimizer

Hash check chống silent data corruption

Async on-policy RL

Cách sử dụng

Dev nên quan tâm vì…

Đường dẫn nguồn

Laguna XS.2 và Laguna M.1: Poolside ship coding model open-weight đầu tiên

TL;DR

Có gì mới

Vị trí của Poolside

Spec model

Laguna M.1

Laguna XS.2 (open-weight)

Architecture quyết định kỹ thuật

Data pipeline + AutoMixer

Muon optimizer

Hash check chống silent data corruption

Async on-policy RL

Cách sử dụng

Dev nên quan tâm vì…

Đường dẫn nguồn

Cùng bản tin này