ERA V5 Registration

Note

Late-enrolment window open till 3rd July 2026. Session 1 was held on Saturday, 27th June 2026 and the recording is already available on the LMS. Late joiners get full access to the Session 1 video the moment their enrolment is processed.

Important

ERA V5 is a course designed to train students to actually train Large Language Models from scratch — pre-training, post-training, alignment, and serving — at frontier scale. It is built for beginners willing to commit fully. The pace is fast and the learning curve is steep.

Join only if you can promise yourself ~6 months of disciplined immersion.

ERA V5 is a hands-on, lab-run course where students are taught to build, train, and release a frontier-scale Mixture-of-Experts language model. We do not teach LLM training as theory. We run the training, and YOU are the lab.

What we built in ERA V4

V4 produced a real, public, working frontier-scale model.

LightningLM 0.1V: a 118.67B-parameter Mixture-of-Experts model grown through four stages (1.78B dense → 4.96B MoE → 9.36B MoE → 118.67B MoE). Trained on GPU nodes including A100 → H200 → B200, in bf16 throughout, for around 40–50 days of pure GPU training. Public checkpoints are on HuggingFace under theschoolofai/LightningLM-0.1V-*.
BrahmicTokenizer-131K: an Indic tokenizer that beats all other tokenizers of its class, with first-class support for Devanagari, Telugu, Odia, Bengali, Tamil, Kannada, Gujarati, Malayalam, Gurmukhi, Assamese, and more. Published on arXiv.
Kronecker Embeddings: a factorized byte-level embedding construction that eliminates 91–94% of input-side parameters compared to a standard embedding table. Published on arXiv.
Systems paper: the full training methodology, growth strategy, and infrastructure for the 120B training pipeline behind LightningLM 0.1V. Published on arXiv.

In V4, this work was largely executed in the last stage of the course, the Capstone.

V5 is built to change that: the whole course IS the training and the Capstone.

What V5 will train

A large Mixture-of-Experts model at frontier scale.
Current SOTA at every layer of the stack (tokenizer, architecture, optimizer, kernels, parallelism, precision), assembled together rather than each studied in isolation.
Pre-training, supervised fine-tuning, preference alignment, and inference serving are all covered before the run launches.
The scaffold (the training framework itself) and the final model are released openly.
The cohort writes and submits research papers on the contributions.
The Capstone sub-projects of ERA V4 are now our syllabus!

How V5 is different from any other LLM course

There really is no other course like it. Since others' objective is to add more students or just earn money, nobody else will focus on actual research, on publishing something, or on creating something that India can be proud of. None of these objectives would satisfy their investors, or be something they could stomach.

TSAI exists to build this capability in India and challenge the world. PERIOD. This is also why we offer a 15-day, no-questions-asked full refund, and charge 1/5th to 1/10th of what others charge. And how do we do this?

We actually train the model. As part of the course. Session 20 is not a write-up of someone else's training run; it is the kickoff of ours. Training continues past the formal calendar with students staffed into running roles.
You contribute to a public research artifact. The scaffold is released open-source. Real contribution earns named authorship on the systems paper.
No history lessons. We teach the current best — optimizer, attention, alignment — not the evolution that led there. Time is too short for archaeology.
Beginners welcome. Coasting impossible. We start from foundations, but you will be in deep technical territory by week six. You WILL fall behind if you do not dedicate your next 4-6 months. One missed class will set you back by weeks! Remember, you are doing what others take years to learn and have hundreds of professional staff to support. Here, it is you and your instructor!

Course structure


Duration	~6 months, including the training run that continues past the formal calendar
Sessions	20 classes, each up to 3 hours, live
Schedule	Every Saturday, 7:00 AM IST
Format	Live coding + weekly assignments + ongoing lab contributions
Assignments	After every class; minimum 70% to qualify for the completion certificate
Capstone	The actual training run itself (starts around the 22nd week); students are staffed into running roles (training tracking, evaluation, alignment, operations, ablation, narrative) and continue contributing past the formal calendar

Syllabus

#	Class	Focus
1	Transformer Foundations	Attention, multi-head attention, positional encodings; build a minimal transformer block from scratch
2	Tokenization & Vocabulary Design	BPE, WordPiece, SentencePiece; vocab size, merges, frequency sorting; Indic and multilingual
3	Data Collection & Sourcing	Sourcing across the full lifecycle: pre-training corpora, SFT, preference, safety, evaluation
4	Data Cleaning & Deduplication	Quality filters, MinHash/LSH dedup, toxicity/PII, contamination scans; reproducible at scale
5	Data Mixtures & Curriculum	Domain weighting, upsampling, mixture-shift effects on loss
6	Building the Training Dataset	Sharding, packing, streaming dataloaders, tokenized binary formats; resumable data ordering
7	Embeddings & Model Internals	Token, positional, factorized (Kronecker) embeddings; weight tying
8	Modern Attention Variants	RoPE, ALiBi, GQA/MQA, sliding-window, linear-attention families; long-context extension
9	Loss Functions & Output Heads	Cross-entropy, adaptive softmax, fused linear CE kernels, multi-token prediction
10	Training Loop Fundamentals	Forward/backward, gradient accumulation, mixed precision, gradient clipping
11	Optimizers & Learning-Rate Schedules	AdamW, weight decay, warmup, cosine vs WSD, EMA; linear scaling rule
12	Distributed Training I: Data Parallel & ZeRO	DDP, ZeRO 1/2/3; memory math for multi-GPU
13	Distributed Training II: Model & Pipeline Parallel	Tensor, pipeline, sequence parallelism; communication overhead, topology-aware placement
14	Mixture-of-Experts	Routing, load balancing, expert sharding, active-vs-total params
15	Stability, Debugging & Live Monitoring	Divergence detection, frozen-layer constraints, live training dashboards
16	Scaling Laws & Compute Planning	Chinchilla-style token/param trade-offs, compute budgeting, run sizing
17	Supervised Fine-Tuning	Current best SFT recipes; instruction datasets; LoRA/QLoRA; instruction-following benchmarks
18	Preference Alignment & Inference Serving	Current SOTA preference learning (GRPO/DPO family); vLLM serving, throughput/latency
19	Infrastructure, Checkpointing & Quantization	Cloud provisioning, fault tolerance, QAT; provisioning the actual cluster the run launches on
20	Training Run Kickoff & Ongoing Lab Operations	Launching the lab's flagship training run; ongoing roles continue past the formal calendar
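
The compute budgeting named in class 16 can be sketched with two widely used rules of thumb: training costs roughly 6 FLOPs per parameter per token, and a Chinchilla-style budget targets roughly 20 tokens per parameter. The snippet below is only an illustration under those assumptions. The function names, the 20x ratio, the per-GPU throughput, and the utilisation figure are hypothetical placeholders, not the ERA V5 run plan. For a Mixture-of-Experts model, the FLOP estimate is usually applied to active rather than total parameters.

```python
# Rule-of-thumb compute planning. Every constant here is a common
# approximation or a made-up placeholder, not a figure from the ERA V5 run.

TOKENS_PER_PARAM = 20       # Chinchilla-style heuristic (assumption)
FLOPS_PER_PARAM_TOKEN = 6   # approx. forward + backward cost per param per token


def compute_optimal_tokens(n_params: float) -> float:
    """Approximate token budget for a model with n_params parameters."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs using C ~ 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def gpu_days(total_flops: float, n_gpus: int,
             peak_flops_per_gpu: float, utilisation: float) -> float:
    """Wall-clock days needed at a given sustained fraction of peak throughput."""
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    seconds = total_flops / (n_gpus * peak_flops_per_gpu * utilisation)
    return seconds / 86_400


if __name__ == "__main__":
    n = 1e9                                  # hypothetical 1B-parameter model
    d = compute_optimal_tokens(n)
    c = training_flops(n, d)
    # Hypothetical cluster: 8 GPUs, 1e15 peak FLOP/s each, 40% utilisation.
    days = gpu_days(c, n_gpus=8, peak_flops_per_gpu=1e15, utilisation=0.4)
    print(f"tokens={d:.3g}  flops={c:.3g}  gpu_days={days:.3f}")
```

Keeping the three steps as separate functions makes it easy to swap in a different tokens-per-parameter ratio or a measured utilisation once real numbers are available.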

Note

This is the 15th time this course is being written from scratch, starting from EVA V1!

What you need to bring

A laptop. No local GPU required.
Python comfort helps; we cover it. PyTorch is covered.
A working internet connection for live Saturday classes.
The willingness to write code every week and not skip assignments. Use of Claude, AntiGravity, etc. is allowed.

What we will not do

We will not teach historical methods that are no longer SOTA.
We will not pre-build the scaffold and hand it to you. You build it with us, live in class.
We will not assign roles. Natural selection plays its role. Students who put in the work get the contribution credit.

Pricing and refund

The course is priced significantly below comparable IIT/IIIT programs while covering work those programs do not attempt. Pricing differs for returning TSAI students. AWS credits are included; you do not pay separately for compute.

15-day, no-questions-asked refund — equivalent to the first 2 classes. After that window your seat is committed. There are no transfers to future cohorts.

Pricing details will be shared with your enrolment form.

Schedule

Milestone	Date
Registration opens	29th May, 11:59 PM
Registration extended close	3rd July 2026 (late enrolment window for those who missed the 26th June deadline)
Enrolment opens	12th June
Enrolment extended close	3rd July 2026
First class (already held — recording available to late joiners)	27th June, 7:00 AM IST (Saturday)

**Late enrolment window open till 3rd July 2026.** The first class was held on 27th June; late joiners get the recording. Classes continue every Saturday at 7:00 AM IST.

Warning

This is not a course you finish by watching videos. ERA V5 demands real weekly work. The training run we are building is the kind of work that frontier labs do with 100+ experienced staff — we are doing it with 300 motivated students plus an instructor who has done it once before, solo. Once you cross the 15-day refund window, you are committed. Buckle up.

Note

Public links to LightningLM 0.1V, BrahmicTokenizer-131K, and Kronecker Embeddings are available at theschoolof.ai. Read them before you enrol — they show exactly the kind of work this course produces.
