ERA V5: Extended Resident AI

ERA V5 Registration

Important

ERA V5 is a course that teaches students to train Large Language Models from scratch (pre-training, post-training, alignment, and serving) at frontier scale. It is built for beginners willing to commit fully. The pace is fast and the learning curve is steep.

Join only if you can promise yourself ~6 months of disciplined immersion.


ERA V5 is a hands-on, lab-run course where students are taught to build, train, and release a frontier-scale Mixture-of-Experts language model. We do not teach LLM training as theory. We run the training, and YOU are the lab.

What we built in ERA V4

V4 produced a real, public, working frontier-scale model.

  • LightningLM 0.1V: a 118.67B-parameter Mixture-of-Experts model grown through four stages: 1.78B dense → 4.96B MoE → 9.36B MoE → 118.67B MoE. Trained on GPU nodes (A100 → H200 → B200), bf16 throughout, for around 40–50 days of pure GPU training. Public checkpoints will be on HuggingFace under theschoolofai/LightningLM-0.1V-* (available after the Technical Paper release in the first week of June).
  • BrahmicTokenizer-131K: an Indic tokenizer that outperforms the other tokenizers in its class, with first-class support for Devanagari, Telugu, Odia, Bengali, Tamil, Kannada, Gujarati, Malayalam, Gurmukhi, Assamese, and more. Published on arXiv.
  • Kronecker Embeddings: a factorized byte-level embedding construction that eliminates 91–94% of input-side parameters compared to a standard embedding table. Published on arXiv. (The link will be added in the first week of June.)
  • Systems paper: a paper covering the full training methodology, growth strategy, and infrastructure of the LightningLM 0.1V run is being finalized for public release.
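To give a feel for why factorizing an embedding table saves so many parameters, here is a minimal Python sketch. It is not the published Kronecker Embeddings construction, which is byte-level and whose details are not given on this page. It assumes a simpler, hypothetical scheme: each token ID is split into two indices, and the token vector is the Kronecker product of two small vectors. The class name, the vocabulary size of 131,072, and the dimensions are all assumptions chosen for illustration, so the saving it prints will not match the 91–94% figure above.

```python
import math
import random


def kron_vec(a, b):
    """Kronecker product of two 1-D vectors; output length is len(a) * len(b)."""
    return [x * y for x in a for y in b]


class FactorizedEmbedding:
    """Hypothetical Kronecker-factorized embedding (illustration only)."""

    def __init__(self, vocab_size, d1, d2, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.d1, self.d2 = d1, d2
        # Each token ID is split into (hi, lo) with lo in [0, v2).
        self.v2 = math.isqrt(vocab_size - 1) + 1  # ceil(sqrt(vocab_size))
        self.v1 = -(-vocab_size // self.v2)       # ceil(vocab_size / v2)
        # Two small tables replace one (vocab_size x d1*d2) table.
        self.table_a = [[rng.gauss(0.0, 1.0) for _ in range(d1)]
                        for _ in range(self.v1)]
        self.table_b = [[rng.gauss(0.0, 1.0) for _ in range(d2)]
                        for _ in range(self.v2)]

    def lookup(self, token_id):
        """Return the d1*d2-dimensional vector for one token ID."""
        if not 0 <= token_id < self.vocab_size:
            raise IndexError(f"token_id {token_id} out of range")
        hi, lo = divmod(token_id, self.v2)
        return kron_vec(self.table_a[hi], self.table_b[lo])

    def num_params(self):
        """Parameters actually stored in the two factor tables."""
        return self.v1 * self.d1 + self.v2 * self.d2


def dense_params(vocab_size, dim):
    """Parameters in a standard embedding table of the same output width."""
    return vocab_size * dim


# Assumed sizes: 131,072 tokens, 32 x 32 = 1024-dimensional embeddings.
emb = FactorizedEmbedding(vocab_size=131_072, d1=32, d2=32)
dense = dense_params(131_072, 32 * 32)
saving = 1.0 - emb.num_params() / dense
print(f"factorized: {emb.num_params():,} params, "
      f"dense: {dense:,} params, saving: {saving:.2%}")
```

The trade-off in this toy scheme is expressiveness: every token vector is constrained to be a Kronecker product of two short vectors, so practical constructions typically add more structure on top to recover capacity.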

In V4, this work was largely executed in the last stage of the course, the Capstone.

V5 is built to change that: the whole course IS the training and the Capstone.

What V5 will train

A large Mixture-of-Experts model at frontier scale. We use the current SOTA at every layer of the stack (tokenizer, architecture, optimizer, kernels, parallelism, precision), assembled together rather than studied in isolation. Pre-training, supervised fine-tuning, preference alignment, and inference serving are all covered before the run launches. The scaffold (the training framework itself) and the final model are released openly. The cohort writes and submits research papers on the contributions. The Capstone sub-projects of ERA V4 are now our syllabus!

How V5 is different from any other LLM course

There really is no other course like it. Because others aim to add more students or simply to earn money, nobody else focuses on actual research, on publishing, or on creating something India can be proud of. None of these objectives would satisfy their investors or their appetite.


TSAI exists to build this capability in India and challenge the world. PERIOD. This is also why we offer a 15-day, no-questions-asked full refund, and charge one-fifth to one-tenth of what others charge. So how do we do this?


  • We actually train the model, as part of the course. Session 20 is not a write-up of someone else's training run; it is the kickoff of ours. Training continues past the formal calendar with students staffed into running roles.
  • You contribute to a public research artifact. The scaffold is released open-source. Real contribution earns named authorship on the systems paper.
  • No history lessons. We teach the current best (optimizer, attention, alignment), not the evolution that led there. Time is too short for archaeology.
  • Beginners welcome. Coasting impossible. We start from foundations, but you will be in deep technical territory by week six. You WILL fall behind if you do not dedicate your next 4–6 months. One missed class will set you back by weeks! Remember, you are doing what others take years to learn, with hundreds of professional staff to support them. Here, it is you and your instructor!

Course structure

Duration | ~6 months, including the training run that continues past the formal calendar
Sessions | 20 classes, each up to 3 hours, live
Schedule | Every Saturday, 7:00 AM IST
Format | Live coding + weekly assignments + ongoing lab contributions
Assignments | After every class; minimum 70% to qualify for the completion certificate
Capstone | The actual training run itself (starts around the 22nd week); students are staffed into running roles (training tracking, evaluation, alignment, operations, ablation, narrative) and continue contributing past the formal calendar

Syllabus

# | Class | Focus
1 | Transformer Foundations | Attention, multi-head attention, positional encodings; build a minimal transformer block from scratch
2 | Tokenization & Vocabulary Design | BPE, WordPiece, SentencePiece; vocab size, merges, frequency sorting; Indic and multilingual
3 | Data Collection & Sourcing | Sourcing across the full lifecycle: pre-training corpora, SFT, preference, safety, evaluation
4 | Data Cleaning & Deduplication | Quality filters, MinHash/LSH dedup, toxicity/PII, contamination scans; reproducible at scale
5 | Data Mixtures & Curriculum | Domain weighting, upsampling, mixture-shift effects on loss
6 | Building the Training Dataset | Sharding, packing, streaming dataloaders, tokenized binary formats; resumable data ordering
7 | Embeddings & Model Internals | Token, positional, factorized (Kronecker) embeddings; weight tying
8 | Modern Attention Variants | RoPE, ALiBi, GQA/MQA, sliding-window, linear-attention families; long-context extension
9 | Loss Functions & Output Heads | Cross-entropy, adaptive softmax, fused linear CE kernels, multi-token prediction
10 | Training Loop Fundamentals | Forward/backward, gradient accumulation, mixed precision, gradient clipping
11 | Optimizers & Learning-Rate Schedules | AdamW, weight decay, warmup, cosine vs WSD, EMA; linear scaling rule
12 | Distributed Training I: Data Parallel & ZeRO | DDP, ZeRO 1/2/3; memory math for multi-GPU
13 | Distributed Training II: Model & Pipeline Parallel | Tensor, pipeline, sequence parallelism; communication overhead, topology-aware placement
14 | Mixture-of-Experts | Routing, load balancing, expert sharding, active-vs-total params
15 | Stability, Debugging & Live Monitoring | Divergence detection, frozen-layer constraints, live training dashboards
16 | Scaling Laws & Compute Planning | Chinchilla-style token/param trade-offs, compute budgeting, run sizing
17 | Supervised Fine-Tuning | Current best SFT recipes; instruction datasets; LoRA/QLoRA; instruction-following benchmarks
18 | Preference Alignment & Inference Serving | Current SOTA preference learning (GRPO/DPO family); vLLM serving, throughput/latency
19 | Infrastructure, Checkpointing & Quantization | Cloud provisioning, fault tolerance, QAT; provisioning the actual cluster the run launches on
20 | Training Run Kickoff & Ongoing Lab Operations | Launching the lab's flagship training run; ongoing roles continue past the formal calendar

Note

This is the 15th time this course is being written from scratch, starting from EVA V1!


What you need to bring

  • A laptop. No local GPU required.
  • Comfort with Python helps, but we cover it. PyTorch is covered too.
  • A working internet connection for live Saturday classes.
  • The willingness to write code every week and not skip assignments. Use of tools such as Claude or AntiGravity is allowed.

What we will not do

  • We will not teach historical methods that are no longer SOTA.
  • We will not pre-build the scaffold and hand it to you. You build it with us, live in class.
  • We will not assign roles. Natural selection plays its role. Students who put in work get the contribution credit.

Pricing and refund

The course is priced significantly below comparable IIT/IIIT programs while covering work those programs do not attempt. Pricing differs for returning TSAI students. AWS credits are included; you do not pay separately for compute.


15-day, no-questions-asked refund — equivalent to the first 2 classes. After that window your seat is committed. There are no transfers to future cohorts.


Pricing details will be shared with your enrollment form.


Schedule

Milestone | Date
Registration opens | 29th May, 11:59 PM
Registration closes | 18th June (or until capacity is met)
Enrolment opens | 9th June
Enrolment closes | 19th June (or until capacity is met)
First class | 21st, 7:00 AM IST (Saturday)

Seats are limited. In every prior cohort, registration has closed before the deadline.

Warning

This is not a course you finish by watching videos. ERA V5 demands real weekly work. The training run we are building is the kind of work that frontier labs do with 100+ experienced staff — we are doing it with 300 motivated students plus an instructor who has done it once before, solo. Once you cross the 15-day refund window, you are committed. Buckle up.

Note

Public links to LightningLM 0.1V, BrahmicTokenizer-131K, and Kronecker Embeddings are available at theschoolof.ai. Read them before you enrol — they show exactly the kind of work this course produces.

Register for ERA V5

Drop your details below. We'll email you when enrolment opens — no spam, ever.

About you

Name: used for your certificate, so please use the exact spelling and capitalisation you want there.

Email: this course can only work with a Gmail address, either a personal @gmail.com or a corporate Gmail / Google Workspace account. The LMS, the live classes, and the shared drives all live on Google.

Phone: include the country code. We rarely call; this is for delivery failures and emergencies only.

Your history with us

Have you done a TSAI course before?

Pricing differs for returning students. Tick every cohort you've enrolled in — if this is your first course with us, pick the last option.

A few things to confirm

Are you ready to commit?

In the last two years, AI has evolved at an exponential rate, and keeping pace requires an equally intense level of effort. Success hinges on your commitment and discipline over the next 6–7 months.

Are you aware that in this course you'll need to collect, clean and create public datasets required for training?

Data work is the price of admission. You will contribute to the cohort's training corpora before working on model architecture or optimisation.

Do you know live sessions run at 7:00 AM IST on Saturdays?

Sessions are recorded and made available on the LMS immediately after — but the live class is at 7 AM Indian time.

Do you know this registration is an expression of interest and does not guarantee enrolment?

Capacity is limited and enrolment is first-come, first-served. You'll be emailed your enrolment link on 8 June 2026, when enrolment opens.

Do you know this course is roughly 28 weeks / 6 months long?

Plan for 8–12 hours a week including the live session.

Do you know AWS credits are included to train LLMs on cloud?

AWS credits required for the course will be provided by TSAI, so you do not need to pay separately for course-related compute.

Have you read our 15-day, no-questions-asked refund policy?

To quit the course, just email admin@theschoolofai.in within 15 days or the first 2 classes, and we'll issue a full refund. Read the full policy.

Please join the Telegram group

We post enrolment updates on t.me/theschoolofai — if your email goes to spam, Telegram is the backup channel. Open the link on your phone; desktop join is unreliable.

A little more (optional)

Please read this before submitting

This is a registration form, not enrolment. You'll be emailed your enrolment form / payment link on 8 June 2026. If you haven't received it by then, reach out to admin@theschoolofai.in immediately — seats are limited.

Our last cohort, EAG V3, saw 414 enrolments, and more than 200 students missed out even though they registered earlier, simply because they didn't finish enrolment in time. Don't be one of them.

The School of AI