The School of AI, Bangalore Demonstrates India's Frontier-Scale AI Capability with LightningLM, a 120-Billion-Parameter Language Model

ANI
08 Jun 2026

HT Syndication

Bangalore (Karnataka) [India], June 8: The School of AI, Bangalore has built and pre-trained LightningLM, a 120-billion-parameter large language model, demonstrating that frontier-scale AI models can now be designed, trained, and scaled within India, outside the handful of global labs and well-funded national programs that have so far held this capability. The model is accompanied by publicly released models and open research papers on arXiv, which introduce original methods of a kind typically associated with the world's most advanced AI labs.

This is fundamentally a systems and engineering milestone, and the distinction matters: the lasting achievement is not the model but the pipeline that produced it. Training a frontier-scale model once is hard; building the reusable infrastructure to train such models repeatedly, and to scale them, is the capability that only a handful of global labs possess end to end. LightningLM is the proof that the pipeline works. The School of AI developed that entire stack in-house: the training infrastructure, orchestration pipeline, data curriculum, India-focused tokenizer, and a configurable expert-divergence schedule.

The School of AI is candid about the present stage. LightningLM has been grown to 120 billion parameters as part of an ongoing pre-training run, with sustained loss reduction across each growth stage; work on Indic generation is underway. Pre-training remains the hardest stage to execute, and very few teams in India have carried it to this scale.

Rather than training a 120-billion-parameter model from scratch, which is extremely expensive, The School of AI adopted a progressive growth strategy: beginning with a small seed model and expanding it incrementally to full scale, while keeping training stable through each expansion. Models of this size conventionally require well over a hundred high-end GPUs and large teams; The School of AI carried out the run on a single 8-GPU node, where preventing instability during growth and fitting an enormous model into a constrained memory footprint were problems with no established playbook. The final architecture routes computation across 460 expert networks, and stabilizing expert utilization was among the most technically demanding aspects of the project.
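The idea of expanding a model while preserving what it has already learned can be illustrated with a well-known technique, Net2Net-style widening. The sketch below is not LightningLM's method, which the release does not detail; it is a toy example, with hypothetical function names, showing that a hidden unit can be duplicated and its outgoing weights halved so that the larger network computes the same output as the smaller one.

```python
import random

def relu(x):
    return x if x > 0.0 else 0.0

def mlp_forward(w_in, w_out, x):
    """Two-layer MLP: hidden = relu(w_in @ x), output = w_out @ hidden."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

def widen(w_in, w_out, unit):
    """Duplicate hidden unit `unit` and split its outgoing weights in half.

    Both copies compute the same activation, and their halved outgoing
    weights add up to the original weight, so the output is preserved.
    """
    new_w_in = [row[:] for row in w_in] + [w_in[unit][:]]
    new_w_out = []
    for row in w_out:
        half = row[unit] / 2.0
        new_row = row[:]
        new_row[unit] = half      # original unit keeps half the weight
        new_row.append(half)      # the copy receives the other half
        new_w_out.append(new_row)
    return new_w_in, new_w_out

# Toy network with arbitrary weights (illustrative values only).
rng = random.Random(0)
w_in = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]   # 4 hidden units, 3 inputs
w_out = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 2 outputs
x = [0.5, -1.0, 2.0]

before = mlp_forward(w_in, w_out, x)
big_in, big_out = widen(w_in, w_out, unit=1)
after = mlp_forward(big_in, big_out, x)

# Largest difference between outputs before and after growth.
max_diff = max(abs(a - b) for a, b in zip(before, after))
print(len(w_in), "->", len(big_in), "hidden units; max output difference:", max_diff)
```

Because the grown network starts from the same function as the seed, training can continue from the existing loss rather than restarting, which is the general property that growth-based approaches rely on.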

LightningLM was trained on approximately 100 billion tokens selected from a curated 1-trillion-token corpus. The corpus was deduplicated and aggressively filtered for language and category fertility and for quality, and the selected data was fed through a curriculum that increases in complexity over time. At every stage, Indic content was guaranteed to constitute at least 25 percent of each training batch. Growth-based training is known to reach comparable loss with substantially fewer tokens than from-scratch training at the same scale, a property the curriculum design exploits.
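A per-batch floor on Indic content can be enforced in several ways. The sketch below shows one simple option, assuming documents are already labeled by group; the function name `build_batch`, the pool sizes, and the sampling-with-replacement choice are all hypothetical and are not taken from the LightningLM pipeline.

```python
import math
import random

def build_batch(indic_docs, other_docs, batch_size, min_indic_fraction=0.25, seed=0):
    """Return a shuffled batch in which at least `min_indic_fraction` of items are Indic."""
    rng = random.Random(seed)
    n_reserved = math.ceil(batch_size * min_indic_fraction)
    # Reserved slots are always filled from the Indic pool.
    batch = rng.choices(indic_docs, k=n_reserved)
    # Remaining slots draw from the full mixture, so the Indic share
    # can exceed the floor but can never fall below it.
    batch += rng.choices(indic_docs + other_docs, k=batch_size - n_reserved)
    rng.shuffle(batch)
    return batch

# Hypothetical labeled pools, for illustration only.
indic_docs = [("indic", f"doc-{i}") for i in range(50)]
other_docs = [("other", f"doc-{i}") for i in range(200)]

batch = build_batch(indic_docs, other_docs, batch_size=32)
indic_share = sum(1 for kind, _ in batch if kind == "indic") / len(batch)
print(f"Indic share of batch: {indic_share:.2%}")
```

Reserving slots first, rather than reweighting the whole mixture, makes the 25 percent figure a hard guarantee for every batch instead of an average over many batches.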

Pre-training to this stage required approximately 40 to 50 days and roughly $15,000 in compute, with the full training run on this stack projected at approximately $100,000. By conventional standards, a from-scratch run for a model of this category costs approximately $1-2 million per trillion training tokens. The School of AI describes the difference as a five-to-ten-times efficiency gain that comes from method and systems innovation, not from reduced training scope.

The three accompanying research papers extend the contribution beyond the model itself.

- The first presents BrahmicTokenizer-131K, a custom India-focused tokenizer. AI models process text in units called tokens, and most tokenizers break words in Indian scripts into far more tokens than they do English words, making models slower, costlier, and weaker in Indian languages. Compared with leading tokenizers at the same vocabulary size, BrahmicTokenizer-131K produces 26.7 percent fewer tokens on Indic text overall and less than a quarter as many on Odia, while matching or beating them on English, code, and math, with wins of up to 14 percent on standard coding and math benchmarks.

- The second paper, on Kronecker embeddings, redesigns the component that converts words into numbers, normally a large table costing hundreds of millions of parameters. The School of AI replaces it with a deterministic computed structure that removes 91 to 94 percent of those input-side parameters and ships as megabytes instead of gigabytes, validated in controlled studies and deployed in the model.

- The third paper, Reversible Foundations: Training a 120B Sparse MoE Through State-Preserving Scaling, documents the core systems contribution: how a model can be grown from a small seed to full frontier scale while preserving its learned state at each expansion, training stably on constrained hardware where conventional approaches require vastly larger clusters.
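Tokenizer comparisons of the kind described in the first paper are usually expressed through fertility, the average number of tokens produced per word. The sketch below shows how such figures are computed; the token and word counts are hypothetical, chosen only to illustrate the arithmetic, and are not measurements from the BrahmicTokenizer-131K paper.

```python
def fertility(token_count, word_count):
    """Average number of tokens produced per word (lower is better)."""
    return token_count / word_count

def percent_fewer(baseline_tokens, new_tokens):
    """Percentage reduction in tokens relative to a baseline tokenizer."""
    return 100.0 * (baseline_tokens - new_tokens) / baseline_tokens

def times_fewer(baseline_tokens, new_tokens):
    """How many times more tokens the baseline needs for the same text."""
    return baseline_tokens / new_tokens

# Hypothetical counts for the same 1,000-word sample of text.
words = 1000
baseline_tokens = 3000   # tokens from a baseline tokenizer (assumed)
new_tokens = 2199        # tokens from the tokenizer under test (assumed)

baseline_fertility = fertility(baseline_tokens, words)
new_fertility = fertility(new_tokens, words)
reduction = percent_fewer(baseline_tokens, new_tokens)

print(f"fertility: {baseline_fertility:.2f} -> {new_fertility:.2f} tokens per word")
print(f"token reduction: {reduction:.1f} percent")
```

Fewer tokens for the same text means fewer model steps per sentence, which is why fertility translates directly into speed and cost for a given language.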

Foundational method work of this kind typically emerges from the largest global research labs. These contributions originated at The School of AI, Bangalore, were released openly, and stand alongside India's well-funded AI programs as a grassroots effort. The work was built natively for Brahmic scripts and Indian language realities, and it represents sovereign capability: as long as the knowledge to train frontier-scale AI stays concentrated in a few global institutions, everyone else remains a consumer rather than a producer.

The deeper significance is that the road now exists. Building the pipeline is like laying the Autobahn before driving the high-speed car: the far harder, more durable work is the infrastructure itself. With that foundation in place, the question shifts from whether frontier-scale training is possible in India to what to train next, and how to measure it against the best in the world.

LightningLM was built as part of ERA V4, a cohort of more than 300 students. As part of a commitment to open source, all models, code, papers, and strategies are available at www.lightninglm.theschoolofai.in.

(ADVERTORIAL DISCLAIMER: The above press release has been provided by HT Syndication. ANI will not be responsible in any way for the content of the same.)