SanskritKatha: 43 Reviewers and Counting

A few weeks ago I shared the SanskritKatha project publicly: roughly 50,000 AI-generated Sanskrit stories, a feedback platform for human evaluation, and an open call for reviewers.

Here's where things stand.

The Numbers

49,987 stories in the corpus (BalaKatha for ages 4-5, KishoraKatha for ages 14-15)
43 reviewers signed up, 42 active
45 reviews submitted and growing
SLMs trained at 1M, 3M, 10M, and 33M parameters on Modal; evals are done and the final training round is running now

The generation side is complete. The work now is evaluation — getting enough human judgment on these stories to know what's good, what's not, and what the models get wrong about Sanskrit.

What Surprised Me

The reviewer motivation.

I expected people to sign up out of curiosity, maybe academic interest. What I'm seeing is different. People are signing up because Sanskrit matters to them. There's no payment, no academic credit, no institutional backing. It's seva — that quiet instinct to serve the language and tradition when someone offers a genuine way to do it.

43 is a small number in absolute terms. For an unfunded, independent project with no marketing budget and no institutional network, it's remarkable. These are real people giving 25 minutes of their time to read AI-generated Sanskrit stories in Devanagari and carefully score them on grammar, coherence, vocabulary, dharmic integration, word usage, and cultural authenticity.

What Reviewers Do

The platform at sanskritkatha.com presents stories in Devanagari. Reviewers score each story on 6 dimensions, 1-5 scale:

Sanskrit grammar — sandhi, vibhakti, verb forms
Vocabulary level — age-appropriate word choices
Story coherence — clear narrative arc
Dharmic integration — does the principle feel lived or tacked on?
Word usage — required vocabulary fits organically
Cultural authenticity — Bharatiya settings, names, customs
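For readers who think in data structures, here is a minimal sketch of what a single review could look like as a record. The class name, field names, and dimension identifiers are my own illustrative choices, not the platform's actual schema. The record deliberately has no field for the generating model, since reviewers work blind.

```python
from dataclasses import dataclass

# Identifiers are illustrative; they mirror the six rubric items listed above.
DIMENSIONS = (
    "grammar",
    "vocabulary_level",
    "coherence",
    "dharmic_integration",
    "word_usage",
    "cultural_authenticity",
)


@dataclass(frozen=True)
class Review:
    """One reviewer's scores for one story (hypothetical schema)."""
    story_id: str
    reviewer_id: str
    scores: dict  # dimension name -> integer score from 1 to 5

    def __post_init__(self):
        # Every dimension must be scored, and nothing outside the rubric.
        missing = set(DIMENSIONS) - set(self.scores)
        extra = set(self.scores) - set(DIMENSIONS)
        if missing or extra:
            raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
        for name, value in self.scores.items():
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not 1 <= value <= 5:
                raise ValueError(f"{name} must be an integer from 1 to 5, got {value!r}")
```

Rejecting incomplete or out-of-range reviews at construction time keeps later aggregation code simple, because every stored record is known to cover all six dimensions.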

Blind review — reviewers never know which LLM generated the story. 3 reviewers per story minimum. This is the Mechanical Turk methodology adapted for a volunteer context: consensus model, trust scoring, inter-annotator agreement tracking.
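To make the consensus and agreement ideas concrete, here is a rough sketch. It assumes the median as the consensus score and the mean pairwise score gap as the agreement measure. Both are simple stand-ins I chose for illustration; the statistics the project actually uses are not specified here, and a chance-corrected measure such as Krippendorff's alpha would be a more rigorous choice for ordinal ratings. Function names and the tuple layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median

MIN_REVIEWS = 3  # from the post: 3 reviewers per story minimum


def consensus(reviews):
    """reviews: iterable of (story_id, reviewer_id, {dimension: score}).

    Returns {story_id: {dimension: median score}} for stories that have
    at least MIN_REVIEWS reviews; other stories are left out.
    """
    by_story = defaultdict(list)
    for story_id, _reviewer_id, scores in reviews:
        by_story[story_id].append(scores)
    result = {}
    for story_id, score_dicts in by_story.items():
        if len(score_dicts) < MIN_REVIEWS:
            continue  # not enough independent judgments yet
        dims = score_dicts[0].keys()
        result[story_id] = {d: median(s[d] for s in score_dicts) for d in dims}
    return result


def mean_pairwise_gap(reviews, dimension):
    """Average absolute score difference between each pair of reviewers
    on the same story. 0 means perfect agreement; 4 is the worst case
    on a 1-5 scale. Returns None if no story has two or more reviews."""
    by_story = defaultdict(list)
    for story_id, _reviewer_id, scores in reviews:
        by_story[story_id].append(scores[dimension])
    gaps = [abs(a - b)
            for values in by_story.values()
            for a, b in combinations(values, 2)]
    return sum(gaps) / len(gaps) if gaps else None
```

The median is less sensitive than the mean to one outlying reviewer, which matters when there are only three scores per story.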

Some stories are genuinely good. Some have amusing errors — a model will nail the grammar and completely botch the cultural context, or get the dharmic principle right but produce a story that reads like a textbook exercise. Either way, every review teaches us something.

The SLMs

While evaluation runs, the small language models are being trained. Four sizes: 1M, 3M, 10M, and 33M parameters, following the TinyStories approach (Eldan & Li, 2023). Trained on Modal. Earlier eval batches informed the training, and the final round is in progress.

The hypothesis remains: small models trained on well-structured synthetic data can produce coherent, age-appropriate Sanskrit stories. The human evaluation will tell us whether that hypothesis holds, and where it breaks down.

What I Need

More reviewers. Specifically:

Sanskrit teachers — their sense of age-appropriateness and pedagogical quality is irreplaceable
Sanskrit students at any level — beginner perspectives catch different issues than expert perspectives
Scholars working with Sanskrit texts
Anyone who reads Devanagari comfortably

Each batch is 10 stories, about 25 minutes. Review at your own pace. No deadline pressure.
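To give a sense of scale, here is a back-of-the-envelope calculation using the figures in this post. It assumes that one "review" means one reviewer scoring one story, and that a given set of stories should each receive the minimum of three reviews. Whether the whole corpus or only a sample gets reviewed is not stated here, so the function takes the story count as a parameter. The function name is hypothetical.

```python
import math

MIN_REVIEWS_PER_STORY = 3   # from the post
STORIES_PER_BATCH = 10      # from the post
MINUTES_PER_BATCH = 25      # from the post (approximate)


def review_effort(num_stories):
    """Return (reviews, batches, hours) needed to give every one of
    num_stories the minimum number of reviews."""
    reviews = num_stories * MIN_REVIEWS_PER_STORY
    batches = math.ceil(reviews / STORIES_PER_BATCH)
    hours = batches * MINUTES_PER_BATCH / 60
    return reviews, batches, hours


# A 1,000-story sample versus the full 49,987-story corpus.
print(review_effort(1_000))
print(review_effort(49_987))
```

Under these assumptions a 1,000-story sample needs 3,000 reviews (300 batches, 125 volunteer hours), while the full corpus would need 149,961 reviews. That gap is why every additional reviewer matters.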

Reviewers whose trust score exceeds 0.5 (based on review consistency and calibration accuracy) get named acknowledgement in research publications. If you'd rather stay anonymous, that's fine too.
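As an illustration only, here is one way a trust score built from consistency and calibration could be computed. The 0.5 threshold comes from this post; everything else (the formulas, the equal weighting, the one-point tolerance, and all function names) is an assumption made for the sketch and should not be read as the platform's actual scoring.

```python
TRUST_THRESHOLD = 0.5  # from the post
MAX_GAP = 4            # largest possible difference on a 1-5 scale


def consistency(reviewer_scores, consensus_scores):
    """1.0 when the reviewer matches consensus exactly, 0.0 when every
    score is at the maximum possible distance (assumed definition)."""
    gaps = [abs(r - c) for r, c in zip(reviewer_scores, consensus_scores)]
    return 1 - (sum(gaps) / len(gaps)) / MAX_GAP


def calibration(reviewer_scores, reference_scores, tolerance=1):
    """Fraction of calibration items where the reviewer lands within
    `tolerance` points of a known reference score (assumed definition)."""
    hits = sum(abs(r - g) <= tolerance
               for r, g in zip(reviewer_scores, reference_scores))
    return hits / len(reviewer_scores)


def trust_score(consistency_value, calibration_value, weight=0.5):
    """Weighted average of the two components; equal weights assumed."""
    return weight * consistency_value + (1 - weight) * calibration_value


def earns_acknowledgement(score):
    # The post says the score must exceed 0.5, so the comparison is strict.
    return score > TRUST_THRESHOLD
```

For example, a reviewer with consistency 0.75 and calibration 0.5 would get a trust score of 0.625 under this sketch and would qualify for acknowledgement.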

If you know Sanskrit teachers, pathashalas, or student groups who might be interested, please share this. The more diverse the reviewer base, the stronger the evaluation. That's not a platitude — in annotation research, evaluator diversity directly determines the quality of the ground truth.

What's Ahead

Complete the human evaluation (ongoing, need more reviews)
Finish final SLM training round
Cross-model analysis — how do Gemini and Claude compare on Sanskrit generation?
Paper draft
Dataset release on HuggingFace

This is independent, community-driven Sanskrit AI research. No gatekeepers. Just the work.

sanskritkatha.com
