Fine-tuning Gemma 3 4B on Indian Law Data · MaxText · Tunix Distillation · HuggingFace
This blog documents the approach behind a TPU-based LLM pipeline I’m currently building.
The focus here is not just on results, but on the full system design — model behavior, distillation strategy, and the practical challenges of working with TPUs in a real setup.
Instead of presenting a polished end-state, I’m breaking down the actual process: how the system is structured, what decisions were made, and how different components evolved over time.
You’ll find everything here — from setup to experimentation — in a way that reflects real-world building, not just ideal scenarios.
Why LexiMini?
India has over 1.4 billion people.
Think about a woman in a village in rural India. Her landlord is threatening to throw her out. She does not know what a tenant agreement means legally. She does not know that she has rights. There is no lawyer in her village. The nearest district court is two hours away. And even if she gets there — she cannot afford to pay someone to explain what is written in that contract.
Or think about a woman who took a small loan from a local moneylender. She does not know what interest rate is legally allowed. She does not know what she can do when the terms change overnight. She does not know who to go to.
These are not rare cases. This is everyday life for millions of women across rural India — where both education and internet access are still limited, where legal literacy is almost zero, and where the gap between knowing your rights and losing everything is just one piece of paper you could not understand.
Normal people face these situations every single day — tenant contracts they cannot read, loans with terms they did not understand, workplace harassment they do not know is illegal, property disputes they have no idea how to fight. These are ordinary problems. But without legal knowledge, ordinary problems become life-altering ones.
When they do go online to search for answers, they find one of two things:
Dense legal text that was written by lawyers, for lawyers.
Or generic AI answers from models that were never trained on Indian law and confidently give wrong information — which in a legal situation, can cause real harm.
I wanted to build something different. A small, focused AI that actually knows Indian law — tenant rights, loan agreements, women’s legal protections, IPC sections, bail procedures, court processes. Not a general model pretending to know. A specialized one, built for the people who need it most.
That is LexiMini.
The Full Pipeline
Before I get into the steps, here is the complete picture of what I built:
GitHub Repo (Indian Law Data)
  → TPU VM (MaxText Training)
  → GCS Bucket (Checkpoints)
  → HuggingFace Model (Gemma 3 4B Fine-tuned)
  → Tunix Distillation (4B Teacher → 1B Student)
  → LexiMini Final (Lightweight, Deployable)
There are four phases:
- Phase 1 — Set up the TPU VM and install everything
- Phase 2 — Prepare and upload the Indian law dataset
- Phase 3 — Fine-tune Gemma 3 4B using MaxText
- Phase 4 — Distill the 4B model down to 1B using Tunix
Let me walk through each one.
Tools I Used
- Google TPU VM (v6e): hardware TPU chips for fast training
- MaxText: Google’s open-source JAX-based LLM training framework
- Tunix: Google DeepMind’s post-training + distillation framework, built on JAX
- HuggingFace Hub: model storage and deployment
- Google Cloud Storage (GCS): stores data, weights, and checkpoints during training
One important note: Everything in this guide runs directly on the TPU VM terminal. After you SSH in, you never need to go back to your local machine.
Phase 1: Setting Up the TPU VM
The Setup Flow
Create TPU VM → SSH In → Install Packages → Clone MaxText → Install JAX
Step 1 — Create the TPU VM
Let me be honest about something before you run this command.
A TPU v6 costs around $12–$15 per hour on Google Cloud. Fine-tuning a 4B model for 5000 steps takes several hours. Distillation adds even more. The total compute cost of this project, paid out of pocket, would be completely out of reach for most independent researchers or students.
I was able to do this because I received Google Cloud credits through the TPUSprint program as a Google Developer Expert (GDE) in AI/ML.
Google Cloud credits are provided for this project. #TPUSprint
I started by setting up everything from scratch on Google Cloud.
First, I created a new project inside Google Cloud Platform. Once the project was ready, I landed on the main dashboard.

From there, I navigated to Compute Engine → TPUs. This is where all TPU resources are managed.

Since there were no TPUs created yet, the page was empty. I clicked on Create TPU to start the setup.
On the TPU creation screen:
- I gave the TPU a name (node-1)
- Selected the zone (us-central1-a) — this is important because TPU availability depends on the zone
- Chose the TPU type (v5litepod-1 in this case, based on availability)
- Selected the TPU software version (v2-alpha-tpuv5-lite)
I kept the rest of the settings as default and clicked Create.

Once the TPU node was created, it appeared in the TPU list and was ready to use.

Now click SSH next to the node to open a terminal in the browser:

For reference, the same setup can also be done from the gcloud CLI:
gcloud compute tpus tpu-vm create leximini-tpu \
  --zone=us-central2-b \
  --accelerator-type=v4-8 \
  --version=tpu-ubuntu2204-base
One key learning here: TPU configuration is highly dependent on availability. You might need to try different zones or TPU types before finding one that works.
Step 2 — SSH Into the TPU VM
All commands shown below were executed directly inside the TPU VM terminal after connecting via SSH.
Step 3 — Install System Packages
sudo apt-get update && sudo apt-get install -y git python3-pip wget curl
Step 4 — Clone MaxText and Install JAX
MaxText is Google’s training framework for large language models. It is built on JAX and designed to run natively on TPU — which is exactly what we need here.
cd ~
git clone https://github.com/google/maxtext.git
cd maxtext
pip install -r requirements.txt
pip install "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
JAX TPU installation takes around 5 minutes. Let it finish completely before moving on.
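Once it finishes, a quick sanity check confirms that JAX actually sees the TPU chips. It should print a list of TPU devices, not CPU:
python3 -c "import jax; print(jax.devices())"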
Phase 2: Preparing the Indian Law Dataset
The Data Flow
Raw Legal Text → Python Script → GCS Bucket
(.txt / .csv files) (convert to JSONL) (ready for MaxText)
This phase took me longer than I expected — not because of the code, but because of the data itself.
I curated Indian legal text across multiple categories: IPC sections, CrPC procedures, constitutional articles, and court judgment excerpts. The curation matters. MaxText will train on exactly what you give it. Garbage in, garbage out — this is especially true for legal text where precision is everything.
Step 5 — Clone Your Dataset Repo on the TPU VM
cd ~
git clone https://github.com/YOUR_USERNAME/YOUR_REPO.git
cd YOUR_REPO
ls

Step 6 — Convert Data to JSONL Format
MaxText needs training data in JSONL format — one JSON object per line, each with a text key. Here is the conversion script:
The system was developed using real legal datasets. However, due to confidentiality considerations, I have included only synthetic data in the public repository.
cat > prepare_data.py << 'EOF'
import json

input_file = 'indian_law.txt'  # change as needed
output_file = 'train_data.jsonl'

count = 0
with open(input_file, 'r', encoding='utf-8') as f_in, \
     open(output_file, 'w', encoding='utf-8') as f_out:
    for line in f_in:
        line = line.strip()
        # skip empty or very short lines
        if not line or len(line) < 20:
            continue
        record = {"text": line}
        f_out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1

print(f"Total records written: {count}")
EOF
python3 prepare_data.py
This is a simplified version of the preprocessing step. In practice, additional structuring and formatting were applied to improve training quality.
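For clarity, each line of train_data.jsonl ends up looking like this (an illustrative record, not one taken from the actual dataset):
{"text": "Section 302 of the Indian Penal Code prescribes the punishment for murder."}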
Step 7 — Create a GCS Bucket and Upload Data
MaxText does not read from local disk during training. It reads from Google Cloud Storage. So everything goes to GCS first:
# Create the bucket (pick a globally unique name)
gsutil mb -l us-central2 gs://YOUR-BUCKET-NAME
# Upload training data
gsutil cp ~/YOUR_REPO/data/train_data.jsonl gs://YOUR-BUCKET-NAME/data/train_data.jsonl
# Verify
gsutil ls gs://YOUR-BUCKET-NAME/data/
Phase 3: Fine-Tuning Gemma 3 4B with MaxText
The Training Flow
HF Weights GCS Upload MaxText Train Checkpoint
(Gemma 3 4B) → (store weights) → (5000 steps) → (saved to GCS)
This is the heart of the project. We take Gemma 3 4B — Google’s open-source model — and teach it everything about Indian law.
I ran this training step twice. The first time I misconfigured the checkpoint path and lost hours of compute. Small mistakes are expensive on TPU. Double-check your GCS paths before you hit enter.
Step 8 — Download Gemma 3 4B Weights from HuggingFace
You need a HuggingFace account and an access token. Get it from: huggingface.co → Settings → Access Tokens → New token.
pip install huggingface_hub
python3 << 'EOF'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='google/gemma-3-4b-it',
    local_dir='/home/USER/gemma3-weights',  # replace USER with your username (run: whoami)
    token='hf_YOUR_TOKEN_HERE'
)
print('Download complete!')
EOF
Step 9 — Upload Weights to GCS
# The -m flag enables parallel upload — do not skip it for large files
gsutil -m cp -r ~/gemma3-weights gs://YOUR-BUCKET-NAME/weights/gemma3-4b/
# Verify
gsutil ls gs://YOUR-BUCKET-NAME/weights/gemma3-4b/
Step 10 — Run Fine-Tuning
Everything comes together here. One command, and MaxText takes over:
cd ~/maxtext
python3 MaxText/train.py MaxText/configs/base.yml \
  base_output_directory=gs://YOUR-BUCKET-NAME/output/ \
  load_parameters_path=gs://YOUR-BUCKET-NAME/weights/gemma3-4b/ \
  dataset_path=gs://YOUR-BUCKET-NAME/data/ \
  model_name=gemma3-4b \
  steps=5000 \
  run_name=leximini-finetune
Key training parameters such as learning rate, batch size, and warmup strategy were tuned conservatively to preserve pretrained knowledge during fine-tuning. Small changes in these values had a noticeable impact on stability and output quality.
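For reference, this is the shape of the overrides I mean. The values below are illustrative rather than my exact settings, and the key names should be double-checked against MaxText/configs/base.yml before use:
python3 MaxText/train.py MaxText/configs/base.yml \
  base_output_directory=gs://YOUR-BUCKET-NAME/output/ \
  load_parameters_path=gs://YOUR-BUCKET-NAME/weights/gemma3-4b/ \
  dataset_path=gs://YOUR-BUCKET-NAME/data/ \
  model_name=gemma3-4b \
  steps=5000 \
  run_name=leximini-finetune \
  learning_rate=1e-5 \
  per_device_batch_size=1 \
  warmup_steps_fraction=0.1
A low learning rate with a gentle warmup is what keeps the model from overwriting its pretrained knowledge while it absorbs the legal text.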
Checkpoints are saved automatically every 500 steps to gs://YOUR-BUCKET-NAME/output/leximini-finetune/checkpoints/
Watch the loss value as it prints. When it starts going down — that moment is genuinely satisfying. The model is learning Indian law in real time.
Step 11 — Convert Checkpoint to HuggingFace Format
After training, the checkpoint is in MaxText’s native format. We need to convert it to HuggingFace format so it can be used with the transformers library:
cd ~/maxtext
python3 MaxText/convert_gpt_maxtext_to_hf.py \
  --base_model_path gs://YOUR-BUCKET-NAME/weights/gemma3-4b/ \
  --maxtext_model_path gs://YOUR-BUCKET-NAME/output/leximini-finetune/checkpoints/5000/ \
  --output_path ~/leximini-4b-hf \
  --model_size 4b
ls ~/leximini-4b-hf/
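Before uploading, it is worth a quick smoke test that the converted checkpoint actually loads with the transformers library. This is a minimal sketch only: it assumes the conversion produced a standard HF checkpoint, needs pip install transformers torch, and runs on CPU, so expect it to be slow:
python3 << 'EOF'
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = '/home/USER/leximini-4b-hf'  # replace USER as before

tok = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

prompt = 'What does Section 302 of the IPC deal with?'
inputs = tok(prompt, return_tensors='pt')
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
EOF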
Step 12 — Push the Fine-Tuned 4B Model to HuggingFace
python3 << 'EOF'
from huggingface_hub import HfApi

api = HfApi()
api.create_repo(
    repo_id='YOUR_HF_USERNAME/leximini-4b',
    token='hf_YOUR_TOKEN_HERE',
    private=False
)
api.upload_folder(
    folder_path='/home/USER/leximini-4b-hf',
    repo_id='YOUR_HF_USERNAME/leximini-4b',
    repo_type='model',
    token='hf_YOUR_TOKEN_HERE'
)
print('4B model uploaded!')
EOF
Phase 4: Distilling 4B → 1B with Tunix
The Distillation Flow
LexiMini 4B Tunix Framework Gemma 3 1B
(Teacher Model) → (Distillation) → (Student Model)
↓
LexiMini 1B
(4x smaller, nearly
as knowledgeable)
Fine-tuning is done. But a 4B parameter model is heavy. It is slow to serve and expensive to run at scale. I wanted something leaner — something that could actually reach more people, including on limited hardware.
That is where Tunix comes in.
What is Knowledge Distillation?
Think of it this way. Imagine a senior advocate who has practiced Indian law for 20 years. Instead of making a junior lawyer read every case file from scratch, the senior sits with them and explains the reasoning — the why behind each judgment, not just the what.
The junior lawyer learns faster, and ends up nearly as capable — in a fraction of the time.
That is distillation. The 4B model (teacher) guides the 1B model (student) to reproduce the same legal understanding — but in a much smaller package.
The key insight is this: the student does not just learn from raw text labels. It learns from the teacher’s output probability distribution — which is far richer. When the teacher says “Section 302 relates to murder”, it doesn’t just output that as a yes/no. It outputs a distribution across thousands of tokens — and that distribution carries nuanced information about related concepts, alternate phrasings, confidence levels. The student absorbs all of that.
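To make that concrete, here is a minimal JAX sketch of the standard Hinton-style distillation loss. This is not Tunix’s actual implementation, just the textbook formulation it builds on; the function name and shapes are my own illustration:
import jax
import jax.numpy as jnp

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.7):
    """Blend of soft (teacher) and hard (ground-truth) losses.

    student_logits, teacher_logits: (batch, seq_len, vocab)
    labels: (batch, seq_len) integer token ids
    """
    t = temperature
    # Soften both distributions with the temperature
    teacher_probs = jax.nn.softmax(teacher_logits / t, axis=-1)
    student_logp = jax.nn.log_softmax(student_logits / t, axis=-1)
    # Cross-entropy against the teacher's soft targets (equivalent to
    # KL divergence up to a constant); the t**2 factor keeps the
    # gradient scale comparable across temperatures
    soft_loss = -(teacher_probs * student_logp).sum(axis=-1).mean() * t**2
    # Standard cross-entropy against the ground-truth tokens
    logp = jax.nn.log_softmax(student_logits, axis=-1)
    hard_loss = -jnp.take_along_axis(logp, labels[..., None], axis=-1).mean()
    # alpha weights the teacher signal, (1 - alpha) the raw data
    return alpha * soft_loss + (1 - alpha) * hard_loss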
What is Tunix?
Tunix is an open-source post-training framework built by Google DeepMind on top of JAX. It handles things like knowledge distillation, RLHF, and model alignment — the work that happens after initial pretraining. It runs natively on TPU and integrates cleanly with JAX checkpoints.
Step 13 — Install Tunix
cd ~
git clone https://github.com/google-deepmind/tunix.git
cd tunix
pip install -e .
Step 14 — Download Gemma 3 1B Base Weights
The student model starts from Gemma 3 1B base weights:
python3 << 'EOF'
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='google/gemma-3-1b-pt',  # the pretrained (base) variant
    local_dir='/home/USER/gemma3-1b-weights',
    token='hf_YOUR_TOKEN_HERE'
)
print('1B base weights ready!')
EOF
# Upload to GCS
gsutil -m cp -r ~/gemma3-1b-weights gs://YOUR-BUCKET-NAME/weights/gemma3-1b/
Step 15 — Create the Distillation Config
cat > ~/tunix/configs/leximini_distill.py << 'EOF'
teacher_model_path = 'gs://YOUR-BUCKET-NAME/output/leximini-finetune/checkpoints/5000/'
student_model_path = 'gs://YOUR-BUCKET-NAME/weights/gemma3-1b/'
output_path = 'gs://YOUR-BUCKET-NAME/distilled/leximini-1b/'
teacher_model_name = 'gemma3-4b'
student_model_name = 'gemma3-1b'
train_data_path = 'gs://YOUR-BUCKET-NAME/data/train_data.jsonl'

# training parameters (simplified for clarity)
steps = 3000
max_target_length = 1024
temperature = 2.0
alpha = 0.7
dtype = 'bfloat16'
EOF
Two parameters that matter most:
temperature = 2.0 — This softens the teacher’s output distribution. At temperature 1.0, the teacher gives sharp, confident predictions. At 2.0, probability spreads across more tokens — giving the student richer, more nuanced signals to learn from. This is one of the core insights of the original knowledge distillation paper by Hinton et al.
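To see what that softening looks like, here is a tiny standalone demo with made-up logits for three candidate tokens:
import jax
import jax.numpy as jnp

logits = jnp.array([4.0, 2.0, 1.0])  # invented logits for three tokens
print(jax.nn.softmax(logits))        # T=1.0 -> ~[0.84, 0.11, 0.04]
print(jax.nn.softmax(logits / 2.0))  # T=2.0 -> ~[0.63, 0.23, 0.14]
At T=2.0 the runner-up tokens carry real probability mass, which is exactly the extra signal the student learns from.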
alpha = 0.7 means 70% of the loss comes from matching the teacher and 30% from the raw training data. This balance keeps the student grounded in real legal text while absorbing the teacher’s reasoning.
In practice, finding the right balance between these factors required experimentation, as different settings can significantly impact how well the student model retains accuracy.
Step 16 — Run Distillation
cd ~/tunix
python3 tunix/distillation/distill.py \
  --config configs/leximini_distill.py \
  --teacher_model_path gs://YOUR-BUCKET-NAME/output/leximini-finetune/checkpoints/5000/ \
  --student_model_path gs://YOUR-BUCKET-NAME/weights/gemma3-1b/ \
  --output_path gs://YOUR-BUCKET-NAME/distilled/leximini-1b/ \
  --train_data gs://YOUR-BUCKET-NAME/data/train_data.jsonl \
  --steps 3000 \
  --learning_rate 5e-5 \
  --per_device_batch_size 4 \
  --temperature 2.0 \
  --alpha 0.7 \
  --dtype bfloat16 \
  --run_name leximini-distilled
Step 17 — Convert Distilled Checkpoint and Push to HuggingFace
cd ~/maxtext
python3 MaxText/convert_gpt_maxtext_to_hf.py \
  --base_model_path gs://YOUR-BUCKET-NAME/weights/gemma3-1b/ \
  --maxtext_model_path gs://YOUR-BUCKET-NAME/distilled/leximini-1b/checkpoints/3000/ \
  --output_path ~/leximini-1b-hf \
  --model_size 1b
ls ~/leximini-1b-hf/
python3 << 'EOF'
from huggingface_hub import HfApi

api = HfApi()
api.create_repo(
    repo_id='YOUR_HF_USERNAME/leximini-1b-final',
    token='hf_YOUR_TOKEN_HERE',
    private=False
)
api.upload_folder(
    folder_path='/home/USER/leximini-1b-hf',
    repo_id='YOUR_HF_USERNAME/leximini-1b-final',
    repo_type='model',
    token='hf_YOUR_TOKEN_HERE'
)
print('LexiMini is live!')
EOF
What I Am Still Working On
I want to be transparent about where this project currently stands.
The fine-tuned model exists and is running. But it still hallucinates — especially on less common legal sections. It sometimes generates plausible-sounding IPC sections that do not exist. This is a known problem with language models trained on limited domain data, and I am actively working on it.
The areas I am focusing on right now:
- More data — The current dataset covers core IPC and constitutional law well. Consumer protection, family law, property law, and state-specific laws need more coverage.
- Better data formatting — Moving from raw paragraph dumps to structured question-answer pairs, which gives the model a clearer learning signal (see the illustrative record after this list).
- Distillation tuning — The 4B → 1B pipeline is implemented but still being refined. The 1B student needs more iterations to fully absorb the teacher’s legal reasoning.
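As an illustration of that formatting shift, a QA-style record would look something like this (a made-up example, shown only for structure):
{"text": "Question: Can a landlord evict a tenant without serving notice?\nAnswer: Generally no. A landlord must normally serve written notice before seeking eviction, and the exact notice period depends on the applicable state Rent Control Act."}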
The GitHub and final model links will be added here when the project reaches a stable, reliable state. I would rather share something that works properly than something that looks impressive but misleads people about their legal rights.
What I Learned While Building This
This wasn’t my first time working with fine-tuning. I’ve previously used approaches like LoRA, QLoRA, and PEFT — they are reliable and produce strong results. But they are also time-intensive, especially when working with larger datasets and multiple iterations.
What stood out to me in this project was the shift to JAX-based training using MaxText. The speed difference was significant. Workloads that would typically take hours could be executed in minutes. TPU performance is obviously a big factor here, but the tooling itself also plays a major role.
At the same time, this came with its own challenges.
MaxText doesn’t have the kind of beginner-friendly ecosystem or tutorials that many PyTorch-based workflows have. There isn’t a single place where everything is explained clearly. I had to rely on documentation, experimentation, and AI-assisted exploration to understand how things actually work under the hood.
Distillation was another area where things were not straightforward. While the pipeline works, maintaining accuracy is still a challenge. I’m actively experimenting to find the right balance between compression and performance, especially for a domain like legal text where precision matters.
Another important shift in thinking for me: I don’t just want to train models — I want to move toward reasoning-focused training. That is still a work in progress, and I’m continuing to explore how to integrate it effectively into this pipeline.
Overall, this project wasn’t about learning basics. It was about navigating gaps — missing documentation, unclear workflows, and making practical decisions in a system that is still evolving.
The Bigger Picture
This project is not about building for developers — it’s about building for real users.
Someone dealing with a tenant dispute or a confusing loan agreement is not going to use a notebook or call an API. They need something simple, accessible, and reliable — something that works on limited internet, speaks plain language, and understands Indian law beyond surface-level fluency.
That’s why distillation matters. A 4B model is powerful, but a well-optimized 1B model can run on affordable hardware, load faster, and reach far more people. The goal was never scale for its own sake — it was usability.
LexiMini is still evolving. Hallucination is a real challenge, and I’m actively working on improving reliability. The dataset is also expanding, especially in areas that impact everyday lives — tenant law, microfinance, women’s rights, domestic violence protections, inheritance, and local governance.
It’s not finished — but it’s real, it’s running, and it’s being built with a clear purpose: making Indian law accessible to those who need it most.
This article will be updated as the project progresses. Model and GitHub links will be added upon stable release.
Built with MaxText · Tunix · JAX · HuggingFace
Acknowledgment: Google Cloud credits are provided for this project. #TPUSprint