I Fine-Tuned a 270M Model on My Laptop (Full Fine-Tuning, From Scratch)

June 21, 2026

Series — Fine-Tuning, Smallest to Largest (same task, three techniques, smallest model to largest):

Part 1: Full Fine-Tuning (270M) ← you are here
Part 2: LoRA (1.5B)
Part 3: QLoRA (7B)
Part 4: If the small model worked, why go bigger?

I wanted to actually understand fine-tuning — not run a tutorial and nod along. So I gave myself a constraint: same task, three techniques, smallest model to largest. Full fine-tuning, then LoRA, then QLoRA. Hold the task fixed and the only variable is the method.

This first post is full fine-tuning — the most powerful and most expensive option: update every weight in the model.

The task

Banking77: ~13,000 real bank customer-support messages, 77 intents like card_arrival, lost_or_stolen_card, exchange_rate. The model reads a message and names the intent.

The model: deliberately tiny

I picked Gemma 3 at 270M parameters, small enough to fully fine-tune on a laptop (Apple Silicon / MPS). That’s intentional: full fine-tuning keeps gradients and optimizer state for every parameter on top of the weights themselves, so training needs roughly 4× the model’s size in memory, before counting activations. I wanted to feel that, not read about it.
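As a rough check on that 4× figure, here is a back-of-the-envelope sketch. It assumes fp32 values (4 bytes each) and an AdamW-style optimizer that keeps two moment buffers per parameter. The function name and the breakdown are illustrative, not measurements from the run, and activations (which depend on batch size and sequence length) are left out.

```python
def full_ft_memory_gb(n_params: int, bytes_per_value: int = 4) -> dict:
    """Estimate training memory for full fine-tuning, excluding activations.

    Assumes one fp32 copy each of weights and gradients, plus two
    optimizer moment buffers (AdamW-style). These are assumptions,
    not numbers measured in the post.
    """
    per_copy = n_params * bytes_per_value / 1e9  # GB for one full copy
    breakdown = {
        "weights": per_copy,
        "gradients": per_copy,
        "optimizer_moments": 2 * per_copy,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


estimate = full_ft_memory_gb(270_000_000)
for name, gb in estimate.items():
    print(f"{name:>18}: {gb:.2f} GB")
```

Under these assumptions a 270M model needs about 1.08 GB per copy and about 4.3 GB in total, which a laptop can hold. The same arithmetic for a 7B model gives over 100 GB, which is why the larger models in this series need a different technique.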

One design decision: generate the label, don’t classify it

The obvious approach is to bolt a 77-way classification head onto the model. I didn’t. Instead I had the model generate the intent as text — literally output card_arrival. Why? Because that’s the same shape as instruction-tuning, so the later LoRA/QLoRA projects build naturally on this one.
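Generating the label means two small pieces of glue: turning a message into a prompt, and turning the generated text back into one of the known intents. The sketch below shows one way to do that. The prompt template, the helper names, and the four-intent subset are hypothetical and are not taken from the notebook.

```python
# A small subset of the 77 intents, for illustration only.
INTENTS = [
    "card_arrival",
    "card_delivery_estimate",
    "lost_or_stolen_card",
    "exchange_rate",
]


def build_prompt(message: str) -> str:
    """Hypothetical template: the model continues the text after 'Intent:'."""
    return f"Message: {message}\nIntent:"


def parse_intent(generated: str, intents=INTENTS):
    """Map generated text to a known intent, or None if it matches nothing."""
    tokens = generated.strip().split()
    if not tokens:
        return None
    candidate = tokens[0]  # first whitespace-delimited word of the output
    return candidate if candidate in intents else None


print(build_prompt("Where is my new card?"))
print(parse_intent(" card_arrival"))
print(parse_intent("I am not sure"))
```

Returning `None` for unrecognized output keeps malformed generations visible as errors at evaluation time instead of silently mapping them to some intent.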

The key detail is masking the loss so the model is graded only on the label tokens, not the prompt:

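# Build "prompt + label", but set prompt tokens to -100 so the loss ignores them.
# encode_example is an illustrative wrapper name; tokenizer is the model's Hugging Face tokenizer.
def encode_example(tokenizer, prompt, label_name):
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    # the leading space separates the label from the prompt; EOS marks where generation should stop
    target_ids = tokenizer(" " + label_name + tokenizer.eos_token,
                           add_special_tokens=False)["input_ids"]
    input_ids = prompt_ids + target_ids
    labels = [-100] * len(prompt_ids) + target_ids   # only the label is graded
    return {"input_ids": input_ids, "labels": labels}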

If you skip that masking, the loss is averaged over the prompt tokens too, and since the prompt is much longer than the label, most of the training signal goes into reproducing the prompt instead of the answer.
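To make the effect of -100 concrete, here is a minimal pure-Python sketch of a masked cross-entropy. In PyTorch, -100 is the default `ignore_index` of the cross-entropy loss, and Hugging Face causal-LM models shift the labels by one position internally. The sketch skips that shift and assumes logits and labels are already aligned; the function name and toy numbers are made up.

```python
import math

IGNORE_INDEX = -100  # label value that the loss skips


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label is not ignore_index.

    logits: one list of scores per position (one score per vocabulary id)
    labels: one target id per position, assumed already aligned with logits
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, labels):
        if target == ignore_index:
            continue  # prompt position: contributes nothing to the loss
        peak = max(scores)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]  # negative log-probability of target
        count += 1
    # Returning 0.0 when every position is masked is a choice made for this sketch.
    return total / count if count else 0.0


# Two prompt positions (masked) followed by two label positions (graded).
labels = [IGNORE_INDEX, IGNORE_INDEX, 2, 1]
logits_a = [[2.0, 0.1, 0.3], [0.5, 1.5, 0.2], [0.1, 0.2, 3.0], [0.3, 2.5, 0.1]]
# Same label positions, completely different scores on the prompt positions.
logits_b = [[9.0, 9.0, 0.0], [0.0, 0.0, 7.0]] + logits_a[2:]

loss_a = masked_cross_entropy(logits_a, labels)
loss_b = masked_cross_entropy(logits_b, labels)
print(loss_a == loss_b)  # prints True: prompt positions do not affect the loss
```

Whatever the model predicts on the prompt tokens, the loss is unchanged, so gradient flows only from the label tokens.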

The thing that surprised me: full FT is fragile

Because you’re updating all the pretrained weights, a too-high learning rate shreds the model’s existing knowledge. I used 5e-5 and it trained cleanly. Bumping to 2e-4 destabilized it. The training config is otherwise unremarkable — and that’s the point:

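from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="full-ft-gemma270m",  # placeholder path, not from the original run
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=5e-5,            # small, on purpose
    lr_scheduler_type="cosine",
    bf16=False, fp16=False,        # fp32 on MPS for stability
)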

(The later projects freeze the base model, which is exactly why they can tolerate a much higher learning rate: the pretrained weights are never updated, so there’s no fragile knowledge to wreck.)
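For a feel of what `lr_scheduler_type="cosine"` does to the 5e-5 learning rate, here is a small sketch of a half-cosine decay with optional linear warmup. It follows the usual formula for this schedule. The function name, the warmup handling, and the step count of 1000 are assumptions for illustration, not values from the run.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 5e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by half-cosine decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)  # linear ramp up
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Decays from base_lr at progress=0 to 0 at progress=1.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total_steps = 1000  # hypothetical; the real value depends on dataset size and epochs
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_lr(step, total_steps):.2e}")
```

The rate starts at its peak, reaches half the peak at the midpoint, and falls to zero at the final step, so the largest updates to the pretrained weights happen early and the last epoch makes only small adjustments.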

Result

~96% on the common intents. A near-perfect diagonal confusion matrix. A 270M model, fully fine-tuned on a laptop, nailing the task.

The one persistent slip: it confused card_arrival with card_delivery_estimate. Keep that in mind — it shows up in every project in this series, and the reason why is the punchline of Part 4.
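A slip like this is easy to surface by counting the off-diagonal cells of the confusion matrix. The sketch below does that with plain Python. The helper names are illustrative, and the five toy predictions are made up to show the mechanics; they are not results from the run.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples where the predicted intent equals the true intent."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def top_confusions(y_true, y_pred, k=3):
    """Most frequent (true, predicted) pairs among the mistakes."""
    mistakes = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return mistakes.most_common(k)


# Made-up toy data, only to exercise the functions.
y_true = ["card_arrival", "card_arrival", "card_delivery_estimate",
          "exchange_rate", "lost_or_stolen_card"]
y_pred = ["card_arrival", "card_delivery_estimate", "card_delivery_estimate",
          "exchange_rate", "lost_or_stolen_card"]

print(accuracy(y_true, y_pred))
print(top_confusions(y_true, y_pred))
```

Sorting mistakes by (true, predicted) pair gives a shorter read than a 77×77 matrix, and it shows at a glance whether the errors cluster on a few near-duplicate intents.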

What’s next

In Part 2, I take a model 5× bigger and train less than 1% of it — and get the same accuracy. That’s LoRA.

📓 Full runnable notebook on Kaggle: https://www.kaggle.com/code/sumannath88/01-full-finetune-gemma270m

Built with PyTorch + Hugging Face Transformers. Questions or corrections welcome in the comments.
