
We got a real performance lift: Delta A = +0.263 (p < 0.0001).
But that result exposed a harder question:
Did the adapter learn how Tenacious writes, or just what repeated Tenacious-like samples looked like?
This post answers that at the mechanism level: cross-entropy token-by-token, LoRA gradient flow, and why low-diversity augmentation can make convergence look better than generalization.
1) What SFT cross-entropy actually optimizes
In autoregressive SFT, the model predicts the next token at each step.
Cross-entropy loss measures how much probability mass the model assigned to the correct next token.
So the objective is:
not “be honest,”
not “be cautious,”
not “be Tenacious,”
but: assign high probability to target tokens in the training distribution.
If your targets consistently reflect Tenacious behavior, style improves indirectly.
But the optimization target is still token prediction.
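To make that concrete, here is a minimal sketch of the per-token objective in plain PyTorch (function name and tensor shapes are illustrative, not taken from the training code):

```python
import torch
import torch.nn.functional as F

def per_token_cross_entropy(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq, vocab); input_ids: (batch, seq)
    # Shift so the model's output at position t is scored against token t+1.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        reduction="none",  # keep one loss value per predicted token
    )
    # Each entry is -log p(correct next token): the only thing SFT optimizes.
    return loss.view(shift_labels.shape)
```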
2) How gradients flow in LoRA when base weights are frozen
For each adapted layer:
W = W0 + BA
W0 is frozen
only A and B are trainable
During backprop, gradients pass through the full forward graph, but updates only change A/B.
That means LoRA acts as a low-rank steering update on top of a fixed backbone.
Practical interpretation: you are not retraining the model’s full knowledge. You are learning a compact directional adjustment that shifts output tendencies.
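A minimal sketch of that structure (illustrative only; this is not the peft library's actual implementation, and rank/alpha are placeholders):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: y = W0 x + (alpha / rank) * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # W0 (and bias) frozen: no updates
        # Standard LoRA init: A small random, B zero, so BA starts at zero.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gradients flow through both terms, but only A and B receive updates.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```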
3) What your seven target modules imply
You adapted:
attention projections: q_proj, k_proj, v_proj, o_proj
feed-forward projections: gate_proj, up_proj, down_proj
A useful diagnostic lens:
Attention-heavy updates often correlate with better context routing (e.g., weak signal -> interrogative phrasing).
MLP-heavy updates often correlate with lexical/phrase-shape adaptation (which can be desired style—or shortcut memorization).
This is why module-level gradient norms matter. Without them, “it improved” is under-explained.
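For reference, selecting those seven modules looks roughly like this with the Hugging Face peft library (r and lora_alpha here are placeholders, not your run's actual hyperparameters):

```python
from peft import LoraConfig

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # feed-forward projections
    ],
    task_type="CAUSAL_LM",
)
```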
4) Why low diversity is a gradient problem, not just a datasheet warning
Your datasheet states that 94.3% of training pairs are augmented variants of only 128 originals.
That has direct optimization consequences.
Near-duplicate examples repeatedly produce highly aligned gradient directions.
Cross-entropy rewards those repeated token patterns quickly. Training loss falls. Metrics can rise.
But this can represent two different realities:
Generalizable policy learning (what you want)
Surface-pattern reinforcement (what you fear)
Cross-entropy alone cannot tell which one happened.
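One way to see the alignment effect directly, sketched under the assumption of a Hugging Face-style model(**batch).loss interface: backprop two examples separately and compare the cosine similarity of their trainable-parameter gradients. Augmentation siblings tend to score close to 1.0.

```python
import torch
import torch.nn.functional as F

def lora_grad_vector(model, batch) -> torch.Tensor:
    # Flattened gradient over trainable (LoRA) params for a single batch.
    model.zero_grad()
    loss = model(**batch).loss  # HF causal-LM convention; adapt as needed
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()
                      if p.requires_grad and p.grad is not None])

def gradient_alignment(model, batch_a, batch_b) -> float:
    # Cosine similarity near 1.0 means the two examples push the adapter
    # in nearly the same direction.
    g_a = lora_grad_vector(model, batch_a)
    g_b = lora_grad_vector(model, batch_b)
    return F.cosine_similarity(g_a, g_b, dim=0).item()
```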
5) Why your Delta A is real but not sufficient on its own
A statistically strong Delta A means the adapter improved on your evaluation distribution.
It does not automatically prove robust style generalization out-of-family.
The defensible claim is:
“The adapter improved predictive behavior on measured data; generalization vs memorization requires additional diagnostics.”
That is stronger science and better engineering.
6) Minimal diagnostics to separate style learning from memorization
A) Grouped holdout by original family
Do not split augmentation siblings across train/held-out.
Keep all variants of one original together in one split.
Stable performance on grouped holdout -> stronger evidence of true style learning
Large drop -> evidence of augmentation-family memorization
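A sketch of that split, assuming each pair records the ID of the original it was augmented from (the family_id field is a hypothetical name; use whatever your datasheet actually stores):

```python
from sklearn.model_selection import GroupShuffleSplit

def grouped_split(examples, test_size=0.2, seed=0):
    # Group by the original each pair was augmented from, so all
    # augmentation siblings of one original land on the same side.
    groups = [ex["family_id"] for ex in examples]
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(examples, groups=groups))
    return [examples[i] for i in train_idx], [examples[i] for i in test_idx]
```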
B) Gradient norm breakdown by LoRA module
Log gradient norms for LoRA params and aggregate by:
q/k/v/o
gate/up/down
This doesn’t “prove style” alone, but it makes your mechanism claim concrete: where did training pressure concentrate?
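A logging sketch, assuming peft-style parameter names like "...q_proj.lora_A.default.weight" (check against your own model's named_parameters):

```python
from collections import defaultdict

ATTN = ("q_proj", "k_proj", "v_proj", "o_proj")
MLP = ("gate_proj", "up_proj", "down_proj")

def lora_grad_norms_by_module(model) -> dict:
    # Call after loss.backward(); accumulates squared norms per module type.
    sq_norms = defaultdict(float)
    for name, param in model.named_parameters():
        if "lora" not in name or param.grad is None:
            continue
        for module in ATTN + MLP:
            if module in name:
                sq_norms[module] += param.grad.norm().item() ** 2
    return {m: sq ** 0.5 for m, sq in sq_norms.items()}
```

Comparing the attention total against the MLP total then gives you a first-order read on where the adapter's training pressure went.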
7) Practical conclusion for FDE fine-tuning work
This issue generalizes to any narrow, augmented SFT project (sales writing, summarization, code style, domain formatting):
loss convergence is necessary,
benchmark gain is valuable,
but neither alone proves intended behavior learning.
If you want to claim “learned policy,” add grouped holdout and module-level gradient diagnostics as standard evidence.
Final takeaway
Your LoRA adapter likely learned a useful steering update.
But with heavy augmentation concentration, the safest conclusion is:
“We improved next-token policy on this distribution; we are validating whether that policy generalizes beyond augmentation families.”
That framing is honest, technically grounded, and production-defensible.