Understanding Transformers Part 8: Shared Weights in Self-Attention

In the previous article, we started calculating the self-attention values.

Let’s now calculate the self-attention values for the word “go”.

We do not need to recalculate the keys and values.

Instead, we only need to create the query that represents the word “go”, and then perform the same calculations as before.

After completing the calculations, we get the self-attention values for “go” as:

2.5 and -2.1
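The reuse of keys and values can be sketched in a few lines of NumPy. This is a minimal illustration with made-up embeddings and weight matrices, not the actual numbers from the article's running example, so the output will differ from the 2.5 and -2.1 above. It includes the usual scaling of the scores by the square root of the key dimension.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Toy 2-d embeddings for "Let's" and "go" (hypothetical values,
# not the ones used in this article series).
embeddings = np.array([[1.16, 0.23],
                       [0.57, 1.36]])

# One shared weight matrix each for queries, keys, and values.
W_q = np.array([[0.54, -0.17], [0.93,  0.88]])
W_k = np.array([[0.15,  0.22], [0.63, -0.78]])
W_v = np.array([[0.62,  0.61], [0.26, -0.45]])

# The keys and values were already computed for all words; reuse them.
K = embeddings @ W_k
V = embeddings @ W_v

# Only the query for "go" (the second word) is new.
q_go = embeddings[1] @ W_q

# Same calculation as before: scores -> softmax -> weighted sum of values.
scores = q_go @ K.T / np.sqrt(K.shape[1])
attn_go = softmax(scores) @ V
```

With real trained weights, `attn_go` would hold the two self-attention values for "go".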

Key Observations About Self-Attention

  • The weights used to calculate queries are the same for both “Let’s” and “go”.

    • This means that no matter how many words are given as input, the transformer uses one shared set of query weights.
    • Similarly, one shared set of weights is reused to calculate the keys, and another to calculate the values, for every input word.
  • We do not need to compute queries, keys, and values sequentially.

    • All of them can be computed at the same time.
    • This allows transformers to take advantage of parallel computation, making them very efficient.
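The two observations above can be shown together in one short sketch (hypothetical sizes and random embeddings for illustration): because the query, key, and value weights are each a single shared matrix, one matrix multiplication produces the queries, keys, and values for every input word at once, with no per-word loop.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, d = 5, 2                      # any number of words, 2-d embeddings
X = rng.normal(size=(n_words, d))      # embeddings for all input words

# One weight matrix each, regardless of how many words there are.
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

# Single matrix multiplies compute the queries, keys, and values
# for every word simultaneously -- this is the parallelism.
Q = X @ W_q
K = X @ W_k
V = X @ W_v
```

Adding a sixth word only adds a row to `X`; the three weight matrices stay exactly the same.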

We will continue building our transformer step by step in the next article.

Looking for an easier way to install tools, libraries, or entire repositories?
Try Installerpedia: a community-driven, structured installation platform that lets you install almost anything with minimal hassle and clear, reliable guidance.

Just run:

ipm install repo-name

… and you’re done! 🚀


🔗 Explore Installerpedia here
