Hello Dev community!
I’m noshishi, an apprentice engineer in Tokyo.
This article is about understanding Git from the inside by creating a simple program that can add and commit.
But it’s a very long story, so I’ll post the development section separately!
Foreword
It all started with the thought, ‘If I understand Git, could I build it myself?!’
I also took this opportunity to try out a new programming language, and chose Rust this time. The repository I actually created is My original git nss. The code quality isn’t quite there yet and many parts are still incomplete, but it can handle a straightforward local development workflow!
If you give me a star, I’ll be over the moon, and of course I’ll be waiting for your contributions! Feel free to touch this repository any way you like!
Please forgive us for not being able to explain every detail in this article alone. Also, we use Rust for development, but Python for the stage where we uncover Git’s internals!
TOC
- Git Inside?
  - Where is repository
  - Object
  - Index
- Analyze Object
  - blob
  - tree
  - commit
  - Summary
- Analyze Index
  - Specification
  - Index
  - Summary
- Background of Command
  - add
  - commit
- Digression
  - Deciphering Tree
  - HEAD and Branch
  - Plumbing commands
- Finally
- What you need
Git Inside
First, we will unpack how Git handles data, based on the official documentation.
The Git command system is very complex.
But Git’s data structure is very simple!
Where is repository
A repository is a directory under Git’s control, and the .git folder created in that directory by init or clone is the actual repository.
Let’s put an empty folder called project under Git’s control.
$ pwd
/home/noshishi/project
$ ls -a
# nothing yet
$ git init
Initialized empty Git repository in /home/noshishi/project/.git/
$ ls -a
.git
This .git directory consists of the following.
.git
├── HEAD
├── (index) // Not created by `init`!
├── config
├── objects/
└── refs/
    ├── heads/
    └── tags/
It is hard to tell files from directories in a Git repository at first glance, so we have appended / to directory paths for reference. We have also omitted the parts that are not explained in this article.
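To make the layout above easier to check on your own machine, here is a minimal Python sketch that lists everything under a .git directory and appends / to directories, following the same convention as the tree above. The function name list_repository is a hypothetical helper made up for illustration; it is not part of Git or of nss.

```python
from pathlib import Path


def list_repository(git_dir):
    """Return every path under git_dir (relative to it), with '/' appended to directories."""
    entries = []
    for path in Path(git_dir).rglob("*"):
        relative = path.relative_to(git_dir).as_posix()
        # Mark directories with a trailing slash, as in the tree above.
        entries.append(relative + "/" if path.is_dir() else relative)
    return sorted(entries)


# Example usage (assumes the current directory is a repository such as `project`):
# print("\n".join(list_repository(Path(".git"))))
```

A real repository will show more entries than the simplified tree above, because the parts not covered in this article are also listed.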
Object
Git manages versions using file data called objects.
Objects are stored in .git/objects.
Types
There are four types of objects: blob, tree, commit, and tag.
Each type corresponds to the following data.
- blob … File data
- tree … Directory data
- commit … Metadata to manage the tree of the repository
- tag … Metadata for a specific commit (not explained in this article)
Image with first.txt in the project repository
Structure
An object is file data, so just like a normal file it has a file name (path) and the data stored in it.
File name (path)
The file name (path) is a 40-character string. This is a hash (SHA-1 [2]) of the object data.
In practice, the first two characters are the directory name and the remaining 38 are the file name.
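The 2 + 38 split described above can be sketched in a few lines of Python. This is only an illustration: object_path is a hypothetical helper name, and sample_id is just a 40-character SHA-1 hex string produced with hashlib for demonstration, not the hash of a real Git object.

```python
import hashlib
from pathlib import Path


def object_path(object_id, git_dir=Path(".git")):
    """Map a 40-character object hash to its location under .git/objects."""
    if len(object_id) != 40:
        raise ValueError("expected a 40-character hash")
    # First two characters: directory name. Remaining 38: file name.
    return Path(git_dir) / "objects" / object_id[:2] / object_id[2:]


# Illustrative only: any SHA-1 digest is 40 hex characters long.
sample_id = hashlib.sha1(b"placeholder data").hexdigest()
print(object_path(sample_id))
```

Splitting on the first two characters means the objects are spread over many subdirectories of .git/objects instead of sitting in one flat folder.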
Data
Object data is compressed with zlib [1]. The decompressed data consists of two parts: header and content.
The two elements are then separated by