Unlocking the Hidden Mechanisms of GitOps
Chapter 1: Understanding Git Operations
I averaged more than five commits a day last year, and that does not count the many other commands I ran, such as git pull, git checkout, git fetch, and git add. Including those would easily push the total into the tens of thousands. Git and I know each other well.
Just as we deepen friendships by getting to know people, we get the most out of a tool by learning how it works. What happens inside Git when we run a command? Why does the difference between git rebase and git merge matter? By asking these questions and working through my own answers, imperfect as they are, I set out to rediscover Git and uncover its "hidden mechanisms."
Git Workspaces
Git projects comprise three main areas: the working directory, the staging area, and the Git repository.
Git Repository
Let's begin with the Git repository. Run git init, create some files, and look at the directories and files inside .git.
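To see these entries for yourself, a minimal sketch along the following lines should work. The temporary directory and the file name a.txt are assumptions made for illustration, and the exact listing varies with the Git version and its default settings.

```shell
set -e
repo=$(mktemp -d)          # throwaway directory (hypothetical location)
cd "$repo"
git init -q

ls -F .git                 # entries created by git init

echo "first file" > a.txt  # a.txt is an illustrative file name
git add a.txt              # staging something creates the index file
ls -F .git                 # the listing now also contains index
```

Note that index is not there right after git init; it appears once something has been staged.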
What do these directories and files represent?
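- HEAD: Points to the branch that is currently checked out, and through it to the current commit (or directly to a commit when HEAD is detached).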
- config: Contains your repository settings, including remote URLs, email, and username, which can be configured via git config.
- description: Offers a brief description of the repository.
- hooks: Includes various scripts that Git runs before or after commits, rebases, or pulls. You can create a pre-push hook to check code before pushing.
- index: Represents the staging area, stored as a binary file.
- info: Contains the exclude file, which lists patterns for files and directories you want to ignore locally without recording them in .gitignore.
- objects: This is where all Git objects are stored. The folder names correspond to the first two characters of the object's SHA1 hash, while the object filenames consist of the remaining 38 characters.
- refs: Typically contains three subdirectories: heads, remotes, and tags. The files within heads identify the current commit that each branch points to.
Working Directory
When you run git status, Git reports the modifications in your working directory that it has not yet recorded. If needed, you can discard changes to tracked files and return them to their state in the HEAD commit with git checkout HEAD -- <path>.
It's crucial to understand two distinct file states, Untracked and Modified:
- Untracked: Files that Git has never been asked to track. They are absent from the index, and git add brings them under Git management.
- Modified: Tracked files whose content differs from the version recorded in .git/objects. They can be returned to the recorded state using git checkout or git restore.
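The two states can be observed with git status --short, as in the sketch below. The file names, the identity settings, and the temporary directory are assumptions chosen for illustration.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false

echo "one" > tracked.txt
git add tracked.txt
git commit -q -m "add tracked.txt"

echo "two" >> tracked.txt     # tracked file edited: Modified
echo "new" > untracked.txt    # never added: Untracked

status=$(git status --short)
echo "$status"                # " M" marks modified, "??" marks untracked

git checkout -- tracked.txt   # discard the edit to the tracked file
```

The untracked file is left alone by git checkout, since Git has no recorded version of it to restore.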
Staging Area
Running git add writes the file's content into .git/objects as a blob and records it in the index, but HEAD and refs remain unchanged until a commit is made. The staging area collects related changes before they are committed, and the binary index file is where that list is kept. The command git ls-files --stage shows its contents.
It's worth noting that git stash does not keep its data in the index. Each stash entry is stored as a commit in .git/objects, and the most recent one is referenced by the .git/refs/stash file.
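A small sketch of this behaviour, assuming a fresh repository and an illustrative file a.txt: after git add, the blob exists and the index lists it, yet no branch file has been written.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q

echo "hello" > a.txt
git add a.txt

staged=$(git ls-files --stage)     # mode, blob hash, stage number, path
echo "$staged"
find .git/objects -type f          # the blob is already on disk

blob=$(git hash-object a.txt)      # hash Git computes for the file's content
branch_files=$(find .git/refs/heads -type f)   # empty: nothing committed yet
```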
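The stash reference can be checked with a sketch like the one below. It assumes Git's default file-based ref storage, where refs/stash is an ordinary file; the file name and identity are placeholders.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false

echo "one" > a.txt
git add a.txt
git commit -q -m "add a.txt"

echo "two" >> a.txt
git stash push -q                  # put the uncommitted edit aside

cat .git/refs/stash                # hash of the newest stash entry
stash_type=$(git cat-file -t refs/stash)
echo "$stash_type"                 # the entry is stored as a commit object
```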
Internal Git Logic
After running git add, you'll discover that the files under .git/objects cannot be read directly, because Git stores them in compressed binary form. To inspect them, Git offers the cat-file command: -t displays the object's type (here, blob) and -p prints its content.
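As a minimal sketch, assuming an illustrative file a.txt that contains the word hello, the two cat-file options can be tried like this:

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q

echo "hello" > a.txt
git add a.txt

hash=$(git hash-object a.txt)          # ID of the blob written by git add
dir=$(echo "$hash" | cut -c1-2)        # first two characters: directory name
ls ".git/objects/$dir"                 # file named after the remaining characters

obj_type=$(git cat-file -t "$hash")    # object type
content=$(git cat-file -p "$hash")     # object content
echo "$obj_type: $content"
```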
What is a blob?
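A blob is a Binary Large Object, the most fundamental of Git's object types. It stores only a file's content, not its name. The name of the object itself is the SHA1 hash of that content, prefixed with a short header.
You can reproduce that hash outside Git by hashing the header ("blob", a space, the content length in bytes, and a NUL byte) followed by the content. Remember to pay attention to the trailing newline (\n) in the file, because it is part of the content.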
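One way to try this is sketched below. It assumes the content is the word hello followed by a newline (6 bytes) and that sha1sum from GNU coreutils is installed; on macOS, shasum would be the usual substitute.

```shell
set -e
# Header: "blob", a space, the byte length, a NUL byte; then the content.
# "hello\n" is 6 bytes because the trailing newline counts.
manual=$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)

# Ask Git for the hash of the same content.
git_hash=$(printf 'hello\n' | git hash-object --stdin)

echo "$manual"
echo "$git_hash"    # both lines print ce013625030ba8dba906f756967f9e9ca394464a
```

Dropping the newline changes both the length in the header and the resulting hash, which is why the trailing character matters.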
Creating a Commit
Let's create a commit and observe the objects in the directories. Listing .git/objects reveals two new files, which the cat-file command can inspect.
A commit generates two new objects, usually each in its own directory. One is a tree object that holds a snapshot of the current directory structure: the mode (permissions), type, ID (SHA1 value), and name of each file or subfolder. The other is a commit object containing the hash of that tree, the parent commit (if any), the author and committer with their emails and timestamps, and the commit message, similar to what you see in git log.
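Both objects can be printed with cat-file, as in this sketch. The file name, commit message, and identity are placeholders.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false

echo "hello" > a.txt
git add a.txt
git commit -q -m "first commit"

git cat-file -p HEAD             # tree line, author, committer, message
git cat-file -p 'HEAD^{tree}'    # one line per entry: mode, type, hash, name

tree_in_commit=$(git cat-file -p HEAD | sed -n '1s/^tree //p')
tree_id=$(git rev-parse 'HEAD^{tree}')
```

The first line of the commit object names the tree, which is how a commit is tied to its snapshot.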
The current Git directory's tree structure is now clearer.
Branch Information
From the git log, we observe that HEAD points to master, the default branch. But where is this information stored?
$ cat .git/HEAD
ref: refs/heads/master
$ cat .git/refs/heads/master
679b1cf8c6cd1253ca82dd06fd644268d1058d7f
In a Git repository, HEAD, branches, and ordinary tags can be viewed as pointers to the SHA1 value of the corresponding commit in objects.
Git Storage Tree: From HEAD to Blob
Thus far, we have seen how the Git repository stores data: every object is addressed by its SHA1 hash, a branch ref names a commit, the commit names a tree, and the tree names blobs and subtrees. The commit history is essentially a Directed Acyclic Graph (DAG), and it grows as more branches and commits are created.
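The chain from HEAD down to a blob can be followed step by step, as in this sketch. The branch name master is set explicitly because the default branch name depends on configuration; the file name and identity are placeholders.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the initial branch master
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false

echo "hello" > a.txt
git add a.txt
git commit -q -m "first commit"

ref=$(git symbolic-ref HEAD)              # branch that HEAD points to
commit=$(git rev-parse "$ref")            # commit that the branch points to
tree=$(git rev-parse "$commit^{tree}")    # tree recorded in that commit
blob=$(git rev-parse "$commit:a.txt")     # blob recorded in that tree

echo "$ref -> $commit -> $tree -> $blob"
git cat-file -p "$blob"                   # content of a.txt
```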
Modifying Files and Committing
To visualize the formation of the Git tree, we can track changes across the three Git zones during various operations. When a file is modified, the workspace reflects those changes, but nothing in .git is affected. Running git add writes a new blob for the updated file into .git/objects and updates the index. A commit then turns the index's content into a new tree and a new commit object within .git/objects and updates refs/heads/master, which HEAD refers to, so that it points to the latest commit.
Merging a Branch into Master
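Counting the files under .git/objects after each step makes this sequence visible. The sketch below assumes a fresh repository with a single illustrative file and loose (unpacked) objects; count is a hypothetical helper defined only for this example.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the initial branch master
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false

echo "hello" > a.txt
git add a.txt
git commit -q -m "first commit"

# Hypothetical helper: number of loose object files.
count() { find .git/objects -type f | wc -l | tr -d ' '; }

before=$(git rev-parse HEAD)
echo "more" >> a.txt
n_edit=$(count)      # editing the working file adds nothing to .git/objects
git add a.txt
n_add=$(count)       # git add writes one new blob
git commit -q -m "update a.txt"
n_commit=$(count)    # git commit writes a new tree and a new commit
after=$(git rev-parse HEAD)

echo "$n_edit $n_add $n_commit"
cat .git/HEAD        # still the same symbolic ref; only the branch file moved
```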
Let's delve into branching. Create a new branch, switch to it, and observe the changes in the .git directory. You'll discover that HEAD now points to the new branch. Files related to this branch are added under refs and logs.
Next, perform a series of operations that are common in practice. Add a new file, c.txt, to the branch, then execute git add and git commit. This process updates the index, adds new objects, and moves the branch that HEAD points to. Switch back to the master branch, add d.txt, and commit, resulting in similar modifications to the .git files.
Executing git merge new-branch introduces new elements: one directory and two files in objects. These are a merge commit, which has two parents, and a new tree object that includes hashes for four files. Note that merging creates a new commit and moves master, and therefore HEAD, to it.
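A short sketch of what changes when a branch is created and checked out. It assumes default file-based ref storage with reflogs enabled, which is the usual state of a non-bare repository; the branch name new-branch matches the one used in the merge step.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the initial branch master
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false

echo "hello" > a.txt
git add a.txt
git commit -q -m "first commit"

git checkout -q -b new-branch      # create the branch and switch to it

cat .git/HEAD                      # now names refs/heads/new-branch
cat .git/refs/heads/new-branch     # same commit as master for now
ls .git/logs/refs/heads            # one reflog file per branch
```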
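The merge can be reproduced with the sketch below. It assumes master started with two files, a.txt and b.txt, so that the merged tree holds four entries; those names and the commit messages are illustrative.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the initial branch master
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false

echo a > a.txt; echo b > b.txt
git add a.txt b.txt
git commit -q -m "add a.txt and b.txt"

git checkout -q -b new-branch
echo c > c.txt; git add c.txt; git commit -q -m "add c.txt"

git checkout -q master
echo d > d.txt; git add d.txt; git commit -q -m "add d.txt"

git merge -q -m "merge new-branch" new-branch

git cat-file -p HEAD               # the merge commit lists two parent lines
git cat-file -p 'HEAD^{tree}'      # the merged tree lists four files

parents=$(git cat-file -p HEAD | grep -c '^parent ')
files=$(git ls-tree --name-only HEAD | wc -l | tr -d ' ')
```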
Changes Caused by Rebase
Now, let’s examine how Git manages rebase. Create a new branch named new-branch-2, add a file e.txt, and commit it. Switch back to the master branch, create a file f.txt, and commit it.
Note the files currently in .git/objects, then execute git rebase. Upon reviewing objects again, you will see a new file and a new directory for the rewritten commit from the new-branch-2 branch, while master still points to the commit that added f.txt.
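The text does not say which branch is rebased onto which, so the sketch below makes an assumption: new-branch-2 is checked out and rebased onto master. Under that assumption the e.txt commit is recreated with a new hash, while master does not move.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the initial branch master
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false

echo a > a.txt; git add a.txt; git commit -q -m "add a.txt"

git checkout -q -b new-branch-2
echo e > e.txt; git add e.txt; git commit -q -m "add e.txt"
old=$(git rev-parse new-branch-2)      # e.txt commit before the rebase

git checkout -q master
echo f > f.txt; git add f.txt; git commit -q -m "add f.txt"
master_before=$(git rev-parse master)

git checkout -q new-branch-2
git rebase -q master                   # replay the e.txt commit on top of master
new=$(git rev-parse new-branch-2)      # e.txt commit after the rebase

echo "old: $old"
echo "new: $new"
git log --oneline --graph              # linear history, no merge commit
```

The old commit object is still present in .git/objects after the rebase; it is simply no longer referenced by the branch.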
Merge vs. Rebase
At this point, it’s easier to differentiate between merge and rebase operations. Here’s a comparison:
- Merge: Creates a new commit with two parents that joins the two histories, and moves the current branch to it.
- Rebase: Replays the current branch's commits on top of another branch, creating new commit objects with new hashes. No merge commit is created, and the branch being rebased onto is left untouched.
- Both operations halt if conflicts arise. After resolving conflicts, a merge is completed with a new commit, whereas a rebase resumes with git rebase --continue.
Choosing Between Merge and Rebase
The choice between merge and rebase depends on the team's Git strategy. A merge history records what actually happened, including where branches diverged and where they were joined. Rebase instead replays changes on top of the current tip of another branch, producing a linear history that is easier to read. Teams favoring a clean history often prefer rebase, while those working with multiple branches may opt for merge to maintain an accurate historical record.
Conclusion
Tools are crafted to enhance our tasks, and they possess unique "personalities" as they are created by people. Git is a streamlined and sophisticated tool; understanding its underlying mechanics can simplify its complex commands.
As a final note, refrain from altering the content within .git, unless you’re working in hooks. If you must, ensure you back up first!
Thank you for reading!