Git gc

Git gc Overview

The gc in git gc stands for 'garbage collection'. It is Git's repository maintenance command: when you run it, you instruct Git to clean up the clutter that has accumulated in the current repository. The term comes from programming languages with automatic memory management, where the runtime reclaims memory that the running program can no longer reach. git gc applies the same idea to a repository: it removes objects that are no longer reachable from any reference and optimizes how the remaining objects are stored.

Like a program's heap, a Git repository accumulates garbage over time. Most of it consists of orphaned commits, that is, commits that are no longer reachable from any branch or tag. Orphaned commits typically appear when you rewrite history with commands such as git rebase or git reset. Git does not delete them immediately. They stay in the object database, where you can still find them through git reflog and recover them with git checkout or git cherry-pick, until git gc eventually prunes them.
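As a minimal sketch of how an orphaned commit comes about, the following script builds a throwaway repository, discards a commit with git reset, and then checks that the commit object is still stored. The temporary directory, file name, and user identity are placeholders chosen for this illustration.

```shell
#!/bin/sh
# Sketch: a commit orphaned by "git reset" is still in the object database.
# The directory, file name, and identity are placeholders for illustration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"

echo two >> file.txt
git commit -q -am "second"
orphan=$(git rev-parse HEAD)     # remember the commit we are about to orphan

git reset -q --hard HEAD~1       # the branch no longer points at "second"

# The object still exists, and the reflog still lists it.
orphan_type=$(git cat-file -t "$orphan")
echo "object $orphan is still a $orphan_type"
git reflog
```

From here, git cherry-pick "$orphan" or git checkout "$orphan" would bring the discarded work back.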

The other task of git gc is compression. It identifies redundant or similar objects in the repository and compresses them together into a 'pack', which frees up disk space. Packs are not garbage: they are compressed archives of Git objects, comparable to zip files, and they reside in the .git/objects/pack directory of the repository.
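The effect of packing can be observed with git count-objects, which reports the number of loose objects and packs. The script below is a small sketch that uses a throwaway repository; the file name, commit messages, and identity are placeholders.

```shell
#!/bin/sh
# Sketch: compare loose-object and pack counts before and after "git gc".
# The directory, file name, and identity are placeholders for illustration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Create a handful of commits; each one adds loose objects.
for i in 1 2 3 4 5; do
    echo "revision $i" >> notes.txt
    git add notes.txt
    git commit -q -m "revision $i"
done

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc -q                        # pack reachable objects, drop loose copies

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')

echo "loose objects before: $loose_before, after: $loose_after"
echo "packs after: $packs_after"
ls .git/objects/pack             # the .pack and .idx files live here
```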

Garbage Collection options

$ git gc --aggressive

Usually, git gc runs very quickly while still providing good disk space utilization and performance. The --aggressive option makes git gc optimize the repository more thoroughly at the expense of taking much more time. The effects of this optimization are mostly persistent, so the option only needs to be used occasionally.

$ git gc --auto

With this option, git gc first checks whether any housekeeping is required. If not, it exits without doing any work. Housekeeping is required when the repository contains more loose objects than the gc.auto setting allows (6700 by default) or more packs than the gc.autoPackLimit setting allows (50 by default). Some Git commands run git gc --auto themselves after operations that may create many loose objects.
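The two thresholds are ordinary configuration values, so they can be set per repository with git config. The sketch below uses a throwaway repository, and the values 100 and 10 are arbitrary choices for illustration, not recommendations.

```shell
#!/bin/sh
# Sketch: tune the automatic housekeeping thresholds for one repository.
# The directory and the threshold values are placeholders for illustration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

git config gc.auto 100           # loose-object threshold
git config gc.autoPackLimit 10   # pack-count threshold

# Nothing needs doing in a fresh repository, so this exits without work.
git gc --auto
auto_status=$?

echo "gc.auto=$(git config --get gc.auto)"
echo "gc.autoPackLimit=$(git config --get gc.autoPackLimit)"
echo "git gc --auto exit status: $auto_status"
```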

$ git gc --prune=<date>

This option is related to the git prune command, which git gc runs internally. It prunes loose objects older than the given date. The default is two weeks ago, which can be overridden with the gc.pruneExpire configuration variable. --prune=now prunes loose objects regardless of their age, which increases the risk of corruption if another process is writing to the repository concurrently.
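The following sketch shows --prune=now removing an orphaned commit. It runs in a throwaway repository because the operation is destructive; the branch name scratch, the file name, and the identity are placeholders. The reflog is expired first, since reflog entries would otherwise keep the commit reachable.

```shell
#!/bin/sh
# Sketch: permanently remove an orphaned commit with "git gc --prune=now".
# Run only in a disposable repository; all names here are placeholders.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)

# Make a commit on a side branch, then delete the branch to orphan it.
git checkout -q -b scratch
echo draft >> file.txt
git commit -q -am "draft"
orphan=$(git rev-parse HEAD)
git checkout -q "$main_branch"
git branch -q -D scratch

# Drop the reflog entries that still reference the commit, then prune.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$orphan" 2>/dev/null; then
    orphan_state=present
else
    orphan_state=pruned
fi
echo "orphaned commit is $orphan_state"
```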

$ git gc --no-prune

This option tells git gc not to prune any loose objects.

$ git gc --quiet

This option suppresses all progress reports.

$ git gc --force

This option forces git gc to run even if another git gc instance may already be running on the repository. It does not stop the other instance.

$ git gc --keep-largest-pack

As discussed above, packing consolidates objects into pack files. With this option, all packs except the largest one and any packs marked with a .keep file are consolidated into a single pack. The gc.bigPackThreshold setting is ignored when this option is used.

Insights

git gc tries very hard not to delete objects that are referenced anywhere in the repository. In particular, it keeps not only the objects referenced by your current branches and tags, but also those referenced by the index, by remote-tracking branches, and by reflogs, which may point to commits on branches that were later rewound or amended. Keep in mind that a note attached to an object with git notes does not keep that object alive. If you expect some objects to be deleted and they are not, check all of these locations and decide whether it makes sense in your case to remove those references.
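One way to see which references are keeping an object alive is git fsck --unreachable, with and without the --no-reflogs option. The sketch below uses a throwaway repository; the branch name scratch, the file name, and the identity are placeholders.

```shell
#!/bin/sh
# Sketch: a commit on a deleted branch is kept alive only by the reflog.
# The directory, branch, file name, and identity are placeholders.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)

git checkout -q -b scratch
echo draft >> file.txt
git commit -q -am "draft"
orphan=$(git rev-parse HEAD)
git checkout -q "$main_branch"
git branch -q -D scratch

# Counting reflogs as references: the commit is not reported.
with_reflog=$(git fsck --unreachable 2>/dev/null | grep -c "$orphan")

# Ignoring reflogs: the commit shows up as unreachable.
without_reflog=$(git fsck --unreachable --no-reflogs 2>/dev/null | grep -c "$orphan")

echo "reported with reflogs considered: $with_reflog"
echo "reported with reflogs ignored:    $without_reflog"
```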

On the other hand, when git gc runs concurrently with another process, there is a risk that it deletes an object which the other process is using but has not yet created a reference to. This may simply cause the other process to fail, or it may corrupt the repository if the other process later adds a reference to the deleted object. Git has two features that significantly mitigate this problem.

The two features are as follows:

  1. Git keeps any object whose modification time is newer than the --prune date, along with everything reachable from it.
  2. Most operations that add an object to the database update the modification time of that object if it is already present, so that the first rule applies to it.

However, these features fall short of a complete solution. Users who run commands concurrently have to live with some risk of corruption, although that risk seems to be low in practice.