Git repositories always get bigger. I noticed that one of my GitHub repositories was above 500 MB, and I wondered how I could make it smaller. First, let's see the size.
git count-objects -v
count: 0
size: 0
in-pack: 19644
packs: 1
size-pack: 222397
prune-packable: 0
garbage: 0
size-garbage: 0
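To follow the size-pack value while cleaning, a small helper can pull it out of that output and print it in MiB. This is only a sketch: pack_size_mib is a made-up name, and it assumes awk is installed.

```shell
# Read the output of `git count-objects -v` on stdin and print
# the size-pack field (reported by git in KiB) converted to MiB.
pack_size_mib() {
    awk '/^size-pack:/ { printf "%.1f\n", $2 / 1024 }'
}

# Usage, from inside a repository:
#   git count-objects -v | pack_size_mib
# With the output shown above (size-pack: 222397) this prints 217.2
```

If you only want to read the numbers once, git count-objects -vH prints the same fields with human-readable units.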
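The size is size-pack, given in KiB: about 217 MiB here. To clean, the first option is to rewrite the history, that is, to drop old commits and commit only the current state of the content. One solution is to reset the branch by N commits and force push (see Reduce repository size). Note that this drops the N most recent commits, so it only helps when those commits are the ones that added the large files.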
git log -n N              # Show the last N commits
git reset --hard HEAD~N   # Move the branch back by N commits
git push --force          # Overwrite the remote branch
If this does not work, another strategy is to create an empty branch, commit everything as the first commit, delete the master branch, replace it with the new one, and clean unused files (see Make the current commit the only (initial) commit in a Git repository?).
git checkout --orphan newBranch       # Create a branch with no history
git add -A                            # Add all files...
git commit                            # ...and commit them
git branch -D master                  # Delete the master branch
git branch -m master                  # Rename the current branch to master
git push -f origin master             # Force push the master branch to GitHub
git reflog expire --expire=now --all  # Drop reflog entries that still point to the old commits
git gc --aggressive --prune=all       # Remove the old files
A third option is to remove files that were added to the repository and later deleted. To do that, follow the steps described in Removing sensitive data from a repository. That leaves the problem of finding files you can remove; see git_dataframes.ipynb for that. I tried it on my own repo: I added and removed a file log.txt.
I then ran:
git filter-branch --force --index-filter "git rm --cached --ignore-unmatch log.txt" --prune-empty --tag-name-filter cat -- --all
It displayed the following message:
I finally typed:
git push origin --force --all
And the file's content disappeared from the commits.
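Coming back to the problem of finding files worth removing: besides the notebook, one way is to list the largest blobs in the whole history with git itself. The sketch below is not taken from the notebook; largest_blobs is a made-up helper name, and it assumes awk, sort and head are available.

```shell
# Read lines of the form "<size> <type> <path>" on stdin and print
# the N largest blobs (default 10), biggest first.
largest_blobs() {
    awk '$2 == "blob"' | sort -rn | head -n "${1:-10}"
}

# Usage, from inside a repository:
#   git rev-list --objects --all |
#     git cat-file --batch-check='%(objectsize) %(objecttype) %(rest)' |
#     largest_blobs 10
# git rev-list lists every object reachable from any ref, and
# git cat-file reports the size and type of each one.
```

A file that shows up near the top of this list but no longer exists in the working tree is a good candidate for the git filter-branch command above.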