Ignoring Things
Last updated on 2024-03-05
Estimated time: 5 minutes
Overview
Questions
- How can I tell Git to ignore files I don’t want to track?
Objectives
- Configure Git to ignore specific files.
- Explain why ignoring files can be useful.
What not to put under Version Control?
In the section “What Not to Put Under Version Control” of Good Enough Practices in Scientific Computing (G. Wilson et al., 2017), the authors emphasize three points, among others:
- Version control systems are optimized for plain-text files;
- Avoid raw data, since it should not change;
- Avoid intermediate files or results that can be re-generated from raw data and software.
Ignore files
What if we have files that we do not want Git to track for us, like backup files created by our editor or intermediate files created during data analysis? For example, .csv files exported with the readr::write_csv() function or .png files saved with ggplot2::ggsave(). Let’s create a few dummy files in the Terminal and see what Git says with git status:
OUTPUT
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
a.csv
b.csv
c.csv
figures/
nothing added to commit but untracked files present (use "git add" to track)
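The commands that produced this status are not shown here. The following is a minimal sketch that reproduces it in a throwaway repository, so the real project is left untouched. The file name figures/plot.png is a placeholder and not taken from the lesson; some file is needed because Git does not report empty directories.

```shell
# Scratch repository in a temporary directory.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Dummy data files, named as in the status output above.
touch a.csv b.csv c.csv

# figures/ needs at least one file to show up; plot.png is a placeholder name.
mkdir figures
touch figures/plot.png

# Short-format status: untracked entries are prefixed with ??
git status --short
```

In the short format, each untracked file or folder appears on its own line, which makes the list easy to scan.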
Putting these files under version control would be a waste of disk space. What’s worse, having them all listed could distract us from changes that actually matter, so let’s tell Git to ignore them.
We do this by creating a file called .gitignore in the root directory of our project. In the console, let’s use:
R
usethis::edit_file(".gitignore")
Write these patterns and save the file:
OUTPUT
*.csv
figures/
These patterns tell Git to ignore any file whose name ends in .csv and everything in the figures directory. (If any of these files were already being tracked, Git would continue to track them.)
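One way to see which pattern applies to a given path is git check-ignore. The sketch below runs in a scratch repository; notes.txt and figures/plot.png are placeholder names used only for illustration.

```shell
# Scratch repository with the two patterns from above.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
printf '*.csv\nfigures/\n' > .gitignore

mkdir figures
touch a.csv notes.txt figures/plot.png

# -v reports the .gitignore line that matched each ignored path.
git check-ignore -v a.csv figures/plot.png

# notes.txt matches no pattern, so check-ignore exits with a non-zero status.
git check-ignore -q notes.txt || echo "notes.txt is not ignored"
```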
Once we have created this file, the output of git status is much cleaner. In the terminal:
OUTPUT
On branch main
Untracked files:
(use "git add <file>..." to include in what will be committed)
.gitignore
nothing added to commit but untracked files present (use "git add" to track)
The only thing Git notices now is the newly created .gitignore file. You might think we wouldn’t want to track it, but everyone we’re sharing our repository with will probably want to ignore the same things that we’re ignoring. Let’s add and commit .gitignore, then check the status again:
OUTPUT
On branch main
nothing to commit, working tree clean
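The add and commit commands themselves are not shown here. A possible sequence, sketched in a scratch repository, is given below; the commit message and the identity passed with -c are placeholders, not taken from the lesson.

```shell
# Scratch repository with an ignored file and a .gitignore.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
printf '*.csv\nfigures/\n' > .gitignore
touch a.csv

# Stage and commit only the .gitignore file.
git add .gitignore
git -c user.name="Example User" -c user.email="user@example.com" \
    commit --quiet -m "Ignore csv files and the figures folder"

# Nothing is left to report: a.csv is ignored and .gitignore is committed.
git status
```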
As a bonus, using .gitignore helps us avoid accidentally adding files to the repository that we don’t want to track. Trying git add a.csv, for example, gives:
OUTPUT
The following paths are ignored by one of your .gitignore files:
a.csv
Use -f if you really want to add them.
If we really want to override our ignore settings, we can use git add -f to force Git to add something, for example git add -f a.csv. We can also always see the status of ignored files with git status --ignored:
OUTPUT
On branch main
Ignored files:
(use "git add -f <file>..." to include in what will be committed)
a.csv
b.csv
c.csv
figures/
nothing to commit, working tree clean
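The refusal, the forced add, and the listing of ignored files can be tried together in a scratch repository. This is a sketch under the same patterns as above, not a step to run in the real project.

```shell
# Scratch repository with the patterns from above and two dummy files.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
printf '*.csv\nfigures/\n' > .gitignore
touch a.csv b.csv

# A plain add is refused because a.csv matches *.csv.
git add a.csv || echo "git add refused a.csv"

# --ignored also lists ignored files; the short format marks them with !!
git status --short --ignored

# -f overrides the ignore rule for this one file only.
git add -f a.csv
git status --short --ignored
```

After the forced add, a.csv is staged while b.csv is still reported as ignored.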
Good Practice
Good practice for a version-controlled project:
- Use version control mostly for plain-text files.
- Avoid raw data.
- Avoid intermediate files or results.
Your turn!
Take 5 minutes to work through the following two challenges:
- Ignoring Nested Files
- Including Specific Files
If you want to keep practicing, move to the last one:
- Home challenges
Live challenges
Ignoring Nested Files
If you only want to ignore the contents of outputs/templates, you can change your .gitignore to ignore only the templates subfolder by adding the following line:
OUTPUT
outputs/templates/
This line ensures that only the contents of outputs/templates are ignored, and not the contents of outputs/paper.
As with most programming issues, there are a few alternative ways that one may ensure this ignore rule is followed. The “Ignoring Nested Files: Variation” exercise has a slightly different directory structure that presents an alternative solution. Further, the discussion page has more detail on ignore rules.
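To check the rule, a scratch repository with the two subfolders can be queried with git check-ignore. The file names slides.txt and draft.txt are placeholders used only for this sketch.

```shell
# Scratch repository with the two subfolders from the exercise.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
mkdir -p outputs/templates outputs/paper
touch outputs/templates/slides.txt outputs/paper/draft.txt   # placeholder names

echo 'outputs/templates/' > .gitignore

# Lists only the paths that are ignored.
git check-ignore outputs/templates/slides.txt outputs/paper/draft.txt
```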
Including Specific Files
How would you ignore all .csv files in your root directory except for final.csv?
Find out what ! (the exclamation point operator) does for gitignore in the Git Reference documentation.
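You would add the following two patterns to your .gitignore. Git only treats a line as a comment when it starts with #, so each comment goes on its own line:
OUTPUT
# ignore all data files
*.csv
# except final.csv
!final.csv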
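The exclamation point operator will include a previously excluded entry.
Note also that .csv files you have already committed in this lesson will not be ignored by this new rule, because Git keeps tracking files that are already tracked. Only .csv files added in the future will be ignored. A pattern without a slash, such as *.csv, matches at every directory level; to restrict the rule to the root directory, write /*.csv instead.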
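The exception can be checked in a scratch repository, as sketched below. The comments in the generated .gitignore sit on their own lines, since Git only recognizes a comment when the line starts with #.

```shell
# Scratch repository with one ordinary and one excepted .csv file.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
touch a.csv final.csv

# Ignore all .csv files, then re-include final.csv.
printf '# ignore all data files\n*.csv\n# except final.csv\n!final.csv\n' > .gitignore

# Without -v, check-ignore lists only the paths that end up ignored.
git check-ignore a.csv final.csv
```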
Home challenges
Ignoring Nested Files: Variation
Given a directory structure similar to the one in the earlier Nested Files exercise, but slightly different:
How would you ignore all of the contents in the outputs folder, but not outputs/paper?
Hint: think a bit about how you created an exception with the ! operator before.
If you want to ignore the contents of outputs/ but not those of outputs/paper/, you can change your .gitignore to ignore the contents of the outputs folder and create an exception for the outputs/paper subfolder. Comments must sit on their own lines, because Git only treats a line as a comment when it starts with #. Your .gitignore would look like this:
OUTPUT
# ignore everything in outputs folder
outputs/*
# do not ignore outputs/paper/ contents
!outputs/paper/
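A quick check of this pair of rules, sketched in a scratch repository. The file names results.csv and draft.txt are placeholders, since the exercise does not list the folder contents.

```shell
# Scratch repository: one file directly in outputs/, one in outputs/paper/.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
mkdir -p outputs/paper
touch outputs/results.csv outputs/paper/draft.txt   # placeholder names

# Ignore every entry inside outputs/, then re-include the paper subfolder.
printf 'outputs/*\n!outputs/paper/\n' > .gitignore

# Lists only the paths that end up ignored.
git check-ignore outputs/results.csv outputs/paper/draft.txt
```

The rule outputs/* is used rather than outputs/ because Git cannot re-include a file whose parent directory is itself excluded.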
Ignoring all data Files in a Directory
Assuming you have an empty .gitignore file, and given a directory structure that looks like:
BASH
data/household/position/gps/a.csv
data/household/position/gps/b.csv
data/household/position/gps/c.csv
data/household/position/gps/info.txt
data/survey
What’s the shortest .gitignore rule you could write to ignore all .csv files in data/household/position/gps? Do not ignore the info.txt.
Appending data/household/position/gps/*.csv will match every file in data/household/position/gps that ends with .csv. The file data/household/position/gps/info.txt will not be ignored.
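The rule can be tried against the directory structure from the exercise, rebuilt here in a scratch repository. The shell variable gps is only a shorthand introduced for this sketch.

```shell
# Scratch repository with the files listed in the exercise.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
gps=data/household/position/gps
mkdir -p "$gps" data/survey
touch "$gps/a.csv" "$gps/b.csv" "$gps/c.csv" "$gps/info.txt"

echo 'data/household/position/gps/*.csv' > .gitignore

# Lists the three .csv files; info.txt is not ignored and is left out.
git check-ignore "$gps/a.csv" "$gps/b.csv" "$gps/c.csv" "$gps/info.txt"
```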
Ignoring all CSV data Files in the repository
Let us assume you have many .csv files in different subdirectories of your repository. For example, you might have:
How do you ignore all the .csv files, without explicitly listing the names of the corresponding folders?
In the .gitignore file, write:
OUTPUT
**/*.csv
This will ignore all the .csv files, regardless of their position in the directory tree. You can still include specific exceptions with the exclamation point operator.
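A sketch of this pattern at work in a scratch repository follows. The folder and file names (results/run1, data/raw, top.csv, out.csv, readme.txt) are invented for illustration.

```shell
# Scratch repository with .csv files at different depths.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
mkdir -p results/run1 data/raw
touch top.csv results/run1/out.csv data/raw/readme.txt   # placeholder names

echo '**/*.csv' > .gitignore

# Lists both .csv files, at the root and in the nested folder; readme.txt is left out.
git check-ignore top.csv results/run1/out.csv data/raw/readme.txt
```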
The order of rules matters as well. The ! modifier negates an entry from a previously defined ignore pattern. If a .gitignore lists *.csv and then !*.csv, the !*.csv entry negates the earlier *.csv pattern, so none of the .csv files will be ignored and all of them can be tracked.
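The effect of a later !*.csv entry can be checked in a scratch repository. This is a minimal sketch; a.csv is just a dummy file.

```shell
# Scratch repository where the second rule cancels the first.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
printf '*.csv\n!*.csv\n' > .gitignore
touch a.csv

# The last matching pattern wins, so a.csv is not ignored.
git check-ignore -q a.csv || echo "a.csv is not ignored"

# a.csv shows up as an ordinary untracked file.
git status --short
```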
Log Files
You wrote a script that creates many intermediate log files of the form log_01, log_02, log_03, etc. You want to keep them but you do not want to track them through git.
- Write one .gitignore entry that excludes files of the form log_01, log_02, etc.
- Test your “ignore pattern” by creating some dummy files of the form log_01, etc.
- You find that the file log_01 is very important after all; add it to the tracked files without changing the .gitignore again.
- Discuss with your neighbor what other types of files could reside in your directory that you do not want to track and thus would exclude via .gitignore.
- append either log_* or log* as a new entry in your .gitignore
- track log_01 using git add -f log_01
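Both steps of the solution can be tried in a scratch repository, sketched here with the log_* variant of the pattern.

```shell
# Scratch repository with three dummy log files.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
touch log_01 log_02 log_03

echo 'log_*' > .gitignore

# Only .gitignore is reported as untracked; the log files are ignored.
git status --short

# Force-add the one log file that should be tracked after all.
git add -f log_01
git status --short
```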
Key Points
- Avoid the version control of raw data, intermediate files and results that can be re-generated from raw data and software.
- The .gitignore file tells Git what files to ignore.