Instructor Notes
Using a software tool to handle the versions of your project files lets you focus on the more interesting/innovative aspects of your project.
- Version control’s advantages
- It’s easy to set up
- Every copy of a Git repository is a full backup of a project and its history
- A few easy-to-remember commands are all you need for most day-to-day version control tasks
- The GitHub hosting service provides a web-based collaboration service
- Two main concepts
- commit: a recorded set of changes in your project’s files
- repository: the history of all your project’s commits
- Why use GitHub?
- No need for a server: easy to set up
- GitHub’s strong community: your colleagues are probably already there
Overall
Version control might be the most important topic we teach, but Git is definitely the most complicated tool. However, GitHub presently dominates the open software repository landscape, so the time and effort required to teach fundamental Git is justified and worthwhile.
Because of this complexity, we don’t teach novice learners about many interesting topics, such as branching, hashes, and commit objects.
Instead we try to convince them that version control is useful for researchers working in teams or not, because it is
- a better way to “undo” changes,
- a better way to collaborate than mailing files back and forth, and
- a better way to share your code and other scientific work with the world.
Teaching Notes
You can “split” your shell so that recent commands remain in view using this script.
Make sure the network is working before starting this lesson.
Drawings are particularly useful in this lesson: if you have a whiteboard, use it!
Version control is usually not the first subject in a workshop, so ask learners to create a GitHub account at the end of the preceding session. Remind learners that the username and email they use for GitHub (and set up during Git configuration) will be viewable to the public by default. However, there are many reasons why a learner may not want their personal information viewable, and GitHub has resources for keeping an email address private.
If some learners are using Windows, there will inevitably be issues merging files with different line endings. (Even if everyone’s on some flavor of Unix, different editors may or may not add a newline to the last line of a file.) Take a moment to explain these issues, since learners will almost certainly trip over them again. If learners are running into line ending problems, GitHub has a page that helps with troubleshooting. Specifically, the section on refreshing a repository may be helpful if learners need to change the `core.autocrlf` setting after already having made one or more commits.
We don’t use a Git GUI in these notes because we haven’t found one that installs easily and runs reliably on the three major operating systems, and because we want learners to understand what commands are being run. That said, instructors should demo a GUI on their desktop at some point during this lesson and point learners at this page.
Instructors should show learners graphical diff/merge tools like DiffMerge.
When appropriate, explain that we teach Git rather than CVS, Subversion, or Mercurial primarily because of GitHub’s growing popularity: CVS and Subversion are now seen as legacy systems, and Mercurial isn’t nearly as widely used in the sciences right now.
Further resources:
- git-it is a self-paced command-line Git demo, with git-it-electron its GitHub Desktop successor.
- Code School has a free interactive course, Try Git.
- For instructors, the Git parable is useful background reading.
Automated Version Control
Ask, “Who uses ‘undo’ in their editor?” All say “Me”. ‘Undo’ is the simplest form of version control.
Give learners a five-minute overview of what version control does for them before diving into the watch-and-do practicals. Most of them will have tried to co-author papers by emailing files back and forth, or will have biked into the office only to realize that the USB key with last night’s work is still on the kitchen table. Instructors can also make jokes about directories with names like “final version”, “final version revised”, “final version with reviewer three’s corrections”, “really final version”, and, “come on this really has to be the last version” to motivate version control as a better way to collaborate and as a better way to back work up.
Setting Up Git
We suggest instructors and students use `nano` as the text editor for this lesson because:
- it runs on all three major operating systems,
- it runs inside the shell (switching windows can be confusing to students), and
- it has shortcut help at the bottom of the window.
Please point out to students during setup that they can and should use another text editor if they’re already familiar with it.
When setting up Git, be very clear about what learners have to enter: it is common for them to enter the instructor’s details (e.g. email) instead of their own. Check at the end using `git config --list`.
When setting up the default branch name, if learners have a Git version older than 2.28, the default branch name can be changed for the lesson using `git branch -M main` if there are already commits in the repository, or `git checkout -b main` if there are no commits (the repository is completely empty).
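The two renaming cases above can be tried in throwaway repositories. This is a minimal sketch, not part of the lesson: the directory names `empty-repo` and `with-commits` and the demo identity are hypothetical, and `-c init.defaultBranch=master` is used only to imitate the starting point of a Git older than 2.28.

```shell
# Work in a scratch directory so no real project is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Case 1: no commits yet. Create and switch to a branch called main.
git -c init.defaultBranch=master init --quiet empty-repo
cd empty-repo
git checkout -b main
cd "$demo_dir"

# Case 2: the repository already has a commit. Rename the current branch.
git -c init.defaultBranch=master init --quiet with-commits
cd with-commits
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet --allow-empty -m "Initial commit"
git branch -M main

# Both repositories now report main as the current branch.
git -C "$demo_dir/empty-repo" symbolic-ref --short HEAD
git -C "$demo_dir/with-commits" symbolic-ref --short HEAD
```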
Creating a Repository
When you do `git status`, Mac users may see a `.DS_Store` file showing as untracked. This is a file that macOS creates in each directory.
The challenge “Places to create repositories” tries to reinforce the idea that the `.git` folder contains the whole Git repo and that deleting this folder undoes a `git init`. It also gives the learner a way to fix the common mistake of putting unwanted folders (like `Desktop`) under version control.
Instead of removing the `.git` folder directly, you can choose to move it first to a safer directory and remove it from there.
The challenge suggests that it is a bad idea to create a Git repo inside another repo. For more discussion on this topic, please see this issue.
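The move-then-remove approach for an unwanted `.git` folder could look like the sketch below. The directory names `moons` and `trash` are hypothetical stand-ins chosen for this example; in a real session, double-check the path before running `rm -rf`.

```shell
# Scratch area with an accidentally created repository (hypothetical names).
demo_dir="$(mktemp -d)"
mkdir "$demo_dir/moons" "$demo_dir/trash"
cd "$demo_dir/moons"
git init --quiet

# Step 1: move the .git folder somewhere safe instead of deleting it in place.
mv .git "$demo_dir/trash/moons-git"

# Step 2: once the move has been verified, remove it from the safe location.
ls -a "$demo_dir/trash"
rm -rf "$demo_dir/trash/moons-git"
```

Moving first means a mistyped path affects only the holding directory, not the learner’s working files.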
Tracking Changes
It’s important that learners do a full commit cycle by themselves (make changes, `git diff`, `git add`, and `git commit`). The “`bio` repository” challenge does that.
This is a good moment to show a diff with a graphical diff tool. If you skip it because you’re short on time, show it once in GitHub.
One thing that may cause confusion is recovering old versions. If, instead of doing `$ git checkout f22b25e mars.txt`, someone does `$ git checkout f22b25e`, they wind up in the “detached HEAD” state and confusion abounds. It’s then possible to keep on committing, but things like `git push origin main` a bit later will not give easily comprehensible results. It also makes it look like commits can be lost. To “re-attach” HEAD, use `git checkout main`.
This is a good moment to show a log within a Git GUI. If you skip it because you’re short on time, show it once in GitHub.
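To rehearse the detached HEAD situation before class, the difference between the two `git checkout` forms can be reproduced in a scratch repository. This is only a sketch: the file contents, commit messages, and demo identity are made up, and the commit ID is read from the repository rather than being the `f22b25e` used in the lesson.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git -c init.defaultBranch=master init --quiet planets
cd planets
git config user.name "Demo User"
git config user.email "demo@example.com"
git checkout --quiet -b main

# Two commits so there is an older version to go back to.
echo "first line" > mars.txt
git add mars.txt
git commit --quiet -m "Start notes on Mars"
first_commit="$(git rev-parse --short HEAD)"
echo "second line" >> mars.txt
git commit --quiet -am "Add a second line"

# Form 1: commit ID plus file name restores the file; HEAD stays on main.
git checkout "$first_commit" mars.txt
git symbolic-ref --short HEAD          # prints: main
git checkout HEAD mars.txt             # put the latest version back

# Form 2: commit ID alone detaches HEAD.
git checkout --quiet "$first_commit"
git symbolic-ref --short -q HEAD || echo "detached HEAD"

# Re-attach HEAD.
git checkout --quiet main
git symbolic-ref --short HEAD          # prints: main
```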
Ignoring Things
Just remember that `.gitignore` accepts wildcards and other glob patterns (not full regular expressions) to ignore a particular set of files.
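A small illustration of ignore patterns, assuming a scratch directory with made-up file names (`a.dat`, `b.dat`, `results/a.out`, `notes.txt`); the patterns themselves are ordinary glob patterns.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
mkdir results
touch a.dat b.dat results/a.out notes.txt

# Ignore every file ending in .dat and everything inside results/.
printf '%s\n' '*.dat' 'results/' > .gitignore

# Only .gitignore and notes.txt are reported as untracked.
git status --short

# check-ignore lists which of the given paths are ignored.
git check-ignore a.dat results/a.out
```

`git check-ignore -v <path>` additionally reports which pattern matched, which can help when a learner’s file is unexpectedly ignored.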
Remotes in GitHub
Make it clear that Git and GitHub are not the same thing: Git is an open source version control tool, while GitHub is a company that hosts Git repositories on the web and provides a web interface for interacting with the repos it hosts.
It is very useful to draw a diagram showing the different repositories involved.
When pushing to a remote, the output from Git can vary slightly depending on what learners execute. The lesson displays the output from Git if a learner executes `git push origin main`. However, some learners might use the syntax suggested by GitHub for pushing to a remote with an existing repository, which is `git push -u origin main`. Learners using that syntax will have slightly different output, including the line `Branch main set up to track remote branch main from origin by rebasing.`
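The effect of `-u` can be shown without network access by pushing to a local bare repository. In this sketch the bare repository `remote.git` stands in for GitHub, and the file contents and demo identity are invented; the exact wording of the push output depends on the Git version and configuration.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# A local bare repository plays the role of the hosted remote.
git init --quiet --bare remote.git

git -c init.defaultBranch=master init --quiet planets
cd planets
git config user.name "Demo User"
git config user.email "demo@example.com"
git checkout --quiet -b main
echo "Cold and dry" > mars.txt
git add mars.txt
git commit --quiet -m "Start notes on Mars"
git remote add origin "$demo_dir/remote.git"

# -u records origin/main as the upstream of main, which adds a
# "set up to track" line to the output.
git push -u origin main

# The recorded upstream can be inspected afterwards.
git rev-parse --abbrev-ref 'main@{upstream}'   # prints: origin/main
```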
Collaborating
Decide in advance whether all the learners will work in one shared repository, or whether they will work in pairs (or other small groups) in separate repositories. The former is easier to set up; the latter runs more smoothly.
Role playing between two instructors can be effective when teaching the collaboration and conflict sections of the lesson. One instructor can play the role of the repository owner, while the second instructor can play the role of the collaborator. If it is possible, try to use two projectors so that the computer screens of both instructors can be seen. This makes for a very clear illustration to the students as to who does what.
It is also effective to pair up students during this lesson and assign one member of the pair to take the role of the owner and the other the role of the collaborator. In this setup, challenges can include asking the collaborator to make a change, commit it, and push the change to the remote repository so that the owner can then retrieve it, and vice-versa. The role playing between the instructors can get a bit “dramatic” in the conflicts part of the lesson if the instructors want to inject some humor into the room.
If you don’t have two projectors, have two instructors at the front of the room. Each instructor does their piece of the collaboration demonstration on their own computer and then passes the projector cord back and forth with the other instructor when it’s time for them to do the other part of the collaborative workflow. It takes less than 10 seconds for each switchover, so it doesn’t interrupt the flow of the lesson. And of course it helps to give each of the instructors a different-colored hat, or put different-colored sticky notes on their foreheads.
If you’re the only instructor, the best way to create the collaboration scenario is to clone the two repos on your Desktop, but under different names, e.g., pretending one is your computer at work.
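As a rough sketch of the single-instructor setup, the same repository can be cloned twice under different directory names. Here a local bare repository `planets.git` is an assumption standing in for the GitHub repository, and `planets-home` is a hypothetical name; `planets-at-work` follows the naming used later in these notes.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Stand-in for the hosted repository.
git init --quiet --bare planets.git

# The second argument to git clone sets the directory name of the clone.
# (Git warns that the repository is empty; that is fine for this demo.)
git clone --quiet planets.git planets-home
git clone --quiet planets.git planets-at-work

ls "$demo_dir"
```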
It’s very common for learners to mistype the remote alias or the remote URL when adding a remote, so they cannot `push`. You can diagnose this with `git remote -v` and checking carefully for typos.
- To fix a wrong alias, you can do `git remote rename <old> <new>`.
- To fix a wrong URL, you can do `git remote set-url <alias> <newurl>`.
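Both fixes can be demonstrated in a scratch repository. The mistyped alias `orign` and the mistyped URL ending in `planet.git` are invented for this sketch; no network access is needed because these commands only edit the local configuration.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Simulate a learner mistyping both the alias and the URL.
git remote add orign https://github.com/vlad/planet.git
git remote -v

# Fix the alias, then fix the URL.
git remote rename orign origin
git remote set-url origin https://github.com/vlad/planets.git

# Check the result.
git remote -v
```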
Before cloning the repo, be sure that nobody is inside another repo. The best way to achieve this is moving to the `Desktop` before cloning: `cd && cd Desktop`.
.-
If both repos are in the
Desktop
, have them to clone their collaborator repo under a given directory using a second argument: The most common mistake is that learners
push
beforepull
ing. If theypull
afterward, they may get a conflict.Conflicts, sometimes weird, will start to arise. Stay tight: conflicts are next.
Learners may have slightly different output from `git push` and `git pull` depending on the version of Git, and on whether upstream (`-u`) is used.
Conflicts
Expect the learners to make mistakes. Expect yourself to make mistakes. This happens because it is late in the lesson and everyone is tired.
If you’re the only instructor, the best way to create a conflict is:
- Clone your repo in a different directory, pretending it is your computer at work: `git clone https://github.com/vlad/planets.git planets-at-work`.
- At the office, you make a change, commit and push.
- At your laptop repo, you (forget to pull and) make a change, commit and try to push.
- `git pull` now and show the conflict.
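The steps above can be rehearsed offline. This sketch assumes a local bare repository `planets.git` in place of the GitHub remote, a hypothetical `planets-laptop` directory for the laptop repo, and invented file contents and identity. `--no-rebase` is passed to `git pull` so that a merge is attempted regardless of how pulling is configured by default.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Stand-in for the hosted repository, with main as its default branch.
git init --quiet --bare planets.git
git --git-dir=planets.git symbolic-ref HEAD refs/heads/main

# The "laptop" repository with one pushed commit.
git -c init.defaultBranch=master init --quiet planets-laptop
cd planets-laptop
git config user.name "Demo User"
git config user.email "demo@example.com"
git checkout --quiet -b main
echo "Cold and dry" > mars.txt
git add mars.txt
git commit --quiet -m "Start notes on Mars"
git remote add origin "$demo_dir/planets.git"
git push --quiet origin main

# 1. Clone into a second directory, pretending it is the work computer.
cd "$demo_dir"
git clone --quiet planets.git planets-at-work

# 2. At the office: change, commit, push.
cd planets-at-work
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Note written at work" >> mars.txt
git commit --quiet -am "Add a note from work"
git push --quiet origin main

# 3. On the laptop: forget to pull, change the same line, commit, try to push.
cd "$demo_dir/planets-laptop"
echo "Note written on the laptop" >> mars.txt
git commit --quiet -am "Add a note from the laptop"
git push origin main || echo "push rejected, as expected"

# 4. Pull now and show the conflict.
git pull --no-rebase origin main || echo "merge conflict, as expected"
git status --short                     # prints: UU mars.txt
```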
Learners usually forget to `git add` the file after fixing the conflict and just (try to) commit. You can diagnose this with `git status`.
.-
Remember that you can discard one of the two parents of the merge:
- discard the remote file,
git checkout --ours conflicted_file.txt
- discard the local file,
git checkout --theirs conflicted_file.txt
You still have to
git add
andgit commit
after this. This is particularly useful when working with binary files. - discard the remote file,
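A compact way to try `--ours` is sketched below. To keep it short, the collaborator’s change is simulated with a local branch named `remote-side` (an assumption made only for this demo; the lesson itself produces the conflict through a remote). File contents and identity are invented.

```shell
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git -c init.defaultBranch=master init --quiet
git config user.name "Demo User"
git config user.email "demo@example.com"
git checkout --quiet -b main
echo "base" > conflicted_file.txt
git add conflicted_file.txt
git commit --quiet -m "Base version"

# Stand-in for the change coming from the remote.
git checkout --quiet -b remote-side
echo "remote change" > conflicted_file.txt
git commit --quiet -am "Remote change"

# The local change, made on main.
git checkout --quiet main
echo "local change" > conflicted_file.txt
git commit --quiet -am "Local change"

# Merging produces a conflict in conflicted_file.txt.
git merge remote-side || echo "conflict, as expected"

# Discard the remote file and keep the local one, then add and commit.
git checkout --ours conflicted_file.txt
git add conflicted_file.txt
git commit --quiet -m "Resolve conflict by keeping the local version"
cat conflicted_file.txt                # prints: local change
```

Replacing `--ours` with `--theirs` in the same sequence keeps the incoming version instead.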
Keep in mind that depending on the Git version used, the outputs for `git push` and `git pull` can vary slightly.
Open Science
Licensing
We teach about licensing because questions about who owns what, or can use what, arise naturally once we start talking about using public services like GitHub to store files. Also, the discussion gives learners a chance to catch their breath after what is often a frustrating couple of hours.
The Creative Commons family of licenses is recommended for many types of works (including software documentation and images used in software) but not software itself. Creative Commons recommends a software-specific license instead.
Citation
Hosting
A common concern for learners is having their work publicly available on GitHub. While we encourage open science, sometimes private repos are the only choice. It’s always interesting to mention the options to have web-hosted private repositories.
Automated Version Control
Instructor Note
We can use the comments or live participation as an ice-breaker.
The Long History of Version Control Systems
Automated version control systems are nothing new. Tools like RCS, CVS, or Subversion have been around since the early 1980s and are used by many large companies. However, many of these are now considered legacy systems (i.e., outdated) due to various limitations in their capabilities. More modern systems, such as Git and Mercurial, are distributed, meaning that they do not need a centralized server to host the repository. These modern systems also include powerful merging tools that make it possible for multiple authors to work on the same files concurrently.
Concept map
Instructor Note
This exercise can be solved in the shared document of the training.
Setting Up Git
Default Git branch naming
Source file changes are associated with a “branch.” For new learners in this lesson, it’s enough to know that branches exist, and this lesson uses one branch. By default, Git will create a branch called `master` when you create a new repository with `git init` (as explained in the next Episode). This term evokes the racist practice of human slavery and the software development community has moved to adopt more inclusive language.
In 2020, most Git code hosting services transitioned to using `main` as the default branch. As an example, any new repository that is opened in GitHub and GitLab defaults to `main`. However, Git has not yet made the same change. As a result, local repositories must be manually configured to have the same main branch name as most cloud services.
For versions of Git prior to 2.28, the change can be made on an individual repository level. The command for this is in the next episode. Note that if this value is unset in your local Git configuration, the `init.defaultBranch` value defaults to `master`.
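For Git 2.28 or newer, the `init.defaultBranch` value mentioned above can be set once in the global configuration. The sketch below deliberately points `HOME` at a scratch directory, an assumption made only so the demonstration does not alter a real configuration; learners would run the `git config --global` line in their normal environment. The repository name `planets` is illustrative.

```shell
# Demo only: use a scratch HOME so the real global configuration is untouched.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# Set the default branch name for repositories created from now on.
git config --global init.defaultBranch main

# Read the value back.
git config --global init.defaultBranch     # prints: main

# New repositories pick up the setting on Git 2.28 or newer;
# older versions ignore it and still start on master.
git init --quiet "$HOME/planets"
git -C "$HOME/planets" symbolic-ref --short HEAD
```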
Instructor Note
When using `usethis::git_sitrep()`, check that there is no `✖ ...` line in the output with an error message.
Some common errors are solved here:
Creating a Repository
Tracking Changes
Definition: Staging area
The staging area is the middle ground between what you have done to your files (also known as the working directory) and what you had last committed (the HEAD commit). As the name implies, the staging area gives you space to prepare (stage) the changes that will be reflected on the next commit. (Coderefinery, 2023)