Summary and Schedule
This is an Epiverse-TRACE tutorial built with The Carpentries Workbench.
| Start time | Episode | Questions |
|---|---|---|
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Automated Version Control | What is version control and why should I use it? |
| 00h 25m | 2. Setting Up Git | How do I set up Git? What is a token? What is a commit? What is a repository? What is a branch? |
| 00h 35m | 3. Creating a Repository | Where does Git store information? |
| 01h 00m | 4. Tracking Changes | How do I record changes in Git? How do I check the status of my version control repository? How do I record notes about what changes I made and why? |
| 01h 45m | 5. Ignoring Things | How can I tell Git to ignore files I don’t want to track? |
| 01h 50m | 6. Remotes in GitHub | How do I share my changes with others on the web? |
| 02h 35m | 7. Collaborating | How can I use version control to collaborate with other people? |
| 03h 00m | 8. Conflicts | What do I do when my changes conflict with someone else’s? |
| 03h 15m | 9. Wrap-up | Where is a full view of the concepts covered today? Where can I ask questions after this workshop? Where can I write my feedback on this workshop? |
| 03h 20m | 10. Supplemental: Using RStudio for Git | How can I use Git with RStudio? |
| 03h 30m | 11. Exploring History | How can I identify old versions of files? How do I review my changes? How can I recover old versions of files? |
| 03h 55m | 12. Open Science | How can version control help me make my work more open? |
| 04h 05m | 13. Licensing | What licensing information should I include with my work? |
| 04h 10m | 14. Citation | How can I make my work easier to cite? |
| 04h 12m | 15. Hosting | Where should I host my version control repositories? |
| 04h 22m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Motivation
Reproducible science not only reduces errors but also speeds up re-running your analysis and automatically generating updated documents with the results. Git is an essential part of this process!
The scenario
Wolfman and Dracula have been hired as data analysts by Outbreak Missions (a Rapid Support Team for outbreak response services) to investigate a disease outbreak. They want to be able to work on the development of a reproducible situation report at the same time, but they have run into problems doing this in the past.
- If they take turns, each one will spend a lot of time waiting for the other to finish, but
- if they work on their own copies and email changes back and forth, things will be lost, overwritten, or duplicated.
A colleague suggests using version control to manage their work!
Version control is better than mailing files back and forth:
- Nothing that is saved to version control is ever lost, unless you work really, really hard at it.
- Since all old versions of files are saved, it’s always possible to go back in time to see exactly who wrote what on a particular day, or what version of a program was used to generate a particular set of results.
- When several people collaborate on the same project, it’s possible to accidentally overlook or overwrite someone’s changes. The version control system automatically notifies users whenever there’s a conflict between one person’s work and another’s.
Version control is the lab notebook of the digital world: it’s what professionals use to keep track of what they’ve done and to collaborate with other people. Every large software development project relies on it, and most programmers use it for their small jobs as well. And it isn’t just for software: books, papers, small data sets, and anything that changes over time or needs to be shared can and should be stored in a version control system.
Prerequisites
In this tutorial we use Git from within RStudio. Some previous experience with R and Bash is helpful but not mandatory.
Software Setup
Follow these five steps:
1. Install or upgrade R and RStudio
To install R and RStudio, follow the instructions at https://posit.co/download/rstudio-desktop/.
Already installed? Hold on: This is a great time to make sure your R installation is current.
This tutorial requires R version 4.0.0 or later.
To check if your R version is up to date:
In RStudio, your R version is printed in the console window when R starts. Alternatively, run
sessionInfo()
in the console.
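For a direct check against the requirement above, the following minimal sketch uses base R's R.version.string and getRversion() to print your version and compare it with the 4.0.0 minimum this tutorial requires. The variable name meets_minimum is only illustrative.

```r
# Print the version of R running in this session, e.g. "R version 4.x.y (...)"
print(R.version.string)

# TRUE if this R installation meets the tutorial's minimum (R 4.0.0)
meets_minimum <- getRversion() >= "4.0.0"
print(meets_minimum)
```

If this prints FALSE, update R as described below.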
To update R, download and install the latest version from the R project website for your operating system.
- After installing a new version, you will have to reinstall all your packages with the new version.
- For Windows, the installr package can upgrade your R version and migrate your package library.
To update RStudio, open RStudio and click Help > Check for Updates. If a new version is available, follow the instructions on the screen.
On updates
While this may sound scary, it is far more common to run into issues due to using out-of-date versions of R or R packages. Keeping up with the latest versions of R, RStudio, and any packages you regularly use is a good practice.
2. Install R packages
Open RStudio, copy and paste the following code chunk into the console window, then press Enter (Windows and Linux) or Return (macOS) to execute the command:
R
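# Install pak if it is not already available
if (!requireNamespace("pak", quietly = TRUE)) install.packages("pak")
# Packages used in this tutorial
new <- c("gh",
         "gitcreds",
         "usethis")
# Install or update them with pak
pak::pak(new)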
These installation steps may ask you:
? Do you want to continue (Y/n)
Type Y and press Enter.
You should update all of the packages required for the tutorial, even if you installed them relatively recently. New versions bring improvements and important bug fixes.
When the installation has finished, you can try to load the packages by pasting the following code into the console:
R
library(gh)
library(gitcreds)
library(usethis)
If you do NOT see an error like
there is no package called ‘...’
you are good to go! If you do, contact us!
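As an alternative to loading each package, this minimal sketch checks whether the three packages can be loaded without attaching them, using base R's requireNamespace(). The variable names is_installed and not_installed are only illustrative.

```r
# Packages required for this tutorial
pkgs <- c("gh", "gitcreds", "usethis")

# requireNamespace() returns TRUE if a package's namespace can be loaded, FALSE otherwise
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!is_installed]

if (length(not_installed) == 0) {
  message("All packages are installed: you are good to go!")
} else {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
}
```

Any package listed as missing can be installed again with pak::pak().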
3. Installing Git
Follow the “Happy Git with R” instructions on installing Git for your operating system.
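Once Git is installed, you can confirm that R can find it. This minimal sketch uses base R's Sys.which(), which returns the path to an executable on your PATH (or an empty string if it is not found), and then asks Git for its version with system2(). The usethis package installed above also provides usethis::git_sitrep() for a fuller report.

```r
# Full path to the git executable, or "" if it is not on the PATH
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  # Prints something like "git version 2.x.y"
  system2("git", "--version")
} else {
  message("Git was not found. Revisit the installation instructions.")
}
```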
4. Creating a GitHub Account
You will need a GitHub account to follow the last episodes of this tutorial.
Follow these steps:
- Go to https://github.com and follow the “Sign up” link at the top-right of the window.
- Follow the instructions to create an account.
- Verify your email address with GitHub.
- Configure the Multi-factor Authentication (see below).
Multi-factor Authentication
GitHub requires all accounts to have multi-factor authentication (2FA) configured for extra security. Several options exist for setting up 2FA, which are summarised here:
- If you already use an authenticator app, like Google Authenticator or Duo Mobile on your smartphone for example, add GitHub to that app.
- If you have access to a smartphone but do not already use an authenticator app, install one and add GitHub to the app.
- If you do not have access to a smartphone or do not want to install an authenticator app, you have two options:
The GitHub documentation provides more details about configuring 2FA.
5. Preparing Your Working Directory
We need to work outside any R project. In RStudio, close your project with File > Close Project. You can confirm this in the upper-right corner, which should display Project: (None).
Make sure NOT to work in a folder synced with a file hosting service like OneDrive or Dropbox.
If you have any of these services on your computer, go to the console window and run:
R
getwd()
Verify that your current working directory is outside any synced folder.
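A quick way to spot a synced folder is to look for the service name in the path returned by getwd(). This is only a heuristic sketch: it assumes the sync folder's path contains "OneDrive" or "Dropbox", which is typical for those services but not guaranteed, so still read the printed path yourself. The variable name in_sync_folder is only illustrative.

```r
# Current working directory
wd <- getwd()

# Heuristic (assumption): synced folders usually include the service name in their path
in_sync_folder <- grepl("OneDrive|Dropbox", wd, ignore.case = TRUE)

if (in_sync_folder) {
  message("Your working directory looks like a synced folder: ", wd)
} else {
  message("No OneDrive or Dropbox folder detected in: ", wd)
}
```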
Your Questions
If you need any assistance installing the software or have any other questions about this tutorial, please send an email to andree.valle-campos@lshtm.ac.uk