Automated Version Control
- Version control records the changes you make step by step.
- Git is version control software optimized for plain text files, like .R and .Rmd files.
Setting Up Git
- Use the usethis package to configure a user name, email address, and other preferences once per machine.
- Use usethis::use_git_config() to configure Git in RStudio.
- Use usethis::git_sitrep() to verify your configuration.
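For readers who want to see what this configuration looks like outside R, the sketch below sets and lists the same kind of settings with plain Git in the Terminal. It is only an approximate counterpart of the usethis helpers. The name and email are placeholders, and the scratch HOME is an assumption added so the example cannot overwrite real settings.

```shell
set -eu

# Assumption: point HOME at a scratch folder so this sketch leaves your real
# Git configuration untouched. On your own machine you would skip this line.
export HOME="$(mktemp -d)"

# Placeholder identity; substitute your own name and email address.
git config --global user.name "Jane Doe"
git config --global user.email "jane.doe@example.org"

# List the per-user settings Git has stored.
git config --global --list
```

Because these are per-user settings, they only need to be set once per machine, as the key points above say.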
Creating a Repository
- Use usethis::use_git() to initialize a repository.
- Git stores all of its repository data in the .git directory.
- Use git status in the Terminal to check the status of a repository.
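The points above can be tried out in the Terminal with plain Git. This is a minimal sketch, assuming a hypothetical project folder called my-analysis in a temporary location. It uses git init, the Git command for initializing a repository, and does not reproduce everything usethis::use_git() may do.

```shell
set -eu

# Hypothetical project folder in a temporary location.
project="$(mktemp -d)/my-analysis"
mkdir -p "$project"
cd "$project"

# Initialize the repository; this creates the hidden .git directory.
git init -q

# The repository data lives in .git, next to your own files.
ls -a

# Report the current state of the repository.
git status
```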
Tracking Changes
- git status shows the status of a repository.
- Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).
- git add puts files in the staging area.
- git commit saves the staged content as a new commit in the local repository.
- Write a commit message that accurately describes your changes.
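The path of a file from working directory to staging area to local repository can be sketched as follows. The file name, its content, the commit message, and the identity are illustrative assumptions; the repository is created in a temporary folder.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"                 # placeholder identity
git config user.email "jane.doe@example.org"

# Working directory: create a file that Git does not track yet.
echo 'cases <- read.csv("cases.csv")' > analysis.R
git status --short        # "?? analysis.R" means untracked

# Staging area: add the file to the next commit.
git add analysis.R
git status --short        # "A  analysis.R" means staged

# Local repository: record the staged content with a descriptive message.
git commit -q -m "Add script that reads the case data"
git log --oneline
```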
Ignoring Things
- Avoid putting raw data, intermediate files, and results that can be regenerated from raw data and software under version control.
- The .gitignore file tells Git what files to ignore.
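As an illustration, the sketch below writes a small .gitignore and checks its effect. The folder names data/ and results/ and the *.tmp pattern are assumptions about a hypothetical project layout, not a recommended rule set.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical project layout.
mkdir -p data results
echo "id,value" > data/raw.csv
echo "summary" > results/table.txt
echo 'print("hello")' > analysis.R

# One pattern per line: two folders and any file ending in .tmp.
printf '%s\n' 'data/' 'results/' '*.tmp' > .gitignore

# Only .gitignore and analysis.R are reported as untracked.
git status --short

# Ask Git which rule causes a given file to be ignored.
git check-ignore -v data/raw.csv
```

The .gitignore file itself is normally committed, so that collaborators ignore the same files.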
Remotes in GitHub
- A local Git repository can be connected to one or more remote repositories.
- Use usethis::use_github() to connect to a remote repository.
- git push copies changes from a local repository to a remote repository.
- git pull copies changes from a remote repository to a local repository.
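The push and pull cycle can be rehearsed without a GitHub account. In this sketch a bare repository on disk stands in for the remote; that stand-in, the folder names, and the identity are assumptions made for illustration. With GitHub, the remote address would be the repository URL instead of a local path.

```shell
set -eu

work="$(mktemp -d)"

# Assumption: a bare repository on disk plays the role of the GitHub remote.
git init -q --bare "$work/remote.git"

# A local repository with one commit.
git init -q "$work/local"
cd "$work/local"
git config user.name "Jane Doe"                 # placeholder identity
git config user.email "jane.doe@example.org"
echo "# My analysis" > README.md
git add README.md
git commit -q -m "Add README"

# Connect the local repository to the remote under the usual name "origin".
git remote add origin "$work/remote.git"
git remote -v

# push: copy local commits to the remote.
# -u remembers the remote branch so later pushes and pulls need no arguments.
git push -q -u origin HEAD

# pull: copy remote commits to the local repository (nothing new here).
git pull -q
```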
Collaborating
- git clone copies a remote repository to create a local repository with a remote called origin automatically set up.
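A short sketch of this behaviour follows, assuming a local repository named shared stands in for the project on GitHub. All names are hypothetical.

```shell
set -eu

work="$(mktemp -d)"

# Assumption: a local repository stands in for the shared project on GitHub.
git init -q "$work/shared"
git -C "$work/shared" config user.name "Jane Doe"      # placeholder identity
git -C "$work/shared" config user.email "jane.doe@example.org"
echo "# Shared project" > "$work/shared/README.md"
git -C "$work/shared" add README.md
git -C "$work/shared" commit -q -m "Add README"

# Clone it, as a collaborator would.
git clone -q "$work/shared" "$work/my-copy"

# The clone already has a remote named origin pointing at the source.
git -C "$work/my-copy" remote -v
```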
Conflicts
- Version control also allows many people to work in parallel.
- Conflicts occur when two or more people change the same lines of the same file.
- The version control system does not allow people to overwrite each other’s changes blindly, but highlights conflicts so that they can be resolved.
- Feature branches help to segregate work and minimize conflicts.
- Atomic commits prioritize one unit of change instead of the quantity of changes.
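To see what a conflict looks like in practice, the sketch below changes the same line on a feature branch and on the starting branch, then merges. The file name, values, and branch name are illustrative assumptions, and a single user plays both collaborators.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"                 # placeholder identity
git config user.email "jane.doe@example.org"

echo "threshold <- 10" > params.R
git add params.R
git commit -q -m "Add threshold parameter"
base="$(git symbolic-ref --short HEAD)"         # name of the starting branch

# A feature branch changes the line one way.
git checkout -q -b feature
echo "threshold <- 20" > params.R
git commit -q -am "Raise threshold to 20"

# Meanwhile the starting branch changes the same line another way.
git checkout -q "$base"
echo "threshold <- 5" > params.R
git commit -q -am "Lower threshold to 5"

# Git does not pick a winner; it stops and marks the conflict in the file.
git merge feature || true
cat params.R        # shows the <<<<<<<, =======, >>>>>>> markers

# Resolve by editing the file to the agreed value, then stage and commit.
echo "threshold <- 15" > params.R
git add params.R
git commit -q -m "Merge feature: settle on threshold 15"
```

Each commit here changes one thing only, which is the idea behind atomic commits: a small, single-purpose change is easier to review and to reconcile when a conflict does occur.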
Wrap-up
- Read the tutorial on ‘Improve your code for Epidemic analysis with R’.
- Use GitHub Discussions as our communication forum after the workshop.
- Use the feedback form to share your constructive comments.
Supplemental: Using RStudio for Git
- Using RStudio’s Git integration allows you to version control a project over time.
- Feature branches help you experiment with code without breaking the main project.
Exploring History
- git diff displays differences between commits.
- git checkout recovers old versions of files.
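Both commands can be tried on a two-commit history. This is a minimal sketch with a hypothetical notes.txt file; HEAD~1 refers to the commit just before the latest one.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"                 # placeholder identity
git config user.email "jane.doe@example.org"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"

echo "second draft" > notes.txt
git commit -q -am "Revise notes"

# Compare the previous commit with the latest one.
git diff HEAD~1 HEAD -- notes.txt

# Recover the older version of the file into the working directory.
git checkout HEAD~1 -- notes.txt
cat notes.txt       # prints "first draft"
```

The recovered version is also staged, so committing it would record the return to the old content as a new commit rather than erasing history.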
Open Science
- Open scientific work is more useful and more highly cited than closed.
Licensing
- The LICENSE, LICENSE.md, or LICENSE.txt file is often used in a repository to indicate how the contents of the repo may be used by others.
- People who incorporate General Public License (GPL) software into their own software must also release their software under the GPL; most other open licenses do not require this.
- The Creative Commons family of licenses allows people to mix and match requirements and restrictions on attribution, creation of derivative works, further sharing, and commercialization.
- People who are not lawyers should not try to write licenses from scratch.
Citation
- Add a CITATION file to a repository to explain how you want your work cited.
Hosting
- Projects can be hosted on university servers, on personal domains, or on a public hosting service.
- Rules regarding intellectual property and storage of sensitive information apply no matter where code and data are hosted.