Automated Version Control


  • Version control records the changes you make step by step.
  • Git is version control software optimized for plain-text files, such as .R and .Rmd files.

Setting Up Git


  • Use the usethis package to configure a user name, email address, and other preferences once per machine.
  • Use usethis::use_git_config() to configure Git in RStudio.
  • Use usethis::git_sitrep() to verify your configuration.
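
For readers working in the Terminal rather than in R, similar settings can be sketched with plain Git commands. This is a minimal sketch, not what usethis runs internally: the name and email are placeholder values, and HOME is pointed at a scratch directory (an assumption made so the demo leaves your real settings alone; drop the first two commands to configure your actual machine).

```shell
# Point HOME at a scratch directory so this demo does not touch real settings.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# Configure the identity once per machine (placeholder values).
git config --global user.name "Jane Doe"
git config --global user.email "jane.doe@example.org"

# List the global settings to verify the configuration.
git config --global --list
```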

Creating a Repository


  • Use usethis::use_git() to initialize a repository.
  • Git stores all of its repository data in the .git directory.
  • Use git status in the Terminal to check the status of a repository.
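
The points above can be tried from the Terminal with plain Git. This is a rough command-line sketch, assuming a scratch directory as a hypothetical project location; usethis::use_git() does more than this (for example, it can offer to make an initial commit).

```shell
# Create a scratch project directory (hypothetical location).
repo="$(mktemp -d)/my-project"
mkdir -p "$repo"
cd "$repo"

# Initialize the repository; Git creates the hidden .git directory.
git init -q
ls -a

# Check the status of the new, empty repository.
git status
```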

Tracking Changes


  • git status shows the status of a repository.
  • Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).
  • git add puts files in the staging area.
  • git commit saves the staged content as a new commit in the local repository.
  • Write a commit message that accurately describes your changes.
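
The path a file takes from the working directory through the staging area into the local repository can be sketched as follows. The file name, its content, and the identity values are hypothetical; the identity is set per repository only so the commit succeeds on a machine that has not been configured yet.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
# Repository-local identity (placeholder values).
git config user.name "Jane Doe"
git config user.email "jane.doe@example.org"

# Working directory: create a file; Git reports it as untracked.
echo 'cases <- read.csv("cases.csv")' > analysis.R
git status --short

# Staging area: add the file to the next commit.
git add analysis.R
git status --short

# Local repository: record the staged content with a descriptive message.
git commit -q -m "Add script that reads the case data"
git --no-pager log --oneline
```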

Ignoring Things


  • Avoid version-controlling raw data, intermediate files, and results that can be regenerated from raw data and software.
  • The .gitignore file tells Git what files to ignore.
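
A small illustration of a .gitignore file at work, assuming a hypothetical project layout with a data/ folder for raw data and an outputs/ folder for regenerated results (all names are made up for this sketch):

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical project: a script, a raw data file, and a generated result.
mkdir -p data outputs
echo "id,date" > data/linelist_raw.csv
echo "placeholder" > outputs/figure.png
echo "# analysis" > analysis.R

# Tell Git which paths to ignore, one pattern per line.
printf '%s\n' 'data/' 'outputs/' '.Rhistory' > .gitignore

# Only .gitignore and analysis.R are listed as untracked.
git status --short

# Ask Git which rule causes a given path to be ignored.
git check-ignore -v data/linelist_raw.csv
```

The .gitignore file itself is usually committed so that collaborators share the same rules.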

Remotes in GitHub


  • A local Git repository can be connected to one or more remote repositories.
  • Use usethis::use_github() to connect to a remote repository.
  • git push copies changes from a local repository to a remote repository.
  • git pull copies changes from a remote repository to a local repository.
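
The push and pull steps can be rehearsed offline. In this sketch a bare repository on the local disk stands in for the GitHub remote (an assumption made for illustration; usethis::use_github() creates the repository on GitHub itself, which is not reproduced here). Paths and identity values are hypothetical.

```shell
work="$(mktemp -d)"

# A bare repository plays the role of the remote.
git init -q --bare "$work/remote.git"

# A local repository with one commit.
git init -q "$work/local"
cd "$work/local"
git config user.name "Jane Doe"
git config user.email "jane.doe@example.org"
echo "# Notes" > README.md
git add README.md
git commit -q -m "Add README"

# Connect the local repository to the remote and list the connection.
git remote add origin "$work/remote.git"
git remote -v

# Copy local commits to the remote and remember it as the upstream.
git push -q -u origin HEAD

# Copy any new remote commits back; here there are none yet.
git pull --ff-only
```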

Collaborating


  • git clone copies a remote repository to create a local repository with a remote called origin automatically set up.
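
To see the origin remote appear, the following sketch clones a small repository from the local disk, which stands in for one hosted on GitHub (an assumption for illustration; with GitHub you would pass the repository URL instead of a path).

```shell
work="$(mktemp -d)"

# A small existing repository to copy from (placeholder identity values).
git init -q "$work/shared"
git -C "$work/shared" -c user.name="Jane Doe" -c user.email="jane.doe@example.org" \
    commit -q --allow-empty -m "Initial commit"

# Clone it; the copy gets a remote named origin pointing back at the source.
git clone -q "$work/shared" "$work/copy"
git -C "$work/copy" remote -v
```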

Conflicts


  • Version control allows many people to work in parallel.
  • Conflicts occur when two or more people change the same lines of the same file.
  • The version control system does not allow people to overwrite each other’s changes blindly, but highlights conflicts so that they can be resolved.
  • Feature branches help to segregate work and minimize conflicts.
  • Atomic commits prioritize one unit of change instead of the quantity of changes.
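
A conflict can be reproduced by one person in a scratch repository. In this sketch two branches stand in for two collaborators who change the same line of the same file; the file name and values are hypothetical, and each commit holds a single unit of change.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane.doe@example.org"

# Shared starting point.
echo "threshold <- 10" > params.R
git add params.R
git commit -q -m "Set initial threshold"
base="$(git symbolic-ref --short HEAD)"  # name of the default branch

# One collaborator edits the line on a feature branch.
git checkout -q -b feature
echo "threshold <- 20" > params.R
git commit -q -am "Raise threshold to 20"

# Another collaborator edits the same line on the default branch.
git checkout -q "$base"
echo "threshold <- 5" > params.R
git commit -q -am "Lower threshold to 5"

# The merge stops and marks the conflicting lines instead of overwriting either change.
git merge feature || echo "Merge stopped: edit the file, then git add and git commit."
cat params.R
```

After editing params.R to keep the intended value and removing the conflict markers, git add followed by git commit completes the merge.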

Wrap-up


  • Read the tutorial on ‘Improve your code for Epidemic analysis with R’.
  • Use GitHub Discussions as our communication forum after the workshop.
  • Use the feedback form to share your constructive comments.

Supplemental: Using RStudio for Git


  • Using RStudio’s Git integration allows you to version control a project over time.
  • Feature branches help you experiment with code without breaking the main project.

Exploring History


  • git diff displays differences between commits.
  • git checkout recovers old versions of files.
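
Both commands can be tried on a two-commit scratch repository. The file name and contents below are hypothetical, and HEAD~1 refers to the commit just before the latest one.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane.doe@example.org"

# Two commits that change the same file.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"
echo "second draft" > notes.txt
git commit -q -am "Revise notes"

# Show what changed between the previous commit and the latest one.
git --no-pager diff HEAD~1 HEAD -- notes.txt

# Recover the older version of the file from the previous commit.
git checkout HEAD~1 -- notes.txt
cat notes.txt
```

Note that git checkout with a file path overwrites the working copy of that file and stages the recovered version, so commit or stash any edits you want to keep first.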

Open Science


  • Open scientific work is more useful and more highly cited than closed.

Licensing


  • The LICENSE, LICENSE.md, or LICENSE.txt file is often used in a repository to indicate how the contents of the repo may be used by others.
  • People who incorporate software licensed under the GNU General Public License (GPL) into their own software must also release their software under the GPL; most other open licenses do not require this.
  • The Creative Commons family of licenses allows people to mix and match requirements and restrictions on attribution, creation of derivative works, further sharing, and commercialization.
  • People who are not lawyers should not try to write licenses from scratch.

Citation


  • Add a CITATION file to a repository to explain how you want your work cited.

Hosting


  • Projects can be hosted on university servers, on personal domains, or on a public hosting service.
  • Rules regarding intellectual property and storage of sensitive information apply no matter where code and data are hosted.