Training documents and materials for the DAA 2022, held at Uppsala University in Sweden 1-5 August
View this repository on GitHub: glow-gh/daa
This module will introduce you to GitHub and to Git, the distributed version control system on which it is built. The aim is to provide you with the skills to manage and monitor data and text files for research in an online multi-user environment, and to give you some experience in the basics of version control for keeping track of files.
GitHub is an online hosting service for software development and version control, built around Git, a free and open-source distributed version control system.
The good thing about GitHub is that it provides you with an online and highly versatile platform for creating, storing, sharing, and publishing text and data. So, to keep all of the files and scripts that you will be producing during the Digital Applications in Assyriology Nordic Summer School in one easily accessible place, we thought it best to start out by introducing you to GitHub.
With a GitHub user account, you will be able to store and publish virtually any type of file from a remote location, while being able at the same time to create, edit, and upload files from a local work station with detailed version control. This means that you will also be able to restore previous versions of data files if anything goes wrong, and that you will be able to share data with other users on GitHub - and make use of theirs too!
This class will take you through the following processes:
Take a moment to familiarise yourself with the user interface. Note that your user profile has a URL, with the format https://github.com/ followed by your user name. This user profile is publicly accessible by default.
Through Edit profile on the left, you can add a link to your personal website, your Twitter handle, and a short bio. In the Search field in the top left corner, you can search for users, repositories, projects, and organisations.
In the Search field, try searching for cdli-gh. The results will be sorted according to type, with Repositories at the top. Scroll down to Users, and select CDLI cdli-gh from the results list.
You have now reached the GitHub organisation page of the Cuneiform Digital Library Initiative. If you choose Follow on the right, you can follow this organisation, which means that you will receive notifications to your own user profile about activities by the users and organisations that you are following.
With a GitHub account, you can then follow, inspect, and reuse data and code published by other GitHub users and organisations.
Now, let us proceed to create a file directory, or a repository, as it is called on GitHub. When you have created a GitHub user profile, you also need a repository in order to work with and store files and data associated with that user profile. Essentially, a repository is a file directory, just like a folder on a computer.
You should now have a repository named daa associated with your GitHub account. Note that this repository will also have received a dedicated URL, namely https://github.com/yourusername/daa/.
The point of a distributed version control system such as Git, combined with an online hosting service such as GitHub, is that you can store files online and work on them locally, either individually or as a group.
This is done by cloning a repository on your online GitHub user profile to your local drive. To work with cloned GitHub repositories on your local drive, you will usually need a desktop Git client. For this task, we will use GitHub Desktop, but later today, you will also be introduced to code editors with built-in Git capability (see 1.3 Markdown). A good source code editor can perform most of the tasks for which you would otherwise need a desktop client such as GitHub Desktop, while also allowing you to add, delete, and edit files in your cloned repositories.
For now, let’s go with GitHub Desktop. Through this link (https://desktop.github.com), you can download and install the GitHub Desktop client on your laptop. Please do so before continuing.
To clone a GitHub repository to your local drive, you should first decide where on your local drive this repository should be located. It is a good idea to think a couple of steps ahead: if you want to work more with GitHub files, you will most likely need more repositories for your projects in the future. The easiest thing is to keep all of your cloned repositories in the same location on your local drive.
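GitHub Desktop performs the clone through its graphical interface, but the underlying operation is the Git command `git clone`. As a minimal sketch only, the following Python snippet builds that command for a repository and a parent folder. The function names, the user name `yourusername`, and the parent folder `GitHub` in your home directory are illustrative assumptions, not part of the course setup.

```python
import shlex
import subprocess
from pathlib import Path


def clone_command(username: str, repository: str, parent_dir: Path) -> list[str]:
    """Build the `git clone` command for a GitHub repository (hypothetical helper)."""
    url = f"https://github.com/{username}/{repository}.git"
    target = parent_dir / repository  # each clone gets its own subfolder
    return ["git", "clone", url, str(target)]


def clone_repository(username: str, repository: str, parent_dir: Path) -> None:
    """Create the parent folder if needed and run the clone.

    Requires Git to be installed and a working network connection.
    """
    parent_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(clone_command(username, repository, parent_dir), check=True)


# Assumed layout: one parent folder that holds every cloned repository.
command = clone_command("yourusername", "daa", Path.home() / "GitHub")
print(shlex.join(command))
```

Only the command is printed here; calling `clone_repository` with your own user name would actually perform the clone.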
Because GitHub Desktop is not a text or code editor, but a file management application, you cannot see the files included in the daa repository before changes have been made in these files.
What you have done now is to change a file that is part of a cloned GitHub repository. This means that you now have two different versions of the README.md: one in your online GitHub repository, another on your local drive. To update the online version with the changes on your local drive, you will need to make a commit and then push it. A commit records the changed file in the version history on your local drive, and a push then sends that commit to the online repository, updating the online main branch to correspond to your local version.
If you return to the GitHub Desktop interface, you will see that the README.md has appeared in the left-hand Changes pane and is marked with a yellow square tag. The yellow square tag indicates that the file has been modified, and in the main pane you can see a summary of the changes constituting these modifications. Most file managers and IDEs with embedded Git employ a simple colour coding for file changes:
colour | meaning |
---|---|
green | New or untracked file |
yellow | Modified file |
red | Deleted or removed file |
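On the command line, Git reports the same information through `git status --porcelain`, which prints a two-character status code followed by the file path. The sketch below maps such lines to the colours in the table. The function name and the sample lines are made up for illustration, and the mapping from status codes to colours is an assumption based on the table above, not a description of how GitHub Desktop works internally.

```python
def colour_for_status(line: str) -> tuple[str, str]:
    """Map one line of `git status --porcelain` output to (path, colour).

    The first two characters are the status code, the path starts at
    the fourth character. Renames and merge conflicts are not handled.
    """
    code, path = line[:2], line[3:]
    if code == "??" or "A" in code:
        colour = "green"    # new or untracked file
    elif "D" in code:
        colour = "red"      # deleted or removed file
    elif "M" in code:
        colour = "yellow"   # modified file
    else:
        colour = "unknown"
    return path, colour


# Hypothetical status output for a small repository.
sample_output = [
    " M README.md",
    "?? notes.txt",
    " D old-data.csv",
]
for status_line in sample_output:
    print(colour_for_status(status_line))
```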
To update the online version of the file with these modifications, let us go through the various steps of making a commit and pushing a file to GitHub. This may sound overly complicated to start with, but as we will see, there are good reasons for every step of this process.
Whenever you have completed and saved edits to a file, the file will be listed as modified in the Changes pane of GitHub Desktop. This merely means that GitHub Desktop has registered a change in the file relative to the last version of the file logged by the embedded Git, but this change has not been stored or distributed anywhere else.
To commit changes to a file means that the changes to the currently saved local version will be committed to the version control system Git. This stores a snapshot of the current version, and of the changes relative to previous versions, on your local hard drive. The various Git versions of a file can be inspected in GitHub Desktop by clicking the History tab next to the Changes tab in the left-hand window.
In order to forward these changes to the version of the file stored online on your GitHub account, an additional step is needed. This is called a push, in that it pushes the commit to the relevant GitHub repository. If multiple people are working on files from the same GitHub repository, you will also occasionally want to update your local files from the versions stored in the GitHub repository. This is called a pull. Performing a push and a pull at the same time is called a sync (as we will see in the next module, cf. 1.3).
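The difference between a commit, a push, and a pull can be illustrated with a deliberately simplified model. In the sketch below, a repository is reduced to an ordered list of commit messages, and the class and method names are invented for this example. It assumes a single line of history without conflicts, and it is not how Git actually stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """A repository reduced to an ordered list of commit messages."""
    commits: list[str] = field(default_factory=list)


@dataclass
class WorkingCopy:
    """A local clone that knows which online repository it came from."""
    local: Repository
    remote: Repository

    def commit(self, message: str) -> None:
        # A commit only changes the local history.
        self.local.commits.append(message)

    def push(self) -> None:
        # Send the commits that the online repository does not have yet.
        missing = self.local.commits[len(self.remote.commits):]
        self.remote.commits.extend(missing)

    def pull(self) -> None:
        # Fetch the commits that the local copy does not have yet.
        missing = self.remote.commits[len(self.local.commits):]
        self.local.commits.extend(missing)


github = Repository(commits=["Initial commit"])
laptop = WorkingCopy(local=Repository(commits=list(github.commits)), remote=github)

laptop.commit("Update README.md")
print(github.commits)  # ['Initial commit']: the commit is still local only
laptop.push()
print(github.commits)  # ['Initial commit', 'Update README.md']
```

A second user who cloned the repository before the push would see the new commit only after calling `pull()` on their own working copy.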
As a final note, we have not dealt with branches of a GitHub repository in the present module. The core version of a repository on GitHub is called main, and what we have done so far has been to pull and push commits directly to and from the main branch. Once you are more accustomed to the GitHub workflow, and especially when multiple users are working on the same repository, it is not advisable to constantly make changes directly to the main branch of a repository, as this is liable to cause inconsistencies between the changes that you and others make.
Instead, when you want to work on developing a feature or document without having to include changes that other users are making on the main version of the repository, it is advisable to create a new branch from the main branch and work on this branch instead.
Note that you can only create a branch in a repository if you have write access to this repository.
You can read a lot more about this under the GitHub Docs Working with branches entry.
If you do not have write access, the corresponding action is called a fork. A fork can be performed on any public GitHub repository by any GitHub user, and is designed to allow users to further develop and augment code and documentation prepared by other users or projects. A fork is thus essentially a copy, stored under your own account, of a GitHub repository to which you yourself do not have write access.
You can read more about forks under the GitHub Docs About forks entry.
In order to integrate your work on a separate branch with the main branch of the repository, you will subsequently need to create a pull request (typically referred to as a PR in GitHub jargon). Once accepted by the repository manager (yourself or someone else with editorial access), changes in the head branch (the one you have been working on) are included in the base or main branch (the primary repository branch), and the head branch can be deleted.
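For readers who later move from GitHub Desktop to the command line, the branch part of this workflow corresponds roughly to the Git commands listed by the sketch below. The function name, the branch name `update-readme`, and the commit message are hypothetical, and the remote is assumed to be called `origin`, which is the default name after a clone. The pull request itself is then opened on GitHub rather than through Git.

```python
import shlex


def branch_workflow(branch: str, filename: str, message: str) -> list[list[str]]:
    """List the Git commands for working on a separate branch (illustrative only)."""
    return [
        ["git", "switch", "-c", branch],                      # create the branch and switch to it
        ["git", "add", filename],                             # stage the edited file
        ["git", "commit", "-m", message],                     # record the change locally
        ["git", "push", "--set-upstream", "origin", branch],  # publish the branch on GitHub
    ]


for command in branch_workflow("update-readme", "README.md", "Update README.md"):
    print(shlex.join(command))
```

The commands are only printed, not executed, so the snippet can be run safely outside a repository.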
More information about pull requests and merging is available from the GitHub Docs Collaborating with pull requests entry.
To conclude, this module has taught you to set up and use a GitHub account and repository for local editing and online storing of files, and how to maintain version control using Git to oversee changes in these files. This setup can be used to work with a wide range of materials in a flexible way while maintaining the ability to backtrack and correct errors in previous edits. A GitHub account and repository will also enable you to prepare and publish projects online using Markdown and GitHub Pages, something that we will cover in the next module.