Learn the Git workflow

This section includes links to a Git tutor called GitStream. GitStream allows you to practice Git on your machine: for each exercise, you clone a GitStream repository, then follow the instructions on the web page. GitStream will give you feedback in both the terminal and on the web as you complete each exercise.

As you read: complete the GitStream exercises to practice using Git.

You can find general instructions and a list of exercises on the GitStream page.

GitStream is a new project! If you encounter a problem or a bug, please post on Piazza.

It may not work with multiple exercise pages open at the same time. If an exercise doesn’t work, please close other exercise windows and try again.

What is Git?

Git is a Version Control System (VCS). If you have used other version control software before, like SVN or CVS, many of the concepts and procedures of Git will be familiar to you. The Pro Git book (which you can read online) describes what version control is used for:

What is version control, and why should you care? Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later. […] It allows you to revert files back to a previous state, revert the entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more. Using a VCS also generally means that if you screw things up or lose files, you can easily recover.

Some of the most important Git concepts:

repository: A folder containing all the files associated with a project (e.g. a 6.178 problem set), as well as the entire history of commits to those files.
commit (or “revision”): A snapshot of the files in a repository at a given point in time.
add (or “stage”): Before changes to a file can be committed to a repository, the file must be added, or staged, and this must be done again before each commit. This lets you commit changes to only the files you choose, but it also means that any changes you forget to add are left out of the commit.
clone: Since Git is a “distributed” version control system, it has no built-in concept of a centralized “server” that holds the latest official version of your code. Instead, developers “clone” remote repositories that contain files they want access to, and then commit to their local clones. Only when they push their local commits to the original remote repository are other developers able to see their changes.
push: The act of sending your local commits to a remote repository. Again, until you add, commit, and push your changes, no one else can see them.
pull: The act of retrieving commits made to a remote repository and writing them into your local repository. This is how you are able to see commits made by others after the time at which you made an initial clone.

Cloning

You start working with Git repos in 6.178 by cloning a remote repository into a local repository on your computer.

To do this, open the terminal (or the Git Bash for Windows) and use the cd command to change to the directory where you would like to store your code. Then run:

git clone URI-of-remote-repo

git clone URI-of-remote-repo project-name

Replace URI-of-remote-repo with the location of the remote repository, and replace project-name with the appropriate project name, like ps0. The result will be a new directory project-name with the contents of the repository. This is your local repository.
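
If you want to try cloning without touching a real course repository, here is a minimal sketch. It assumes a throwaway directory and a made-up local “remote” called remote.git (created here with git init --bare, which is not how 6.178 repositories are created), and uses ps0 as the project name.

```shell
# Work in a throwaway directory so nothing here touches real course work.
work=$(mktemp -d)
cd "$work"

# Stand-in for the remote repository: an empty "bare" repository.
# The name remote.git is made up for this example.
git init -q --bare remote.git

# Clone it into a new directory called ps0 (the two-argument form of git clone).
# Git warns that the repository is empty, which is expected here.
git clone -q remote.git ps0

# The clone is an ordinary directory that contains a hidden .git folder.
ls -a ps0
```

With a real problem set, the only difference is that the first argument is the URI of your repository instead of a local path.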

GitStream → Practice git clone

Cloning problem sets: for each problem set in 6.178, you will have a Git repository in github.mit.edu.

Initially this remote repository only contains some template code.

To start working on the problem set, you will clone that repository onto your machine.

As you complete each part of the problem set, you will commit your changes to the local repository and then push them to the remote repository.

When the time comes for grading your assignment, we will clone the remote repository and look at the last commit you made and pushed there before the deadline.

Getting the history of the repository

After you have cloned the repository, you should navigate into the repository on your command prompt using cd. This lets you run git commands on the repository.

For example, you can see the last commit on the repository using git show. This will show you the commit message as well as all the modifications.

You can see the list of all the commits in the repository (along with their commit messages) with git log. If you run git log -p, it will show you the full commit history, including the changes each commit made.

Long output: if git show or git log generates more output than fits on one page, you will see a colon (:) symbol at the bottom of the screen. You will not be able to type another command until you leave the output viewer! Use the arrow keys to scroll up and down, and quit the output viewer by pressing q.

Commit IDs: every Git commit has a unique ID, the long hexadecimal number you see in git log or git show. This number is in fact a cryptographic hash of the contents of your commit. One neat feature is that this ID is unique not just within your repository, but actually within the universe of Git commits. In other words, if your commit ID is something like ab1312313febc241..., that commit is extremely likely to be the only commit in the world with that name.

You can reference a commit by its ID (or frequently just by the first several characters). This is most useful with something like git show, where you can look at a particular commit, rather than just the most recent one.
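
As a rough illustration of looking up a commit by a shortened ID, the sketch below builds a tiny two-commit repository in a throwaway directory. The file name, commit messages, and identity are placeholders, and --no-pager is used only so that the output viewer does not open.

```shell
# Build a small throwaway repository with two commits.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
echo "second" >> notes.txt
git add notes.txt
git commit -q -m "Extend notes"

# One line per commit: shortened ID plus message, newest first.
git --no-pager log --oneline

# Ask Git for the shortened ID of the older commit, then show that commit.
old=$(git rev-parse --short HEAD~1)
git --no-pager show "$old"
```

In everyday use you would simply copy the first several characters of an ID out of the git log output and pass them to git show.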

You will also see commits identified by ID in tools like gitweb and Didit.

Creating a commit

The basic building block of data in Git is called a “commit”. A commit represents some change to one or more files (or the creation of one or more files).

When you first create a file or change a file, Git does not yet include that change in any commit. To tell Git to include it in the next commit, run:

git add file.txt (where file.txt is the file you want to add)

You’ll either need to run that command from the same directory as the file, or include directory names in the file path.

This “stages” the file. Once you’ve staged all your changes, run:

git commit

This will pop up an editor that will give you a chance to write a commit message. When you save and close the editor, the commit will be created.
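
Here is a small sketch of the add-then-commit sequence in a throwaway repository. The src directory, file contents, and identity are made up for the example, and -m supplies the commit message on the command line so that no editor opens.

```shell
# Throwaway repository for the example.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"

# Create a file inside a subdirectory.
mkdir src
echo "hello" > src/file.txt

# We are at the top of the repository, so the path includes the directory name.
git add src/file.txt

# -m gives the commit message directly instead of opening an editor.
git commit -q -m "Add file.txt"

git --no-pager log --oneline
```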

Getting the status of your repository

Git has some nice commands for seeing the status of your repository.

The most basic of these is git status. You can run this at any point to see which files have been modified but not yet staged, and which files have been modified and staged (so that if you run git commit, those changes will be included in the commit). Note that the same file might have both staged and unstaged changes, if you changed the file again after running git add.

When you have unstaged changes, you can see what the changes were by running git diff. Note that this will not include changes that were staged (but not committed). You can see those if you run git diff --staged.
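
To see a file with both staged and unstaged changes, you can try something like the following sketch. It uses a throwaway repository with a made-up file.txt, and --short and --no-pager just keep the output compact.

```shell
# Throwaway repository with one committed file.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

# First edit: staged.
echo "two" >> file.txt
git add file.txt

# Second edit to the same file: not staged.
echo "three" >> file.txt

# Prints "MM file.txt": modified in the staging area and again in the working copy.
git status --short

# Unstaged change only: shows the added line "three".
git --no-pager diff

# Staged change only: shows the added line "two".
git --no-pager diff --staged
```

If you ran git commit at this point, only the line "two" would be included in the commit.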

Pushing

After you’ve made some commits, you might want to push them to a remote repository. Again, in 6.178, you really only have one remote repository to push to, called origin. To push to it, you run the command:

git push origin master

The origin in the command specifies that you’re pushing to the origin remote. The master refers to the master branch. Branches are an advanced feature of Git that we’re not going to be using in 6.178, but since Git has them, you do have to specify a branch. For now, just include this part when you push.

Once you run this, you will be prompted for your password and hopefully everything will push. You’ll get a line like this:


a67cc45..b4db9b0  master -> master
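
You can rehearse a push without a real server using the sketch below. It assumes a made-up local “remote” called remote.git, so there is no password prompt, and it forces the branch name to master because the default branch name differs between Git versions.

```shell
# Throwaway "remote" and a clone of it.
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git                # stand-in for the remote repository
git clone -q remote.git ps0                  # warns that the repository is empty
cd ps0
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"

# Make one commit locally.
echo "hello" > file.txt
git add file.txt
git commit -q -m "Add file.txt"

# Send the commit to the remote named origin, on the branch master.
git push -q origin master

# origin/master is the local record of what the remote has; it now matches.
git --no-pager log --oneline origin/master
```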

GitStream → Practice the add-commit-push workflow
GitStream → Practice adding a new file
GitStream → Practice deleting a file

Merges

Sometimes, when you try to push, things will go wrong. You might get an output like this:


! [rejected]      master -> master (non-fast-forward)

What’s going on here is that Git won’t let you push to a repository unless all your commits come after all the ones already in your remote repository. If you get an error message like that, it means that there is a commit in your remote repository that you don’t have in your local one (probably because a teammate pushed before you did). If you find yourself in this situation, you have to pull first and then push.

Pulling

To perform a pull, you should run git pull. When you run this, Git actually does two things:

It downloads the changes and stores them in its internal state. At this point, your repository doesn’t appear any different: it just knows what the state of the remote repository is and what the state of your repository is.
It incorporates the changes from the remote repository into your local repository via a process called merging (see next section).

Merging

If you made some changes to your repository and you’re trying to incorporate the changes from another repository, you need to merge them together somehow. In terms of commits, what actually needs to happen is that you have to create a special merge commit which encompasses both changes. How this process actually happens depends on the changes.

If you’re lucky, then the changes you made and the changes that you downloaded from the remote repository don’t conflict. For example, maybe you changed one file and your partner changed another. In this case, it’s safe to just include both changes. Similarly, maybe you changed different functions of the same file. In these cases, Git can do the merge automatically. When you run git pull, it will pop up an editor as if you were making a commit—in fact, this is the commit message of the merge commit that Git automatically generated. Once you save and close this editor, the merge commit will be made and you will have incorporated the changes. At this point, you can try to git push again and hopefully it will work this time.
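
The rejected push and the automatic merge can be reproduced locally with the sketch below. Everything in it is made up for illustration: the local “remote” remote.git, the two clones alice and bob, and the file names. It also passes two extra options to git pull: --no-rebase, because newer versions of Git ask you to choose how to combine diverging histories and this selects the merge behavior described here, and --no-edit, which accepts the default merge message instead of opening an editor.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master   # remote's default branch

# Alice creates the first commit and pushes it.
git clone -q remote.git alice
cd alice
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
git config user.name "Alice"
git config user.email "alice@example.com"
echo "base" > a.txt
echo "base" > b.txt
git add a.txt b.txt
git commit -q -m "Initial files"
git push -q origin master
cd ..

# Bob clones, changes b.txt, and pushes first.
git clone -q remote.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "bob's change" >> b.txt
git add b.txt
git commit -q -m "Bob edits b.txt"
git push -q origin master
cd ../alice

# Alice changes a different file and tries to push: rejected.
echo "alice's change" >> a.txt
git add a.txt
git commit -q -m "Alice edits a.txt"
git push -q origin master || echo "push rejected, pulling first"

# Pull creates a merge commit automatically, then the push succeeds.
git pull -q --no-rebase --no-edit origin master
git push -q origin master
git --no-pager log --oneline --graph
```

The final log shows the merge commit on top, with Alice's and Bob's commits as its two parents.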

Merge conflicts

Sometimes, you’re not so lucky. If the changes you made and the changes you pulled edit the same part of the same file, Git won’t know how to resolve it. This is called a merge conflict. In this case, you will get output that says CONFLICT in capital letters. If you run git status, it will show the conflicting files with the label both modified. You now have to edit these files and resolve the conflicts by hand.

First, open them up in your text editor (probably Eclipse for 6.178). The parts that are conflicted will be really obviously marked with obnoxious lines beginning with <<<<<<<, =======, and >>>>>>>. Everything between the <<<< and the ==== lines is the change you made. Everything between the ==== and the >>>> lines is the change you pulled in. It’s your job to figure out how to combine these. The answer will of course depend on the situation. Maybe one change logically supersedes the other, or maybe they can be merged somehow. You should edit the file to your satisfaction and remove the <<<</====/>>>> markers when you’re done.

Once you have resolved all the conflicts (note that there can be several conflicting files, and also several conflicts per file), git add all the affected files and then git commit. You will have an opportunity to write the merge commit message (where you should describe how you did the merge). Now you should be able to push.
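
To watch a conflict happen and then resolve it, you could run something like this sketch. The local “remote” remote.git, the clones mine and partner, and the file config.txt are all made up, and the resolution shown (keeping your own line) is just one possible choice. As in a real pull, --no-rebase selects merging, which newer versions of Git ask you to choose explicitly.

```shell
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master   # remote's default branch

# Your clone: create the file and push it.
git clone -q remote.git mine
cd mine
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo "greeting = hello" > config.txt
git add config.txt
git commit -q -m "Add config"
git push -q origin master
cd ..

# Your partner changes the line and pushes first.
git clone -q remote.git partner
cd partner
git config user.name "Example Partner"
git config user.email "partner@example.com"
echo "greeting = howdy" > config.txt
git add config.txt
git commit -q -m "Use howdy"
git push -q origin master
cd ../mine

# You change the same line, then pull: Git reports CONFLICT and stops.
echo "greeting = hi" > config.txt
git add config.txt
git commit -q -m "Use hi"
git pull --no-rebase --no-edit origin master || true

conflict_state=$(git status --short)
echo "$conflict_state"                       # prints "UU config.txt" (both modified)
cat config.txt                               # shows the <<<<<<< ======= >>>>>>> markers

# Resolve by hand (here: keep your line), then add and commit to finish the merge.
echo "greeting = hi" > config.txt
git add config.txt
git commit -q -m "Merge partner's greeting change, keeping hi"
git push -q origin master
```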

GitStream → Practice resolving a merge conflict

Avoid merges and merge conflicts:

Pull before you start working.

Before you start working, always git pull. That way, you’ll be working from the latest version of your code, and you’ll be less likely to have to perform a merge later.

Reverting to previous versions

If you’d like to practice using the version history to undo a change:

GitStream → (optional exercise) Practice git log and git revert
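
For a quick look at what git revert does before trying the exercise, here is a minimal sketch in a throwaway repository. The file and commit messages are made up, and --no-edit keeps Git's default message for the new commit instead of opening an editor.

```shell
# Throwaway repository with a good commit followed by a bad one.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"
echo "good line" > file.txt
git add file.txt
git commit -q -m "Add file.txt"
echo "bad line" >> file.txt
git add file.txt
git commit -q -m "Add a line we will regret"

# Create a new commit that undoes the most recent one.
# The old commits stay in the history; nothing is erased.
git revert --no-edit HEAD

cat file.txt                                 # only "good line" remains
git --no-pager log --oneline                 # three commits, newest is the revert
```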

Technical detail: what is the remote repository?

Unlike other similar systems, Git doesn’t have a built-in notion of a “central repository.” Instead, any repository can push to any other repository by specifying it as a “remote.” A “remote” is just a pair of a name (which can be anything) and a URI, which is a string indicating how Git can find the other repository.

In 6.178, all of your repositories are created by cloning a remote repository which we create (and which acts as the “central” repository). You’ve done this with the git clone URI directory command. This actually does several things:

Create an empty directory called directory (i.e. the last argument to git clone).
Initialize it as an empty Git repository.
Add a remote with the URI you specified and the name origin.
Download the data from the remote.

So for those of you who were wondering, that’s what the origin means. It’s just the default name of the remote repository that you cloned your repository from.
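
The steps above can be approximated by hand, which makes it clearer that origin is only a name. The sketch below is a rough equivalent rather than exactly what git clone runs internally; the repository called source is a made-up stand-in for the remote, and copy is the directory being set up.

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for the remote: a small repository with one commit.
git init -q source
cd source
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
git config user.name "Example Staff"         # placeholder identity
git config user.email "staff@example.com"
echo "template" > file.txt
git add file.txt
git commit -q -m "Template code"
cd ..

# 1. Create an empty directory.
mkdir copy
cd copy
# 2. Initialize it as an empty Git repository.
git init -q
git symbolic-ref HEAD refs/heads/master
# 3. Add a remote with the URI of the other repository and the name origin.
git remote add origin "$work/source"
# 4. Download the data from the remote and bring it into the local repository.
git pull -q origin master

# List the remotes this repository knows about: just origin and its URI.
git remote -v
```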

This document is a derivative of the 6.005 Getting Started Notes