Compiled on: 2024-05-07
Did you ever need to roll back some project or assignment to a previous version?
How did you track the history of the project?
Inefficient!
Did you ever need to develop some project or assignment as a team?
How did you organize the work to maximize the productivity?
Tools meant to support the development of projects by:
Distributed: Every copy of the repository contains (i.e., every developer locally has) the entire history.
Centralized: A reference copy of the repository contains the whole history; developers work on a subset of such history
Git is now the dominant DVCS (although Mercurial is still in use, e.g., for Python, Java, Facebook).
At a first glance, the history of a project looks like a line.
Anything that can go wrong will go wrong ($1^{st}$ Murphy’s law)
If anything simply cannot go wrong, it will anyway ($5^{th}$ Murphy’s law)
Go back in time to a previous state where things work
Then fix the mistake
If you consider rollbacks, history is a tree!
Alice and Bob work together for some time, then they go home and work separately, in parallel
They have a diverging history!
If you have the possibility to reconcile diverging developments, the history becomes a graph!
Reconciling diverging developments is usually referred to as merge
Project meta-data. Includes the whole project history
Usually, stored in a hidden folder in the root folder of the project
(or worktree, or working directory)
the collection of files (usually, inside a root folder) that constitute the project, excluding the meta-data.
A saved status of the project.
A named sequence of commits
If no branch has been created at the first commit, a default name is used.
To be able to go back in time or change branch, we need to refer to commits
References to commits are called tree-ishes.
A special tree-ish is HEAD, which refers to the current commit.
When committing, the HEAD moves forward to the new commit.
When checking out some previous commit, the HEAD moves backward to that commit.
Appending ~ and a number i to a valid tree-ish means “i-th parent of this tree-ish”.
This can be exploited w.r.t. the HEAD commit, or w.r.t. any other reference commit.
Checkout is the operation of moving to another commit: it moves the HEAD to the specified target tree-ish.
Let us try to see what happens when we develop some project, step by step.
Oh, no, there was a mistake in commit 4! We need to roll back!
6
whenever we want to.4
, I’d like to have it into new-branch
Notice that commit 8 is a merge commit, as it has two parents: 7 and 5
De-facto reference distributed version control system
¹ Less difference now, Facebook vastly improved Mercurial
Git is a command line tool
Although graphical interfaces exist, it makes no sense to learn a GUI:
Configuration in Git happens at two levels
Set up the global options reasonably, then override them at the repository level, if needed.
git config
The config subcommand sets the configuration options.
With the --global option, it configures the tool globally.
Usage: git config [--global] category.option value
This sets option of category to value.
As said, --global can be omitted to override the global settings locally.
user.name and user.email
A name and a contact are always saved as metadata, so they need to be set up.
git config --global user.name "Your Real Name"
git config --global user.email "your.email.address@your.provider"
Some operations pop up a text editor.
It is convenient to set it to a tool that you know how to use (to prevent, e.g., being “locked” inside vi or vim).
Any editor that you can invoke from the terminal works.
git config --global core.editor nano
How to name the default branch.
Two reasonable choices are main and master
git config --global init.defaultbranch master
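The interplay between the two configuration levels can be sketched as follows. This is only an illustration under stated assumptions: a throwaway directory is used as HOME so that the real global configuration is not touched, and the names and email address are placeholders.

```shell
set -eu
# Assumption: a disposable sandbox acts as the home directory,
# so that --global writes there instead of the real ~/.gitconfig
sandbox="$(mktemp -d)"
export HOME="$sandbox"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

# Global level: applies to every repository of this (sandboxed) user
git config --global user.name "Global Name"
git config --global user.email "global@example.com"

# Repository level: overrides the global value for this repository only
git -c init.defaultBranch=master init -q "$sandbox/repo"
cd "$sandbox/repo"
git config user.name "Local Name"

global_name="$(git config --global user.name)"
effective_name="$(git config user.name)"    # local value wins
effective_email="$(git config user.email)"  # falls back to the global value
echo "global name:    $global_name"
echo "effective name: $effective_name <$effective_email>"
```

Options that are not overridden locally (here, user.email) keep their global value.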
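git init
Initializes a new repository in the current directory, creating the .git folder.
The .git folder marks the root of the repository.
Use cd to locate yourself inside the folder that contains (or will contain) the project (creating it with mkdir, if needed).
Run git init.
The repository metadata is now stored in the .git folder.
Git has the concept of stage (or index).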
git add <files> moves the current state of the files into the stage as changes.
git reset <files> removes currently staged changes of the files from the stage.
git commit creates a new changeset with the contents of the stage.
It is extremely important to understand clearly what the current state of affairs is.
git status prints the current state of the repository. Example output:
❯ git status
On branch master
Your branch is up to date with 'origin/master'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
modified: content/_index.md
new file: content/dvcs-basics/_index.md
new file: content/dvcs-basics/staging.png
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: layouts/shortcodes/gravizo.html
modified: layouts/shortcodes/today.html
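Creating a commit requires the author name and email to be configured.
Globally:
git config --global user.name 'Your Real Name'
git config --global user.email 'your@email.com'
Or locally, for the current repository only:
git config user.name 'Your Real Name'
git config user.email 'your@email.com'
A commit message can be passed with -m, otherwise Git will pop up the default editor: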
git commit -m 'my very clear and explanatory message'
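The stage/commit cycle can be tried out in a disposable repository. The following is a minimal sketch: the file name notes.txt, the author identity, and the commit messages are placeholders chosen for illustration.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit: stage the file, then save the staged state
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "add notes"

# Modify the file and stage the change
echo "second draft" > notes.txt
git add notes.txt
staged_after_add="$(git diff --staged --name-only)"

# Unstage it: the change stays in the working tree, but leaves the stage
git reset -q -- notes.txt
staged_after_reset="$(git diff --staged --name-only)"

# Stage again and commit
git add notes.txt
git commit -q -m "revise notes"
commit_count="$(git rev-list --count HEAD)"
git status --short
```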
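At the first commit, there is no branch and no HEAD.
Depending on the version of Git, the following behavior may happen upon the first commit:
Git creates a new branch named master.
Git creates a new branch named master, but warns that it is a deprecated behavior.
Git creates a new branch named main, as it is seen as more inclusive.
The default branch name can be configured explicitly:
git config --global init.defaultbranch default-branch-name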
In general, we do not want to track all the files in the repository folder:
Of course, we could just not add them, but the error is around the corner!
It would be much better to just tell Git to ignore some files.
This is achieved through a special .gitignore file.
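The file must be named exactly .gitignore: names like foo.gitignore or gitignore.txt won’t work.
A quick way to create or update it: echo whatWeWantToIgnore >> .gitignore (multiplatform command).
Files matching the rules are not staged (unless git add is called with the --force option).
.gitignore example:
# ignore the bin folder and all its contents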
bin/
# ignore every pdf file
*.pdf
# rule exception (beginning with a !): pdf files named 'myImportantFile.pdf' should be tracked
!myImportantFile.pdf
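The effect of the rules above can be checked in a throwaway repository. In this sketch the file names bin/app, notes.pdf, and main.txt are made up for illustration; only the three ignore rules come from the example.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q

# The same three rules as in the example above
cat > .gitignore <<'EOF'
bin/
*.pdf
!myImportantFile.pdf
EOF

# Hypothetical project content
mkdir bin
echo "compiled stuff" > bin/app
echo "some draft" > notes.pdf
echo "keep me" > myImportantFile.pdf
echo "source" > main.txt

# Stage everything: ignored files are silently skipped
git add .
tracked="$(git ls-files)"
echo "$tracked"

# check-ignore succeeds (exit status 0) only for ignored paths
if git check-ignore -q notes.pdf; then draft_ignored="yes"; else draft_ignored="no"; fi
if git check-ignore -q myImportantFile.pdf; then important_ignored="yes"; else important_ignored="no"; fi
```

git ls-files lists what ended up in the stage, which is a quick way to verify a new ignore rule before committing.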
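Going to a new line is a two-phased operation:
bring the cursor back to the beginning of the line;
bring the cursor down one line.
In electromechanical teletypewriters (and in typewriters, too), they were two distinct operations:
Carriage Return (CR), moving the carriage back to the leftmost position;
Line Feed (LF), advancing the paper by one line.
Terminals were designed to behave like virtual teletypewriters (hence the name tty).
At some point, Unix decided that LF was sufficient in virtual TTYs to go to a new line, so that \n means “newline”.
With a line feed alone and no carriage return, we would get
lines
     like these
Consequence:
Windows goes to a new line with a CR character followed by an LF character: \r\n
Unix and its descendants (Linux, macOS) use a single LF character: \n
Classic Mac OS used a single CR character: \r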
If your team uses multiple OSs, it is likely that, by default, the text editors use either LF (on Unix) or CRLF (on Windows).
It is also very likely that, upon saving, the whole file gets rewritten with the “locally correct” line endings.
Git tries to tackle this issue by converting the line endings so that they match the initial line endings of the file, resulting in repositories with illogically mixed line endings (depending on who created a file first) and loads of warnings about LF/CRLF conversions.
Line endings should instead be configured per file type!
.gitattributes
A sensible strategy is to use LF everywhere, but for Windows scripts (bat, cmd, ps1).
This is configured through a .gitattributes file in the repository root:
* text=auto eol=lf
*.[cC][mM][dD] text eol=crlf
*.[bB][aA][tT] text eol=crlf
*.[pP][sS]1 text eol=crlf
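To verify which line ending Git would apply to a given path, git check-attr can be queried. This is a small sketch: the paths build.sh and Setup.BAT are hypothetical and do not need to exist, since only their names are matched against the patterns.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q

# The same rules as in the example above
cat > .gitattributes <<'EOF'
* text=auto eol=lf
*.[cC][mM][dD] text eol=crlf
*.[bB][aA][tT] text eol=crlf
*.[pP][sS]1 text eol=crlf
EOF

# Ask Git which value of the eol attribute applies to each path
unix_eol="$(git check-attr eol -- build.sh)"
windows_eol="$(git check-attr eol -- Setup.BAT)"
echo "$unix_eol"
echo "$windows_eol"
```

The bracket patterns make the match case-insensitive, which is why Setup.BAT is covered as well as setup.bat.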
The modular-calculator repository
Download the code as a ZIP and extract it somewhere
Create a new directory elsewhere, and make it a Git repository with git init
Ensure that your username and email are set up correctly, either globally or locally
Add the .gitignore and the .gitattributes files, and commit them
Add the files of the modular-calculator repository to the stage and commit them
What criterion to decide which and how many files to include per commit?
Atomicity: a commit should be a single change
Coherence: a commit should contain coherent changes
Frequency: commits should be small and frequent
Commit together edits that are related to the same conceptual change
git add adds a change to the stage.
git add someDeletedFile is a correct command, that will stage the fact that someDeletedFile does not exist anymore, and its deletion must be registered at the next commit.
If a file foo has been renamed into bar:
git add foo bar
will record that foo has been deleted and bar has been created.
Make the calculator.gui module a package
Create a new directory gui inside the calculator directory
Move the gui.py file inside the gui directory, and rename it to __init__.py
Add all the files to the stage and look at the status
Commit the changes
Of course, it is useful to visualize the history of commits. Git provides a dedicated sub-command:
git log
lists the commits from the HEAD commit (the current commit) backwards
git log --oneline
git log --all
git log --graph
git log --oneline --all --graph
git log --oneline --all --graph
* d114802 (HEAD -> master, origin/master, origin/HEAD) moar contribution
| * edb658b (origin/renovate/gohugoio-hugo-0.94.x) ci(deps): update gohugoio/hugo action to v0.94.2
|/
* 4ce3431 ci(deps): update gohugoio/hugo action to v0.94.1
* 9efa88a ci(deps): update gohugoio/hugo action to v0.93.3
* bf32a8b begin with build slides
* b803a65 lesson 1 looks ready
* 6a85f8f ci(deps): update gohugoio/hugo action to v0.93.2
* b474d2a write more on the introductory lesson
* 8a7105e ci(deps): update gohugoio/hugo action to v0.93.1
* 6e40642 begin writing the first lesson
<tree-ish>es
In Git, a reference to a commit is called <tree-ish>. Valid <tree-ish>es are:
Full commit hashes, such as b82f7567961ba13b1794566dde97dda1e501cf88.
Shortened commit hashes, such as b82f7567.
Branch names, referring to the last commit of the branch.
HEAD, a special name referring to the current commit (the head, indeed).
It is possible to build relative references, e.g., “get me the commit before this <tree-ish>”, by following the commit <tree-ish> with a tilde (~) and with the number of parents to get to:
<tree-ish>~STEPS, where STEPS is an integer number, produces a reference to the STEPS-th parent of the provided <tree-ish>:
b82f7567~1 references the parent of commit b82f7567.
some_branch~2 refers to the parent of the parent of the last commit of branch some_branch.
HEAD~3 refers to the parent of the parent of the parent of the current commit.
In case of merge commits (with multiple parents), ~ selects the first one.
In case of multiple parents, a specific parent can be selected with the caret (^).
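Relative references can be resolved to hashes with git rev-parse. The sketch below builds a linear history of three commits (file name and messages are placeholders) and checks where the ~ notation lands.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# A linear history: commit 1 <- commit 2 <- commit 3 (HEAD)
for i in 1 2 3; do
  echo "version $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

root="$(git rev-list --max-parents=0 HEAD)"   # the commit with no parents
grandparent="$(git rev-parse HEAD~2)"         # parent of the parent of HEAD
parent_tilde="$(git rev-parse HEAD~1)"
parent_caret="$(git rev-parse HEAD^)"         # first parent, same as ~1

# Relative references also work on (shortened) hashes
short="$(git rev-parse --short HEAD)"
parent_from_hash="$(git rev-parse "$short~1")"

git log --oneline
```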
The git rev-parse reference on specifying revisions is publicly available.
We want to see which differences a commit introduced, or what we modified in some files of the work tree.
Git provides support to visualize the changes in terms of modified lines through git diff:
git diff shows the difference between the stage and the working tree, namely what a subsequent git add would stage.
git diff --staged shows the difference between HEAD and the stage, namely what a subsequent git commit would record.
git diff <tree-ish> shows the difference between <tree-ish> and the working tree (the stage is not consulted).
git diff --staged <tree-ish> shows the difference between <tree-ish> and the stage.
git diff <from> <to>, where <from> and <to> are <tree-ish>es, shows the differences between <from> and <to>.
git diff example output:
diff --git a/.github/workflows/build-and-deploy.yml b/.github/workflows/build-and-deploy.yml
index b492a8c..28302ff 100644
--- a/.github/workflows/build-and-deploy.yml
+++ b/.github/workflows/build-and-deploy.yml
@@ -28,7 +28,7 @@ jobs:
# Idea: the regex matcher of Renovate keeps this string up to date automatically
# The version is extracted and used to access the correct version of the scripts
USES=$(cat <<TRICK_RENOVATE
- - uses: gohugoio/hugo@v0.94.1
+ - uses: gohugoio/hugo@v0.93.3
TRICK_RENOVATE
)
echo "Scripts update line: \"$USES\""
The output is compatible with the Unix commands diff and patch.
Still, binary files are an issue! Tracking the right files is paramount.
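The difference between git diff and git diff --staged is easier to grasp by watching a change move from the working tree to the stage. A minimal sketch, with a placeholder file name and identity:

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "line one" > file.txt
git add file.txt
git commit -q -m "initial version"

# Edit the file without staging it
echo "line two" >> file.txt
unstaged_before="$(git diff --name-only)"           # stage vs working tree
staged_before="$(git diff --staged --name-only)"    # HEAD vs stage

# Stage the edit: the change "moves" from one diff to the other
git add file.txt
unstaged_after="$(git diff --name-only)"
staged_after="$(git diff --staged --name-only)"

# Full patch of what the next commit would record
git diff --staged
```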
Use git log --oneline to find the hashes of two commits
Use git diff HEAD <to> to visualize the differences between the last commit and any prior commit of your choice
Try the ~ notation too
Navigation of the history concretely means to move the head (in Git, HEAD) to arbitrary points of the history.
In Git, this is performed with the checkout command:
git checkout <tree-ish>
moves HEAD to the provided <tree-ish>, and updates the tracked files to their version at the provided <tree-ish>.
The command can be used to selectively checkout a file from another revision:
git checkout <tree-ish> -- foo bar baz
restores the status of files foo, bar, and baz from commit <tree-ish>, and adds them to the stage (beware: uncommitted changes to those files are overwritten).
Note that -- is surrounded by whitespaces: it is not a --foo option, it is just used as a separator between the <tree-ish> and the list of files.
The separator is needed because a file could be named like a <tree-ish>, and we need disambiguation.
Use git log --oneline to find the hash of the last commit where the file gui.py was present
Use git checkout <tree-ish> -- gui.py to restore the file gui.py from the chosen commit
Use git status to check the status of the file
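A self-contained sketch of restoring a deleted file from an earlier commit follows. It mimics the exercise in a disposable repository; the content of gui.py and the commit messages are placeholders.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# A commit where gui.py exists...
echo "print('gui')" > gui.py
git add gui.py
git commit -q -m "add gui"

# ...followed by a commit that deletes it
git rm -q gui.py
git commit -q -m "remove gui"

# Restore gui.py as it was one commit ago; -- separates the tree-ish from the paths
git checkout HEAD~1 -- gui.py

# The restored file is already in the stage, ready to be committed
status="$(git status --short)"
echo "$status"
```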
Git does not allow multiple heads per branch (other DVCS do, in particular Mercurial): for a commit to be valid, HEAD must be at the “end” of a branch (on its last commit).
When an old commit is checked out this condition doesn’t hold!
If we run git checkout HEAD~4, the system enters a special work mode called detached head.
When in detached head, Git allows to make commits, but they are lost!
(Not really, but to retrieve them we need git reflog and git cherry-pick, that we won’t discuss)
Use git log --oneline to find the hash of some previous commit of choice
Use git checkout <tree-ish> to enter the detached head state
Try to edit some file and commit the changes
Check the current status with git status
Use git checkout master to return to the last commit of the master branch
Use git log --oneline to find the hash of the commit you made in the detached head state
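The detached head state can be reproduced safely in a throwaway repository. This sketch assumes a default branch named master; file names and messages are placeholders. It shows that a commit made while detached belongs to no branch once we switch back.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

for i in 1 2; do
  echo "version $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# Checking out an old commit detaches HEAD from the branch
git checkout -q HEAD~1
if git symbolic-ref -q HEAD >/dev/null; then state="attached"; else state="detached"; fi

# Commits are allowed, but no branch points to them
echo "experiment" > experiment.txt
git add experiment.txt
git commit -q -m "commit made in detached head"
stray="$(git rev-parse HEAD)"

# Back to the branch: the stray commit is not part of its history
git checkout -q master
branches_with_stray="$(git branch --contains "$stray")"
current="$(git symbolic-ref --short HEAD)"
echo "state while on the old commit: $state"
echo "branches containing the stray commit: '$branches_with_stray'"
```

The stray commit still exists in the repository for a while, which is why tools like git reflog can recover it.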
One copy of the project history is stored in the cloud
Every developer has a local copy of the repository
Mechanisms and protocols are in place to synchronize the local copies with the cloud copy
Several Web services allow the creation of shared repositories on the cloud.
They enrich the base git model with services built around the tool:
Repositories are uniquely identified by an owner and a repository name: owner/repo is a name unique to every repository.
GitHub supports two kinds of authentication: HTTPS and SSH.
For HTTPS, create a personal access token with repo access scope at https://github.com/settings/tokens/new
The token is then embedded in the repository URL: https://github.com/owner/repo.git becomes: https://token@github.com/owner/repo.git
Disclaimer: this is a “quick and dirty” way of generating and using SSH keys.
You are warmly recommended to learn how it works and the best security practices.
ssh-keygen
cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3Nza<snip, a lot of seemingly random chars>PIl+qZfZ9+M= you@your_hostname
You are all set! Enjoy your secure authentication.
Browse to https://github.com, log in with your credentials
Create a new private repository, named modular-calculator
Create a new personal access token with repo access scope (give it a name, e.g. my first token)
Save the token in a safe place (from which it is easy to copy&paste it)
Remotes are local names for the known copies of a repository that exist somewhere on the Internet.
When a repository is created via init, no remote is known.
When a repository is cloned, the source is saved as a remote named origin by default.
The remote subcommand is used to inspect and manage remotes:
git remote -v lists the known remotes
git remote add a-remote URL adds a new remote named a-remote and pointing to URL
git remote show a-remote displays extended information on a-remote
git remote remove a-remote removes a-remote (it does not delete information on the remote, it locally forgets that it exists)
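The remote subcommand can be explored without any network access. In this sketch a local bare repository (shared.git, a made-up name) stands in for the copy hosted on the Internet.

```shell
set -eu
sandbox="$(mktemp -d)"

# Assumption: a local bare repository plays the role of the remote copy
git -c init.defaultBranch=master init -q --bare "$sandbox/shared.git"

git -c init.defaultBranch=master init -q "$sandbox/work"
cd "$sandbox/work"

before="$(git remote)"                      # freshly initialized: no remotes
git remote add a-remote "$sandbox/shared.git"
listed="$(git remote)"
url="$(git remote get-url a-remote)"
git remote -v

# Removing the remote only forgets it locally: shared.git is left untouched
git remote remove a-remote
after="$(git remote)"
```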
From scratch: git init creates an empty repository
From a remote: git clone URL creates a local copy of a remote repository (a clone)
Initialize a repository locally, and put some commits in it
Create a new repository on GitHub (or any other service of choice), reachable at, e.g., https://somesite.com/repo.git
Create a new remote in the local repository, let’s call it origin
git remote add origin https://somesite.com/repo.git
Set the upstream for the current branch, while publishing it to the remote. Assuming the branch is named master:
git push -u origin master
The situation will be as follows:
git@somesite.com/repo.git is saved as origin
the main branch (the branch where HEAD is attached, in our case master) on origin gets checked out locally with the same name
the local branch master is set up to track origin/master as upstream
From now on, local commits can be pushed to the remote with git push
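The publishing workflow can be rehearsed offline. The sketch below assumes a local bare repository in place of the hosted one and a branch named master; paths, identity, and messages are placeholders.

```shell
set -eu
sandbox="$(mktemp -d)"

# Assumption: a local bare repository stands in for the hosted repository
git -c init.defaultBranch=master init -q --bare "$sandbox/repo.git"

git -c init.defaultBranch=master init -q "$sandbox/local"
cd "$sandbox/local"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "first commit"

# Register the remote, then publish master and set its upstream in one go
git remote add origin "$sandbox/repo.git"
git push -q -u origin master
upstream="$(git rev-parse --abbrev-ref 'master@{upstream}')"

# With the upstream in place, a plain git push is sufficient
echo "more" >> README.md
git commit -q -am "second commit"
git push -q

local_tip="$(git rev-parse master)"
remote_tip="$(git --git-dir="$sandbox/repo.git" rev-parse master)"
echo "upstream of master: $upstream"
```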
Add the remote origin to your local repository
git remote add origin https://github.com/YOUR_USERNAME/modular-calculator.git
Push the local repository to the remote, while simultaneously setting the upstream
git push -u origin master
Check the status of the repository on GitHub via your browser
Do some more commits, locally…
… and push them to the remote
git push is sufficient from now on
Git provides a clone subcommand that copies the whole history of a repository locally
git clone URL destination creates the folder destination and clones the repository found at URL
If destination is not empty, the command fails
If destination is omitted, a folder with the same name as the last segment of URL is created
URL can be remote or local, Git supports the file://, https://, and ssh protocols (ssh recommended when available)
The clone subcommand checks out the remote branch where the HEAD is attached (default branch)
git clone /some/repository/on/my/file/system destination
creates the folder destination and copies the repository from the local directory
git clone https://somewebsite.com/someRepository.git myfolder
creates the folder myfolder and copies the repository located at the specified URL
git clone user@sshserver.com:SomePath/SomeRepo.git
creates the folder SomeRepo and copies the repository located at the specified URL
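Cloning from a local path is a convenient way to observe what clone sets up. In this sketch the source repository, its content, and the folder names source and copy are all made up for illustration.

```shell
set -eu
sandbox="$(mktemp -d)"

# A small repository to clone from (hypothetical content)
git -c init.defaultBranch=master init -q "$sandbox/source"
cd "$sandbox/source"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "first commit"

# Clone it: the whole history is copied into the destination folder
git clone -q "$sandbox/source" "$sandbox/copy"
cd "$sandbox/copy"
remote_name="$(git remote)"                             # the source is saved as a remote
branch="$(git symbolic-ref --short HEAD)"               # the default branch is checked out
upstream="$(git rev-parse --abbrev-ref '@{upstream}')"  # and tracks its remote counterpart
history="$(git rev-list --count HEAD)"

# Cloning again into the same, now non-empty, destination fails
cd "$sandbox"
if git clone -q "$sandbox/source" "$sandbox/copy" 2>/dev/null; then
  second_clone="succeeded"
else
  second_clone="failed"
fi
echo "remote: $remote_name, branch: $branch, upstream: $upstream"
```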
git@somesite.com/repo.git is saved as origin
the main branch (the branch where HEAD is attached, in our case master) on origin gets checked out locally with the same name
the local branch master is set up to track origin/master as upstream
git clone https://github.com/unibo-dtm-se/repository-example.git
Such repository is an instance of template-project-work, i.e. a template for your final reports. It consists of a static Web-site, based on the Jekyll technology. You write .md files, and Jekyll generates the HTML for you. The site is then hosted on GitHub Pages.
Wait for the teacher to create and push a few more commits
Pull the commits from the remote
git pull
Ensure you have the teacher’s commits locally
git log --oneline
Let’s now try to exemplify a potential situation of conflict
Let’s select a few volunteers
The volunteer will be asked to edit one file and push the changes (say, file sections/01-concept/index.md), changing the content of the .md file and, possibly, deleting some prior content
The volunteer will be asked to push their changes
1. git add <edited files here>
2. git commit -m "Description of the changes"
3. git push
The teacher will edit some other file (different from the one the volunteer edited) and then commit
The teacher will attempt to push their changes
git push
Attempting to push shall result in a message like:
To somesite.com/repo.git
! [rejected] main -> main (fetch first)
error: failed to push some refs to 'somesite.com/repo.git'
hint: Updates were rejected because the remote contains work that you do
not have locally. This is usually caused by another repository pushing
to the same ref. You may want to first integrate the remote changes
(e.g., 'git pull ...') before pushing again.
See the 'Note about fast-forwards' in 'git push --help' for details.
(the hint explains it pretty well)
The situation is as follows:
9
10
, and the local history has commit 9
⬇️ git pull
⬇️
Now, the local history is a superset of the remote history
A new merge commit (12) is created in the local history, which has 10 and 11 as parents
Beware: in general, when creating the merging commits,
conflicts might arise if the same files were edited
We’ll discuss conflict resolution in a few slides
Assuming that you manage to create the merge commit with no issues…
⬇️ git push
⬇️
The push succeeds now!
(i.e., they are equal)
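The rejected push and its resolution can be replayed with two local clones. This is a sketch under assumptions: a local bare repository replaces the hosted one, the branch is named master, and the teacher and volunteer folders, identities, and file names are placeholders. The pull is run with --no-rebase so that the divergence is reconciled with a merge commit.

```shell
set -eu
sandbox="$(mktemp -d)"
git -c init.defaultBranch=master init -q --bare "$sandbox/remote.git"

# The teacher publishes the initial history
git -c init.defaultBranch=master init -q "$sandbox/teacher"
cd "$sandbox/teacher"
git config user.name "Teacher"
git config user.email "teacher@example.com"
echo "intro" > intro.md
git add intro.md
git commit -q -m "initial commit"
git remote add origin "$sandbox/remote.git"
git push -q -u origin master

# The volunteer clones, commits, and pushes first
git clone -q "$sandbox/remote.git" "$sandbox/volunteer"
cd "$sandbox/volunteer"
git config user.name "Volunteer"
git config user.email "volunteer@example.com"
echo "concept" > concept.md
git add concept.md
git commit -q -m "volunteer edits"
git push -q

# The teacher commits on a different file: histories have now diverged
cd "$sandbox/teacher"
echo "requirements" > requirements.md
git add requirements.md
git commit -q -m "teacher edits"
if git push -q 2>/dev/null; then first_push="accepted"; else first_push="rejected"; fi

# Pull (fetch + merge) creates a merge commit, then the push goes through
git pull -q --no-rebase --no-edit
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
git push -q
local_tip="$(git rev-parse master)"
remote_tip="$(git --git-dir="$sandbox/remote.git" rev-parse master)"
echo "first push: $first_push, parents of the merge commit: $parent_count"
```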
The teacher will pull the changes from the remote
git log --oneline
The teacher will push the changes to the remote, successfully
git push
The volunteer will be asked to pull the changes from the remote
The volunteer will be asked to edit some more files (say, sections/02-requirements/index.md), changing the content of the .md files and, possibly, deleting some prior content
The volunteer will be asked to push their changes
The teacher will edit the same file (the one the volunteer edited) and then commit
The teacher will attempt to pull the volunteer’s changes
git pull
Merge conflicts cannot be resolved automatically by Git, they require human intervention
Git tries to resolve most conflicts by itself
In case of conflict on one or more files, Git marks the subject files as conflicted, and modifies them adding merge markers:
<<<<<<< Current changes
Changes made on the branch that is being merged into,
this is the branch currently checked out (HEAD).
=======
Changes made on the branch that is being merged in.
>>>>>>> Incoming changes
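Once the conflicts are solved, the files are staged with git add, and the merge is concluded with git commit
Git proposes a default merge message: git commit --no-edit can be used to use it without editing
The teacher will solve the conflicts manually and then commit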
git commit --no-edit
git log
The teacher will attempt to push the changes to the remote
git push
Students will be asked to pull the changes from the remote
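A conflict and its manual resolution can be reproduced locally with two branches editing the same line. In this sketch the branch name volunteer, the file section.md, and the chosen resolution (keeping both lines) are illustrative assumptions.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "original line" > section.md
git add section.md
git commit -q -m "initial version"

# Two branches change the same line in different ways
git checkout -q -b volunteer
echo "volunteer's line" > section.md
git commit -q -am "volunteer edit"
git checkout -q master
echo "teacher's line" > section.md
git commit -q -am "teacher edit"

# The merge stops, and the file is filled with merge markers
if git merge volunteer >/dev/null 2>&1; then outcome="clean"; else outcome="conflict"; fi
markers="$(grep -c '^<<<<<<<' section.md)"
cat section.md

# Manual resolution (here: keep both lines), then stage and conclude the merge
printf "teacher's line\nvolunteer's line\n" > section.md
git add section.md
git commit -q --no-edit
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
git log --oneline --graph
```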
Pull before starting your working session
Make your commits locally
Push your changes to the remote, as frequently as possible
Make sure to push before your working session ends
Most commonly, while releasing version N, development teams are already working on version N+1
Most commonly, the development team is working on multiple features at the same time
To support many (different) development activities to occur simultaneously, developers exploit branches
A branch is a coherent development line
(cf. https://nvie.com/posts/a-successful-git-branching-model/)
the master (or main) branch contains commits describing the stable ($\approx$ publicly available & working) versions of the code
the develop branch contains commits where novel features under development are being integrated to create the next stable version; it is eventually merged into the master branch
hotfix branches (one per hotfix) are created whenever an urgent fix is needed on the stable version; they are merged into master, and into the develop branch too
feature branches (one per feature) are created whenever a new feature is being developed; they are merged into the develop branch, and eventually to the master branch
release branches (one per release) are created whenever a new version is being prepared for release; they are merged into the develop and master branches
the master branch contains the initial version of the report (which is equal to the template)
you create a develop branch
you create a feature/section-N branch for each section (N = 1, 2, …)
as soon as a section is completed, the corresponding feature/section-N branch is merged into develop, and then the feature/section-N branch is deleted
further edits may happen on develop to better integrate the section with the rest
once satisfied with the whole report you create a release branch, from develop
final adjustments are made on the release branch (e.g. date and version number in the front page)
once satisfied with the whole report you merge the release branch into master, and then the release branch is deleted
if revisions are requested by the teacher, you may create a hotfix branch from master
… and so on
Visualising branches
git branch – list the branches
git branch -a – list all the branches, including the remote ones
git branch -d branch-name – delete the branch branch-name
Switching among branches
git checkout BRANCH_NAME – switches to branch BRANCH_NAME
git checkout -b BRANCH_NAME – creates a new branch BRANCH_NAME and switches to it
Merging two branches
git merge BRANCH_NAME – merges the branch BRANCH_NAME into the current branch
Pushing a branch to the remote
git push – pushes the current branch to the origin remote
Pulling a branch from the remote ($\approx$ download + merge)
git pull – pulls the current branch from the origin remote
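The branch commands above can be chained into a miniature version of the workflow. This sketch is purely local (no remote involved) and assumes a default branch named master; the file section-1.md and the commit messages are placeholders.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "template" > README.md
git add README.md
git commit -q -m "initial version"

# Create develop from master, then a feature branch from develop
git checkout -q -b develop
git checkout -q -b feature/section-1
echo "first section" > section-1.md
git add section-1.md
git commit -q -m "write section 1"

# Integrate the feature into develop, then delete the feature branch
git checkout -q develop
git merge -q --no-edit feature/section-1
git branch -q -d feature/section-1

# Finally, bring develop into master
git checkout -q master
git merge -q --no-edit develop

branches="$(git for-each-ref --format='%(refname:short)' refs/heads)"
echo "$branches"
git log --oneline
```

git branch -d refuses to delete a branch that has not been merged yet, which protects against losing work by accident.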
The teacher will select 12 volunteers
Each volunteer will be asked to create a new branch, named feature/section-N (N = 1, 2, …, 12)
Each volunteer will be asked to edit the file sections/N-concept/index.md and push the changes
The teacher will pull the changes from the remote and merge them all into the develop branch
0. download edits from the remote: git fetch
1. switch to the feature/section-N branch: git checkout feature/section-N
2. update it: git pull
3. switch to the develop branch: git checkout develop
4. merge the feature/section-N branch: git merge feature/section-N
The teacher will merge the develop branch into the master branch (and delete the feature branches)
git checkout master
git merge develop
git branch -d feature/section-N + deletion on GitHub
Doing all the above via the CLI is fine, but GitHub provides a nice UI to visualise and track cooperative work.
Here are a bunch of GitHub features that are useful for cooperative work (cf. numpy’s repository)
Forks: create a personal copy of someone else’s project
Issues: keep track of bugs, enhancement proposals, and tasks to do, etc.
Pull requests: a proposal to merge changes from one branch to another, or from another fork
Wikis: (one per repository, made up by several pages) a place to document the project
Organizations: groups of users, with shared repositories, teams, and projects
Choose a name for your project (e.g. calculator)
Create the unibo-dtm-se-ACADEMIC_YEAR-calculator free organization on GitHub, where ACADEMIC_YEAR is the academic year (e.g. 22-23, 23-24)
Instantiate the template-project-work template to create the report repository inside the organization
Inside the report repository, create a develop branch, from GitHub’s UI
Create a new issue, e.g. for the abstract, and create a branch (say feature/abstract) from it
The volunteer should commit & push the new abstract on the feature/abstract branch
The volunteer should now create a new pull request to merge the feature/abstract branch into develop
The teacher will review the changes and merge the feature/abstract branch into develop