So, have you enjoyed all the previous conflicts? I bet you have. Well... if you liked
those, you will love this one!
3.2.1 Example 17
From OpenJDK’s JDK repo, check out revision 9d3e0870754 and merge
2978ffb9f9d4.
You will see a conflict pop up on file
langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml.
But it is not just a conflict. It is the conflict. Instead of having a few conflicted lines,
the whole file is conflicted. Here are some excerpts from it (line numbers are visible
on the left side):
Before I go into the details, let me show you what happened in the other
branch in terms of changes for this file:
Listing 3.19: example 17 - changes from the other branch

    $ git diff HEAD...MERGE_HEAD -- langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml
    diff --git a/langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml b/langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml
    index 8eaa2d77abc..29c473790e4 100644
    --- a/langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml
    +++ b/langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml
    @@ -1,7 +1,7 @@
     <?xml version='1.0' encoding='utf-8'?>

     <!--
    - Copyright 2003 Sun Microsystems, Inc. All Rights Reserved.
    + Copyright 2003-2009 Sun Microsystems, Inc. All Rights Reserved.^M
      DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.

      This code is free software; you can redistribute it and/or modify it

A simple one-liner of a change. Is the line different from what we have in
HEAD?
Listing 3.20: example 17 - file as it is on HEAD

    $ git show HEAD:langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml | head -n 5
    <?xml version='1.0' encoding='utf-8'?>

    <!--
     Copyright 2003 Sun Microsystems, Inc. All Rights Reserved.
     DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.

The line looks exactly like what is removed on the other branch... this looks
like a no-brainer, right? There should be no conflict whatsoever. What happened
then? There’s a short answer to this question but... why would I spare us the
pleasure of a long story?
Text files are made up of lines. git considers the separate lines that make up a file
to see which lines are the same and which lines have changed in order to merge
code. Now, have you ever asked yourself how a line is defined? Each line is
separated from the next by a marker called End Of Line (or EOL, for short).
But there is a tiny little problem: there are not one or two but three(!!!)
different EOL markers and each one is associated with a different operating
system.
Classic Mac OS (before OS X): CR (Carriage Return, char 0x0d, what we
normally write as \r in programming languages)

*NIX (including modern macOS): LF (Line Feed, char 0x0a, what we normally
write as \n in programming languages)

Windows: CRLF (Carriage Return followed by Line Feed, chars 0x0d 0x0a)

Here’s a little text file with different EOL formats so you can see the difference at
the binary level:
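A minimal sketch of the same idea in Python, so you can reproduce the byte-level view yourself. The two sample lines and the names encode_with_eol and hex_dump are made up for this illustration; only the EOL markers come from the list above.

```python
# Hypothetical two-line sample; only the line terminator differs per format.
LINES = ["first", "second"]

# The three EOL markers described in the text.
EOL_MARKERS = {
    "classic Mac (CR)": b"\r",
    "*NIX (LF)": b"\n",
    "Windows (CRLF)": b"\r\n",
}


def encode_with_eol(lines, eol):
    """Join the lines as ASCII bytes, terminating each one with `eol`."""
    return b"".join(line.encode("ascii") + eol for line in lines)


def hex_dump(data):
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{byte:02x}" for byte in data)


# Print one hex dump per EOL format so the markers can be compared.
for name, eol in EOL_MARKERS.items():
    print(f"{name:18} {hex_dump(encode_with_eol(LINES, eol))}")
```

In the printed dumps the letters are identical on every row; the only bytes that change are the 0d and 0a values that follow each word.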
On a new file, text editors tend to set the EOL format to the one associated with
the operating system that the editor is running on. If the file already existed, text
editors[5] will keep using the format that the file had when it was opened, even if
it is different from the one of the OS that the editor is running on. Go save a text
file with different EOL formats, open it in a hex editor and see how the markers
between the lines change.
You might be asking yourselves “How did we end up in this EOL mess?” It’s a
fair question to ask and, just like any other good story, it’s about betrayals,
backstabbings from close friends and greed but it’s way outside of the focus of this
manual so I will kindly ask you to read Wikipedia’s Newline History if you are
curious to know how this mess came to be.
Coming back to our problem at hand, from the point of view of git,
a change in EOL format will effectively change the content of each and
every single line that makes up the file. So, you open a preexisting file,
you change a single line and then save using the wrong EOL format, and
to git it’s like the whole file was cleared up and rewritten as a whole... in
Esperanto. It’s an entirely different content from top to bottom. Let me say it
again, just so that the concept sinks in: Completely (dramatic pause)
Different (dramatic pause) File. None of the preexisting lines survived that
revision.
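To make that point concrete, here is a small sketch that compares files line by line, terminators included. It uses Python's difflib as a stand-in for a line-based comparison; it is not git's own diff machinery, and the three-line sample and the name common_line_count are assumptions made for the illustration.

```python
import difflib


def common_line_count(old: bytes, new: bytes) -> int:
    """Count the lines that a line-based comparison finds in both versions."""
    # keepends=True keeps the EOL marker as part of each line's content.
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


# Hypothetical three-line file saved with Windows (CRLF) line breaks.
original = b"<!--\r\n Copyright 2003\r\n-->\r\n"
# Same file with one line edited, EOL format untouched.
one_line_edit = b"<!--\r\n Copyright 2003-2009\r\n-->\r\n"
# Same text as the original, but saved with *NIX (LF) line breaks.
eol_changed = b"<!--\n Copyright 2003\n-->\n"

print(common_line_count(original, one_line_edit))  # 2 of the 3 lines survive
print(common_line_count(original, eol_changed))    # 0: no line survives
```

The text of eol_changed is word for word the same as the original, yet not a single line compares equal once the terminator counts as part of the line.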
Now, is that really what happened here? Well, let’s get the info from
unix2dos to see how many line breaks of each type there are:
Listing 3.21: example 17 - file types

    $ git show HEAD:langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml | unix2dos -imud
    0 205 0
    $ git show MERGE_HEAD:langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml | unix2dos -imud
    205 0 0
    $ git show 4ae52d7dc1ef:langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml | unix2dos -imud
    205 0 0

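The three columns are the counts of DOS (CRLF), Unix (LF) and Mac (CR) line
breaks, in that order. On HEAD the 205 line breaks sit in the Unix column, while
on MERGE_HEAD and on the common ancestor 4ae52d7dc1ef they sit in the DOS
column, which indicates a change of EOL format on our side.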
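If you don't have unix2dos at hand, the same three numbers can be approximated with a few lines of Python. This is only a sketch of the counting idea, not a reimplementation of the tool; the function name count_line_breaks and the sample bytes are assumptions.

```python
def count_line_breaks(data: bytes):
    """Return (dos, unix, mac) line-break counts for the given bytes."""
    dos = data.count(b"\r\n")        # CR immediately followed by LF
    unix = data.count(b"\n") - dos   # LF that is not part of a CRLF pair
    mac = data.count(b"\r") - dos    # CR that is not part of a CRLF pair
    return dos, unix, mac


# Hypothetical samples: same text, different EOL formats.
windows_sample = b"<!--\r\n Copyright 2003\r\n-->\r\n"
nix_sample = b"<!--\n Copyright 2003\n-->\n"

print(*count_line_breaks(windows_sample))  # all breaks in the first column
print(*count_line_breaks(nix_sample))      # all breaks in the second column
```

When two revisions of the same file report their counts in different columns, as in the listing above, the EOL format changed somewhere between them.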
Before considering what to do about it, let us ask ourselves “why did this
happen in the first place?” To our own amazement, there are some rather simple
possibilities for this to come up:
Developer changed it on purpose. There might be a technical reason
for that (hard to come up with one, but...). But if it was done just for the
sake of it, it deserves to be called out because of the burden that it
creates downstream (you will see).

Editor (IDE, text editor) changed it without the developer being
aware of it. Hey, it happens! Have you ever opened a *NIX file in Windows
Notepad? Saved the file without thinking about it? There you go![6]
And I bet there are other editors out there that don’t keep the original
EOL format at all. Either way, this should have been caught by the
developer before publishing the changes for other people to pull, because if
another developer changes the file and then pulls (merges/rebases/cherry-picks)
the revision where the EOL format was changed, they end up with
a nasty full-length conflict like the one we are talking about... even if both
revisions only touch a single line. If you use a decent git
front-end to see what a revision looks like, you will see that, even if you
meant to change just a few lines of the file, the whole file shows up as
removed and then added back, from top to bottom. That’s the
tell-tale symptom that should make you realize that there is a problem,
and you should correct it before publishing.

git itself is getting in the way. git has a few tricks that can be used
to set the EOL format of files. Personally, I find them extremely difficult to
set up correctly, especially considering developers using different operating
systems, IDEs, etc. I always recommend setting up git to not care about
EOL formats and letting developers take care of them. This can be
done with a small setting in .gitattributes (one option is a line such as
* -text, which tells git not to do any EOL conversion) so that it is shared
by whoever works on the project. And if you are certain that you want
to have git take care of the EOL format, then make sure to read all the
details about it starting with git help attributes, specifically the section
related to checking-out and checking-in. That will make for a really
nice read. Last but not least, do not fall for the core.autocrlf trap.
Read what it is about before deciding what value you should use for
it[7].

Ok, ok... enough theory. Let’s consider the different approaches you might follow
to get out of this mess.
Stay on HEAD, bring over changes from the other branch
The first thing to notice is that this kind of defeats the purpose of the merge, right?
You would like to get all changes from the other branch automagically copied over
to your code and get conflicts only on the pieces that genuinely conflict. Now we are
in a situation where we need to do everything by hand. Luckily for us, in this case,
we have already seen what needs to be brought over from the other branch: we
need to adjust the years covered in the copyright. So these would be the first few
lines of the resulting file:
Listing 3.22: example 17 - HEAD with changes from the other branch

    <<<<<<< HEAD
    <?xml version='1.0' encoding='utf-8'?>

    <!--
     Copyright 2003-2009 Sun Microsystems, Inc. All Rights Reserved.
     DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.

Then we remove the other parts and the conflict markers. Then wrap up the
merge:
Listing 3.23: example 17 - Wrap up the merge

    $ git add langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml
    $ git merge --continue
    [detached HEAD b4a035c194c] Merge commit '2978ffb9f9d' into HEAD

That’s good. As a quick fix, this does the job... but you haven’t really tackled
the root cause of the problem. There is still an EOL-format discrepancy
between the branches. As a little experiment, let’s check out the revision we just
asked to merge, 2978ffb9f9d, make a little change to this file, and
then come back to the merge revision we just created and ask to merge
again.
Listing 3.24: example 17 - checkout 2978ffb9f9d

    $ git checkout 2978ffb9f9d
    Warning: you are leaving 1 commit behind, not connected to
    any of your branches:

      b4a035c194c Merge commit '2978ffb9f9d' into HEAD

    If you want to keep it by creating a new branch, this may be a good time
    to do so with:

     git branch <new-branch-name> b4a035c194c

    HEAD is now at 2978ffb9f9d Merge

I will now set the copyright to 2003-2020, for the sake of the
example[8]:
Listing 3.25: example 17 - modified file on top of 2978ffb9f9d

    <?xml version='1.0' encoding='utf-8'?>

    <!--
     Copyright 2003-2020 Sun Microsystems, Inc. All Rights Reserved.
     DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.

Then we finish the revision:
Listing 3.26: example 17 - creating new revision

    $ git add langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml
    $ git commit -m "A tiny change"
    [detached HEAD 0d57690bf36] A tiny change
     1 file changed, 1 insertion(+), 1 deletion(-)

At this point, the two revisions I want to merge now look like this (look at the
two revisions at the top):
Listing 3.27: example 17 - history of revisions

    * 0d57690bf36 A tiny change
    | * b4a035c194c Merge commit '2978ffb9f9d' into HEAD
    | |\
    | |/
    |/|
    * | 2978ffb9f9d Merge
    | * 9d3e0870754 Merge
    |/
    * 4ae52d7dc1e Added tag jdk7-b50 for changeset 7faffd237305

Now we will attempt the merge:
Listing 3.28: example 17 - new merge

    $ git checkout b4a035c194c
    Warning: you are leaving 1 commit behind, not connected to
    any of your branches:

      0d57690bf36 A tiny change

    If you want to keep it by creating a new branch, this may be a good time
    to do so with:

     git branch <new-branch-name> 0d57690bf36

    HEAD is now at b4a035c194c Merge commit '2978ffb9f9d' into HEAD
    $ git merge 0d57690bf36
    Auto-merging langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml
    CONFLICT (content): Merge conflict in langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml
    Automatic merge failed; fix conflicts and then commit the result.

Well, we got a conflict. And if you look inside, it will be again a full-file
conflict:
Listing 3.29: example 17 - conflicted file... again

      1 <<<<<<< HEAD
      2 <?xml version='1.0' encoding='utf-8'?>
      3
      4 <!--
      5  Copyright 2003-2009 Sun Microsystems, Inc. All Rights Reserved.
      6  DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
        ...
    205 </SerializedForm>
    206 </Doclet>
    207 ||||||| 2978ffb9f9d
    208 <?xml version='1.0' encoding='utf-8'?>
    209
    210 <!--
    211  Copyright 2003-2009 Sun Microsystems, Inc. All Rights Reserved.
    212  DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
        ...
    411 </SerializedForm>
    412 </Doclet>
    413 =======
    414 <?xml version='1.0' encoding='utf-8'?>
    415
    416 <!--
    417  Copyright 2003-2020 Sun Microsystems, Inc. All Rights Reserved.
    418  DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
        ...
    617 </SerializedForm>
    618 </Doclet>
    619 >>>>>>> 0d57690bf36

And I am kidding you not. As long as there are different EOL formats between
the branches involved in the merge (...or rebase... or cherry-pick... or
revert), you will get these totally useless full-file conflicts... every... single...
time.
Consider that, in this case, it was a very tiny change that had to be carried over
from the other branch. What if the changes were bigger? Multiple pieces? A few
fine changes that should get no conflict and then some conflicting ones? Are you
willing to sit down to copy stuff all day long? Not the best way to spend
your trained-for-tough-merges brain CPU cycles, right? Yeah, I thought
so.
Given that, in our example, the EOL-format change took place in our
branch, should we change the EOL format back to the original format
before attempting to merge? That might work. Let’s see. First, let’s start
over.
3.2.2 Example 17 - again... with a twist
From OpenJDK’s JDK repo, check out revision 9d3e0870754. Open the file and change
the EOL format to Windows[9].
Then commit. Then merge 2978ffb9f9d.
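If your editor won't do the conversion for you, the "change the EOL format to Windows" step can be sketched in a few lines of Python. This is just one possible way to do it, not necessarily the one used for the example; the names to_crlf and convert_file_to_crlf are made up here.

```python
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Rewrite every line break (CRLF, bare CR, bare LF) as CRLF."""
    # First collapse all formats to LF so nothing gets converted twice.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", b"\r\n")


def convert_file_to_crlf(path) -> None:
    """Convert a file in place; bytes are used so no decoding gets in the way."""
    file = Path(path)
    file.write_bytes(to_crlf(file.read_bytes()))


# Quick check on an in-memory sample that mixes the three formats.
print(to_crlf(b"one\ntwo\r\nthree\r"))
```

Running the conversion on a file that is already in Windows format leaves it untouched, so it is safe to apply more than once.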
And this time the merge went fine. We are so good. It’s not specified in that
console output, but the merge revision is cddf9316c72. So this is what we
should have done in the first place, right? Get both branches involved to have
the same EOL format as the last common ancestor and then merge.
Well, yes, it does work. You can merge (even better, we got no conflict this
time!). But, as an additional twist, let’s check out the original revision we
started working from, 9d3e0870754, modify some line of the file (a
harmless change), commit, come back to this new merge revision we
just created and try to merge again, shall we? What do you think will
happen?
Listing 3.31: example 17 - checkout 9d3e0870754

    $ git checkout 9d3e0870754
    HEAD is now at 9d3e0870754 Merge

Listing 3.32: example 17 - modified file on top of 9d3e0870754

    <?xml version='1.0' encoding='utf-8'?>


    <!--
     Copyright 2003 Sun Microsystems, Inc. All Rights Reserved.
     DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.

I added a second empty line before the XML comment.
And now, let’s check out cddf9316c72 and try to merge the revision we just
created.
Listing 3.35: example 17 - current history

    $ git checkout cddf9316c72
    Warning: you are leaving 1 commit behind, not connected to
    any of your branches:

      91df2b0b521 Adding empty line

    If you want to keep it by creating a new branch, this may be a good time
    to do so with:

     git branch <new-branch-name> 91df2b0b521

    HEAD is now at cddf9316c72 Merge commit '2978ffb9f9d' into HEAD
    $ git merge 91df2b0b521
    Auto-merging langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml
    CONFLICT (content): Merge conflict in langtools/src/share/classes/com/sun/tools/doclets/internal/toolkit/resources/doclet.xml
    Automatic merge failed; fix conflicts and then commit the result.

And I bet by now you have a good idea of how big that conflict is, don’t you? If
you don’t, here’s a tip. It starts with full and ends with -file conflict. And you can
see how by merging a branch that was started before we changed back the
EOL-format to what it was originally, we got a mess just as big.
Then how do we get out of this mess? The best approach would be to rewrite
history so that the EOL-format change never happens. I know, I know... it’s not
recommended as a general principle, but there are situations where it’s
worth it. I would say this is one of those situations. In the recipes section I
provide a rather simple way to rewrite the history of a branch, provided that it’s
straight (in other words, it has no merges), starting from the point where the
files still had the correct EOL format.
If rewriting history is not an option, then just set the EOL format back to the original
one on the branch where it was changed and deal with the consequences.
Unfortunately, you are dealing with a situation that shouldn’t have happened in
the first place. If a developer changed the EOL format of a file in a PR, it
should never have been accepted. It should have been rejected, and the developer
should have corrected the history of the branch so that the EOL-format change never
happens (the branch can be corrected rather effortlessly using the recipe,
ok?).
3.2.3 Exercises
Exercise 8 - using the script
From the exercises repo, check out branch exercise8/branchA and merge
exercise8/branchB. We should get a full-file conflict in primes.py.
Try to do this merge correcting history using the script from this recipe. Solution
is here.
3.2.4 Tips
Make sure that files never change their EOL-format... unless it is really
required.
If a file got an EOL-format change (for no legitimate reason), do not
accept that change.
Use the script described here to correct the history of a branch if you spot
an EOL-format change before it is merged into other branches.
Make sure to have your client-side tools set to not hide EOL-format
changes.
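As a rough aid for the tips above, here is a sketch of a check that flags an EOL-format change between two versions of a file, for example the content before and after a revision under review. It is an assumption-laden illustration (the names dominant_eol and eol_format_changed are made up, and "format" is taken to mean the most frequent line-break style), not a tool shipped with git.

```python
def dominant_eol(data: bytes) -> str:
    """Name the most frequent line-break style, or 'none' if there are no breaks."""
    dos = data.count(b"\r\n")
    counts = {
        "dos": dos,
        "unix": data.count(b"\n") - dos,  # bare LF
        "mac": data.count(b"\r") - dos,   # bare CR
    }
    style = max(counts, key=counts.get)
    return style if counts[style] else "none"


def eol_format_changed(before: bytes, after: bytes) -> bool:
    """True when both versions have line breaks but their dominant styles differ."""
    old, new = dominant_eol(before), dominant_eol(after)
    return "none" not in (old, new) and old != new


# Hypothetical versions of a file: a legitimate edit and an EOL-format change.
base = b"line one\r\nline two\r\n"
legit_edit = b"line one\r\nline 2\r\n"
eol_switch = b"line one\nline two\n"

print(eol_format_changed(base, legit_edit))  # False: same format, content edit
print(eol_format_changed(base, eol_switch))  # True: format changed
```

A check along these lines could run before a revision is published or accepted, so the change gets corrected while the history is still cheap to fix.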
Copyright 2020 Edmundo Carmona Antoranz
More content is coming soon.