Diff Package
by Doug Williams
m.douglas.williams at gmail.com
This package provides a simple diff-like capability in Racket. This includes diffs of arbitrary lists or of text files. It also includes an implementation of the longest common subsequence (LCS) algorithm, which is the basis of the diff algorithm.
The diff package is available from the PLaneT repository.
(require (planet williams/diff/diff)) |
1 Longest Common Subsequence (LCS)
The longest common subsequence (LCS) is the longest subsequence that is common to a set of sequences. We are solving the special case with exactly two sequences, which are represented as lists. This is the basis of the diff algorithm.
(list-lcs list-1 list-2 [test]) → list? |
list-1 : list? |
list-2 : list? |
test : (-> any/c any/c boolean?) = equal? |
Example:
(list-lcs '(a b c d f g h j q z) '(a b c d e f g h i j k r x y z) eq?)
(a b c d f g j z)
2 Diff
The diff algorithm computes the differences between two sequences in the form of what changes are required to the first sequence to produce the second sequence. This is based on the longest common subsequence.
Once the longest common subsequence (LCS) is computed (using list-lcs) it’s easy to produce the diff-like output: if an item is absent in the LCS but present in the first sequence, it must have been deleted; and, it an item is absent in the LCS by present in the second sequence, it must have been added.
2.1 List Diff
(list-diff list-1 list-2 [test]) → list? |
list-1 : list? |
list-2 : list? |
test : (-> any/c any/c boolean?) = equal? |
Examples:
(list-diff '(a b c d f g h j q z) '(a b c d e f g h i j k r x y z) eq?)
(a b c d (#:added e) f g (#:removed h) (#:added i) j (#:removed q) (#:added k r x y) z)
2.2 File Diff
A common use of diff is to print the differences between two text files - list the Unix diff command. The files are read as sequences of lines (i.e., string delimited by end-of-line sequences) and list-diff is used to compute the differences between files. The differences are then printed. Lines common to both files are denoted by "=|", lines that have been added are denoted by ">|", and lines that have been removed are denoted by "<|".
Example:
This part of the |
document has stayed the |
same from version to |
version. It shouldn't |
be shown if it doesn't |
change. Otherwise, that |
would not be helping to |
compress the size of the |
changes. |
|
This paragraph contains |
text that is outdated. |
It will be deleted in the |
near future. |
|
It is important to spell |
check this dokument. On |
the other hand, a |
misspelled word isn't |
the end of the world. |
Nothing in the rest of |
this paragraph needs to |
be changed. Things can |
be added after it. |
This is an important |
notice! It should |
therefore be located at |
the beginning of this |
document! |
|
This part of the |
document has stayed the |
same from version to |
version. It shouldn't |
be shown if it doesn't |
change. Otherwise, that |
would not be helping to |
compress anything. |
|
It is important to spell |
check this document. On |
the other hand, a |
misspelled word isn't |
the end of the world. |
Nothing in the rest of |
this paragraph needs to |
be changed. Things can |
be added after it. |
|
This paragraph contains |
important new additions |
to this document. |
>|This is an important |
>|notice! It should |
>|therefore be located at |
>|the beginning of this |
>|document! |
>| |
=|This part of the |
=|document has stayed the |
=|same from version to |
=|version. It shouldn't |
=|be shown if it doesn't |
=|change. Otherwise, that |
=|would not be helping to |
<|compress the size of the |
<|changes. |
>|compress anything. |
=| |
<|This paragraph contains |
<|text that is outdated. |
<|It will be deleted in the |
<|near future. |
<| |
=|It is important to spell |
<|check this dokument. On |
>|check this document. On |
=|the other hand, a |
=|misspelled word isn't |
=|the end of the world. |
=|Nothing in the rest of |
=|this paragraph needs to |
=|be changed. Things can |
=|be added after it. |
>| |
>|This paragraph contains |
>|important new additions |
>|to this document. |
3 References
"Longest common subsequence problem", Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Longest_common_subsequence_problem (Accessed January 13, 2011).
"Diff", Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Diff (Accessed January 13, 2011).
"Algorithm Implementation/Strings/Longest common subsequence", WikiBooks, Open books for an open world, http://en.wikibooks.org/wiki/Algorithm_implementation/Strings/Longest_common_subsequence (Accessed January 13, 2011).