Wednesday, August 26, 2009
Modeling a Diff tool…
Ok, here's the problem:
"Given two array of characters (in real world situations, one would be modified version of the other), determine the set of substrings that are "additions", or "changes" or "deletes" w.r.t the first array".
Example: If the first array is “To understand the solution you must understand the Prototype” and the second one “To really understand the solution, you should see the prototype”. The analysis result of the second array w.r.t first would be (for simplicity, I’ve taken a word break view) :
| To - Common really - Addition understand - Common the - Common solution, - Update (comma) you - Common should - Update (Previously: must) see - Update (Previously: understand) the - Common prototype - Update (first letter casing changed) |
The problem is very well known in the computer algorithm world & there are algorithms that solves it in linear time [ O(n) ]. You could use these algorithms in making "Diff" tools (example - WinMerge ).
What you see above is a very simple model for the Diff operation. For any client (web, desktop or otherwise … as long as they can talk to an implementation of above model) the starting point is the DiffTool class. Set the Base and Target. One can easily map a string , file content, serialized objects or memory segments to a byte array representation. I had another reason to choose byte array over stream, the reason being I had to expose the DiffTool as a web-service, hence clients would be remote to this service. Rest of the design of this model is pretty straightforward … client calls the Process method & gets DiffResult instance which has collection instance for commons, additions, deletes and updates. The All collection in the DiffResult contains the ordered (top-down) list of all Diffs as they appear.
Subscribe to Posts [Atom]
Post a Comment