Wednesday, August 26, 2009

 

Modeling a Diff tool…

Ok, here's the problem:
"Given two array of characters (in real world situations, one would be modified version of the other), determine the set of substrings that are "additions", or "changes" or "deletes" w.r.t the first array".
Example: If the first array is “To understand the solution you must understand the Prototype” and the second one “To really understand the solution, you should see the prototype”. The analysis result of the second array w.r.t first would be (for simplicity, I’ve taken a word break view) :
To                     - Common
really                 - Addition 
understand         - Common
the                    - Common 
solution,            - Update (comma) 
you                   - Common
should               - Update (Previously: must)
see                   - Update (Previously: understand) 
the                    - Common
prototype           - Update (first letter casing changed) 

The problem is very well known in the computer algorithm world & there are algorithms that solves it in linear time [ O(n) ]. You could use these algorithms in making "Diff" tools (example - WinMerge ). 
ClassDiagram1
What you see above is a very simple model for the Diff operation. For any client (web, desktop or otherwise … as long as they can talk to an implementation of above model) the starting point is the DiffTool class. Set the Base and Target. One can easily map a string , file content, serialized objects or memory segments to a byte array representation. I had another reason to choose byte array over stream, the reason being I had to expose the DiffTool as a web-service, hence clients would be remote to this service. Rest of the design of this model is pretty straightforward … client calls the Process method & gets DiffResult instance which has collection instance for commons, additions, deletes and updates. The All collection in the DiffResult contains the ordered (top-down) list of all Diffs as they appear.





Comments:

Post a Comment

Subscribe to Post Comments [Atom]





<< Home

This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]