Subject: comparison current update - TEXT
Here is an update on the comparison technique for the TEXT output method.

We handle the TEXT method much as we handle the XML output method: we produce an XML document (an "infoset" of the text) that contains only the information about the original text that is relevant for comparison. For text, this means we drop the differences between platform-dependent representations of line breaks.

Example. Original text document:

    This is the text 
    we want to 
    compare

The corresponding "text infoset":

    <text encoding="US-ASCII">
      <line>This is the text </line>
      <line>we want to </line>
      <line>compare</line>
    </text>

The text infoset is always serialized as UTF-8; the original encoding is recorded on the root node.

Note: For the first version I propose that we ignore the encoding info when doing the actual comparison. We can still include it for reference.

Note: The drawback of this approach is that there is no standard way to create such an infoset, though it is simple and achievable in any scripting environment.

The reason for producing XML as the "text infoset" is that it lets us reuse the same consistent "canonical" serialization method we use in the XML comparison.
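To illustrate how such an infoset might be built in a script environment, here is a minimal Python sketch. The function name `text_infoset` is hypothetical, and two details are assumptions that the proposal does not settle: only CR LF, CR and LF are treated as line breaks, and a line break at the very end of the text does not produce an extra empty `<line>`.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: only CR LF, lone CR and lone LF count as line breaks.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def text_infoset(raw: bytes, encoding: str = "US-ASCII") -> bytes:
    """Build a UTF-8 serialized "text infoset" from raw text bytes."""
    text = raw.decode(encoding)
    lines = _LINE_BREAK.split(text)

    # Assumption: a terminating line break does not add an empty last line.
    if lines and lines[-1] == "":
        lines.pop()

    # The original encoding is kept on the root node for reference only.
    root = ET.Element("text", {"encoding": encoding})
    for content in lines:
        ET.SubElement(root, "line").text = content

    # The infoset itself is always serialized as UTF-8.
    return ET.tostring(root, encoding="utf-8")


if __name__ == "__main__":
    windows = b"This is the text \r\nwe want to \r\ncompare"
    unix = b"This is the text \nwe want to \ncompare"
    # Both platform variants yield the same infoset bytes.
    print(text_infoset(windows) == text_infoset(unix))
```

The serialization step here is plain `ElementTree` output; in practice the result would be passed through the same canonical serialization used for the XML comparison before the two documents are compared.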