
office message


Subject: JTC1/SC34/WG5 Liaison Report: Measuring Interoperability

On 2013-06-17 I attended a meeting of ISO/IEC JTC1/SC34 WG5.  

The single topic of conversation was a proposal from China on a Measurement Model for Document Interoperability.

The contribution is by HOU Xia of Beijing Information Science and Technology University (BISTU).

There is a basis for developing a JTC1 Technical Report.  The proposal is to create a New Work Item for the development of the model and its application.  This may be in the category of Exploratory Work rather than a standards-track effort, at least for now.  There will be further discussion at the September SC34 Plenary, although the NWI proposal might not come in until later, possibly early 2014.

The measurement model is being used experimentally at a proof-of-concept level at this time.


The central feature of the measurement model is a hierarchical identification of document (format) features.  The idea is to capture essential features that are carried by document formats.  This is meant to be an abstraction of the features that individuals perceive and control in applications that consume, present, and produce the documents.

The identification of features is meant to be as independent of format specifics as possible, apart from the necessary dependence on the nature of electronic documents and ways that users interact with them.  

That creates the CONCEPTUAL MODEL.  For rich documents such as those supported by OOXML and ODF applications, the detailing of features is an extensive undertaking. 
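
As a rough illustration of what a hierarchical identification of features could look like, here is a minimal Python sketch. The `Feature` class, its `leaves` method, and the feature names ("bold", "footnote", "merged-cells") are all hypothetical and are not taken from the proposal; they only show a tree whose leaves are the finest-grained features.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Feature:
    """One node in a hypothetical conceptual feature hierarchy."""
    name: str
    children: List["Feature"] = field(default_factory=list)

    def leaves(self, prefix: str = "") -> Iterator[str]:
        """Yield slash-separated paths of the finest-grained features."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        if not self.children:
            yield path
        for child in self.children:
            yield from child.leaves(path)


# Illustrative feature names only (assumption); a real model for ODF or
# OOXML documents would be far larger.
model = Feature("document", [
    Feature("text", [Feature("bold"), Feature("footnote")]),
    Feature("table", [Feature("merged-cells")]),
])

leaf_paths = list(model.leaves())
```

The leaf paths are format-independent names, so the same identifiers could be reused when analyzing more than one format.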


A key step is the application of the model to standard formats.  That is, a particular format specification can be analyzed to determine how it supports features in the conceptual model, down to the finest details of the model.  Iterating on the application of the model to standards will lead to refinement of the conceptual model and possible adjustment of how features are identified from one document standard to another.  This will be a substantial effort; it clearly must start at some small level and be refined over time.

A standard might not reflect a feature, or might express it in quite a different way than other formats do.  This narrows the scope to features that are supported in one or more formats under consideration.  The number of document features is still extensive.

My presumption is that, as feature details in individual standards are rolled up into the conceptual model, one will end up with a top-down view of the extent to which the detailed conceptual features are supported in a given format.
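
One way to picture that roll-up is sketched below in Python. This is my own assumption about how it might be computed, not something stated in the proposal: `coverage_by_branch` is a hypothetical helper, and the feature paths and the supported set for "format A" are invented for illustration.

```python
from typing import Dict, Iterable, Set


def coverage_by_branch(leaves: Iterable[str], supported: Set[str]) -> Dict[str, float]:
    """Roll leaf-level support up to every ancestor path.

    Returns, for each path prefix, the fraction of leaf features beneath
    it that the format supports.
    """
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for leaf in leaves:
        parts = leaf.split("/")
        for depth in range(1, len(parts) + 1):
            prefix = "/".join(parts[:depth])
            totals[prefix] = totals.get(prefix, 0) + 1
            if leaf in supported:
                hits[prefix] = hits.get(prefix, 0) + 1
    return {p: hits.get(p, 0) / n for p, n in totals.items()}


# Hypothetical conceptual features and a hypothetical analysis of format A.
leaves = [
    "document/text/bold",
    "document/text/footnote",
    "document/table/merged-cells",
]
format_a_supported = {"document/text/bold", "document/table/merged-cells"}
coverage = coverage_by_branch(leaves, format_a_supported)
```

Reading `coverage` from the root downward shows where in the hierarchy a format's support thins out. Treating support as yes/no per leaf is a simplification; a feature expressed differently in two formats would need a richer record.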


Analysis of a corpus of documents is proposed to occur mechanically.  By examining documents in a particular format and identifying the features that are expressed in those documents, it is possible to estimate the prevalence of conceptual-feature usage.  This can be used to create weights with respect to feature significance in some domain of document creation and use.
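
A minimal Python sketch of that statistical step follows. It assumes feature detection has already been done, so each document is simply a set of feature names; the `feature_weights` function and the sample corpus are hypothetical, and normalizing raw occurrence counts is only one of several plausible weighting choices.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def feature_weights(corpus: Iterable[Set[str]]) -> Dict[str, float]:
    """Turn per-document feature sets into weights that sum to 1.

    A feature's weight is its share of all feature occurrences
    observed across the corpus.
    """
    counts: Counter = Counter()
    for features in corpus:
        counts.update(features)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Hypothetical corpus: each set lists the features found in one document.
corpus = [
    {"bold", "footnote"},
    {"bold"},
    {"bold", "merged-cells"},
]
weights = feature_weights(corpus)
```

Here "bold" appears in all three documents and receives the largest weight. A different corpus, drawn from a different domain of use, would give different weights for the same format.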


The identification of features with respect to a standard allows for comparison of the degree to which the overall conceptual feature set is supported in a given specification.  Although one can identify degrees of commonality, and possibly assess some sort of difficulty of feature-preserving translations between formats, there is no particular way to give any weight to such determinations.  

By considering cross-standard interoperability in the context of statistical document profiling, however, it is possible to ascertain the degree to which fidelity can be preserved by translation of features of such documents from one standard format to another.  

It may also be possible and desirable to create document templates that guide users toward features that satisfy whatever requirements exist for cross-format interoperability.


The metrics proposed include 

 - the difficulty of preserving an individual feature in conversion from format A to B

 - the relative importance of a feature in format A (as determined statistically)

The crude single measure is a normalized value of the interoperability from format A to format B given the importance of the features in the corpus of format A documents of concern.

The formulas for the normalized measurements have been derived.  
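
The actual formulas are not reproduced in this report. Purely to make the idea concrete, the Python sketch below assumes one plausible form: each feature has a conversion difficulty between 0 (preserved trivially) and 1 (cannot be preserved), and the score is the importance-weighted share of features that survive conversion. The `interoperability` function, the choice of `1 - difficulty`, and all the numbers are my assumptions, not the proposal's definitions.

```python
from typing import Dict


def interoperability(weights: Dict[str, float], difficulty: Dict[str, float]) -> float:
    """Assumed A-to-B score in [0, 1]: weighted mean of (1 - difficulty).

    Features with no recorded difficulty are treated as not convertible
    (difficulty 1.0), which is a deliberately pessimistic assumption.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    preserved = sum(
        w * (1.0 - difficulty.get(name, 1.0)) for name, w in weights.items()
    )
    return preserved / total


# Hypothetical importance weights from a corpus of format A documents,
# and hypothetical per-feature difficulties for converting A to B.
weights_a = {"bold": 0.6, "footnote": 0.2, "merged-cells": 0.2}
difficulty_a_to_b = {"bold": 0.0, "footnote": 0.5}
score = interoperability(weights_a, difficulty_a_to_b)
```

Note that such a measure is directional: the score from A to B depends on the corpus of A documents, so the score from B to A would generally differ.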

Presumably, one can also factor in an implementation's feature coverage as well, and implementers might be interested in making such determinations with regard to their intended community of software adopters.


I think it is extremely difficult to analyze specifications to the level that the representation of conceptual features is identified.  There is a tremendous number of details.  This is something that has to be handled by progressive deepening and refinement.  There will also be disagreements, making it necessary to be willing to iterate and also to gain expertise in different ways of accomplishing the same thing in the same and different formats.

Agreement on weightings (difficulty of conversion, importance of features, etc.) will be troublesome as well.  The question will be how this methodology can be applied in a constructive manner that does not create barriers to contribution of expertise by competitors.

 - Dennis
