oiic-formation-discuss message
Subject: Re: [oiic-formation-discuss] Welcome!
- From: robert_weir@us.ibm.com
- To: "Dave Pawson" <dave.pawson@gmail.com>
- Date: Thu, 5 Jun 2008 13:53:55 -0400
Hi Dave,
I don't want to take us too far down
the implementation path on this list -- we're really supposed to be discussing
the charter and terms of the proposed TC. However, it is a fair point
that we should be able to demonstrate feasibility, and that the proposed
goals of the TC are reasonable and achievable.
>
> >> Perhaps NVDL would help there, certainly xproc will be a useful tool.
> >>
> >
> > To my knowledge, no ODF implementation today actually writes additional
> > content to an ODF document in a foreign namespace, although the standard
> > allows this. But what I have seen is an application adding additional
> > attributes into an existing ODF namespace. But this is a simple validity
> > error and is caught directly by any validating parser. But Rick's
> > pre-validation NVDL is something we can add to our bag of tricks, in case it
> > ever comes up in the future.
>
> So a 'valid' file, with other namespaced content would be deemed
> invalid by a simple validating parser check?
> Are additional attributes (namespaced) also allowed?
>
>
It is a little more complicated than that. For
example, the math:math element in ODF 1.0 is defined to allow any content
under it:
<define name="mathMarkup">
  <zeroOrMore>
    <choice>
      <attribute>
        <anyName/>
      </attribute>
      <text/>
      <element>
        <anyName/>
        <ref name="mathMarkup"/>
      </element>
    </choice>
  </zeroOrMore>
</define>
There are cleaner ways of doing
this now, with NVDL, to describe compound documents, some defined with
DTDs, some with XML Schema, some with Relax NG, etc., but that is what
we had for ODF 1.0 back in 2005. So it is quite possible for someone
to place markup under math:math that is in a foreign namespace, and it
would still validate. But the text of the standard makes it clear
that only MathML is allowed there, so we could use other logic to verify
that this constraint is met.
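As a rough illustration of that "other logic", here is a minimal Python sketch that walks everything under math:math and reports elements outside the MathML namespace. The function name foreign_math_nodes is made up for this example, and it only looks at elements, not attributes.

```python
import xml.etree.ElementTree as ET

# Namespace URI of MathML, which ODF binds to the math: prefix.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def foreign_math_nodes(xml_text):
    """Return the tags found under math:math that are not in the MathML namespace."""
    root = ET.fromstring(xml_text)
    problems = []
    for math in root.iter("{%s}math" % MATHML_NS):
        for node in math.iter():  # includes the math element itself
            if not node.tag.startswith("{%s}" % MATHML_NS):
                problems.append(node.tag)
    return problems
```

A document that passes the Relax NG check can still return a non-empty list here, which is exactly the gap between the schema and the text of the standard.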
There are a few other places where
the RNG allows anything, but the text of the standard is more restrictive.
>
> >
> > When we talk about conformance with ODF, we're really talking about two
> > things, since the ODF standard defines document conformance as well as
> > application conformance. The former is the easier one to test, and lends
> > itself to automation.
> >
> > A full check of ODF document conformance would need to do something like:
> >
> > 1) Verify the document file name extensions and/or MIME content type and
> > verify that it matches the contents of the underlying document. An ODT file
> > containing a spreadsheet should be noted, for example.
> >
> > 2) Verify the correctness of the Zip container. Is it actually following
> > the referenced Zip specification?
> >
> > 3) Verify the referential integrity of the package. Does the manifest
> > reference files that don't exist, for example? Are all the required parts
> > present?
>
> Generally a programming task?
> File name matching etc,
> What about the compression?
>
>
Checking the compression structures would be part
of #2, I think. I'd avoid any approach that simply uses PkZip or
WinZip and passes anything that doesn't give an error. We really
need to verify the zip compression structures themselves.
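To make the idea concrete, the following Python sketch reads the first Zip local file header directly from the bytes instead of trusting a zip tool. It assumes the ODF package convention that the "mimetype" stream comes first, is stored uncompressed, and carries no extra field; the function name check_first_entry is hypothetical, and a real checker would go on to walk every local header and the central directory.

```python
import io
import struct
import zipfile

# Layout of the fixed 30-byte Zip local file header (little-endian).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")

def check_first_entry(data):
    """Inspect the first local file header of a package and list any findings."""
    if len(data) < LOCAL_HEADER.size:
        return ["file too short to hold a Zip local file header"]
    (sig, _version, _flags, method, _time, _date, _crc, _csize, _usize,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    if sig != b"PK\x03\x04":
        return ["missing local file header signature at offset 0"]
    findings = []
    name = data[LOCAL_HEADER.size:LOCAL_HEADER.size + name_len]
    if name != b"mimetype":
        findings.append("first entry is not 'mimetype'")
    if method != 0:  # 0 means "stored", i.e. no compression
        findings.append("first entry is compressed")
    if extra_len != 0:
        findings.append("first entry has an extra field")
    return findings
```

The io and zipfile imports are only needed to build sample packages in memory when trying this out, for example by writing a "mimetype" entry with zipfile.ZipFile and passing the resulting bytes to check_first_entry.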
> >
> > 4) Verify the Relax NG validity of each of the contained XML documents,
> > pre-processing as needed.
>
> How would you define pre-processing then?
> As needed seems a bit vague?
> 1. Remove all non ODF specified namespaced elements?
> 2. Remove all non ODF specified attributes?
> (Or not, since there is a potential invalidity here?)
> (what of namespaced attributes in non ODF namespaces?)
>
>
Yes and Yes.
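A minimal sketch of such a pre-processing pass, in Python, might look like the following. The list of known namespace prefixes is illustrative and incomplete (a real tool would derive it from the schema), and the names KNOWN_PREFIXES, is_foreign and strip_foreign are invented for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative, incomplete allow-list of namespaces used by ODF documents.
KNOWN_PREFIXES = (
    "{urn:oasis:names:tc:opendocument:xmlns:",
    "{http://www.w3.org/1999/xlink}",
    "{http://purl.org/dc/elements/1.1/}",
    "{http://www.w3.org/1998/Math/MathML}",
)

def is_foreign(qname):
    """True for a namespaced name that is outside the allow-list."""
    return qname.startswith("{") and not qname.startswith(KNOWN_PREFIXES)

def strip_foreign(elem):
    """Remove foreign-namespace attributes and child elements, in place."""
    for name in [n for n in elem.attrib if is_foreign(n)]:
        del elem.attrib[name]
    prev = None
    for child in list(elem):
        if is_foreign(child.tag):
            # Keep the text that followed the removed element.
            if child.tail:
                if prev is None:
                    elem.text = (elem.text or "") + child.tail
                else:
                    prev.tail = (prev.tail or "") + child.tail
            elem.remove(child)
        else:
            strip_foreign(child)
            prev = child
    return elem
```

The stripped tree can then be serialized and handed to the Relax NG validator. Attributes with no namespace are left alone here, so a stray attribute added to an existing ODF element still surfaces as an ordinary validity error.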
> >
> > 5) Verify additional referential integrity constraints. For example, the
> > content XML typically refers to named styles in the styles XML. These
> > cross-document references need to be checked.
>
> Schematron sounds ideal for this.
>
>
> >
> > 6) Verify the various micro-formats contained in ODF. There are some things
> > that are not easily expressible as a schema type, even using a regex. For
> > example, spreadsheet formulas, with their hundreds of functions, some with
> > variable arguments, which could take cell ranges, named ranges, or constants
> > as parameters. These are defined in the standard via EBNF. A full
> > conformance test would take each of these attributes and verify that they
> > match the production rules defined by the EBNF.
>
> Has anyone created a full grammar for these?
> Is grammar based validation most appropriate?
> How to collect them for validation?
>
There are around 14 places in ODF that have some sort
of micro-format. These range from 3D transforms, to spreadsheet formulas,
to SVG-like paths, etc. In ODF 1.0 these are not all described in
formal grammars. But the intent for ODF 1.2 is that they will all
have IETF-style EBNFs.
I don't know if there are more modern approaches to
doing this, but when I was a student we would use lex/yacc to create scanners
and parsers for each of these EBNFs, and call them from the appropriate
spots. Maybe there is a better way today?
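For a sense of what one such checker involves, here is a small hand-written recursive-descent sketch in Python, standing in for a lex/yacc-generated parser. The grammar in the comment is a toy invented for illustration and is not taken from the ODF text; tokenize, Parser and conforms are likewise hypothetical names.

```python
import re

# Toy grammar (an assumption for illustration only):
#   call = NAME "(" [ arg { ";" arg } ] ")"
#   arg  = call | CELL [ ":" CELL ] | NUMBER
TOKEN = re.compile(
    r"(?P<NUMBER>\d+(?:\.\d+)?)|(?P<CELL>[A-Z]+\d+)|(?P<NAME>[A-Z]+)|(?P<PUNCT>[();:])")

def tokenize(text):
    """Split the attribute value into (kind, value) tokens, skipping whitespace."""
    pos, out = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError("unexpected character at %d" % pos)
        out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def expect(self, kind, value=None):
        k, v = self.peek()
        if k != kind or (value is not None and v != value):
            raise ValueError("expected %s at token %d" % (kind, self.i))
        self.i += 1

    def call(self):
        self.expect("NAME")
        self.expect("PUNCT", "(")
        if self.peek() != ("PUNCT", ")"):
            self.arg()
            while self.peek() == ("PUNCT", ";"):
                self.i += 1
                self.arg()
        self.expect("PUNCT", ")")

    def arg(self):
        kind, _ = self.peek()
        if kind == "NAME":
            self.call()
        elif kind == "CELL":
            self.i += 1
            if self.peek() == ("PUNCT", ":"):
                self.i += 1
                self.expect("CELL")
        else:
            self.expect("NUMBER")

def conforms(text):
    """True if the whole string matches the toy 'call' production."""
    try:
        parser = Parser(tokenize(text))
        parser.call()
        return parser.i == len(parser.tokens)
    except ValueError:
        return False
```

Each method corresponds to one production, so a checker for a real micro-format would follow the same shape with the productions from the standard's EBNF substituted in.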
> Not an easy one by the sounds of it.
> More appropriately, how to formally define validity for these cases.
>
>
> >
> > 7) Other recommendations of the ODF standard, even where not conformance
> > requirements. These should be checked, and warnings (not errors) emitted.
>
> Good. Second definition required, when warnings and when 'errors'
>
> > For example, we have a number of accessibility best practices that could be
> > statically verifiable. Similarly, we can have portability warnings. For
> > example, a spreadsheet can have as many rows as it wishes, but for
> > portability we might recommend no more than 64K rows.
>
> Might? Shouldn't this spec be explicit? How to validate against 'might' :-)
> How to recognise these 'recommendations' in the spec?
>
>
By "might" I mean I'm too lazy to look up
whether we actually make that recommendation. But I do know that
David Wheeler has been putting similar portability recommendations into
his OpenFormula drafts. In any case, formal provisions of
the standard will clearly state what is mandatory ("shall") as
well as what is recommended ("should"). A reasonable mapping
would be to consider violations of the former to be errors, and violations
of the latter to be warnings.
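That mapping is simple enough to sketch in a few lines of Python. The Finding record, the provision identifiers and the document_conforms helper below are all hypothetical and only illustrate the shall/should split.

```python
from dataclasses import dataclass

# Normative keyword of the violated provision -> severity reported by the tool.
SEVERITY = {"shall": "error", "should": "warning"}

@dataclass
class Finding:
    provision: str  # hypothetical identifier of the clause that was checked
    keyword: str    # normative keyword used by that clause: "shall" or "should"
    message: str

    @property
    def severity(self):
        return SEVERITY[self.keyword]

def document_conforms(findings):
    """A document fails only if a mandatory ("shall") provision is violated."""
    return all(f.severity != "error" for f in findings)
```

Under this scheme a spreadsheet exceeding a recommended row limit would still conform but would carry a warning, while a manifest pointing at a missing file would be an error.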
> >
> > There are probably other pieces as well, but that's an outline of what we
> > could do for document conformance. Ideally I'd like any such tool to be
> > event-driven (like SAX) and pluggable, so other modules can be independently
> > developed and later added.
>
> xproc seems a good candidate wrapper.
> Ordering and when to halt then becomes an issue.
>
I'm not familiar with xproc, but it looks interesting.
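Independently of xproc, the event-driven, pluggable shape mentioned in the quoted outline could be sketched with Python's standard SAX parser as below. Dispatcher, ElementCounter, run and the start_element plug-in interface are assumptions made up for this example, not an existing tool's API.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

class Dispatcher(ContentHandler):
    """Forward each SAX start-element event to every registered plug-in."""
    def __init__(self, plugins):
        super().__init__()
        self.plugins = plugins

    def startElementNS(self, name, qname, attrs):
        for plugin in self.plugins:
            plugin.start_element(name, attrs)

class ElementCounter:
    """Example plug-in: count elements per namespace URI."""
    def __init__(self):
        self.counts = {}

    def start_element(self, name, attrs):
        uri, _local = name  # with namespaces on, name is (uri, localname)
        self.counts[uri] = self.counts.get(uri, 0) + 1

def run(xml_bytes, plugins):
    """Parse once and let every plug-in observe the same event stream."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(Dispatcher(plugins))
    parser.parse(io.BytesIO(xml_bytes))
    return plugins
```

New checks are added by writing another class with a start_element method and passing it in the plugins list, so modules can be developed separately and the document is still parsed only once.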
> How to link in programmatic(or shell scripted) validation
> with xml based validation.
>
>
> Far more solid start Robert, thanks.
> Is there any requirement for an instance to 'look alike' in two
> implementations?
>
> I've heard that expressed as a definition of portability in the past
>
From the end-users' perspective, this is certainly
an expectation, that interoperability means that the document looks and
behaves the same regardless of what ODF editor they use. But not
all uses of ODF involve end-users on a desktop with a display. So
the ODF standard does not say that "bold" text must be displayed
with 200% font weight or else the implementation is not conformant. If
we did that, then a search engine that doesn't display the text at all,
but uses the bold tag to increase the weight of the bold terms in the term
index would not be conformant. And the screen reader that reads the
bold text with vocal emphasis would not be conformant. So ODF essentially
says bold indicates bold and an application should do whatever it does
with bold text.
However an application that has runtime semantics
that are repugnant to the semantics of ODF should be at least warned.
For example, if an application takes bolded text and reverses the
letters in those words and moves them into a footnote on the previous page,
then that would certainly hurt interoperability.
Similarly, colors in ODF are expressed as RGB values.
So they are relative to a color model where the actual rendered
colors will be device dependent. So a circle filled with 'red' will
be whatever the device considers to be 'red', combined with whatever ambient
lighting conditions add to the color.
Now we could strictly define absolute colors and the
exact typographical meaning of "bold", and nail down every detail
of how ODF renders, but in the end you would have something quite different
than ODF. It is a trade-off. HTML's rendering model has much
greater latitude than PDF does, especially when dealing with text flow
and different window sizes. HTML can reflow. PDF just scales.
So which is more interoperable? The pre-press person and the
person trying to read the document on a Blackberry might respond differently.
That said, I think there is room for this proposed
TC to tackle some of the rendering issues. We're not going to turn
ODF into PDF. But we can certainly identify the areas where implementations'
divergent renderings cause the greatest interoperability problems, and
propose changes to the vendors and to the ODF TC to improve the situation.
In the end, interoperability problems can come from
problems in the standard or problems in the application. On the standard
side we have:
1) Ambiguities — The specification may describe a
feature in a way that is open to more than one interpretation. This may
be caused by imprecise language, or by incomplete description of the feature.
For example, if a specification defines a sine and cosine function, but
fails to say whether their inputs are in degrees or radians, then this
function is ambiguous.
2) Out of scope features — The specification totally
lacks description of a feature, making it out of scope for the standard.
For example, neither ODF nor OOXML specifies the storage model, the syntax
or the semantics of embedded scripts. If a feature is out of scope, then
there is no expectation of interoperability with that feature.
3) Undefined behaviors — These may be intentional
or accidental. A specification may explicitly call out some behaviors as
"undefined", "implementation-dependent" or "implementation-defined".
This is often done in order to allow an implementation to implement the
feature in the best performing way. For example, the size of integers is
implementation-defined in the C/C++ programming languages, so implementations are
free to take advantage of the capabilities of different machine architectures.
Even a language like Java, which goes much further than many to ensure
interoperability, has undefined behaviors in the area of multi-threading,
for performance reasons. There is a trade-off here. A specification that
specifies everything and leaves nothing to the discretion of the implementation
will be unable to take advantage of the features of a particular platform.
But a specification that leaves too much to the whim of the implementation
will hinder interoperability.
4) Errors — These may range from typographical errors,
to incorrect use of control language like "shall" or "shall
not", to missing pages or sections in the specification, to inconsistency
in provisions. If one part of the specification says X is required, and
another says it is not, then implementations may vary in how feature X
is treated.
5) Feature Creep — A standard can collapse under
its own weight. There is often a trade-off between expressiveness of a
standard (what features it can describe) and the ease of implementation.
The ideal is to be very expressive as well as easy to implement. If a standard
attempts to do everything that everyone could possibly want, and does so
indiscriminately, then the unwieldy complexity of the standard will make
it more difficult for implementations to implement, and this will hinder
interoperability.
And on the application side we have:
1) Implementation bugs — Conformance to a standard,
like any other product feature, gets weighed against a long list of priorities
for any given product release. There is always more work to do than time
to do it. Whether a high-quality implementation of a standard becomes a
priority will depend on factors such as user-demand, competition, and for
open source projects, the level of interest of developers contributing
to the community.
2) Functional subsets — Even in heavily funded commercial
ventures standards support can be partial. Look at Microsoft's Internet
Explorer, for example. How many years did it take to get reasonable CSS2
support? When an application supports only a subset of a standard, interoperability
with applications that allow the full feature set of the standard, or a
different subset of the standard, will suffer.
3) Functional supersets — Similarly, an application
can extend the standard, often using mechanisms allowed and defined by
the standard, to create functional supersets that, if poorly designed,
can cause interoperability issues.
4) Varying conceptual models — For example, a traditional
WYSIWYG word processor has a page layout that is determined by the metrics
of the printer the document will eventually print to. But a web-based editor
is free from those constraints. In fact, if the eventual target of the
document is a web page, these constraints are irrelevant. So we have here
a conceptual difference, where one implementation sees the printed page
as a constraint on layout, and another application is in an environment
where page width is more flexible. Document exchange between two editors
with different conceptual models of page size will require extra effort
to ensure interoperability.
(Users are also part of the interoperability equation.
A user who enters "see page 23" rather than using a dynamic
link, or who right-aligns a page header by inserting 57 spaces, that
user is creating a non-portable document, in the same way that a programmer
writing C code that depends on the size of an integer is writing non-portable
code.)
From a practical standpoint, what I've found, based
on the interoperability workshop we had in Barcelona last year, was that
the functional subsets problem was the major contributor to rendering interoperability
problems. This was obviously based on the relative maturity of implementations.
Not everyone had implemented all of the standard. We also
found plenty of implementation bugs. But I didn't see any cases where
one vendor said "I thought the spec said X" and another said
"I thought the spec said Y". So that is why I think creating
an ODF test suite is a worthwhile endeavor.
Regards
-Rob