
Subject: Re: [dita-lightweight-dita] Full DITA compatibility


There are several approaches.

Same doctype, segregated storage: This is the "use DITA more simply" approach. It works well for expeDITA because the data managed by the site is not directly updated by any tools from outside the site. You cannot use the site to import and browse latest DITA 1.3 content because support exists only for DITA 1.1. On the other hand, content developed within a subsetted ecosystem is directly consumable by other standard tools. Just don't use a round trip excursion to add beyond-1.1 dependencies into the subsetted content.

Same doctype, subsetted DTD, segregated storage: This is the Bernard Aschwanden school of modifying the .mod content models to remove elements. A document constrained in this way is fully a subset of the referenced DTD. But externalized content has the same problem of "complexity infection" during its round trip, which would make it invalid when imported back into the constrained context. It is still fully consumable by standard tools. This approach has the advantage of actually making the DTD loading overhead somewhat smaller for dynamic-rendering applications. Content originated under a subsetted DTD can be authored in a more conventional DITA environment to get access to all the usual DITA features, but at that point it cannot go back to the simplified authoring interface.
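
(To make the mechanism concrete, here is a sketch of that kind of .mod-level subsetting. The entity and file names are only placeholders for the shape of the technique, not the actual DITA declarations; the point is that in a DTD the first declaration of a parameter entity wins, so a shell can narrow a content model before pulling in the base module.)

  <!-- narrowed content model, declared before the base module is read -->
  <!ENTITY % section.content "(title?, (p | ul | ol)*)">

  <!-- the base module's own declaration of %section.content; is then
       ignored, because this earlier declaration takes precedence -->
  <!ENTITY % topic-type SYSTEM "topic.mod">
  %topic-type;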

Different doctype: When the unique Lightweight DITA doctype is present on a document, it can be shared freely across applications that are aware of that doctype. In a way, Michael's current LwD DTDs are a form of subsetted DTD with a new doctype, which averts the problem of unintended "complexity infection" and invalidation that can occur in the Aschwanden approach. But note! At this point, you CAN go further and reduce the DTDs by subsetting a la Aschwanden to produce an even more constrained authoring subset, as long as you abide by the guidelines for "same doctype, subsetted DTD." In other words, given any standard full definition of an application scope, you can always subset it, but your new application must manage the export/import of such content to the outside world.
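
(For illustration only, since the actual identifiers are Michael's to define: the "different doctype" signal is simply what sits at the top of the instance, along these lines.)

  <!DOCTYPE topic PUBLIC "-//EXAMPLE//DTD Lightweight DITA Topic//EN"
                         "lw-topic.dtd">
  <topic id="getting-started">
    <title>Getting started</title>
    <body>
      <p>Only the lightweight content model is allowed in here.</p>
    </body>
  </topic>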

Markdown: This form of content has an implicit doctype that is fairly easy to unpack reliably into any uphill DITA application (per Jarno's on-the-fly import at build time). The authoring DTD (as it were) can be viewed as a subsetted form of HTML relying strongly on SGML-ish shorttag-defined impliable markup. It is clever and workable; I personally don't like learning another way to author, but if someone knows and loves this, power to them, and we can support them. The complexity infection problem still exists: once this content is uplifted to DITA and new DITA features are added at that time, there is no going back without loss of that added function.
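
(A sketch of that unpacking, with the mapping invented for the example rather than taken from Jarno's actual rules: the Markdown source is shown in the comment, followed by one plausible DITA rendition.)

  <!--
      # Install the widget

      Unpack the archive.

      * Verify the checksum first.
  -->
  <topic id="install-the-widget">
    <title>Install the widget</title>
    <body>
      <p>Unpack the archive.</p>
      <ul>
        <li>Verify the checksum first.</li>
      </ul>
    </body>
  </topic>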

Paragraph styles: We can abstract the experience back to word processors and dictation systems and form fields, as long as editors in those systems can be guided by very simple schemas for allowable "new component" insertion rules that fit a desired information type. I explained this approach here: http://learningbywrote.com/blog/2011/04/dictation-for-structured-writing/. Note that the dictation chunks are very little different from fields in a form, where the form allows new fields to be inserted as allowed by the schema's contextual cues.
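
(The "very simple schema" does not have to be elaborate. Even a toy DTD like this one, with element names invented for the example, is enough for an editor, form, or dictation front end to know which new component it may offer at any given point.)

  <!ELEMENT faq-entry    (question, answer, related-link*)>
  <!ELEMENT question     (#PCDATA)>
  <!ELEMENT answer       (#PCDATA)>
  <!ELEMENT related-link (#PCDATA)>
  <!ATTLIST related-link href CDATA #REQUIRED>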

Template-informed compliance: A master template can generate a form as it is parsed, and hints in the template can make the forms engine aware of allowable changes to the ingested structure. I like this approach quite a bit because the template can be authored in a standard DITA environment to ensure its own validity. But I have not gotten it to work smoothly yet. Still, it is an approach that some parties may elect to pursue. Did I say I like it quite a bit? I have a demo of the ability to ingest anything; I just have not put in the controls for modifying the form options once loaded, or for pulling it all back out into storage.
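
(One way such hints might be carried, shown only to give the idea shape; the attribute values are invented, not a proposal. The template stays valid DITA, and the forms engine reads the hints to decide which parts of the ingested structure may be repeated or removed.)

  <topic id="release-note-template">
    <title>Release note</title>
    <body>
      <section outputclass="form:repeatable">
        <title>Change</title>
        <p outputclass="form:required">Describe the change.</p>
      </section>
    </body>
  </topic>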

And at this point, let's acknowledge that if we can parse any of the above file-based formats into form fields, then we can store the "document" as fields in a relational database. From there, queries against the content become much easier, since we don't have to run grep across all the files in a folder (apart from XML databases, which preserve the document as an entity). One analogy for this is the "bottle tossed into the sea": is the bottle in the sea, or is the sea in the bottle? Form and content are merely views applied by the desired storage model. With great effort, the file and the database can be made to store content in fully equivalent modes.

In all the simplified approaches, we tend to lose the ability to provide semantically enriched phrases and lists. The semantics can be inferred by context (i.e., a <em>value</em> in an API reference is usually a var) or by sidebar widgets that apply selected terms to the metadata associated with a field (an approach popularized by Rick Yagodich).
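
(For instance--my own toy example, not a defined mapping--the same sentence authored in the simplified model, and the enrichment that context allows an importer to infer.)

  <!-- authored in the simplified, HTML-ish model, inside an API reference -->
  <p>Set <em>timeout</em> to 30 seconds.</p>

  <!-- one enrichment the importer could infer in the full model -->
  <p>Set <synph><var>timeout</var></synph> to 30 seconds.</p>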

Given the ability to infer segmentation when an apparent doctype is known, we can skip some amount of direct representation of section-level scoping in any of the markups. It comes with tradeoffs, of course. I have racked my brain over how to support cross-enterprise organizations using common authoring tools, and I have concluded that the problem is not easily solved without a lot of investment: either in separate authoring tools appropriate for each author group, or in a rich but expensive editor adapted to each author group. Either way, the content flows still need to be managed with respect to allowable "complexity infection" in the directions that the content may flow.

I believe the spec team does not need to define the authoring shortcuts as part of the spec; that is for user communities to do as they need. But in being aware of how far down the complexity scale those users may go, we can try to ensure that we don't prevent such flexible interpretations.

And yes, if I let this pattern of thinking get to me, it does keep me awake at night. We'll need to be careful not to boil the ocean. At least not at first!
--
Don

On 5/14/2015 6:56 AM, Noz Urbina wrote:
So then you're suggesting we could define both the production DTD and an
authoring DTD and guidelines on inferring from one to the other? As we're
not developing the tools, this seems like a very new design approach
indeed for the spec team.

On Mon, May 11, 2015 6:08 pm, Don R Day wrote:
Whether the scope of a section could be derived from a heading; whether
a section wrapper needs to be reflected in the authoring interface (it
would not need to be if the scope of the section were represented by a
form field); whether title/body chunks are topics or sections, which
perhaps an author need not distinguish anyway... and the overall
observation that if we are looking at things from an author's viewpoint,
containing divisions are not always apparent anyway. My point is that an
authoring DTD can represent the parts that need to be made salient for
authors while still being a proper or inferrable subset of a more
complete model--a freedom of design that we can take advantage of.
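(A quick invented illustration of that freedom: the authoring view might expose nothing more than flat title/body pairs, while the production model wraps the scope each heading implies.)

  <!-- what the author sees: flat chunks, no explicit wrappers -->
  <title>Prerequisites</title>
  <p>Install the SDK before you begin.</p>

  <!-- what the production model stores or infers -->
  <section>
    <title>Prerequisites</title>
    <p>Install the SDK before you begin.</p>
  </section>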

On 5/11/2015 10:21 AM, Noz Urbina wrote:
What specifically would be in the authoring DTD vs Production in this
discussion? I'm not clear.


On Mon, May 11, 2015 4:21 pm, Don R Day wrote:
It may be useful during this discussion to keep in mind the distinction
between an authoring DTD and a production DTD. The authoring DTD
represents the necessary aspects of the information model that a writer
needs to be concerned with; in many ways it describes the ideal
authoring experience (where I will plug Rick Yagodich's useful book
Author Experience) that dovetails into (but doesn't surface) the more
arcane requirements of the underlying information model in the CMS.

Much of this discussion seems to be about the authoring model, and that
is fine as long as we keep that model separate from the underlying
information model (to avoid saying data model, which is more accurate
but not in the typical marketer's vocabulary).

On the other hand, after we've discussed this idealized authoring model, will the representative deep information model be at all satisfying to the actual business requirements of the organization, and will all constituencies buy into it?

This begets a hard question: can we ever get XML into the typical tools
that support Web-based organizations?

To be honest, selling XML is itself like selling fly strips, which are still useful but out-convenienced by dozens of more visually appealing JavaScript-based alternatives. The ideal role for XML may lie in defining the relationship between the authoring model and the actual systems where users will be storing their data, which is largely in field-oriented databases rather than file systems. XML defines and guides the templates; the DITA processing logic itself gets transferred to libraries that enable existing CMS publishing systems to emulate DITA behaviors that are described in terms that have value to Web programmers: reuse of components and the ability to apply personalization/adaptation to content requests (and perhaps others--these need to be teased out as DITA's value-add to the Web production stack used by marketers).

By the way, I totally agree that HTML5 should still consider the role of an unleveled heading. My preference would be for <label>, which could then be used in fig, section, table, and other places where our HTML transforms have crudely mapped to an H5 or a bolded paragraph for lack of a better match on the HTML side. The presence of <figcaption> in HTML5 justifies the general concept; they just need to get it into more contexts where it can be used the same way--to label chunks that are inline, not hierarchical. Off my soapbox now.
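
(Roughly what I mean, with file names invented and the second block being wished-for markup rather than real HTML5.)

  <!-- what HTML5 already blesses for figures -->
  <figure>
    <figcaption>Import pipeline overview</figcaption>
    <img src="pipeline.png" alt="Import pipeline overview">
  </figure>

  <!-- the wished-for parallel for other chunks; <label> is not
       valid here in today's HTML5 -->
  <section>
    <label>Prerequisites</label>
    <p>Install the SDK before you begin.</p>
  </section>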
--
Don

On 5/11/2015 5:55 AM, Joe Pairman wrote:
Cheers Noz. Just two quick clarifications before I need to leave the
thread for now:

I think that having users set whether a title is a title for nav or not is a simple work-around. Although that doesn't really fit well semantically into @chunk (not that @chunk is particularly clear or straightforward at the moment. @chunk=to-self could turn on titles in nav? I'm just riffing here).
Were there a need for authors to manually switch navigation on or off for specific nodes, something along those lines would make sense. As a general point, though, it's always been up to implementors how to define these kinds of navigational / presentational rules, and Lightweight DITA doesn't attempt to constrain that side of things further (at least if I've understood the initiative correctly).

(I'd started talking about navigation as an illustration of the intent
of <section>. While you *could* get section titles into navigation, it
seems like going against the grain, and you'd still be prevented from
nesting sections.)

Of course the tradeoff is that you can't then easily reorder the nested topics to suit a particular output context...
There's no problem with reordering nested topics provided the parent topic content doesn't move...
Right. I just wanted to point out that the tradeoff of keeping child topics in the same storage object is that you have to edit the CMS object / file to reorder them. If you're doing it for a specific output context only (re-ordering for a particular audience segment or product, perhaps), it means duplicating the object in some way. When each topic is a separate storage object, you can re-order topics in a specific map for that output context.
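
(For instance--topic and map names invented here--the same separately stored topics can simply be re-ordered in a second map for the other output context, with no duplication of content.)

  <!-- map for the general audience -->
  <map>
    <topicref href="overview.dita"/>
    <topicref href="install.dita"/>
    <topicref href="troubleshoot.dita"/>
  </map>

  <!-- map for the support audience: same topics, different order -->
  <map>
    <topicref href="troubleshoot.dita"/>
    <topicref href="install.dita"/>
    <topicref href="overview.dita"/>
  </map>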

--
Don R. Day
Founding Chair, OASIS DITA Technical Committee
LinkedIn: donrday   Twitter: @donrday
About.me: Don R. Day   Skype: don.r.day
"Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?"
--T.S. Eliot


