dita message

Subject: Re: [dita] Impacts of lightweight topics not being self-describing

From: Michael Priestley <mpriestl@ca.ibm.com>
To: Don R Day <donrday@contelligencegroup.com>
Date: Tue, 27 May 2014 09:25:44 -0400

You make some really good points. For the metadata, do you have a sense of what's really needed for lightweight, and would be directly authorable?

My first thought would be:

- map categories to attributes (unless they have to be human-readable text, not controlled keys)
- add prolog for keywords (tags) and data (everything else)
- control inclusion of the prolog with an entity in the DTDs so that it's easy to create a DTD without one (for example, to match markdown)
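
A minimal sketch of how these three points might look from a consumer's side, assuming a hypothetical lightweight topic in which categories sit in a `categories` attribute and an optional `prolog` holds `keyword` and `data` elements. The attribute name, the flattened prolog, and the `read_metadata` helper are all assumptions for illustration, not an agreed model:

```python
import xml.etree.ElementTree as ET

# Hypothetical lightweight topics: attribute and element names below are
# placeholders for discussion, not part of any agreed Lightweight DITA model.
TOPIC_WITH_PROLOG = """
<topic id="t1" categories="install linux">
  <title>Installing</title>
  <prolog>
    <keyword>setup</keyword>
    <keyword>quickstart</keyword>
    <data name="author" value="dday"/>
  </prolog>
  <body><p>Run the installer.</p></body>
</topic>
"""

# What a DTD built with the prolog entity switched off would allow.
TOPIC_WITHOUT_PROLOG = """
<topic id="t2">
  <title>Notes</title>
  <body><p>Plain content only.</p></body>
</topic>
"""

def read_metadata(xml_text):
    """Collect the metadata a file-based search could see in one topic."""
    topic = ET.fromstring(xml_text)
    prolog = topic.find("prolog")  # None when the topic has no prolog
    keywords, data = [], {}
    if prolog is not None:
        keywords = [k.text for k in prolog.findall("keyword")]
        data = {d.get("name"): d.get("value") for d in prolog.findall("data")}
    return {
        "id": topic.get("id"),
        # Controlled keys as a space-separated attribute value.
        "categories": topic.get("categories", "").split(),
        "keywords": keywords,
        "data": data,
    }

print(read_metadata(TOPIC_WITH_PROLOG))
print(read_metadata(TOPIC_WITHOUT_PROLOG))
```

The same reader handles both variants, so a tool would not need to know which DTD a given topic was authored against.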

Let me know what you think,

Michael Priestley, Senior Technical Staff Member (STSM)
Total Information Experience (TIE) Technology Strategist
mpriestl@ca.ibm.com
http://dita.xml.org/blog/25

From: Don R Day <donrday@contelligencegroup.com>
To: Michael Priestley/Toronto/IBM@IBMCA, OASIS DITA TC List <dita@lists.oasis-open.org>
Date: 05/20/2014 10:46 AM
Subject: [dita] Impacts of lightweight topics not being self-describing
Sent by: <dita@lists.oasis-open.org>

Lightweight DITA can display well enough in a live-rendering universe if directly linked or invoked. But because the data model lacks a prolog, these topics are invisible to file-based searches that attempt to build collections based on metadata (other than features already in the topics: doctype/topictype, XPath to content, or filename).

The baseline for functionality for comparison is a typical blog or wiki entry. The SQL schemas for this type of content [1][2] typically include:

postID (usually the primary key by which the item is stored; equivalent to either a topic ID or a topic filename, whichever one chooses to be the key locator)
postTitle
postBody (which, if it contains a "more" PI can be divided into an excerpt/shortdesc and "the rest of the body")
comment key (if not the postID itself)
feature image (may be used in sliders or feature posts, but not always part of other rendered content such as sidebar snippets)
kicker title

and a comprehensive set of data used to select by collection type:

tags (folksonomy or enumerated terms, often used in tag clouds)
categories (faceted filtering)
author (collections by author, with foreign keys into member/user tables)
date (collections by creation, publish date, archive date, etc.)
edit notes and/or status
related posts
included media (for collecting topics that contain videos or UI exhibits, for example)
obscure other, depending on application (Drupal node types and relations for example)

For the baseline case, this data represents one row of self-described content retrieved by "select * where id=$postid". By contrast, in the file-entity case, the Lightweight DITA topic as an "entity-as-row" self-presents only the first set of data (and not all of that); the rest must be carried in a hybrid database entry as needed. In other words, Lightweight DITA as a "file-only" entity cannot self-represent the data set that the "database-only" baseline carries.
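
To make the gap concrete, here is a rough sketch that compares the fields returned by a single-row query with the fields recoverable from a prolog-less topic file. The `posts` table, its column names, and the sample topic are assumptions loosely modeled on the lists above, not any real blog or wiki schema:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Baseline: one database row carries both content and collection metadata.
# Table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE posts (postID TEXT PRIMARY KEY, postTitle TEXT, postBody TEXT,"
    " tags TEXT, categories TEXT, author TEXT, date TEXT)"
)
conn.execute(
    "INSERT INTO posts VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("t1", "Installing", "Run the installer.", "setup", "install", "dday", "2014-05-20"),
)

def row_fields(post_id):
    """Return the whole self-described row for one post."""
    row = conn.execute("SELECT * FROM posts WHERE postID = ?", (post_id,)).fetchone()
    return dict(row)

# File-only case: a topic with no prolog exposes only id, title, and body.
LIGHTWEIGHT_TOPIC = (
    '<topic id="t1"><title>Installing</title>'
    "<body><p>Run the installer.</p></body></topic>"
)

def file_fields(xml_text):
    """Return only what the topic file itself can describe."""
    topic = ET.fromstring(xml_text)
    return {
        "postID": topic.get("id"),
        "postTitle": topic.findtext("title"),
        "postBody": "".join(topic.find("body").itertext()),
    }

# Fields the file cannot supply and a hybrid store would have to carry.
missing = sorted(set(row_fields("t1")) - set(file_fields(LIGHTWEIGHT_TOPIC)))
print(missing)  # ['author', 'categories', 'date', 'tags']
```

Everything in `missing` is what the hybrid data access layer has to fetch from somewhere other than the topic file.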

Whether this restricted inherent data model is important or not depends on your application. It complicates the logical data access layer, which must be hybrid rather than one or the other. Regular DITA topics come close to baseline equivalency if used with some metadata conventions. [3] For example, in a microsite or landing page application, the full DITA topic is usually sufficiently self-describing, as long as you explicitly identify "feature" images as othermeta (and have a convention for retrieving them) and don't use all the processing features that complicate direct rendering. [4]
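
One possible retrieval convention for the "feature" image, sketched under the assumption that it is recorded as an `othermeta` element whose `name` is `feature-image`. The element and its `name`/`content` attributes are standard DITA; the `feature-image` value and the `feature_image` helper are hypothetical:

```python
import xml.etree.ElementTree as ET

# A full DITA topic that records its feature image in the prolog.
# The name "feature-image" is an assumed local convention.
FULL_TOPIC = """
<topic id="landing">
  <title>Welcome</title>
  <prolog>
    <metadata>
      <othermeta name="feature-image" content="images/hero.png"/>
    </metadata>
  </prolog>
  <body><p>Intro text.</p></body>
</topic>
"""

def feature_image(xml_text, meta_name="feature-image"):
    """Return the content of the first matching othermeta, or None."""
    topic = ET.fromstring(xml_text)
    for meta in topic.iterfind("prolog/metadata/othermeta"):
        if meta.get("name") == meta_name:
            return meta.get("content")
    return None

print(feature_image(FULL_TOPIC))
```

A topic without the prolog simply yields `None`, which is the case the hybrid data layer would have to cover.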

I note all this to ensure that Lightweight DITA is appropriately disclaimed against user expectations of an equivalently simpler application. The current model only makes the input side simpler, and only until someone needs to enter other metadata into another form.

The alternative, to be weighed against the message of "utter simplicity," is that we add some of these features back in without being strictly limited by the contentEditable feature set, with the expectation of using a hybrid editor (with input fields for discrete metadata and contentEditable divs for the discourse, which is probably how a LWD editor will be designed anyway).

And I think it will soon be time to get a Subcommittee started where we can channel these design discussions into their own list. I'm willing to lead in the initiation of this, if needed.

-----
[1] http://codex.wordpress.org/Database_Description
[2] http://www.mediawiki.org/wiki/Manual:Database_layout
[3] But the gaps lead me to continue thinking towards a "DITA for the Web" that breaks with strict compatibility (and therefore would not be called "DITA" when that time comes) in order to enable Web applications to use structured content in ways that the current standard inhibits.
[4] https://groups.yahoo.com/neo/groups/dita-users/conversations/messages/34990

--
Don R. Day
Co-Founder, ContelligenceGroup.com
Past Chair, OASIS DITA Technical Committee
LinkedIn: donrday Twitter: @donrday
About.me: Don R. Day Skype: don.r.day
"Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?"
--T.S. Eliot
