cgmo-webcgm message

Subject: Re[2]: [cgmo-webcgm] Text searching
From: Lofton Henderson <lofton@rockynet.com>
To: Benoit Bezaire <benoit@itedo.com>,cgmo-webcgm@lists.oasis-open.org
Date: Thu, 04 May 2006 18:26:17 -0600

Hi Benoit,

Some technical replies for you (and Dieter)...

At 06:31 PM 5/3/2006 -0400, Benoit Bezaire wrote:
>I'm seeing the emails coming in about this topic. And I have to state
>that I don't understand how people get to such an understanding of the
>feature by reading what is in the specification. More inline...
>
>Wednesday, May 3, 2006, 6:04:20 PM, you wrote:
> > Benoit,
>
> > I think the example does not reflect the intentions of the authors.
>
> > It should be like this
> >>   (approx syntax)
> >>   BEGAPS 'myPara'
> >>    APSATTR 'content' 'Hello World';
> >>    ...
> >>    BEGAPS 'mySubpara'
> >>     APSATTR 'content' 'World';
> >>     ...
> >>    ENDAPS;
> >>   ENDAPS;
>
> > Hence the content attribute of the para would contain all the text of the
> > para, whereas the attribute of the subpara would contain the text of
> > the subpara only.
>Hmmm. Isn't this an assumption? I could see it used this way when using
>para/subpara on a raster; but that may not always be the case.

Perhaps it is an assumption, but it seems to me to be at least hinted at by 
the text of 3.2.1.3, 3.2.1.4, and 3.2.2.8.  (Or ... perhaps I'm too biased 
by what the 1.0 authors meant to say but didn't express unambiguously.)


>Regardless, doesn't Chris' question still stand?

That question is:  is para a block and subpara an inline?  Yes, we're going 
to have to answer the question somehow.  There are a couple of problems here.

First problem, para and subpara (as you pointed out in your proposed reply) 
are APS objects which group stuff which might not even be text.  So the 
question, as it stands, seems meaningless.  However, para+content could be 
viewed as a surrogate for or abstraction of the textual-related thingy 
inside its APS, and similarly for subpara+content.  Then you could phrase 
the question about those "surrogates".

Second problem, I still don't know what block and inline mean (Chris is 
consulting with an i18n guy before sending more info).  But from XHTML, a 
block element is like a 'p' and an inline element is like a 'span'.  Let's 
suppose HTML had a 'content' attribute (maybe you could do this example 
with 'title' attribute, which is typically used for a tooltip).

<p content="???">Hello <span content="world">world</span></p>

Would you expect ??? to reflect the entire content of the <p> element, or 
only that portion of the <p> element that is outside of the <span>?  I 
would expect the first, i.e., ??? should be "Hello world".

This is the way I think about para and subpara (and apparently some others 
do as well).  However, from the example that Chris posed, I may be entirely 
off base as to the meaning of "block" and "inline".

More...


> >> -----Original Message-----
> >> From: Benoit Bezaire [mailto:benoit@itedo.com]
> >> Sent: Wednesday, May 03, 2006 11:53 PM
> >> To: cgmo-webcgm@lists.oasis-open.org
> >> Subject: [cgmo-webcgm] Text searching
> >>
> >> Hi,
> >>
> >>   On the call today, Chris asked me the following question... Assume
> >>   we have:
> >>
> >>   (approx syntax)
> >>   BEGAPS 'myPara'
> >>    APSATTR 'content' 'Hello';
> >>    ...
> >>    BEGAPS 'mySubpara'
> >>     APSATTR 'content' 'World';
> >>     ...
> >>    ENDAPS;
> >>   ENDAPS;
> >>
> >>   And he does a text search on the string "Hello World", will he get a
> >>   hit, yes or no?
> >>
> >>   I believe this to be an indirect way of asking/answering if
> >>   'subpara' is an inline or a block.
> >>
> >>   If we say, yes there's a hit, then we've defined 'subpara' as
> >>   inline, if we say, no there's no hit, it's a block.

I'd say "no hit".  But the problem here is that the 1.0 authors designed 
this with a very specific ad hoc semantic in mind -- like <p> and <span> -- 
and the question is ... well, baffling to me still.

That doesn't mean that we can't answer it, once we know what block and 
inline mean, but we need to be a little careful about adding semantics that 
weren't there, and weren't intended, in 1.0.

Btw, we have other under-spec problems as well.  In this example

BEGAPS 'myPara'
   APSATTR 'content' 'Hello World';
   ...
   BEGAPS 'mySubpara'
      APSATTR 'content' 'World';
      ...
   ENDAPS;
ENDAPS;

Does a search on "World" return the para or the subpara?  (I would say the 
subpara -- "closest to leaf" -- and I think this is what users like Dave 
would expect.)
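
To make the "closest to leaf" idea concrete, here is a minimal sketch in 
Python.  It is only an illustration of the behavior I would expect, not 
anything the 1.0 text prescribes.  The Aps class and deepest_matches 
function are hypothetical names invented for this example; they are not 
part of WebCGM or of any DOM.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Aps:
    """Hypothetical in-memory stand-in for a para or subpara APS."""
    name: str
    content: Optional[str] = None  # value of the 'content' APS attribute
    children: List["Aps"] = field(default_factory=list)


def deepest_matches(aps: Aps, needle: str) -> List[Aps]:
    """Return the matching APSs that are closest to the leaves.

    Assumption: a parent is reported only when none of its
    descendants matches on its own 'content' attribute.
    """
    hits: List[Aps] = []
    for child in aps.children:
        hits.extend(deepest_matches(child, needle))
    if hits:
        return hits
    if aps.content is not None and needle in aps.content:
        return [aps]
    return []


my_para = Aps("myPara", "Hello World", [Aps("mySubpara", "World")])

print([a.name for a in deepest_matches(my_para, "World")])        # ['mySubpara']
print([a.name for a in deepest_matches(my_para, "Hello World")])  # ['myPara']
```

With 'Hello World' on the para and 'World' on the subpara, a search on 
"World" lands on the subpara, and a search on "Hello World" lands on the 
para.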

> >>
> >>   What's the answer?
> >>   The specification says the following (for para)... The WebCGM
> >>   prescription for priority of text search matching is: 'para' with
> >>   matching 'content' (1st priority match); 'para' without 'content'
> >>   but with recognizable single-element RESTRICTED TEXT match (2nd
> >>   priority match); or, single-element RESTRICTED TEXT match, outside
> >>   of any 'para' (3rd priority match).
> >>   And for subpara: See 3.2.1.3, 'para'.
> >>
> >>   In other words, it's not specified :(
> > I think that Chris wants to build a logical relationship between the
> > attributes where there is none. You search ONE attribute at a time,
> > not a combination of nested attributes.
>I don't come to the same conclusion. The above wording doesn't even say
>how to perform a search within RESTRICTED TEXT and APPEND TEXT
>(without the 'content' attribute).

As I suggested yesterday, perhaps that search-priority specification should 
be made into recommendations for search applications, non-normative, along 
with some clarification/guidance for how we expect 'content' to be used on 
para and subpara?  (Hello World on para, and just World on subpara).
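
If that priority text did become a recommendation for search 
applications, one literal reading of it could be sketched as below (in 
Python).  The Candidate record, its fields, and both function names are 
hypothetical and exist only for this illustration.  Note the assumption 
that a para whose 'content' is present but does not match produces no hit 
at all; the 1.0 wording does not actually say what happens in that case.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical record for one searchable item in a picture."""
    label: str
    in_para: bool                          # is the item inside a 'para' APS?
    content: Optional[str] = None          # 'content' attribute of that para
    restricted_text: Optional[str] = None  # single-element RESTRICTED TEXT


def match_priority(c: Candidate, needle: str) -> Optional[int]:
    """Return 1, 2 or 3 per the quoted priority list, or None for no hit."""
    rt_hit = c.restricted_text is not None and needle in c.restricted_text
    if c.in_para and c.content is not None:
        # 1st priority: 'para' with matching 'content'.
        # Assumption: a non-matching 'content' means no hit for this para.
        return 1 if needle in c.content else None
    if c.in_para:
        # 2nd priority: 'para' without 'content', RESTRICTED TEXT matches.
        return 2 if rt_hit else None
    # 3rd priority: RESTRICTED TEXT match outside of any 'para'.
    return 3 if rt_hit else None


def ranked_hits(candidates: List[Candidate], needle: str) -> List[Tuple[int, str]]:
    """Collect (priority, label) pairs, best priority first."""
    hits: List[Tuple[int, str]] = []
    for c in candidates:
        priority = match_priority(c, needle)
        if priority is not None:
            hits.append((priority, c.label))
    return sorted(hits)


candidates = [
    Candidate("C", in_para=False, restricted_text="Hello World again"),
    Candidate("B", in_para=True, restricted_text="Hello World"),
    Candidate("A", in_para=True, content="Hello World"),
    Candidate("D", in_para=True, content="Goodbye", restricted_text="Hello World"),
]

print(ranked_hits(candidates, "Hello World"))  # [(1, 'A'), (2, 'B'), (3, 'C')]
```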

More about RT and AT below.


> >>
> >>   Chris made it relatively clear that if we want to have these APS
> >>   types in WebCGM 2, we need to improve how they are specified.

Reluctantly agree.  But I think (as I said above), we need to be careful 
about adding (e.g., from some W3C CharMod model) some concepts or semantics 
that are unrelated to the original purpose of para/subpara/content.

Question for Dave: did this stuff derive from something in ATA?

> > I agree that this is all underspecified, however, the entire search
> > is still wide open, no syntax, nothing.
>I'm not sure what you mean by syntax? I would expect this to be a
>vendor feature (like the Search functionality in Web Browsers).
>
> > The only way to get access is limited by the DOM functions, which don't
> > allow you to access the RESTRICTED TEXT anyway if I remember this
> > correctly.
>
> > So right now, whoever wants to search, can retrieve the content
> > attribute of a para or subpara using the DOM, and he can then do
> > whatever he wants to perform a search therein.
>That sounds quite difficult to perform from a user's perspective.
>
> > I want to point out that I brought up this issue several times, it
> > is an important requirement of the Navy, but the group decided to
> > turn this down and to not define text search in WebCGM 2.0.
>Well, maybe it will have to be defined after all.
>
>Kind regards,
>  Benoit   mailto:benoit@itedo.com
>
> > Regards,
> > Dieter
> >>
> >>   So here are some thoughts...
> >>   I see RESTRICTED TEXT as a block.
> >>   I see APPEND TEXT as an inline.

That's a novel view!  Seriously, it is an intriguing idea.  But it diverges 
from the conventional ISO CGM:1999 picture of RT and AT.  AT is a syntactic 
artifice, invented solely for the purpose of changing text attributes 
within a single text primitive.  If you look at pages 108-111 of CGM:1999, 
you'll see that only a handful of things -- basically just text attributes 
-- are allowed between RT and AT.  So for example this is illegal:

BEGAPS 'myPara'
    APSATTR 'content' 'Hello World';
    RestrText (x,y,width,height) "Hello ";
    BEGAPS 'mySubpara'
       APSATTR 'content' 'World';
       ApndText final "World";
    ENDAPS;
ENDAPS;

Which is not to say that we couldn't put some search semantics, or impose a 
block/inline model, on a sequence of RT+AT+... +AT(final).  But I'd prefer 
that we don't go there.

> >>
> >>   So regardless of para/subpara/content... If 'Hello' is in a
> >>   RESTRICTED TEXT and 'World' in a child APPEND TEXT, a search on
> >>   "Hello World" would generate a hit. Does anyone agree with me?

Well, if there were a 'content' match, then 1.0 says that generates the hit 
(1st priority).

But assuming no content match, RT"Hello " + AT"World" would generate a hit 
for Hello World, IMO.  But I say that because, in my reading of CGM:1999, 
RT+AT+...+AT is logically a single, single-line text primitive.  Not 
because of a block-inline model (which I don't yet understand).
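
A small Python sketch of that reading follows.  The list-of-pairs 
representation and the two function names are hypothetical, chosen only 
to show the RT+AT+...+AT run being searched as one string rather than 
element by element.

```python
from typing import List, Tuple

# Hypothetical representation: one text primitive as a list of
# (element name, string) pairs, RESTRICTED TEXT first, then APPEND TEXTs.
TextRun = List[Tuple[str, str]]


def joined_text(run: TextRun) -> str:
    """Concatenate the strings of an RT+AT+...+AT run in order."""
    return "".join(text for _, text in run)


def run_matches(run: TextRun, needle: str) -> bool:
    """Search the run as one logical single-line text primitive."""
    return needle in joined_text(run)


run: TextRun = [("RESTRICTED TEXT", "Hello "), ("APPEND TEXT", "World")]

print(run_matches(run, "Hello World"))                  # True
print(any("Hello World" in text for _, text in run))    # False
```

Searching each element separately misses "Hello World"; searching the 
joined run finds it.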

> >>
> >>   I would be tempted to use the same logic on 'content'. I.e., if
> >>   'content' is specified on a para, it's a block. If it's specified on
> >>   a child subpara, it's an inline. However, I don't know if the
> >>   current search functionality provided by vendors adopts the same
> >>   logic?!

I think it does not.  But the vendors and users are the ones to consult on 
this -- some have spoken, like Forrest and Dave (whom I associate with the 
origin of this stuff, for Boeing and/or ATA application).

> >>
> >>   I'm still waiting for more information from Chris about this, but
> >>   why not get the conversation started right away within the group?

Okay.

Btw, how would you define block and inline?  You seem to be getting a 
pretty good working sense of them.

Best,
-Lofton.