cgmo-webcgm message

Subject: RE: Re[3]: [cgmo-webcgm] Text searching
From: "DULUC, Franck" <franck.duluc@airbus.com>
To: "Cruikshank, David W" <david.w.cruikshank@boeing.com>,"Lofton Henderson" <lofton@rockynet.com>,"Benoit Bezaire" <benoit@itedo.com>,<cgmo-webcgm@lists.oasis-open.org>
Date: Tue, 9 May 2006 10:12:59 +0200

All,

This goes back to the requirement to have a "view" on the CGM file so that it can be included in a larger publishing framework (and not only a consultation application framework).

What about a DOM for editors and/or converters, since this knowledge could be extracted from the CGM file at that point (as we are doing for the A380)?

This is a question; I do not know whether there is a real business case for the vendors, although I know there is a requirement from the Technical Data people (Boeing, Airbus, ASD, ...).

Regards,

Franck.

> -----Original Message-----
> From: Cruikshank, David W [mailto:david.w.cruikshank@boeing.com]
> Sent: Sunday, May 7, 2006 06:32
> To: Lofton Henderson; Benoit Bezaire; cgmo-webcgm@lists.oasis-open.org
> Subject: RE: Re[3]: [cgmo-webcgm] Text searching
>
>
>
> I'll try to address how "content" is used at Boeing (albeit our
> implementation of CGM V4 predates WebCGM and even the ATA IGExchange
> stuff, although that spec contained the concept of para and subpara
> with content attributes, as unspecified as it is in WebCGM).  In our
> IETM (PMA) we do capture content on para.  Subpara is really
> implemented as a substring with pointers to capture and reference in
> the middle of a "para".  The para really contains the content.  The
> implementation of searchable text in PMA is that when we "build" the
> document for PMA, all the "content" strings are externalized and run
> through the text indexer along with all the other SGML text in the
> document, so when you do a full text search on the document you get
> hits in both the text and graphics.  This is not the same as a DOM
> call to a viewer to search text within a single CGM file.  The major
> use case for searching in a single CGM file is a complicated graphic
> like a wiring diagram.  You may know that a wire is on a particular
> diagram, but it is very difficult to spot it by looking.  The ability
> to search a bunch of graphics for a string is outside the scope of a
> WebCGM viewer, but the content attribute on para facilitates that in
> the IETM.  Content on subpara is probably of less importance.
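
The build step described above (externalizing the 'content' strings and indexing them together with the document text) can be sketched roughly as follows. This is only an illustration under stated assumptions: the tokenizer, the dictionary layout, and every identifier (file names, unit and para ids) are hypothetical and are not taken from PMA or any real indexer.

```python
from collections import defaultdict

_PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (w.strip(_PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def build_index(text_units, graphic_content):
    """Map each token to the set of places where it occurs.

    text_units:      {unit_id: text}            ordinary document text
    graphic_content: {(cgm_file, para_id): str} externalized 'content' strings
    (Both layouts are assumptions made for this sketch.)
    """
    index = defaultdict(set)
    for unit_id, text in text_units.items():
        for tok in tokenize(text):
            index[tok].add(("text", unit_id))
    for key, content in graphic_content.items():
        for tok in tokenize(content):
            index[tok].add(("graphic", key))
    return index

def search(index, word):
    """Return every text and graphic location containing the word."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical sample data: one text unit and one para in a wiring diagram.
text_units = {"task-01": "Inspect wire W123 for chafing."}
graphic_content = {("wiring.cgm", "para7"): "W123"}
index = build_index(text_units, graphic_content)
hits = search(index, "W123")
```

With this sample data a single query for the wire number returns one hit in the text and one in the graphic, which is the document-level behaviour described above, as opposed to a DOM call that searches inside one CGM file.
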
>
> Dave
>
>
> Technical Fellow - Graphics/Digital Data Interchange
> Boeing Commercial Airplane
> 206.544.3560, fax 206.662.3734  <-- NEW NUMBERS
> david.w.cruikshank@boeing.com
>
> -----Original Message-----
> From: Lofton Henderson [mailto:lofton@rockynet.com]
>
> Sent: Saturday, May 06, 2006 11:34 AM
> To: Benoit Bezaire; cgmo-webcgm@lists.oasis-open.org
> Subject: Re[3]: [cgmo-webcgm] Text searching
>
> Let me extract a couple of points up front
>
>
> 1.)  Process/procedural:  About the current under-defined state of
> para/subpara/content, you said, "And according to recent W3C standards
> would not make it into the spec if not corrected."  An interesting
> question about this:  if W3C approved this twice as Rec (1999 and
> 2001), and if it has been in 1.0 for 7 years, and in the field and
> implementations (if indeed anyone is using it) ... to what degree is
> it appropriate for W3C to try to force this legacy 1.0 stuff to be
> revised and cleaned up to current standards?  I.e., how hard will they
> push on it?
>
> Regardless of the answer, we should definitely do something -- Chris's
> question must be answered.  But how far should we go?  It seems that
> this has progressed somewhat like the drawing-model issue.  We have
> gotten much deeper into it than answering Chris's question.  That's
> good, we need to understand the situation fully and clearly.  But then
> we need to decide what to do about it.
>
> 2.) This is probably premature until we have Chris's clarification
> about the block/inline comment.  But pretty soon it would be nice to
> progress to a concrete proposal for an answer to Chris and for changes
> to the (1.0) specification.
>
> I'll make a few more comments inline...
>
> At 09:34 AM 5/5/2006 -0400, Benoit Bezaire wrote:
> >See inline...
> >
> >Thursday, May 4, 2006, 8:26:17 PM, you wrote:
> > > Hi Benoit,
> >
> > > Some technical replies for you (and Dieter)...
> >
> > > At 06:31 PM 5/3/2006 -0400, Benoit Bezaire wrote:
> > >>I'm seeing the emails coming in about this topic. And I have to
> > >>state that I don't understand how people get to such an
> > >>understanding of the feature by reading what is in the
> > >>specification. More inline...
> > >>
> > >>Wednesday, May 3, 2006, 6:04:20 PM, you wrote:
> > >> > Benoit,
> > >>
> > >> > I think the example does not reflect the intentions of the
> > >> > authors.
> > >>
> > >> > It should be like this
> > >> >>   (approx syntax)
> > >> >>   BEGAPS 'myPara'
> > >> >>    APSATTR 'content' 'Hello World';
> > >> >>    ...
> > >> >>    BEGAPS 'mySubpara'
> > >> >>     APSATTR 'content' 'World';
> > >> >>     ...
> > >> >>    ENDAPS;
> > >> >>   ENDAPS;
> > >>
> > >> > Hence the content attribute of the para would contain all the
> > >> > text of the para, whereas the attribute of the subpara would
> > >> > contain the text of the subpara only.
> > >>Hmmm. Isn't this an assumption? I could see it used this way when
> > >>using para/subpara on a raster; but that may not always be the case.
> >
> > > Perhaps it is an assumption, but it seems to me to be at least
> > > hinted by the text of 3.2.1.3, 3.2.1.4, and 3.2.2.8.  (Or ...
> > > perhaps I'm too biased by what the 1.0 authors meant to say, but
> > > that they didn't express unambiguously.)
> >Sorry, I disagree.
> >
> >There's no hint in there which says 'content' on a para MUST contain
> >all text strings found in all subpara 'content's. I see things like
> >'may be used to identify text', 'can potentially enable text search',
> >'identifying matches [...] is not specified in WebCGM.', 'may be used
> >to identify smaller fragments', 'This enables, for example [...]'.
> >
> >What para/subpara/content is supposed to do is far from clear. And
> >according to recent W3C standards would not make it into the spec if
> >not corrected.
>
> I'll accept that the document doesn't explain it clearly.  The fact
> that four "old-timers" have expressed the same view of it probably
> means that we all talked about it in 1999 and over the years, and
> evolved something of a common understanding, but that cannot be
> divined from the 1.0 text.  Fair enough.
>
> (Don't be put off by my use of "old timers" -- it is only meant to
> signify those who have been around from the beginning, and share a
> common but poorly written understanding.)
>
>
> > >>Regardless, doesn't Chris' question still stand?
> > > That question is: is para a block and subpara an inline?  Yes,
> > > we're going to have to answer the question somehow.  There are a
> > > couple of problems here.
> >
> > > First problem, para and subpara (as you pointed out in your
> > > proposed reply) are APS objects which group stuff which might not
> > > even be text.  So the question, as it stands, seems meaningless.
> > > However, para+content could be viewed as a surrogate for or
> > > abstraction of the textual-related thingy inside its APS, and
> > > similarly for subpara+content.  Then you could phrase the question
> > > about those "surrogates".
> >You are playing with words here!
> >On the call we explained to Chris that para/subpara were not text
> >elements but APS. But his question still stands and has now become:
> >is para+'content' a block and subpara+'content' an inline?
> >
> > > Second problem, I still don't know what block and inline mean
> > > (Chris is consulting with an i18n guy before sending more info).
> >I agree.
> >
> > > But from XHTML, a block element is like a 'p' and an inline
> > > element is like a 'span'.
> >Yes.
> >
> > > Let's suppose HTML had a 'content' attribute (maybe you could do
> > > this example with 'title' attribute, which is typically used for a
> > > tooltip).
> >
> > > <p content="???">Hello <span content="world">world</span></p>
> >
> > > Would you expect ??? to reflect the entire content of the <p>
> > > element, or only that portion of the <p> element that is outside
> > > of the <span>? I would expect the first, i.e., ??? should be
> > > "Hello world".
> >I would have no expectation. I don't know any specification that puts
> >restrictions on character data for an attribute. It's either a
> >predefined set of values or plain character data.
> >
> >I think using HTML 'alt' would be a better comparison... and you will
> >notice that it can only be specified on IMG, AREA, APPLET, and INPUT.
> >It cannot be used on <p> and <span>, thus most (if not all) the WebCGM
> >problems related to this do not exist in HTML.
> >
> > > This is the way I think about para and subpara (and apparently
> > > some others do as well).  However, from the example that Chris
> > > posed, I may be entirely off base as to the meaning of "block" and
> > > "inline".
> >I don't think we are way off on the block/inline thing. But I do think
> >that using an attribute (content) on APSs, which can be nested and
> >may already be readable, is a mistake.
>
> I disagree.  There is a perfectly simple explanation:  'content' on
> 'para' should reflect the text content of the entire 'para' APS;
> 'content' on 'subpara' should reflect the text content of the entire
> 'subpara' APS.  Period.  (By "text content", I mean the RT elements,
> or the text that is drawn by the filled polybeziers, rasters, etc.)
>
> I claim that is the common understanding.  I believe it originated as
> an ad hoc solution to needs of Boeing and/or ATA, that made its way
> into WebCGM 1.0.  (On this thread I have asked Dave to confirm or
> refute that, but he hasn't replied.)
>
> To be clear about "nested" ... as you know, 'subpara' (and only
> 'subpara') can be nested in 'para', and nothing can be nested in
> 'subpara'.
>
>
> > > More...
> >
> > >> >> -----Original Message-----
> > >> >> From: Benoit Bezaire [mailto:benoit@itedo.com]
> > >> >> Sent: Wednesday, May 03, 2006 11:53 PM
> > >> >> To: cgmo-webcgm@lists.oasis-open.org
> > >> >> Subject: [cgmo-webcgm] Text searching
> > >> >>
> > >> >> Hi,
> > >> >>
> > >> >>   On the call today, Chris asked me the following question...
> > >> >>   Assume we have:
> > >> >>
> > >> >>   (approx syntax)
> > >> >>   BEGAPS 'myPara'
> > >> >>    APSATTR 'content' 'Hello';
> > >> >>    ...
> > >> >>    BEGAPS 'mySubpara'
> > >> >>     APSATTR 'content' 'World';
> > >> >>     ...
> > >> >>    ENDAPS;
> > >> >>   ENDAPS;
> > >> >>
> > >> >>   And he does a text search on the string "Hello World", will he
> > >> >>   get a hit, yes or no?
> > >> >>
> > >> >>   I believe this to be an indirect way of asking/answering if
> > >> >>   'subpara' is an inline or a block.
> > >> >>
> > >> >>   If we say, yes there's a hit, then we've defined 'subpara' as
> > >> >>   inline, if we say, no there's no hit, it's a block.
> >
> > > I'd say "no hit".  But the problem here is that the 1.0 authors
> > > designed this with a very specific ad hoc semantic in mind -- like
> > > <p> and <span> -- and the question is ... well, baffling to me
> > > still.
> >
> > > That doesn't mean that we can't answer it, once we know what block
> > > and inline mean, but we need to be a little careful of adding
> > > semantic that wasn't there and not intended in 1.0.
> >
> > > Btw, we have other under-spec problems as well.  In this example
> >
> > > BEGAPS 'myPara'
> > >    APSATTR 'content' 'Hello World';
> > >    ...
> > >    BEGAPS 'mySubpara'
> > >       APSATTR 'content' 'World';
> > >       ...
> > >    ENDAPS;
> > > ENDAPS;
> >
> > > Does a search on "World" return the para or the subpara?  (I would
> > > say the subpara -- "closest to leaf" -- and I think this is what
> > > users like Dave would expect.)
> >I don't know what kind of searching you guys have in mind. But the
> >search functionality that I use on a daily basis (Dev Studio, email
> >search, PDF search, HTML/browser search)... would generate two hits;
> >the user then picks the one which is most relevant to him.
>
> One could treat this either like one of those searches, or like the
> generation of mouse hits from nested APSs.  I was espousing the
> latter.  But I don't care much, and I would actually like it best if
> we could avoid this depth of detailed specification.
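
The two behaviours being weighed here (every matching APS reported, versus only the match "closest to leaf") can be contrasted in a small sketch. It assumes a hypothetical in-memory APS model and plain substring matching; neither is defined by WebCGM 1.0, which leaves match identification unspecified.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class APS:
    """Hypothetical stand-in for an Application Structure (not a real API)."""
    aps_id: str
    aps_type: str                     # 'para' or 'subpara'
    content: Optional[str] = None     # value of the 'content' attribute, if any
    children: List["APS"] = field(default_factory=list)

def all_hits(aps, needle):
    """Editor/browser-style search: every APS whose 'content' matches."""
    hits = []
    if aps.content is not None and needle in aps.content:
        hits.append(aps.aps_id)
    for child in aps.children:
        hits.extend(all_hits(child, needle))
    return hits

def leaf_hits(aps, needle):
    """'Closest to leaf': report an APS only if no descendant also matches."""
    child_hits = []
    for child in aps.children:
        child_hits.extend(leaf_hits(child, needle))
    if child_hits:
        return child_hits
    if aps.content is not None and needle in aps.content:
        return [aps.aps_id]
    return []

# The example from the thread: 'Hello World' on para, 'World' on subpara.
para = APS("myPara", "para", "Hello World",
           [APS("mySubpara", "subpara", "World")])
two_hits = all_hits(para, "World")    # both the para and the subpara
one_hit = leaf_hits(para, "World")    # only the subpara
```

A search for "Hello" gives the same single result (the para) under both functions, since the subpara does not contain it; the two approaches only differ when nested 'content' values overlap.
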
>
>
> > >> >>   What's the answer?
> > >> >>   The specification says the following (for para)... The WebCGM
> > >> >>   prescription for priority of text search matching is: 'para'
> > >> >>   with matching 'content' (1st priority match); 'para' without
> > >> >>   'content' but with recognizable single-element RESTRICTED TEXT
> > >> >>   match (2nd priority match); or, single-element RESTRICTED TEXT
> > >> >>   match, outside of any 'para' (3rd priority match).
> > >> >>   And for subpara: See 3.2.1.3, 'para'.
> > >> >>
> > >> >>   In other words, it's not specified :(
> > >> > I think that Chris wants to build a logical relationship between
> > >> > the attributes where there is none. You search ONE attribute at a
> > >> > time, not a combination of nested attributes.
> > >>I don't get to the same conclusion. The above wording doesn't even
> > >>say how to perform a search within RESTRICTED TEXT and APPEND TEXT
> > >>(without the 'content' attribute).
> >
> > > As I suggested yesterday, perhaps that search-priority
> > > specification should be made into recommendations for search
> > > applications, non-normative, along with some
> > > clarification/guidance for how we expect 'content' to be used on
> > > para and subpara?  (Hello World on para, and just World on
> > > subpara).
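
If the three-level search priority were recast as a recommendation for search applications, one possible reading could be sketched as below. The data layout, the substring test, and the decision that a 'para' whose 'content' does not match is not searched further are all assumptions of this sketch; the 1.0 text does not settle them.

```python
def prioritized_matches(paras, free_text, needle):
    """Return (priority, identifier) pairs, best priority first.

    paras:     list of dicts {'id': str, 'content': str or None,
                              'rt': [str, ...]}   (hypothetical layout)
    free_text: RESTRICTED TEXT strings lying outside any 'para'
    """
    results = []
    for p in paras:
        if p["content"] is not None:
            # 1st priority: 'para' with matching 'content'.
            if needle in p["content"]:
                results.append((1, p["id"]))
        elif any(needle in s for s in p["rt"]):
            # 2nd priority: 'para' without 'content', match inside
            # a single RESTRICTED TEXT element.
            results.append((2, p["id"]))
    for i, s in enumerate(free_text):
        # 3rd priority: single RESTRICTED TEXT element outside any 'para'.
        if needle in s:
            results.append((3, f"text[{i}]"))
    return sorted(results)

# Hypothetical sample: one para with 'content', one without, one free string.
paras = [
    {"id": "p1", "content": "Hello World", "rt": []},
    {"id": "p2", "content": None, "rt": ["Hello World"]},
]
free_text = ["Hello World again"]
ranked = prioritized_matches(paras, free_text, "Hello World")
```

Returning the priority alongside each identifier leaves it to the application whether to show only the best match or all of them in ranked order.
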
> >
> > > More about RT and AT below.
> >
> > >> >>
> > >> >>   Chris made it relatively clear that if we want to have these
> > >> >>   APS types in WebCGM 2, we need to improve how they are
> > >> >>   specified.
> >
> > > Reluctantly agree.  But I think (as I said above), we need to be
> > > careful about adding (e.g., from some W3C CharMod model) some
> > > concepts or semantics that are unrelated to the original purpose
> > > of para/subpara/content.
> >
> > > Question for Dave: did this stuff derive from something in ATA?
> >
> > >> > I agree that this is all underspecified, however, the entire
> > >> > search is still wide open, no syntax, nothing.
> > >>I'm not sure what you mean by syntax? I would expect this to be a
> > >>vendor feature (like the Search functionality in Web Browsers).
> > >>
> > >> > The only way to get access is limited by the DOM functions, which
> > >> > don't allow you to access the RESTRICTED TEXT anyway if I
> > >> > remember this correctly.
> > >>
> > >> > So right now, whoever wants to search, can retrieve the content
> > >> > attribute of a para or subpara using the DOM, and he can then do
> > >> > whatever he wants to perform a search therein.
> > >>That sounds quite difficult to perform from a user's perspective.
> > >>
> > >> > I want to point out that I brought up this issue several times,
> > >> > it is an important requirement of the Navy, but the group decided
> > >> > to turn this down and to not define text search in WebCGM 2.0.
> > >>Well, maybe it will have to be defined after all.
> > >>
> > >>Kind regards,
> > >>  Benoit   mailto:benoit@itedo.com
> > >>
> > >> > Regards,
> > >> > Dieter
> > >> >>
> > >> >>   So here are some thoughts...
> > >> >>   I see RESTRICTED TEXT as a block.
> > >> >>   I see APPEND TEXT as an inline.
> >
> > > That's a novel view!  Seriously, it is an intriguing idea.  But it
> > > diverges from the conventional ISO CGM:1999 picture of RT and AT.
> > > AT is a syntactic artifice, invented solely for the purpose of
> > > changing text attributes within a single text primitive.
> >Yes, exactly like <span> in HTML. And, as you said, <span> is an
> >inline.
> >
> > > If you look at pages 108-111 of CGM:1999, you'll see that only a
> > > handful of things -- basically just text attributes -- are allowed
> > > between RT and AT.  So for example this is illegal:
> >I know.
> >
> > > BEGAPS 'myPara'
> > >     APSATTR 'content' 'Hello World';
> > >     RestrText (x,y,width,height) "Hello ";
> > >     BEGAPS 'mySubpara'
> > >        APSATTR 'content' 'World';
> > >        ApndText final "World";
> > >     ENDAPS;
> > > ENDAPS;
> >
> > > Which is not to say that we couldn't put some search semantics, or
> > > impose a block/inline model, on a sequence of RT+AT+...+AT(final).
> > > But I'd prefer that we don't go there.
> >
> > >> >>
> > >> >>   So regardless of para/subpara/content... If 'Hello' is in a
> > >> >>   RESTRICTED TEXT and 'World' in a child APPEND TEXT, a search
> > >> >>   on "Hello World" would generate a hit. Does anyone agree with
> > >> >>   me?
> >
> > > Well, if there were a 'content' match, then 1.0 says that
> > > generates the hit (1st priority).
> >That wasn't the question.
> >
> > > But assuming no content match, RT"Hello " + AT"World" would
> > > generate a hit for Hello World, IMO.  But I say that because, in
> > > my reading of CGM:1999, RT+AT+...+AT is logically a single,
> > > single-line text primitive.
> >Let's wait for the definition of block/inline... but I think you've
> >just explained your own definition (i.e., it's a single line of text).
> >
> > > Not because of a block-inline model (which I don't yet understand).
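
The reading that RT+AT+...+AT forms one logical, single-line text primitive can be illustrated with a tiny sketch. It assumes the text strings have already been pulled out of the metafile into a list (the extraction itself is not shown and the function names are hypothetical), and it uses plain substring matching.

```python
def text_primitive(parts):
    """Join a RESTRICTED TEXT string with its APPEND TEXT continuations."""
    return "".join(parts)

def matches_primitive(parts, needle):
    """Search the joined primitive, so a match may span RT and AT."""
    return needle in text_primitive(parts)

def matches_single_element(parts, needle):
    """Search each element separately; matches spanning elements are missed."""
    return any(needle in part for part in parts)

# RT "Hello " followed by a final AT "World", as in the example above.
parts = ["Hello ", "World"]
spanning = matches_primitive(parts, "Hello World")
per_element = matches_single_element(parts, "Hello World")
```

Here the joined search finds "Hello World" while the per-element search does not, which is the difference between treating the sequence as one primitive and treating each element on its own.
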
> >
> > >> >>   I would be tempted to use the same logic on 'content'. I.e.,
> > >> >>   if 'content' is specified on a para, it's a block. If it's
> > >> >>   specified on a child subpara, it's an inline. However, I don't
> > >> >>   know if the current search functionality provided by vendors
> > >> >>   adopts the same logic?!
> >
> > > I think it does not.  But the vendors and users are the ones to
> > > consult on this -- some have spoken, like Forrest and Dave (whom I
> > > associate with the origin of this stuff, for Boeing and/or ATA
> > > application)
> >I've asked in a previous email... is this stuff even used in the real
> >world? A concrete example would be nice.
>
> Good question.
>
>
> > >> >>   I'm still waiting for more information from Chris about this,
> > >> >>   but why not get the conversation started right away within the
> > >> >>   group?
> >
> > > Okay.
> >
> > > Btw, how would you define block and inline?  You seem to be
> > > getting a pretty good working sense of them.
> >At the moment, I'm assuming that Chris is coming from an HTML and SVG
> >background. Which means <p> and <span>; <text> and <tspan>.
>
> It seems to be taking a long time to answer.  (I know he went back to
> discuss it with Richard Ishida, so there must be some subtlety and
> nuance about the concepts in the original question.)
>
> -Lofton.
>
>
>
>
>
> This mail has originated outside your organization,
> either from an external partner or the Global Internet.
> Keep this in mind if you answer this message.
>
