Subject: RE: Re[3]: [cgmo-webcgm] Text searching
All,

This goes back to the requirement to have a "view" on the CGM file so that it can be included in a larger publishing framework (and not only in a consultation application framework). What about a DOM for editors and/or converters, since this knowledge could be extracted from the CGM file at this time (as we are doing for the A380)?

This is a question, and I do not know if there is a real business case for the vendors, although I know there is a requirement from the Technical Data people (Boeing, Airbus, ASD, ...).

Regards,
Franck.

> -----Original Message-----
> From: Cruikshank, David W [mailto:david.w.cruikshank@boeing.com]
> Sent: Sunday, May 7, 2006 06:32
> To: Lofton Henderson; Benoit Bezaire; cgmo-webcgm@lists.oasis-open.org
> Subject: RE: Re[3]: [cgmo-webcgm] Text searching
>
> I'll try to address how "content" is used at Boeing (albeit our implementation of CGM V4 predates WebCGM and even the ATA IGExchange stuff, although this spec contained the concept of para and subpara with content attributes...as unspecified as it is in WebCGM). In our IETM (PMA) we do capture content on para. Subpara is really implemented as a substring with pointers to capture and reference in the middle of a "para". The para really contains the content. The implementation of searchable text in PMA is that when we "build" the document for PMA, all the "content" strings are externalized and run through the text indexer along with all the other SGML text in the document, so when you do a full-text search on the document you get hits in both the text and graphics. This is not the same as a DOM call to a viewer to search text within a single CGM file. The major use case for searching in a single CGM file is a complicated graphic like a wiring diagram. You may know that a wire is on a particular diagram, but it is very difficult to spot it by looking.
> The ability to search a bunch of graphics for a string is outside the scope of a WebCGM viewer, but the content attribute on para facilitates that in the IETM. Content on subpara is probably of less importance.
>
> Dave
>
> Technical Fellow - Graphics/Digital Data Interchange
> Boeing Commercial Airplane
> 206.544.3560, fax 206.662.3734 <-- NEW NUMBERS
> david.w.cruikshank@boeing.com
>
> -----Original Message-----
> From: Lofton Henderson [mailto:lofton@rockynet.com]
> Sent: Saturday, May 06, 2006 11:34 AM
> To: Benoit Bezaire; cgmo-webcgm@lists.oasis-open.org
> Subject: Re[3]: [cgmo-webcgm] Text searching
>
> Let me extract a couple of points up front.
>
> 1.) Process/procedural: About the current under-defined state of para/subpara/content, you said, "And according to recent W3C standards would not make it into the spec if not corrected." An interesting question about this: if W3C approved this twice as Rec (1999 and 2001), and if it has been in 1.0 for 7 years, and in the field and implementations (if indeed anyone is using it) ... to what degree is it appropriate for W3C to try to force this legacy 1.0 stuff to be revised and cleaned up to current standards? I.e., how hard will they push on it?
>
> Regardless of the answer, we should definitely do something -- Chris's question must be answered. But how far should we go? It seems that this has progressed somewhat like the drawing-model issue. We have gotten much deeper into it than answering Chris's question. That's good, we need to understand the situation fully and clearly. But then we need to decide what to do about it.
>
> 2.) This is probably premature until we have Chris's clarification about the block/inline comment. But pretty soon it would be nice to progress to a concrete proposal for an answer to Chris and for changes to the (1.0) specification.
>
> I'll make a few more comments inline...
>
> At 09:34 AM 5/5/2006 -0400, Benoit Bezaire wrote:
> >See inline...
> >
> >Thursday, May 4, 2006, 8:26:17 PM, you wrote:
> > > Hi Benoit,
> > >
> > > Some technical replies for you (and Dieter)...
> > >
> > > At 06:31 PM 5/3/2006 -0400, Benoit Bezaire wrote:
> > >>I'm seeing the emails coming in about this topic. And I have to state that I don't understand how people get to such an understanding of the feature by reading what is in the specification. More inline...
> > >>
> > >>Wednesday, May 3, 2006, 6:04:20 PM, you wrote:
> > >> > Benoit,
> > >> >
> > >> > I think the example does not reflect the intentions of the authors.
> > >> >
> > >> > It should be like this:
> > >> >> (approx syntax)
> > >> >> BEGAPS 'myPara'
> > >> >>   APSATTR 'content' 'Hello World';
> > >> >>   ...
> > >> >>   BEGAPS 'mySubpara'
> > >> >>     APSATTR 'content' 'World';
> > >> >>     ...
> > >> >>   ENDAPS;
> > >> >> ENDAPS;
> > >> >
> > >> > Hence the content attribute of the para would contain all the text of the para, whereas the attribute of the subpara would contain the text of the subpara only.
> > >>Hmmm. Isn't this an assumption? I could see it used this way when using para/subpara on a raster, but that may not always be the case.
> > >
> > > Perhaps it is an assumption, but it seems to me to be at least hinted by the text of 3.2.1.3, 3.2.1.4, and 3.2.2.8. (Or ... perhaps I'm too biased by what the 1.0 authors meant to say, but that they didn't express unambiguously.)
> >Sorry, I disagree.
> >
> >There's no hint in there which says 'content' on a para MUST contain all text strings found in all subpara 'content's. I see things like 'may be used to identify text', 'can potentially enable text search', 'identifying matches [...] is not specified in WebCGM.', 'may be used to identify smaller fragments', 'This enables, for example [...]'.
> >
> >What para/subpara/content is supposed to do is far from clear.
> >And according to recent W3C standards, it would not make it into the spec if not corrected.
>
> I'll accept that the document doesn't explain it clearly. The fact that four "old-timers" have expressed the same view of it probably means that we all talked about it in 1999 and over the years, and evolved something of a common understanding, but that cannot be divined from the 1.0 text. Fair enough.
>
> (Don't be put off by my use of "old timers" -- it is only meant to signify those who have been around from the beginning, and share a common but poorly written understanding.)
>
> > >>Regardless, doesn't Chris' question still stand?
> > > That question is: is para a block and subpara an inline? Yes, we're going to have to answer the question somehow. There are a couple of problems here.
> > >
> > > First problem, para and subpara (as you pointed out in your proposed reply) are APS objects which group stuff which might not even be text. So the question, as it stands, seems meaningless. However, para+content could be viewed as a surrogate for or abstraction of the textual-related thingy inside its APS, and similarly for subpara+content. Then you could phrase the question about those "surrogates".
> >You are playing with words here!
> >On the call we explained to Chris that para/subpara were not text elements but APS. But his question still stands and has now become: is para+'content' a block and subpara+'content' an inline?
> >
> > > Second problem, I still don't know what block and inline mean (Chris is consulting with an i18n guy before sending more info).
> >I agree.
> >
> > > But from XHTML, a block element is like a 'p' and an inline element is like a 'span'.
> >Yes.
> >
> > > Let's suppose HTML had a 'content' attribute (maybe you could do this example with the 'title' attribute, which is typically used for a tooltip).
> > >
> > > <p content="???">Hello <span content="world">world</span></p>
> > >
> > > Would you expect ??? to reflect the entire content of the <p> element, or only that portion of the <p> element that is outside of the <span>? I would expect the first, i.e., ??? should be "Hello world".
> >I would have no expectation. I don't know any specification that puts restrictions on character data for an attribute. It's either a predefined set of values or plain character data.
> >
> >I think using HTML 'alt' would be a better comparison... and you will notice that it can only be specified on IMG, AREA, APPLET, and INPUT.
> >It cannot be used on <p> and <span>, thus most (if not all) of the WebCGM problems related to this do not exist in HTML.
> >
> > > This is the way I think about para and subpara (and apparently some others do as well). However, from the example that Chris posed, I may be entirely off base as to the meaning of "block" and "inline".
> >I don't think we are way off on the block/inline thing. But I do think that using an attribute (content) on APSs which can be nested, and whose text is possibly already readable, is a mistake.
>
> I disagree. There is a perfectly simple explanation: 'content' on 'para' should reflect the text content of the entire 'para' APS; 'content' on 'subpara' should reflect the text content of the entire 'subpara' APS. Period. (By "text content", I mean the RT elements, or the text that is drawn by the filled polybeziers, rasters, etc.)
>
> I claim that is the common understanding. I believe it originated as an ad hoc solution to needs of Boeing and/or ATA that made its way into WebCGM 1.0. (On this thread I have asked Dave to confirm or refute that, but he hasn't replied.)
>
> To be clear about "nested" ... as you know, 'subpara' (and only 'subpara') can be nested in 'para', and nothing can be nested in 'subpara'.
>
> > > More...
> > >
> > >> >> -----Original Message-----
> > >> >> From: Benoit Bezaire [mailto:benoit@itedo.com]
> > >> >> Sent: Wednesday, May 03, 2006 11:53 PM
> > >> >> To: cgmo-webcgm@lists.oasis-open.org
> > >> >> Subject: [cgmo-webcgm] Text searching
> > >> >>
> > >> >> Hi,
> > >> >>
> > >> >> On the call today, Chris asked me the following question... Assume we have:
> > >> >>
> > >> >> (approx syntax)
> > >> >> BEGAPS 'myPara'
> > >> >>   APSATTR 'content' 'Hello';
> > >> >>   ...
> > >> >>   BEGAPS 'mySubpara'
> > >> >>     APSATTR 'content' 'World';
> > >> >>     ...
> > >> >>   ENDAPS;
> > >> >> ENDAPS;
> > >> >>
> > >> >> And he does a text search on the string "Hello World", will he get a hit, yes or no?
> > >> >>
> > >> >> I believe this to be an indirect way of asking/answering if 'subpara' is an inline or a block.
> > >> >>
> > >> >> If we say, yes there's a hit, then we've defined 'subpara' as inline; if we say, no there's no hit, it's a block.
> > >
> > > I'd say "no hit". But the problem here is that the 1.0 authors designed this with a very specific ad hoc semantic in mind -- like <p> and <span> -- and the question is ... well, baffling to me still.
> > >
> > > That doesn't mean that we can't answer it, once we know what block and inline mean, but we need to be a little careful of adding semantics that weren't there and not intended in 1.0.
> > >
> > > Btw, we have other under-spec problems as well. In this example
> > >
> > > BEGAPS 'myPara'
> > >   APSATTR 'content' 'Hello World';
> > >   ...
> > >   BEGAPS 'mySubpara'
> > >     APSATTR 'content' 'World';
> > >     ...
> > >   ENDAPS;
> > > ENDAPS;
> > >
> > > Does a search on "World" return the para or the subpara? (I would say the subpara -- "closest to leaf" -- and I think this is what users like Dave would expect.)
> >I don't know what kind of searching you guys have in mind.
> >But the search functionality that I use on a daily basis (Dev Studio, email search, PDF search, HTML/browser search)... would generate two hits; the user then picks the one which is most relevant to him.
>
> One could treat this either like one of those searches, or like the generation of mouse hits from nested APSs. I was espousing the latter. But I don't care much, and I would actually like it best if we could avoid this depth of detailed specification.
>
> > >> >> What's the answer?
> > >> >> The specification says the following (for para)... The WebCGM prescription for priority of text search matching is: 'para' with matching 'content' (1st priority match); 'para' without 'content' but with recognizable single-element RESTRICTED TEXT match (2nd priority match); or, single-element RESTRICTED TEXT match, outside of any 'para' (3rd priority match).
> > >> >> And for subpara: See 3.2.1.3, 'para'.
> > >> >>
> > >> >> In other words, it's not specified :(
> > >> > I think that Chris wants to build a logical relationship between the attributes where there is none. You search ONE attribute at a time, not a combination of nested attributes.
> > >>I don't get to the same conclusion. The above wording doesn't even say how to perform a search within RESTRICTED TEXT and APPEND TEXT (without the 'content' attribute).
> > >
> > > As I suggested yesterday, perhaps that search-priority specification should be made into recommendations for search applications, non-normative, along with some clarification/guidance for how we expect 'content' to be used on para and subpara? (Hello World on para, and just World on subpara.)
> > >
> > > More about RT and AT below.
> > >
> > >> >>
> > >> >> Chris made it relatively clear that if we want to have these APS types in WebCGM 2, we need to improve how they are specified.
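
The three-level search priority quoted above can be sketched roughly as follows. This is only an illustration, not anything defined by WebCGM: the `Para` class, the `prioritized_matches` function, and the use of case-sensitive substring matching are all assumptions made for the example, since the 1.0 text does not say how a match is identified.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Para:
    """Hypothetical stand-in for a 'para' APS."""
    name: str
    content: Optional[str] = None  # the 'content' APS attribute, if present
    restricted_text: list[str] = field(default_factory=list)  # RT strings inside the APS


def prioritized_matches(query, paras, loose_text):
    """Return (priority, label) pairs, best priority first.

    Assumption: a "match" is a plain case-sensitive substring test.
    """
    hits = []
    for p in paras:
        if p.content is not None:
            # 1st priority: 'para' with matching 'content'
            if query in p.content:
                hits.append((1, p.name))
        elif any(query in rt for rt in p.restricted_text):
            # 2nd priority: 'para' without 'content', RESTRICTED TEXT matches
            hits.append((2, p.name))
    for i, rt in enumerate(loose_text):
        # 3rd priority: RESTRICTED TEXT outside of any 'para'
        if query in rt:
            hits.append((3, f"text[{i}]"))
    return sorted(hits)
```

Note that nothing in this sketch says what to do with a nested subpara, which is exactly the gap being discussed in the thread.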
> > >
> > > Reluctantly agree. But I think (as I said above), we need to be careful about adding (e.g., from some W3C CharMod model) some concepts or semantics that are unrelated to the original purpose of para/subpara/content.
> > >
> > > Question for Dave: did this stuff derive from something in ATA?
> > >
> > >> > I agree that this is all underspecified; however, the entire search is still wide open, no syntax, nothing.
> > >>I'm not sure what you mean by syntax? I would expect this to be a vendor feature (like the Search functionality in Web Browsers).
> > >>
> > >> > The only way to get access is limited by the DOM functions, which don't allow you to access the RESTRICTED TEXT anyway, if I remember this correctly.
> > >> >
> > >> > So right now, whoever wants to search can retrieve the content attribute of a para or subpara using the DOM, and he can then do whatever he wants to perform a search therein.
> > >>That sounds quite difficult to perform from a user's perspective.
> > >>
> > >> > I want to point out that I brought up this issue several times; it is an important requirement of the Navy, but the group decided to turn this down and to not define text search in WebCGM 2.0.
> > >>Well, maybe it will have to be defined after all.
> > >>
> > >>Kind regards,
> > >> Benoit mailto:benoit@itedo.com
> > >>
> > >> > Regards,
> > >> > Dieter
> > >> >>
> > >> >> So here are some thoughts...
> > >> >> I see RESTRICTED TEXT as a block.
> > >> >> I see APPEND TEXT as an inline.
> > >
> > > That's a novel view! Seriously, it is an intriguing idea. But it diverges from the conventional ISO CGM:1999 picture of RT and AT. AT is a syntactic artifice, invented solely for the purpose of changing text attributes within a single text primitive.
> >Yes, exactly like <span> in HTML. And, as you said, <span> is an inline.
> >
> > > If you look at pages 108-111 of CGM:1999, you'll see that only a handful of things -- basically just text attributes -- are allowed between RT and AT. So for example this is illegal:
> >I know.
> >
> > > BEGAPS 'myPara'
> > >   APSATTR 'content' 'Hello World';
> > >   RestrText (x,y,width,height) "Hello ";
> > >   BEGAPS 'mySubpara'
> > >     APSATTR 'content' 'World';
> > >     ApndText final "World";
> > >   ENDAPS;
> > > ENDAPS;
> > >
> > > Which is not to say that we couldn't put some search semantics, or impose a block/inline model, on a sequence of RT+AT+...+AT(final). But I'd prefer that we don't go there.
> > >
> > >> >>
> > >> >> So regardless of para/subpara/content... If 'Hello' is in a RESTRICTED TEXT and 'World' in a child APPEND TEXT, a search on "Hello World" would generate a hit. Does anyone agree with me?
> > >
> > > Well, if there were a 'content' match, then 1.0 says that generates the hit (1st priority).
> >That wasn't the question.
> >
> > > But assuming no content match, RT"Hello " + AT"World" would generate a hit for Hello World, IMO. But I say that because, in my reading of CGM:1999, RT+AT+...+AT is logically a single, single-line text primitive.
> >Let's wait for the definition of block/inline... but I think you've just explained your own definition (i.e., it's a single line of text).
> >
> > > Not because of a block-inline model (which I don't yet understand).
> > >
> > >> >> I would be tempted to use the same logic on 'content'. I.e., if 'content' is specified on a para, it's a block. If it's specified on a child subpara, it's an inline. However, I don't know if the current search functionality provided by vendors adopts the same logic?!
> > >
> > > I think it does not.
> > > But the vendors and users are the ones to consult on this -- some have spoken, like Forrest and Dave (whom I associate with the origin of this stuff, for Boeing and/or ATA application).
> >I've asked in a previous email... is this stuff even used in the real world? A concrete example would be nice.
>
> Good question.
>
> > >> >> I'm still waiting for more information from Chris about this, but why not get the conversation started right away within the group?
> > >
> > > Okay.
> > >
> > > Btw, how would you define block and inline? You seem to be getting a pretty good working sense of them.
> >At the moment, I'm assuming that Chris is coming from an HTML and SVG background. Which means <p> and <span>; <text> and <tspan>.
>
> It seems to be taking a long time to answer. (I know he went back to discuss it with Richard Ishida, so there must be some subtlety and nuance about the concepts in the original question.)
>
> -Lofton.
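
The competing answers in this thread can be compared side by side with a small sketch. It is illustrative only: the `Aps` class and the three function names are hypothetical, matching is assumed to be a case-sensitive substring test, and joining para and subpara content with a single space in the "inline" reading is an assumption that no specification text supports.

```python
from dataclasses import dataclass, field


@dataclass
class Aps:
    """Hypothetical model of a 'para' (or 'subpara') and its 'content' attribute."""
    name: str
    content: str
    subparas: list["Aps"] = field(default_factory=list)


def hits_per_attribute(query, para):
    """Search each 'content' attribute on its own; report every APS that matches."""
    nodes = [para, *para.subparas]
    return [n.name for n in nodes if query in n.content]


def hits_closest_to_leaf(query, para):
    """Like hits_per_attribute, but a matching subpara hides its parent para."""
    sub = [s.name for s in para.subparas if query in s.content]
    if sub:
        return sub
    return [para.name] if query in para.content else []


def hits_inline(query, para):
    """Treat subpara content as inline: it continues the para's own content."""
    joined = " ".join([para.content, *(s.content for s in para.subparas)])
    return [para.name] if query in joined else []


# Chris's example: 'Hello' on the para, 'World' on the subpara.
chris = Aps("myPara", "Hello", [Aps("mySubpara", "World")])
# Dieter's example: 'Hello World' on the para, 'World' on the subpara.
dieter = Aps("myPara", "Hello World", [Aps("mySubpara", "World")])
```

Under these assumptions, a search for "Hello World" on Chris's example finds nothing with `hits_per_attribute` but finds `myPara` with `hits_inline`, which is the block/inline distinction Benoit describes. A search for "World" on Dieter's example yields two hits with `hits_per_attribute` and only `mySubpara` with `hits_closest_to_leaf`.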