Subject: [humanmarkup-comment] on a new small stratified ontology for cyber war
<header> Again, I apologize for creating this public discussion. However, it seems important, at least to me. And I remind everyone that the time required to delete the message is not great. </header>

***

Richard Ballard's contribution here is important and relevant to the issue of producing ontologies for arbitrary domains. He and I have talked about what it would take to code a software system and demonstrate a methodology that would produce a stratified ontology supporting sense making about the events that occur in hacking activities and cyber war. Perhaps this is a less-than-$5M project, with operational deployment (independent of all other systems) within nine months and refinements over a three-year period.

Len Bullard is one of the developers of HyTime, an SGML-based standard for hypermedia and time-based structuring that grew out of work on modeling the production of music, and which has been adopted into many small ontologies used in the agile transformation of information. His comments are welcome and thoughtful about the issues posed.

John Sowa's recent work on these related subjects is at:

http://www.jfsowa.com/pubs/signproc.htm

The working notes that I developed, following discussions with Dennis Wisnosky, on this stratified sense-making system are in a four-panel PowerPoint at:

http://www.ontologystream.com/EI/slipstream_files/frame.htm

This architecture is designed to interface with Richard Ballard's Mark 3 knowledge system.

***

The primary compatibility can be seen in Dick's references to REF-REF matches (machine-derivable by co-occurrence and other processes: n-gram, tensor, Latent Semantic Indexing, and a few other esoteric evolutionary-programming processes) and now in the functional-load mapping of the "single node" formative ontology of the event map (please look at the representation of Port 80, the HTTP port, in the top right corner of the paper):

http://www.ontologystream.com/bSLIP/finalReview.htm

This visualization is original to me, but related to the Soviet-era work on cognitive visualization of the theorems of elementary number theory:

http://www.ontologystream.com/IRRTest/Evaluation/ARLReport.htm

The connection to number theory is in elementary number-base conversions: changing the base from 10 to 6 alters the "solvability" of the problem of representing 1/3 in a rational expansion. In base 10, 1/3 is the non-terminating 0.333..., while in base 6 it is exactly 0.2. This is related to my unpublished work on the Whorf hypothesis (non-translatability) and to Gödel/Cantor theory (the foundations of finite mathematics).

This work on transforming unsolvable problems (Peter Kugler would call this a by-pass) leads to very fast scatter-gather (clustering) algorithms, so that what takes 4 hours using FoxPro Rushmore indexing is reduced to 25 seconds. The clustering in the SLIP (stochastic) and eventChemistry work is thus fast enough to keep up with a human's attention span during a real-time investigation of data invariance. Self-organizing feature maps often take a day to cook a representation of a text corpus; the new algorithms reduce this investment of time to perhaps several minutes.

The theory is very simple, and the demonstration was already present in the December 7th, 2001 SLIP Browsers, downloadable with a short tutorial at:

http://www.ontologystream.com/SLIP/files/ArbitaryEventLog.htm

The SLIP atoms are Peircean nodes (not graphs, but single nodes)! Sigh... the insight that I seem to have, *that no one else has shown to me*, is that the mental event is a single node, a "noun subject", with its reference-link *potential* enumerated.
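A minimal sketch of that single-node form, in Python (the record layout and field names here are illustrative only, not the actual SLIP data structures):

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    """A single-node 'noun subject'; its links are enumerated potentials,
    carrying co-occurrence counts rather than probabilities."""
    subject: str
    link_potential: dict = field(default_factory=dict)  # neighbor -> count

def atoms_from_event_log(records):
    """Build one Atom per observed value; values seen in the same event
    record accumulate link potential toward each other."""
    atoms = {}
    for record in records:
        for a in record:
            atom = atoms.setdefault(a, Atom(a))
            for b in record:
                if b != a:
                    atom.link_potential[b] = atom.link_potential.get(b, 0) + 1
    return atoms

# e.g. intrusion-log records of the form (source address, port, signature)
log = [("10.0.0.5", "port80", "probe"),
       ("10.0.0.5", "port80", "scan"),
       ("10.0.0.9", "port25", "probe")]
print(atoms_from_event_log(log)["port80"].link_potential)
# -> {'10.0.0.5': 2, 'probe': 1, 'scan': 1}
```

Each atom remains a single node; its link_potential merely enumerates what it *could* bind to, and with what observed weight.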
This is not a Bayes representation, because the causes of these potential links are not probabilities. The link forms (emerges) within a stratified architecture, with the decomposition of past memory as the substructure and the anticipation due to ecological affordance as the ultrastructure (notational engineer Jeff Long's term, used here slightly differently). A simple algorithmic process (one I invented in 1997) makes routing and retrieval situated.

http://www.bcngroup.org/area3/pprueitt/kmbook/Appendix.htm

So the atoms required by a formative compound ontology are first created by a stochastic process (see the papers), and then meaning is acquired through human introspection (introspection is the "I" word in science, yes?). Peircean thirdness is in moving from the level of atom ontologies (firstness) to the level of compounds, via scope (secondness). The current Topic Maps, even with HyTime, are not doing this yet (at least not that I am aware of).

The key elements of this architecture are as follows. Each of the four levels of the taxonomy has human terminology-evolution processes, in conjunction with human communities of practice.

1) The bottom layer of this layered taxonomy is an open semiotic system depending on invariance types (categories) produced from the aggregation of Internet traffic data at selected points within the Internet systems.

2) The second layer is an intrusion-event level that is responsive to the already deployed infrastructure of Intrusion Detection Systems (IDSs) and to the visualization of intrusion-event patterns at the third level.

3) The third level is a knowledge management system having knowledge propagation and a knowledge base developed on Peircean logics (cognitive graphs) that have a formative, and thus situational, aspect.

4) The fourth level is a machine representation of the compliance models produced by policy makers.

A National Defense System against cyber war would deploy this structured and stratified taxonomy across the governmental Computer Emergency Response Team (CERT) centers. This system would be independent of the current systems and would have a knowledge management component for virtual collaboration.

***

Why have I found it necessary to make this conversation public?

1) The theoretical and practical issues relevant to a working dynamic and stratified taxonomy of this nature must see the light of day. These issues can be partially solved, not solved at all, or solved in ways that burden the National response to cyber war.

2) The proper solutions to this problem are useful in eBusiness, in decision support systems (commercial and military), and in systems for virtual education. They CANNOT become part of the confusion that is the classified technologies (most of which are simply well known not to work).

3) Private and personal reasons: since November 2001 no one has been considerate enough to pay me and my programmer for the work that we continue to do, all the while treating the matter as if a field test of **my** software, software that was not yet completed, were in progress. The new work, a more complete axiom-and-theorem foundation for a new area of pure mathematics, and guidance from those whom I would ask to develop proper outcome metrics have been treated as if they were somehow not needed or wanted. The business value proposition is in competition with something that makes the business unit more money.
***

This MUST be considered a policy issue because it is one more example, among many, many others, of how the innovations needed to define knowledge science are restricted by the practices of the business mind in the exercise of control over the science mind. This might be acceptable if the Nation and the economy were not traveling at light speed toward a brick wall. I realize that this general systems dynamic has nothing to do with me, or with the business unit. It is systemic and ubiquitous. It is a fact of life. However, the larger moral issue is why we as a culture have given absolute power to those who are practiced in this control. They know nothing of the issues that might be solved. They brought us the .com bubble because they felt too important to understand that, in most cases, there was no product even under consideration by these invested companies. Yes? Perhaps there is some other reason why the .com bubble occurred? I do not think so.

I again call for a Manhattan Project to establish Knowledge Science and extend the true capabilities of Information Technology.

http://www.bcngroup.org/area3/manhattan/sindex.htm

This project could change the nature of the public discussion about what IT is good for, by bringing an understanding of the existing science to bear on the (mostly unprovable and often deceptive) theories in artificial intelligence and information technology.

In response to Microsoft's advertisement "Where do you want to go today?", I say I want to go somewhere with a stable operating system that will not change just as soon as I get my programs to work.

Paul S. Prueitt
Chantilly VA

***

-----Original Message-----
From: Richard Ballard [mailto:rlballard@earthlink.net]
Sent: Saturday, February 02, 2002 1:31 AM
To: eventChemistry@yahoogroups.com; Topicmaps-Comment; Thomas B. Passin
Cc: Mark Turner; Douglas Weidner; Tim Barber; Dorothy Denning; Doug Dearie; Dr. Robert Brammer; Rita Colwell; James L. Olds; Humanmarkup-Comment; Katarina Auer; Paul Zavidniak; William Sander; Dennis Wisnosky; Albright; Ivan Prueitt; Pharris(Contr-Ito); George Lakoff; Wojciech M. Jaworski
Subject: [eventChemistry] Reaction to -- multilingual thesaurus - language, scope, and topic naming constraint

Paul & Others:

This conversation is a wonderfully entangled cameo of semantics, taken as the nexus, or solution, or insolvability of all things conceptual. Every one of us becomes tempted at some time of life to untangle this problem or, via some simplifying assumption, finesse it as a barrier and move past it. Some settle in and decide to spend their lives either solving it or contributing to it from some particular perspective. The pernicious raise the issue just to assert that no problem can be solved unless their favorite problem is solved first. Delightfully, that perniciousness, while present, is not blatant here.

But still it goes round and round. At some point the question has to be called and the house divided. I usually ask two questions: (1) What do you want language to do for you that makes semantics the issue? (2) From what you have learned so far, is this problem going to be solved in years, decades, centuries, millennia, or ever? I would certainly like to hear an optimistic answer, particularly from George Lakoff or others who are so heavily invested. For me, some 20 years were devoted to natural language dialog systems, sub-language analysis, and related linguistic issues in user-interface design and computer-based instruction and tutoring.
When I turned to full-time knowledge engineering (some 18 years ago), my faith in and sympathy for language as a system for knowledge representation became a losing struggle. I abandoned it completely 10 years ago. I consider that a breakthrough and will say more about it at the Knowledge Technology Conference in Seattle, March 11-13.

In knowledge coding we have the problem of identifying "ideas" with some code, symbol, or phrase and then integrating the knowledge gathered and acquired by modeling from many sources. Each source has its own ontological commitment, and the problem, and the goal, is to marry these views at points where they share a common idea. In formal languages, like computer programming, we speak of DEFs and REFs. DEFs are places where the author has defined, as precisely as possible, what a given phrase, symbol, or idea means; REFs are places where some phrase, symbol, abbreviation, or figurative pronoun is used in reference to ideas that were never defined. In computers, the job of detecting conflicting DEF-DEF assertions and perfecting DEF-REF matches and self-consistency is accomplished by compilers; matches across sources are accomplished by linkers. None of these tools tries to make REF-REF matches unless some necessary characteristic matches exactly. (A small sketch of this bookkeeping appears below.)

In natural-language sources the ratio of DEFs to REFs is very small. (A source with 1 DEF in 10 might be a useful, integrative, "learnable" knowledge source.) Try to find definitions in the foregoing conversations. What passes for conversation is invariably REF-REF matches. It is hard to believe that language evolved under the imperative of exactly matching ideas and meaning; more likely its natural-selection criterion was "adequate similarity" within the "bonding cultural illusion" of shared feelings, interest, and understanding. Language and (unfortunately?) language misunderstanding and ambiguity are exactly what cultures and civilizations need to sustain unity under the stress of cultural diversity and broad differences in education, motives, and real interest. The "plasticity" of language to change and become whatever it needs to become makes the idea of "correct sense matching through language" more likely to mean politically correct, culturally correct, religiously correct, or legally correct than logically correct. Whose penalties are most severe? Well, who am I talking to, and who else is listening?

In large-scale knowledge base construction we employ four primary talents: acquisition editors, modelers, production editors, and consulting subject specialists. Acquisition editors are trained to seek out and recognize the highest-quality knowledge sources relevant to the target audience's primary needs and demands. Modelers sort through these sources, focusing primarily on the quality and completeness of their "dominant mediating conceptual structures" (taxonomies, compositions, task/subtask hierarchies, flows, choice and constraint structures, etc.). Within these contexts concept meanings are strongly typed independent of the language used; models make the first-order ontological assignments and direct the word-processing "pick and shovel" workers who add great productivity and volume to their efforts. This is the human equivalent of compilation. Production editors, assisted by consulting subject specialists, focus on source differences in abstraction level and granularity -- the processing of proximate matches.
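A minimal sketch of that DEF/REF bookkeeping, in Python (the source layout is illustrative; real compilers and linkers work over symbol tables, not dictionaries like these):

```python
def check_defs_refs(sources):
    """Compiler/linker-style bookkeeping across knowledge sources.

    sources: {source_name: {"defs": {symbol: definition}, "refs": [symbol, ...]}}
    """
    defs, conflicts, unresolved = {}, [], []
    # 'Compilation': collect DEFs and flag conflicting DEF-DEF assertions.
    for src in sources.values():
        for sym, meaning in src["defs"].items():
            if sym in defs and defs[sym] != meaning:
                conflicts.append((sym, defs[sym], meaning))
            defs[sym] = meaning
    # 'Linking': resolve each REF against a DEF from any source.
    for name, src in sources.items():
        for sym in src["refs"]:
            if sym not in defs:
                # No DEF anywhere: two sources sharing only this REF would
                # be a REF-REF match, which no tool attempts automatically.
                unresolved.append((name, sym))
    return conflicts, unresolved
```

Run this over natural-language "sources" and nearly everything lands in the unresolved list; that is the REF-REF situation described above.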
That proximate-match work goes on within narrow subject areas suited to sublanguage analysis, in limited domains where contextual settings and "subject expertise" resolve and validate the matches made. This is the human equivalent of linking. All of this work is value-added and well worth the effort if the sources are suitable and highly structured, which from the knowledge management perspective means thick, repetitive, tabular books and databases that for some reason cost a lot to produce (because of their completeness) and make dull reading -- the first thing a company is likely to throw out. Most knowledge acquisition by modeling becomes economic today where direct labor costs fall within $5000-$10K per source document (excluding royalties, licensing, etc.). Within the next 2-5 years this legacy mining might be expected to grow very fast, given market awareness and delivery-tool environments.

The dictionary was invented to stabilize word use, spelling, and meaning assignments against constant generational drift. Even when overloading words with 10-20 alternate meanings, there are not enough words to match one word to one concept. In the main, we use noun phrases for concept titling, and acronyms, abbreviations, and pronouns when we get tired of writing these. Our literary forms favor constant reference variation to keep from sounding repetitive or one-dimensional. These forces, stressing human attention span and the need for stimulation, tell us that language has more to do than help us compare ideas.

If we look hard at technical bookstores today, we will see the ontological equivalent of the dictionary taking up whole bookshelves: the field of medical coding. If your doctor orders a 26910 treatment and you are not suffering from either a 170.5, 198.5, 730.13, or 991.1, that could cost you serious money, because your insurance company will not pay for it. If you want to sell clothes to Nordstrom, then you are going to have to enter into their standardized retail buying network and match their coding system in all your paperwork. If we need an exact concept-matching language, we will get it, and it will not come from the dictionary.

Dick

PS. As is your way, feel free to share this.

-----Original Message-----
From: psp [mailto:beadmaster@ontologyStream.com]
Sent: Friday, February 01, 2002 8:32 AM
To: Topicmaps-Comment; Thomas B. Passin
Cc: Douglas Weidner; Tim Barber; Dorothy Denning; Doug Dearie; Dr. Robert Brammer; Rita Colwell; James L. Olds; eventChemistry; Humanmarkup-Comment; Katarina Auer; Paul Zavidniak; William Sander; Dennis Wisnosky; Albright; Ivan Prueitt; Pharris(Contr-Ito); George Lakoff
Subject: [eventChemistry] RE: [topicmaps-comment] multilingual thesaurus - language, scope, and topic naming constraint

<header> This is a complex message - perhaps of some theoretical interest to the cc list. However, if the Points of Contact at DARPA, OSTP, and NSF are not interested in this discussion, then we request a different point of contact. -Paul Prueitt OSI </header>

****

Tom Passin said, of the excellent post by Bernard Vatant to the topicmaps-comment forum (at OASIS), "I didn't think of representing that those words themselves stood for different concepts. Interesting!"

***

<Paul Prueitt> A brief note here regarding the scope of a word due to its language setting. I think that what I will say here will not be a surprise to linguists. It is NOT simply a "technical understanding of the language" that provides the real scope of a word in a language.
Meaning occurs, and can only be fully understood, in the cultural setting and realities of the social system. To hold the opposing position (that an Interlingua exists in an absolute sense) is speculative at best. That position is reductionism at its core (this is my claim), since it claims that all natural language can be reduced to a single deep structure. Perhaps Professor Lakoff will comment on this?

"Contextual is also pragmatic, as the word *lives* in a cultural setting." (Fiona Citkin, head translator of the ARL-sponsored conference (1995-1999) on Soviet semiotics, private communication.)

In most cases the (Whorf?) problem is not so bad. However, in many cases profound misunderstanding can arise from the assumption that a technical understanding of a second language stands in for the cultural experience. Yes? Machine translation systems often have this problem. Yes?

On the practice of constructing static topic maps? Well, **perhaps** the TM community sees the real problem that comes from an early binding of scope during the production of a TM by one person and the use of that TM by someone who has a different point of view. These TMs are becoming engines that will do things? And thus the issue of false sense making is vital, since evidence indicates that miscommunication **between humans** sometimes distorts the meaning in diplomatic channels. Tonfoni makes the (private) argument that diplomatic miscommunication was responsible for many of the diplomatic errors made before the Gulf War. {Certainly the American Nation is close, in many instances, to false sense making with respect to many issues where we are using great force to achieve outcomes that are proper but whose **scope** we are not properly understanding.} This is not a small matter!

*False* sense making (Karl Weick, Sensemaking in Organizations), using off-the-shelf ontology (static TMs), is a big problem that is not completely solved using HyTime...

http://www.bcngroup.org/area3/pprueitt/private/KM_files/frame.htm

The issue is reflected in the problem of machine-based declassification and an operational theory of similarity, as I have stated in:

http://www.bcngroup.org/area3/pprueitt/SDIUT/sdlong.htm

This is a long and unpublished paper. I hope that the TM community will realize that I am NOT criticizing the important work that has been done over the past several years using Topic Maps. But there continues to be a problem, and Bernard's message states this problem *perfectly*. Yes?

***

I have an approach to mapping the functional load between one word and all other words in natural use in a language. This is completely novel and new (I think). It is eventChemistry applied to word co-occurrence. I have studied the Aesop fable collection in English, but I need some help with issues like noun and verb differentiation and case grammars. There are a lot of similarities to Latent Semantic Indexing, but eventChemistry has visualization and a few other surprises. Is there a linguist who would like to do this work on the fable collection (likely requiring 30-40 hours of effort, using the eventChemistry software)? What we might go after is a description of the functional load of some of the terms as used by Aesop in his fables.

http://www.ontologystream.com/bSLIP/finalReview.htm

So, some of you already see where this is going; the notion is that mapping single-word usage in natural settings will provide a single atom (a node with affordance links) -- as in Peirce's unifying logic vision: "concepts are like chemical compounds that are composed of atoms."
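To make "functional load" concrete, here is a minimal sketch in Python, assuming plain-text input (the +/- 4-token window is an illustrative choice of mine, not the eventChemistry algorithm):

```python
from collections import Counter
import re

def functional_load(text, word, window=4):
    """Rank the words that co-occur with `word` within +/- `window` tokens.

    The ranked profile approximates the word's functional load in this corpus.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    profile = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            # Count every neighbor inside the window, excluding the word itself.
            neighborhood = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            profile.update(neighborhood)
    return profile.most_common()

fable = ("A lion was sleeping when a mouse ran over his face. "
         "The lion caught the mouse, and the mouse begged the lion to let him go.")
print(functional_load(fable, "lion")[:5])
```

The output is the affordance-link profile for one word-atom: a single node plus its enumerated co-occurrence links.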
Such a single word-atom is like the event atoms I have developed to study cyber war and innovation adoption (both of these are **intrusions** from one level of natural activity into another level of natural activity). Please just look at the short paper on this at the above URL. It would seem that this would make a good publication, and perhaps even identify a value proposition?

The mark-up of the context setting is addressed nicely in the work of Tonfoni:

http://www.bcngroup.org/area3/gtonfoni/EIVD/index.html

Paul Prueitt
OntologyStream Inc.
Chantilly VA

I have copied Bernard's message below for two other forums, as the issue of scope is so beautifully expressed:

****

-----Original Message-----
From: Bernard Vatant [mailto:bernard.vatant@mondeca.com]
Sent: Friday, February 01, 2002 4:46 AM
To: topicmaps-comments
Cc: stefan.jensen@eea.eu.int
Subject: Re: [topicmaps-comment] multilingual thesaurus - language, scope, and topic naming constraint

Thanks to all who tried to answer, both on this list and through private communications. Now let me expose what I found out yesterday night - just after switching off the computer - with that delicious feeling you have when a long-sought solution suddenly appears obvious and crystal clear, just because you have, at last, looked at it the right and simple way, and all the previous attempts look awkward and far-fetched. But be patient. A bit of history.

Last year I was investigating this question with the Seruba research team, unfortunately since swept from the scene by economic constraints. The solution I had suggested at the time was to consider the terms in different languages as n distinct topics, independent of the abstract descriptor, itself considered topic n+1, and then to link those guys together through associations, asserting something like: "This topic is an abstract descriptor, representing an abstract concept, independent of any language. Those topics represent the terms used in those languages to represent this descriptor concept." In putting the concept and the terms on different levels of topics, we had a technical way to manage synonymy and polysemy. But, like the solutions proposed by Kal or Tom, that was only a dodge, and I remember one of Seruba's linguists, very skeptical about it, who kept saying to me, "It works, but it does not make sense!" And he was right!

The only sustainable viewpoint is that there is no such thing as a *concept independent of its representation by a term in a certain language*. Every attachment of a term to a concept is always asserted in the scope of a certain language, and every other language conveys a slightly or radically different view of the world and organisation of concepts; that is why lingual diversity is so precious, and translation so difficult... So we have to go back to basics: one subject = one topic. (DAN: økonomi), (DUT: economie), (ENG: economy), (FRE: économie), (GER: Wirtschaft), (SPA: economía) convey a priori six different concepts and views of the world, which someone familiar with all those languages could certainly feel, even if the differences are subtle. Hence they are six different subjects and therefore have to be represented by six different topics. They are not six names of the same topic in different scopes, and definitely not variants. And they are not even representations of the same descriptor in different languages.
A 7th topic, standing in the middle of nowhere outside any language scope, does not make sense, because it has no meaningful subject. Note that if you give a definition of the descriptor, you always give it in some default language...

So what is a descriptor that puts together those six concepts for the purpose of cross-language communication and translation? What do you do when you gather topics? Obvious - you build an association. And what is the scope of that association? The scope of the language viewpoint from which you assert the association, that is, the default language of the thesaurus... This association asserts that those topics can be considered "equivalent", allowing a translation which makes sense, maybe in a certain scope. Note that the scope is not on the names but on the association, and that the associations are not necessarily the same if I stand at another language viewpoint. So if I edit the thesaurus with a different default language, I will certainly have to change the set of associations.

That approach deeply respects the diversity of *concepts* conveyed by the different languages. All the previous approaches are in fact killing linguistic diversity, if you look at them closely, because the default language of the descriptor imposes the set of concepts, and the other languages have to find, willy-nilly, a name for it. And this is really enabled by the topic map representation. Think about it. I've got to put all that in XTM now.

Regards

Bernard
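A minimal sketch of the structure Bernard describes, rendered in Python data structures rather than XTM (the class and field names are illustrative only):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Topic:
    term: str
    language: str   # every topic lives inside one language scope

@dataclass(frozen=True)
class Association:
    members: tuple  # the per-language topics gathered together
    scope: str      # the default language from which equivalence is asserted

topics = (Topic("økonomi", "DAN"), Topic("economie", "DUT"),
          Topic("economy", "ENG"), Topic("économie", "FRE"),
          Topic("Wirtschaft", "GER"), Topic("economía", "SPA"))

# Six subjects, six topics. The descriptor is not a language-neutral 7th
# topic but an association among the six, scoped by the default language.
descriptor = Association(members=topics, scope="ENG")
```

Editing the thesaurus under another default language would mean asserting a new association under a new scope, not renaming a language-neutral 7th topic.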