Subject: [humanmarkup-comment] on a new small stratified ontology for cyber war
<header> Again, I apologize for creating this public discussion. However, it seems important, at least to me. And I remind everyone that the time required to delete the message is not great. </header>

***

Richard Ballard's contribution here is important and relevant to the issue of producing ontologies for arbitrary domains. He and I have talked about what it would take to code a software system and demonstrate a methodology that would produce a stratified ontology supporting sense making about the events that occur in hacking activities and cyber war. Perhaps this is a less-than-$5M project, with operational deployment (independent of all other systems) within nine months and refinements over a three-year period.

Len Bullard is one of the developers of HyTime, an SGML-based standard for hypermedia and time-based structuring that grew out of work on modeling the production of music, and which has been adopted into many small ontologies used in the agile transformation of information. His comments are welcome and thoughtful about the issues posed.

John Sowa's recent work on these related subjects is at:

http://www.jfsowa.com/pubs/signproc.htm

The working notes that I developed, following discussions with Dennis Wisnosky, on this stratified sense-making system are in a four-panel PowerPoint at:

http://www.ontologystream.com/EI/slipstream_files/frame.htm

This architecture is designed to interface with Richard Ballard's Mark 3 knowledge system.

***

The primary compatibility can be seen in Dick's references to REF-REF matches (machine-derivable by co-occurrence and other processes: n-gram, tensor, Latent Semantic Indexing, and a few other esoteric evolutionary-programming processes) and now in the functional-load mapping of the "single node" formative ontology of the event map (please look at the representation of Port 80, the HTTP port, in the top right corner of the paper):

http://www.ontologystream.com/bSLIP/finalReview.htm

This visualization is original to me, but related to the Soviet-era work on cognitive visualization of the theorems of elementary number theory:

http://www.ontologystream.com/IRRTest/Evaluation/ARLReport.htm

The connection to number theory is in elementary number-base conversions: changing the base from 10 to 6 alters the "solvability" of the problem of representing 1/3 in a rational expansion. In base 10, 1/3 is the non-terminating 0.333..., while in base 6 it is exactly 0.2. This is related to my unpublished work on the Whorf hypothesis (non-translatability) and to Gödel/Cantor theory (the foundations of finite mathematics).

This work on transforming unsolvable problems (Peter Kugler would call this a by-pass) leads to very fast scatter-gather (clustering) algorithms, so that what takes 4 hours using FoxPro Rushmore indexing is reduced to 25 seconds. The clustering in the SLIP (stochastic) and eventChemistry work is thus fast enough to keep up with a human's attention span during a real-time investigation of data invariance. Self-organizing feature maps often take a day to cook a representation of a text corpus; the new algorithms reduce this investment of time to perhaps several minutes.

The theory is very simple, and the demonstration was already present in the December 7th, 2001 SLIP Browsers, downloadable with a short tutorial at:

http://www.ontologystream.com/SLIP/files/ArbitaryEventLog.htm

The SLIP atoms are Peircean nodes (not graphs, but single nodes)! Sigh... the insight that I seem to have, *that no one else has shown to me*, is that the mental event is a single node, a "noun subject", with its reference-link *potential* enumerated.
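A minimal sketch of that single-node form, in Python (the record layout and field names here are illustrative only, not the actual SLIP data structures):

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    """A single-node 'noun subject'; its links are enumerated potentials,
    carrying co-occurrence counts rather than probabilities."""
    subject: str
    link_potential: dict = field(default_factory=dict)  # neighbor -> count

def atoms_from_event_log(records):
    """Build one Atom per observed value; values seen in the same event
    record accumulate link potential toward each other."""
    atoms = {}
    for record in records:
        for a in record:
            atom = atoms.setdefault(a, Atom(a))
            for b in record:
                if b != a:
                    atom.link_potential[b] = atom.link_potential.get(b, 0) + 1
    return atoms

# e.g. intrusion-log records of the form (source address, port, signature)
log = [("10.0.0.5", "port80", "probe"),
       ("10.0.0.5", "port80", "scan"),
       ("10.0.0.9", "port25", "probe")]
print(atoms_from_event_log(log)["port80"].link_potential)
# -> {'10.0.0.5': 2, 'probe': 1, 'scan': 1}
```

Each atom remains a single node; its link_potential merely enumerates what it *could* bind to, and with what observed weight.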
This is not a Bayes representation, because the causes of these potential links are not probabilities. The link forms (emerges) within a stratified architecture, with the decomposition of past memory as the substructure and the anticipation due to ecological affordance as the ultrastructure (notational engineer Jeff Long's term, used here slightly differently). A simple algorithmic process (one I invented in 1997) makes routing and retrieval situated.

http://www.bcngroup.org/area3/pprueitt/kmbook/Appendix.htm

So the atoms required by a formative compound ontology are first created by a stochastic process (see the papers), and then meaning is acquired through human introspection (introspection is the "I" word in science, yes?). Peircean thirdness is in moving from the level of atom ontologies (firstness) to the level of compounds, via scope (secondness). The current Topic Maps, even with HyTime, are not doing this yet (at least not that I am aware of).

The key elements of this architecture are as follows. Each of the four levels of the taxonomy has human terminology-evolution processes, in conjunction with human communities of practice.

1) The bottom layer of this layered taxonomy is an open semiotic system depending on invariance types (categories) produced from the aggregation of Internet traffic data at selected points within the Internet systems.

2) The second layer is an intrusion-event level that is responsive to the already deployed infrastructure of Intrusion Detection Systems (IDSs) and to the visualization of intrusion-event patterns at the third level.

3) The third level is a knowledge management system having knowledge propagation and a knowledge base developed on Peircean logics (cognitive graphs) that have a formative, and thus situational, aspect.

4) The fourth level is a machine representation of the compliance models produced by policy makers.

A National Defense System against cyber war would deploy this structured and stratified taxonomy across the governmental Computer Emergency Response Team (CERT) centers. This system would be independent of the current systems and would have a knowledge management component for virtual collaboration.

***

Why have I found it necessary to make this conversation public?

1) The theoretical and practical issues relevant to a working dynamic and stratified taxonomy of this nature must see the light of day. These issues can be partially solved, not solved at all, or solved in ways that burden the National response to cyber war.

2) The proper solutions to this problem are useful in eBusiness, in decision support systems (commercial and military), and in systems for virtual education. They CANNOT become part of the confusion that is the classified technologies (most of which are simply well known not to work).

3) Private and personal reasons: since November 2001 no one has been considerate enough to pay me and my programmer for the work that we continue to do, all the while treating the matter as if a field test of **my** software, software that was not yet completed, were in progress. The new work, a more complete axiom-and-theorem foundation for a new area of pure mathematics, and guidance from those whom I would ask to develop proper outcome metrics have been treated as if they were somehow not needed or wanted. The business value proposition is in competition with something that makes the business unit more money.
***

This MUST be considered a policy issue because it is one more example, among many, many others, of how the innovations needed to define knowledge science are restricted by the practices of the business mind in the exercise of control over the science mind. This might be acceptable if the Nation and the economy were not traveling at light speed toward a brick wall. I realize that this general systems dynamic has nothing to do with me, or with the business unit. It is systemic and ubiquitous. It is a fact of life. However, the larger moral issue is why we as a culture have given absolute power to those who are practiced in this control. They know nothing of the issues that might be solved. They brought us the .com bubble because they felt too important to understand that, in most cases, there was no product even under consideration by these invested companies. Yes? Perhaps there is some other reason why the .com bubble occurred? I do not think so.

I again call for a Manhattan Project to establish Knowledge Science and extend the true capabilities of Information Technology.

http://www.bcngroup.org/area3/manhattan/sindex.htm

This project could change the nature of the public discussion about what IT is good for, by bringing an understanding of the existing science to bear on the (mostly unprovable and often deceptive) theories in artificial intelligence and information technology.

In response to Microsoft's advertisement "Where do you want to go today?", I say I want to go somewhere with a stable operating system that will not change just as soon as I get my programs to work.

Paul S. Prueitt
Chantilly VA

***

-----Original Message-----
From: Richard Ballard [mailto:rlballard@earthlink.net]
Sent: Saturday, February 02, 2002 1:31 AM
To: eventChemistry@yahoogroups.com; Topicmaps-Comment; Thomas B. Passin
Cc: Mark Turner; Douglas Weidner; Tim Barber; Dorothy Denning; Doug Dearie; Dr. Robert Brammer; Rita Colwell; James L. Olds; Humanmarkup-Comment; Katarina Auer; Paul Zavidniak; William Sander; Dennis Wisnosky; Albright; Ivan Prueitt; Pharris(Contr-Ito); George Lakoff; Wojciech M. Jaworski
Subject: [eventChemistry] Reaction to -- multilingual thesaurus - language, scope, and topic naming constraint

Paul & Others:

This conversation is a wonderfully entangled cameo of semantics, taken as the nexus, or solution, or insolvability of all things conceptual. Every one of us becomes tempted at some time of life to untangle this problem or, via some simplifying assumption, finesse it as a barrier and move past it. Some settle in and decide to spend their lives either solving it or contributing to it from some particular perspective. The pernicious raise the issue just to assert that no problem can be solved unless their favorite problem is solved first. Delightfully, that perniciousness, while present, is not blatant here.

But still it goes round and round. At some point the question has to be called and the house divided. I usually ask two questions: (1) What do you want language to do for you that makes semantics the issue? (2) From what you have learned so far, is this problem going to be solved in years, decades, centuries, millennia, or ever? I would certainly like to hear an optimistic answer, particularly from George Lakoff or others who are so heavily invested. For me, some 20 years were devoted to natural language dialog systems, sub-language analysis, and related linguistic issues in user-interface design and computer-based instruction and tutoring.
When I turned to full-time knowledge engineering (some 18 years ago), my faith in and sympathy for language as a system for knowledge representation became a losing struggle. I abandoned it completely 10 years ago. I consider that a breakthrough and will say more about it at the Knowledge Technology Conference in Seattle, March 11-13.

In knowledge coding we have the problem of identifying "ideas" with some code, symbol, or phrase and then integrating the knowledge gathered and acquired by modeling from many sources. Each source has its own ontological commitment, and the problem, and the goal, is to marry these views at points where they share a common idea. In formal languages, like computer programming, we speak of DEFs and REFs. DEFs are places where the author has defined, as precisely as possible, what a given phrase, symbol, or idea means; REFs are places where some phrase, symbol, abbreviation, or figurative pronoun is used in reference to ideas that were never defined. In computers, the job of detecting conflicting DEF-DEF assertions and perfecting DEF-REF matches and self-consistency is accomplished by compilers; matches across sources are accomplished by linkers. None of these tools tries to make REF-REF matches unless some necessary characteristic matches exactly. (A small sketch of this bookkeeping appears below.)

In natural-language sources the ratio of DEFs to REFs is very small. (A source with 1 DEF in 10 might be a useful, integrative, "learnable" knowledge source.) Try to find definitions in the foregoing conversations. What passes for conversation is invariably REF-REF matches. It is hard to believe that language evolved under the imperative of exactly matching ideas and meaning; more likely its natural-selection criterion was "adequate similarity" within the "bonding cultural illusion" of shared feelings, interest, and understanding. Language and (unfortunately?) language misunderstanding and ambiguity are exactly what cultures and civilizations need to sustain unity under the stress of cultural diversity and broad differences in education, motives, and real interest. The "plasticity" of language to change and become whatever it needs to become makes the idea of "correct sense matching through language" more likely to mean politically correct, culturally correct, religiously correct, or legally correct than logically correct. Whose penalties are most severe? Well, who am I talking to, and who else is listening?

In large-scale knowledge base construction we employ four primary talents: acquisition editors, modelers, production editors, and consulting subject specialists. Acquisition editors are trained to seek out and recognize the highest-quality knowledge sources relevant to the target audience's primary needs and demands. Modelers sort through these sources, focusing primarily on the quality and completeness of their "dominant mediating conceptual structures" (taxonomies, compositions, task/subtask hierarchies, flows, choice and constraint structures, etc.). Within these contexts concept meanings are strongly typed independent of the language used; models make the first-order ontological assignments and direct the word-processing "pick and shovel" workers who add great productivity and volume to their efforts. This is the human equivalent of compilation. Production editors, assisted by consulting subject specialists, focus on source differences in abstraction level and granularity -- the processing of proximate matches.
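A minimal sketch of that DEF/REF bookkeeping, in Python (the source layout is illustrative; real compilers and linkers work over symbol tables, not dictionaries like these):

```python
def check_defs_refs(sources):
    """Compiler/linker-style bookkeeping across knowledge sources.

    sources: {source_name: {"defs": {symbol: definition}, "refs": [symbol, ...]}}
    """
    defs, conflicts, unresolved = {}, [], []
    # 'Compilation': collect DEFs and flag conflicting DEF-DEF assertions.
    for src in sources.values():
        for sym, meaning in src["defs"].items():
            if sym in defs and defs[sym] != meaning:
                conflicts.append((sym, defs[sym], meaning))
            defs[sym] = meaning
    # 'Linking': resolve each REF against a DEF from any source.
    for name, src in sources.items():
        for sym in src["refs"]:
            if sym not in defs:
                # No DEF anywhere: two sources sharing only this REF would
                # be a REF-REF match, which no tool attempts automatically.
                unresolved.append((name, sym))
    return conflicts, unresolved
```

Run this over natural-language "sources" and nearly everything lands in the unresolved list; that is the REF-REF situation described above.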
That proximate-match work goes on within narrow subject areas suited to sublanguage analysis, in limited domains where contextual settings and "subject expertise" resolve and validate the matches made. This is the human equivalent of linking. All of this work is value-added and well worth the effort if the sources are suitable and highly structured, which from the knowledge management perspective means thick, repetitive, tabular books and databases that for some reason cost a lot to produce (because of their completeness) and make dull reading -- the first thing a company is likely to throw out. Most knowledge acquisition by modeling becomes economic today where direct labor costs fall within $5000-$10K per source document (excluding royalties, licensing, etc.). Within the next 2-5 years this legacy mining might be expected to grow very fast, given market awareness and delivery-tool environments.

The dictionary was invented to stabilize word use, spelling, and meaning assignments against constant generational drift. Even when overloading words with 10-20 alternate meanings, there are not enough words to match one word to one concept. In the main, we use noun phrases for concept titling, and acronyms, abbreviations, and pronouns when we get tired of writing these. Our literary forms favor constant reference variation to keep from sounding repetitive or one-dimensional. These forces, stressing human attention span and the need for stimulation, tell us that language has more to do than help us compare ideas.

If we look hard at technical bookstores today, we will see the ontological equivalent of the dictionary taking up whole bookshelves: the field of medical coding. If your doctor orders a 26910 treatment and you are not suffering from either a 170.5, 198.5, 730.13, or 991.1, that could cost you serious money, because your insurance company will not pay for it. If you want to sell clothes to Nordstrom, then you are going to have to enter into their standardized retail buying network and match their coding system in all your paperwork. If we need an exact concept-matching language, we will get it, and it will not come from the dictionary.

Dick

PS. As is your way, feel free to share this.

-----Original Message-----
From: psp [mailto:beadmaster@ontologyStream.com]
Sent: Friday, February 01, 2002 8:32 AM
To: Topicmaps-Comment; Thomas B. Passin
Cc: Douglas Weidner; Tim Barber; Dorothy Denning; Doug Dearie; Dr. Robert Brammer; Rita Colwell; James L. Olds; eventChemistry; Humanmarkup-Comment; Katarina Auer; Paul Zavidniak; William Sander; Dennis Wisnosky; Albright; Ivan Prueitt; Pharris(Contr-Ito); George Lakoff
Subject: [eventChemistry] RE: [topicmaps-comment] multilingual thesaurus - language, scope, and topic naming constraint

<header> This is a complex message - perhaps of some theoretical interest to the cc list. However, if the Points of Contact at DARPA, OSTP, and NSF are not interested in this discussion, then we request a different point of contact. -Paul Prueitt OSI </header>

****

Tom Passin said, of the excellent post by Bernard Vatant to the topicmaps-comment forum (at OASIS), "I didn't think of representing that those words themselves stood for different concepts. Interesting!"

***

<Paul Prueitt> A brief note here regarding the scope of a word due to its language setting. I think that what I will say here will not be a surprise to linguists. It is NOT simply a "technical understanding of the language" that provides the real scope of a word in a language.
Meaning occurs, and can only be fully understood, in the cultural setting and realities of the social system. To hold the opposing position (that an Interlingua exists in an absolute sense) is speculative at best. That position is reductionism at its core (this is my claim), since it claims that all natural language can be reduced to a single deep structure. Perhaps Professor Lakoff will comment on this?

"Contextual is also pragmatic, as the word *lives* in a cultural setting." (Fiona Citkin, head translator of the ARL-sponsored conference (1995-1999) on Soviet semiotics, private communication.)

In most cases the (Whorf?) problem is not so bad. However, in many cases profound misunderstanding can arise from the assumption that a technical understanding of a second language stands in for the cultural experience. Yes? Machine translation systems often have this problem. Yes?

On the practice of constructing static topic maps? Well, **perhaps** the TM community sees the real problem that comes from an early binding of scope during the production of a TM by one person and the use of that TM by someone who has a different point of view. These TMs are becoming engines that will do things? And thus the issue of false sense making is vital, since evidence indicates that miscommunication **between humans** sometimes distorts the meaning in diplomatic channels. Tonfoni makes the (private) argument that diplomatic miscommunication was responsible for many of the diplomatic errors made before the Gulf War. {Certainly the American Nation is close, in many instances, to false sense making with respect to many issues where we are using great force to achieve outcomes that are proper but whose **scope** we are not properly understanding.} This is not a small matter!

*False* sense making (Karl Weick, Sensemaking in Organizations), using off-the-shelf ontology (static TMs), is a big problem that is not completely solved using HyTime...

http://www.bcngroup.org/area3/pprueitt/private/KM_files/frame.htm

The issue is reflected in the problem of machine-based declassification and an operational theory of similarity, as I have stated in:

http://www.bcngroup.org/area3/pprueitt/SDIUT/sdlong.htm

This is a long and unpublished paper. I hope that the TM community will realize that I am NOT criticizing the important work that has been done over the past several years using Topic Maps. But there continues to be a problem, and Bernard's message states this problem *perfectly*. Yes?

***

I have an approach to mapping the functional load between one word and all other words in natural use in a language. This is completely novel and new (I think). It is eventChemistry applied to word co-occurrence. I have studied the Aesop fable collection in English, but I need some help with issues like noun and verb differentiation and case grammars. There are a lot of similarities to Latent Semantic Indexing, but eventChemistry has visualization and a few other surprises. Is there a linguist who would like to do this work on the fable collection (likely requiring 30-40 hours of effort, using the eventChemistry software)? What we might go after is a description of the functional load of some of the terms as used by Aesop in his fables.

http://www.ontologystream.com/bSLIP/finalReview.htm

So, some of you already see where this is going; the notion is that mapping single-word usage in natural settings will provide a single atom (a node with affordance links) -- as in Peirce's unifying logic vision: "concepts are like chemical compounds that are composed of atoms."
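To make "functional load" concrete, here is a minimal sketch in Python, assuming plain-text input (the +/- 4-token window is an illustrative choice of mine, not the eventChemistry algorithm):

```python
from collections import Counter
import re

def functional_load(text, word, window=4):
    """Rank the words that co-occur with `word` within +/- `window` tokens.

    The ranked profile approximates the word's functional load in this corpus.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    profile = Counter()
    for i, tok in enumerate(tokens):
        if tok == word:
            # Count every neighbor inside the window, excluding the word itself.
            neighborhood = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            profile.update(neighborhood)
    return profile.most_common()

fable = ("A lion was sleeping when a mouse ran over his face. "
         "The lion caught the mouse, and the mouse begged the lion to let him go.")
print(functional_load(fable, "lion")[:5])
```

The output is the affordance-link profile for one word-atom: a single node plus its enumerated co-occurrence links.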
Such a single word-atom is like the event atoms I have developed to study cyber war and innovation adoption (both of these are **intrusions** from one level of natural activity into another level of natural activity). Please just look at the short paper on this at the above URL. It would seem that this would make a good publication, and perhaps even identify a value proposition?

The mark-up of the context setting is addressed nicely in the work of Tonfoni:

http://www.bcngroup.org/area3/gtonfoni/EIVD/index.html

Paul Prueitt
OntologyStream Inc.
Chantilly VA

I have copied Bernard's message below for two other forums, as the issue of scope is so beautifully expressed:

****

-----Original Message-----
From: Bernard Vatant [mailto:bernard.vatant@mondeca.com]
Sent: Friday, February 01, 2002 4:46 AM
To: topicmaps-comments
Cc: stefan.jensen@eea.eu.int
Subject: Re: [topicmaps-comment] multilingual thesaurus - language, scope, and topic naming constraint

Thanks to all who tried to answer, both on this list and through private communications. Now let me expose what I found out yesterday night - just after switching off the computer - with that delicious feeling you have when a long-sought solution suddenly appears obvious and crystal clear, just because you have, at last, looked at it the right and simple way, and all the previous attempts look awkward and far-fetched. But be patient. A bit of history.

Last year I was investigating this question with the Seruba research team, unfortunately since swept from the scene by economic constraints. The solution I had suggested at the time was to consider the terms in different languages as n distinct topics, independent of the abstract descriptor, itself considered topic n+1, and then to link those guys together through associations, asserting something like: "This topic is an abstract descriptor, representing an abstract concept, independent of any language. Those topics represent the terms used in those languages to represent this descriptor concept." In putting the concept and the terms on different levels of topics, we had a technical way to manage synonymy and polysemy. But, like the solutions proposed by Kal or Tom, that was only a dodge, and I remember one of Seruba's linguists, very skeptical about it, who kept saying to me, "It works, but it does not make sense!" And he was right!

The only sustainable viewpoint is that there is no such thing as a *concept independent of its representation by a term in a certain language*. Every attachment of a term to a concept is always asserted in the scope of a certain language, and every other language conveys a slightly or radically different view of the world and organisation of concepts; that is why lingual diversity is so precious, and translation so difficult... So we have to go back to basics: one subject = one topic. (DAN: økonomi), (DUT: economie), (ENG: economy), (FRE: économie), (GER: Wirtschaft), (SPA: economía) convey a priori six different concepts and views of the world, which someone familiar with all those languages could certainly feel, even if the differences are subtle. Hence they are six different subjects and therefore have to be represented by six different topics. They are not six names of the same topic in different scopes, and definitely not variants. And they are not even representations of the same descriptor in different languages.
A 7th topic, standing in the middle of nowhere outside any language scope, does not make sense, because it has no meaningful subject. Note that if you give a definition of the descriptor, you always give it in some default language...

So what is a descriptor that puts together those six concepts for the purpose of cross-language communication and translation? What do you do when you gather topics? Obvious - you build an association. And what is the scope of that association? The scope of the language viewpoint from which you assert the association, that is, the default language of the thesaurus... This association asserts that those topics can be considered "equivalent", allowing a translation which makes sense, maybe in a certain scope. Note that the scope is not on the names but on the association, and that the associations are not necessarily the same if I stand at another language viewpoint. So if I edit the thesaurus with a different default language, I will certainly have to change the set of associations.

That approach deeply respects the diversity of *concepts* conveyed by the different languages. All the previous approaches are in fact killing linguistic diversity, if you look at them closely, because the default language of the descriptor imposes the set of concepts, and the other languages have to find, willy-nilly, a name for it. And this is really enabled by the topic map representation. Think about it. I've got to put all that in XTM now.

Regards

Bernard
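A minimal sketch of the structure Bernard describes, rendered in Python data structures rather than XTM (the class and field names are illustrative only):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Topic:
    term: str
    language: str   # every topic lives inside one language scope

@dataclass(frozen=True)
class Association:
    members: tuple  # the per-language topics gathered together
    scope: str      # the default language from which equivalence is asserted

topics = (Topic("økonomi", "DAN"), Topic("economie", "DUT"),
          Topic("economy", "ENG"), Topic("économie", "FRE"),
          Topic("Wirtschaft", "GER"), Topic("economía", "SPA"))

# Six subjects, six topics. The descriptor is not a language-neutral 7th
# topic but an association among the six, scoped by the default language.
descriptor = Association(members=topics, scope="ENG")
```

Editing the thesaurus under another default language would mean asserting a new association under a new scope, not renaming a language-neutral 7th topic.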