Subject: [humanmarkup-comment] multilingualthesaurus - language, scope,and linguistic functional load
Please feel free to delete this message.

Rob and other respected colleagues,

The recent citation to Physics News (copied below) is additional evidence that my notion of using a compression dictionary to understand the bit stream, treated as a dynamic topic ecosystem, is feasible:

http://www.ontologystream.com/EI/slipstream_files/frame.htm

in the context of a defense system against cyber war:

http://www.ontologystream.com/administration/toOSTP.htm

This gives the bottom layer of a four-layer stratified ontology in a way that costs almost nothing. In fact, compression and encryption would speed Internet transactions if more of the transmissions were automatically compressed while passing through certain randomly selected measurement points (gates).

Remember that intentional expression in literature (a conjecture by Dr. Cameron Jones) and in cyber war (a conjecture by Dr. Paul Prueitt) is fractal in the sense that a small number of generators are applied over and over in slightly different orderings. { Something here about J. J. Gibson, Robert Shaw, and perceptual physics. }

http://www.ontologystream.com/bSLIP/S&FT.htm

Smolensky's subsymbolic "harmony theory" is relevant here, as is Thagard's work on explanatory coherence and automated legal reasoning. This leads into the deeper (machine synthetic intelligence) architecture I have proposed based on Pribram's holonomic theory of brain function.

It is too bad that a business model has not yet been developed to allow this to be integrated into a single system. The business types seem to believe that if something is not already understood, then there is no reason to talk about it. (This is a claim, not intended as an insult; I am just pointing out a problem that most of us here are well aware of.) How can the social value of deep theory ever be measured if the business mind always inhibits it because a narrow business model is not automatically available?
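The "measurement gate" idea, reduced to its cheapest observable signal, can be sketched in a few lines. This is a minimal illustration under assumed details, not the proposed system: it uses Python's zlib as the compressor and takes the compression ratio of a traffic chunk as the quantity a gate might record (redundant, text-like traffic compresses well; encrypted or already-compressed traffic does not).

```python
import os
import zlib

def compression_ratio(chunk: bytes) -> float:
    """Compressed size divided by raw size.  Low values indicate
    redundant, text-like data; values near (or above) 1.0 indicate
    incompressible data such as encrypted or pre-compressed traffic."""
    return len(zlib.compress(chunk)) / len(chunk)

# Hypothetical traffic samples observed at a measurement gate.
text_like = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n" * 50
random_like = os.urandom(len(text_like))

print(compression_ratio(text_like))    # well below 1.0
print(compression_ratio(random_like))  # near 1.0
```

The point of the sketch is only that the overhead is exactly the work a compressor already does, which is why the gates could "cost almost nothing."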
The existing state of the art in cyber defense systems is specified by a note to me from a Mitre Information Assurance scientist:

"I looked at the web link you provided, and from an engineering viewpoint, things are done the way you describe. That is, the bottom layer consists of sensors which collect data and pre-process the data (e.g., normalization, aggregation, etc.). The next layer consists of analyzers (e.g., intrusion detection modules, classifiers, correlators, etc.). The next layer consists of providing situational awareness (e.g., visualization, mapping system events to mission impacts, etc.). Finally, there is a data repository that maintains the various models (e.g., domain model, mission model, user model, etc.) and maps between them.

In your message, you talk about masking out normal traffic so that what's left is the interesting stuff. This is typically done not at the sensor level (on raw data), but at the analyzer level (on events raised by various analysis algorithms). For instance, one may filter out port scans (which are a large fraction of raised alerts) to focus in on more interesting alerts. MITRE has a paper on this (published in the RAID conference either last year or the year before).

The main problem that I have is that you speak a very different language from what we in the security community speak, so I'm unable to understand what you're proposing."

Yes, eventChemistry is a new language, because it is derived from a stratified theory of process. It is assumed by everyone who works in Information Assurance that the entire web cannot be understood all at once and in real time. I think that this assumption is not correct. If one needs pictures, I have prepared a URL for you.
http://www.ontologystream.com/bSLIP/finalReview.htm

***

What this Italian group (see copy below) is doing is developing a compression dictionary as if for zipping. One could then stream the compression tokens, do n-gram analysis, and then something like Latent Semantic Indexing (or eventChemistry). (Well, if they did not quite get to this, then perhaps they should try the n-gram analysis. I claim that it is this feature of the proposed defense system that makes it possible to shed light into the dark alleys of the Internet.) Perhaps I can go work in Italy?

My eventChemistry will do the functional load for co-occurrence in this n-gram stream and produce event maps, where the event log is a parse of the n-grams of compression tokens of the bit stream. These will be more powerful than ANY current-generation Intrusion Detection System (this is not a huge claim), because the rules will match patterns in a "small" abstraction space, as opposed to trying to treat individual sequences as something to be recognized without this simplifying (in terms of computational load) abstraction. (Imagine trying to count without the abstraction of numbers.)

Not understanding the above paragraph is not because it is not simple, but because it is thinking from a stratified (Peircean) point of view. Sorry, it does not get simpler... but reading it several times often helps.

This is not using IDS log files, but rather the 0s and 1s in the packets themselves. The overhead is what one needs to do either compression or encryption, nothing more or less. The topic maps... well, these are needed to communicate event graphs and metadata from one analyst to another analyst.

HyTime and Petri nets... hmm, has anyone looked into this? Can anyone here describe the *behavior* of a colored Petri net using Topic Maps and HyTime? What about the results from Latent Semantic Analysis: has anyone looked at representing the forward-pointing (not yet specified) associations between topics using HyTime?
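The n-gram and co-occurrence step described above can be sketched concretely. This is a hedged illustration: the integer "compression tokens" below are hypothetical stand-ins for dictionary indices emitted by a compressor (not the output of any real tool), and windowed pair counting is just one simple way to produce the raw material for a functional-load or LSI-style analysis.

```python
from collections import Counter
from itertools import combinations

def ngrams(tokens, n=2):
    """All runs of n consecutive tokens (a sliding window)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cooccurrence(tokens, window=3):
    """Count how often each unordered pair of distinct tokens
    appears together inside a sliding window of the given width."""
    counts = Counter()
    for i in range(len(tokens) - window + 1):
        for pair in combinations(sorted(set(tokens[i:i + window])), 2):
            counts[pair] += 1
    return counts

# Hypothetical stream of compression-dictionary token ids.
stream = [7, 3, 7, 9, 3, 7, 3, 9]
print(ngrams(stream, 2))
print(cooccurrence(stream).most_common(2))
```

An event map would then be built over the most frequent pairs, treating strongly co-occurring tokens as bound "atoms" of an event.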
Will someone here give a simple description of HyTime and how it addresses the problems related to temporal changes?

**

On the notion of mapping functional load for a specific literature, say a book, in one language and then mapping the functional load of the translation of the book: well, I think that this is what terminology science is all about.

Rex said: "I believe, (this is a belief not an assertion of fact), that it is necessary for topics to include cross references to similarly translated concepts in as many language as feasible, perhaps as a specific cross-language reference to associations, in order for ease of collecting divergent meanings for "similar" not identical, topics, as words or phrases."

These cross references should show the similarities and differences of the linguistic functional load for the term in one language and a "corresponding" term in a second language. Yes?

-----Original Message-----
From: Rob Nixon [mailto:rnixon@qdyn.com]
Sent: Sunday, February 03, 2002 11:00 AM
To: Rex Brooks
Cc: psp; Steven R. Newcomb; Topicmaps-Comment; Humanmarkup-Comment; eventChemistry
Subject: Re: [humanmarkup-comment] RE: AW: [topicmaps-comment] multilingualthesaurus - language, scope, and topic naming constraint

Here is something that may be of interest regarding the current discussion:

-Rob

---------
>From (AIP) PHYSICS NEWS UPDATE - update.575

SQUEEZING INFORMATION FROM ZIPPING PROGRAMS. Data compression programs, such as the file zipping applications found on many personal computers, provide an unusual means to analyze information. Researchers at the La Sapienza University in Rome (Emanuele Caglioti, caglioti@mat.uniromal.it, 39-06-4991-4972) have demonstrated how compression routines can accurately identify the language, and even the author, of a document without requiring anyone to bother reading the composition.
The key to the analysis is the measurement of the compression efficiency that a program achieves when an unknown document is appended to various reference documents. Zipping programs typically compress data by searching for repeated strings of information in a file. The programs record a single copy of the information and note the locations of subsequent instances of the string. Unzipping a file consists of replacing various bits of information at the locations recorded by the zipped file. Such file compression routines work better on long files because programs are, in effect, learning about the type of information they are encoding as they move through the data. Add a page of Italian text to an Italian document, and a zipping program achieves good efficiency because it finds words and phrases that appear earlier in the file. If, however, Italian text is appended to an English document, the program is forced to learn a new language on the fly, and compression efficiency is reduced. The researchers found that file compression analysis worked well in identifying the language of files as short as twenty characters in length, and could correctly sort books by author more than 93% of the time. Because subject matter often dictates vocabulary, a program based on the analysis could automatically classify documents by semantic content, leading to sophisticated search engines. The technique also provides a rigorous method for various linguistic applications, such as the study of the relationships between different languages. Although they are currently focusing on text files, the researchers note that their analysis should work equally well for any information string, whether it records DNA sequences, geological processes, medical data, or stock market fluctuations. (D. Benedetto, E. Caglioti, and V. 
Loreto, Physical Review Letters, 28 January 2002)
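The appending trick described in the update above can be reproduced with any stream compressor. The sketch below uses Python's zlib rather than the tooling the Rome group used, and the toy corpora are invented for illustration, so the numbers will differ from theirs; but the principle is the same: an unknown text costs fewer extra compressed bytes when appended to a reference document in the same language.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))

def cross_penalty(reference: bytes, unknown: bytes) -> int:
    """Extra bytes needed to compress `unknown` when appended to
    `reference`, versus compressing `reference` alone.  A small
    penalty means the unknown text resembles the reference."""
    return compressed_size(reference + unknown) - compressed_size(reference)

def classify(unknown: bytes, references: dict) -> str:
    """Label of the reference that compresses the unknown most cheaply."""
    return min(references, key=lambda label: cross_penalty(references[label], unknown))

# Toy corpora standing in for long reference documents.
references = {
    "english": b"the quick brown fox jumps over the lazy dog " * 40,
    "italian": b"la volpe veloce salta sopra il cane pigro " * 40,
}

print(classify(b"the lazy dog sleeps under the brown fox", references))
```

In practice the reference documents would be full books or traffic logs, and the per-reference penalty itself (not just the winning label) is the measurement of interest.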