
Subject: Re: [was] Schema Started
From: "Mark Curphey" <mark@curphey.com>
To: <ingo@ingostruck.de>,<was@lists.oasis-open.org>
Date: Wed, 10 Sep 2003 22:39:38 -0400
Ingo,

Thanks, and it is great to have you on board!

I don't claim to know enough about schemas or DTDs to be authoritative here,
but I do know that XML Schema was easy enough for me to pick up and start
writing meaningful stuff after an hour of reading a book. What I also know is
that we need to choose one OR the other to move forward in unison. The reason
schema would get my vote is that I think the types will be a mix of strings
and other things such as dates, booleans, URIs, etc. Plus, we have already
started the Thesaurus / Classification in it. I personally think designing
with these from day one is easier, but I am obviously happy to go with the
majority. As so far it's only you, me, Rogan and Andy writing any documents
(and the rest reading, hopefully), I think the four of us can work it out
easily and move forward quickly. Let's do that after tomorrow's meeting /
offline.

You kinda started in a different direction from what I was doing, so I won't
continue until after tomorrow's meeting, if appropriate, but here are some
thoughts.....

I totally agree on the strong structure. It is THE most important thing.

Natural language - that makes perfect sense. Your experience beats my
naivety any day!

Characteristics - if the intention is to group like elements (which I think
is a great idea as well), I am not sure I understand the difference, or
maybe the need, for separate groupings of security characteristics and basic
characteristics. To me the easiest model would be two groupings, with
sub-groups if appropriate.

WASDescription
    Reference
    Remedy
TestCase

The Description / Reference part should contain all of the elements we have
been discussing, such as fix information, references to other databases, the
thesaurus, ID, etc., and the TestCase contains the executable signature. The
TestCase could import / include the exploit.
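To make the proposed two-grouping model concrete, here is a minimal sketch
that builds an empty entry with Python's standard xml.etree.ElementTree. The
root element name WASEntry is a hypothetical placeholder; only
WASDescription, Reference, Remedy and TestCase come from the outline above.

```python
import xml.etree.ElementTree as ET


def build_entry_skeleton() -> ET.Element:
    """Build an empty entry following the two-grouping model.

    'WASEntry' is a hypothetical root name used only for illustration.
    """
    root = ET.Element("WASEntry")  # hypothetical root element
    description = ET.SubElement(root, "WASDescription")
    # Descriptive information: references to other databases, thesaurus, ID...
    ET.SubElement(description, "Reference")
    # Fix information grouped as a sub-group of the description.
    ET.SubElement(description, "Remedy")
    # Executable signature; could later import / include the exploit.
    ET.SubElement(root, "TestCase")
    return root


entry = build_entry_skeleton()
print(ET.tostring(entry, encoding="unicode"))
```

The point of the sketch is only the nesting: two top-level groups, with
Reference and Remedy as sub-groups of the description.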

If this simple model works, can I propose we work on the Description first?
It is significantly the easier of the two, tackling one at a time will be
more manageable, and it will allow us to focus on one problem at a time. I
see no reason why it can't be finished this week.

On that note, I met Andy Jaquith for a fine beer last night and we chatted
some about companies wanting to use vuln information for statistical
analysis. Things like, of the vulns found: "How long have patches been
available?", "What is the latency between advisory and exploit?", "What is
the most common category of issues found?", etc. I think this is an
interesting thought process to help think about the data in the Description
section, as well as the information needed to run a production vuln database,
such as provider info, versioning info, author credits, copyright, licensing,
etc.
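As a rough illustration of the kind of analysis described above, the sketch
below computes patch age, advisory-to-exploit latency and the most common
category. The record layout (category, advisory, patch, exploit) and all
dates are invented for the example; they only suggest which dated fields the
Description section might need to carry.

```python
from collections import Counter
from datetime import date
from statistics import median

# Hypothetical sample records: field names and dates are invented for
# illustration and are not part of any WAS draft.
VULNS = [
    {"category": "injection", "advisory": date(2003, 6, 2),
     "patch": date(2003, 6, 20), "exploit": date(2003, 6, 9)},
    {"category": "xss", "advisory": date(2003, 7, 1),
     "patch": None, "exploit": date(2003, 7, 15)},
    {"category": "injection", "advisory": date(2003, 8, 4),
     "patch": date(2003, 8, 5), "exploit": None},
]


def patch_ages(vulns, as_of):
    """Days each available patch has been out, as of a given date."""
    return [(as_of - v["patch"]).days for v in vulns if v["patch"]]


def advisory_to_exploit(vulns):
    """Days from advisory to first known exploit, where both dates exist."""
    return [(v["exploit"] - v["advisory"]).days
            for v in vulns if v["exploit"] and v["advisory"]]


def most_common_category(vulns):
    """(category, count) for the most frequent category."""
    return Counter(v["category"] for v in vulns).most_common(1)[0]


as_of = date(2003, 9, 10)
print("patch ages (days):", patch_ages(VULNS, as_of))
print("median advisory->exploit (days):", median(advisory_to_exploit(VULNS)))
print("most common category:", most_common_category(VULNS))
```

With this sample data it reports patch ages of 82 and 36 days, a median
advisory-to-exploit latency of 10.5 days and "injection" as the most common
category. None of this works unless the entries carry those dates reliably.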

Seem OK? If so, let's chat after the meeting tomorrow about DTD and Schema
and get the WASDescription bit done ASAP.


----- Original Message ----- 
From: "Ingo Struck" <ingo@ingostruck.de>
To: <was@lists.oasis-open.org>
Cc: <mark@curphey.com>
Sent: Wednesday, September 10, 2003 7:54 AM
Subject: Re: [was] Schema Started



Hi Mark, hi all...

First (again) a remark about the document formats used:
*please*, please use interoperable formats for your documents and not
Word docs. I am getting kind of tired of having to "antiword" documents on a
regular basis. Even though I am personally not entirely comfortable with
DocBook, it is one of the more interoperable formats recommended by OASIS,
and there are even some templates around
(cf. http://www.oasis-open.org/spectools/), so I think we should stick to
that.

That said, here you go with some thoughts regarding the implementation
of some of the requirements:

i18n - I saw that you introduced a "naturalLanguage" element in the spec.

Based on experience with a multilingual thesaurus project, I would say that
this doesn't make much sense. For real, working multilingual support, you
need to do the following things:
- make each "natural" text element repeatable
- qualify each occurrence with a script (e.g. "latn", "hebr")
- qualify each occurrence with a natural language code, preferably
  with ISO 639-2 (three-letter language code)

The "script" qualifier is optional and provides supporting information for
the renderer and for handling different transcriptions. Generally, each
"natural" text element should use Unicode only and should be UTF-8 encoded;
the mandatory overall encoding should thus be UTF-8.
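A minimal sketch of repeated, qualified text elements along these lines,
using Python's xml.etree.ElementTree. The element names Description and
Title and the attribute names lang and script are assumptions for
illustration; the code values follow the examples given here (ISO 639-2
language codes, script "latn"), and one variant omits the optional script.

```python
import xml.etree.ElementTree as ET


def add_natural_text(parent, tag, variants):
    """Append one repeated element per (lang, script, text) variant.

    lang is an ISO 639-2 three-letter code; script is optional (None omits
    the attribute). Element and attribute names are illustrative assumptions.
    """
    for lang, script, text in variants:
        el = ET.SubElement(parent, tag, {"lang": lang})
        if script is not None:
            el.set("script", script)
        el.text = text
    return parent


desc = ET.Element("Description")  # hypothetical container element
add_natural_text(desc, "Title", [
    ("eng", "latn", "Cross-site scripting in the search form"),
    ("deu", "latn", "Cross-Site-Scripting über das Suchformular"),
    ("fra", None, "Cross-site scripting dans le formulaire de recherche"),
])

# Serialize as UTF-8 bytes, the mandatory overall encoding proposed above.
data = ET.tostring(desc, encoding="utf-8")
print(data.decode("utf-8"))
```

A renderer can then pick the occurrence whose lang matches the reader and
fall back to another one if it is missing.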

> The basic question I have (for this mail anyway) is what do we want the
> overall structure to look like ? Jeff Williams sent an email out a while
> back with a proposal for 5 main sections.
>
>   1 - basic characteristics of the vulnerability
>   2 - security characteristics of the vulnerability
>   3 - characteristics related to finding the vulnerability
>   4 - characteristics related to exploiting the vulnerability
>   5 - characteristics related to remedying the vulnerability
I think that the use of different "characteristics" makes much sense after
all. From the outline of what needs to be stored there, I derived a generic
model for a single characteristic. This allows for a uniform description and
simplified search criteria upon retrieval.

Based upon that I would propose the attached overall
structure of a WAS-core entry:

A WASDescription consists of generic information that can be indexed
and used to search the database directly, as well as a set of
characteristics describing the problem.
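The attached DTD is not reproduced in this mail, so the following is only a
hypothetical sketch of what a generic characteristic model could look like
and why it simplifies retrieval: a single search function covers every kind
of characteristic. The field names and the entry IDs are assumptions; the
kind values are borrowed from Jeff's five sections.

```python
from dataclasses import dataclass, field


@dataclass
class Characteristic:
    # Hypothetical generic shape of one characteristic; field names are
    # illustrative, not taken from the attached DTD.
    kind: str   # e.g. "basic", "security", "finding", "exploiting", "remedying"
    name: str
    value: str


@dataclass
class WASDescription:
    # Generic, directly indexable information plus a set of characteristics.
    entry_id: str
    title: str
    characteristics: list[Characteristic] = field(default_factory=list)


def search(entries, kind=None, name=None, value=None):
    """Return entries with at least one characteristic matching all criteria.

    Because every characteristic has the same shape, one filter serves all kinds.
    """
    def matches(c):
        return ((kind is None or c.kind == kind)
                and (name is None or c.name == name)
                and (value is None or c.value == value))
    return [e for e in entries if any(matches(c) for c in e.characteristics)]


entries = [
    WASDescription("WAS-0001", "Hypothetical SQL injection", [
        Characteristic("security", "impact", "high"),
        Characteristic("remedying", "patch", "available"),
    ]),
    WASDescription("WAS-0002", "Hypothetical XSS", [
        Characteristic("security", "impact", "low"),
    ]),
]

print([e.entry_id for e in search(entries, kind="security", value="high")])
```

The trade-off of such a generic model is that typing lives in the values
rather than in dedicated elements, which fits the expectation that most of
them will be strings.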

I modelled that as a proposal in a DTD. From that I generated an XML schema
using a patched version of dtd2xs 1.60 (cf.
http://puvogel.informatik.med.uni-giessen.de/lumrix/).

If you compare them, you'll see that the DTD is much easier to read.

Since a former mail of mine has not yet reached this list, I would like to
repeat part of it:

=== snip ===

Let's try to design the overall structure of the schema using a DTD
and then transform it into an XML schema later on. This would have
the following advantages:

- the DTD is not an optimal, but a much more compact description of the
  overall structure (e.g. the cardinality information consists of only one
  symbol rather than a lengthy minOccurs, maxOccurs; lists are easier
  to read, etc.), so the description is easier to read and easier to
  modify. I think we will find that the "advanced" features of XML schema
  are seldom, if at all, used anyway (only for data types, which will be
  mainly strings)
- we can focus first on structure (rough outline, *what* is needed and
  fine-tuning of cardinality) and then on detailed typing later on
- there is a *working* application online, where anybody can play around
  with the latest schema using an editor to create sample entries.
  These sample entries are made publicly available and can be reviewed
  by the rest of the TC (and the rest of the world), so the latest schema
  proposal can easily be checked for practicability.
  This application is freely available (SF CVS), is based upon DTDescription,
  and could be adapted to any new DTD in less than an hour, but it
  would take weeks (i.e. one or two man-days) to plug an XML schema parser
  into it.
- turning the "final" DTD into an XML schema is not much pain
  (it just means bloating it to about 300% with tag-style non-information,
   which could be done by a DTD-to-schema converter, a very simple script
   or even a handful of vi commands)
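To illustrate the cardinality point and the "very simple script" remark,
here is a deliberately minimal sketch that turns a flat DTD sequence into
XML Schema xs:element declarations. It is not dtd2xs: it handles only
comma-separated sequences of element names, and the names in the usage line
are examples (Characteristic is hypothetical).

```python
import re

# Each single-character DTD occurrence indicator and the XML Schema
# occurrence attributes it stands for (minOccurs/maxOccurs default to 1).
OCCURS = {
    "": {},                                          # exactly once
    "?": {"minOccurs": "0"},                         # optional
    "*": {"minOccurs": "0", "maxOccurs": "unbounded"},
    "+": {"maxOccurs": "unbounded"},                 # one or more
}


def sequence_to_xsd(content_model):
    """Translate a flat DTD sequence such as '(a, b?, c*)' into xs:sequence.

    Choices, nested groups, #PCDATA, mixed content and an indicator on the
    whole group are out of scope for this sketch.
    """
    inner = content_model.strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        raise ValueError("expected a parenthesised sequence")
    lines = ["<xs:sequence>"]
    for particle in inner[1:-1].split(","):
        m = re.fullmatch(r"\s*([A-Za-z_][\w.-]*)([?*+]?)\s*", particle)
        if m is None:
            raise ValueError(f"unsupported particle: {particle!r}")
        name, indicator = m.groups()
        attrs = "".join(f' {k}="{v}"' for k, v in OCCURS[indicator].items())
        lines.append(f'  <xs:element ref="{name}"{attrs}/>')
    lines.append("</xs:sequence>")
    return "\n".join(lines)


print(sequence_to_xsd("(Reference, Remedy?, Characteristic*)"))
```

Each one-character indicator expands into zero, one or two attributes,
which is where much of the bloat comes from.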

> XML schema can be daunting at first.
To be honest, XML schema remains daunting, and a slick BNF notation
or even ASN.1 would be of much more use, but within this scope I accept
that we have to create a schema. ;o)

=== snap ===

Mark answered:
 "I think that this maybe
 counter intuitive as to design the extensibility etc I think it would be
 easier to use a schema (like most things I am a total novice but based on a
 furious weeks reading) may make more sense. To use object types and to
 define object types for reuse etc would seem like design goals from day
 one?"

I would rather say that we need a strong structure much more than strong
typing; most of the types we need here are/will be "String" anyway.

Having a full-fledged working online application where the current structure
could be tested for practicability seems to me a very strong argument too.

Kind regards

Ingo
