office-metadata message

Subject: Re: [office-metadata] Atom and document/feed IRIs

From: Patrick Durusau <patrick@durusau.net>
To: Bruce D'Arcus <bruce.darcus@OpenDocument.us>
Date: Wed, 07 Mar 2007 06:37:23 -0500

Bruce,

Interesting.

I remember from the newsgroup days that every message had a unique ID 
(well, sort of; they were actually recycled after several years), which 
is how threading was done.

I am definitely leaning towards the IRI approach, mostly because it 
makes the subject stable.

Hope you are having a great day!

Patrick

Bruce D'Arcus wrote:

> For comparison, here's how Atom handles feed or document identification:
>
> First, an informal primer of sorts:
>
> <http://diveintomark.org/archives/2004/05/28/howto-atom-id>
>
> Second, the spec:
>
>>    The "atom:id" element conveys a permanent, universally unique
>>    identifier for an entry or feed.
>>
>>    atomId = element atom:id {
>>       atomCommonAttributes,
>>       (atomUri)
>>    }
>>
>>    Its content MUST be an IRI, as defined by [RFC3987].  Note that the
>>    definition of "IRI" excludes relative references.  Though the IRI
>>    might use a dereferencable scheme, Atom Processors MUST NOT assume it
>>    can be dereferenced.
>>
>> RFC 4287                      Atom Format                  December 2005
>>
>>
>>    When an Atom Document is relocated, migrated, syndicated,
>>    republished, exported, or imported, the content of its atom:id
>>    element MUST NOT change.  Put another way, an atom:id element
>>    pertains to all instantiations of a particular Atom entry or feed;
>>    revisions retain the same content in their atom:id elements.  It is
>>    suggested that the atom:id element be stored along with the
>>    associated resource.
>>
>>    The content of an atom:id element MUST be created in a way that
>>    assures uniqueness.
>>
>>    Because of the risk of confusion between IRIs that would be
>>    equivalent if they were mapped to URIs and dereferenced, the
>>    following normalization strategy SHOULD be applied when generating
>>    atom:id elements:
>>
>>    o  Provide the scheme in lowercase characters.
>>    o  Provide the host, if any, in lowercase characters.
>>    o  Only perform percent-encoding where it is essential.
>>    o  Use uppercase A through F characters when percent-encoding.
>>    o  Prevent dot-segments from appearing in paths.
>>    o  For schemes that define a default authority, use an empty
>>       authority if the default is desired.
>>    o  For schemes that define an empty path to be equivalent to a path
>>       of "/", use "/".
>>    o  For schemes that define a port, use an empty port if the default
>>       is desired.
>>    o  Preserve empty fragment identifiers and queries.
>>    o  Ensure that all components of the IRI are appropriately character
>>       normalized, e.g., by using NFC or NFKC.
>>
>
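
For anyone who wants to see what the normalization strategy quoted above 
amounts to in practice, here is a rough Python sketch. It is only an 
illustration, not part of RFC 4287 or of any proposal on this list. The 
function name normalize_atom_id and its helpers are made up for the 
example, the default-port table assumes only http and https, and the 
"default authority" rule is not covered at all.

```python
import re
import unicodedata

# Generic splitting regex adapted from RFC 3986, Appendix B. It keeps an
# absent query or fragment (None) distinct from an empty one ("").
_URI_RE = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$")
_UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
_DEFAULT_PORTS = {"http": "80", "https": "443"}  # assumption: only these two


def _fix_percent_encoding(text):
    """Decode escapes of unreserved characters; uppercase the hex of the rest."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        return char if char in _UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def _remove_dot_segments(path):
    """Drop "." and ".." segments from a path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == "..":
            # Never pop the leading "" that marks an absolute path.
            if len(out) > 1 or (out and out[0] != ""):
                out.pop()
        elif seg != ".":
            out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # keep the trailing slash
    return "/".join(out)


def normalize_atom_id(iri):
    """Hypothetical helper: apply most of the quoted normalization rules."""
    iri = unicodedata.normalize("NFC", iri)  # character normalization
    match = _URI_RE.match(iri)
    if match is None or match.group(1) is None:
        raise ValueError("atom:id must be an absolute IRI")
    scheme, authority, path, query, fragment = match.groups()
    scheme = scheme.lower()

    if authority is not None:
        userinfo, at, hostport = authority.rpartition("@")
        host, port = re.fullmatch(r"(.*?)(?::(\d*))?", hostport).groups()
        authority = userinfo + at + host.lower()
        if port and port != _DEFAULT_PORTS.get(scheme):
            authority += ":" + port  # keep only non-default ports
        authority = _fix_percent_encoding(authority)
        path = _remove_dot_segments(_fix_percent_encoding(path))
        if path == "" and scheme in _DEFAULT_PORTS:
            path = "/"  # empty path is equivalent to "/" for http(s)
    else:
        path = _fix_percent_encoding(path)

    result = scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:  # an empty query is preserved as a bare "?"
        result += "?" + _fix_percent_encoding(query)
    if fragment is not None:  # an empty fragment is preserved as a bare "#"
        result += "#" + _fix_percent_encoding(fragment)
    return result


print(normalize_atom_id("HTTP://Example.ORG:80/a/./b/../c%7euser?"))
```

The hand-rolled regex is deliberate: urllib.parse.urlunsplit drops an 
empty query or fragment when it reassembles a URL, which would break the 
"preserve empty fragment identifiers and queries" rule.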

-- 
Patrick Durusau
Patrick@Durusau.net
Chair, V1 - Text Processing: Office and Publishing Systems Interface
Co-Editor, ISO 13250, Topic Maps -- Reference Model
Member, Text Encoding Initiative Board of Directors, 2003-2005

Topic Maps: Human, not artificial, intelligence at work!

Follow-Ups:
- Re: [office-metadata] Atom and document/feed IRIs
  - From: Svante Schubert <Svante.Schubert@Sun.COM>

References:
- Atom and document/feed IRIs
  - From: Bruce D'Arcus <bruce.darcus@OpenDocument.us>