OASIS Mailing List Archives

legaldocml-comment message



Subject: Re: [legaldocml-comment] [COMMENT] Discrepancy in akn-media-v1.0-csprd01.pdf Document


Hi Fabio,

On Wed, Jun 24, 2015 at 5:03 AM, Fabio Vitali <fvitali@gmail.com> wrote:

Thank you for pointing this out to us. This is clearly a copy-and-paste error. The sentence should be:

"There is no single initial octet sequence that is always present in Akoma Ntoso documents."


Grand. Many eyes make all bugs shallow.
 

I would think that trying to deduce an XML media type from magic bytes is, in general, unreliable and overly complex. RFC 2376 (XML Media Types) [1] has this to say about magic numbers:

> Magic number(s): none
>
>       Although no byte sequences can be counted on to always be present,
>       XML entities in ASCII-compatible charsets (including UTF-8) often
>       begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"), and those in
>       UTF-16 often begin with hexadecimal FE FF 00 3C 00 3F 00 78 00 6D
>       or FF FE 3C 00 3F 00 78 00 6D 00 (the Byte Order Mark (BOM)
>       followed by "<?xml").  For more information, see Annex F of [REC-
>       XML].
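The RFC 2376 heuristic quoted above can be sketched as a simple prefix check. This is a minimal Python illustration, not code from Tika or any Akoma Ntoso tool; the function name `sniff_xml_prefix` and the label strings are hypothetical. A `None` result does not rule out XML, because the XML declaration is optional.

```python
from typing import Optional

# Byte prefixes listed under "Magic number(s)" in RFC 2376.
# They are common but not guaranteed: the XML declaration is optional.
XML_PREFIXES = {
    "utf-8/ascii": bytes.fromhex("3C 3F 78 6D 6C"),                  # "<?xml"
    "utf-16be+bom": bytes.fromhex("FE FF 00 3C 00 3F 00 78 00 6D"),  # BOM + "<?xm"
    "utf-16le+bom": bytes.fromhex("FF FE 3C 00 3F 00 78 00 6D 00"),  # BOM + "<?xm"
}


def sniff_xml_prefix(data: bytes) -> Optional[str]:
    """Return the label of the first matching prefix, or None.

    None only means "no recognised prefix", not "not XML".
    """
    for label, prefix in XML_PREFIXES.items():
        if data.startswith(prefix):
            return label
    return None


print(sniff_xml_prefix(b'<?xml version="1.0"?><akomaNtoso/>'))    # utf-8/ascii
print(sniff_xml_prefix(b"\xfe\xff" + "<?xml".encode("utf-16-be")))  # utf-16be+bom
print(sniff_xml_prefix(b"<akomaNtoso/>"))                           # None
```

The last call shows why the heuristic cannot be counted on: a document that omits the XML declaration is still well-formed XML but matches none of the prefixes.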


Slightly off point, but the context here is that magic-byte detection is only one of several detection methods we use within Tika. I understand that AKN files have no specific magic-byte fingerprint, so that is OK... I can move on :)
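Since Akoma Ntoso has no magic-byte fingerprint, a detector has to fall back on another signal, such as the name of the root element. The Python sketch below illustrates that layering: a magic-byte table is tried first, then a root-element check. It is not Tika's implementation (Tika is a Java library with its own detectors); the function names, the small magic table, the placeholder label `akoma-ntoso`, and the assumption that the root element's local name is `akomaNtoso` are all illustrative.

```python
import io
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

# Hypothetical magic-byte table. Akoma Ntoso has no entry because it has
# no fixed initial octet sequence.
MAGIC = {
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",
}

# Assumed local name of the Akoma Ntoso root element.
AKN_ROOT = "akomaNtoso"


def detect_by_magic(data: bytes) -> Optional[str]:
    """Match the input against the fixed byte prefixes in MAGIC."""
    for prefix, media_type in MAGIC.items():
        if data.startswith(prefix):
            return media_type
    return None


def detect_by_root(data: bytes) -> Optional[str]:
    """Look only at the root element's name; stop at the first start tag."""
    try:
        for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
            local = elem.tag.rsplit("}", 1)[-1]  # strip "{namespace}" if present
            # "akoma-ntoso" is a placeholder label, not a registered media type.
            return "akoma-ntoso" if local == AKN_ROOT else "application/xml"
    except ET.ParseError:
        pass  # no parseable root start tag
    return None


# Detectors are tried in order; the cheap byte check runs first.
DETECTORS: List[Callable[[bytes], Optional[str]]] = [detect_by_magic, detect_by_root]


def detect(data: bytes) -> Optional[str]:
    """Return the first non-None answer from DETECTORS, or None."""
    for detector in DETECTORS:
        result = detector(data)
        if result is not None:
            return result
    return None


print(detect(b"%PDF-1.7"))                                   # application/pdf
print(detect(b'<akomaNtoso xmlns="urn:example"><meta>'))     # akoma-ntoso
print(detect(b"plain text"))                                 # None
```

Because the root-element check returns as soon as it sees the first start tag, a truncated prefix of a large file (as in the second call) is enough to classify it, which keeps the fallback cheap.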
 
Thanks, Fabio. I'll update the AKN Google Group and this public list once the AKN parser for Tika is finished.
Excellent work, @kohsah, on driving development of the akomantoso-lib Java project. Kudos.
Lewis

