Subject: Re: [legaldocml-comment] [COMMENT] Discrepancy in akn-media-v1.0-csprd01.pdf Document
Hi Fabio,

On Wed, Jun 24, 2015 at 5:03 AM, Fabio Vitali <fvitali@gmail.com> wrote:
> Thank you for pointing this out for us. This is clearly a copy/paste done wrong.
> The sentence should be:
> "There is no single initial octet sequence that is always present in Akoma Ntoso documents."
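Because no fixed prefix exists, a detector has to look past the first bytes, for example at the document's root element. The following is a minimal sketch using the JDK's StAX API. It assumes that a root element with the local name `akomaNtoso` is a sufficient signal. The class and method names are hypothetical and are not part of Tika or akomantoso-lib, and a production detector would likely also check the namespace URI.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: detects Akoma Ntoso by its root element, not by leading bytes.
final class AknRootSniffer {
    // Local name of the Akoma Ntoso root element (namespace deliberately not checked here).
    private static final String AKN_ROOT = "akomaNtoso";

    /** Returns true if the first element in the stream is named akomaNtoso. */
    static boolean looksLikeAkn(InputStream in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Sniffing only: do not process DTDs or fetch external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = null;
        try {
            // The parser works out the encoding itself (BOM, XML declaration).
            reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                // Skip comments, processing instructions and whitespace before the root.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return AKN_ROOT.equals(reader.getLocalName());
                }
            }
            return false;
        } catch (XMLStreamException e) {
            return false; // Not well-formed XML up to the root element.
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // Nothing useful to do while sniffing.
                }
            }
        }
    }

    /** Convenience overload for in-memory UTF-8 text. */
    static boolean looksLikeAkn(String xml) {
        return looksLikeAkn(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

For example, `looksLikeAkn("<akomaNtoso/>")` and `looksLikeAkn("<!-- note --><akomaNtoso/>")` both return `true`, even though the two inputs begin with different bytes.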
Grand. Many eyes make all bugs shallow.
I would think that trying to deduce an XML media type from magic bytes is generally unreliable and overly complex. RFC 2376 (XML Media Types) [1] has this to say about magic numbers:
> Magic number(s): none
>
> Although no byte sequences can be counted on to always be present,
> XML entities in ASCII-compatible charsets (including UTF-8) often
> begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"), and those in
> UTF-16 often begin with hexadecimal FE FF 00 3C 00 3F 00 78 00 6D
> or FF FE 3C 00 3F 00 78 00 6D 00 (the Byte Order Mark (BOM)
> followed by "<?xml"). For more information, see Annex F of
> [REC-XML].
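The heuristic the RFC describes amounts to a plain prefix check on the first few bytes. The sketch below is in Java, the language of both Tika and akomantoso-lib, and checks only the three sequences quoted above. `XmlPrefixSniffer` and `mayBeXml` are hypothetical names, not Tika API. A match only suggests XML; a miss proves nothing, and even a match says nothing about which XML vocabulary (such as Akoma Ntoso) the file uses.

```java
import java.util.Arrays;

// Hypothetical helper implementing the RFC 2376 "often begins with" heuristic.
final class XmlPrefixSniffer {
    // "<?xml" in an ASCII-compatible charset such as UTF-8: 3C 3F 78 6D 6C.
    private static final byte[] ASCII_XML = {0x3C, 0x3F, 0x78, 0x6D, 0x6C};
    // UTF-16 big-endian: BOM FE FF, then 00 3C 00 3F 00 78 00 6D.
    private static final byte[] UTF16BE_XML = {
        (byte) 0xFE, (byte) 0xFF, 0x00, 0x3C, 0x00, 0x3F, 0x00, 0x78, 0x00, 0x6D};
    // UTF-16 little-endian: BOM FF FE, then 3C 00 3F 00 78 00 6D 00.
    private static final byte[] UTF16LE_XML = {
        (byte) 0xFF, (byte) 0xFE, 0x3C, 0x00, 0x3F, 0x00, 0x78, 0x00, 0x6D, 0x00};

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /**
     * True if the leading bytes match one of the sequences listed in RFC 2376.
     * A match only suggests XML; a miss proves nothing.
     */
    static boolean mayBeXml(byte[] head) {
        return startsWith(head, ASCII_XML)
            || startsWith(head, UTF16BE_XML)
            || startsWith(head, UTF16LE_XML);
    }
}
```

Passing the UTF-8 bytes of `<akomaNtoso/>` returns `false` even though that is a well-formed document, which is exactly why the RFC lists "Magic number(s): none".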
Slightly off point, but the relevant detail is that magic byte detection is only one of several detection methods we use within Tika. I understand that AKN files have no specific magic byte fingerprint, so that is OK... I can move on :)

Thanks Fabio. I'll update the AKN Google Group and this public list once the AKN parser for Tika is finished.

Excellent work, @kohsah, for driving development of the akomantoso-lib Java project. Kudos.

Lewis