OASIS Mailing List Archives

cti-stix message



Subject: Re: [cti-stix] Supporting translations in STIX


RFC 2277 only talks about protocols, rather than content, because it doesn't make a distinction - it occasionally refers to data, but that's about it. But it's clearly talking about what we'd now call content, as it discusses HTTP, email, and so on.

As for "i-default", it is obviously not a language in the traditional sense; it's a language tag that indicates a fallback.

I agree that in almost every case, we'll be in a position to negotiate a suitable language tag, and know what language tag the original data was written in. But there are edge cases, such as dealing with unmarked legacy data, perhaps.

This is why in the normative text I sketched out, I used "SHOULD" for the lang attribute, in the sense of the RFC 2119 definition which we cite:

3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.

That doesn't mean optional, or even that one ought to include a language tag most of the time; it means the language tag must always be present, except when that proves impossible.

When it's impossible, I suggested that the missing language tag "MUST" be considered to be "i-default", such that it doesn't match any particular language, and "SHOULD" be understandable to an English speaker.
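To make the proposed rule concrete, here is a minimal sketch of how a consumer might compute the effective language of an object under it. It assumes objects arrive as parsed JSON dictionaries with an optional "lang" property; the function name and the treatment of an empty string as "missing" are illustrative assumptions, not normative STIX text.

```python
from typing import Any, Mapping

# Tag assumed for objects that carry no "lang" property, per the rule
# proposed in this thread (an assumption, not published STIX text).
FALLBACK_TAG = "i-default"


def effective_language(obj: Mapping[str, Any]) -> str:
    """Return the language tag to use for an object's human-readable text.

    An explicit, non-empty "lang" value wins; a missing or empty one falls
    back to "i-default", which deliberately matches no particular language.
    """
    tag = obj.get("lang")
    if isinstance(tag, str) and tag.strip():
        return tag.strip()
    return FALLBACK_TAG


if __name__ == "__main__":
    print(effective_language({"type": "indicator", "lang": "en-US"}))  # en-US
    print(effective_language({"type": "indicator"}))  # i-default
```

Under this reading, unmarked legacy data is never silently claimed to be any specific English variant; it is only flagged as "whatever the default was".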

Dave.


On 27 June 2016 at 14:43, Jason Keirstead <Jason.Keirstead@ca.ibm.com> wrote:

i-default is not for marking content, though. STIX is a content specification, not a protocol specification. i-default is to be used inside a protocol when no language has been negotiated. As STIX is not a protocol, it is inapplicable. A given piece of content is never "i-default"; it is always *something*. If the person who authored the content did not specify it, you would have to guess; it is not sufficient to treat it as "i-default", because that has no meaning, as "i-default" is defined not to be a language.

As to the default assumption being "en-US" - stating that the default assumption for an unspecified "lang" attribute is treated as en-US is *not* the same as specifying a "default language", rather it is specifying a fallback assumption. However, I would be just as happy with making "lang" mandatory.
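The options on the table differ only in what happens when "lang" is absent. A small sketch comparing them side by side, assuming hypothetical policy names (none of these identifiers come from the STIX drafts):

```python
from enum import Enum
from typing import Any, Mapping


class LangPolicy(Enum):
    """Hypothetical labels for the options debated in this thread."""
    I_DEFAULT_FALLBACK = "i-default"  # treat a missing tag as i-default
    EN_US_FALLBACK = "en-US"          # assume en-US when the tag is missing
    REQUIRED = None                   # reject objects without a tag


def resolve_lang(obj: Mapping[str, Any], policy: LangPolicy) -> str:
    """Return the object's language tag, applying the policy only when it is absent."""
    tag = obj.get("lang")
    if tag:
        return tag
    if policy is LangPolicy.REQUIRED:
        raise ValueError("object has no 'lang' property, but the policy requires one")
    return policy.value
```

With an explicit tag all three policies agree; they diverge only on unmarked data, which is exactly the edge case under discussion.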

-
Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security | www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown



From: Dave Cridland <dave.cridland@surevine.com>
To: Jason Keirstead/CanEast/IBM@IBMCA
Cc: Allan Thomson <athomson@lookingglasscyber.com>, "Wunder, John A." <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, Bret Jordan <bret.jordan@bluecoat.com>
Date: 06/27/2016 09:58 AM


Subject: Re: [cti-stix] Supporting translations in STIX
Sent by: <cti-stix@lists.oasis-open.org>




No, you shouldn't really have any content explicitly marked i-default. But neither should you mandate a default language other than i-default. If we do, it ought to be Mandarin Chinese. If your argument is that most people speak at least *some* English, then that is also the argument for i-default...

RFC 2277 details its use in section 4.5, but loosely, i-default is used when there's no content negotiation. When there is, or when there's a known language, that should be used instead. So what I'm leaning toward is:

Objects SHOULD have an explicit "lang" attribute providing a language tag describing the language used by the human-readable text within the object. If this is absent, the language tag MUST be treated as "i-default", and the human-readable text SHOULD be understandable to an English speaker.

The above text means that if you've got content written in US English but the lang attribute is missing, it'll be fine. If you want to add a language tag, then "en-US" is the right one, too. Mandating a particular language tag feels like walking into the same problems that HTTP did by mandating iso-8859-1 as the charset - it introduced a slew of problems that haven't ever been fully resolved.

On 27 Jun 2016 12:46, "Jason Keirstead" <Jason.Keirstead@ca.ibm.com> wrote:

    I don't think i-default is meant to be used in this way (to mark content).

    i-default is not titled "English for an International Audience"; it is titled "Default Language". It exists in the language registry so that implementations can use it as a placeholder until another language has been negotiated. For example:

        A server that advertises this extension MUST use the language
          "i-default" as described in [RFC2277] as its default language until
          another supported language is negotiated by the client.


        defaultLocale is the original language of the Context instance and will be used as the last fallback locale if other locales are registered. If it is undefined, or if registerLocales hasn't been called at all, the Context instance will create a special locale called i-default to be used as the default.
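Both quoted examples describe the same protocol-level behaviour: a session starts in "i-default" and only switches once a supported language is negotiated. A toy model of that state machine, using a hypothetical Session class not taken from either quoted implementation:

```python
from typing import Iterable, Optional


class Session:
    """Toy model of the quoted rule: use "i-default" until the client
    negotiates one of the supported languages. Hypothetical class."""

    def __init__(self, supported: Iterable[str]) -> None:
        # Language tags compare case-insensitively, so index them lowercased.
        self.supported = {tag.lower(): tag for tag in supported}
        self.language = "i-default"

    def negotiate(self, requested: str) -> Optional[str]:
        """Switch to the requested language if supported; otherwise keep the current one."""
        match = self.supported.get(requested.lower())
        if match is not None:
            self.language = match
        return match
```

The point the sketch makes is Jason's: "i-default" here describes the state of a connection, not a property of any stored document.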

    Since STIX is actually a piece of content, marking it as "i-default" doesn't make a lot of sense. A piece of content is always *something*.

    If a lang attribute is to be added, my vote is to either make it mandatory, or make en-US the default.


    -
    Jason Keirstead



    From: Dave Cridland <dave.cridland@surevine.com>
    To: Bret Jordan <bret.jordan@bluecoat.com>
    Cc: Allan Thomson <athomson@lookingglasscyber.com>, Jason Keirstead/CanEast/IBM@IBMCA, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>
    Date: 06/25/2016 04:51 AM
    Subject: Re: [cti-stix] Supporting translations in STIX
    Sent by: <cti-stix@lists.oasis-open.org>




    That would be a little odd, given i-default is specifically intended for this. It's not deprecated.

    On 25 Jun 2016 00:25, "Jordan, Bret" <bret.jordan@bluecoat.com> wrote:

        Or drop all of the confusion with grandfathered things and just use "en" or "en-us" as the default.

        Bret 

        Sent from my Commodore 64

        On Jun 24, 2016, at 12:51 PM, Dave Cridland <dave.cridland@surevine.com> wrote:
                Jason,

                http://www.iana.org/assignments/lang-tags/i-default

                Grandfathered means it predates the registry, and wasn't added under the formal rules, I believe. I've only created a single IANA registry though, so I'm hardly an expert.

                Dave.

                On 24 Jun 2016 20:33, "Jason Keirstead" <Jason.Keirstead@ca.ibm.com> wrote:
                I am not an expert on this at all, but looking at the registry it says "i-default" is "grandfathered", not sure if that implies "deprecated" or not (?)


                http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry

                %%
                Type: grandfathered
                Tag: i-default
                Description: Default Language
                Added: 1998-03-10
                %%
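The excerpt above uses the registry's plain record format: records separated by "%%" lines, each holding "Field: value" pairs. A simplified reader for just that shape, with the caveat that it ignores continuation lines and keeps only the last value of a repeated field (the full registry allows both):

```python
from typing import Dict, List

REGISTRY_EXCERPT = """\
%%
Type: grandfathered
Tag: i-default
Description: Default Language
Added: 1998-03-10
%%
"""


def parse_records(text: str) -> List[Dict[str, str]]:
    """Split on "%%" separators and read "Field: value" pairs from each record."""
    records: List[Dict[str, str]] = []
    for chunk in text.split("%%"):
        fields: Dict[str, str] = {}
        for line in chunk.strip().splitlines():
            name, sep, value = line.partition(":")
            if sep:  # skip lines without a "Field:" prefix
                fields[name.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


if __name__ == "__main__":
    record = parse_records(REGISTRY_EXCERPT)[0]
    print(record["Tag"], record["Type"])  # i-default grandfathered
```

Note that the record has no "Deprecated" field, which is consistent with Dave's reply that "grandfathered" is not the same as "deprecated".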


                -
                Jason Keirstead



                From: Dave Cridland <dave.cridland@surevine.com>
                To: Allan Thomson <athomson@lookingglasscyber.com>
                Cc: cti-stix@lists.oasis-open.org, "Wunder, John A." <jwunder@mitre.org>
                Date: 06/24/2016 04:23 PM
                Subject: Re: [cti-stix] Supporting translations in STIX
                Sent by: <cti-stix@lists.oasis-open.org>





                Allan,

                As I recall, "i-default" is "English for an International Audience" or some such. So it's English of sorts. I'm sitting on the sofa in post-Brexit shock, however, and may not have that *quite* right.

                In practice, "i-default" is either the C locale or US English, I believe.

                It's given as a special token to avoid a "better" English translation taking precedence.
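One way to read that last point, sketched with RFC 4647 basic filtering: because "i-default" is not "en" or an "en-" subtag, an "en" request can never select it, so a proper English translation wins whenever one exists, and the i-default text is reached only as the explicit fallback. Picking the first match is a simplification of this sketch; RFC 4647 filtering returns a set and leaves the final choice to the application.

```python
from typing import Iterable, List, Optional


def basic_filter(language_range: str, tags: Iterable[str]) -> List[str]:
    """RFC 4647 basic filtering (case-insensitive): a range matches a tag that
    equals it or starts with the range followed by "-"; "*" matches all."""
    r = language_range.lower()
    return [
        tag for tag in tags
        if r == "*" or tag.lower() == r or tag.lower().startswith(r + "-")
    ]


def choose(priority: Iterable[str], available: Iterable[str],
           fallback: str = "i-default") -> Optional[str]:
    """Return the first available tag matched by the user's ranges, in priority
    order; otherwise the fallback tag, if content in it exists."""
    available = list(available)
    for language_range in priority:
        matches = basic_filter(language_range, available)
        if matches:
            return matches[0]
    return fallback if fallback in available else None


if __name__ == "__main__":
    # "i-default" never matches "en", so the real en-GB text is chosen.
    print(choose(["en"], ["i-default", "en-GB"]))  # en-GB
    # With no match at all, the i-default text is the fallback.
    print(choose(["de"], ["i-default", "en-GB"]))  # i-default
    # Labelled plain "en", the generic text could shadow the en-GB one.
    print(choose(["en"], ["en", "en-GB"]))  # en
```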

                Dave.

                On 24 Jun 2016 20:18, "Allan Thomson" <athomson@lookingglasscyber.com> wrote:

                        I would prefer optional with a default of “English” value.

                        If anyone cares which version of English, I'd suggest EU-English. (Had to make that joke based on the Brexit news.)

                        allan

                        From: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> on behalf of "Wunder, John" <jwunder@mitre.org>
                        Date: Friday, June 24, 2016 at 12:06 PM
                        To: Dave Cridland <dave.cridland@surevine.com>
                        Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
                        Subject: Re: [cti-stix] Supporting translations in STIX

                        Writing normative text would help us a ton, thanks! I think we need two things:


                        1.      The row in the property table:

                        Property Name               Type      Description
                        lang (optional/required)    string    ?????



                        2.      A new 6.x section in the STIX Core document (sibling of versioning, object markings, etc.) with any other text we need (if any).

                        I would say we either make the field optional with a default of “i-default” or we make it required and force people to say what language they’re providing. We don’t want to tie STIX to TAXII but if there are transport considerations you think we should include at a more generic level we could do that in the 6.x section. I’d reach out to Bret and Mark on the TAXII side to include the Accept-Language stuff directly in those specs.

                        Thanks again,
                        John

                        From: Dave Cridland <dave.cridland@surevine.com>
                        Date: Friday, June 24, 2016 at 2:05 PM
                        To: "Wunder, John A." <jwunder@mitre.org>
                        Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
                        Subject: Re: [cti-stix] Supporting translations in STIX


                        2(a) for now. I'm assuming an IANA language tag here, with a default of "i-default" (from memory).

                        I think translation objects will work; an alternative design might be to duplicate the entire object (reference and all), or to have a relationship that indicates equivalent objects (which allows for both translations and more complex equivalences).

                        We'd want TAXII to mention something about Accept-Language for HTTP, and maybe a note about other l10n capabilities in other transports (e.g., stream language in XMPP) and payload formats (Content-Language in HTTP and email, and the stanza language tag in XMPP).
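For the HTTP case, the transport side mostly reduces to reading the client's Accept-Language header in preference order. A minimal parser sketch, assuming the usual "range;q=weight" syntax; treating a malformed weight as 0 is a choice made here, not something the thread or TAXII specifies:

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[str]:
    """Return language ranges from an Accept-Language value, highest weight first.

    Ranges with q=0 ("not acceptable") are dropped; unparsable weights are
    treated as 0 in this sketch.
    """
    weighted: List[Tuple[float, int, str]] = []
    for index, part in enumerate(header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        language_range = pieces[0]
        if not language_range:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            weighted.append((q, index, language_range))
    # Descending weight; ties keep the order in which they appeared.
    weighted.sort(key=lambda item: (-item[0], item[1]))
    return [language_range for _, _, language_range in weighted]
```

The resulting list can then be matched against the language tags of the available objects or translations, with "i-default" content as the last resort.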

                        I can knock out some formal normative text if you like.

                        Dave.
                        On 24 Jun 2016 16:28, "Wunder, John A." <jwunder@mitre.org> wrote:
                        All,

                        You’re probably aware that we’ve had a bit of work over the past couple months on the best approach to support translations in STIX. As I alluded to in the prioritization e-mail, it’s getting to the point where we need to decide on an approach or we’re at risk of not making the July release date and having to postpone until Winter. As I see it, we have a couple options.


                        1.       We can decide on a general approach and try to prove that it will work for MVP. Ideally, it would be a fairly minimalist approach so that we can be confident in the flows.

                        a.       Along those lines, I wrote up some normative text on an approach we discussed on Slack. Translations are very minimal objects (not standard TLOs) and refer to other TLOs to translate their titles and descriptions. It’s here:
                        https://docs.google.com/document/d/1wiG6RoNEFaE2lrblfgjpu3RTAJZOK2q0b5OxXCaCV14/edit#heading=h.aq3spklsm9m6

                        b.       If we think that approach is close enough to agree on by MVP we can continue to evolve that.

                        c.       If you have a different approach that you think we can agree on, please write up some normative text and submit it to the full list.

                        2.       Alternatively, we can implement something super minimalist now and delay the rest until winter (6 months) to make sure we get this right.

                        a.       IMO if we add a “lang” property to all TLOs we can provide some immediate capability and build on it in the winter.
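To picture option 1a, where minimal translation objects point at a TLO and carry replacement titles and descriptions, here is a rough sketch of how a consumer might apply one. The field names ("object_ref", "lang", "title", "description") and the ID values are hypothetical stand-ins; the actual draft lives in the linked document and is not reproduced here.

```python
from typing import Any, Dict, Mapping

# Hypothetical: the fields a translation object may override.
TRANSLATABLE = ("title", "description")


def apply_translation(tlo: Mapping[str, Any],
                      translation: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of the TLO with its translatable text replaced.

    Assumes the translation names its target in "object_ref" and always
    carries a "lang" tag; the original TLO is left unchanged.
    """
    if translation.get("object_ref") != tlo.get("id"):
        raise ValueError("translation does not refer to this object")
    localized = dict(tlo)
    for field in TRANSLATABLE:
        if field in translation:
            localized[field] = translation[field]
    localized["lang"] = translation["lang"]
    return localized
```

One design consequence worth noting: a partial translation (title only) leaves the untranslated description in the original language while "lang" now names the target language, which is one of the details a fuller proposal would need to settle.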

                        My preference at this point is #2a. Let’s just add a “lang” tag to TLO common properties, put the discussion on hold while we finish MVP, and then resume in August. Then we can spend the fall making sure we get it right. At the same time, we enable an ecosystem where TLOs are in specific languages and so people can innovate and try out different approaches. That said, if people think #1 is close, I’m happy to continue trying to push that forward.

                        What do you think?

                        John





--

Dave Cridland

+448454681066
dave.cridland@surevine.com
dave.cridland.surevine

Surevine


