Re: [xdi] Exciting development: XDI graph model documented

Subject: Re: [xdi] Exciting development: XDI graph model documented - please indicate your syntax preferences!

This thread is getting very deep, but one more round of answers inline (it's a very good conversation, as it's helping form the agenda for tomorrow's very important TC call).

On Thu, Apr 11, 2013 at 6:24 PM, Joseph Boyle <boyle.joseph@gmail.com> wrote:

On Apr 11, 2013, at 5:08 PM, Drummond Reed <drummond.reed@xdi.org> wrote:
On Thu, Apr 11, 2013 at 2:45 PM, Joseph Boyle <boyle.joseph@gmail.com> wrote:
On Apr 11, 2013, at 2:04 PM, Markus Sabadello <markus.sabadello@xdi.org> wrote:
On Thu, Apr 11, 2013 at 9:50 PM, Joseph Boyle <planetwork@josephboyle.net> wrote:
1. If values now have JSON semantics, and XDI collections are now always ordered, why not allow specification of the two-line street address as a JSON array? The following looks like a legal statement even under current syntax.
=markus+address!4567+&street#/#/
   ["123 Main St","Apt 23"]
In the latest iteration of the model there's no such thing as collections, only classes, and they can have both ordered and unordered instances.
In the unordered case are the instances individually tagged? If so the collection would correspond to a JSON object.
Yes. Which is why ever since your suggestion about looking at more intuitive JSON serializations last week, I've been playing with a simple concept: model every XDI graph node as a JSON object. All nodes except a literal node would look like this (where <xxx> is a placeholder):

{
  "<context-symbols>": "context-id",
  "contexts": [
    { <nested-json-object-for-each-context> }
  ],
  "relations": {
    "<predicate>": "<object>",
    "<predicate>": "<object>"
  }
}

A literal node would look like this (if # was the symbol for a literal node):

{
  "#": {
    "#": <value>
  }
}
Looks a lot like https://github.com/peacekeeper/xdi2/blob/master/core/src/main/java/xdi2/core/impl/BasicContextNode.java :)

Cool. Doesn't surprise me; seems like an obvious fit.
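
As a rough illustration of the node-as-JSON-object idea above, here is a minimal Python sketch. The helper names (context_node, literal_node) and the choice of "=" / "markus" for the "<context-symbols>" / "context-id" pair are assumptions made for the example; they are not part of any XDI specification or of the xdi2 code.

```python
import json


def context_node(symbols, context_id, contexts=(), relations=None):
    """Build a non-literal node shaped like the template in this thread.

    `symbols` stands in for "<context-symbols>" (hypothetical usage).
    """
    return {
        symbols: context_id,
        "contexts": list(contexts),
        "relations": dict(relations or {}),
    }


def literal_node(value):
    """Build a literal node, assuming '#' is the literal symbol."""
    return {"#": {"#": value}}


# Hypothetical example: a root-level node holding one literal child.
markus = context_node("=", "markus", contexts=[literal_node("Vienna")])
print(json.dumps(markus, indent=2))
```

Because every node is an ordinary dict, the whole graph serializes with the standard json module and no custom encoder.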

To add color to Markus' first answer, as soon as you move XDI addressing into a literal, you lose a primary feature of the XDI graph model, which is that any entity or attribute can have subentities or sub attributes.

I need to understand entities vs. attributes better to understand why this is the case.

Just to be clear, I don't think this addressability distinction has to do with entities or attributes. Both can contain other entities or attributes. It has to do with describability. For example, in XML, an element can contain a subelement to describe the parent element. And subelements can contain other subelements, etc. So XML elements are describable.

However, in XML, attributes cannot contain other attributes. For example, if you have a person element with a weight attribute, you can't put a timestamp attribute on the weight attribute.

So in the case of an address, if you need to be able to separately describe address line 1, address line 2, and address line 3 -- for example, to say that only address line 1 is required, or that address line 3 includes a special requirement - then you need for each address line to be separately addressable in XDI and not just JSON. (If you didn't need that describability, you could just specify in the dictionary that &+address takes a JSON array as its value, and then specify the array in JSON schema.)

Secondly, if an XDI literal value is a JSON array, there's already a way to address into the array using JSON. That's why I think it should be fine to allow both JSON objects and JSON arrays as XDI literal values. In the JSON serialization, it's unambiguous what the JSON data type of an XDI literal is, so you "cross over" from XDI addressing to JSON Path addressing when you move into a literal value. But we need to keep the XDI and JSON Path addressing spaces separate because XDI addressing has different properties than JSON addressing.
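
The "cross over" between the two addressing spaces can be sketched in Python as follows. This is only an illustration under stated assumptions: the flat dict standing in for a graph, and the helpers get_literal and json_index, are hypothetical, and plain [ ] indexing is used in place of a real JSONPath engine.

```python
# Toy "graph": XDI address -> literal value (a hypothetical flat store).
graph_with_array_literal = {
    "=markus+address!4567&+street": ["123 Main St", "Apt 23"],
}

# Alternative modelling: every address line is its own XDI node.
graph_with_addressable_lines = {
    "=markus+address!4567&+street[1]": "123 Main St",
    "=markus+address!4567&+street[2]": "Apt 23",
}


def get_literal(graph, xdi_address):
    """XDI addressing: ends at the literal value, whatever its JSON type."""
    return graph[xdi_address]


def json_index(value, *steps):
    """JSON-side addressing: walk into an array/object with [ ] accessors."""
    for step in steps:
        value = value[step]
    return value


street = get_literal(graph_with_array_literal, "=markus+address!4567&+street")
first_line = json_index(street, 0)  # JSON arrays are 0-based
same_line = get_literal(graph_with_addressable_lines,
                        "=markus+address!4567&+street[1]")
```

In the first store, line 1 is reachable only by leaving XDI addressing and indexing into the literal, so no XDI statement can describe it. In the second, each line has its own XDI address and can carry further statements.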

I need to understand which properties are incompatible.

See above.

I was just thinking of the . and [ ] accessors in JavaScript and the equivalents in other languages, which programmers are used to using constantly.
I had to look up JSONPath, which uses . and [ ] similarly, but has more complex features, many of which look like they would be useful in more general queries than a specific address.

I think far fewer programmers are going to be familiar with JSONPath.
It looks like there is also a generalization of JSONPath called JSONQuery.

2. If retrieving =markus+address!4567+&street# can return a JSON array or object, or a JSON single value depending on whether you previously set it to an array / set one or more of its array members, vs setting it to simply a single value, then we don't need a singleton marker to specify singleton vs. collection; in fact it would simply lead to an error on a mismatch.

I think I would argue for not allowing JSON arrays or objects as literal values, only strings, numbers, boolean.

I disagree - I think we definitely want to allow JSON arrays and objects as XDI literal values, for several reasons including what I mentioned above. But the question Joseph asked is a non sequitur. The XDI dictionary definition for &+street (I simply can't force myself to write it the other way around) will specify the datatype of a literal value. It's either a string or an array (or possibly an object). Whatever it is, that's what you will get back.

In programming terms this means XDI data would be "typed" or "strongly typed", like in Java and C/C++, and unlike JavaScript, Python, Perl, Ruby, etc. This surprises me and doesn't seem like a natural fit for JSON.

It may not be a natural fit for JSON but it seems like a hard requirement for a semantic data interchange format. I'd put it this way: even though we want the JSON serialization of XDI graphs to be very friendly to developers, and we want the XDI graph model to leverage the very simple and elegant JSON data model, XDI does not equal JSON (if it did, we wouldn't need XDI).
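
To make the "dictionary specifies the datatype" point concrete, here is a small Python sketch. The DICTIONARY mapping and check_literal function are invented for illustration; a real XDI dictionary would be expressed in XDI itself, and the types chosen for each attribute here are assumptions.

```python
# Hypothetical dictionary: attribute identifier -> required JSON datatype.
DICTIONARY = {
    "&+street": list,   # assumed: a JSON array of address lines
    "&+city": str,      # assumed: a JSON string
}


def check_literal(attribute, value):
    """Return the value if it matches the dictionary type, else raise."""
    expected = DICTIONARY[attribute]
    if not isinstance(value, expected):
        raise TypeError(
            f"{attribute} expects {expected.__name__}, "
            f"got {type(value).__name__}"
        )
    return value


check_literal("&+street", ["123 Main St", "Apt 23"])  # accepted
check_literal("&+city", "Vienna")                      # accepted
```

Under this model a reader of &+street always gets the declared type back, and a writer who supplies the wrong type is rejected up front.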

3. Also, while [1] is very recognizable syntax for array indexing, if we only allow integer constant indexes and not computed expressions, we shouldn't have to use a paired delimiter, which should be reserved for cases where we need to specify the end of an expression that would otherwise be ambiguous. # is a nonpaired delimiter that would also be a recognizable choice to precede an integer index into a collection.

I could imagine using a nonpaired delimiter
Me too. In fact there's a pretty good argument that only roots and variables should use paired delimiters because they are the only ones that allow grouping of subgraph identifiers.

In which case if we use # for that, we need to pick a different symbol for literals. (It never ends ;-)

Single quote is unobtrusive and easy to type though it's usually used as paired. Vertical bar seems intuitive to me. Colon also seems intuitive.

Good suggestions. I particularly like colon as it's exactly what's used by JSON.

4. I still do not get why the entity vs. attribute distinction is needed and especially why it has to be explicitly specified in every XDI address.

For dictionary purposes, but I'd defer to Drummond for giving a more elaborate answer to this question :)

We're going to get this question 1000 more times, so I've decided we need a wiki page devoted entirely to documenting the design decisions for the XDI graph model.

Wrt this question, here are the bullet points:
As Markus alludes, in an XDI dictionary, entities and attributes have different rules. Attributes can have literal nodes. Entities cannot. This is similar (if not identical to) the distinction between roots and subgraphs. Roots can be the context for other roots (peer roots, inner roots, statement roots). Subgraphs cannot. (It's also akin to the rule about classes and instances. Classes can have instances; singletons cannot.)

The same concept identifier (e.g., +email, $uri) can serve as both an entity (in some contexts) and an attribute (in other contexts). Example: =markus&+email (attribute). +email+server (entity). =drummond+website|$uri (attribute). @ietf$uri+spec (entity). Without the ability to specify whether a node is serving as an attribute or entity in a particular context, it is semantically ambiguous.

The pattern I'm seeing so far is that when found at the end of an address, it's an attribute, but when not the last component in the address, it's an entity.

That doesn't work for attributes of attributes. To use the example of a person's weight attribute having a timestamp attribute, and using | for singleton and > for attribute:

=drummond|>+weight|>+timestamp:/:/"2010-09-20T10:11:12Z"

IMHO the power of attributes being able to have attributes, and to have all of them addressable, is one of the most important features of XDI as a semantic data interchange model.
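
The difference between inferring "attribute" from position and reading it from an explicit marker can be sketched in Python. This uses the trial notation from this message only (| for singleton, > for attribute); the regular expression is a toy tokenizer written for this one example and is an assumption, not an XDI grammar.

```python
import re

# Toy tokenizer: optional "|" and ">" markers, then a symbol and a name.
TOKEN = re.compile(r"(\|?>?)([=@+$][a-z]+)")


def components(address):
    """Split an address into (markers, identifier) pairs."""
    return TOKEN.findall(address)


def attributes_by_marker(address):
    """Attributes are whatever carries the explicit '>' marker."""
    return [name for prefix, name in components(address) if ">" in prefix]


def attributes_by_position(address):
    """The proposed heuristic: only the last component is an attribute."""
    parts = components(address)
    return [parts[-1][1]] if parts else []


address = "=drummond|>+weight|>+timestamp"
print(attributes_by_marker(address))    # ['+weight', '+timestamp']
print(attributes_by_position(address))  # ['+timestamp']
```

The positional rule misses +weight, which is exactly the attribute-of-an-attribute case described above.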

Also are you omitting a particular character at the end to signal it's an attribute, or is there really not one there in the attribute examples?

If you are talking about the example below (from the https://wiki.oasis-open.org/xdi/XdiSyntaxExamples page), then I should clarify: the # character at the end is not a signal that what precedes it is an attribute. The # character is what represents the graph node that has the literal node containing the actual value. This node -- the "literal value node" - can only FOLLOW an attribute (because only attributes can have values). But this character MUST NOT follow an attribute if:

The attribute does not have a value (i.e., it is null). This is why the JSON serialization of XDI does not need to use the JSON null data type.
The attribute contains an entity.
The attribute contains another attribute.

=markus|&+email/$ref/
   =markus&+email!9876
=markus+home|&+email/$ref/
   =markus&+email!5432
=markus+work|&+email/$ref/
   =markus&+email!9876
=markus&+email!5432#/#/
   "ms@example.com"
=markus&+email!9876#/#/
   "markus@example.net"
=markus+address!4567&+street[1]#/#/
   "123 Main St"
=markus+address!4567&+street[2]#/#/
   "Apt 23"
=markus+address!4567|&+city#/#/
   "Vienna"
=markus+address!4567|&+country#/#/
   "Austria"
=markus+address!4567|&$t#/#/
   "2013-04-11T10:11:12Z"

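For readers who want to experiment with statements in the display format above, here is a minimal Python sketch that parses "subject/predicate/" lines followed by an indented object, then follows a single $ref to reach a literal. The function names are hypothetical, only two of the statements are reproduced, and treating $ref as "look up the referenced address instead" is a simplifying assumption about its semantics.

```python
import json

EXAMPLE = """\
=markus|&+email/$ref/
   =markus&+email!9876
=markus&+email!9876#/#/
   "markus@example.net"
"""


def parse_statements(text):
    """Parse pairs of lines into (subject, predicate, object) triples."""
    lines = [line for line in text.splitlines() if line.strip()]
    statements = []
    for head, obj in zip(lines[0::2], lines[1::2]):
        subject, predicate, _ = head.split("/")
        statements.append((subject, predicate, obj.strip()))
    return statements


def resolve_literal(statements, address):
    """Follow at most one $ref, then read the literal under '<target>#'."""
    refs = {s: o for s, p, o in statements if p == "$ref"}
    literals = {s: json.loads(o) for s, p, o in statements if p == "#"}
    target = refs.get(address, address)
    return literals[target + "#"]


statements = parse_statements(EXAMPLE)
print(resolve_literal(statements, "=markus|&+email"))  # markus@example.net
```

The literal is parsed with json.loads, so the same code would return a list if the value on the indented line were a JSON array.
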
=Drummond
