Subject: notes on RDF profiling
Re: the business of RDF profiling, I'm just posting some issues to consider when we get to it. Basically, it's sort of the thinking that drove the design of my demo schema.

1) triples

What is fundamental to RDF is the triple model, and the use of URIs to identify things (the elements of the triples). What that core provides is a really simple, but really powerful, model to mix and merge data. E.g. it's the answer to the requirement for a metadata system that is robustly extensible, and which allows data to be articulated across the boundaries of functionality (a document description to reference a contact record, for example).

At the most basic, then, all resource statements have a value of a literal or of a URI, which then references another resource. The first then might look like:

  urn:x:1 --> x:title --> "A title"

... and the second:

  urn:x:1 --> x:author --> urn:person:y

... where the last urn is an identifier for a person record, which in turn might be:

  urn:person:y --> x:name --> "Jane Doe"

If you're thinking of a relational database, the first example is like a column in a "documents" table, while the second is like a foreign key reference to a row in a "people" table. If you're thinking about objects, the first is a simple string attribute, while the second is a reference to another object.

Any RDF profile really must support this basic notion. Hell, even if it was a non-RDF metadata encoding, it should support it!

XMP, FWIW, does not really support the model, since it does not support values (objects) that are URIs. All properties are either literals or blank nodes ...

2) blank nodes

Above, the person record is identified by URI and thus separately described. This is a powerful approach because it means that person description becomes a node in the graph, where other descriptions can also link to it. It's good, normalized, data design.

But practically speaking, one may not always want to identify something as a discrete -- linkable -- resource like this.
This is what blank nodes do: they allow you to have nested anonymous resource descriptions. I include them in my demo schema, and XMP also includes them. RDF/XML also has a short-hand syntax for this (the rdf:parseType="Resource" attribute), which XMP also supports.

3) properties as elements or attributes?

The single biggest problem with RDF/XML for XML tools is that it makes no distinction between elements and attributes. Properties can be encoded either way, which gives XML tools problems. There's a rational history behind this (they wanted to allow embedding of properties within XHTML IIRC), but I don't think it makes any sense to allow attributes in ODF for this. It buys us nothing really, and adds significant complexity from an XML perspective. Adobe, for some reason, allows both in XMP.

4) types

To plug RDF data into ontology-based systems, you type resources. You create a class like "meta:Document" and, if you like, write a little RDF schema fragment that gives further information about it ("it's a class that is like this other class, it has x, y, z natural language descriptions, etc.").

Types in RDF can either be assigned by replacing the rdf:Description wrapper with another term (like "meta:Document" above), or by using an rdf:type element with a URI rdf:resource attribute.

I personally think allowing typing makes sense, but there's one potential impedance mismatch with traditional OO programming languages one needs to take into account, which is that RDF descriptions can have many types. Also, because typing can be indicated in two ways, you need to account for this in XML tools.

In my bibliographic demo, I addressed this by using a more generic typed node "bib:Reference" and then indicating the subclass using a dc:type property. So from an RDF standpoint, there is effectively one type.

5) reification

To be honest, I don't much understand this, which tells me it's a problem we really don't want to deal with.
In general, it's the ability to make statements about statements, and it a) is seldom used in practice (or so I understand), and b) requires some syntactic gymnastics to support. There are a number of RDF experts who now think that reification was a mistake, adding too much complexity for too little gain.

If I understand right, XMP DOES support reification (described as "property qualifiers" in the XMP spec, p17). I don't think we should.

6) containers and collections

Standard rdf containers are Seq, Alt and Bag. These are just ways to wrap properties. They're also one of the reasons non-RDF people scream about the syntax.

Most RDF experts I've talked to think these are also problematic. Indeed, in recent discussions about a so-called "RDF Lite" profile, most RDFers agree they could forego these structures, because the basic triple model (and maybe typing) achieves the same thing in practice.

One of the controversial things about XMP is not so much that it allows these, but that it *requires* them for any duplicate properties.

I think for us it might make sense to allow them, but not in any way to encourage their use. E.g. I think the generic profile validator I wrote probably would say these are valid, but I don't think our documentation should talk about them.

Collections are ways to wrap multiple resources. From a modeling perspective, they can be useful.

One of the practical problems with both the containers and the collections is that right now, for example, SPARQL (the new RDF query language from the W3C) does not support them. That will come later, but it'll probably be a couple of years.

There's one problem that we on the bibliographic side would definitely need to get around, though, which is that author lists and such are ordered, while the RDF model is not. This would take some thought about how best to handle, but one suggestion that I'm liking from Ian Davis is that one allow a position or order property (which is how you'd do it in a relational database).
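
To make the position-property idea a bit more concrete, here is a rough Python sketch. It is not tied to any real RDF library: triples are plain tuples as in section 1, each author link goes through a blank node as in section 2, and the names x:contribution, x:position, urn:person:z and "John Smith" are all made up for illustration. The only point is that order is recovered from data in the graph, not from the order of the statements.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # either a literal or an identifier of another node


# Hypothetical data. "_:c1" and "_:c2" are blank nodes; the statements are
# deliberately listed out of order to show that statement order carries no meaning.
TRIPLES = [
    Triple("urn:x:1", "x:title", "A title"),
    Triple("urn:x:1", "x:contribution", "_:c2"),
    Triple("urn:x:1", "x:contribution", "_:c1"),
    Triple("_:c1", "x:author", "urn:person:y"),
    Triple("_:c1", "x:position", "1"),
    Triple("_:c2", "x:author", "urn:person:z"),
    Triple("_:c2", "x:position", "2"),
    Triple("urn:person:y", "x:name", "Jane Doe"),
    Triple("urn:person:z", "x:name", "John Smith"),
]


def objects(subject: str, predicate: str) -> list[str]:
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in TRIPLES
            if t.subject == subject and t.predicate == predicate]


def ordered_authors(doc: str) -> list[str]:
    """Return author names for a document, sorted by the x:position property."""
    nodes = objects(doc, "x:contribution")
    nodes.sort(key=lambda node: int(objects(node, "x:position")[0]))
    return [objects(objects(node, "x:author")[0], "x:name")[0]
            for node in nodes]


print(ordered_authors("urn:x:1"))  # ['Jane Doe', 'John Smith']
```

This is the same thing you'd do with an ORDER BY on a position column in a relational database. The cost is one extra node per author, which is worth weighing against the complexity of supporting rdf:Seq.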

So here we're left with pragmatic questions about what specific details we ought to support to achieve our goals. If we assume RDF tools, there's no real need to worry about this. But if we assume non-RDF tools, then we have to recognize that each feature we support adds corresponding complexity (to the spec, to the RELAX NG schema(s), to processing).

Of course, we still have to clarify what your "goals" are ;-)

Bruce