URI Format – Part 1 – Addressing resources using URI path segments
Deciding on something that becomes a public interface of a developer-oriented technology is a tricky task. Not only does the resulting design need to be correct and complete, but also there are various aspects that are more around aesthetics and personal preference. The URI format used by Astoria will need to survive both sets of challenges…
The Astoria REST “protocol” is made up of a URI addressing scheme, an HTTP-based interaction model and payload formats (Web3S/XML, JSON, ATOM/APP). In the interest of staying focused, this write-up will only touch on the URI format used by Astoria and leave discussion of the interaction model to a future post. See this post for a discussion around payload formats used by Astoria.
In general, Astoria takes a conceptual model expressed in terms of entities in an EDM schema and surfaces data that follows that model over an HTTP interface, representing entities as resources and associations between entities as links. The URI interface needs to provide a rich yet simple way of addressing those resources.
The URI format in Astoria has a few specific goals:
a) Provide a mechanism to point to every resource or member of a resource in the system. That is, every piece of data is addressable, and the URI used to address it needs to be derivable from the service metadata which describes the conceptual model of the system.
b) Allow for simple queries to be formulated. That is, instead of pointing to a particular resource, allow URIs that express filtered sets of resources satisfying certain criteria.
c) Support manipulating the presentation of results. This includes things such as sorting resources, paging over them and expanding related resources.
This “part I” write-up focuses on item a) above: pointing to resources and their members. We will discuss b) and c) in future blog posts.
NOTE: the following descriptions use EDM terminology and constructs. Regardless of the underlying data access layer (Entity Framework, custom LINQ provider, etc.) an Astoria service is exposing, the service is described using an EDM schema, so this description applies equally to any data source. In addition, typical REST verbiage, as used above, refers to items pointed to by URIs as resources. In the remainder of this write-up the term ‘entity’ should be interpreted as a synonym for ‘resource’.
Starting from the root
At the root of the service, we are thinking of carrying the behavior of the CTP forward and exposing all of the resource sets, which are simply the entity sets we find in the EDM schema. Each is addressed by name, separated from the service root URI by a forward slash (“/”), e.g. …/northwind.svc/Customers, where “Customers” is a resource container.
A detail: in an EDM schema an entity set is contained within a single entity container, and there may be multiple entity containers in the schema. If that’s the case, the names of entity sets in non-default containers need to be container-qualified (e.g. “/NorthwindContainer.Customers”). Names in the default container *cannot* be qualified, to avoid introducing a redundant way of getting to them.
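To make the root-level rule concrete, here is a minimal Python sketch (not Astoria code). The function name and the `example.org` host are assumptions for illustration; only the `northwind.svc/Customers` and `NorthwindContainer.Customers` shapes come from the text above.

```python
def entity_set_uri(service_root, entity_set, container=None):
    """Return the URI of an entity set directly under the service root.

    `container` is passed only for non-default entity containers; sets in
    the default container are addressed by their bare name.
    """
    name = f"{container}.{entity_set}" if container else entity_set
    return f"{service_root.rstrip('/')}/{name}"


# Hypothetical host, used only for illustration.
ROOT = "http://example.org/northwind.svc"

print(entity_set_uri(ROOT, "Customers"))
# http://example.org/northwind.svc/Customers
print(entity_set_uri(ROOT, "Customers", container="NorthwindContainer"))
# http://example.org/northwind.svc/NorthwindContainer.Customers
```

Because the default container is never qualified, each entity set ends up with exactly one URI in this sketch.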
Pointing to a particular entity
Every entity in an EDM schema has a key which consists of one or more of the properties in the entity. An entity key is unique within the containing entity set, so to identify an entity with a URI we need to include at least the entity-set and the key values.
– Location of keys
The key value could go before or after the question mark. That is, the URI could be built by adding a query parameter after the URI question mark as in:
Or we could consider the entity-set-plus-value construct part of the URI namespace of the service and write it as
We prefer the second approach, in which the entity sets and keys form a URI namespace and no query parameter is required. One of the reasons for leaning towards the second approach is that it makes it explicit which entity set the key is associated with, especially when the URI path becomes quite long.
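As a rough illustration of the difference (a sketch under stated assumptions, not the exact URIs from this post): the query parameter name `key` and the `example.org` host are made up, and the `!` separator is the one discussed further down. Splitting the two URIs with Python's standard `urllib.parse.urlsplit` shows where the key ends up in each case.

```python
from urllib.parse import urlsplit


def key_in_query(entity_set_uri, key):
    # "key" is a made-up parameter name, used only for illustration.
    return f"{entity_set_uri}?key={key}"


def key_in_path(entity_set_uri, key):
    # "!" is the separator this post is considering for the path form.
    return f"{entity_set_uri}!{key}"


base = "http://example.org/northwind.svc/Customers"  # hypothetical host

for uri in (key_in_query(base, "23"), key_in_path(base, "23")):
    parts = urlsplit(uri)
    print(repr(parts.path), repr(parts.query))
# '/northwind.svc/Customers' 'key=23'
# '/northwind.svc/Customers!23' ''
```

In the second form the key travels with the entity set name inside the path, so it stays attached to that set when more path segments are appended.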
– A bit of syntax
Assuming we go with that approach, there is still the question of the syntax. The May 2007 CTP used values in square brackets (e.g. “…/Customers[ALFKI]”). We got “generous” feedback saying that square brackets were a bad choice.
The approach we are currently considering is to attach the key directly after the entity set name, using the ‘!’ character as the separator (e.g. …/Customers!23). That said, as per our last “design” posting on formats, we are looking to support the Web3S format. Web3S has a more flexible data model in that it allows heterogeneous sets, while an Astoria server supports only homogeneous sets. To enable interoperability between any server implementing Web3S and an Astoria server, a URI scheme flexible enough to address heterogeneous sets is required. Therefore, we are thinking of expanding on our current approach and allowing both a “full” form and a “compressed” form of URI. The full form supports heterogeneous sets; the compressed form can be used as shorthand when the set being addressed is on an Astoria server and is thus homogeneous. For example:
“Full” form: …/Customers/Customer(123) would identify the instance 123 of type Customer within the Customers set.
“Compressed” form: …/Customers!123 would identify the same resource as above in Astoria, because the ‘Customer’ type is implied since Astoria servers support only homogeneous sets.
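The relationship between the two forms can be sketched in a few lines of Python. This is an illustration only: the function names and the `example.org` host are assumptions, and `expand` assumes the key itself contains no `!` character.

```python
def full_form(set_uri, type_name, key):
    """Full form: set, then type name with the key in parentheses."""
    return f"{set_uri}/{type_name}({key})"


def compressed_form(set_uri, key):
    """Compressed form: the type is implied because the set is homogeneous."""
    return f"{set_uri}!{key}"


def expand(compressed_uri, element_type):
    """Rewrite a compressed URI into the full form.

    `element_type` is the single entity type of the homogeneous set.
    Assumes the key contains no '!' (enough for this sketch).
    """
    set_uri, sep, key = compressed_uri.rpartition("!")
    if not sep:
        raise ValueError("not a compressed-form URI")
    return full_form(set_uri, element_type, key)


customers = "http://example.org/northwind.svc/Customers"  # hypothetical host
short = compressed_form(customers, 123)
print(short)                      # http://example.org/northwind.svc/Customers!123
print(expand(short, "Customer"))  # http://example.org/northwind.svc/Customers/Customer(123)
```

The expansion only works because the server knows the one type a set can hold, which is exactly why the compressed form is limited to homogeneous sets.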
For keys made up of more than one property, one option would be to encode name/value pairs, but that would result in verbose URIs and extra syntax to be invented. We are leaning toward a simple approach: we only use the values, separated by semicolons. The values are listed in the same order as they appear in the metadata document which describes the service. Metadata will be the topic of a future post; for now it’s enough to say that in the typical case the description of a service will be available by making a GET request to …/$metadata. The following is an example of a URI which contains a composite key:
Some folks love this, some hate it. The main concern from folks who hate it is readability: you cannot interpret the URI without the schema. Is that an issue? The alternate option of using name-value pairs is more explicit in this sense. We could have:
The single-key case would still not require the name, and given that most cases will be single-key cases, compactness won’t suffer too much. An aspect of this that’s both good and bad is the fact that by using names you can specify the values in different order. That’s “handy”, but it means that these URIs are not useful for comparison as strings when trying to determine identity.
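The string-comparison concern can be shown with a small Python sketch. Everything concrete here is an assumption for illustration: the key property names `OrderID` and `ProductID`, the `name=value` spelling of a pair, and the separator (passed as a parameter, defaulting to the semicolon mentioned above).

```python
# Hypothetical key properties, in the order the metadata would declare them.
KEY_ORDER = ("OrderID", "ProductID")


def positional_key(values, sep=";"):
    """Values only; their meaning comes from the metadata order."""
    return sep.join(str(v) for v in values)


def named_key(pairs, sep=";"):
    """name=value pairs, in whatever order the caller supplies."""
    return sep.join(f"{name}={value}" for name, value in pairs)


def canonical_named_key(pairs, key_order=KEY_ORDER, sep=";"):
    """Reorder named pairs into metadata order so strings compare equal."""
    by_name = dict(pairs)
    return named_key([(name, by_name[name]) for name in key_order], sep)


a = [("OrderID", 10248), ("ProductID", 11)]
b = [("ProductID", 11), ("OrderID", 10248)]

print(positional_key([10248, 11]))                       # 10248;11
print(named_key(a) == named_key(b))                      # False
print(canonical_named_key(a) == canonical_named_key(b))  # True
```

So the named form can still support identity checks, but only if every client canonicalizes the order first, which again requires the schema.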
– Literal forms
Using just literal values in a composite key doesn’t really work, because now you cannot tell whether that’s a two-element key or a one-element key that happens to have a comma in it. So we need to use proper literal forms for the values. We will need that when we want to express query expressions such as filter predicates anyway, so we may as well be consistent and use a single literal form everywhere.
· Strings: a string surrounded by single quotes (e.g. ‘ALFKI’)
· Numbers: just the number, using US style (dot separates decimal digits)
· Dates: quoted as strings. Inside the string, use the format described in RFC 3339
· Guids: use the form “dddddddd-dddd-dddd-dddd-dddddddddddd”
· Binary: “0x” followed by two hex digits per byte (e.g. 0x1AB4)
So the examples above would actually be, for single- and composite-key respectively:
It is probably a good idea not to allow spaces, as that would help make sure that URIs that mean the same thing are easily comparable.
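The literal forms listed above could be produced by something like the following Python sketch. It is an illustration under assumptions, not a specification: the function names are made up, the separator is a parameter (defaulting to a semicolon), booleans are rejected because the list does not cover them, and quotes embedded inside a string are not handled.

```python
import uuid
from datetime import datetime, timezone
from decimal import Decimal


def format_literal(value):
    """Render one key value in the literal form for its type."""
    if isinstance(value, bool):
        raise TypeError("booleans are not covered by the list above")
    if isinstance(value, str):
        return f"'{value}'"  # embedded quotes are not handled in this sketch
    if isinstance(value, (int, float, Decimal)):
        return str(value)  # dot as the decimal separator
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("RFC 3339 timestamps need a UTC offset")
        return f"'{value.isoformat()}'"
    if isinstance(value, uuid.UUID):
        return str(value)  # 8-4-4-4-12 hex digits
    if isinstance(value, (bytes, bytearray)):
        return "0x" + bytes(value).hex().upper()
    raise TypeError(f"no literal form for {type(value).__name__}")


def format_key(values, sep=";"):
    """Join literals with no spaces so equal keys give equal strings."""
    return sep.join(format_literal(v) for v in values)


print(format_key(["ALFKI"]))                # 'ALFKI'
print(format_key([10248, 11]))              # 10248;11
print(format_literal(bytes.fromhex("1AB4")))  # 0x1AB4
print(format_literal(datetime(2007, 5, 1, 12, 0, tzinfo=timezone.utc)))
# '2007-05-01T12:00:00+00:00'
# A one-element key containing the separator no longer looks like two elements:
print(format_key(["a;b"]), format_key(["a", "b"]))  # 'a;b' 'a';'b'
```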
To address a member of an entity, simply append the member to a URI that points to the entity, separated by a forward-slash. For example, if Customers have a CompanyName property:
Note that addressing a member like that would return the member appropriately wrapped to conform to the negotiated MIME type. For example, if using XML you’d get the value wrapped in an XML element and annotated with the required namespaces and such.
If you want just the value with no wrappers, you can use the /$value “magic member”. For example:
For a string, this would just return the string (text/plain) by default (same applies to all types but binary, which would return application/octet-stream). The developer can customize this by annotating the schema and indicating which MIME type a given value should be treated as. That would allow for example a text field to store HTML or a binary field to store an image, and HTTP responses would include the proper MIME type for them.
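A minimal Python sketch of member addressing and the default MIME type rule described above. The function names, the `example.org` host and the `Photo`/`image/png` annotation are hypothetical; the entity URI uses the compressed key form with a quoted string literal.

```python
def member_uri(entity_uri, member):
    """URI of a member: the entity URI plus '/' plus the member name."""
    return f"{entity_uri}/{member}"


def raw_value_uri(entity_uri, member):
    """URI of the unwrapped value, using the $value magic member."""
    return f"{member_uri(entity_uri, member)}/$value"


def value_content_type(value, annotated_mime=None):
    """Content type for a $value response.

    A MIME type annotated in the schema wins; otherwise binary values are
    application/octet-stream and everything else is text/plain.
    """
    if annotated_mime:
        return annotated_mime
    if isinstance(value, (bytes, bytearray)):
        return "application/octet-stream"
    return "text/plain"


customer = "http://example.org/northwind.svc/Customers!'ALFKI'"  # hypothetical
print(member_uri(customer, "CompanyName"))
# http://example.org/northwind.svc/Customers!'ALFKI'/CompanyName
print(raw_value_uri(customer, "CompanyName"))
# http://example.org/northwind.svc/Customers!'ALFKI'/CompanyName/$value
print(value_content_type("Alfreds Futterkiste"))    # text/plain
print(value_content_type(b"\x89PNG"))               # application/octet-stream
print(value_content_type(b"\x89PNG", "image/png"))  # image/png
```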
A special form of member access is when the member being accessed is actually a navigation property (a link, in non-EDM terms). Such a property can be considered a hard link that resolves into the related entity (for associations that have a cardinality of 1 on the other end) or a set of entities (if the other end is “many”).
The syntax is the same as in regular members, independent of the cardinality of the other end. So, if a customer has sales orders, the URI to access the orders would be:
Keep drilling down
When the result of a given URI is a single object, the members of that object can be accessed by adding the member name to the URI. For example, if a customer has a “Contact” navigation property that points to a single Person object, which has a Name property, the name can be retrieved directly by using this URI:
For the case where traversing an association yields a set, you need to further scope the set by providing a key to point to a single element of the set in order to traverse further using the URI path. For example if a customer has a set of orders and each order has an order date property, one valid URI to access an order date would be:
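The drill-down rule (members can only be appended to a single entity, so a set must first be narrowed with a key) can be modeled with a small Python sketch. The class, the `example.org` host, and the `Orders` and `OrderDate` property names are assumptions for illustration; keys use the compressed `!` form.

```python
class ResourcePath:
    """Tracks a URI and whether it currently points at a set or one entity."""

    def __init__(self, uri, is_set):
        self.uri = uri
        self.is_set = is_set

    def key(self, literal):
        """Narrow a set to one element using a key literal."""
        if not self.is_set:
            raise ValueError("a key can only be applied to a set")
        return ResourcePath(f"{self.uri}!{literal}", is_set=False)

    def member(self, name, many=False):
        """Append a property; `many=True` marks a to-many navigation property."""
        if self.is_set:
            raise ValueError("scope the set with a key before going deeper")
        return ResourcePath(f"{self.uri}/{name}", is_set=many)


root = "http://example.org/northwind.svc"  # hypothetical host
customers = ResourcePath(f"{root}/Customers", is_set=True)

# To-one navigation: keep appending member names.
print(customers.key("'ALFKI'").member("Contact").member("Name").uri)
# http://example.org/northwind.svc/Customers!'ALFKI'/Contact/Name

# To-many navigation: a key is needed before the next member.
orders = customers.key("'ALFKI'").member("Orders", many=True)
print(orders.key("1").member("OrderDate").uri)
# http://example.org/northwind.svc/Customers!'ALFKI'/Orders!1/OrderDate
```

Calling `orders.member("OrderDate")` without a key raises `ValueError` in this sketch, mirroring the rule that a set has to be scoped to a single element first.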
A note on escaping
Quite a bit of escaping beyond basic URI encoding is necessary for the whole scheme to work. Things like “=” and “?” need to be carefully handled to not confuse URI translators and agents. Details go beyond this write-up, but specific thoughts are welcome. The trickiest one deserves to be brought up, though: should we escape “/” inside a quoted string? The same question, asked more deeply, is “should we assume that consumers of Astoria URIs understand their syntax?”
In general, we are leaning towards requiring characters with special meaning in a URI path (e.g. ‘/’) to be escaped even when such a character is within a quoted string. If the character is not escaped, we will treat it as per its predefined meaning in path segments in RFC 3986. We believe this would provide a consistent method for developers to craft and interpret URIs.
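Here is a small Python sketch of what that rule means for a client, using the standard `urllib.parse.quote`. It percent-encodes every reserved character inside the string literal, which is broader than strictly needed and is an assumption of this sketch; the helper name and the key value `A/B` are made up.

```python
from urllib.parse import quote, unquote


def escape_string_literal(text):
    """Quote a string key value, percent-encoding reserved characters."""
    return "'" + quote(text, safe="") + "'"


escaped = f"Customers!{escape_string_literal('A/B')}/CompanyName"
unescaped = "Customers!'A/B'/CompanyName"

print(escaped)                    # Customers!'A%2FB'/CompanyName
print(len(escaped.split("/")))    # 2
print(len(unescaped.split("/")))  # 3

# A generic agent that splits the path first and decodes each segment
# afterwards recovers the original key value.
print(unquote(escaped.split("/")[0]))  # Customers!'A/B'
```

With the escaped form, an agent that knows nothing about Astoria's quoting still sees two path segments; with the unescaped form it would see three.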
The ideas presented above represent our current thinking in the space. As always, feedback and comments are most welcome. We look forward to hearing your thoughts. In follow-up posts we will discuss the query string section of the URI and dig into addressing service operations.
This post is part of the transparent design exercise in the Astoria Team. To understand how it works and how your feedback will be used please look at this post.