Astoria data sources and system layering

DB Blogs

September 27th, 20070 0

The question is: what data sources should Astoria support as inputs? Should we allow ADO.NET Entity Framework only or open up more? If we open up more, how does the system need to be layered to enable the use of those alternate data sources?

Let me step back for a second first and establish why certain data sources are interesting.

Why does Astoria build on top of the EDM and the Entity Framework?

Putting an HTTP head on top of some set of data is easy enough that it’s tempting to do it against any data source out there. The problem is that then if the way data is modeled and surfaced has not been thought through, inconsistencies and awkward aspects start to show up in the HTTP interface.

The Entity Data Model (EDM) was a very nice candidate. EDM explicitly models all data as entities, which can be nicely mapped to resources. So the definition of a resource is crisp and it comes from the underlying data model. Furthermore, the EDM defines associations between those entities, which can be surfaced quite naturally as hyperlinks. Finally, semantics around query and update for EDM data are clearly defined and can be mapped easily to HTTP verbs.

Entity Framework also provides an important contribution: mapping. Very rarely you want to surface your database data as-is to the web, either to an application or as a service for external applications. You’ll want to adjust names, reshape data, merge tables, etc. The Entity Framework mapping layer can help with data, and with the VS integration we’ll include in the Entity Framework for EDM design and mapping, you’ll be able to do this in a nice visual experience.

Clearly, the world of data sources does not end with the Entity Framework…

Strictly from the data model perspective we clearly need to pick a single one for consistency, and EDM has the right characteristics for our needs. Now, from the perspective of the actual data sources that we can handle, it would not be reasonable to assume that every data source out there will be plugged into the Entity Framework. There are many scenarios where you want to surface data coming from other sources and you still want to use the Astoria URI and payload formats, along with the client libraries and tools that will be available for it.

So we want to support Entity Framework *and* other data sources. Now the trick is how to do this without basically writing Astoria twice :-), and maintain a sound, consistent data- and interaction-model.

What do we need from a data source?

Astoria has a few specific needs when interacting with a data source:

Surface data as reasonable units we can expose as resources. For example, we could say that each CLR object is a resource
Addressability: each resource (say CLR object) needs to be addressable. For us to create an address we need to be able to figure out what members are the “keys” on that resource
Query composition: Astoria URIs are a simple form of queries. We need to be able to formulate and execute queries against the data source without knowing the specifics of the target data source. We also need to be able to “compose” queries; e.g. to take a given query and say add sort order to it
Update: we need to be able to pull a bunch of resources (say, again, CLR objects), make some modifications, and then push the changes back into the data source.

Proposed approach

Trying to avoid inventing new stuff, we could try to tackle this problem using the technology that was introduced as part of LINQ. The IQueryable interface provides a mechanism by which you can build a query by applying operators to input IQueryable objects, obtaining a new IQueryable object that includes the applied operator (there is plenty of blog entries and such on IQueryable on the web, so I won’t elaborate on it here; Matt Warren’s series on IQueryable provide a great detailed reference for this.).

In this approach we can say that your data service takes as input any class that has a set of public properties of type IQueryable<T>. For example:

public class MyDataService : WebDataService<MyDataSource>

{

}

Where MyDataSource could be something like:

public class MyDataSource

{

public IQueryable<SomeType> SomeThings {…}

public IQueryable<OtherType> OtherThings {…}

}

public class SomeType {

public int ID { get; set; }

public string SomeData { get; set; }

public ICollection<OtherThings> RelatedThings { get; }

}

public class OtherType { … }

That would result in a service with two top-level entity containers, /SomeThings and /OtherThings.

Translation to LINQ expression trees is for the most part fairly straightforward. Following the example, let’s say that you now request the URI /SomeThings!1, we would translate it to roughly:

myDataSourceInstance.SomeThings.Where(i => i.ID == 1)

(at least conceptually, we would build the expression trees directly and not go through the language form, of course).

A more elaborate example would be association traversal, for example /SomeThings!1/RelatedThings; the translation for that would be:

myDataSourceInstance.SomeThings.Where(i => i.ID == 1).SelectMany(i => i.RelatedThings)

So we are basically saying that the Astoria server runtime has 2 halves. The top-half is the Astoria runtime itself; this part is “fixed”, and it implements URI translation, the XML/JSON/etc wire formats, the interaction protocol, etc. It’s basically what it makes an Astoria service look like an Astoria service. The bottom half is the data-access layer and it’s pluggable. Communication between layers happens in terms of the IQueryable interface plus a set of conventions to map CLR graphs into the URI/payload patterns of Astoria.

Other options?

An alternate approach would be to come up with a new interface. Astoria needs only a few operations to be supported by data sources, and the IQueryable interface is certainly overkill for the job. However, the fact that IQueryable already exists and there will be various implementations for it has a lot of value itself…

Net result

The net result that we want, through IQueryable or some other alternative, is that we can surface various data sources through Astoria, so that clients can interact with data services across the way using a single, uniform mechanism. The exposed HTTP interface is still EDM-ish even when the source is not the Entity Framework, and the metadata is still expressed in EDM terms for consistency. That helps maintain the simple, straightforward semantics of the Astoria HTTP interface.

Now you can bring any data source you want and expose it through Astoria, provided that you have or can write an IQueryable implementation for it. Not only we’ll automatically hook it up in the Astoria pipeline, but also we’d push-down all the filters, sorting, and other operators, so the data source can efficiently implement them.

Querying over the Entity Framework

It turns out that we build first-class LINQ support into the Entity Framework, we call the thing LINQ to Entities. That means that while we have a different code-path for metadata, the query composition and execution code paths, as well as all of the serialization and other details, is common code across Entity Framework and any other LINQ implementation plugged into Astoria.

What’s missing

Metadata: one thing I did not discuss here is how we turn CLR object graphs obtained by executing IQueryable objects into something that works well with the Astoria HTTP interface. There is a set of conventions that we’ll use, and a few attributes to override conventions when they don’t work for you. In a future post we’ll discuss those in detail.

Update: while this model enables us to support querying over arbitrary .NET classes that expose containers of instances as IQueryable objects, it does not say how to update stuff. Discussing the update model will take a whole post (or several, most likely), but the short story is that we’ll define an interface, something like IUpdatable or whatever names works, that has the basic operations we need to perform in order to handle updates. The interface would have primitive operations for adding a new resource, remove an existing resource, applying modifications to resources and also handle linking/unlinking of resources.

Pablo Castro
Technical Lead
Microsoft Corporation
http://blogs.msdn.com/pablo

This post is part of the transparent design exercise in the Astoria Team. To understand how it works and how your feedback will be used please look at this post.

Astoria data sources and system layering

DB Blogs

Read next

0 comments