June 4th, 2024

A beginner’s guide to mapping arrays in EF Core 8

Entity Framework Core blog posts can get difficult fast! In this post, we’re going to try to keep it basic, while still imparting more than just trivial information. EF Core 8 has been out for six months now, and one of its great new features is mapping for arrays. Let’s jump in!

TIP All the code shown here can be downloaded from GitHub. To get started with EF Core, see Installing Entity Framework Core.

Imagine you want to save instances of the following Post type to a relational database:

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Contents { get; set; }
    public string[] Tags { get; set; }
    public DateTime[] Visits { get; set; }
}

Should be pretty simple, right? Well, that’s certainly true for the Id, Title, and Contents properties. These are simple integer and string properties, so we can just create a table with appropriate integer and string column types. For example, when using Azure SQL, EF Core 8 maps these columns by default as:

CREATE TABLE [Posts] (
    [Id] int NOT NULL IDENTITY,
    [Title] nvarchar(max) NOT NULL,
    [Contents] nvarchar(max) NOT NULL,
    [Tags] ??? NOT NULL,
    [Visits] ??? NOT NULL,
    CONSTRAINT [PK_Posts] PRIMARY KEY ([Id])
);

But what about Tags and Visits? These are both array properties, and most relational databases don’t support array types. So what does EF Core 8 do? Well, for databases that do support array types, like PostreSQL, EF Core 8 uses these types directly. But for most databases that don’t support array types natively, EF Core 8 automatically uses JSON arrays instead. This is essentially transparent to the application developer using EF Core 8; you just write your LINQ query, and EF Core 8 uses the most appropriate translation for the database you are targeting.

PostgreSQL array columns

If our database does support array types, then EF Core 8 will use them automatically. For example, on PostgreSQL the full mapping for our Posts table is:

CREATE TABLE "Posts" (
    "Id" integer GENERATED BY DEFAULT AS IDENTITY,
    "Title" text NOT NULL,
    "Contents" text NOT NULL,
    "Tags" text[] NOT NULL,
    "Visits" timestamp with time zone[] NOT NULL,
    CONSTRAINT "PK_Posts" PRIMARY KEY ("Id")
);

Notice how the Title and Contents columns have the type text, indicating that each row contains a single “title” value or a single “contents” value. On the other hand, the Tags column is typed as text[], indicating that each row contains an array of zero to many “tag” values. Here’s some sample data from the Posts table:

Id Title Contents Tags Visits
1 Arrays in EF Core 8 Imagine you want… {EF Core,Entity Framework,.NET,Databases} {2024-05-13 12:41:36.957711,2024-05-12 12:41:36.957714}
2 What’s new in Orleans 8 Let’s take a look at … {Orleans,.NET} {2024-05-14 12:41:36.957779}
3 .NET at Build Get ready for a {.NET,ASP.NET Core} {2024-05-12 12:41:36.957780}

Notice how the array columns can contain multiple values per row.

EF Core 8 will then use these array columns in query translation. For example, here is a LINQ query to pull the first two tags out of the Tags array column:

var postTags = await context.Posts
    .Select(post => new
    {
        PostTitle = post.Title,
        FirstTag = post.Tags[0],
        SecondTag = post.Tags[1]
    }).ToListAsync();

EF Core 8 translates this LINQ query into the following SQL when using PostgreSQL:

SELECT p."Title" AS "PostTitle",
       p."Tags"[1] AS "FirstTag",
       p."Tags"[2] AS "SecondTag"
FROM "Posts" AS p

Notice how p."Tags"[1] and p."Tags"[2] index into the array to extract the first two items.

Another common LINQ query is to find all Post instances who’s Tags property contains a given tag value:

var tag = "EF Core";
var posts = await context.Posts
    .Where(post => post.Tags.Contains(tag))
    .ToListAsync();

When using PostgreSQL array columns, EF Core 8 translates this as:

SELECT p."Id", p."Contents", p."Tags", p."Title", p."Visits"
FROM "Posts" AS p
WHERE p."Tags" @> ARRAY[@__tag_0]::text[]

PostgreSQL has some unusual syntax, so don’t worry if you don’t fully follow the SQL here. The point is that WHERE p."Tags" @> ARRAY[@__tag_0]::text[] filters by looking in the Tags array for the tag parameter value passed.

JSON arrays

The above is great for PostgreSQL users, but what about the rest of us using Azure SQL, SQLite, or one of the other database systems that doesn’t natively support arrays? In these cases, EF Core 8 automatically uses a JSON array instead. This means on Azure SQL, the Post type shown above maps to the following table:

CREATE TABLE [Posts] (
     [Id] int NOT NULL IDENTITY,
     [Title] nvarchar(max) NOT NULL,
     [Contents] nvarchar(max) NOT NULL,
     [Tags] nvarchar(max) NOT NULL,
     [Visits] nvarchar(max) NOT NULL,
     CONSTRAINT [PK_Posts] PRIMARY KEY ([Id])
);

The Tags and Visits column are simple string columns as far as Azure SQL is aware. However, EF Core 8 knows more. EF Core 8 knows that these columns actually contain JSON arrays, and so EF Core 8 can translate queries that make use of this knowledge. For example, here is the same LINQ query from above that will pull the first two values from the Tags column:

var postTags = await context.Posts
    .Select(post => new
    {
        PostTitle = post.Title,
        FirstTag = post.Tags[0],
        SecondTag = post.Tags[1]
    }).ToListAsync();

On Azure SQL, EF Core 8 translates this LINQ query into the following:

SELECT [p].[Title] AS [PostTitle],
       JSON_VALUE([p].[Tags], '$[0]') AS [FirstTag],
       JSON_VALUE([p].[Tags], '$[1]') AS [SecondTag]
FROM [Posts] AS [p]

This is very similar to the PostgreSQL translation. The difference is that the JSON_VALUE function is first used to tell SQL that Tags is a JSON document. The tag values in the first and second positions of the JSON array are then selected using $[0] and $[1].

Looking at the second LINQ query from above:

var tag = "EF Core";
var posts = await context.Posts
    .Where(post => post.Tags.Contains(tag))
    .ToListAsync();

This query translates to the following when using Azure SQL:

SELECT [p].[Id], [p].[Contents], [p].[Tags], [p].[Title], [p].[Visits]
FROM [Posts] AS [p]
WHERE @__tag_0 IN (
    SELECT [t].[value]
    FROM OPENJSON([p].[Tags]) WITH ([value] nvarchar(max) '$') AS [t]
)

In this case the query uses OPENJSON which converts the JSON array in the Tags column into a kind of temporary table. WHERE @__tag_0 IN is then used on that temporary table to find the rows that contain the given tag parameter value.

Typed arrays

All the examples so far have used arrays of strings. However, EF Core 8 can handle arrays of any simple type, including numbers, dates/times, GUIDs, etc. For example, the Visits property in the Post class above is an array of DateTime instances. With PostgreSQL, this property is mapped to timestamp with time zone[] column. It’s clear here then that this is an array of timestamps, and not an array of some other type. EF Core 8 uses this information to perform operations specific to the timestamp type on the values of the array. For example, consider this LINQ query which returns all the posts that were visited in a give year:

var year = DateTime.UtcNow.Year;
var visited = await context.Posts
    .Where(post => post.Visits.Any(v => v.Year == year))
    .ToListAsync();

On PostreSQL, this translates to:

SELECT p."Id", p."Contents", p."Tags", p."Title", p."Visits"
FROM "Posts" AS p
WHERE EXISTS (
    SELECT 1
    FROM unnest(p."Visits") AS v(value)
    WHERE date_part('year', v.value AT TIME ZONE 'UTC')::int = @__year_0)

Notice how the date_part function is used to extract the year from each timestamp. This only works because the array is known to contain timestamp values.

In Azure SQL, the type cannot be included in the column definition, which is just a string: [Visits] nvarchar(max) NOT NULL. However, EF Core 8 knows that this is actually a JSON column of timestamps and can use this information appropriately for query translation:

SELECT [p].[Id], [p].[Contents], [p].[Tags], [p].[Title], [p].[Visits]
FROM [Posts] AS [p]
WHERE EXISTS (
    SELECT 1
    FROM OPENJSON([p].[Visits]) WITH ([value] datetime2 '$') AS [v]
    WHERE DATEPART(year, [v].[value]) = @__year_0)

Notice the WITH ([value] datetime2 '$') used in the OPENJSON statement. This again tells Azure SQL to create a temporary table, this time containing a datetime2 column. EF Core 8 can then write queries that operate on those datetime2 values, such as the use of the DATEPART function to exact the year. This is only possible because EF Core 8 understands what is stored in the JSON column.

Other uses of arrays

EF Core 8 allows an array of a simple type to be used in most places where a non-array simple type can be used. We have already seen how properties can be mapped to array columns. Another example is passing many values in a single parameter by passing those values as an array. For example, imagine we want to return all posts whose title is prefixed by one of several strings. Here’s a LINQ query to do this:

var prefixes = new[] { "What's new", "Getting started", "Intro to" };
await context.Posts
    .Where(post => prefixes.Any(prefix => post.Title.StartsWith(prefix)))
    .ToListAsync();

Looking at the EF Core 8 logging for this query when using PostgreSQL, we can see that the prefixes array is passed as a single PostgreSQL array parameter:

info: 5/14/2024 14:34:20.970 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed DbCommand (1ms) [Parameters=[@__prefixes_0={ 'What's new', 'Getting started', 'Intro to' } (DbType = Object)], CommandType='Text', CommandTimeout='30']
      SELECT p."Id", p."Contents", p."Tags", p."Title", p."Visits"
      FROM "Posts" AS p
      WHERE EXISTS (
          SELECT 1
          FROM unnest(@__prefixes_0) AS p0(value)
          WHERE p0.value IS NOT NULL AND left(p."Title", length(p0.value)) = p0.value)

EF Core 8 then uses the PostgreSQL unnest function to apply the filter for all values passed in the array parameter.

Looking instead at the logging for Azure SQL, we see that the parameter is a JSON array inside a string parameter:

info: 5/14/2024 14:42:57.689 RelationalEventId.CommandExecuted[20101] (Microsoft.EntityFrameworkCore.Database.Command)
      Executed DbCommand (4ms) [Parameters=[@__prefixes_0='["What\u0027s new","Getting started","Intro to"]' (Size = 4000)], CommandType='Text', CommandTimeout='30']
      SELECT [p].[Id], [p].[Contents], [p].[Tags], [p].[Title], [p].[Visits]
      FROM [Posts] AS [p]
      WHERE EXISTS (
          SELECT 1
          FROM OPENJSON(@__prefixes_0) WITH ([value] nvarchar(max) '$') AS [p0]
          WHERE [p0].[value] IS NOT NULL AND LEFT([p].[Title], LEN([p0].[value])) = [p0].[value])

The SQL generated by EF Core 8 uses OPENJSON as before to create a temporary table, but this time on the parameter value.

Combining an array property with an array parameter allows translation of short but powerful LINQ queries. For example, this LINQ query returns all posts that have any of the given tag values:

var tags = new[] { ".NET", "ASP.NET Core" };
await context.Posts
    .Where(post => tags.Any(tag => post.Tags.Contains(tag)))
    .ToListAsync();

On PostgreSQL with native array types, this query is translated to:

SELECT p."Id", p."Contents", p."Tags", p."Title", p."Visits"
FROM "Posts" AS p
WHERE @__tags_0 && p."Tags"

On Azure SQL using JSON arrays, the SQL is:

SELECT [p].[Id], [p].[Contents], [p].[Tags], [p].[Title], [p].[Visits]
FROM [Posts] AS [p]
WHERE EXISTS (
    SELECT 1
    FROM OPENJSON(@__tags_0) WITH ([value] nvarchar(max) '$') AS [t]
    WHERE [t].[value] IN (
        SELECT [t0].[value]
        FROM OPENJSON([p].[Tags]) WITH ([value] nvarchar(max) '$') AS [t0]
    ))

In both cases, values are passed in an array parameter, and then these are used with values take from an array column.

Common questions

Here are some common questions people ask about array mapping.

  • Is performance going to be bad if I use JSON?
    • Maybe, but there are also cases where the performance is better than other techniques. Check out links in the “Learn more” section.
  • Can I use collection types other than arrays?
    • Yes, everything above works with any collection that implements IList<T>.
  • Can I map nested collections or dictionaries?
    • Not in EF Core 8, but this will be supported in a future release of EF Core.
  • Can the string column created for JSON be something other than nvarchar(max)
    • Yes, the column type can be fully configured. See the “What’s New” docs linked below.
  • Can arrays be used for navigations between entity types?
    • No, since the navigation implementation must be mutable.
  • What types can be used in arrays?
    • Any type that is natively supported by the database, as well as any type for which EF has a built-in value converter, or any type for which you define your own value converter.

Summary

EF Core 8 introduces support for mapping typed arrays of simple values to database columns. Native database array types are used if available, otherwise EF Core 8 uses a string column containing a JSON array. In either case, EF Core 8 understands the semantics of the mapping and so can execute queries that depend on the typed values in the array. Arrays of simple values can also be used in other places, such as to pass many values to EF Core in a single parameter.

Learn more

We have only touched the surface on the types of queries that can be translated using these patterns. Check out the following resources for more in-depth information:

To learn more about EF Core 8, see Entity Framework Core 8 (EF Core 8) is available today announcement post.

0 comments

Discussion are closed.