This blog is part two of a series of three blogs (read part one here) where we’ll demystify commonly confused concepts for developers learning how to query data using the SQL (Core) API in Azure Cosmos DB. We’ll look at the difference between null and undefined values.
The value null must be explicitly set for a property. For example, here is an item that has the creationDate property set to null:
{ "id": "AndersenFamily", "lastName": "Andersen", "address": { "state": "WA", "county": "King", "city": "Seattle" }, "creationDate": null }
A property with a null value must have it explicitly assigned. Properties not defined in an item have an undefined value. In the example item above, the property isRegistered has a value of undefined because it is omitted from the item.
Azure Cosmos DB supports two helpful type checking system functions for null and undefined properties, both of which use the index:
- IS_NULL– checks if a property value is null
- You can also, of course, simply check if a value is null with an equality filter (example: SELECT * FROM c WHERE c.creationDate = null)
- IS_DEFINED– checks if a property value is defined
In order for a filter expression to match an item, it must evaluate to true. In addition to false, all other values (including null or undefined) will exclude the item.
Here’s an example query:
SELECT * FROM c WHERE c.isRegistered = true
This example query will not include any items where isRegistered is null or undefined.
Understanding best practices
One of the biggest advantages of Azure Cosmos DB (and NoSQL in general) is simple support for unstructured data. If you have a Cosmos container that has items of varying structures or “schemas”, some items won’t contain the same properties as others. It is generally simpler and more efficient to simply not define a property than it is to both define a property and assign it a null value. The advantages of this approach are savings on storage size, ingestion costs, and simplified queries. A large number of null values in an item can be a signal that the data model could be further optimized.
The advantages of not explicitly defining null value properties is most obvious if you examine the RU charge of an item with a high number of null value properties.
For example, consider the two below items. Item 1 has eight extra properties (fieldA through fieldH), all of which have a null value. Item 2 has undefined value for these properties and does not need to manually define them. Therefore, Item 2 is smaller than Item 1 and has fewer properties. With a default indexing policy, writing Item 2 costs 15% fewer RUs!
Item 1: Write RU Charge = 9.9 | Item 2: Write RU Charge = 8.4 |
{ "id": "AndersenFamily1", "lastName": "Andersen", "orderId": "999", "fieldA": null, "fieldB": null, "fieldC": null, "fieldD": null, "fieldE": null, "fieldF": null, "fieldG": null, "fieldH": null, "address": { "state": "WA", "county": "King", "city": "seattle" }, "creationDate": 1431620472, "isRegistered": true } |
{ "id": "AndersenFamily2", "lastName": "Andersen", "orderId": "999", "address": { "state": "WA", "county": "King", "city": "seattle" }, "creationDate": 1431620472, "isRegistered": true } |
There is sometimes a slight advantage for using null values when querying. An equality filter (WHERE c.fieldA = null) is typically slightly fewer RUs than using a system function (WHERE IS_DEFINED(c.fieldA)). This difference should not be nearly as significant as the additional write RU charge when using null, though. Therefore, we generally recommend simply not defining a property rather than both defining it and assigning it a null value.
Differences between null and undefined in mathematical expressions
Mathematical expressions return undefined when any of the inputs are non-number values.
Here are some example queries that each return undefined:
SELECT 1 + “string”
SELECT 1 + null
SELECT null + “string”
Result (for all above examples):
[ {} ]
You can use the IS_NUMBER() system function to check whether a property value is a number and, if not, the ternary (?) operator to assign another value. In this case, if c.val is not a number, its value will be 0.
Here’s an example:
SELECT 1 + (IS_NUMBER(c.val) ? c.val : 0) FROM c
When executing SUM() or AVG() aggregate system functions, one single non-number value (including null) will lead to the entire function returning undefined, even if all other values were numbers.
Therefore, a query like SELECT SUM(c.val) FROM c will return undefined if there are any null values for c.val.
You could safeguard against possible non-number values by adding an IS_NUMBER filter for c.val.
Here’s an example:
SELECT SUM(c.val) FROM c WHERE IS_NUMBER(c.val)
In aggregate system functions, undefined values will simply be omitted from the calculation and will not impact the result. For example, a query like SELECT SUM(c.total) FROM c has an implicit IS_DEFINED(c.total) filter.
Learn more about Azure Cosmos DB query operators.
Great post! Now I stumbled across the following open issue: https://github.com/Azure/azure-sdk-for-java/issues/5050
Apparently it is not possible to use that with the current Java SDK as the underlying ObjectMapper is configured to serialize null fields and there is no way to change that.
Thanks Tim, great post.
One thing this doesn't mention, which has caught me out in the past, is ordering.
If my memory serves me correctly, an 'order by' will exclude records where the field is undefined, but include them where it is null.
So, <code>, would return Item 1 above, but not Item 2.
For this reason, we always save the fields with a null value.
Is there a way around that?
Hi Simon,
This is a great question. This documentation covers it here: https://docs.microsoft.com/azure/cosmos-db/sql-query-order-by#documents-with-missing-fields. If you add an explicitly included path (as opposed to /*) , you will still get undefined values in the ORDER BY results.