Azure Search is a search-as-a-service offering from Azure that can easily wrap static or slowly changing datasets, such as SQL databases, with a scalable API supporting OData lookups and robust search functionality. Below I introduce a new open source tool that helps integrate Azure Search into your DevOps process.
Code-Free API
For some extremely common scenarios, Azure Search, with the help of a new utility presented below, can provide a scalable and code-free API layer for your application that is simple to build and manage. In many cases, you can build a robust web application that relies on querying large datasets without worrying about building server-side API components at all.
We recently used this search-as-an-API technique to speed the development of the CourseIQ prototype, a data-intensive web application that supports golf course architects. In CourseIQ, all of our data is sourced from an Azure SQL server instance and exposed to our web application via Azure Search. We make all Azure Search calls client-side from our ReactJS application. This dramatically reduced the amount of code required to expose data, and gave us the flexibility to iterate on schemas quickly.
Below, we’ll discuss where this technique is useful, and the tooling built to streamline the process.
The Scenario
This solution is most applicable to situations where the data is in a supported source location, and a RESTful API is desired with minimal authentication requirements (either public, or secured with a simple key).
The following data locations support fully automated indexing by Azure Search, including scheduled incremental updates to the index to support changing data:
- Azure SQL Database
- SQL Server on an Azure VM
- Azure DocumentDB
- Azure Blob storage
- Azure Table storage
From any of these data sources you can scale up to 1.4 billion documents.
Using Azure Search as your API layer is not practical if your application writes to a data source in addition to reading from it. It is also not practical if you need fine-grained authentication or client-side support for authentication protocols like OAuth. In these situations, consider Azure API Management or build a custom API yourself.
What You Get
After configuring Azure Search over one of your datasets, you have access to a variety of API options for accessing the data. You get RESTful endpoints that can be queried in production, including via XMLHttpRequest from websites, with full CORS support.
Each endpoint supports:
- OData Lookups
- OData filter queries
- Lucene full-text search syntax
- Geographic search capabilities
This means you can do things like this:
$filter=(category eq 'Luxury' or parkingIncluded eq true) and rating eq 5
as well as
$filter=geo.distance(location, geography'POINT(-122.131577 47.678581)') le 10
with basic GET requests of the format
GET /indexes('[index]')/docs?[query]
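As a rough illustration of the request format above, the following Python sketch assembles such a URL from one of the filter expressions shown earlier. The index name `hotels` and the `api-version` value are assumptions made for the example and are not taken from the CourseIQ project; substitute whatever your own service uses.

```python
from urllib.parse import quote, urlencode

# Assumption: this api-version value is illustrative; use the version
# supported by your own search service.
API_VERSION = "2016-09-01"


def build_search_url(service_url, index, odata_filter, api_version=API_VERSION):
    """Build a GET URL of the form /indexes('[index]')/docs?[query]."""
    query = urlencode(
        {"api-version": api_version, "$filter": odata_filter},
        safe="$'()",      # keep the OData punctuation readable
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return "{}/indexes('{}')/docs?{}".format(service_url.rstrip("/"), index, query)


if __name__ == "__main__":
    print(build_search_url(
        "https://azsearchmanagement.search.windows.net",
        "hotels",  # hypothetical index name
        "(category eq 'Luxury' or parkingIncluded eq true) and rating eq 5",
    ))
```

The request also needs a key, which Azure Search reads from the `api-key` request header. For a client-side application this should be a query key rather than an admin key.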
Scaling
Your Azure Search instance can scale to support up to 1.4 billion records. You can also shard and replicate your data across multiple servers to accommodate your traffic. Scaling is easily accomplished in the Azure Portal.
How To Do It
You can approach implementation in two parts.
First, the initial configuration can be done using a convenient user interface in the Azure Portal. The second part, performing ongoing maintenance and integrating into your DevOps process, is more critical and requires tools beyond the Azure Portal.
Initial Configuration
You can build out your indexes and data sources right in the Azure Portal. The focus of this document is using Azure Search as your API layer, so I won’t dive very deep into the specifics of search configuration, but here are the basics:
For most situations, the easiest way to get started is with the Add Data Source wizard.
Or you can take a more manual approach:
- Create an Azure Search resource
- Configure the resource by creating indexes and adding data
Regardless of how you create your initial search configuration, you will need to note one of the admin keys and the URL of the search resource for use later.
Source Control and DevOps
Standard portal-based configuration of Azure Search is great for initial setup, but additional tools are needed to support projects in production. To use Azure Search in a production application, we must be able to reliably do the following:
- Manage configurations in source control
- Deploy configurations programmatically
Azure Search provides a full management API surface that you could use to build this functionality, but requiring you to do all of this integration would negate the value of the code-free API solution.
We have built easy-to-use Python scripts that automate both of these tasks and can be run manually or as part of any automated deployment process.
Azure Search Management Scripts
Let’s look at how we can address these two issues with the Azure Search Management scripts our team has released as part of our work on the CourseIQ project.
Azure Search Management has two primary tools, save_azsearch.py and provision_azsearch.py. With the options available in these tools you can serialize, provision, purge, and update your entire search configuration from the command line.
Manage configurations in source control
All configuration you make in the portal can be persisted to a JSON file and committed to your source control system. These files are formatted clearly so future configuration edits can be made in source if preferred.
For exports, we built the save_azsearch.py script.
An export of a configuration to a file named azsearchmgmnt.json might look like the following example:
python save_azsearch.py -k F6D1EEEEAC2A4D00DB1A5DB8C2DF09BC -u https://azsearchmanagement.search.windows.net -o azsearchmgmnt.json
The output of a successful run of this script will be the JSON configuration of the search resource. It will contain the indexes, data sources and indexers currently configured. But it will not contain the connection string or secrets associated with the data sources.
To provide flexibility in how configuration is stored and versioned, we use a separate secrets file for data sources. You will need to create one before you can provision from your exported configuration.
Data Source Configuration File
The data source configuration file is expected to be a JSON file containing an array of objects with the following properties:
Name | Description |
---|---|
type | The type of datasource being described (e.g. ‘azuresql’ or ‘documentdb’) |
sources | Array of strings, each of which is the name of a data source that uses this connection information (e.g. ‘datasource1’) |
connectionString | The connection string to be used by the search service when pulling data from the source. This string specifies the credentials needed to access the data source, such as a username and password for SQL or a key for DocumentDB. |
Here is an example with a SQL data source and a DocumentDB data source. Each source name must match the corresponding data source name in your exported configuration:
[
  {
    "type": "azuresql",
    "sources": ["datasource1"],
    "connectionString": "Server=tcp:{AZURE_SQL_DB_SERVER_NAME}.database.windows.net,1433;Initial Catalog={AZURE_SQL_DB_NAME};Persist Security Info=False;User ID={SQL_USERNAME};Password={SQL_USER_PASSWORD};MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;"
  },
  {
    "type": "documentdb",
    "sources": ["datasource2"],
    "connectionString": "AccountEndpoint=https://{AZURE_DOCDB_NAME}.documents.azure.com:443/;AccountKey={AZURE_DOCDB_ACCESS_KEY};Database={AZURE_DOCDB_DB_NAME};"
  }
]
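Before provisioning, it can help to sanity-check a secrets file of this shape. The sketch below is not part of the Azure Search Management scripts. It is a hypothetical helper (`index_connection_strings` is a name invented here) that checks for the three properties described in the table and maps each data source name to its connection string.

```python
import json

# The three properties described in the table above.
REQUIRED_KEYS = ("type", "sources", "connectionString")


def index_connection_strings(entries):
    """Map each data source name to its connection string.

    Raises ValueError for a missing property or a name listed twice.
    """
    by_name = {}
    for position, entry in enumerate(entries):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError("entry {} is missing {}".format(position, missing))
        for name in entry["sources"]:
            if name in by_name:
                raise ValueError("data source {!r} is listed twice".format(name))
            by_name[name] = entry["connectionString"]
    return by_name


if __name__ == "__main__":
    # Placeholder values only; real files hold full connection strings.
    sample = json.loads("""
    [
      {"type": "azuresql", "sources": ["datasource1"],
       "connectionString": "Server=example;"},
      {"type": "documentdb", "sources": ["datasource2"],
       "connectionString": "AccountEndpoint=example;"}
    ]
    """)
    print(sorted(index_connection_strings(sample)))
```

Comparing the resulting names against the data sources in the exported configuration is a quick way to catch a mismatch before running a deployment.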
Deploy configurations programmatically
Any exported configuration can be redeployed with provision_azsearch.py. You describe the behavior you would like when encountering existing elements in the service: they can be updated, deleted and redeployed, or skipped entirely.
This provisioning can be started manually or as part of an automated deployment.
This table describes the three mode options for handling existing resources during provisioning:
Mode | Description |
---|---|
skip | The script will leave the existing item untouched and apply no configuration to it. |
update | The script will attempt to update the configuration item. |
delete | The script will first delete the item, then create a new one matching the specified configuration. |
Here is an example of using the script to provision an instance:
python provision_azsearch.py -k F6D1EEEEAC2A4D00DB1A5DB8C2DF09BC -u https://azsearchmanagement.search.windows.net -c azsearchmgmnt.json -d azsearchmgmnt_datasources.json -b skip
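To make the difference between the three modes concrete, here is a toy in-memory model. It is not how provision_azsearch.py is implemented. The `service` dictionary and the `documents` counter are assumptions introduced only to show that deleting and recreating an item discards whatever it held, while updating keeps it.

```python
MODES = ("skip", "update", "delete")


def apply_item(service, name, definition, mode):
    """Apply one configuration item to `service`, a dict standing in for
    the search instance. Returns a short description of the action taken."""
    if mode not in MODES:
        raise ValueError("unknown mode: {!r}".format(mode))
    current = service.get(name)
    if current is None:
        # Nothing exists yet, so every mode simply creates the item.
        service[name] = {"definition": definition, "documents": 0}
        return "created"
    if mode == "skip":
        return "skipped"
    if mode == "update":
        # Replace the definition in place; existing contents survive.
        current["definition"] = definition
        return "updated"
    # delete: drop the item and create a fresh, empty one.
    service[name] = {"definition": definition, "documents": 0}
    return "recreated"


if __name__ == "__main__":
    service = {"courses": {"definition": {"version": 1}, "documents": 500}}
    print(apply_item(service, "courses", {"version": 2}, "update"), service)
    print(apply_item(service, "courses", {"version": 3}, "delete"), service)
```

In practice, this is why skip is the cautious choice for a shared instance, while delete suits environments that can be rebuilt from the source data.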
Purge Option
The provisioning script can also be run with the -p flag. This runs the script in “purge” mode, where it first deletes all the indexes, data sources and indexers in the search instance. This option is useful for removing items that were created manually and that you do not want to keep.
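For automated deployments, one option is to assemble the command in a small wrapper so the admin key comes from the environment instead of being written into a build script. The sketch below uses only the flags shown in this article. The environment variable name `AZSEARCH_ADMIN_KEY` and the helper name are hypothetical, and it assumes -p takes no argument.

```python
import os
import sys


def build_provision_command(url, config_path, datasources_path, mode="skip",
                            purge=False, key_env="AZSEARCH_ADMIN_KEY"):
    """Return the argument list for a provision_azsearch.py run.

    The admin key is read from the environment variable named by
    `key_env` (a hypothetical name chosen for this sketch).
    """
    key = os.environ.get(key_env)
    if not key:
        raise RuntimeError("set {} before provisioning".format(key_env))
    command = [
        sys.executable, "provision_azsearch.py",
        "-k", key,
        "-u", url,
        "-c", config_path,
        "-d", datasources_path,
        "-b", mode,
    ]
    if purge:
        # Assumption: -p is a standalone switch with no value.
        command.append("-p")
    return command
```

A build step could then pass the result to `subprocess.run(command, check=True)`, so that a failed provisioning run fails the build.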
Future Work
The Azure Search Management tooling is ideal for integration into a variety of build and CI systems. It is a goal to provide plugins or examples where appropriate to help users do these integrations. Users are encouraged to provide feedback and pull requests at the GitHub repository.