June 19th, 2007

Managing Reference Data in Visual Studio Team Edition for Database Professionals

New users of Team Edition for Database Professionals have asked how to manage reference data. For example, you might use reference data in a lookup table, such as zip codes, states, or any other data that changes rarely. You want that data to be the same for every unit test that you run, because the data will always be the same during production runs. You can use Team Edition for Database Professionals to generate test data, but the recurring question has been: what is the best way to insert the reference data into your development and test databases?

In this case, “best” is relative, but you have a few options:

·         You might use the Data Bound Generator that ships with Team Edition for Database Professionals (known in this post as the original Data Bound Generator).

·         You can generate reference data in a post-deployment script that is run when you deploy your database project.

·         You can use the soon-to-be-released Sequential Data Bound Generator.

Data Bound Generator

Let me address the Data Bound Generator first. Unfortunately, the original Data Bound Generator does not work quite the way that some folks think that it should. You provide the Data Bound Generator with a database connection and a query that returns some set of rows. That approach seems as though it would do what you want. However, the tricky bit with the Data Bound Generator is that it reads all of the returned rows into an internal dictionary and then randomly selects a row for each column value that you generate. This behavior means that you can get duplicate entries, and you get the results in a random order. To complicate matters further, if your query returns multiple columns and you want to populate a lookup table that has multiple columns, the results might surprise you.

For example, you might have a lookup table that has two columns, named ZipCode and TaxRate, with the following entries:

1.       98053 – 8.25%

2.       12306 – 7.5%

3.       02134 – 6.5%

4.       74447 – 9.5%

If you use the Data Bound Generator to generate four rows for a table with the same definition, you have no guarantee that the tax rates will appear with their associated zip codes. Each column is generated with a random lookup, so you could generate four rows and end up with a table that looks like the following:

1.       12306 – 6.5%

2.       98053 – 7.5%

3.       02134 – 7.5%

4.       98053 – 8.25%

Although the Data Bound Generator is useful for generating other types of test data, it is not a good choice for generating reference data.

Post-Deployment Scripts

You can use post-deployment scripts to,  for example, insert rows into a reference table. Rather than adding the Transact-SQL (T-SQL) statements directly to Script.PostDeployment.sql, you can instead add another script, perhaps named ReferenceData.sql, to your database project. You can then add a statement to Script.PostDeployment.sql to include the reference data script. This approach provides a cleaner separation, and that separation means that developers can work on different parts of your post-deployment script without needing to modify the same file. If you have multiple reference tables, you might create one script for each reference table.

To include the ReferenceData.sql script in your post-deployment script, open Script.PostDeployment.sql for editing, and add the following line to the bottom:

:r .ReferenceData.sql

Now you just need to create ReferenceData.sql and add it to your database project. You can take a shortcut by using Data Compare. To do so, build and deploy your database project to your isolated development environment. Your database contains no data at this point. Use Data Compare to compare the database that has the reference data to your empty development database. On the second page of the New Data Comparison wizard, select only the check box for the table that contains the reference data that you want. When the comparison finishes, export the results to the T-SQL editor, and copy the INSERT statements into your ReferenceData.sql script.

You can ensure that the reference data is always reset to match the data that you just copied from your source database. To always reset this data, you can add the following T-SQL before the INSERT statements:

IF NOT EXISTS (SELECT * FROM [SchemaName].[TableName])

      DELETE FROM [SchemaName].[TableName];

 

As an alternative, you can use the following approach if you just want to avoid errors in deployment if the table is already populated:

IF NOT EXISTS (SELECT * FROM [SchemaName].[TableName])

BEGIN

BEGIN TRANSACTION

Add your insert statements here

COMMIT TRANSACTION

END

The disadvantage of this approach is that you must manually maintain your ReferenceData.sql file.  Because reference data tends to not change frequently, this issue should be minor.

The advantage is that you do not need to create data generation plans, and every time that anyone deploys your database project, the reference tables are correctly populated.

Sequential Data Bound Generator

The final approach will be available with the upcoming release of power tools of Team Edition for Database Professionals. The new Sequential Data Bound Generator behaves how many customers expected the original Data Bound Generator to behave. The new generator returns the records in the order specified by the query and will keep the relationships within each row as a consequence. When the power tools are released, you can install the power tools and use the Sequential Data Bound generator in your data generation plans. (Unfortunately, I do not have a confirmed date for this release, but we expect it to be within a week or two of the release date of Service Release 1 for Team Edition for Database Professionals). As for the original Data Bound Generator, you must provide a connection and a query for the Sequential Data Bound Generator.

The advantage of this approach is that you can populate your development or test databases without having to maintain a post-deployment script. You do not have to update your post-deployment script every time that the reference data changes. By using this approach, you can also populate your reference tables differently for different test configurations, without having to maintain multiple build configurations for your database project.

The disadvantage of this approach is that you must create and maintain a data generation plan. Because you can easily associate a data generation plan with a database unit test, this drawback is not significant. You must also specify the correct number of rows to generate in the table that uses the Sequential Data Bound Generator.

-Steven Powell

Category
DevOps

Author

0 comments

Discussion are closed.