SQL SERVER - SCRAMBLE / MASK DATABASE PRODUCTION DATA

Sometimes you might need to test your product using some ‘production’ data to better simulate what users do with your software.

There are situations in which having only your test database, with your own (well formed) test data is not enough: just think of a continuously growing database with frequent schema changes...you might likely have inconsistencies between the data as you upgrade it or unpredicted behaviors.

If you want use a copy or a snapshot of a production database for test purposes, you should at least try to hide the sensitive data away (in a way that is really hard or impossible for your development team or anyone else that have access to the database to reconstruct them).

As you can imagine the task here can be extremely difficult to accomplish and there might be some tools out there that can help you doing this job.

If you do not want to invest in them a quick and dirty solution (it took me more or less 15 minutes to put it all together, and I’m not a SQL expert) can be writing some SQL scripts to modify what you consider to be your most sensitive data, leaving everything else intact. It doesn’t guarantee that all the sensitive data are hidden/changed because it relies on the code you write.

The typical scenario is to modify the names and telephone numbers of some personal data records, but you can apply this technique to anything you want.

Fist we need some functions to create test data, here you can use anything you want but I prefer something that generates data based on a prefix I can pass in; you can easily create functions that generate completely random data, but it will be hard to talk to your team and say something like: “ehi...check the person named ‘XWggTRDnnns’!”, a simple string concatenation does the job here.

--concatenates a generic string and an incremental number
CREATE FUNCTION fn_GenerateText(@prefix varchar(255), @incr int)
RETURNS varchar(255)
AS
BEGIN
return @prefix + CONVERT(varchar(100),@incr)
END
GO

Next we need to scan all our data tables and update the data we find there, here I decided to use a cursor (it’s not efficient but you can handle any type of key in this way, from guids to strings), so the idea is:

read each line of the table
generate some test data for each of the sensitive columns with text like: ‘name0’, ‘surname0’, etc... the number will be incremented for each line
update the data row with the given data for each of the columns you need to modify

-- a variable we use to generate incremental numbers
Declare @i int
set @i = 0

-- let's use a cursor, i don't know the data type of the key column
declare @id uniqueidentifier;
Declare id_cursor CURSOR
	FOR select AP_ID from dbo.PersonalData;
open id_cursor;
fetch next from id_cursor into @id;

WHILE @@FETCH_STATUS = 0
BEGIN

update  dbo.PersonalData
set 
AP_Note = dbo.fn_GenerateText('test', @i),
AP_FiscalCode = dbo.fn_GenerateText('fc', @i), -- or use a valid fiscal code generation routine here
AP_Surname = dbo.fn_GenerateText('surname', @i),
AP_Name = dbo.fn_GenerateText('name', @i),
AP_Phone = dbo.fn_GenerateText('t', @i),
AP_CellPhone = dbo.fn_GenerateText('c', @i)
Where AP_ID = @id 

-- increment the suffix and get the new data from the cursor
set @i = @i+1;
fetch next from id_cursor into @id;
END
CLOSE id_cursor;
DEALLOCATE id_cursor;
GO

In the end we just cleanup the database from every support function we created:

DROP FUNCTION dbo.fn_GenerateText
GO

As you can see this is a very quick solution and you will have to write some SQL code and manage the relations by yourself if needed, nonetheless once you have your scripts in place it’s a quite easy solution to use and maintain.

PrimordialCode

Sql Server - Scramble / Mask database production data

Related Content

Alessandro Giorgetti