If you're a developer, you always find sanctuary in code, but there are times when you're presented with data in an unusual format and things can get... strange.
For example, I'm working with a client where they provide me with an Excel spreadsheet of rows and want to import the data into SQL Server.
Until I get these maintenance screens in place, I continue to import the data. I already wrote one of the data screens based on a previous post I wrote called Import a Data Hierarchy From Excel Into SQL Server.
Now I know what you're thinking. Why not use DTS (Data Transformation Services) of SQL Server?
When importing the data, you have an option to write a SQL Statement or define the columns and their sizes. While this provides a robust front-end for importing data, there are more inconsistencies in this particular approach.
I started down this path and found it way too time-consuming, so I decided to look for another way.
Back to the spreadsheet.
In this Excel spreadsheet, the data wasn't consistent at all.
- Names are in one cell ("John Doe") instead of in separate cells (First Name, John, Last Name Doe).
- Phone Numbers are in different formats ('(999) 999-9999', '999-999-9999', '999.999.9999', '(999)999-9999 Ext.99', etc.)
Since we want our initial load of data to be consistent in the database, we need to "massage" the data into a usable format.
In today's post, since I've been wrestling with inconsistent data this weekend, I'll show a quick and simple splitter to help with manipulating strings which will save you hours of work.
Creating the Function
Since this is just for an initial load of data, this code is not meant for a production environment.
However, I've been using this function for a long time. It's one of the SQL Server functions I use in my arsenal and I find it quite useful.
The function, when created, is stored in the Table-valued Functions section (<database> > Programmability > Functions > Table-valued Functions) and is always available.
CREATE FUNCTION [dbo].[Split] (
@delimited NVARCHAR(MAX),
@delimiter NVARCHAR(100))
RETURNS @t TABLE (
id INT IDENTITY (1, 1)
,val NVARCHAR(MAX)
)
AS
BEGIN
DECLARE @xml XML
SET @xml =
N'<t>' +
REPLACE(@delimited, @delimiter, '</t><t>')
+ '</t>'
INSERT INTO @t (val)
SELECT
r.value('.', 'varchar(MAX)') AS item
FROM @xml.nodes('/t') AS records (r)
RETURN
END
Very quick and easy. The dbo.Split
function works like a table.
SELECT * FROM dbo.Split('John Doe',' ')
Execute this statement and you receive the following output.
id | val |
---|---|
1 | John |
2 | Doe |
As you can see, this makes splitting names very easy. If you want to save the values into the fields FirstName
and LastName
, it's as easy as:
DECLARE @FirstName VARCHAR(100);
DECLARE @LastName VARCHAR(100);
SELECT @FirstName=val FROM dbo.Split(@Person,' ') WHERE id=1
SELECT @LastName=val FROM dbo.Split(@Person,' ') WHERE id=2
Now, I've run into a couple issues where the name in the Excel spreadsheet is "John A. Doe". If you have a middle initial, it will still split it out, but you'll have three records instead of two.
Perform a count on the resulting split to determine the best approach for your data.
For my needs, I only added the middle name variable to the first name variable and save the FirstName
to the table.
What About the Phone Numbers?
The phone numbers required additional massaging to get the format we need.
The client's database requires all numbers in the field — no dashes, periods, or parentheses.
We need to remove any extraneous characters and perform the same split.
-- (In Excel spreadsheet, possible phone values are '(xxx) xxx-xxxx',
-- 'xxx-xxx-xxxx', 'xxx.xxx.xxxx', '(xxx)xxx-xxxx Ext.99'...or anything else).
DECLARE @PhoneSplitter VARCHAR(30);
--Test
SELECT @PhoneSplitter='999.999.9999';
-- Remove the area code parentheses '()'
SELECT @PhoneSplitter = REPLACE(@PhoneSplitter,')',' ');
SELECT @PhoneSplitter = REPLACE(@PhoneSplitter,'(',' ');
-- Remove the periods
SELECT @PhoneSplitter = REPLACE(@PhoneSplitter,'.',' ');
-- Remove the dashes
SELECT @PhoneSplitter = REPLACE(@PhoneSplitter,'-',' ');
-- Remove extra spaces
SELECT @PhoneSplitter = REPLACE(@PhoneSplitter,' ',' ');
SELECT * FROM dbo.Split(@PhoneSplitter,' ');
Your output will consist of the following:
id | val |
---|---|
1 | 999 |
2 | 999 |
3 | 9999 |
Concatenate, cast as an integer, and save them to your table.
Again, you may have a phone number which includes an extension ("ext. 2044" or "x64") meaning you'll have four records instead of three.
Adjust as necessary for your data needs.
Conclusion
I find it ironic working with databases where I have problems manipulating data to my needs.
Maybe it's because I'm not a DBA or maybe I've been spoiled by C#'s string manipulation methods.
This particular function is one of my tried-and-true functions which I use every time I'm in SQL Server when I have to massage or manipulate data.
In today's post, I demonstrated how to use XML to perform string manipulation and how it's meant to be a starting point for full-stack developers to discover additional ways in writing their own database "functions" to speed up the unusual ways of data manipulation.
Was there a better way to perform this? Do you have an existing library of database functions? Post your comments below and let's discuss.