Introduction  

In this post, let us see how to search for a string / phrase in SQL Server database using hybrid solution of T-SQL LIKE operator & R grep function. Currently the options that exists in SQL Server to perform a search operation are 

  1. LIKE operator 
  2. Using string functions CHARINDEX, PATINDEX 
  3. Full text search 

Consider below example: To search and return only records with string "VAT" . Expected result is to return record 1,5 & 6.

DECLARE @Tmp TABLE (Id INT, Descrip VARCHAR(500))
INSERT @Tmp SELECT 1,'my VAT calculation is incorrect'
INSERT @Tmp SELECT 2,'Private number'
INSERT @Tmp SELECT 3,'Innnovation model'
INSERT @Tmp SELECT 4,'ELEVATE'
INSERT @Tmp SELECT 5,'total VAT'
INSERT @Tmp SELECT 6,'VAT'
SELECT * FROM @Tmp WHERE Descrip LIKE 'VAT'
SELECT * FROM @Tmp WHERE Descrip LIKE '%VAT'
SELECT * FROM @Tmp WHERE Descrip LIKE '%VAT%'
SELECT * FROM @Tmp WHERE Descrip LIKE '% VAT %'
SELECT * FROM @Tmp WHERE Descrip LIKE '% VAT'

 

As shown in above example, to do an exact search on string, there is no straight forward option using first two options mentioned above. However though it is possible with third option using  Full text CONTAINS predicate. Full text catalog, unique index & full text index has to be created on the table on which search operation needs to be performed.

If the exact search of string needs to be performed on the entire database then creating full text catalog, unique index & full text index on each and every table won’t be a viable option.

With the hybrid approach [T-SQL LIKE operator & R grep function], let us see various search types that can be performed [Pattern Search, Exact Search, Multi pattern search and other search scenario's  – based on collation, case sensitive/insensitive search and complex wildcard search].

 

Pre-requisites

We have used SQL Server 2019 evaluation edition on Windows 10 64 bit and WideWorldImporters SQL Server sample database for this example. In this example, we have made use of R services installed as part of SQL Server. Install R services and then from SSMS enable the external scripting feature. Restart the database engine and then verify the installation as mentioned in MSDN.

Below script / this approach will work starting from SQL Server 2016 and above (as execution of R language using T-SQL was introduced in SQL Server 2016). Also please note, no additional R packages need to be installed for this approach.

 

Explanation on how this approach works

 

A stored procedure named "usp_SearchString" has been created. This stored procedure has the capability to do normal T-SQL LIKE operations as well as can search string using R grep function and this can be controlled through input parameter.

Output of the search operation will be stored in a table named "Tbl_SearchString". Also output will be displayed at the end of stored procedure execution.

Below are the various input parameters of stored procedure and it’s usage details:

Parameter Description
@SearchString String to be searched
@SearchType

ES - Exact Search using R

PS - Pattern Search using R

 MPS - Multi Pattern Search (OR condition) using R

 NTLS - Normal T-SQL Like Search

@Match 0 = LIKE Search,
1 = NOT LIKE Search

@IgnoreCase

1 = case insensitive search,

0 = Case sensitive search

(If @IgnoreCase IS NULL then default : case insensitive search)

@SearchSQLMetadata Search sql definitions for presence of input string.
1 = Search,
0 = Don't Search

@SchemaName

List of objects to be searched that fall under schema (Multiple schema's can be passed, separated by Comma)
@ObjectlisttoSearch  List of objects to be searched (Multiple tables can be passed, separated by Comma)
@SearchCollate For @SearchType = NTLS if @IgnoreCase  = 0.
To search based on particular collation, default - COLLATE Latin1_General_CS_AS

 

If both @ObjectlisttoSearch & @SchemaName are blank then entire database is searched including SQL object definitions

 

Please note:

@SearchType  = ES, PS, MPS uses R script to search string

@SearchType  = NTLS does the normal T-SQL LIKE search

Multi string search should always be delimited by pipe. 

@ObjectlisttoSearch, @SchemaName should always be delimited by comma if multiple values specified.

 

Script

 

USE [WideWorldImporters]
GO
  
--Note : Before compiling this SP, search for sqlConnString and provide Databasename, username & password for R SQL connection
CREATE OR ALTER PROC usp_SearchString (  @SearchString NVARCHAR(MAX),
 @SearchType VARCHAR(4),
 @Match BIT,
 @IgnoreCase BIT,
 @SearchSQLMetadata CHAR(1),
 @SchemaName NVARCHAR(50),
 @ObjectlisttoSearch NVARCHAR(MAX),
 @SearchCollate NVARCHAR(500)
 )
  
/*************************************************************************
=================
INPUT PARAMETERS:
=================
  
@SearchString - String to be searched
  
@SearchType  - ES - Exact Search using R
 PS - Pattern Search using R
 MPS - Multi Pattern Search - OR condition using R
 NTLS - Normal T-SQL Like Search
  
@Match - 0 = LIKE Search, 1 = NOT LIKE Search
  
@IgnoreCase - 1 = case insensitive search, 0 = Case sensitive search (If @IgnoreCase IS NULL then default : case insensitive search)
  
@SearchSQLMetadata - Search sql definitions for presence of input string. 1 = Search, 0 = Don't Search
  
@SchemaName  - List of objects to be searched that fall under schema (Multiple schema's can be passed, separated by Comma)
  
@ObjectlisttoSearch  - List of objects to be searched (Multiple table's can be passed, separated by Comma)
  
--IF BOTH @ObjectlisttoSearch & @SchemaName ARE BLANK THEN ENTIRE DATABASE IS SEARCHED INCLUDING SQL DEFINITIONS
  
@SearchCollate - For @SearchType = NTLS if @IgnoreCase  = 0. To search based on particular collation, default - COLLATE Latin1_General_CS_AS
*****************************************************************************/
  
AS
BEGIN
SET NOCOUNT ON;
  
IF @SearchType IN ('ES','PS','MPS','NTLS')
  
 BEGIN
  
 DECLARE @ExecutedBy NVARCHAR(200) = CURRENT_USER
 DECLARE @Serv NVARCHAR(200) = CONCAT(CHAR(39),CHAR(39),@@SERVERNAME,CHAR(39),CHAR(39))
  
  
IF ISNULL(@SchemaName,'') <> ''  OR ISNULL(@ObjectlisttoSearch,'') <> ''
 BEGIN
  
/**** List of table columns to be searched  ****/
DECLARE @TableColList TABLE (Cols NVARCHAR(MAX),colname NVARCHAR(200),Tbl NVARCHAR(128),TblCol NVARCHAR(100),ColType NVARCHAR(150))
INSERT @TableColList
SELECT
 CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp')
 THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
 ELSE C.name END Columns  -- To cover poor data type conversions when passed to R dataframe
 ,C.name
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name,'.',C.name) TblCol
 ,TY.name
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
-- Ignore the datatypes that are not required
WHERE TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant')
AND (Schema_name(T.schema_id) IN (SELECT value FROM STRING_SPLIT(@SchemaName, ','))
OR CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) IN (SELECT value FROM STRING_SPLIT(@ObjectlisttoSearch, ',')))
  
 END ELSE
  
 BEGIN
  
 INSERT @TableColList
SELECT
 CASE WHEN TY.name IN ('date','datetime2','datetimeoffset','time','timestamp')
 THEN CONCAT('TRY_CONVERT(','VARCHAR(MAX),',C.name,') AS ',QUOTENAME(C.NAME))
 ELSE C.name END Columns  -- To cover poor data type conversions when passed to R dataframe
 ,C.name
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name) TableName
 ,CONCAT(SCHEMA_NAME(T.SCHEMA_ID),'.',T.name,'.',C.name) TblCol
 ,TY.name
FROM Sys.tables T
JOIN sys.columns C
ON T.object_id = C.object_id
JOIN sys.types TY
ON C.[user_type_id] = TY.[user_type_id]
-- Ignore the datatypes that are not required
WHERE TY.name NOT IN ('geography','varbinary','binary','text', 'ntext', 'image', 'hierarchyid', 'xml', 'sql_variant')
  
 END
  
DROP TABLE IF EXISTS #ExportTablesList
CREATE TABLE #ExportTablesList (Rn BIGINT IDENTITY(1,1),cols NVARCHAR(500),colname NVARCHAR(200),tbl NVARCHAR(200),ColType NVARCHAR(200))
  
IF @SearchSQLMetadata = 1 OR (@SearchSQLMetadata <> 0 AND (ISNULL(@SchemaName,'') = ''  AND ISNULL(@ObjectlisttoSearch,'') = ''))
 BEGIN
  
 INSERT #ExportTablesList (cols,tbl,ColType) SELECT 'CONCAT(''<'',object_schema_name(sm.object_id),''.'',object_name(sm.object_id),''|'',o.type_desc COLLATE Latin1_General_100_CI_AS,''>'',sm.definition) AS definition'
 ,'sys.sql_modules AS sm  JOIN sys.objects AS o ON sm.object_id = o.object_id'
 ,'sql_modules'
  
 END
  
--Deduplication of object list
;WITH dedup
AS
(
SELECT *,ROW_NUMBER()OVER(PARTITION BY Tbl,Cols ORDER BY Cols) Rn FROM @TableColList
)
INSERT INTO #ExportTablesList
SELECT cols,colname,tbl,ColType FROM dedup
WHERE Rn = 1
AND tbl <> 'dbo.Tbl_SearchString'
  
 /**** List of table columns to be searched  ****/
  
 IF (SELECT COUNT(1) FROM #ExportTablesList) <> 0
  
 BEGIN
  
 --Table to hold search output
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'Tbl_SearchString')
BEGIN
CREATE TABLE [dbo].[Tbl_SearchString] (
[RunId] FLOAT,
[SearchIndex] BIGINT,
[SearchValue] NVARCHAR(MAX),
[NoOfOccurance] FLOAT,
[ObjectName] NVARCHAR(200),
[ColumnNameORDefinition] NVARCHAR(200),
[SqlDatatype] NVARCHAR(200),
[InputParameter] NVARCHAR(800),
[ExecutedBy] NVARCHAR(200),
[ExecutedAt] DATETIME
)
  
END
  
DECLARE @RunId FLOAT
SELECT @RunId = COALESCE(MAX([RunId]),0)+1 FROM [dbo].[Tbl_SearchString]
  
--Processing to store input parameters
DECLARE @Input NVARCHAR(MAX) = CONCAT(
 '@SearchString > ' ,CASE WHEN @SearchString = '' OR @SearchString IS NULL THEN 'NULL' ELSE @SearchString END
,',@SearchType > '  ,CASE WHEN @SearchType = '' OR @SearchType IS NULL THEN 'NULL' ELSE @SearchType END
,',@Match > ' ,COALESCE(@Match,0)
,',@IgnoreCase > '  ,COALESCE(@IgnoreCase,1)
,',@SearchSQLMetadata > '  ,CASE WHEN @SearchSQLMetadata = '' OR @SearchSQLMetadata IS NULL THEN 'NULL' ELSE @SearchSQLMetadata END
,',@SchemaName > '  ,CASE WHEN @SchemaName = '' OR @SchemaName IS NULL THEN 'NULL' ELSE @SchemaName END
,',@ObjectlisttoSearch > '  ,CASE WHEN @ObjectlisttoSearch = '' OR @ObjectlisttoSearch IS NULL THEN 'NULL' ELSE @ObjectlisttoSearch END)
  
--By default case insensitive search
SELECT @IgnoreCase = COALESCE(@IgnoreCase,1)
  
--By default LIKE search
SELECT @Match = COALESCE(@Match,0)
  
IF @SearchType = 'NTLS'
  
BEGIN
  
DECLARE @SearchStrings TABLE (Id INT IDENTITY(1,1),String NVARCHAR(MAX))
  
INSERT @SearchStrings
SELECT value FROM STRING_SPLIT(@SearchString, '|')
  
 UPDATE #ExportTablesList SET Tbl = 'sys.sql_modules', colname = 'definition'
 WHERE ColType = 'sql_modules'
  
 SET @SearchCollate = CASE WHEN @SearchCollate = '' THEN NULL ELSE @SearchCollate END
  
 DECLARE @COLLATE NVARCHAR(100)
 SET @COLLATE = CASE WHEN @IgnoreCase = 0 THEN CASE WHEN @SearchCollate = '' OR @SearchCollate IS NULL THEN ' COLLATE Latin1_General_CS_AS '
 ELSE CONCAT(' COLLATE ',@SearchCollate,' ') END
 ELSE CHAR(32) END
  
 DECLARE @SearchOperator NVARCHAR(100)
 SET @SearchOperator = CASE WHEN @Match = 1 THEN ' NOT LIKE ' ELSE ' LIKE ' END
  
 DECLARE @WHEREClause NVARCHAR(MAX)
  
;WITH CTE
AS
(
SELECT  'SearchValue '+ @SearchOperator +''''+String+''''+@COLLATE WhereClause  FROM @SearchStrings
)
  
SELECT @WHEREClause = STUFF(
(SELECT  ' OR ' + WhereClause FROM
(SELECT WhereClause FROM CTE ) AS T FOR XML PATH('')),2,2,'')
  
END
  
SET @SearchString  = CASE WHEN @SearchType = 'ES' THEN REPLACE(@SearchString,'"','') ELSE @SearchString END
  
 /**** Loop through above Objects list and execute R script ****/
 DECLARE @I INT = 1
 ,@SQL NVARCHAR(MAX) = N''
 ,@RScript NVARCHAR(MAX) = N''
 ,@tblname NVARCHAR(128)
 ,@Colname NVARCHAR(200)
 ,@Sqltype NVARCHAR(100)
  
 WHILE @I <= (SELECT MAX(Rn) FROM #ExportTablesList)
 BEGIN
  
 SELECT @SQL = CONCAT('SELECT ',Cols,' FROM ',tbl)
 ,@tblname = Tbl
 ,@Colname = CASE WHEN @SearchType IN ('ES','PS') THEN cols ELSE colname END
 ,@Sqltype = ColType
 FROM #ExportTablesList WHERE Rn = @I
  
IF @SearchType IN ('ES','PS','MPS')
  
BEGIN
  
SET @RScript = '
 #Provide DB credential detail for storing output in a table
 sqlConnString <- "Driver=SQL Server;Server=serv; Database=WideWorldImporters;Uid=sa;Pwd=password"
  
#function to count no of occurences
 countCharOccurrences <- function(char,string,Type) {
 if (Type =="ES")
 {
 Boundchar <- paste0("\\b",char,"\\b",sep ="")
 string1 <- gsub(Boundchar,"",string,ignore.case=IgnoreCase)
 }
 string1 <- gsub(char,"",string,ignore.case=IgnoreCase)
 return ((nchar(string) - nchar(string1))/nchar(char))
 }
  
 #getting input dataset column name into a variable "c"
 c <- colnames(InputDataSet)
  
 if (SearchType == "ES")
 {
 ExactString <- paste0("\\b",SearchString,"\\b",sep ="")
 Output <-  as.data.frame(grep(ExactString,InputDataSet[[c]],ignore.case = IgnoreCase,invert = Match))
 colnames(Output)[1] <- "SearchIndex"
 Output$SearchValue <- grep(ExactString,InputDataSet[[c]],ignore.case = IgnoreCase,value = TRUE,invert = Match)
 Output$NoOfOccurance <- countCharOccurrences(SearchString,Output$SearchValue,SearchType)
 }
  
 if (SearchType == "PS" || SearchType == "MPS")
 {
 Output <-  as.data.frame(grep(SearchString,InputDataSet[[c]],ignore.case = IgnoreCase,invert = Match))
 colnames(Output)[1] <- "SearchIndex"
 Output$SearchValue <- grep(SearchString,InputDataSet[[c]],ignore.case = IgnoreCase,value = TRUE,invert = Match)
 if (SearchType == "PS") {
 Output$NoOfOccurance <- countCharOccurrences(SearchString,Output$SearchValue,SearchType) }
 }
  
 Output$ObjectName <- rep(tblname,nrow(Output))
 Output$ColumnNameORDefinition <- rep(c,nrow(Output))
 Output$SqlDatatype <- rep(Sqltype,nrow(Output))
 Output$ObjectName[Output$SqlDatatype == "sql_modules"] <- "sql_modules"
 Output$InputParameter <- rep(Input,nrow(Output))
 Output$ExecutedBy <- rep(ExecutedBy,nrow(Output))
 Output$ExecutedAt <- rep(format(Sys.time(),usetz = FALSE),nrow(Output))
 Output$RunId <- rep(RunId,nrow(Output))
  
 sqlDS <- RxSqlServerData(connectionString = sqlConnString,table = "Tbl_SearchString")
 rxDataStep(inData = Output, outFile = sqlDS,append = "rows")
 '
  
EXEC  sp_execute_external_script
 @language = N'R'
 ,@script = @RScript
 ,@input_data_1 = @SQL
 ,@params = N'@SearchString NVARCHAR(MAX),@SearchType VARCHAR(4),@Match BIT,@IgnoreCase BIT,@Input NVARCHAR(MAX)
 ,@tblname NVARCHAR(128),@Sqltype NVARCHAR(150),@ExecutedBy NVARCHAR(200),@RunId FLOAT
 ,@Serv NVARCHAR(200)'
 ,@SearchString = @SearchString
 ,@SearchType = @SearchType
 ,@Match = @Match
 ,@IgnoreCase = @IgnoreCase
 ,@Input = @Input
 ,@tblname = @tblname
 ,@Sqltype = @Sqltype
 ,@ExecutedBy = @ExecutedBy
 ,@RunId = @RunId
 ,@Serv = @Serv
  
END
  
IF @SearchType = 'NTLS'
  
BEGIN
  
INSERT [dbo].[Tbl_SearchString]([RunId],[SearchIndex],[SearchValue],[ObjectName]
 ,[ColumnNameORDefinition],[SqlDatatype],[InputParameter],[ExecutedBy],[ExecutedAt])
EXEC ('SELECT '+@RunId+',SearchIndex,SearchValue,'''+@tblname+''','''+@Colname+''','''+@Sqltype+''','''+@Input+''','''+@ExecutedBy+''', GETDATE()
FROM (SELECT ROW_NUMBER()OVER(ORDER BY (SELECT 1)) SearchIndex,'+@Colname+' AS SearchValue FROM '+@tblname+
' ) Tmp WHERE '+@WHEREClause)
  
END
  
SET @I = @I + 1
END
  
 /**** Loop through above table list and execute R script ****/
  
 --Display final search result
 SELECT * FROM [dbo].[Tbl_SearchString] WHERE RunId = @RunId AND ExecutedBy = CURRENT_USER
  
  
 END
  
 ELSE
  
 SELECT 'No valid objects passed in the InputParameter to search the string' AS InvalidParameter
  
 END
  
 ELSE
  
 SELECT 'SearchType parameter is mandatory ES - Exact Search, PS - Pattern Search,MPS - Multi Pattern Search - OR condition
 ,NTLS - Normal T-SQL Like Search' AS InvalidParameter
  
 END

Sample execution code block:

EXEC usp_SearchString         @SearchString = 'VAT'
            ,@SearchType = 'ES'
  ,@Match = 0  -- 0 = LIKE, 1 = NOT LIKE
  ,@IgnoreCase = 1  -- 1 = Case insensitive, 0 = Case Sensitive
  ,@SearchSQLMetadata= 0 -- 1 = Search, 0 = Don't Search
  ,@SchemaName = ''
  ,@ObjectlisttoSearch = 'dbo.Tmp'
  ,@SearchCollate = ''

Sample script execution screenshots

1. Search string in table from DB 

Example 1: If we want to search for a string "Ava" on Application.People table from WideWorldImporters database, we can try by setting parameters values as shown below :

 

2. Exact search in table from DB

In example 1, we did a pattern search. If we want to do a exact search for a string "Ava" on Application.People table from WideWorldImporters database, we can try by setting parameters values as shown below :

3. Exact case sensitive search

Example 3: In example 2, we did a exact search. If we want to do a exact case sensitive search for a string "Ava" on Application.People table from WideWorldImporters database, we can try by setting parameters values as shown below :

4. Exact case sensitive search (NTLS)

If we want to do a exact case sensitive search for a string "male" on Purchasing.PurchaseOrderLines table from WideWorldImporters database, we can try by setting parameters values as shown below :

5. Exact case sensitive search (Using R)

In example 4, it returned two rows (records that contains both "male" and "female") as the @SearchType was set NTLS (Normal T-SQL Like search). But we actually expect the result to be one row with record that contain only "male". If we again do a exact case sensitive search for a string "male" on Purchasing.PurchaseOrderLines table from WideWorldImporters database, by setting @SearchType =  'ES' shown below :

6. Multi string search delimited by pipe

If we want to do a multi string search delimited by pipe (search string "Ava","Amy") on Application.People table from WideWorldImporters database, we can try by setting parameters values as shown below :

Please note NoOfOccurance field will be populated only for @SearchType = "ES" & "PS" (for single string search without wildcard).

7. Multi string search using normal T-SQL LIKE search

In example 6, multi string search was done using R script. If we want to do the same multi string search using normal T-SQL LIKE search, we can try by setting parameters values as shown below :

Also in this example, let us see how we can search string with specific collation setting.

8. Fixed pattern search

If we want to do fixed pattern search say if we know the string to be searched is a two letter alphabet then we can try by setting parameters values as shown below :

9. Search phrase from multiple tables delimited by comma

Example 9: If we want to search for a phrase from multiple tables delimited by comma then we can try by setting parameters values as shown below :

10. Search for a date from multiple schema's delimited by comma

Example 10: If we want to search for a date from multiple schema's delimited by comma then we can try by setting parameters values as shown below :

11. Search string called "Password"  in entire database

Example 11: If we want to search for a string called "Password"  in entire database including SQL object definitions then we can try by setting parameters values as shown below :

12. Wild card search

Example 12: Below example shows how we can do wildcard search when the search is done using R script, refer parameters values as shown below :

To know more about R wild card search using "?","*","^","$", please see the link provided in reference  section.

Tbl_SearchString stores the details about search made on a string. If we want to see the entire details (all the other fields from a table) of the record that matches the searched string then we can try like shown below:

--To get deails of particular RunId
SELECT  DISTINCT RunId
 ,[ObjectName]
FROM [WideWorldImporters].[dbo].[Tbl_SearchString]
WHERE RunId = 12
 
SELECT  A.*
 ,B.*
FROM [WideWorldImporters].[dbo].[Tbl_SearchString] A
JOIN (SELECT row_number()over(order by (SELECT 'A')) Rn,* FROM Warehouse.StockItemHoldings ) B --Change table name
 ON A.SearchIndex = B.Rn
 AND A.ObjectName = 'Warehouse.StockItemHoldings' --Change table name
 AND RunId = 12 --provide run id

 

Summary

R & Python language extension was introduced in SQL Server 2016 & 2017 as part of machine learning. With support of R in Azure SQL database, this new approach can be used extensively as it easy, flexible and supported in both On-premise & Azure SQL database.

This post is just to give an overview of this new approach for searching strings that resides in any corner of the SQL Server database using T-SQL / R script. Based on specific requirement tweaking the solution mentioned above (with other powerful R string packages / glob2rx) can cover any scenario.


See Also

 


Other References