locked
Performance of Set-Type Data types created using CLR RRS feed

  • Question

  • In my applications, I often encounter situations where the domain of a type is some set (e.g. Student, Staff etc. for Membership type etc.). A common solution is to assign integral values to each possible value of the set & convert them to corresponding strings programatically when fetching data, or use a Foreign Key into a relation defining each of the unique value.

    MySql has a Set Data type. So, this time around, I decided to do the same with Sql Server, and resorted to CLR to create a custom data type, whose domain could only be a collection of predefined strings or other data types.

    But, just before deploying the assembly, the performance bug struck me. So, assuming a simple custom data type, like:

    Code Snippet

    <Serializable()> _
    <Microsoft.SqlServer.Server.SqlUserDefinedType(Format.Native)> _
    Public Structure MemberType
        Implements INullable

        Private Const Student As String = "Student"
        Private Const Staff As String = "Staff"

        Private memType As Int16
        Private m_Null As Boolean

        Public Overrides Function ToString() As String
            If (Me.m_Null) Then
                Throw New SqlTypes.SqlNullValueException("Null Reference Exception")
            End If

            Select Case memType
                Case 1
                    Return Student
                Case 2
                    Return Staff
                Case Else
                    Throw New Exception("Invalid State for MemberType")
            End Select
        End Function

        Public ReadOnly Property IsNull() As Boolean Implements INullable.IsNull
            Get
                ' Put your code here
                Return (Me.m_Null)
            End Get
        End Property

        Public Shared ReadOnly Property Null() As MemberType
            Get
                Dim h As MemberType = New MemberType
                h.m_Null = True
                Return h
            End Get
        End Property

        Public Shared Function Parse(ByVal s As SqlString) As MemberType
            If s.IsNull Then
                Return Null

            ElseIf (StrComp(s.ToString(), Student, CompareMethod.Text) = 0) _
                    OrElse StrComp(s.ToString(), "1") = 0 Then
                Dim u As New MemberType
                u.memType = 1
                u.m_Null = False

                Return (u)

            ElseIf (StrComp(s.ToString(), Staff, CompareMethod.Text) = 0) _
                    OrElse StrComp(s.ToString(), "2") = 0 Then
                Dim u As New MemberType
                u.memType = 2
                u.m_Null = False

                Return (u)

            Else
                Throw New SqlTypes.SqlTypeException("Invalid Value for MemberType")
            End If
        End Function
    End Structure


    Would it be a performance issue, if I use such Data types, whose data is fetched or updated frequently???
    Thursday, May 8, 2008 9:57 AM

Answers

  • If the calls to the CLR UDT are very frequent, it may be cost effective.

     

    There is a substaintial 'cost' to build up the CLR UDT; if is is not used very frequently, it may be aged out of cache, and the next call requires the entire cost again.

     

    If the calls are  less frequent, a lookup table, perhaps using a T-SQL UDT, may be more efficient.

     

    There are usually multiple ways to accomplish a goal. Try the obvious and select the one that works the best for the situation.

     

    Saturday, May 10, 2008 5:57 AM

All replies

  • I don't know that performance would be an issue.  In my opinion this is inappropriate use of CLR.  You should use a lookup table with a foreign key column to enforce data rules like this.

    Thursday, May 8, 2008 2:20 PM
  • But if there are a large number of rows in such a table, which are frequently fetched or updated, wouldn't the resulting joins or foreign key management be very time-consuming, especially if multiple tables need to be joined???

    From another perspective, these are not actually data rules or sematic constraints. There is simply a data type with a predefined set type domain (that is almost guaranteed not to change, or atleast change only once a while), and that is not available as a predefined type...

    If you look at it as a data rule, then a boolean value should also be implemented as a Foreign Key Constraint, with predefined set values, true and false.
    Saturday, May 10, 2008 3:06 AM
  • Actually, a boolean in SQL should be implemented in a single table solution as a bit field.  Since you are doing a UDT, then you admit you have known data values that you want to equate to the integers.  The resulting storage and indexing of physical tables with a lookup table would be cheaper and far more efficient than a CLR UDT for this.

    Saturday, May 10, 2008 3:10 AM
  • If the calls to the CLR UDT are very frequent, it may be cost effective.

     

    There is a substaintial 'cost' to build up the CLR UDT; if is is not used very frequently, it may be aged out of cache, and the next call requires the entire cost again.

     

    If the calls are  less frequent, a lookup table, perhaps using a T-SQL UDT, may be more efficient.

     

    There are usually multiple ways to accomplish a goal. Try the obvious and select the one that works the best for the situation.

     

    Saturday, May 10, 2008 5:57 AM
  • I think that Arnie has done a much better job of answering this than I did.  The real check on whether this is feasible is to test both options out, something that I have had to do more than once while figuring out what is better to create in CLR over doing a standard TSQL implementation.  There is a post on here from Kent Waldrop that shows that TSQL can outperform the CLR equivalent depending on the operation being performed.  It is available here:

     

    http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=3036691&SiteID=1

     

    One question you need to look at is whether or not you might need to expand the options of the CLR Type in the future.  I have never tried to change the definition of a CLR Type once it has been implemented, but if you have to do this, it would be really good to know what kind of problems you might encounter.

    Saturday, May 10, 2008 6:24 AM
  •  Jonathan Kehayias wrote:

    One question you need to look at is whether or not you might need to expand the options of the CLR Type in the future.  I have never tried to change the definition of a CLR Type once it has been implemented, but if you have to do this, it would be really good to know what kind of problems you might encounter.

     

    I think that this to me is the most serious argument against using SQL CLR UDTs. If you need to update the definition of a UDT, you need to drop all dependent objects - including dropping all tables that have columns defined of that type - before you can change the definition of the type. For me, this is a compelling argument to use the tried-and-true solution of a "lookup" table and a foreign key. I cannot speak to the relative performance of SQL Server to MySql in this regard, but I have always experienced excellent performance using this more traditional approach.

    Saturday, May 10, 2008 3:44 PM
  •  Jonathan Kehayias wrote:

    One question you need to look at is whether or not you might need to expand the options of the CLR Type in the future.  I have never tried to change the definition of a CLR Type once it has been implemented, but if you have to do this, it would be really good to know what kind of problems you might encounter.



    As this was the first time I was using a UDT, I have already encountered problems regarding this. I did not get the implementation right the first time, so the obvious solution was to modify the code and redeploy.
    To my surprise, VS2005 could not redeploy the assembly, as the datatype was already being used. And I could see this as a major problem with UDTs straight away.

    Perhaps, this is the only reason I have decided to used a standard T-SQL implemenation over the convenience of a .Net UDT.
    And MS should really provide some migration path for this. My case was a trivial problem. But for other more complex situations, where DBAs have used UDTs into thier databases, and need to add a method or two to it, or some other non-breaking minor changes, there should definitely be method to do this.

    Probably, the conversion operators available in .NET framework to convert one type to another can be used. And it can be done this way, that whenever you are trying to replace the definititon of a UDT already being used in the database, you need to provide conversion operators to port the state of UDT values from the old definition to the new definition.
    Sunday, May 11, 2008 4:16 AM
  • I've been playing with this a little off and on, and at lease when it comes to the example provided here, the UDT can be created like the following:

     

    Code Snippet

    using System;

    using System.Data;

    using System.Data.SqlClient;

    using System.Data.SqlTypes;

    using Microsoft.SqlServer.Server;

     

    using External.Assembly.Name;

     

    [Serializable]

    [Microsoft.SqlServer.Server.SqlUserDefinedType(Format.Native)]

    public struct EmployeeType : INullable

    {

    private bool m_Null;

    private int _EmployeeTypeCode;

    public int EmployeeTypeCode

    {

    get { return _EmployeeTypeCode; }

    set { _EmployeeTypeCode = value; }

    }

    public string Value

    {

    get

    {

    string retval = string.Empty;

    try

    {

    retval = Enum.GetName(typeof(Enums.EmployeeType), _EmployeeTypeCode).ToString().Replace("_", " ");

    }

    catch

    {

    retval = "Undefined Value";

    }

    return retval;

    }

    }

     

    public override string ToString()

    {

    return Enum.GetName(typeof(Enums.EmployeeType), _EmployeeTypeCode).ToString();

    }

    public bool IsNull

    {

    get

    {

    return m_Null;

    }

    }

    public static EmployeeType Null

    {

    get

    {

    EmployeeType h = new EmployeeType();

    h.m_Null = true;

    return h;

    }

    }

     

    public static EmployeeType Parse(SqlString s)

    {

    if (s.IsNull)

    return Null;

    EmployeeType u = new EmployeeType();

    try

    {

    u.EmployeeTypeCode = Int32.Parse(s.ToString());

    }

    catch

    {

    throw new ArgumentException("EmployeeType must be convertible to an int32!");

    }

    return u;

    }

    }

     

     

    Then you can create an Enum in an external but referenced Assembly to hold the values:

     

    Code Snippet

    public partial class Enums

    {

    public enum EmployeeType : int

    {

    Manager=0,

    Wait_Staff=1,

    Line_Cook=2,

    Dishwasher=3,

    Prep_Staff=4

    }

    }

     

     

    To add a new Type Value, you update the enum for the types, and redeploy the reference assembly, but no change has been made to the base type assembly so there is no problem caused by having to drop/create the UDT itself.  It is an interesting proposition that you have made here for how to use a UDT. 
    Sunday, May 11, 2008 7:01 PM
  • That was quite innovative!!! I think I might use that approach sometime sooner than later...

    But what do you think of my suggestion for porting a UDT when redeploying an assembly???
    Being an MVP, you definitely know much more about Sql Server internals, and its internal design and architecture, than I do. So, do you think my solution is practical logically or implementation-wise???
    Monday, May 12, 2008 4:13 AM
  • I am not a MVP, Matthew and Arnie are though, so I am not certain if you were addressing me or not, but they may be able to provide further insight into this. 

     

    I did however, test what it would take to migrate from one UDT to another, following the UDT I provided above as an example.  Since you can't redeploy an assembly with a UDT that is currently in use, what I have done is to create a separate Assembly for each UDT, and I have nothing but that UDT in its assembly.  (I am still testing the memory costs of this so I can't say that this is very wise.)  I created the above UDT without the Value Method so that it only stores and provides the TypeCode integer.  Then I added this column to a table and added a few rows:

     

    Code Snippet

    CREATE TABLE Employees

    (EmpID int identity primary key,

    EmpName varchar(30),

    EmpType EmployeeType)

    INSERT INTO Employees SELECT 'Jon Kehayias', '1'

    INSERT INTO Employees SELECT 'Jonathan Kehayias', '3'

    SELECT EmpID, EmpName, EmpType.EmployeeTypeCode [TypeCode]

    FROM Employees

    /*

    EmpID       EmpName                        TypeCode

    ----------- ------------------------------ -----------

    1           Jon Kehayias                   1

    2           Jonathan Kehayias              3

    */

     

     

    Then I created a new Assembly and this one had the UDT name EmployeeType2 and included the Value output.  Once I had this deployed into SQL I looked at how I could change over to this UDT.  Since you can't change a UDT column to a different UDT, and you can't CAST one UDT to another UDT, you get stuck having to manually convert the data to the new datatype.  I did it like the following:

     

    Code Snippet

    CREATE TABLE dbo.Tmp_Employees

    (

    EmpID int NOT NULL IDENTITY (1, 1),

    EmpName varchar(30) NULL,

    EmpType EmployeeType2 NULL

    ) ON [PRIMARY]

    GO

    SET IDENTITY_INSERT dbo.Tmp_Employees ON

    GO

    IF EXISTS(SELECT * FROM dbo.Employees)

    EXEC('INSERT INTO dbo.Tmp_Employees (EmpID, EmpName, EmpType)

    SELECT EmpID, EmpName, CONVERT(nvarchar, EmpType.EmployeeTypeCode)

    FROM Employees WITH (HOLDLOCK TABLOCKX)')

    GO

    SET IDENTITY_INSERT dbo.Tmp_Employees OFF

    GO

    DROP TABLE dbo.Employees

    GO

    EXECUTE sp_rename N'dbo.Tmp_Employees', N'Employees', 'OBJECT'

    GO

    ALTER TABLE dbo.Employees ADD CONSTRAINT

    PK__Employees__41D8BC2C PRIMARY KEY CLUSTERED

    (

    EmpID

    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

     

     

    This successfully updated the column type to the EmployeeType2 datatype and I could then select the .Value property as well:

    Code Snippet

     

    SELECT EmpID, EmpName, EmpType.EmployeeTypeCode [TypeCode], EmpType.Value [Position]

    FROM Employees

    /*

    EmpID       EmpName                        TypeCode         Position

    ----------- ------------------------------ ---------------- ----------------

    1           Jon Kehayias                   1                Wait Staff

    2           Jonathan Kehayias              3                Dishwasher

    */

     

     

    I then did some playing and added a TypeCode 4 record to the table which returns Unknown Type.  Then by updating the Enum in the reference assembly and redeploying that Assembly I now get the Position returned as Prep Staff. 

     

    I have no idea how this will perform versus the TSQL equivalent.  One thing to keep in mind is that you have to do manual serialization to index the properties of a CLR UDT, something I have yet to figure out how to do, but I am currently playing with on my free time.  According to the BOL, this change was made with SP1 due to issues with index corruption:

     

    http://msdn.microsoft.com/en-us/library/ms131082.aspx

     

    Monday, May 12, 2008 4:30 AM