none
Varbinary(Max) to UTF-8 ? RRS feed

  • Question

  • I have a varbin(max) value that I need to spit out as UTF-8.  Any thoughts anyone?



    declare @vb varbinary(max)
    set @vb = 0x54006800690073002000690073002000610020007400650073007400

    select  some_Magic_function(@vb)



    PAC
    Wednesday, June 24, 2009 11:32 PM

Answers

  • In general, SQL Server does not hold UTF-8 in high regard.  :)  However, .NET has methods to do this and you can get at them via CLR integration.

    Compile this using C#:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;
    
    namespace UtfLibrary
    {
        public static class UtfMethods
        {
            [SqlFunction(IsDeterministic = true, IsPrecise = true)]
            public static SqlBinary NVarCharToUtf8(SqlString inputText)
            {
                if (inputText.IsNull)
                    return new SqlBinary(); // (null)
    
                return new SqlBinary(Encoding.UTF8.GetBytes(inputText.Value));
            }
    
            [SqlFunction(IsDeterministic = true, IsPrecise = true)]
            public static SqlString Utf8ToNVarChar(SqlBinary inputBytes)
            {
                if (inputBytes.IsNull)
                    return new SqlString(); // (null)
    
                return new SqlString(Encoding.UTF8.GetString(inputBytes.Value));
            }
        }
    }
    

    Import the assembly into your database and create the external functions:

    CREATE ASSEMBLY UtfLibrary
    FROM 'C:\Users\name\Documents\Visual Studio 2008\Projects\UtfLibrary\UtfLibrary\bin\Release\UtfLibrary.dll'
    GO
    CREATE FUNCTION NVarCharToUtf8 (@InputText NVARCHAR(MAX))
    RETURNS VARBINARY(MAX)
    AS EXTERNAL NAME UtfLibrary.[UtfLibrary.UtfMethods].NVarCharToUtf8
    GO
    CREATE FUNCTION Utf8ToNVarChar (@InputBytes VARBINARY(MAX))
    RETURNS NVARCHAR(MAX)
    AS EXTERNAL NAME UtfLibrary.[UtfLibrary.UtfMethods].Utf8ToNVarChar
    • Marked as answer by rusag2 Thursday, June 25, 2009 12:57 AM
    Thursday, June 25, 2009 12:44 AM
    Moderator

All replies

  • UTF-8 is a binary encoding for (Unicode) characters.  As such, its meaningless to say "convert binary to UTF-8", unless you mean you want the binary values converted to a textual representation of UTF-8.  It could also make sense if possibly your binary value held characters in some other encoding besides UTF-8. . . but we'd have to know what that encoding was first.
    Michael Asher
    Thursday, June 25, 2009 12:14 AM
  • In general, SQL Server does not hold UTF-8 in high regard.  :)  However, .NET has methods to do this and you can get at them via CLR integration.

    Compile this using C#:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;
    
    namespace UtfLibrary
    {
        public static class UtfMethods
        {
            [SqlFunction(IsDeterministic = true, IsPrecise = true)]
            public static SqlBinary NVarCharToUtf8(SqlString inputText)
            {
                if (inputText.IsNull)
                    return new SqlBinary(); // (null)
    
                return new SqlBinary(Encoding.UTF8.GetBytes(inputText.Value));
            }
    
            [SqlFunction(IsDeterministic = true, IsPrecise = true)]
            public static SqlString Utf8ToNVarChar(SqlBinary inputBytes)
            {
                if (inputBytes.IsNull)
                    return new SqlString(); // (null)
    
                return new SqlString(Encoding.UTF8.GetString(inputBytes.Value));
            }
        }
    }
    

    Import the assembly into your database and create the external functions:

    CREATE ASSEMBLY UtfLibrary
    FROM 'C:\Users\name\Documents\Visual Studio 2008\Projects\UtfLibrary\UtfLibrary\bin\Release\UtfLibrary.dll'
    GO
    CREATE FUNCTION NVarCharToUtf8 (@InputText NVARCHAR(MAX))
    RETURNS VARBINARY(MAX)
    AS EXTERNAL NAME UtfLibrary.[UtfLibrary.UtfMethods].NVarCharToUtf8
    GO
    CREATE FUNCTION Utf8ToNVarChar (@InputBytes VARBINARY(MAX))
    RETURNS NVARCHAR(MAX)
    AS EXTERNAL NAME UtfLibrary.[UtfLibrary.UtfMethods].Utf8ToNVarChar
    • Marked as answer by rusag2 Thursday, June 25, 2009 12:57 AM
    Thursday, June 25, 2009 12:44 AM
    Moderator
  • Just noticed this.... your example bytes are not a UTF-8 string.  It is a UCS-2/UTF-16.  This is easy to see as for ASCII text a UTF-8 string looks like ASCII whereas a UCS-2/UTF-16 string alternates with 0x00 bytes (as yours does).  You can just cast it directly in SQL Server.

    declare @vb varbinary(max)
    set @vb = 0x54006800690073002000690073002000610020007400650073007400
    select cast(@vb AS nvarchar(max)) -- Prints: This is a test




    Thursday, June 25, 2009 12:48 AM
    Moderator