locked
Remove html tags RRS feed

  • Question

  • INSERT [dbo].[HTMLTable] ([ID], [HTMLCode], [PlainText]) VALUES (1, N'<B>HEAD</B>', NULL)

    INSERT [dbo].[HTMLTable] ([ID], [HTMLCode], [PlainText]) VALUES (2, N'<B>Title</B>', NULL)

    INSERT [dbo].[HTMLTable] ([ID], [HTMLCode], [PlainText]) VALUES (3, N'<B>Body</B>', NULL)

    INSERT [dbo].[HTMLTable] ([ID], [HTMLCode], [PlainText]) VALUES (4, N'<I>TD</I>', NULL)

    INSERT [dbo].[HTMLTable] ([ID], [HTMLCode], [PlainText]) VALUES (5, N'<I>TH</I>', NULL)

    INSERT [dbo].[HTMLTable] ([ID], [HTMLCode], [PlainText]) VALUES (6, N'<B><I>TR</I>M</B>', NULL)

     

    I have records in this way and i need to remove the html tags and put plain text

    Thursday, October 21, 2010 12:32 PM

Answers

  • This may work:

    declare @str varchar(299)
    select @str = '<B><I>TR</I>M</B>'
    select cast (@str as xml).query('.').value('.', 'varchar(200)')

    I say may because the presumption is that the HTML is well-formed from an XML perspective. In XML tags must be balanced, so this piece of legal HTML will not work:

    <HTML><BODY><P>This is a text with a<BR>line break in it.</P></BODY></HTML>

    Once way to deal with these unbalanced tags is to use replace to get rid of them. (I can only think of two such tags that are common: <BR> and <HR>.)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online: SQL 2008, SQL 2005 and  SQL 2000.
    (Just click the link you need.)
    • Marked as answer by KJian_ Wednesday, October 27, 2010 9:18 AM
    Thursday, October 21, 2010 1:39 PM

All replies

  • This may work:

    declare @str varchar(299)
    select @str = '<B><I>TR</I>M</B>'
    select cast (@str as xml).query('.').value('.', 'varchar(200)')

    I say may because the presumption is that the HTML is well-formed from an XML perspective. In XML tags must be balanced, so this piece of legal HTML will not work:

    <HTML><BODY><P>This is a text with a<BR>line break in it.</P></BODY></HTML>

    Once way to deal with these unbalanced tags is to use replace to get rid of them. (I can only think of two such tags that are common: <BR> and <HR>.)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Links for SQL Server Books Online: SQL 2008, SQL 2005 and  SQL 2000.
    (Just click the link you need.)
    • Marked as answer by KJian_ Wednesday, October 27, 2010 9:18 AM
    Thursday, October 21, 2010 1:39 PM
  • If you have valid XHTML code than you can use:

    UPDATE [HTMLTable]
    SET [PlainText]=CAST(CAST([HTMLCode] as XML).query('for $a in //text() return $a') as nvarchar(max))
    
    

    Thursday, October 21, 2010 1:46 PM