none
Need help with the query please.

    Question

  • Hello,
    
    Is there an easy way to find ASCII > 127 TSQL.  I need to find all the corrupt doc files and set to 1 but I have to do manually by using Substring string below.
    This can be tedious and wonder if there is a better way to write this query.
    
    Thanks in advance.
    
    corrupt Doc_desc
    ------- -----------
    1       Q@  Q@  R 
    ¥˜€R{P ÑQ‹ˆ 4gœpíI@  Q@  Q@  Q@  Q@  Q@  Q@  Q@  g4P E P E P E P (¢Š 
     h¢€
    (¢€
    (¢€
    (¢€
    CKE sÚŽ"¦±."£ê £Í Ç$ (Y>^˜#'Ö§´±ÖîŠÿ l^Zù äÃk  ÿ Þbz{
    Ú¢€9ëÍ#]½Õ'Yuˆ×H- ·XFý¸åwv×Þº EDTQ…Q€aKE  f€0(¢€ 
    QE  QE  QE  QE  QI¹CmÈÉíšZ (¢Š (¢Š (¤Þ"ön ±œgœRÐ E P E P E P E P E P E P E P HFih   QE  QE  QE  QE  QE bø´ÿ ¡[×ÔÖ" Å™û
    ±ôºúÖÕo?áCçú ´ÿ SÑ~¡E VPQE  QE  QE  QE  QE  QFi²(' 72äc*pGã@ ´}:]:;…šúkÃ,í(i€á ¯×)ðýLšf­
    Ï4›u)ãÞÒ Ø  jŸƒôµÕô­NßPº¹ž Ô&Œ)"ƒÆ I Ÿå@ " ªYˆ rIíU4­B-RÌ]AÌlìªsÔ FJç¼&F§á+›KÖk"‚i­òìrÁO 4Ï ›{ ¦¦±|ñC," <í-Çé@ y   É¢¸Øm&Õü? ÓiSÍys š· z©W# ©Ýo ŒTzÜZ¢|8"êÌðßÛF2ÑË÷°À Hë'Ö€;
    Ñú ä®#ñ ¿‚Ö;Èš{¸dQ*ÄùiažGr*} 
    ãAÕoíî´¹   † [œ£`Žëì{Ð Êêï­6-Œ­*@f|"È :½\}žŸb>"ßÆ-ÀÝ`' OÞ/É®ÀP  8M"Úßý®dòQ"ÉVÂ>{'íW‰ €HÉè+oÕbÙñEuy1$ n]çn@ëÆ™­Z-ñî y£ Á1p®pp?N´ Ö+"  #® JZäf´‡Eñ¾-,o¢Šö)Rd 
    Hb Aç½uÔ ×-8ñ½ÕsÓqÆiÁƒ 
    ‚ ö®CSÔàÒ¼GvuÛGk)Ñ µÉMé *}9æ§ÔîâÑ<!4ú\Ë4rIˆ\6B‰-×Û4 ÒùÑ <¿17ÿ wpÏåNgE8f ûšåµÍ úóNXôû m¯b*Ñ\ þ`G\2sV¼C£IªhJòíS·ŒJ®æ(É þé9 
    òê  €' ÏZZç<;,>#h5¹âU- ò-,çc #ù{WG@  Q@  Q@  
    	=4 ƒâï CáÛ2â/>àŒ`ã çÚ¼Âû^Öoï™õ=JHàq¹bˆâ0 oéZ©,¾3ÖÙ$ƒË·'FoæÎvB ü©tk-3Qñ¤þt,º}¥‹O $ÿ 
    @Ü<þ 
     Þ ¹Ó5	gÓ®Qe¹Y>Ñ 
    " ¹x }Euñ4- " [Úhg 
    ˸后ïŸë^]቟NÕ`¼' 0nvo B HJõx" ޛ㜠ü¸óèy&˜‹"QE 
    ä¾ ´ÖɦËmwhÓÞ¤ äÌ@e öéÚºÚä>%œ[h¤ÿ ÐRë@ l1ˆcXÔ³  ˱b~¤×3ñ$:xtÜGq4O Ññ o 
    
     ‚^µÔW1ñ1xFá½$‹·ûb€:?6$d‰¤@äp¥†OáOfTRÌÁTu$à
    ã¼s¤YZè/"E o­ž9 嘙 Ì'Óž+WÄ6ÐÞÛé×7-Ëmg 
    Ë4ѸÊÏÇ G~{sš Ù†â	óäÍ ¸ë±ÇåKçD[g˜›³Œn ®<¼IñJ-ÚÊ[4¹µ™ ²"  äeG§¾
    ?ÄÖ'i^*Ñõô…|¹f - p"óµÿ Ÿ?J 댈 º†= <šIfŠ Ý4© ú" +8Á ïˆ<֍[ì1í 
    Gñ¿?¢ãþú}2k­Vÿ Wž]>
    Á
    ãZÆ²È DLt  ¹Î{þ  Õ£"¨d`Êz r
    $ˆ$FBH c*pk
    ºV¡¥O~"ùIe,že¼(弜õ Û¥oÐ/ðèÉý™¨$³I1MFt
    # 8 Ç5ÑËuo
    "-xÑ›€À \¯ƒg6š 
    "rˆY¢¿º/©  ]>ÎãWðÄFm>Úy/!.ó<Ÿ0f zqn" Ö³*©f ( $ž G
    ݼÿ êgŽO÷X å.tÝ~×ÁpZHEõÝ´¨ÎˆÇ÷Ñ+gn{ñǾ*Ɖ©hZÖ"o=²}QWV·tØûHÁ ïƒÞ€:9®í p"O lÝ 0 
    8ÍåìV­-Ûò±9çqÆ"kÚ ¥¨C °·³²¸o^)oŽ@ ¸ ( ¨y£GTy YÎ  ‚~o wvÒÎðGq ÊŸy ÁeúŠå<Yhâ 
    ¼™IeoÒGF#8QÇó£Ä  šO‰<=ue [¼·M … 7" 
    ‡Ö€:馊 Ì"H± êÌp)×0]Gæ[Ì'§÷' ÎMuuwãk‹Hâ†XììÑÑ%b æ<°ü0)úv‡¨Ûø¦mOý 
    ;ˆvMo 'söoLÐ ü--ѤŽ÷ *Åþ°- '×ÒžóÂ'yÍ*,dgyaŒ}kˆðþ…§_k^'³"ƒÌ  ¢ c´esŸ¯§¥\Öá[  xjÓË+¤ÆZ%\åD"p™þ™  
    
    
    --Testing...
    
    -- Find all corrupt files.
      SELECT --eval_text AS 'eval_text_BadValue'
             check_sum
            ,[Path]
        FROM My_pat AS a
       WHERE ASCII(SUBSTRING(a.eval_text, 6, 200)) > 127  -- tedious right there....
         AND [corrupt] = 0
    GO
    
    
    --set to 1
    
         UPDATE b
            SET [corrupt] = 1
           FROM dbo.#temp AS a
           JOIN My_Pat AS b
             ON (a.check_sum = b.check_sum)
          WHERE (b.[corrupt] = 0 );

    Friday, June 28, 2013 6:28 PM

Answers

All replies

  • Try this:

    create table #invalid (c char(1))
    insert into #invalid values (char(128))
    insert into #invalid values (char(129))
    -- repeat up to (and including) 255
    insert into #invalid values (char(225))
    
    create table #temp (corrupt int,eval_text varchar(100))
    insert into #temp values (0,'abc')
    insert into #temp values (0,'ábc')
    
    UPDATE #temp
    SET corrupt = 1
    WHERE corrupt = 0
    AND   EXISTS (
      SELECT *
      FROM #invalid
      WHERE CHARINDEX(c, eval_text) > 0
    )
    
    select * from #temp
    
    drop table #temp
    drop table #invalid
    


    Gert-Jan

    Saturday, June 29, 2013 7:54 PM
  • I try your query but it doesn't work because the first character in col1 not always corrupt.  The datatype of col1 is varchar(max).  How do you deal with cases like these.
    
    Col1
    -----
    Mac OS X  â±xþ ë-j÷K§ÚJÍ y° Ž:àj‡Ä÷7·
    ABCD ÍiQJádaÛxGA
    DETBW %Ýë4~…@ >j숾§¢U
    This is a good data
    
    I want the result return good data:
    Col1
    -----
    This is a good data
    
    Or
    
    bad data:
    Col1
    -----
    Mac OS X  â±xþ ë-j÷K§ÚJÍ y° Ž:àj‡Ä÷7·
    ABCD ÍiQJádaÛxGA
    DETBW %Ýë4~…@ >j숾§¢U
    
    Thank you for trying to help.

    Saturday, June 29, 2013 11:31 PM
  • try this,

    create table #temp(c1 nvarchar(100))
    insert into #temp 
    select 'Mac OS X  â±xþ ë-j÷K§ÚJÍ y° Ž:àj‡Ä÷7·'
    union all select 'ABCD ÍiQJádaÛxGA'
    union all select 'DETBW %Ýë4~…@ >j숾§¢U'
    union all select 'This is a good data'
    
    
    select c1 from #temp 
    where patindex('%[^ -~0-9a-zA-Z]%', c1 COLLATE LATIN1_GENERAL_BIN)>0
    
    select c1 from #temp 
    where patindex('%[^ -~0-9a-zA-Z]%', c1 COLLATE LATIN1_GENERAL_BIN)=0

    more details http://iso30-sql.blogspot.in/2010/10/remove-non-printable-unicode-characters.html


    Thanks
    Sarat

    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Sunday, June 30, 2013 8:37 AM
  • Just add all character 128 thru 255 in #invalid (as the comment in the code said) and change

    select * from #temp
    to
    select * from #temp where corrupt=0


    Gert-Jan

    Sunday, June 30, 2013 5:57 PM
  • Thank you Sarat and Gert-Jan.  So far I try Sata Queries against temp table is working.

    I will test Gert-Jan method against big table see what happen.

    Again, Thanks so much.




    • Edited by NguyenBL Sunday, June 30, 2013 8:58 PM mistyping
    Sunday, June 30, 2013 8:53 PM