Is excel the right tool to remove partially duplicate records

Answered Is excel the right tool to remove partially duplicate records

  • Friday, June 22, 2012 6:30 PM
     
     
    I have a huge spreadsheet with 1000's of records and the rows look like:

    z:\images\12346
    z:\images\12346\09132010_1151
    09/13/2010 12:46 PM 14,934 12346_09132010_1151_5_i.bmp
    09/13/2010 12:46 PM 14,934 12346_09132010_1151_6_i.bmp
    z:\images\505722
    z:\images\505722\10252011_0849
    10/25/2011 11:36 AM 56,200 505722_10252011_0849_35_0.d
    10/25/2011 11:36 AM 2,032,930 505722_10252011_0849_21_0.d
    10/25/2011 11:36 AM 14,934 505722_10252011_0849_12_i.b
    z:\images\8921
    z:\images\8921\04192011_1022
    04/19/2011 01:36 PM 14,934 8921_04192011_1022_92_i.bmp

    Is there a way to remove the duplicate records programmatically with excel , so to make the records look like below?

    z:\images\12346\09132010_1151
    09/13/2010 12:46 PM 14,934 12346_09132010_1151_5_i.bmp
    09/13/2010 12:46 PM 14,934 12346_09132010_1151_6_i.bmp
    z:\images\505722\10252011_0849
    10/25/2011 11:36 AM 56,200 505722_10252011_0849_35_0.d
    10/25/2011 11:36 AM 2,032,930 505722_10252011_0849_21_0.d
    10/25/2011 11:36 AM 14,934 505722_10252011_0849_12_i.b
    z:\images\8921\04192011_1022
    04/19/2011 01:36 PM 14,934 8921_04192011_1022_92_i.bmp

    Thanks in advance.

    Jay Tabatabai

All Replies

  • Saturday, June 23, 2012 1:34 AM
     
     Answered
    On Fri, 22 Jun 2012 18:30:08 +0000, jaytaba wrote:
    >I have a huge spreadsheet with 1000's of records and the rows look like:
    >
    >z:\images\12346
    >z:\images\12346\09132010_1151
    >09/13/2010 12:46 PM 14,934 12346_09132010_1151_5_i.bmp
    >09/13/2010 12:46 PM 14,934 12346_09132010_1151_6_i.bmp
    >z:\images\505722
    >z:\images\505722\10252011_0849
    >10/25/2011 11:36 AM 56,200 505722_10252011_0849_35_0.d
    >10/25/2011 11:36 AM 2,032,930 505722_10252011_0849_21_0.d
    >10/25/2011 11:36 AM 14,934 505722_10252011_0849_12_i.b
    >z:\images\8921
    >z:\images\8921\04192011_1022
    >04/19/2011 01:36 PM 14,934 8921_04192011_1022_92_i.bmp
    >
    >Is there a way to remove the duplicate records programmatically with excel , so to make the records look like below?
    >
    >z:\images\12346\09132010_1151
    >09/13/2010 12:46 PM 14,934 12346_09132010_1151_5_i.bmp
    >09/13/2010 12:46 PM 14,934 12346_09132010_1151_6_i.bmp
    >z:\images\505722\10252011_0849
    >10/25/2011 11:36 AM 56,200 505722_10252011_0849_35_0.d
    >10/25/2011 11:36 AM 2,032,930 505722_10252011_0849_21_0.d
    >10/25/2011 11:36 AM 14,934 505722_10252011_0849_12_i.b
    >z:\images\8921\04192011_1022
    >04/19/2011 01:36 PM 14,934 8921_04192011_1022_92_i.bmp
    >
    >Thanks in advance.
    >
    >
    >Jay Tabatabai
    This macro might work.  It assumes your data is in column A, and that everything in column a will be processed.
    One caveat:  this technique can only specify exact and partial matches.  It cannot require, for example, that the match starts at the beginning of the string.
    For example:
    abc 123\456
    zyx123456abc 123\4567890123

    Since the first string is included in the second, it will count as a "partial match" and be deleted.  If you need to also specify that the partial matches begin at the start of the strings being compared, then we need to use a different technique.  I do not present that initially because it takes longer to examine each cell than it does to use the .Find method.  And whether it is necessary depends on your data and desired results.

    I also assumed that ABC = abc for the sake of comparisons, but if that is not the case, make the obvious change in the matchcase parameter.

    To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor.
    Ensure your project is highlighted in the Project Explorer window.
    Then, from the top menu, select Insert/Module and
    paste the code below into the window that opens.
    To use this Macro (Sub), <alt-F8> opens the macro dialog box. Select the macro by name, and <RUN>.
    ============================
    Option Explicit
    Sub DeletePartialDups()
        Dim rSrc As Range
        Dim c As Range, r As Range
        Dim vRes() As Range
        Dim sFirstAdr As String
        Dim i As Long
    Set rSrc = Range("A1", Cells(Rows.Count, "A").End(xlUp))
    ReDim vRes(0)
    'Get list of cells to delete
    For Each c In rSrc
        sFirstAdr = c.Address
        Set r = rSrc.Find(what:=c, after:=c, LookIn:=xlValues, _
                    lookat:=xlPart, MatchCase:=False)
         If r.Address <> sFirstAdr Then
            Set vRes(UBound(vRes)) = c
            ReDim Preserve vRes(0 To UBound(vRes) + 1)
        End If
    Next c
    Application.ScreenUpdating = False
    For i = UBound(vRes) - 1 To 0 Step -1
        vRes(i).Delete xlShiftUp
    Next i
    Application.ScreenUpdating = True
       
    End Sub
    ==========================

    Ron
    • Edited by Ron Rosenfeld Saturday, June 23, 2012 1:36 AM
    • Marked As Answer by jaytaba Saturday, June 23, 2012 3:30 AM
    •  
  • Saturday, June 23, 2012 3:34 AM
     
     

    Thank you Ron appreciate your help.

    I ran your macro and then

    Sub main()

    Dim StrPath
    For Row = 1 To Range("A1").CurrentRegion.Rows.Count
        If Left(Range("A" & Row).Value, 1) = "z" Then
            StrPath = Range("A" & Row).Value
        Else
            Range("B" & Row).Value = StrPath & " " & Range("A" & Row).Value
        End If
    Next Row

    End Sub

    And I can now import the results of "DIR z:\*.* /s"  of my file system into a database for reporting & comparison.

    Regards

    Jay


    Jay Tabatabai

  • Saturday, June 23, 2012 10:32 AM
     
     
    Glad to help.  Thanks for the feedback.

    Ron

  • Monday, June 25, 2012 1:30 AM
    Moderator
     
     

    Hi,

    Thanks Ron Rosenfeld for helping Jay to solve this issue. And it will help others who meet the same issue!


    Jaynet Zhang

    TechNet Community Support

  • Monday, July 09, 2012 4:19 AM
     
     

    I also learned about the command below, which is worth adding to this post, as it eliminates the need for some of the macros discussed here.  (please note that there is no space between "/a:" & "-D")  

    dir *.* /a:-D /b /s /n > Listing_results.txt 

    Thanks

    Jay


    Jay Tabatabai