locked
Word macro to create an index of all words on large document RRS feed

  • Question

  • I need help in creating a macro, that will run on large documents, that will create a index of all the words in the document with their page numbers, in alphabetical order.  I have a start of one, but it takes a long time to run and in fact seems to crash word.  I would also like the ability to notify the user that it is running so they don't panic and think that it has stopped or crash.  Here is what I have so far:

    Dim colWords as New Collection
    'add words you don't want to index
    colWords
    .Add "and"
    colWords
    .Add "you"

    Dim wrd As Range
    For Each wrd In ActiveDocument.Words

     
    'only if we have 3 chars we index
     
    If Len(Trim(wrd.Text)) > 2 Then

        
    ' prevent the field from being Indexed as well...
        
    Dim infield As Boolean
         infield
    = False
        
    Dim fld As Field
        
    For Each fld In ActiveDocument.Fields
          
    If (wrd.Start >= fld.Code.Start And wrd.End <= fld.Code.End) Then
             infield
    = True
            
    Exit For 'break out
          
    End If
        
    Next

        
    If (Not infield) Then
           
    ' check if we already indexed?
           
    Dim findWord as String
            findWord
    = LCASE(wrd.Text)
           
    For Each cached in colWords
               
    if cached = findWord Then
                   infield
    = True
                  
    Exit For 'break out
               
    end If
           
    Next
           
    If  (Not infield) Then
               ActiveDocument
    .Indexes.MarkAllEntries Range:=wrd, Entry:=wrd.Text, _
                 EntryAutoText
    :=wrd.Text, CrossReference:="", CrossReferenceAutoText:="", _
                 BookmarkName
    :="", Bold:=False, Italic:=False

               colWords
    .Add findWord

            
    End If
        
    End If
      
    End If
    Next

    Saturday, July 27, 2013 4:20 AM

All replies

  • I modified your routine some and it appears to have sped up the process. Because the routine was adding indexed words to the document, your initial "for each" loop was having to rebuild the collection each time it looped and each time the collection was larger so it slowed down. I made the document's "words" a separate collection and looped thru it versus the actual document.

    The other changes involved changing the other two "for each" loops to an indexed retrieval method because I've found this to be faster when the collection contents change on each iteration.

    Regarding your question concerning alerting the user... if you could run the routine from a UserForm, then you could put a counter and message on the form that refreshed with each iteration. For example "Indexing n of 2000 words." Hope that make sense.

    Here is the code I modified. Try it and see if it is faster than before on your test files.

    Sub IndexWords()
        Dim cached As Variant
        Dim colWords As New Collection
        'add words you don't want to index
        colWords.Add "and"
        colWords.Add "you"
        
        Dim wrds As Word.Words
        
        Dim wrd As Range
        Dim f, c As Long
        Set wrds = ActiveDocument.Words
        
        For Each wrd In wrds
          'only if we have 3 chars we index
          If Len(Trim(wrd.Text)) > 2 Then
        
             ' prevent the field from being Indexed as well...
             Dim infield As Boolean
             infield = False
             Dim fld As Field
             For f = 1 To ActiveDocument.Fields.Count
                Set fld = ActiveDocument.Fields(f)
               If (wrd.Start >= fld.Code.Start And wrd.End <= fld.Code.End) Then
                 infield = True
                 Exit For 'break out
               End If
             Next
        
             If (Not infield) Then
                ' check if we already indexed?
                Dim findWord As String
                findWord = LCase(wrd.Text)
                For c = 1 To colWords.Count
                    cached = colWords(c)
                    If cached = findWord Then
                       infield = True
                       Exit For 'break out
                    End If
                Next
                If (Not infield) Then
                   ActiveDocument.Indexes.MarkAllEntries Range:=wrd, Entry:=wrd.Text, _
                     EntryAutoText:=wrd.Text, CrossReference:="", CrossReferenceAutoText:="", _
                     BookmarkName:="", Bold:=False, Italic:=False
        
                   colWords.Add findWord
        
                 End If
             End If
           End If
           
        Next
        
    End Sub


    Kind Regards, Rich ... http://greatcirclelearning.com


    Saturday, July 27, 2013 2:16 PM
  • Hi,

    Just check to see the code of Rich, I think it is what you want.


    Jaynet Zhang
    TechNet Community Support

    Monday, July 29, 2013 6:17 AM
  • On a document with 68 pages and 6,800 + words, it causes Word to be unresponsive.  I was wondering if I made the excluded word list longer and only process selected text it would work better.  If you think that would improve performance, could you guide me on who to do that?
    Monday, July 29, 2013 8:13 PM