What's the Fastest Way to Search Through Thousands of Text Files?

arskiracer

New Member
Joined
Apr 3, 2013
Messages
18
I have written two versions of an Excel 2010 macro (v1.6 and v2.0) that pull data from multiple directories. There about 60,000 total text files with about 20,000 of them being of interest to my macro. There are main directories that the macro searches, and all three of them have multiple sub-directories that need to be searched.

Initially, both macro versions recursively search each of the 3 directories and save the file names so that later, a separate function can mine data out of each file path that was found. Each file name has a serial number that the macro is trying to find. There are 60 total serial numbers.

Macro v1.6 takes one of the 60 serial numbers and then recursively searches all 3 directory locations using the following piece of code (but adapted for vba) which comes from the recursive search example here: HOW TO: Recursively Search Directories by Using FileSystemObject. It then repeats the search for each of the 59 remaining serial numbers (it recursively searches the 3 directories a total of 60 times).

Code:
[COLOR=#333333][FONT=Consolas]FileName = Dir(fso.BuildPath(fld.Path, sFile), vbNormal Or [/FONT][/COLOR][COLOR=#333333][FONT=Consolas]vbHidden Or vbSystem Or vbReadOnly)[/FONT][/COLOR]
   While Len(FileName) <> 0      FindFile = FindFile + FileLen(fso.BuildPath(fld.Path, FileName))      nFiles = nFiles + 1      List1.AddItem fso.BuildPath(fld.Path, FileName)  ' Load ListBox      FileName = Dir()  ' Get next file      DoEvents Wend[COLOR=#222222][FONT=Verdana]
Macro v2.0 recursively loads all file paths from each of the 3 directories into a blank worksheet and then performs a search on the file name of each file path that was just loaded into the worksheet to separate the 20,000 or so desired file paths for the later data mine function. It uses the Like comparison to compare each file name with all 60 serial numbers (all 60 serial numbers compared with the file name of each row in the worksheet - see the following piece of code).This is what imports all the file paths in v2.0:
Code:
[/FONT][/COLOR][COLOR=#222222][FONT=Verdana]Private Function FindFile(ByVal SearchDirectory As String, OutputSheet As String, OutputRow As Long, TotalDirectoriesFound As Long, TotalFileCount As Long) As Currency[/FONT][/COLOR]</pre>    Dim SubFld As Folder
    Dim FileName As String
    Dim File As File
    Dim TempPath() As String
    
    Set fld = fso.GetFolder(SearchDirectory)
    
    For Each File In fld.Files
        DoEvents                                    'allow other processes to execute
        TotalFileCount = TotalFileCount + 1
        TempPath = Split(File, "\")
        Worksheets(OutputSheet).Cells(OutputRow, 1).Value = TempPath(UBound(TempPath))
        Worksheets(OutputSheet).Cells(OutputRow, 2).Value = File
        OutputRow = OutputRow + 1
        Application.StatusBar = "Elapsed Time: " & Format(Now - StartTime, "hh:mm:ss") & " Searching Directories... " & TotalFileCount & " files found so far"
    Next File
    
    TotalDirectoriesFound = TotalDirectoriesFound + 1
    
    If fld.SubFolders.Count > 0 Then
        For Each SubFld In fld.SubFolders
            DoEvents
            FindFile SubFld.Path, OutputSheet, OutputRow, TotalDirectoriesFound, TotalFileCount
        Next
    End If
    
    Set SubFld = Nothing
    

[COLOR=#222222][FONT=Verdana]End Function[/FONT][/COLOR][COLOR=#222222][FONT=Verdana]
And this is what searches the imported file names/paths for the desired serial numbers:
Code:
[/FONT][/COLOR][COLOR=#222222][FONT=Verdana]OutputRow = 2[/FONT][/COLOR]</pre>    InputRow = 1
    Do
        For Count = 0 To UBound(SerialNumbers)
            DoEvents                    'allow other processes to execute
            SearchString = FileNameSearch1 & SerialNumbers(Count) & FileNameSearch2     'this is the file name to match during searching (FileNameSearch1 and 2 are set in the userform)
            If Worksheets(InputSheet).Cells(InputRow, InputFileNameColumn).Value Like SearchString Then
                Worksheets(OutputSheet).Cells(OutputRow, OutputFilePathColumn).Value = Worksheets(InputSheet).Cells(InputRow, InputFilePathColumn).Value    'store file path
                Worksheets(OutputSheet).Cells(OutputRow, OutputFileNameColumn).Value = Worksheets(InputSheet).Cells(InputRow, InputFileNameColumn).Value    'store file name
                Worksheets(OutputSheet).Cells(OutputRow, SNColumn).Value = SerialNumbers(Count)             'print the serial number to output sheet
                
                OutputRow = OutputRow + 1
            End If
        Next Count
        Application.StatusBar = "Elapsed Time: " & Format(Now - StartTime, "hh:mm:ss") & " Finding files... " & InputRow & " of " & TotalFileCount & " file names searched"
        InputRow = InputRow + 1

[COLOR=#222222][FONT=Verdana]    Loop Until Len(Worksheets(InputSheet).Cells(InputRow, 1).Value) = 0[/FONT][/COLOR][COLOR=#222222][FONT=Verdana]
I was expecting macro v2.0 to execute much faster than v1.6. My reasoning was that v1.6 had to go through to the network directories 60 times, while the v2.0 only went through the network directories once. After running the macros, I was surprised to find that v1.6 actually executes faster than v2.0. Does anyone know why this is true? Does the microsoft example somehow use the indexing system that's built into windows or something?Any tips about speeding up this macro are welcome as well. Although combining the searching and data mining would probably decrease the execution time, I would prefer to keep the file searching and the data mining in two separate functions because it makes it much easier to reuse my code for other things.Thanks,Adrian[/FONT][/COLOR]</pre>
 

Excel Facts

Format cells as time
Select range and press Ctrl+Shift+2 to format cells as time. (Shift 2 is the @ sign).

Forum statistics

Threads
1,214,593
Messages
6,120,434
Members
448,961
Latest member
nzskater

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top