Directory & file search

PaddyD

MrExcel MVP
Joined
May 1, 2002
Messages
14,234
VBA’s not really my thing, so here’s hoping…</SPAN>

What we have:
Fairly simple file structure (2 / 3 levels, with one file per day at the root of each tree)
Files are either xls or txt
Files contain daily banking records, including a unique ID
Location of unique ID is fixed (column D or whatever)
Max records / file / day is about 20,000. Usually only files from last 12 months are relevant

Issue:
Team needs to be able to search for a unique ID (usually the most recent), identify the file it's in, go to file and get some other info from the corresponding row. At the moment, they are searching each file manually until they get a hit.

What I thought would work:

Run some code (daily) to list out something of the form:

unique ID | File reference | row reference

...in a worksheet somewhere so they could search a single xl file for the matches, then maybe link of some sort to open the file they want. Probably on big initial run to populate enough history, then code would need to be updatable such that it could be run each night and have the results appended to the rest of the list / not need to interrogate the entire file lists each time.

</SPAN>Obviously, any better ideas most welcome.

Details:
</SPAN>
For xls files:</SPAN>
Directory structure is …\YYYY\MM MMM YY\DDMMYYYYCL.xls</SPAN>
e.g. …\2014\05 MAY 2014\01052014CL.xls</SPAN>
Unique ID is in col N
</SPAN>
For txt files:</SPAN>
Directory structure is …\YYYY\MM MMM\V8 Cards_NZAP_YYYYMMDD.txt</SPAN>
e.g. …\2014\05 May\ V8 Cards_NZAP_20140501.txt</SPAN>
Unique ID is not in a consistent character position, but is always a number string located between the first instance of BD, DC, or AP and the next space.
</SPAN>
As ever, all help gratefully received :)</SPAN></SPAN></SPAN>
 
Thank you, Paddy

And is every row of the text file one of BP, DC or AP? Or might there be some rows which are different (and presumably not wanted in the results)?
 
Upvote 0

Excel Facts

Can Excel fill bagel flavors?
You can teach Excel a new custom list. Type the list in cells, File, Options, Advanced, Edit Custom Lists, Import, OK
Hi, Paddy

I had some time this morning to do some coding. Please see how you go with this.

Macros need to be enabled. In Excel that might be via menu ALT-T-M-S

Please open a new Excel file & save it in the sub-directory just above the "\YYYY\" level. If not in this location it won't work. (It was easier for me to assume that location for coding.) Be sure you save it with a file type that allows macros. So Excel 2003 or *.xlsm

From the Excel file ALT-F11 to the visual basic editor. From the left hand side you should see a list of files (VBAProjects). For the one corrresponding to your file, double click on the "ThisWorkbook" image and enter the short section of code in the big white window. The code is to fully replace anything that is already there. (You might see Option Explicit). Then on the LHS near the "ThisWorkbook" icon right click & choose "insert" and "module". I think unnecessary cause it activates anyway but double click on the new module icon. Copy & paste the larger code from below to fully replace anything in the big white window. Now from the VB editor menu choose "debug" and if available "Compile VBA project". This should be the end of using the VB editor. Back to Excel, to save & close the file. You might even fully close Excel now & re-start it - don't need to but it might help performance a fraction.

When the file is opened code should auto-run and it should create an mdb file in the same path (sub-directory) and trawl through the underlying sub-directories for any expected data. (BTW, searching for the sub-directories is case sensitive & I took it that all text is upper case. If not, data won't be found. ) This should take a little while the first time only. Then a new worksheet should be created with a yellowed cell. Enter the code of interest in the yellow cell & see information on the relevant data files below it.

Subsequent openings of the Excel file should capture any newer data to the mdb file & be available to the query results. Also on subsequent file openings if the mdb file is not there it will be created from scratch, and if a query is not found one will be created from scratch.

Hoping it works. I didn't put error handling in the code... :)

regards


This code goes in the "ThisWorkbook" module
Code:
Option Explicit


Private Sub Workbook_Open()


    Const str_MDB_FILE_NAME As String = "Accounts.mdb"        'name of mdb file to store data


    Call SetupFileAndDatabase(DB_Name:=str_MDB_FILE_NAME)
    Call SetupParamaterisedQuery(DB_Name:=str_MDB_FILE_NAME)


End Sub

This code goes in a code module. From VBE: Insert, Module
Code:
Option Explicit
Option Private Module


'http://www.mrexcel.com/forum/excel-questions/780478-directory-file-search.html


'ASSUMPTION - FILE TO BE LOCATED IN THE SUB-DIRECTORY JUST ABOVE THE DATA FILES FOR EACH YEAR


Sub SetupFileAndDatabase(ByVal DB_Name As String)


    Const str_UPLOADER_WKS_NAME As String = "my_data_uploader_wks"        'name of hidden worksheet for transferring between recordset & database


    Application.ScreenUpdating = False
    '==================================================================================
    'Initialise.
    'If no uploader worksheet, then create one
    If Not WksExists(WksName:=str_UPLOADER_WKS_NAME) Then Call SetupNewWorksheet(WksName:=str_UPLOADER_WKS_NAME)
    'If no mdb file in the same path as this Excel file, then create one
    If Not FileExists(FileName:=ThisWorkbook.Path & Application.PathSeparator & DB_Name) Then Call RunsOnceOnlyToCreateDatabaseFile(DB_Name:=DB_Name)
    '==================================================================================
    Call UpdateDatabase(DB_Full_Name:=ThisWorkbook.Path & Application.PathSeparator & DB_Name, wksUploader:=Worksheets(str_UPLOADER_WKS_NAME))
End Sub


Private Function WksExists(ByVal WksName As String) As Boolean
    'based on Walkenbach "Excel 2000 Power Programming with VBA" page 299
    Dim x As Object


    On Error Resume Next
    Set x = ThisWorkbook.Worksheets(WksName)
    WksExists = Err = 0
End Function


Function FileExists(ByVal FileName As String) As Boolean
    FileExists = Len(Dir(FileName)) > 0
End Function


Private Sub SetupNewWorksheet(ByVal WksName As String)


    Dim wksNew As Excel.Worksheet


    Set wksNew = ThisWorkbook.Worksheets.Add
    With wksNew
        .Name = WksName
        .Visible = xlSheetVeryHidden
    End With
End Sub


Private Sub RunsOnceOnlyToCreateDatabaseFile(ByVal DB_Name As String)
    'Creates database file in same directory as this Excel file.
    'Has a single table : tHistory
    'Containing fields : DataDate, FileType, AccountID, RecordCount, LineNumber, FileReference


    Dim str_DB_File_Path As String
    Dim strSQL As String
    Dim strConn As String


    Dim objRS As Object
    Dim obj_ADOX_Catalog As Object


    str_DB_File_Path = ThisWorkbook.Path
    strSQL = "CREATE TABLE tHistory ([DataDate] Date, [FileType] Text(255), [AccountID] Text(30), [RecordCount] Long, [LineNumber] Long, [FileReference] Text(255))"
    strConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & str_DB_File_Path & Application.PathSeparator & DB_Name


    Set obj_ADOX_Catalog = CreateObject("ADOX.Catalog")
    obj_ADOX_Catalog.Create strConn


    Set objRS = CreateObject("ADODB.Recordset")
    objRS.Open strSQL, strConn


    Set objRS = Nothing
    Set obj_ADOX_Catalog = Nothing


End Sub


Private Sub UpdateDatabase(ByVal DB_Full_Name As String, ByRef wksUploader As Excel.Worksheet)


    Const lng_OLDEST_DATA_AGE As Long = 365        'Oldest data age in days. (Use a positive value.)
    'Data older than this will be deleted from the database. The database will be updated with new data:
    'new data will be sought that is both newer than this data age and also newer than any existing data.
    '
    '(Note, if this oldest data age changes to a larger value then best to delete the Accounts.mdb file & have this file create a totally new one.
    'This is because the coding below only seeks data newer than the previous newest data: it doesn't look for older data which is what
    'would be required if the oldest data age increases. Changing to a smaller value should be no problem at all.)


    Const str_XL_DATA_WORKSHEET_NAME As String = "Other Party Details Report"
    Const str_XL_DATA_ACCOUNTID_HEADER As String = "Other Party Account"
    Const lng_XL_DATA_ACCOUNTID_HEADER_ROW As Long = 15
    Const str_XL_DATA_ACCOUNTID_HEADER_COLUMN As String = "N"


    Dim i As Long, j As Long, k As Long, m As Long
    Dim dteLatestDataDateXLS As Date
    Dim dteLatestDataDateTXT As Date
    Dim strConn As String
    Dim strNewDataFileName As String
    Dim strDataLine As String
    Dim strUniqueCode As String
    Dim strSQL As String
    Dim ar As Variant
    Dim arCodeIdentifiers As Variant
    Dim x As Variant


    Dim objConn As Object
    Dim objRS As Object
    Dim objRS_newdata As Object


    arCodeIdentifiers = Array("AP", "BP", "DC")


    Application.StatusBar = "Please wait. Updating database ......."
    'Find the latest data that is in the database, for each file type
    'By querying database table tHistory for MAX(DataDate) by FileType. (FileType is either "xls" or "txt")
    Set objConn = CreateObject("ADODB.Connection")
    strConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & DB_Full_Name
    objConn.Open strConn
    Set objRS = CreateObject("ADODB.Recordset")
    'Prior to querying these maxima, delete any data that is very old - just as housekeeping for database.
    objConn.Execute "DELETE * FROM tHistory WHERE [DataDate] < " & CLng(Date - lng_OLDEST_DATA_AGE)
    'Load latest data dates to recordset
    objRS.Open "SELECT MAX(DataDate) AS [DataDate], FileType FROM tHistory GROUP BY FileType", objConn
    'Initialise latest date dates.
    dteLatestDataDateXLS = Date - lng_OLDEST_DATA_AGE
    dteLatestDataDateTXT = Date - lng_OLDEST_DATA_AGE
    'Loop through recordset and update latest data dates.
    Do While Not objRS.EOF
        If objRS.Fields(1).Value = "xls" Then dteLatestDataDateXLS = objRS.Fields(0).Value
        If objRS.Fields(1).Value = "txt" Then dteLatestDataDateTXT = objRS.Fields(0).Value
        objRS.movenext
    Loop
    objRS.Close


    'Loop through from one day after these latest data dates until yesterday, seeking new data & adding it to the database.
    '==================================================================================
    'First the xls files
    'SQL to return two fields from Excel file : AccountID, Count of AccountID
    strSQL = Join$(Array("SELECT [", str_XL_DATA_ACCOUNTID_HEADER, "], COUNT(*) FROM [", str_XL_DATA_WORKSHEET_NAME, "$", _
            str_XL_DATA_ACCOUNTID_HEADER_COLUMN, lng_XL_DATA_ACCOUNTID_HEADER_ROW, ":", str_XL_DATA_ACCOUNTID_HEADER_COLUMN, _
            "65536] WHERE [", str_XL_DATA_ACCOUNTID_HEADER, "] Is Not Null GROUP BY [", str_XL_DATA_ACCOUNTID_HEADER, "]"), vbNullString)


    For i = dteLatestDataDateXLS + 1 To Date - 1
        'IMPORTANT: data file name is case sensitive. Per specification, month name is upper case. If it isn't, data won't be found & the next line will need amendment.
        strNewDataFileName = ThisWorkbook.Path & "\" & Format$(i, "YYYY") & "\" & UCase(Format$(i, "MM MMM YY")) & "\" & Format$(i, "DDMMYYYY") & "CL.xls"
        If FileExists(strNewDataFileName) Then        'want to get new data from file strNewDataFileName
            Set objRS_newdata = CreateObject("ADODB.Recordset")
            objRS_newdata.Open strSQL, Join$(Array("Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=", strNewDataFileName, ";"), vbNullString)        'loads xls data to recordset
            '==================================================================================
            'Method used to transfer recordset to database is via worksheet & array. Could eliminate worksheet step by useing GetRows to take recordset straight to array..
            wksUploader.Cells.Clear        'Initialise worksheet for transfer of data from recordset to worksheet
            wksUploader.Range("A1").CopyFromRecordset objRS_newdata        'Populate transfer worksheet with recordset data. Data in row 1: no headers to worksheet.
            Set objRS_newdata = Nothing


            If Len(wksUploader.Range("A1").Value) > 0 Then        'If there is any data,
                With wksUploader.Range("A1").CurrentRegion
                    ReDim ar(1 To .Rows.Count, 1 To .Columns.Count)
                    ar = .Value        'Load data to array
                End With
                'Chosen to loop through array and transfer one line at a time to database.
                objRS.Open "tHistory", objConn, 1, 3, 2
                For j = LBound(ar, 1) To UBound(ar, 1)        'Data from row 1: no headers.
                    With objRS
                        .AddNew
                        .Fields("DataDate") = i
                        .Fields("FileType") = "xls"
                        .Fields("AccountID") = ar(j, 1)        'Was first field of recordset
                        .Fields("RecordCount") = ar(j, 2)        'Was second field of recordset
                        .Fields("FileReference") = strNewDataFileName
                        .Update
                    End With
                Next j        'At end of loop, data from xls file strNewDataFileName has been transferred to Accounts.mdb database table tHistory
                Erase ar
            End If
            '==================================================================================
        End If
    Next i        'At end of loop, data from all xls files newer than previous newest data has been transferred to Accounts.mdb database table tHistory
    wksUploader.Cells.Clear
    '==================================================================================


    '==================================================================================
    'now the txt files
    'based on Walkenbach "Excel 2000 Power Programming with VBA" page 716 on


    ReDim ar(1 To 20000, 1 To 6)        '20,000 maximum rows per file per specification: 6 to match the 6 fields of tHistory database table
    j = 1        'j is the counter for the row being addressed in ar
    ar(j, 1) = "DataDate"
    ar(j, 2) = "FileType"
    ar(j, 3) = "AccountID"
    ar(j, 4) = "RecordCount"
    ar(j, 5) = "LineNumber"
    ar(j, 6) = "FileReference"
    For i = dteLatestDataDateTXT + 1 To Date - 1
        'IMPORTANT: data file name is case sensitive. Per specification, month name is upper case. If it isn't, data won't be found & the next line will need amendment.
        strNewDataFileName = ThisWorkbook.Path & "\" & Format$(i, "YYYY") & "\" & UCase(Format$(i, "MM MMM YY")) & "\V8 Cards_NZAP_" & Format$(i, "YYYYMMDD") & ".txt"
        If FileExists(strNewDataFileName) Then
            Open strNewDataFileName For Input As #1
            k = 0        'k is the counter for the row of the txt file
            Do While Not EOF(1)
                Line Input #1, strDataLine
                k = k + 1
                '==============================
                'extract unique ID
                For m = LBound(arCodeIdentifiers) To UBound(arCodeIdentifiers)
                    strDataLine = Replace$(strDataLine, arCodeIdentifiers(m), "||")
                Next m
                strUniqueCode = vbNullString
                If InStr(strDataLine, "||") Then
                    x = Split(strDataLine, "||")
                    x = Split(x(1), " ")
                    strUniqueCode = x(0)
                End If
                '==============================


                If Len(strUniqueCode) > 0 Then        'if this is data we want, add it to ar
                    j = j + 1
                    ar(j, 1) = i
                    ar(j, 2) = "txt"
                    ar(j, 3) = strUniqueCode
                    ar(j, 5) = k
                    ar(j, 6) = strNewDataFileName
                End If
            Loop
            Close #1
        End If
    Next i        'At end of loop, data from all txt files newer than previous newest data has been transferred to ar
    Set x = Nothing
    If j > 1 Then        'if there is any data, transfer to Accounts.mdb database table tHistory
        objRS.Open "tHistory", objConn, 1, 3, 2
        For i = 2 To j        're-use i, now counter for which row of ar is being transferred to tHistory. Start at 2 as row 1 is headers
            With objRS
                .AddNew
                For k = LBound(ar, 2) To UBound(ar, 2)        're-use k, now counter for which column of ar is being transferred to tHistory
                    .Fields(ar(1, k)) = ar(i, k)
                Next k
                .Update
            End With
        Next i
    End If
    Erase ar
    '==================================================================================


    Application.StatusBar = False


    Set objRS = Nothing
    objConn.Close: Set objConn = Nothing


End Sub


Sub SetupParamaterisedQuery(ByVal DB_Name As String)


    Dim blnAlreadyHasQuery As Boolean
    Dim strConn As String
    Dim strSQL As String


    Dim wks As Excel.Worksheet
    Dim qt As Excel.QueryTable
    Dim par As Excel.Parameter


    blnAlreadyHasQuery = False
    'see if there is already a query
    For Each wks In ThisWorkbook.Worksheets
        For Each qt In wks.QueryTables
            blnAlreadyHasQuery = True
        Next qt
    Next wks


    'create query if one wasn't found
    If Not blnAlreadyHasQuery Then


        Set wks = Worksheets.Add


        strConn = "ODBC;DSN=MS Access Database;DBQ=" & ThisWorkbook.Path & Application.PathSeparator & DB_Name
        strSQL = "SELECT DataDate, FileType, AccountID, RecordCount, LineNumber, FileReference FROM tHistory WHERE AccountID =? ORDER BY DataDate DESC"


        With wks.QueryTables.Add(Connection:=strConn, Destination:=wks.Range("A4"), Sql:=strSQL)
            Set par = .Parameters.Add("AccountID", xlParamTypeVarChar)
            par.SetParam xlRange, wks.Range("C2")
            par.RefreshOnChange = True
            .Refresh BackgroundQuery:=False
        End With
        With wks.Range("A4").Resize(, 6)
            .Interior.ColorIndex = 24
            .HorizontalAlignment = xlCenter
        End With
        wks.Range("C4").Copy Destination:=wks.Range("C1")
        wks.Range("C2").Interior.ColorIndex = 6
        wks.Range("D2").Value = "<== Enter account ID here"
        wks.Columns(1).NumberFormat = "d-mmm-yy"
        wks.Range("A5").Select
        ActiveWindow.FreezePanes = True
        wks.Range("C2").Select
    End If
    Set wks = Nothing
End Sub
 
Upvote 0
PS

The result is just a list of files & information.

For the txt files, this includes the line number in the source file.

For the xls files, there is no line number rather a count of how many times that AccountID is in the source file.

It would be a small step, I think, to add some further code that would on the double click or right click from the returned results, open the source file and even go to the data of interest. This would help users for sure.

Til later, lunchtime for me.
 
Upvote 0
Hi, Paddy

This morning I re-opened the file from last week and the auto-refresh query wasn't auto-refreshing. I don't know why: it seems like a bug. So I've amended the 'ThisWorkbook' code. New code replaces previous.

regards, Fazza

Code:
Option Explicit


Private Sub Workbook_Open()


    Const str_MDB_FILE_NAME As String = "Accounts.mdb"        'name of mdb file to store data


    Call SetupFileAndDatabase(DB_Name:=str_MDB_FILE_NAME)
    Call SetupParamaterisedQuery(DB_Name:=str_MDB_FILE_NAME)


    'Pre-existing auto-refresh query wasn't auto-refreshing. Seems like a bug. Next line tries to overcome.
    ThisWorkbook.RefreshAll


End Sub
 
Upvote 0
Hi Fazza,

Thanks for that! Will get to run some tests in next day or so so I'll let you know how i get on.
 
Upvote 0
Hi Fazza,

Just thought I'd let you know I've been off work ill, so haven't been ignoring the help you provided, just haven't had opportunity to test etc. Hope to be back next week, so I'll get a bunch of questions soon :)
 
Upvote 0
And now I've bust my hand so not much typing for a while. grrrr...will get to testing some time i promise :)
 
Upvote 0

Forum statistics

Threads
1,215,561
Messages
6,125,542
Members
449,236
Latest member
Afua

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top