Go through subfolders, select matching files and copy a worksheet from each into a master file

Rainmanne

Board Regular
Joined
Mar 10, 2016
Messages
130
Office Version
  1. 2013
Platform
  1. Windows
I've got a big folder with weekly reports in a format "YYYY-MM.DD FILE NAME.xlsx". Each report has a worksheet named "DD report". There are 2.5 years of weekly report. I need to analyse them and I want to copy the "DD report" worksheet from each into one master file under the date name (so "DD report" from the example will become YYYYMMDD) . But it looks like quite an endeavour for me. So far I have managed to come up with a code which apparently goes through all subfolders and finds the matching files (but it does not work if I put "*FILE NAME.xlsx" as a pattern but looks like working with a different file like "MTM*.*". Now I am stuck as I have no idea how to basically open each file, copy the sheet and rename it. I can probably manage to copy and rename operations but I do not know how to start with opening each matching file. Any help please?

VBA Code:
Sub FindPatternMatchedFiles()

    Dim objFSO As Object
    Set objFSO = CreateObject("Scripting.FileSystemObject")

    Dim objRegExp As Object
    Set objRegExp = CreateObject("VBScript.RegExp")
    objRegExp.Pattern = "MTM*.*"
    objRegExp.IgnoreCase = True

    Dim colFiles As Collection
    Set colFiles = New Collection
    
    Dim folderName As String
    Dim folder As Integer
    
    'Set the folder name to a variable
        folder = Application.FileDialog(msoFileDialogFolderPicker).Show
            If folder <> 0 Then
                folderName = Application.FileDialog(msoFileDialogFolderPicker).SelectedItems(1)

            End If
    

    RecursiveFileSearch folderName, objRegExp, colFiles, objFSO

    For Each f In colFiles
        Debug.Print (f)
        'Insert code here to do something with the matched files
    Next

    'Garbage Collection
    Set objFSO = Nothing
    Set objRegExp = Nothing

End Sub

Sub RecursiveFileSearch(ByVal targetFolder As String, ByRef objRegExp As Object, _
                ByRef matchedFiles As Collection, ByRef objFSO As Object)

    Dim objFolder As Object
    Dim objFile As Object
    Dim objSubFolders As Object

    'Get the folder object associated with the target directory
    Set objFolder = objFSO.GetFolder(targetFolder)
    
    MsgBox targetFolder

    'Loop through the files current folder
    For Each objFile In objFolder.Files
        If objRegExp.test(objFile) Then
            matchedFiles.Add (objFile)
        End If
    Next

    'Loop through the each of the sub folders recursively
    Set objSubFolders = objFolder.SubFolders
    For Each objSubfolder In objSubFolders
        RecursiveFileSearch objSubfolder, objRegExp, matchedFiles, objFSO
    Next

    'Garbage Collection
    Set objFolder = Nothing
    Set objFile = Nothing
    Set objSubFolders = Nothing

End Sub
 
Also from one point the columns shift (an extra column is added). I wonder if it's possible to realign them in the same query or I will have to do two queries for each dataset.
Would be a little bit more transformations, but in second way, I can get exact columns which I want - based on column headers.
Not big deal, to identify column (for example) Sales, even in one of file Sales is on 2nd column, in other - on 6th.
 
Upvote 0

Excel Facts

Excel Joke
Why can't spreadsheets drive cars? They crash too often!
Also, when I go to Get Data and select From Folder, what should I do next: Combine, Load or Transform Data? I have selected Transform Data but now I am not sure if it was correct.
If you want to transform more your data, need to use Transform Data, this is a middle step before combining.
If you want to combine data and continue with transformations after combining, need to use Combine.
Depends of data, your plans for transformations and also your experience in Power Query and M Language.

As I wrote you I prefer to use more efficient way from link which I sent.
 
Upvote 0
If you want to transform more your data, need to use Transform Data, this is a middle step before combining.
If you want to combine data and continue with transformations after combining, need to use Combine.
Depends of data, your plans for transformations and also your experience in Power Query and M Language.

As I wrote you I prefer to use more efficient way from link which I sent.
Thanks a lot. But I still struggling with it. Probably I need more background to understand the stuff in the link as it refers to some other posts. Basically, this is a design of all source files. It's a summary worksheet in a workbook.

DATA REPORT
31/08/2023​
31/08/2023​
I.TypeGross ValueShareValueAdj.Value
a.Type 110100%1010
b.Type 22525%6.256.25
c.Type 32335%8.058.05
d.Type 41947%8.938.93
e.Type 5010%00
f.Type 6067%00
Total
77​
33.23​
33.23​
Limit50
50​
Item 1
-5​
100%​
-5​
Item 2
-12​
100%​
-12​
Item 3
-7​
100%​
-7​
Total Items
-24​
-24​
Balance
9.23​
9.23​

I need to transform it into something like that:

Type
31/08/2023​
07/09/2023​
14/09/2023​
21/09/2023​
28/09/2023​
05/10/2023​
etc.
Type 1
Type 2
Type 3
Type 4
Type 5
Type 6


Where each column contains data from each file from the second Value column and the header is a date. While the worksheets are identical in all files, for some reason columns from some files are shifted.

The task looks very simple but I am still struggling.
 
Upvote 0
OK, I have figured out how to do what I need though not in the most efficient way. I did not realise that I need to use Pivot Table to arrange the data. SO I have removed everything I did not need and then used pivot to create a summary woorksheet.

My only issue is with the fact that the layout of the reports was slightly changing over time. The items remained the same but at some point new rows were added (which I do not need in the summary anyway) and then a new column (also not required for my summary). So I had to do the whole thing in batches querying reports with the same layout in each batch. And then just copying and pasting from them into one master spreadsheet. I wonder if it's possible to deal with the changes in the layout so I could do the whole 2.5 years in one go.
 
Upvote 0
My only issue is with the fact that the layout of the reports was slightly changing over time. The items remained the same but at some point new rows were added (which I do not need in the summary anyway) and then a new column (also not required for my summary). So I had to do the whole thing in batches querying reports with the same layout in each batch. And then just copying and pasting from them into one master spreadsheet. I wonder if it's possible to deal with the changes in the layout so I could do the whole 2.5 years in one go.
You don't need to do that. It's possible to transform data and to load it at once - with one query. Any changes can be handled - additional column/s, re-arranged column/s, additional row/s, etc. Depends of transformations, query can be more or less complex.
I was played with your sample table, inserted column, and data table for pivoting looks fine.

I'm not sure how to send you source code and explanation about it, but it's possible to be done.
 
Upvote 0
You don't need to do that. It's possible to transform data and to load it at once - with one query. Any changes can be handled - additional column/s, re-arranged column/s, additional row/s, etc. Depends of transformations, query can be more or less complex.
I was played with your sample table, inserted column, and data table for pivoting looks fine.

I'm not sure how to send you source code and explanation about it, but it's possible to be done.
Thanks a lot so much! I am not sure either regarding sending a workbook. The issue is more like I could not open it on my work laptop due to security restrictions and I do not use Windows on my private laptop. I wonder if the source code could be just extracted and posted here together with explanations?
 
Upvote 0
For your VBA code, trying replacing the following...

VBA Code:
    For Each f In colFiles
        Debug.Print (f)
        'Insert code here to do something with the matched files
    Next

with

VBA Code:
    '
    '
    '

    Dim destinationWorkbook As Workbook
    Dim destinationWorksheet As Worksheet
    Dim currentWorkbook As Workbook
    Dim currentRange As Range
    Dim NextRow As Long
  
    Set destinationWorkbook = Workbooks.Add(xlWBATWorksheet)
    Set destinationWorksheet = destinationWorkbook.Worksheets(1)
  
    NextRow = 2
    For Each f In colFiles
        Debug.Print (f)
        'Insert code here to do something with the matched files
        Set currentWorkbook = Workbooks.Open(Filename:=f, UpdateLinks:=False, ReadOnly:=True) 'set UpdateLinks to True if you're workbooks can contain links, and you want them updated
        Set currentRange = currentWorkbook.Worksheets(1).UsedRange
        currentRange.Copy destinationWorksheet.Cells(NextRow, "A")
        NextRow = NextRow + currentRange.Rows.Count
        currentWorkbook.Close SaveChanges:=False
    Next
  
    'etc
    '
    '

Also, if you only want the values copied to worksheet without the formatting, etc, try replacing...

VBA Code:
        currentRange.Copy destinationWorksheet.Cells(NextRow, "A")

with

VBA Code:
        With currentRange
            destinationWorksheet.Cells(NextRow, "A").Resize(.Rows.Count, .Columns.Count).Value = .Value
        End With

Hope this helps!
Thanks a lot for your help. I did not use it as Power Query has turned out to be an easier root in that case. But I might use it in future if I will need to automate the process.
 
Upvote 0
Thanks a lot so much! I am not sure either regarding sending a workbook. The issue is more like I could not open it on my work laptop due to security restrictions and I do not use Windows on my private laptop. I wonder if the source code could be just extracted and posted here together with explanations?
OK, will try. :)
At first, I prepared table with settings:
1716557442431.png

Used this table as way to change it in case that you can use excel files.

Prepared queries are:
1716557609877.png


Query Settings just load settings table to use as parameters.
Power Query:
let
    Source = Excel.CurrentWorkbook(){[Name="Settings"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Parameter", type text}, {"Value", type text}})
in
    #"Changed Type"

Query DataFolder make transformations and loading data:
Power Query:
let
    Path = Settings{[Parameter = "Path to Data"]}[Value], // Loading defined path with data files
    FileName = Settings{[Parameter = "String in FileName"]}[Value], // Get pattern from Setings table for filtering data files
    SheetName = Settings{[Parameter = "Sheet to load"]}[Value], // Get sheet name from Settings table
    ColumnsToLoad = Text.Split(Settings{[Parameter = "ColumnsToLoad"]}[Value], "; "), // Separate name of columns from Settings table
    ColumnsToFilter = Table.Buffer(Table.FromList(ColumnsToLoad)), // Convert to table with 1 column for filter
    Source = Folder.Files(Path), // Loading file list from folder with data
    #"Keep DataFiles" = Table.SelectRows(Source, each Text.Contains([Name], FileName) and not Text.Contains([Name], "~")), // Filter just files with data (variable FileName)
    LoadingData = Table.TransformColumns(#"Keep DataFiles", {"Content", each Excel.Workbook(Binary.Buffer(_)){[Name = SheetName]}[Data]}), // Loading information from files, just sheet with name as variable SheetName defined in Settings table
    #"Extract Date" = Table.AddColumn(LoadingData, "Date", each #date(Value.FromText(Text.Start([Name], 4)), Value.FromText(Text.Middle([Name], 8, 2)), Value.FromText(Text.Middle([Name], 5, 2))), type date), // Extract date from file name. In case that not possible it's possible to extract it from data in the file
    #"Removed Other Columns" = Table.SelectColumns(#"Extract Date",{"Content", "Date"}), // Keep just colums with important data
    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Helper", each Table.UnpivotOtherColumns([Content], {}, "Columns", "Data")), // Create additional helper column with unpivoted data for each data file
    #"Filter Data Columns" = Table.TransformColumns(#"Added Custom", {"Helper", each Table.NestedJoin(_, {"Data"}, ColumnsToFilter, {"Column1"}, "Custom1", JoinKind.RightOuter)[Columns]}), // This step filter just columns defined as data columns (variable ColumnsToFilter) in Settings table. This list will be used in next step as filter for loading information.
    #"Keep Data Columns" = Table.AddColumn(#"Filter Data Columns", "Data", each Table.SelectColumns([Content], [Helper])), // Keep just columns with data (list from previous step)
// Next transformations are just in added column Data, which contains data from each file. All transformation bello are to extract pure data.
    #"Pure Data" = Table.TransformColumns(#"Keep Data Columns", {"Data", each Table.SelectRows(_, (s) => s[Column1] <> null and s[Column2] <> null)}), // Filter data removing blank rows or useless. In case that real data is comlex, need to find suitable filter setting
    #"Keep DataColumns" = Table.TransformColumns(#"Pure Data", {"Data", each Table.SelectColumns(_, List.RemoveFirstN(Table.ColumnNames(_), 1))}), // Remove first column, with a., b., etc.
    #"Add Headers" = Table.TransformColumns(#"Keep DataColumns", {"Data", each Table.PromoteHeaders(_, [PromoteAllScalars=true])}), // Promoting headers for table with information from each file.
    #"Added Date in DataTable" = Table.AddColumn(#"Add Headers", "ForLoading", each Table.AddColumn([Data], "Date", (s) => _[Date], type date)), // Copy date inside data table. It was done for combining all data faster.
    #"KeepColumn with Data" = Table.SelectColumns(#"Added Date in DataTable",{"ForLoading"}), // Keep just column with data
    #"Loading All Data" = Table.Combine(#"KeepColumn with Data"[ForLoading]), // Loading data from all files in one table
    #"Changed Type" = Table.TransformColumnTypes(Custom1,{{"Type", type text}, {"Gross Value", type number}, {"Share", type number}, {"Value", type number}, {"Value_1", type number}}) // Fixing data types for each column.
in
    #"Changed Type"

In case that you prefer more standard (and a little bit slower way). You can use generated from Excel queries:
1. StandardLoading. This query in most cases is just for loading data.
Power Query:
let
    Path = Settings{[Parameter = "Path to Data"]}[Value],
    Source = Folder.Files(Path),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    #"Extract Date" = Table.AddColumn(#"Expanded Table Column1", "Date", each #date(Value.FromText(Text.Start([Source.Name], 4)), Value.FromText(Text.Middle([Source.Name], 8, 2)), Value.FromText(Text.Middle([Source.Name], 5, 2))), type date),
    #"Changed Type" = Table.TransformColumnTypes(#"Extract Date",{{"Type", type text}, {"Gross Value", type number}, {"Share", type number}, {"Value", type number}, {"Value_1", type number}})
in
    #"Changed Type"
2. Transform Sample File. This query transform each file. In case that you prefer to change transformations, need to focus on it and make all transformations which you want. This transformations will be used for each file.
Power Query:
let
    Source = Excel.Workbook(Parameter1, null, true),
    DataSheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    ColumnsToLoad = Text.Split(Settings{[Parameter = "ColumnsToLoad"]}[Value], "; "),
    ColumnsToFilter = Table.Buffer(Table.FromList(ColumnsToLoad)),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(DataSheet, {}, "Columns", "Value"),
    #"Data Columns" = Table.NestedJoin(#"Unpivoted Other Columns", {"Value"}, ColumnsToFilter, {"Column1"}, "Filter", JoinKind.RightOuter)[Columns],
    #"Keep Data Columns" = Table.SelectColumns(#"DataSheet", #"Data Columns"),
    #"Filtered Rows" = Table.SelectRows(#"Keep Data Columns", each ([Column1] <> null) and ([Column2] <> null)),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",List.RemoveFirstN(Table.ColumnNames(#"Filtered Rows"), 1)),
    #"Promoted Headers1" = Table.PromoteHeaders(#"Removed Other Columns", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers1",{{"Type", type text}, {"Gross Value", type number}, {"Share", type number}, {"Value", type number}, {"Value_1", type number}})
in
    #"Changed Type"
Most of steps in this query are the same as steps from query DataFolder.
3. Function query Transform File. This function is the same query as Transform Sample File. Prepared as function can be called in another query (in this case in StandardLoading).

Think that not missed anything, hope that would be useful for you and your task :)
 
Upvote 0

Forum statistics

Threads
1,217,367
Messages
6,136,136
Members
449,994
Latest member
Rocky Mountain High

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top