Can I create a pivot table from two separate worksheets?

scottbass

New Member
Joined
Sep 3, 2012
Messages
46
Hi,

Say I have these sheets:

Claims:

CaseIdCustomerIDClaimNumClaimLineAmountApplicable *
1A11100Y
1A12100N
1A13100Y
1A21100N
1A22100Y
2B11200Y
2B21200Y
2B22200N

<tbody>
</tbody>

Payments:

CaseIdCustomerIDRemitterDateCategory1Category2Category3Total
1AJohn15-Apr-11$50$50
1AMary20-Apr-11$50$50
1ABob30-Apr-11$50$50$100
2BScott01-Jul-13$40$20$10$70

<tbody>
</tbody>

Money:

What I want in this sheet is a complex pivot table, or perhaps a complex VLOOKUP or array formula (???), with the data from these two separate worksheets. I'm not sure if I can do this in Excel or not? If I can do it in Excel, it would be cool as the end user gets immediate feedback/recalculations.

Here is what I need:

CaseIdCustomerNumberTotalAmountApplicableAmountAgreedAmount **WriteOffPaidAmountOutstandingAmount
1A$500$300$200$100$200$0
2B$600$400$200$200$70$130

<tbody>
</tbody>

Definitions:

TotalAmount: The total of all claim lines in the Claims worksheet for a given CaseId
ApplicableAmount: The total of all claim lines in Claims for a given CaseId where Applicable = "Y". * Note the Applicable column is hand coded based on an analyst's investigation.
AgreedAmount: The amount that is agreed is owed based on legal discussions. ** Note that this column is hand coded by the analyst based on those legal discussions.
WriteOff: The difference between the ApplicableAmount and the AgreedAmount.
PaidAmount: The total of all payments for a given CaseId, where the Total column is the sum of payments in three different categories.
OutstandingAmount: The difference between the AgreedAmount and the PaidAmount.

I think (???) I could get close to this with a Pivot table for an individual worksheet. But, I'm unsure if a Pivot table can be created to combine the data from these two worksheets as shown above?

If you KNOW this can't be done, can you let me know asap so I can start working on other alternatives? Which will be to export the Excel data to a database, run a summary program on the two separate tables, join the data, and re-export back into Excel. The downside is this will not give the end user immediate feedback, as this ETL process would run overnight. Also, the derivations would need to be done by the summary process, i.e. there could be no formulas in this worksheet, which would be a bummer.

Thanks for any help you can provide. Much appreciated.

Regards,
Scott
 
3) Or perhaps this is what you mean??? I'll restate to see if I understand:

A) Run a query against claims and payments, writing the results to a scratch worksheet.
B) Run a 2nd query against the scratch worksheet and the money worksheet itself, joining (merging) the agreed amount into the results by caseid.
C) Refresh the lot on the money worksheet.

Is that correct :)


I don't think so. Trying again, after you've looked at some of the above linked earlier threads.

Set up a query table [this is a specific object in an Excel worksheet. It can have formulas with it that automatically extend/reduce (in rows) to match the extent of rows returned] on the 'Money' worksheet. It can pull the data you need from the claims & payments sheets and have a field where users can enter agreed amounts. And formulas for the subsequent fields relying on the agreed amount. The order of the fields can be anything you want except for the first field has to be returned from the query if you want full functionality automatically. So you can have a field or two returned by the query and then a field or two with formulas or empty for users to enter values, and then more fields from the query if you want. Normally the queried fields would be contiguous on the LHS but they don't need to be contiguous. I'm describing a query table: maybe called an external data table (or query) - Excel help will refer. Different from if you use ADO via VBA as it won't have the same functionality. If using recordsets via VBA you will need to programmatically handle fields other than those returned by the query.


When it is time to refresh the query (table), do this via VBA using a two step process. First is to refresh a new (maybe hidden) worksheet that has CaseID, CustomerID & AgreedAmount feeding from this Money worksheet. Then an instant later refreshing the money worksheet - and the money worksheet grabs pre-existing agreed amounts from the just saved new worksheet as well as the claims & payments worksheets.

Any clearer?
 
Upvote 0

Excel Facts

Formula for Yesterday
Name Manager, New Name. Yesterday =TODAY()-1. OK. Then, use =YESTERDAY in any cell. Tomorrow could be =TODAY()+1.
First of all, thanks Fazza for all your help. I really appreciate it. Just knowing about "query tables" enabled me to Google more effectively.

In particular, these links really helped:

http://www.fontstuff.com/ebooks/free/fsadoconnectexcel.pdf
Creating Dynamic Reports with Query Tables in Excel

Using the 2nd link as a template, I've created the below workbook. Since I can't seem to attach files using this forum software, I've uploaded it to the below public Dropbox location:

https://dl.dropboxusercontent.com/u/10002401/Test_Query_Table.xlsb

If you download the worksheet, everything should be apparent:


  • There are three worksheets Claims, Payments, Money mimicking the data from my original post
  • The Query worksheet is the output from the SQL query
  • The Hidden worksheet (not yet hidden) contains the SQL query to execute. Cell A1 is the named range "SQL", used by the VBA code to get the query string. In this way, it's very easy to change the query.
  • Cntl-T is hotkeyed to the Test() macro for fast testing of the query string.

The problem is, everything works fine for this query:

Code:
select 
*
from
(select caseid, sum(amount) as totalamt from [Claims$] group by caseid) a
left join
(select caseid, sum(amount) as applicamt from [Claims$] where applicable='Y' group by caseid) b
on 
a.caseid=b.caseid
However, this query generates the error "Syntax error (missing operator) in query expression ...":

Code:
select 
*
from
(select caseid, sum(amount) as totalamt from [Claims$] group by caseid) a
left join
(select caseid, sum(amount) as applicamt from [Claims$] where applicable='Y' group by caseid) b
on 
a.caseid=b.caseid
left join
(select caseid, sum(agreedamount) as agreedamt from [Money$] group by caseid) c
on 
a.caseid=c.caseid


I've pulled my hair out all day on this. There's nothing bloody well wrong with the query, and the error message is not real helpful! ;)

At first I suspected string variable truncation at 256 characters (lots of Google hits), but I've walked through this in the debugger and the string is fine as it's passed to rstData.Source. And I conjecture that the truncation on the MsgBox text output is a limitation of Err.Description.

I've seen numerous hits on this issue, here is one example:

sql - Are multiple JOINs unavailable when using ADODB to query an Excel file in a VBA procedure? - Stack Overflow

Any ideas? And is the syntax for ADODB SQL queries documented anywhere remotely official, or do I just have to keep Googling?

Thanks,
Scott
 
Upvote 0
thanks, Scott

I will look at your latest post soon. Some quick comments before then.

From a glance at the first two links, the first one is not using query tables. It has ADO recordsets. The second does create a query table. In VBA it can be done easier than that though. For sure there will be old posts of mine that do that.

And, in my earlier posts I have been thinking - if not explicitly explaining - about using query tables created without VBA. It really is simple. They have their limitations without VBA but are a simple introduction. All you need to do to get started is ALT-D-D-N and follow the wizard. (I haven't checked but I think i wrote that near the start of the thread.) Did you try that?

[So I think I haven't explained adequately that there are query tables that can be used in worksheets, without VBA. They can do a lot of things. If you look in Excel help there should be info. Search for external data query.]

regards
 
Upvote 0
Excel 2003 works for me, not newer.

I don't have a reference for documentation/specifications on SQL. I do recall years ago reading of a 65,000+ (strangely I recall it was not 65,536) string length for the SQL. Likely other problems would be encountered before hitting that limit!

Maybe the LEFT JOIN is the problem. Maybe help in MS Query discusses this??

I recall a UNION limit of 25 or 26 tables that someone bumped against in an old thread. I coded something to work around that to get to ~650 tables, FWIW.
 
Upvote 0
And, in my earlier posts I have been thinking - if not explicitly explaining - about using query tables created without VBA. It really is simple. They have their limitations without VBA but are a simple introduction. All you need to do to get started is ALT-D-D-N and follow the wizard. (I haven't checked but I think i wrote that near the start of the thread.) Did you try that?
I investigated Data -> From Other Sources --> From Microsoft Query, and tried to build the query, but it was choking when I manually edited the SQL to build the multi-table left join.

I've searched the Excel help on "query table" and "querytable" (including the quotes), but didn't get any helpful hits.

I'll try Alt-D-D-N tomorrow.
 
Upvote 0
Excel 2003 works for me, not newer.

I don't have a reference for documentation/specifications on SQL. I do recall years ago reading of a 65,000+ (strangely I recall it was not 65,536) string length for the SQL. Likely other problems would be encountered before hitting that limit!

Maybe the LEFT JOIN is the problem. Maybe help in MS Query discusses this??

I recall a UNION limit of 25 or 26 tables that someone bumped against in an old thread. I coded something to work around that to get to ~650 tables, FWIW.

Hi Fazza,

I'm under a lot of time pressure on this project, so I've also cross-posted to Excel Help Forum to get the widest audience. (I wouldn't normally do this if not for the time pressure; if the same folks monitor both forums it can be irritating.)

Anyway, I've got the ADO/recordset approach working with advice on changing the query string. The thread is here, including both an Excel 2007 and Excel 2003 version of the worksheet (that forum software allows file attachments, which would be useful in this forum):

Are multiple (4) left joins supported by ADODB

Perhaps my code isn't the simplest approach. As above, I'll investigate embedding a query table in Excel, and just calling the refresh method on the query table.

Thanks again for your patience and help.
 
Upvote 0
I investigated Data -> From Other Sources --> From Microsoft Query, and tried to build the query, but it was choking when I manually edited the SQL to build the multi-table left join.

I've searched the Excel help on "query table" and "querytable" (including the quotes), but didn't get any helpful hits.

I'll try Alt-D-D-N tomorrow.

Ok, so I tried Alt-D-D-N tonight...

I've uploaded an Excel 2007 and Excel 2003 workbook to Dropbox:

https://dl.dropboxusercontent.com/u/10002401/Test_Query_Table_With_Query_2007.xlsb
https://dl.dropboxusercontent.com/u/10002401/Test_Query_Table_With_Query_2003.xls

Run the Test macro to refresh the Query worksheet using ADO and recordset. Select the Query2 worksheet, select the table, and click Refresh to refresh the embedded query.

Things I've noted:


  1. The ADO/recordset approach (Test macro) refreshes the Query data without having to save the worksheet. IOW it's getting its source data from memory. The embedded query requires the workbook to be saved first or else the table is not refreshed with the latest data. Fiddle with CaseId in the Claims worksheet to test.
  2. The ADO/recordset approach is impervious to workbook renames, since the Data Source is set in VBA by ThisWorkbook.FullName, rather than being hardcoded in the embedded query. To me this is a big advantage, unless there's a way to make the data source for the embedded query dynamic.
  3. However, the ADO/recordset approach removes any formatting (such as autofiltering). I'd have to use VBA to format the output as desired. The embedded query retains the table formatting. Visually this is appealing, but see #2 above :(

I'm not sure which performs better. The embedded query certainly requires less VBA code.
 
Upvote 0
Hi, Scott

You've done some good work.

Addressing a couple of points.


1. There is something fundamental that I'm unclear on.

What happens if a user does some work filling in AgreedAmount values and then someone says, there were some changes to the Claims/Payments data please use this newer data. This last situation is the one I was addressing with the suggested query table and two step refresh previously described. The two steps enable retention of the AgreedAmount against CaseID.

So the basic idea I was heading to was VBA that was two lines (syntax for table object reference is different post 2003)

worksheet("whatever").querytables(1).refresh 'to record by caseid the agreed amount
worksheet(("Money"),querytables(1).refresh 'to update from claims, Payments AND whatever (to retain the previous agreedamount)

(I maybe had a simplistic set up in mind. Manually set up the queries and then just the simple refresh. Your real situation will for sure be more complex.)

It would be just a few more lines to adjust the connection string to cater for changes to the file path.

Have you catered for previous AgreedAmount data to be kept or do users start from a totally blank AgreedAmount column each time Claims or Payments refresh?


2. The file path hard coded in the query table. This is just how way Excel creates the SQL when the table is manually created. It can be deleted & makes no difference - unless you are pulling data from multiple different files. You can edit out the hard coded name and it will be OK.

A query table has two fundamental properties: the connection and the SQL. Edit the hard coded file path out of the SQL and then it is a simple matter, like the ADO approach, of changing the data source in the connection.

Maybe you can see these (query table .connection & .SQL aka commandtext) in the locals window during code execution.

Or debug.print to the immediate window. For excel 2003, something like

for each wks in worksheeets
for each qt in wks.querytables
debug.print qt.connection & vbcr & debug.print qt.sql & vbcr
next qt
next wks


I think you've got things in hand, and I don't know if there is anything else for me to comment on. if there is, please let me know.


regards
 
Last edited:
Upvote 0
Hi, Scott

You've done some good work.
Thanks for all the help

What happens if a user does some work filling in AgreedAmount values and then someone says, there were some changes to the Claims/Payments data please use this newer data.
There are actually more worksheets. The workflow goes like this:


  • Cases: A new case is opened by the analyst. This is a single row, with some general information, such as first and last name. Add a new row generates a "surrogate key" (unique identifier, just an incrementing integer (actually long)) called caseid. This surrogate key is the foreign key for the other worksheets, and is critical to everything hanging together.
  • Claims: A cut and paste from another application. Many lines per caseid. All lines must have the correct caseid. Other general info, such as first and last name, are copied to this and all other worksheets via INDEX/MATCH against Cases.
  • Payments: Incoming payments are entered here. Potentially multiple lines per caseid (partial payments).
  • Money: This is a rollup of "where we're at", and the subject of my post. All the data on this worksheet can be derived from Claims, Payments, and Money itself, except for AgreedAmount, which is the only editable item on this worksheet. This worksheet is a rollup to one line per caseid.

My vision is the query joins these three tables, updates a hidden or "scratch" worksheet, then I use VBA to copy the data back over Money, overwriting/replacing the old data. Or, potentially, the output location of the query can be the Money worksheet itself??? As long as I've regenerated all the columns in Money via the query, there should be no danger in replacing the lot.

worksheet("whatever").querytables(1).refresh 'to record by caseid the agreed amount
worksheet(("Money"),querytables(1).refresh 'to update from claims, Payments AND whatever (to retain the previous agreedamount)
The caseid would already be recorded against the agreed amount on the Money worksheet.

It would be just a few more lines to adjust the connection string to cater for changes to the file path.
Each analyst will have their own copy of the workbook, since shared workbooks are evil. They will all be in the same directory, so will have different names, probably with the analyst's userid in the filename. Plus they could move the workbooks around. If there is a way I can 1) create a query using the wizard, but 2) change the data source to ThisWorkbook.FullName prior to calling the refresh method, that would be my preference, rather than ADO and recordset. Plus this approach may be easier for whoever follows me to maintain. I'd also like to setup Conditional Formatting on the Money worksheet, and want that to remain in place after refresh.

Have you catered for previous AgreedAmount data to be kept or do users start from a totally blank AgreedAmount column each time Claims or Payments refresh?
When the user refreshes the Money worksheet, any new cases would result in AgreedAmount as blank. Once they find out the AgreedAmount, they enter it in. It then stays in the Money worksheet since the Money worksheet itself forms part of the query.

2. The file path hard coded in the query table. This is just how way Excel creates the SQL when the table is manually created. It can be deleted & makes no difference - unless you are pulling data from multiple different files. You can edit out the hard coded name and it will be OK.

A query table has two fundamental properties: the connection and the SQL. Edit the hard coded file path out of the SQL and then it is a simple matter, like the ADO approach, of changing the data source in the connection.
Ok, thanks, I think I got it. So I can override the data source saved via the query wizard via VBA.

Maybe you can see these (query table .connection & .SQL aka commandtext) in the locals window during code execution.
Or debug.print to the immediate window. For excel 2003, something like

for each wks in worksheeets
for each qt in wks.querytables
debug.print qt.connection & vbcr & debug.print qt.sql & vbcr
next qt
next wks
I haven't yet walked through the debugger using a query table created by the query wizard. So thanks for that. The debugger window is invaluable!


I think you've got things in hand, and I don't know if there is anything else for me to comment on. if there is, please let me know.
1) Using the query wizard, I first create a new data source. Is there a way to rename the data source after it's been created? (Minor)
2) I actually need two query tables on my workbook (the 2nd is just a simple rollup, similar to a pivot table). Once I create a data source, can I use it for both queries?
3) Is the data source and query(ies) saved in the workbook? (I'm not exporting them) IOW all I need to send to the end user is the worksheet itself?
[/QUOTE]

If you're ever in Sydney I owe you a beer or three ;)
 
Upvote 0
Thanks for the further explanation. As was becoming more obvious, there was more to it & you are on top of it.

OK, payments are entered directly in the spreadsheet. I had expected like the Claims data it would come from another application/database.

Thank you for explaining your vision of the updating process. That is how you retain already entered AgreedAmount data. Via the hidden/scratch worksheet & VBA. That might be a fraction more work than the query I was envisaging. Maybe just an INDEX/MATCH formula, paste to values only, remove error values. You'll be programmatically getting to the same end point. As you identified already, the query table approach retains formatting & formulas automatically.

The caseid would already be recorded against the agreed amount on the Money worksheet.
Yes, and to re-iterate, I was querying the Money sheet to my scratch sheet with : SELECT CaseID, AgreedAmount FROM [Money$]
as my first query of a two step refresh, and then refreshing the Money sheet to get latest Claims, Payments and from my scratch sheet AgreedAmount

Something like : SELECT all fields for money sheet FROM Claims, Payment, ScratchSheet

So doing the two steps so that the AgreedAmount is simply retained without the mucking around required with an alternative approach. I still am unsure if you're clear on that approach? Though I see you have the same result as you describe Money sheet refreshes have blanks in the AgreedAmount for new records and other values stay there 'since th Money worksheet itself forms part of the query'. It sounds like you're doing the refresh a little like I'm thinking except with a recordset to re-generate the entire Money data each time, hence you query Money when making that recordset. Whereas I am keeping the exisiting Money data and using the extra scratch table to retain the pre-existing AgreedAmount values.

Shared workbooks for sure are evil!

If there is a way I can 1) create a query using the wizard, but 2) change the data source to ThisWorkbook.FullName prior to calling the refresh method, that would be my preference
That is easy.

Conditional formatting in Excel 2007 might be a nuisance. I have used it & found problems such that for me the solution was to re-create it after updates. You may be ok though

So I can override the data source saved via the query wizard via VBA.
Yes, simple text manipulation.

Your questions.

1) Using the query wizard you first create a data source. I've never done that, and I've been doing these queries for ~10 years. ALT-D-D-N and then Excel (or whatever) files & follow wizard.

2) I just make each query from scratch, and like I just wrote, I don't use data sources. Here are three basic ways to do these things: one, create them manually (no VBA) once, refresh manually as required; two, create them manually & use VBA to refresh (probably just the connection once you ensure there are no hard coded file paths in the SQL. Easy enough too to just put the SQL in each time); three, create them via VBA (to be sure they are what you are expecting, & not changed by users) & refresh using VBA.

3) Yes, the .connection and .SQL (aka .commandtext) are in the object in the worksheet. All you need is send the end user the worksheet itself and the VBA can refresh the .connection & the query


Note the code I gave last time to debug.print the .connection and .sql is for Excel 2003 & earlier. For later versions the query table object reference has changed.

Yes, all this writing is thirsty work. I'm sure if we were standing at the one computer discussing this it would have taken only 5 to 10 minutes!!

A beer'd be nice. Thanks, Scott. If you're over in Perth I can shout you a couple
(y)
 
Upvote 0

Forum statistics

Threads
1,214,951
Messages
6,122,442
Members
449,083
Latest member
Ava19

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top