Finding unmatched records for two identical queries

dankar

Board Regular
Joined
Mar 23, 2016
Messages
91
Hi All,

I have two identical queries from two Access databases, I need to check any unmatched fields every week, what will be the best solution to check that, please help me in detail If possible.

Both tables have more than 300k records

The other table will be added as a linked table in the current database.

* I need to find any unmatched fields so I can update it and keep both identical.

Please let me know if you have any question?

Thanks

ID= Unique


IDRoomFreezerShelfRackBoxPosition
 

JonXL

Active Member
Joined
Feb 5, 2018
Messages
450
Office Version
  1. 365
  2. 2016
Platform
  1. Windows
Having done something similar myself, I'll throw in my two cents.

What xenou says about the null values is spot on. You have to use Access's Nz() function to resolve null fields to values that can be compared, such as zero-length strings.

My experience, though, using tables with so many records was that trying to compare all the fields of all the records to all the fields of all the other records is that it can't be done practically in a query. You can write up a beautiful and accurate SQL statement but it will take Access so long to execute it that you won't get your results in anything of a reasonable time.

The approach I ended up going with was to loop through each table's records one-by-one in VBA and compare them to the other table's and when mismatched records were found to write the ID to a third table. Then the process would be done again in the opposite direction (comparing the second table to the first and writing its IDs to a fourth table). Once you have your IDs - and assuming there aren't a gazillion of them - you can more easily use a query to join the tables, make updates, etc.

Those VBA operations together took between 4 and 8 hours to run on record sets of about 200,000 records each with around 20 fields on each record.

Maybe a better way existed for doing that, but I wasn't able to find one at the time. If you find success running these solely as queries, I'd be interested in seeing your full solution.
 

Some videos you may like

Excel Facts

Workdays for a market open Mon, Wed, Friday?
Yes! Use "0101011" for the weekend argument in NETWORKDAYS.INTL or WORKDAY.INTL. The 7 digits start on Monday. 1 means it is a weekend.

dankar

Board Regular
Joined
Mar 23, 2016
Messages
91
You need to explicitly test all the fields in the left join (not just join on ID - join on all the fields).

For example:

SQL:
select
    t1.ID,
    t1.Room as Room1,
    t1.Freezer as Freezer1,
    t1.Shelf as Shelf1,
    t1.Rack as Rack1,
    t1.Box as Box1,
    t1.Position as Position1,
    t2.Room as Room2,
    t2.Freezer as Freezer2,
    t2.Shelf as Shelf2,
    t2.Rack as Rack2,
    t2.Box as Box2,
    t2.Position as Position2
from #Table1 t1
left join #Table2 t2
on t1.ID = t2.ID
where
    (
    Nz(t1.Room, '---') <> Nz(t2.Room, '---')
    or Nz(t1.Freezer, '---') <> Nz(t2.Freezer, '---')
    or Nz(t1.Shelf, '---') <> Nz(t2.Shelf, '---')
    or Nz(t1.Rack, '---') <> Nz(t2.Rack, '---')
    or Nz(t1.Box, '---') <> Nz(t2.Box, '---')
    or Nz(t1.Position, '---') <> Nz(t2.Position, '---')
    )

It is important to remember that Null is always not equal to Null, which can be a false positive in a case where the first and the second table both have Null in the same field -- they would be considered inequal by the rules of equality, but to human thinking if both are Null then they are really the same.

Anyway, that's why the carefulness with Null values...

The rows with all Nulls on the table2 side represent records in table1 that are not in table2. The rest would be records in table1 that are in table2 but have different field values.
Hi Xenou,

Thank you very much for you help.

The SQL statement you wrote worked perfectly, I will try it in the real database and let you know ( the one I tried is a small demo one (only 10 samples) similar to the real one.


Sorry ,I missed one thing hope you can help me and adjust :

The employees in the old database (table2), they enter the position as "A1,A2,....etc) "CharacterNumber", but in the new database (table1) they enter it as "A , 1" "Character , Number")

they are the same but the way its entered is different.

is there a way to let Access pretend they are the same when the query run the compare?

1607354493655.png
 

dankar

Board Regular
Joined
Mar 23, 2016
Messages
91
the way I think..is to add a new column
"Pos new" removing the comma and the spaces and then compare the new column with table2.pos
what will be the formula to do that, I did it in excel can I use the same formula in access?
 

dankar

Board Regular
Joined
Mar 23, 2016
Messages
91
This is what I go but still the issue with the null value :(

** Pos New: Left([Pos],1) & " , " & Right([Pos],Len([Pos])-1)

what to add to make the null stay as is?

Thanks

1607360837512.png
 

xenou

MrExcel MVP
Joined
Mar 2, 2007
Messages
16,825
Office Version
  1. 2019
Platform
  1. Windows

ADVERTISEMENT

I would probably use a strategy of replacing (removing) spaces and commas... (no need to worry about unusual values with maybe two spaces instead of one space)...

SQL:
IIf(IsNull([Position]), null, Replace(Replace([Position],' ', ''),',',''))

This could be incorporated into the query without having to create a new column (but both ways are fine):
SQL:
where
    (
    Nz(t1.Room, '---') <> Nz(t2.Room, '---')
    or Nz(t1.Freezer, '---') <> Nz(t2.Freezer, '---')
    or Nz(t1.Shelf, '---') <> Nz(t2.Shelf, '---')
    or Nz(t1.Rack, '---') <> Nz(t2.Rack, '---')
    or Nz(t1.Box, '---') <> Nz(t2.Box, '---')
    or Nz(IIf(IsNull(t1.Position), Null, Replace(Replace(t1.Position, ' ', ''), ',', '')), '---') <> Nz(IIf(IsNull(t1.Position), Null, Replace(Replace(t2.Position, ' ', ''), ',', '')), '---')
    )

To JonXL's point, I would be a bit concerned about a lot of functions getting into the query - may slow down what is already a pretty heavy query process. So updating a special column for comparison purposes might be worth the extra work. While on the other hand if the query works fast enough it is simpler to not have to update the extra column or even have the extra column in the table.

Sorry couldn't test this - I really hope I got my parentheses right!!!!


Note that now that I think of it this should work too:

SQL:
Replace(Replace([Position],' ', ''),',','')

i.e.,
SQL:
or Nz(Replace(Replace(t1.Position, ' ', ''), ',', ''), '---') <> Nz(Replace(Replace(t2.Position, ' ', ''), ',', ''), '---')

Which is to say, I'm assuming all the replace operations will do nothing on Null and such fields would simply remain Null.... but not sure (I might have guessed the same about your original formula too, but maybe there it is the Len() function that blows up on nulls.
 
Last edited:

dankar

Board Regular
Joined
Mar 23, 2016
Messages
91
I would probably use a strategy of replacing (removing) spaces and commas... (no need to worry about unusual values with maybe two spaces instead of one space)...

SQL:
IIf(IsNull([Position]), null, Replace(Replace([Position],' ', ''),',',''))

This could be incorporated into the query without having to create a new column (but both ways are fine):
SQL:
where
    (
    Nz(t1.Room, '---') <> Nz(t2.Room, '---')
    or Nz(t1.Freezer, '---') <> Nz(t2.Freezer, '---')
    or Nz(t1.Shelf, '---') <> Nz(t2.Shelf, '---')
    or Nz(t1.Rack, '---') <> Nz(t2.Rack, '---')
    or Nz(t1.Box, '---') <> Nz(t2.Box, '---')
    or Nz(IIf(IsNull(t1.Position), Null, Replace(Replace(t1.Position, ' ', ''), ',', '')), '---') <> Nz(IIf(IsNull(t1.Position), Null, Replace(Replace(t2.Position, ' ', ''), ',', '')), '---')
    )

To JonXL's point, I would be a bit concerned about a lot of functions getting into the query - may slow down what is already a pretty heavy query process. So updating a special column for comparison purposes might be worth the extra work. While on the other hand if the query works fast enough it is simpler to not have to update the extra column or even have the extra column in the table.

Sorry couldn't test this - I really hope I got my parentheses right!!!!


Note that now that I think of it this should work too:

SQL:
Replace(Replace([Position],' ', ''),',','')

i.e.,
SQL:
or Nz(Replace(Replace(t1.Position, ' ', ''), ',', ''), '---') <> Nz(Replace(Replace(t2.Position, ' ', ''), ',', ''), '---')

Which is to say, I'm assuming all the replace operations will do nothing on Null and such fields would simply remain Null.... but not sure (I might have guessed the same about your original formula too, but maybe there it is the Len() function that blows up on nulls.
Thank you sooo much xenou,

I posted the image below to make sure everything is right.

I tried it so far on the small demo tables and it worked perfectly, I will try it now on the main one and see how it will work with over 200k

I tried your last formula and I got "#ERROR" for the Null values so I kept the first one you wrote (the one with IIf )

I have two questions please :

1. I used the formula only for table1 (T1) (highlighted in yellow) since its the only table that have the commas and spaces, would this be correct? ( I ran they query and it worked, just want to make sure)

2. From your expert for a big data over 200k, which one will be faster (to add new column removing the spaces and then saving the query and link it with the second table (to prevent using another formula in the query) or using it incorporated into the query is better.

thanks again for your help.

1607438129545.png
 

xenou

MrExcel MVP
Joined
Mar 2, 2007
Messages
16,825
Office Version
  1. 2019
Platform
  1. Windows

ADVERTISEMENT

For your two questions,
in the first case yes you really only need it in table1, but it should do no harm even if used in table2.
In the second case, I would probably just not add a new column and if the performance is okay then I'd keep it that way. Okay performance is subjective - it depends on your requirements and expectations. But all things being equal if you can accomplish this in a single query then that's best (no extra steps to pay attention to).
 

dankar

Board Regular
Joined
Mar 23, 2016
Messages
91
I cant thank you enough xenou for your help. the query worked perfectly and very fast for around 120K row.

Except one thing I couldn't know why:

while checking the results I found that the query retrieved also all blank results meaning it retrieved the IDs that have null in all fields (all field are null in both tables for the same ID)

I tried the same scenario in the small testing Demo tables and it was fine (the query here didn't retrieve the null identical), could you have an explanation for that (What's your thoughts)?

In addition, I removed the outer join, I just used "Only include rows where the joined fields from both tables are equal" as it will matter with this query and also both tables will be the same.

Thanks again
 

dankar

Board Regular
Joined
Mar 23, 2016
Messages
91
This is how the results looked like for the Null values:

1607544066572.png


Can I also ask what's the idea of the "---" ,can you explain it to me? thank you
 

xenou

MrExcel MVP
Joined
Mar 2, 2007
Messages
16,825
Office Version
  1. 2019
Platform
  1. Windows
Sure, it makes sense to use inner join as long as you aren't looking for unmatched records (i.e. an ID that is in one table that is not in the other). If you are always comparing ids that exist in both tables then inner join is best (and faster).

The idea with "---" ...

consider these two made up rows:

1 Cat Dog Null Rat
1 Cat Dog Null Rat

These two rows would be considered "not equal" by usual SQL rules, because 1 = 1, Cat = Cat, Dog = Dog, Rat = Rat, but ... Null <> Null !!
Usually for comparing data we would want to consider them as equal.
So my query in fact would treat the two rows as:

1 Cat Dog --- Rat
1 Cat Dog --- Rat

Now the two nulls are compared both as being three dashes (an arbitrary string value that means nothing in itself) and they can be evaluated as equal for that reason. That's the idea, anyway. I'm not sure I'm understanding what I'm seeing in the above picture (what is Expr1015, Expr1016, etc.) ... basically if Room, Freezer, Shelf, Rack, Box, and Position are null on both sides then they should come out as "equal" in this query (therefore, not included in results if all fields are "equal").

For the benefit of wider discussion (but irrelevant otherwise) I'll mention that in most database systems you could just use EXCEPT and usually Nulls are handled correctly too:
one way --> select * from Table1 EXCEPT select * from Table2
both ways --> select * from Table1 EXCEPT select * from Table2 UNION select * from Table2 EXCEPT select * from Table1
 
Last edited:

Watch MrExcel Video

Forum statistics

Threads
1,127,848
Messages
5,627,244
Members
416,233
Latest member
Riddlemethis

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top