SELECT TOP n, where n > total

Jaymond Flurrie

Well-known Member
Joined
Sep 22, 2008
Messages
919
Office Version
  1. 365
Platform
  1. Windows
Should be an easy question: If I want top 100000 records and the source happens to have only, say, 100 records, do I need to check something or is SQL smart enough to understand that it takes smaller of these two (either that top ( = 100000) or then everything (= 100))?

The idea is that I want no more than 100000 records as the result and if the source happens to have more, I pull out just a sample of 100000 and I thought to use "SELECT TOP 100000 * FROM customers ORDER BY NEWID()" to accomplish this. Can you spot any possible problems with this?

edit. Another question to continue this a bit is that can I order them by CustomerID as well? I mean, let's say I have only five customers there and I execute the "NEWID"-lottery, getting the table like
1 5
2 3
3 4
4 2
5 1

where on left is CustomerID and on right is the NEWID. I get as the result 5,4,2 and I would like to have them as 2,4,5. What's the smartest way to rearrange them?
 
Last edited:

Excel Facts

Format cells as date
Select range and press Ctrl+Shift+3 to format cells as date. (Shift 3 is the # sign which sort of looks like a small calendar).
I tried this:

"SELECT TOP 50 CustomerID FROM [customers$] ORDER BY NEWID()"

and it says "Undefined function 'NEWID' in expression"

What's the way to pick a random sample? I'm actually using Excel 2010 with "Microsoft ActiveX Data Objects 2.5 library" so this might be slightly out of Access topic, but I just thought that since this is a database question this would fit here.

edit. I tried this:
"SELECT TOP 50 CustomerID FROM [customers$] ORDER BY rnd(CustomerID)"

It gives "Invalid use of Null"-error.
 
Last edited:
Upvote 0
I'm trying now with IS NOT NULL


(sorry for the needless bumping, but this forum was clearly supposed to work that way with the limitation of editing messages.)
 
Upvote 0
Where did NEWID() come from?

I've never heard of such function in Access or ADO, is it something you've created.

As for the ORDER BY rnd(CustomerID), I'm pretty sure you need a field there not just a value.

Mind you, in some versions of SQL you can specify the index number of the field to sort by based on it's position in the list of fields the query will return.

Anyway back to the actual question, I'm afraid I don't quite understand what you mean.

If you specify to return the top N records, and there are X records where N is larger than X then only X records will be returned.

Or is it perhaps that you want to return 100000 records from 100 records?
 
Upvote 0
Where did NEWID() come from?

I've never heard of such function in Access or ADO, is it something you've created.

As for the ORDER BY rnd(CustomerID), I'm pretty sure you need a field there not just a value.

Mind you, in some versions of SQL you can specify the index number of the field to sort by based on it's position in the list of fields the query will return.

Anyway back to the actual question, I'm afraid I don't quite understand what you mean.

If you specify to return the top N records, and there are X records where N is larger than X then only X records will be returned.

Or is it perhaps that you want to return 100000 records from 100 records?

The NEWID() works in some SQL-versions, but seems like in Access/Excel/VBA/AdoDB rnd is the way to go. CustomerID is a field and it seems to pick those correctly now.

And yes, you got the question right. Let's say I want top 5, and I only have three records to pick the five from, so it seems to work so tat I get actually only three and not like it could work where VBA says "you want to pick five of three!? You do understand that's impossible, try again Einstein!", but like said, it's smart enough to know that if I want more than there is, it just gives the ones there are without an error, so this works perfectly in my opinion.

The only question I seem to have left now is the syntax of back-to-back "ORDER BY"s, I mean, let's say I have only five customers there and I execute the "NEWID"-lottery, getting the table like
1 5
2 3
3 4
4 2
5 1

where on left is CustomerID and on right is the rnd(CustomerID). I get as the result 5,4,2 and I would like to have them as 2,4,5. What's the smartest way to rearrange them?
 
Upvote 0
Seems like I solved it. The trick is to do a subquery like this:


SELECT CustomerID
FROM [customers$]
WHERE CustomerID IN
(SELECT TOP 100000 CustomerID
FROM [customers$] AND CustomerID IS NOT NULL
ORDER BY Rnd (-10000000 * TimeValue(Now()) * [CustomerID]))
ORDER BY CustomerID


Things to note there:
1) The Rnd-function has to be made up manually, because there's no call to any "reset seed" - without the manual part we get the same results everytime.
2) The subquery returns the wanted amount of random customers, but that's not sorted
3) The outer function makes IN-query so that it just picks the wanted (ie. results of randomize) customers
4) The last order by sorts the results by the CustomerID (my data is actually already in order, so this might be useless there, but I think it's better to be safe than sorry)
 
Upvote 0
If Customers is the name of your table, you just refer to it as Customers.
Where did Customers$ come from?
Similarly, if NewId is the name of a column in table Customers, you can refer to it as NewId.

SELECT TOP 100000 * FROM customers ORDER BY NEWID
 
Upvote 0
If Customers is the name of your table, you just refer to it as Customers.
Where did Customers$ come from?
Similarly, if NewId is the name of a column in table Customers, you can refer to it as NewId.

SELECT TOP 100000 * FROM customers ORDER BY NEWID

The $ comes from Excel's own syntax ie. that's a worksheet. I'll translate this database part of my program later to Access, so naturally there that $
will be taken out.

I'm not too familiar with that NEWID(), I just found it by Google but it didn't work for me in VBA/ADODB. This Rnd with manual seeding worked much better.
 
Upvote 0
newid is a built in function for some sql engines
I'm pretty sure oracle has it and mysql too

the jet db engine is very basic and is missing a lot of new and cool features

as stated by the op, newid assigns a unique id to each row returned in the query

yeah, for the $, when you're doing an ado connect to an excel workbook the sheets are the tables, and you refer to them with the brackets and $

don't know why, that's just the way you're supposed to do it

i like your solution op, really nice

except, why are some ColumnIDs null ?
that would be the case if that cell in the worksheet was blank, but if CustomerID is the identifier, then it shouldn't be blank

check how the data is getting into excel. Is it being imported/entered correctly ?
 
Last edited:
Upvote 0
newid is a built in function for some sql engines
I'm pretty sure oracle has it and mysql too

the jet db engine is very basic and is missing a lot of new and cool features

as stated by the op, newid assigns a unique id to each row returned in the query

yeah, for the $, when you're doing an ado connect to an excel workbook the sheets are the tables, and you refer to them with the brackets and $

don't know why, that's just the way you're supposed to do it

i like your solution op, really nice

I'm really happy if you found some help from this. I think that it isn't rare when someone needs to pull out a random sample and would like them sorted alphabetically. This is maybe 95% faster than what I used to do - pull out 100k random numbers and combine them to one "IN"-clause. Now all the loops are avoided and SQL is supposed to be pretty fast doing these - It does this now in 34 seconds and I don't even know how much of that is creating the variant of 100k elements.
 
Upvote 0

Forum statistics

Threads
1,213,538
Messages
6,114,218
Members
448,554
Latest member
Gleisner2

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top