Sunday, February 19, 2012

efficiently creating random numbers in very large table

Hello,

I need to sample data in a very large table in SQL Server 2000 (a gazillion rows of Performance Monitor statitics).

I'd like to take the top 5%, for instance, based upon a column containing random numbers.

Can anyone suggest a highly efficient method of populating a column with random numbers.

Thanks in advance.

Rodselect TOP 5 PERCENT * from [YourTable] order by newid()|||select TOP 5 PERCENT * from [YourTable] order by newid()

Thank you, I'll give that a go.

Regards,

Rod|||that won't populate your table with any random numbers obviously.

it will give you a random 5% slice of the table. a different slice each time you run it.|||Thanks, Good point; maybe I can have another column to set a bit , so that I can reproduce. I'll have to test performance, perhaps someone has some experience with this or have a different technique to propose. Thank you.

Rod|||If you really want a column of random values, then just create a GUID column with a default of NEWID(). But this won't give you a random sample every time, of course.|||If you really want a column of random values, then just create a GUID column with a default of NEWID(). But this won't give you a random sample every time, of course.

That's ok blindman, I just neede something that's efficient in terms populating random values. Regards, Rod|||just create a GUID column with a default of NEWID()
Ofcourse this works but if your table is really that big beware of the time it takes to alter the table! SQL Server has to expand each record so numerous page splits will occur, indexes will have to be rebuild, etc, etc. This could take a couple of hours.|||Ofcourse this works but if your table is really that big beware of the time it takes to alter the table! SQL Server has to expand each record so numerous page splits will occur, indexes will have to be rebuild, etc, etc. This could take a couple of hours.

...ugh.. Thanks. There does not seem to be a really efficient way of doing this...

Thanks for you input. Rod|||how many rows is the table?

also, you can generate random numbers in sql using rand() if you don't like guids. if a random number from 0-255 is sufficient you could store it in a tinyint and less page splits would result.

this code ran in 31 sec on my dev box. not great, but it is what it is:

set nocount on
declare @.t table (RandomColumn tinyint)
declare @.i int
set @.i=0

while @.i < 1000000
begin
insert into @.t select round(rand() * 255, 0)
set @.i = @.i + 1
end|||how many rows is the table?

also, you can generate random numbers in sql using rand() if you don't like guids. if a random number from 0-255 is sufficient you could store it in a tinyint and less page splits would result.

this code ran in 31 sec on my dev box. not great, but it is what it is:

set nocount on
declare @.t table (RandomColumn tinyint)
declare @.i int
set @.i=0

while @.i < 1000000
begin
insert into @.t select round(rand() * 255, 0)
set @.i = @.i + 1
end

That maybe ok, you're right, not great but maybe we can live that. Thanks for your code.

Regards,

Rod

No comments:

Post a Comment