# Mysql – Group by but preserve statistical probability of value in limit

MySQL

Consider a table in which a specific column is not unique:

name
--------
john
paul
paul
george
ringo
paul
paul
paul
paul


How might one return N users by random, but preserve the statistical likelyhood that the user with the most entries is most likely to be returned? This would be similar to actually drawing N names by way of slips of paper from a hat, with the stipulation that if I draw a slip with the same name as one already drawn that I would discard it and draw another.

If I use GROUP BY and LIMIT, then MySQL first removes the dupes, then performs the limit. Note also that there are many fields returned, so using DISTINCT on the field list is not an option.

Of course, I would prefer to do this is a single query and not loop over a query. Note also that the database might contain less than N entries, in which case obviously each unique user will be returned.

That was a challenge. I think this will do the job:

CREATE TEMPORARY TABLE t (
name VARCHAR(99)  NOT NULL,   -- this datatype should match your name
rnd DOUBLE  NOT NULL
)
SELECT name, RAND() FROM your_table;

SELECT name, MIN(rnd) AS x
FROM t
GROUP BY name
ORDER BY x
LIMIT ... ;    -- Fill in the "N" you want