Sql-server – How to write a query in SQL Server to find nearest values

sql serversql server 2014

Let's say that I have the following integer values in a table

32
11
15
123
55
54
23
43
44
44
56
23

OK, the list can go on; it doesn't matter. Now I want to query this table and I want to return a certain number of closest records. Let's say I want to return 10 closest record matches to the numer 32. Can I achieve this efficiently?

It is in SQL Server 2014.

Best Answer

  • Assuming the column is indexed the following should be reasonably efficient.

    With two seeks of 10 rows and then a sort of the (up to) 20 returned.

    WITH CTE
         AS ((SELECT TOP 10 *
              FROM   YourTable
              WHERE  YourCol > 32
              ORDER  BY YourCol ASC)
             UNION ALL
             (SELECT TOP 10 *
              FROM   YourTable
              WHERE  YourCol <= 32
              ORDER  BY YourCol DESC))
    SELECT TOP 10 *
    FROM   CTE
    ORDER  BY ABS(YourCol - 32) ASC 
    

    (i.e. potentially something like the below)

    enter image description here

    Or another possibility (that reduces the number of rows sorted to max 10)

    WITH A
         AS (SELECT TOP 10 *,
                           YourCol - 32 AS Diff
             FROM   YourTable
             WHERE  YourCol > 32
             ORDER  BY Diff ASC, YourCol ASC),
         B
         AS (SELECT TOP 10 *,
                           32 - YourCol AS Diff
             FROM   YourTable
             WHERE  YourCol <= 32
             ORDER  BY YourCol DESC),
         AB
         AS (SELECT *
             FROM   A
             UNION ALL
             SELECT *
             FROM   B)
    SELECT TOP 10 *
    FROM   AB
    ORDER  BY Diff ASC
    

    enter image description here

    NB: Execution plan above was for the simple table definition

    CREATE TABLE [dbo].[YourTable](
        [YourCol] [int] NOT NULL CONSTRAINT [SomeIndex] PRIMARY KEY CLUSTERED 
    )
    

    Technically, the Sort on the bottom branch shouldn't be needed either as that too is ordered by Diff, and it would be possible to merge the two ordered results. But I wasn't able to get that plan.

    The query has ORDER BY Diff ASC, YourCol ASC and not just ORDER BY YourCol ASC, because that was what ended up working to get rid of the Sort in the top branch of the plan. I needed to add the secondary column in (even though it won't ever change the result as YourCol will be the same for all values with the same Diff) so it would go through the merge join (concatenation) without adding a Sort.

    SQL Server seems able to infer that an index on X seeked in ascending order will deliver rows ordered by X + Y and no sort is necessary. But it is not able to infer that travelling the index in descending order will deliver rows in the same order as Y-X (or even just unary minus X). Both branches of the plan use an index to avoid a sort but the TOP 10 in the bottom branch are then sorted by Diff (even though they are already in that order) to get them in the desired order for the merge.

    For other queries/table definitions it may be trickier or not possible to get the merge plan with just a sort of one branch - as it relies on finding an ordering expression that SQL Server:

    1. Accepts that the index seek will supply the specified order so no sort is needed before the top.
    2. Is happy to use in the merge operation so requires no sort after the TOP