Sql-server – Vendors with Most Invoices in State

aggregatesql serversql-server-2012subquery

I have the tables Vendors(VendorName, VendorID,…) and Invoices(InvoiceID,…) and I am trying to find vendors with the most invoices in their respective states. Please critique my work:

SELECT Max(NumInvoicesState)
   SELECT  VendorState,VendorName,
      Count(InvoiceID) AS NumInvoicesState 
   JOIN Invoices I
     on V.VendorID=I.VendorID
   GROUP by  VendorState, VendorName
 ) AS Alias

My idea is to first do a query to get a table with a list of vendors by state and the number of invoices for each vendor.

I thought of joining the inner-table to Vendors and then group by VendorState, but that does not seem to work.

I guess I could SELECT VendorName, VendorState then join it with Vendors on, say VendorState and then Group by Vendors. But then I get all the Vendors, not just the ones with the max.

Best Answer

I think the inner query is right.

The MAX without a matching GROUP BY in the outer query will give a single value from the whole dataset. In your example the dataset would be the inner query. This is not what you want, so you have to add group by state.

Now you have the problem of vendor name. It can't be in the select list unless it is also in the GROUP BY, but that will give the wrong result. It is necessary to do this in two steps. In the outer query just deal with states:

Select state, max (..) as MaxByState
From (inner query)
Group by state

Then join this back to the inner query on the MAX() value:

from (
    Select state, max (..) as MaxByState
    From (inner query)
    Group by state
) as mbs
inner join (
  Inner query
) as iq
on mbs.state = iq.state
and mbs.MaxByState = iq.NumInvoicesState;

In other words, here are the most numerous invoices in each state. Which vendors have these many invoices?

If two vendors have the same number of invoices in a state there will be two rows output.

Since the inner query is repeated verbatim it would be better presented as a non-recursive CTE.

SQL Server 2012 has a number of windowing functions built in. The most useful one here is RANK (). It does just what you think it should do.

  RANK ( ) OVER (PARTITION BY state order by NumInvoicesState )
from (inner query) as a

This eliminates all of the second-order dataset alignment on derived aggregate values required above. It is much easier to read and understand, and hence to maintain.