Sql-server – Alternative query to this (avoid DISTINCT)

join;queryquery-refactorsql-server-2008

Note: I work with MSSQL 2008, but I guess it's valid for many others DB engines
I have this table "Users":

UserID  User    CountryID
1     user 1    1
2     user 2    2
3     user 3    3
4     user 4    4
5     user 5    4
6     user 6    3

And this table "Countries"

CountryID   Country
1           MX
2           USA
3           CAN
4           ENGLAND

As you can see, every user belongs to a country.

If I want to know, all the different countries where I have at least, one user on the Users table, right now I do this query:

select distinct country 
from Users inner join
countries on users.CountryID=countries.CountryID

And achieve the next result set:

CAN
ENGLAND
MX
USA 

Which is indeed, all the different countries, where I have at least one user on muy table Users.

My doubt is, is possible to achieve the above result set, with out using "DISTINCT", I mean only using JOINS and conditions ?

Here it's de DDL scripts:

USE [TEST]
GO
/****** Object:  Table [dbo].[Users]    Script Date: 09/21/2012 16:21:14 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Users](
    [UserID] [int] NULL,
    [User] [nvarchar](50) NULL,
    [CountryID] [int] NULL
) ON [PRIMARY]
GO
INSERT [dbo].[Users] ([UserID], [User], [CountryID]) VALUES (1, N'user 1', 1)
INSERT [dbo].[Users] ([UserID], [User], [CountryID]) VALUES (2, N'user 2', 2)
INSERT [dbo].[Users] ([UserID], [User], [CountryID]) VALUES (3, N'user 3', 3)
INSERT [dbo].[Users] ([UserID], [User], [CountryID]) VALUES (4, N'user 4', 4)
INSERT [dbo].[Users] ([UserID], [User], [CountryID]) VALUES (5, N'user 5', 4)
INSERT [dbo].[Users] ([UserID], [User], [CountryID]) VALUES (6, N'user 6', 3)
/****** Object:  Table [dbo].[Countries]    Script Date: 09/21/2012 16:21:14 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Countries](
    [CountryID] [int] NULL,
    [Country] [nvarchar](50) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[Countries] ([CountryID], [Country]) VALUES (1, N'MX')
INSERT [dbo].[Countries] ([CountryID], [Country]) VALUES (2, N'USA')
INSERT [dbo].[Countries] ([CountryID], [Country]) VALUES (3, N'CAN')
INSERT [dbo].[Countries] ([CountryID], [Country]) VALUES (4, N'ENGLAND')
INSERT [dbo].[Countries] ([CountryID], [Country]) VALUES (5, N'BRAZIL')

Best Answer

The two scripts in RThomas' answer are both useful. You could also use GROUP BY, which gives a similar advantage to RThomas' methods, but keeping a similar form to your original query.

select country 
from Users inner join
countries on users.CountryID=countries.CountryID
GROUP BY countries.CountryID, countries.country;

The reason why you group by CountryID is that it's the primary key of your countries table, giving the Query Optimizer some better options.

...except that it's not in your scripts.

Put PKs (with Clustered Indexes) on your tables, and a FK relationship between them. Index CountryID in the Users table, and put a Unique Index on the Country field.

Once you've done all that, using DISTINCT how you have will actually give you the ideal execution plan.