Sql-server – Can STRING_AGG be used multiple times with different delimiters

sql serversql-server-2017

I was doing some string aggregation and noticed what looked like a bug to me. STRING_AGG was used twice in the same query with different separator arguments. However, both produced the same results (the "first" separator is used in both cases). It seems to happen only in some circumstances. Is this a bug or is this documented behavior?

First, the setup:

CREATE TABLE #Data
  ([Group] INT
 , Member  CHAR(1));

INSERT INTO #Data
VALUES (1, 'a'), (1, 'b')
     , (2, 'c'), (2, 'd');

From a table vs. VALUES constructor. Without WITHIN GROUP ORDER BY, everything is fine. With it, the bug appears

SELECT Commas = STRING_AGG(Member, ', ') --WITHIN GROUP(ORDER BY Member)
     , Colons = STRING_AGG(Member, '::') --WITHIN GROUP(ORDER BY Member)
FROM #Data;

SELECT Commas = STRING_AGG(Member, ', ') --WITHIN GROUP(ORDER BY Member)
     , Colons = STRING_AGG(Member, '::') --WITHIN GROUP(ORDER BY Member)
FROM (VALUES (1, 'a'), (1, 'b')
           , (2, 'c'), (2, 'd')) [v] ([Group], Member);

Adding in a GROUP BY adds a sort and the error is there for both cases now:

SELECT [Group]
     , Commas = STRING_AGG(Member, ', ')
     , Colons = STRING_AGG(Member, '::')
FROM #Data
GROUP BY [Group];

SELECT [Group]
     , Commas = STRING_AGG(Member, ', ')
     , Colons = STRING_AGG(Member, '::')
FROM (VALUES (1, 'a'), (1, 'b')
           , (2, 'c'), (2, 'd')) [v] ([Group], Member)
GROUP BY [Group];

Add it's definitely broken… like deep down. Combine both expressions in a HAVING and you get nothing back. Use either one alone, and you get data.

SELECT [Group]
     , Colons = STRING_AGG(Member, '::')
     , Commas = STRING_AGG(Member, ', ')
FROM #Data
GROUP BY [Group]
HAVING STRING_AGG(Member, ', ') LIKE '%, %'
   AND STRING_AGG(Member, '::') LIKE '%::%';

Switching the order of the columns switches the separator used.

I used these trace flags for investigating, but I can't find anything about "optimizing" away the second aggregate.

OPTION (QUERYTRACEON 3604 -- Output info to client
      , QUERYTRACEON 8619 -- Show applied optimization rules
      , QUERYTRACEON 8606 -- Show logical query trees
      , QUERYTRACEON 8607 -- Show physical query tree
      , RECOMPILE);

Best Answer

This is a bug. This answer indicates that it is fixed in CU17 but I just tested that and found it negative (and there is no indication that the fix was intended to be backported here).

I tried

SELECT [Group]
     , Commas = STRING_AGG(Member, ', ')
     , Colons = STRING_AGG(Member, '::')
FROM (VALUES (1, 'a'), (1, 'b')
           , (2, 'c'), (2, 'd')) [v] ([Group], Member)
GROUP BY [Group]
OPTION (QUERYTRACEON 3604 -- Output info to client
      , QUERYTRACEON 8607 -- Optimization output tree, before post-optimization rewrite phase.
      , QUERYTRACEON 7352 -- The final query tree after Post-optimization re-write.
      , RECOMPILE);
;

In 2017 CU17. This gives plan

enter image description here

On 2019 there is no compute scalar and the Stream Aggregate has defined values

[Expr1010] = Scalar Operator(STRING_AGG([Union1009],', ')), 
[Expr1011] = Scalar Operator(STRING_AGG([Union1009],'::'))

On 2017 the Stream Aggregate has defined value

[Expr1010] = Scalar Operator(STRING_AGG([Union1009],', '))

And there is an additional Compute Scalar with expression

[Expr1011] = Scalar Operator([Expr1010])

The output of trace flag 8607is the same on 2017 and 2019 but 7352 differs so I assume this is a faulty post optimisation rewrite.

As a workaround you can change the expression to do a no-op as below to prevent this faulty optimisation.

SELECT [Group]
     , Commas = STRING_AGG(Member, ', ')
     , Colons = STRING_AGG('' + Member, '::')