Sql-server – Add missing month in a table of values

sql server

I have a table with the following entries:

customer_IDlastDateSum
1'2012-04-30'5
1'2012-06-30'4
1'2013-07-31'25
2'2012-04-30'7
2'2012-05-31'4
2'2012-06-30'1
2'2012-07-31'6

I need to add missing month and the date which gets added should be the last date of that month with frequency value as 0.

The expected output is:

customer_IDlastDateSum
12012-04-305
12012-05-310
12012-06-304
12012-07-310
12012-08-310
12012-09-300
12012-10-310
12012-11-300
12012-12-310
12013-01-310
12013-02-280
12013-03-310
12013-04-300
12013-05-310
12013-06-300
12013-07-3125
22012-04-307
22012-05-314
22012-06-301
22012-07-316

I need to add missing month and the date which gets added should be the last date of that.

We can achieve this using CTE method. But can a similar result be obtained using joins?

Solution through CTE:

with cte as (
      select id, date, frequency,
             lead(date) over (partition by id order by date) as next_date
      from t
      union all
      select id, eomonth(date, 1), 0, next_date
      from cte
      where eomonth(date, 1) < dateadd(day, -1, next_date)
     )
select id, date, frequency
from cte
order by id, date;

I am trying to build a data pipeline and when I run this on that editor, it does not support recursive CTEs. I have no options.

Best Answer

If you have a table of numbers, or can access the built-in table in master:

SELECT
    T1.id,
    Expanded.[date],
    Expanded.frequency
FROM 
(
    -- Add next date for the current id
    SELECT
        T.*,
        next_date = 
            LEAD(T.[date], 1) OVER (
                PARTITION BY T.id
                ORDER BY T.[date])
    FROM dbo.t AS T
) AS T1
CROSS APPLY
(
    -- All month ends >= the current date and < next date
    SELECT
        [date] = EOMONTH(T1.[date], SV.number),
        frequency = IIF(SV.number = 0, T1.frequency, 0),
        SV.number
    FROM master.dbo.spt_values AS SV
    WHERE
        SV.[type] = N'P'
        AND SV.number < ISNULL(DATEDIFF(MONTH, T1.[date], T1.next_date), 1)
) AS Expanded
ORDER BY
    T1.id,
    T1.[date],
    Expanded.number;

The output matches that of the recursive CTE you provided (see demo link below):

iddatefrequency
12012-04-305
12012-05-310
12012-06-304
12012-07-310
12012-08-310
12012-09-300
12012-10-310
12012-11-300
12012-12-310
12013-01-310
12013-02-280
12013-03-310
12013-04-300
12013-05-310
12013-06-300
12013-07-3125
22012-04-307
22012-05-314
22012-06-301
22012-07-316

Explanation

Logically, each row from T1 is expanded by the APPLY into the set of month-ends that ought to be present. Let's look at one particular row in the source table:

iddatefrequency
12012-04-305

Adding the LEAD in the table expression aliased to T1 gives:

iddatefrequencynext_date
12012-04-3052012-06-30

This row is passed to the APPLY:

SELECT
    [date] = EOMONTH(T1.[date], SV.number),
    frequency = IIF(SV.number = 0, T1.frequency, 0),
    SV.number
FROM master.dbo.spt_values AS SV
WHERE
    SV.[type] = N'P'
    AND SV.number < ISNULL(DATEDIFF(MONTH, T1.[date], T1.next_date), 1)

The references to columns in T1 are outer references. Substituting the values from the current T1 row for [date], next_date, and frequency gives:

SELECT
    [date] = EOMONTH('2012-04-30', SV.number),
    frequency = IIF(SV.number = 0, 5, 0),
    SV.number
FROM master.dbo.spt_values AS SV
WHERE
    SV.[type] = N'P'
    AND SV.number < ISNULL(DATEDIFF(MONTH, '2012-04-30', '2012-06-30'), 1)

The ISNULL expression determines how many rows are returned from our table of numbers. In the current iteration the ISNULL expression evaluates to 2, so the rows returned from the numbers table have values 0 and 1.

The APPLY table expression as a whole returns:

datefrequencynumber
2012-04-3050
2012-05-3101

Notice:

  • When number = 0, we are dealing with a row that exists in T1, so we just use the frequency given.
  • When number > 0, the row does not exist in T1 so we need to return a zero for frequency.

That gives us the results needed for the single row we chose from T1. We can do the same for the next row from T1 (substituting values as before), and so on until the result is complete. For more on how APPLY works logically, see my article Understanding and Using APPLY.

The only special handling is for the last row per id, where the LEAD returns NULL. In that case, we only need one row from the numbers table (value zero) to process the current row from T1.


The built-in table is sufficient for a range of dates from '2000-01-01' to '2170-08-01'.

Try the db<>fiddle demo