Postgresql – How to count last records from table where cell null before same cell is not null by order

postgresql

I have simple table with conversation.

  id   | room_id | member_id |       message        |          readtime          
-------+---------+-----------+----------------------+----------------------------
 81531 |    7992 |         1 | S:TestMessage        | 2018-07-21 17:27:39.844148
 81533 |    7992 |        16 | R:TestOK             | 2018-07-21 17:27:55.914063
 81545 |    7992 |        16 | R:hi                 | 
 81546 |    7992 |        16 | R:hi2                | 
 81547 |    7992 |        16 | R:ho1                | 
 81548 |    7992 |        16 | R:1111               | 2018-07-22 19:14:14.926177
 81550 |    7992 |        16 | R:33                 | 
 81551 |    7992 |        16 | R:33abc              | 
(8 rows)

As we can see, last 2 messages has not been read.
For other cases, possible several last messages was not read or all messages has been read.

I have only room_id to filter messages.

My query is

SELECT id, room_id, member_id, message, readtime FROM dialog_message WHERE room_id = 7992

I see three conditions:

  1. Last record must have readtime is null
  2. Count from end while readtime is null
  3. member_id like in last records

So, how can select only last unread messages, represented before id 81548?

I mean id 81550 and 81551.

81548 – has been read and above 81547-81545 read too, even readtime is null

PS: Sorry for my English.

Best Answer

Assuming:

  • you need the set of rows with all columns like your query attempt suggests, not the count of rows like your title suggests.

  • "last messages", "before" etc. are defined by the id column in your example.
    Careful: serial ID numbers are not reliably continuous quantities. Rows inserted "later" may have a lower ID under rare circumstances.

  • there can be multiple threads with trailing NULL rows for multiple different member_id in the same room_id.

You'll need a multicolumn index like this one in any case:

CREATE INDEX ON dialog_message(room_id, member_id, id DESC);

With a NOT EXISTS semi-join

For small tables or cases with only few stale rows with readtime IS NULL but a later row with readtime IS NOT NULL excluding those, I'd try a partial multicolumn index

You need this index in addition to the one above for best read performance:

CREATE INDEX ON dialog_message(room_id) WHERE readtime IS NULL;

Query:

SELECT d.*
FROM   dialog_message d
WHERE  d.room_id = 7992
AND    d.readtime IS NULL
AND    NOT EXISTS (
   SELECT FROM dialog_message
   WHERE  room_id   = d.room_id
   AND    member_id = d.member_id
   AND    readtime IS NOT NULL
   AND    id > d.id
   );

Double recursive CTE

For big tables or cases with lots of stale rows with readtime IS NULL, I'd expect a recursive CTE to be fastest among pure SQL solutions:

WITH RECURSIVE cte AS (  -- get latest row for each member_id
   (
   SELECT *
   FROM   dialog_message
   WHERE  room_id = 7992
   ORDER  BY member_id, id DESC
   LIMIT  1
   )
   UNION ALL
   (
   SELECT d.*
   FROM   cte
   JOIN   dialog_message d USING (room_id)
   WHERE  d.member_id > cte.member_id
   ORDER  BY d.member_id, id DESC
   LIMIT  1
   )
   )
, cte2 AS (
   SELECT *
   FROM   cte
   WHERE  readtime IS NULL  -- IF not null, look for more ...

   UNION ALL
   (
   SELECT d.*
   FROM   cte
   JOIN   dialog_message d USING (room_id, member_id)
   WHERE  d.id < cte.id
   AND    cte.readtime IS NULL
   ORDER  BY d.id DESC
   LIMIT  1
   )
   )
SELECT *
FROM   cte2
WHERE  readtime IS NULL;  -- trim the bounding row with readtime IS NOT NULL

Recursive CTE combined with PL/pgSQL function

Might be one of the rare cases where a procedural solution with a PL/pgSQL function is even faster because it can make do with a single index scan per call.

Create this function once:

CREATE OR REPLACE FUNCTION f_latest_unread_messages(_room_id int, _member_id int)
   RETURNS SETOF dialog_message AS
$func$
DECLARE
   _rec dialog_message;
BEGIN
   FOR _rec IN
      SELECT *
      FROM   dialog_message d
      WHERE  d.room_id   = _room_id
      AND    d.member_id = _member_id
      ORDER  BY d.id DESC
   LOOP
      IF _rec.readtime IS NULL THEN
         RETURN NEXT _rec;
      ELSE
         EXIT;
      END IF;
   END LOOP;
END
$func$  LANGUAGE plpgsql;

Combine the outer rCTE from above with the function:

WITH RECURSIVE cte AS (
   (
   SELECT *
   FROM   dialog_message
   WHERE  room_id = 7992
   ORDER  BY member_id, id DESC
   LIMIT  1
   )
   UNION ALL
   (
   SELECT d.*
   FROM   cte
   JOIN   dialog_message d USING (room_id)
   WHERE  d.member_id > cte.member_id
   ORDER  BY d.member_id, id DESC
   LIMIT  1
   )
   )
SELECT m.*
FROM   cte c, f_latest_unread_messages(c.room_id, c.member_id) m
WHERE  c.readtime IS NULL;  -- trim rows with readtime NOT NULL immediately

db<>fiddle here

Related: