SQL Server – Availability Group: Redo Rate displayed on AG Dashboard vs Perfmon counter

availability-groups, perfmon, sql-server, sql-server-2017

I am a bit confused about the Redo Rate (KB/sec) metric on the Always On AG dashboard. In some scenarios it matches the Perfmon counter Database Replica: Redone Bytes/sec (hopefully that's the correct counter for redo rate in Perfmon), and sometimes it doesn't match at all.

Most of the time, when there is a lot of activity, the AG dashboard and the DMV show a redo rate of, say, 40 MB/sec at a given time, which matches the Perfmon counter.

However, during periods of low activity, when there is not much to send over to the secondary, the redo rate on the dashboard and in the DMV seems to show incorrect values compared to the Perfmon counter.

I can't tell which value is correct or how to analyze this. Any idea why this happens, or is it a bug in the dashboard?

Screenshot as requested:

[Screenshot: Perfmon data collected on the secondary]

No transactions occurred around that time, and there was no major activity on the primary. I am collecting Perfmon data on both the secondary and the primary, since after a failover we would still need those counters running on the new secondary. However, the Perfmon data in the screenshot is from the secondary.

Best Answer

Those two numbers are measuring slightly different things. You're right that they both measure redo, but they do it in different ways.

The Perfmon counter is updated in near-real-time - it's the number of bytes redone in the last second:

Amount of log records redone in the last second to catch up the database replica
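If you'd rather read that counter from T-SQL than from Perfmon, here is a rough sketch using sys.dm_os_performance_counters. One caveat: in that DMV, per-second counters are exposed as cumulative values, so you need two samples and a difference to get a rate. The 10-second interval and the table variable name are arbitrary choices of mine, not anything the dashboard uses, and the LIKE pattern assumes the object name ends in ":Database Replica" on your instance.

```sql
-- Run on the secondary replica.
-- First sample of the cumulative "Redone Bytes/sec" counter, per database.
DECLARE @first TABLE
(
    instance_name nvarchar(128) PRIMARY KEY,
    cntr_value    bigint NOT NULL
);

INSERT INTO @first (instance_name, cntr_value)
SELECT RTRIM(instance_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%:Database Replica%'
  AND counter_name = N'Redone Bytes/sec';

-- Arbitrary sampling interval (10 seconds); adjust to taste.
WAITFOR DELAY '00:00:10';

-- Second sample: the difference divided by the interval is the average
-- number of bytes redone per second over those 10 seconds.
SELECT
    RTRIM(pc.instance_name)            AS database_name,
    (pc.cntr_value - f.cntr_value) / 10 AS redone_bytes_per_sec
FROM sys.dm_os_performance_counters AS pc
JOIN @first AS f
    ON f.instance_name = RTRIM(pc.instance_name)
WHERE pc.object_name LIKE N'%:Database Replica%'
  AND pc.counter_name = N'Redone Bytes/sec';
```

On an idle secondary this should come back as zero, the same as the flat line in Perfmon.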

The AG dashboard is based on the sys.dm_hadr_database_replica_states DMV, specifically the redo_rate column:

Average Rate at which the log records are being redone on a given secondary database, in kilobytes (KB)/second.

So the AG dashboard is based on an average, but over what period? I suspect it's "the last active period" based on the phrasing in the log_send_rate description from the same DMV:

Average rate at which primary replica instance sent data during last active period, in kilobytes (KB)/second.

Let's try it and see. I'll open up a lab AG dashboard, and the first thing I notice is that the redo rate is not zero, despite the fact that I haven't used this lab in a couple of weeks:

Screenshot of AG dashboard showing non zero redo rate

Perfmon is flat on the secondary, as I'm not doing anything yet:

Screenshot of Perfmon showing the counter in question is zero

Now I'll insert some data into my test database:

INSERT INTO dbo.A
SELECT TOP (1000)
    REPLICATE(N'A', 50)
FROM master.dbo.spt_values;

Now Perfmon on the secondary shows a brief blip:

Screenshot of Perfmon showing a brief increase in redo rate

And if I open up the AG dashboard, I can see the redo rate changed (from 3535 to 3873), but it didn't drop back down to zero:

Screenshot of AG dashboard again showing redo rate has changed

So it looks like this DMV (and the dashboard) is only updated when redo is actually happening, and it holds the last value that it calculated.
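To judge whether the number on the dashboard is current or just left over, one option is to look at redo_rate next to last_redone_time from the same DMV. This is only a sketch: treating last_redone_time as a proxy for when redo_rate was last recalculated is my assumption, not documented behavior. The multiplication by 1024 is there because redo_rate is in KB/sec while the Perfmon counter is in bytes/sec.

```sql
-- Run on the secondary replica.
SELECT
    DB_NAME(drs.database_id)  AS database_name,
    drs.redo_rate             AS redo_rate_kb_per_sec,
    drs.redo_rate * 1024      AS redo_rate_bytes_per_sec,  -- comparable to Perfmon
    drs.redo_queue_size       AS redo_queue_kb,
    drs.last_redone_time,
    -- Assumption: a large gap here suggests redo_rate is a held-over value.
    DATEDIFF(SECOND, drs.last_redone_time, GETDATE()) AS seconds_since_last_redo
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_local = 1
  AND drs.is_primary_replica = 0;
```

If seconds_since_last_redo is large and redo_queue_size is zero, the redo_rate shown is most likely the value from the last burst of activity rather than the current rate, and the Perfmon counter is the better guide to what is happening right now.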