MATLAB: Removing a Date (Leap Year) from being Averaged

averagedatesdatetimedayofyearMATLABmean

Hi,.
I have a code. The data takes place over 10 years. And I need the average for each day of the year: 1-365. I think everything is fine with my code.
My biggest problem are the leap years of 2012 and 2016. I don't even want to know their average. I just want them removed.
t= readtable('SSArcGIS.xls'); % read as table
t.S1 = single(strcmp(t.ComplaintLocation,'S1'));
t.S2 = single(strcmp(t.ComplaintLocation,'S2'));
t.S3 = single(strcmp(t.ComplaintLocation,'S3'));
t.S4 = single(strcmp(t.ComplaintLocation,'S4'));
t.S5 = single(strcmp(t.ComplaintLocation,'S5'));
t.S6 = single(strcmp(t.ComplaintLocation,'S6'));
t.S7 = single(strcmp(t.ComplaintLocation,'S7'));
t.S8 = single(strcmp(t.ComplaintLocation,'S8'));
t.S9 = single(strcmp(t.ComplaintLocation,'S9'));
t.S10 = single(strcmp(t.ComplaintLocation,'S10'));
t.S11 = single(strcmp(t.ComplaintLocation,'S11'));
t.S12 = single(strcmp(t.ComplaintLocation,'S12'));
t.S13 = single(strcmp(t.ComplaintLocation,'S13'));
t.S14 = single(strcmp(t.ComplaintLocation,'S14'));
% convert to date time
t.Date = datetime(t.Date, 'Format', 'MM/dd/yyyy');
% get unique dates

T1=datetime('01/01/2010');
T2=datetime('12/31/2019');
T=T1:T2;
unique_dates = T;
for ii =1: length(unique_dates)
tmp = t(ismember(t.Date, unique_dates(ii)),:);
% get unique dates
data(ii).Date = unique_dates(ii);
data(ii).S1 = sum(tmp.S1);
data(ii).S2 = sum(tmp.S2);
data(ii).S3 = sum(tmp.S3);
data(ii).S4 = sum(tmp.S4);
data(ii).S5 = sum(tmp.S5);
data(ii).S6 = sum(tmp.S6);
data(ii).S7 = sum(tmp.S7);
data(ii).S8 = sum(tmp.S8);
data(ii).S9 = sum(tmp.S9);
data(ii).S10 = sum(tmp.S10);
data(ii).S11 = sum(tmp.S11);
data(ii).S12 = sum(tmp.S12);
data(ii).S13 = sum(tmp.S13);
data(ii).S14 = sum(tmp.S14);
end
% convert structure to table
data = struct2table(data);
temp = data;
temp.day = day(temp.Date,'dayofyear');
temp.Date = [];
% Now use varfun with @mean as your function and Day as your GroupingVariable
dailyAverage = varfun(@sum,temp,'GroupingVariables','day');
temp = data;
temp.Year = year(temp.Date);
temp.Date = [];
Any help will be appreciated.
Thanks

Best Answer

  • Not 100% sure what you are trying to end up with, but my guess is the average number of each type of complaint for each day of the year: 365 of them. You are on the right track using 'dayofyear', but there are easier ways to get to that point. First ,read the data into a timetable.
    >> tt1 = readtimetable('SSArcGIS.xls');
    >> tt1.ComplaintLocation = categorical(tt1.ComplaintLocation);
    >> head(tt1)
    ans =
    8×1 timetable
    Date ComplaintLocation
    ___________ _________________
    02-Jan-2010 S13
    02-Jan-2010 S11
    02-Jan-2010 S5
    04-Jan-2010 S5
    05-Jan-2010 S13
    06-Jan-2010 S12
    06-Jan-2010 S5
    06-Jan-2010 S3
    You have one row per complaint, across all types in one variable. You want one variable for each type. unstack is good at doing that. In this case, you want counts, so you'll need to add a dummy variable to "count up".
    >> tt1.Dummy = ones(height(tt1),1);
    >> tt2 = unstack(tt1,'Dummy','ComplaintLocation');
    >> head(tt2)
    ans =
    8×14 timetable
    Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9
    ___________ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
    02-Jan-2010 NaN NaN 1 NaN 1 NaN NaN NaN NaN 1 NaN NaN NaN NaN
    04-Jan-2010 NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN
    05-Jan-2010 NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
    06-Jan-2010 NaN NaN NaN 1 NaN NaN NaN 1 NaN 1 1 NaN NaN NaN
    07-Jan-2010 1 NaN 1 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    08-Jan-2010 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN
    09-Jan-2010 NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
    13-Jan-2010 NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN 1 NaN
    Unstack uses @sum by default to "aggregate", in other words, each date might have three S1 complaints, so unstack would compute sum([1;1;1]), and fill the corresponding element of the output with 3. For empty cases, sum returns NaN. It would be easy to use the AggregationFunction parameter to fill those with 0, but it's even easier to replace the NaNs afterwards.
    >> tt2 = fillmissing(tt2,'Constant',0);
    >> head(tt2)
    ans =
    8×14 timetable
    Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9
    ___________ __ ___ ___ ___ ___ ___ __ __ __ __ __ __ __ __
    02-Jan-2010 0 0 1 0 1 0 0 0 0 1 0 0 0 0
    04-Jan-2010 0 0 0 0 0 0 0 0 0 1 0 0 0 0
    05-Jan-2010 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    06-Jan-2010 0 0 0 1 0 0 0 1 0 1 1 0 0 0
    07-Jan-2010 1 0 1 1 0 0 0 0 0 0 0 0 0 0
    08-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 1 0
    09-Jan-2010 0 0 0 1 0 0 0 0 0 0 0 0 0 0
    13-Jan-2010 0 0 0 0 0 1 0 0 0 0 0 0 1 0
    The data skips quite a few days:
    >> caldiff(tt1.Date([1 end]),'days')
    ans =
    calendarDuration
    3651d
    >> height(tt2)
    ans =
    3122
    One wonders if maybe the complaint center is closed on weekends?
    >> unique(day(tt2.Date,'dow'))
    ans =
    1
    2
    3
    4
    5
    6
    7
    Nope, apparently not, so all those missing days must have had no complaints. To get the true mean per day, you'll need to add them in.
    >> t = datetime(2010,1,1):caldays(1):datetime(2019,12,31);
    >> tt3 = retime(tt2,t,'FillWithConstant','Constant',0);
    >> head(tt3)
    ans =
    8×14 timetable
    Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9
    ___________ __ ___ ___ ___ ___ ___ __ __ __ __ __ __ __ __
    01-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    02-Jan-2010 0 0 1 0 1 0 0 0 0 1 0 0 0 0
    03-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    04-Jan-2010 0 0 0 0 0 0 0 0 0 1 0 0 0 0
    05-Jan-2010 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    06-Jan-2010 0 0 0 1 0 0 0 1 0 1 1 0 0 0
    07-Jan-2010 1 0 1 1 0 0 0 0 0 0 0 0 0 0
    08-Jan-2010 0 0 0 0 0 0 0 0 0 0 0 0 1 0
    >> height(tt3)
    ans =
    3652
    Now we're in business: add day of year as a grouping variable, and compute grouped means. Once you have day of year, the actual dates are irrelevant, so convert to a table.
    >> tt3.DoY = day(tt3.Date,'dayofyear');
    >> tt4 = timetable2table(tt3,'ConvertRowTimes',false);
    >> tt4 = varfun(@mean,tt4,'GroupingVariable','DoY');
    >> head(tt4)
    ans =
    8×16 table
    DoY GroupCount mean_S1 mean_S10 mean_S11 mean_S12 mean_S13 mean_S14 mean_S2 mean_S3 mean_S4 mean_S5 mean_S6 mean_S7 mean_S8 mean_S9
    ___ __________ _______ ________ ________ ________ ________ ________ _______ _______ _______ _______ _______ _______ _______ _______
    1 10 0.1 0.1 0 0.4 0.1 0 0.1 0.1 0.1 0.1 0 0.1 0.1 0
    2 10 0 0.2 0.4 0.5 0.6 0.2 0 0.4 0.2 0.5 0.1 0.1 0.1 0
    3 10 0.6 0.7 0.1 0.7 0.4 0.2 0.3 0.4 0.2 1 0.1 0 0 0
    4 10 0.1 0.2 0.2 0.6 0.4 0.2 0.2 0.3 0.2 0.9 0 0.4 0.1 0.1
    5 10 0.1 0.2 0.1 0.3 0.3 0.3 0 0.3 0.2 0.3 0.4 0.1 0.4 0.1
    6 10 0 0.1 0.4 0.9 0.5 0.2 0.2 0.6 0.1 0.6 0.6 0.1 0.1 0.3
    7 10 0.1 0.1 0.4 0.8 0.3 0.3 0.1 0 0 0 0 0 0.2 0
    8 10 0.1 0.2 0.2 0.6 0.3 0.1 0.3 0.3 0.1 0.6 0.2 0.1 0.4 0
    You could also do that with groupsummary. Leap days? Just remove them.
    >> tt3(tt3.DoY==366,:)
    ans =
    2×15 timetable
    Date S1 S10 S11 S12 S13 S14 S2 S3 S4 S5 S6 S7 S8 S9 DoY
    ___________ __ ___ ___ ___ ___ ___ __ __ __ __ __ __ __ __ ___
    31-Dec-2012 0 0 0 0 0 0 0 0 0 4 0 0 1 0 366
    31-Dec-2016 0 0 0 0 0 0 0 0 0 0 0 0 0 0 366
    >> tt5 = tt4(tt4.DoY~=366,:);
    There were a bunch of steps there. Several of them could be combined, but simplicity.