# MATLAB: Same GPU operation in a loop, but two different speeds

Tags: execution-time, gpuarrays, loop operation

I am running the code below on 64-bit Windows 10 with an Intel i5-8300H and an NVIDIA GTX 1060:

```matlab
dev = gpuDevice;
time = zeros(1, length(t));  % preallocation of time
time2 = zeros(1, length(t)); % preallocation of time2
for i = 2:length(t)-1 % looping over time vector
    wait(dev); tic;
    % Computation of dot product between A(:, i:-1:2) and C(:, 1:i-1, 1) + C(:, 1:i-1, 2) for each row
    approx_conv_pair = sum(reshape(A(txN-i*N+1:txN-N).*(C(1:(i-1)*N) + C(txN+1:txN+(i-1)*N)),[N,i-1]),2);
    wait(dev); time(i) = toc;

    wait(dev); tic;
    % Computation of dot product between B(:, i:-1:2) and C(:, 1:i-1, 1) - C(:, 1:i-1, 2) for each row
    approx_conv_impair = sum(reshape(B(txN-i*N+1:txN-N).*(C(1:(i-1)*N) - C(txN+1:txN+(i-1)*N)),[N,i-1]),2);
    wait(dev); time2(i) = toc;
end
```
A, B and C are gpuArrays of size (N, length(t)), (N, length(t)) and (N, length(t), 2), respectively.
My indexing simply accesses A(:, i:-1:2), B(:, i:-1:2), C(:, 1:i-1, 1) and C(:, 1:i-1, 2) as column vectors.
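To make the idea of addressing whole columns through a linear index range concrete, here is a minimal CPU-only sketch. The matrix `M` and the bounds `k1` and `k2` are hypothetical stand-ins, not the arrays from the code above; the only fact used is MATLAB's column-major storage, under which a contiguous linear range covering whole columns matches the corresponding column slice.

```matlab
% Hypothetical 3-by-4 example (not the A, B, C from the question)
N  = 3;                      % number of rows
M  = reshape(1:12, N, 4);    % columns are [1 2 3]', [4 5 6]', ...
k1 = 1;                      % skip the first k1 columns
k2 = 3;                      % stop after column k2

% Linear indices k1*N+1 : k2*N cover columns k1+1 .. k2, in storage order
cols_linear = reshape(M(k1*N+1 : k2*N), N, []);

% The same data taken with ordinary column indexing
cols_slice = M(:, k1+1 : k2);

disp(isequal(cols_linear, cols_slice));
```

The same check can be run on small gpuArrays to confirm which columns a given linear range actually selects.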
Here's what I obtain when I plot time and time2 against the loop index i for N = 252 and length(t) = 13200:

Does anybody know why there is such a difference between the execution times? Is it due to the way I wrote the code, or to some overhead on the GPU?
FYI, I tried swapping the order of approx_conv_pair and approx_conv_impair, and I observe the same problem (second operation almost twice as fast).
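One way to cross-check the tic/toc numbers would be to time each expression in isolation with `gputimeit` from Parallel Computing Toolbox, which runs the function several times and synchronizes with the device itself. The following is only a sketch under stated assumptions: the sizes are small hypothetical values, the arrays are random stand-ins, the loop index `i` is fixed, and `txN` is assumed to be `N*length(t)` because it is not defined in the snippet above. It does not by itself explain the difference; it only shows whether the difference survives when the two operations are measured independently.

```matlab
% Hypothetical sizes and random data, chosen only to keep the sketch small
N   = 252;
nt  = 200;                   % stand-in for length(t)
i   = 100;                   % one fixed loop index
txN = nt * N;                % ASSUMPTION: txN = N*length(t)

A = rand(N, nt, 'gpuArray');
B = rand(N, nt, 'gpuArray');
C = rand(N, nt, 2, 'gpuArray');

% The two expressions from the loop, wrapped as function handles
f_pair   = @() sum(reshape(A(txN-i*N+1:txN-N) .* ...
    (C(1:(i-1)*N) + C(txN+1:txN+(i-1)*N)), [N, i-1]), 2);
f_impair = @() sum(reshape(B(txN-i*N+1:txN-N) .* ...
    (C(1:(i-1)*N) - C(txN+1:txN+(i-1)*N)), [N, i-1]), 2);

% gputimeit handles repetition and device synchronization
t_pair   = gputimeit(f_pair);
t_impair = gputimeit(f_impair);

fprintf('pair:   %.3e s\nimpair: %.3e s\n', t_pair, t_impair);
```

If the two timings come out similar here, the gap seen in the loop would more likely come from how the two statements are measured back to back than from the arithmetic itself; if they still differ, the expressions themselves deserve a closer look.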