I am running the code below on 64-bit Windows 10 with an Intel i5-8300H and an NVIDIA GTX 1060:

    dev = gpuDevice;
    time = zeros(1, length(t));  % preallocation of time
    time2 = zeros(1, length(t)); % preallocation of time2

    for i = 2:length(t)-1 % looping over time vector
        wait(dev);
        tic;
        % Computation of dot product between A(:, i:-1:1) and C(:,1:i,1) + C(:,1:i,2) for each row
        approx_conv_pair = sum(reshape(A(txN-i*N+1:txN-N).*(C(1:(i-1)*N) + C(txN+1:txN+(i-1)*N)),[N,i-1]),2);
        wait(dev);
        time(i) = toc;

        wait(dev);
        tic;
        % Computation of dot product between B(:, i:-1:1) and C(:,1:i,1) - C(:,1:i,2) for each row
        approx_conv_impair = sum(reshape(B(txN-i*N+1:txN-N).*(C(1:(i-1)*N) - C(txN+1:txN+(i-1)*N)),[N,i-1]),2);
        wait(dev);
        time2(i) = toc;
    end

A, B and C are gpuArrays of size (N, length(t)), (N, length(t)) and (N, length(t), 2), respectively.

My indexing simply accesses A(:, i:-1:2), B(:, i:-1:2), C(:, 1:i-1, 1) and C(:, 1:i-1, 2) as column vectors.
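The linear ranges in the loop can be checked against subscript indexing on small CPU arrays. The sketch below is only illustrative: the sizes are made up, and `txN = N*length(t)` is an assumption, since the post never defines `txN`. Under that assumption the two `C` ranges match `C(:, 1:i-1, 1)` and `C(:, 1:i-1, 2)`, while the `A` (and `B`) range covers columns `length(t)-i+1` through `length(t)-1` of the stored array in forward order, so the reading `A(:, i:-1:2)` would presumably hold only if `A` and `B` are stored with the time axis flipped.

```matlab
% Illustrative CPU-only check of the linear indexing used in the loop.
% Sizes are hypothetical; txN = N*T is an assumption (not stated in the post).
N = 3;            % number of rows
T = 5;            % stands in for length(t)
i = 4;            % one loop iteration
txN = T*N;        % assumed definition of txN

A = reshape(1:N*T,   N, T);      % dummy data with distinct entries
C = reshape(1:N*T*2, N, T, 2);

% Blocks selected by the linear ranges from the loop body
C1   = reshape(C(1:(i-1)*N),           [N, i-1]);
C2   = reshape(C(txN+1:txN+(i-1)*N),   [N, i-1]);
Ablk = reshape(A(txN-i*N+1:txN-N),     [N, i-1]);

% Equivalent subscript forms
assert(isequal(C1,   C(:, 1:i-1, 1)));
assert(isequal(C2,   C(:, 1:i-1, 2)));
assert(isequal(Ablk, A(:, T-i+1:T-1)));
```

Because each range is a contiguous run of memory, both timed statements read the same amount of data per iteration; only the sign in front of the second `C` block differs.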

Here's what I obtain when I plot time and time2 against the loop index i, with N = 252 and length(t) = 13200:

Does anybody know why there is such a difference between the execution times?

Is it due to the way I wrote the code, or is it related to overhead on the GPU?

FYI, I tried swapping the order of approx_conv_pair and approx_conv_impair, and I observe the same problem (the second operation is almost twice as fast).

Thank you in advance!

## Best Answer