MATLAB: Convert files into matrix

multiple files

Hi,
I have 177,000 files, and I need to build one matrix containing all the values in these files.
Each file is read with textscan to get
c{1},c{2},........
which I then convert into a matrix.
Then I combine these per-file matrices into one matrix.
The problem is that the files share some values, so I have to identify the repeated values and gather all the other values attached to them (the rest of their rows) onto the same row.
I tried a run with 100 files to measure the running time, and it is very long even for just 100 files.
I think that if I could find a function that compares c{1} across all files, and c{2} across all files, etc., it would save time. I'm having a problem with this code:
targetdir = 'd:\social net\dataset\netflix\training_set';
targetfiles = '*.txt';
fileinfo = dir(fullfile(targetdir, targetfiles));
arr = 0;   % each row: [c{1} value, file index, c{2} value, file index, c{2} value, ...]
y = 1;     % next free row of arr
for i = 1:length(fileinfo)
    thisfilename = fullfile(targetdir, fileinfo(i).name);
    % Read the file; 'headerLines',1 skips its first line
    f = fopen(thisfilename, 'r');
    c = textscan(f, '%f %f %s', 'Delimiter', ',', 'headerLines', 1);
    fclose(f);
    c1 = c{1}; c2 = c{2};
    for z = 1:length(c1)
        p = find(arr(:,1) == c1(z));   % row that already holds this c{1} value, if any
        if isempty(p)
            % value not seen yet: start a new row (arr grows here)
            arr(y,1) = c1(z); arr(y,2) = i; arr(y,3) = c2(z);
            y = y + 1;
        else
            % value seen before: append (file index, c{2} value) after the last nonzero entry
            len = nnz(arr(p,:));
            arr(p,len+1) = i;
            arr(p,len+2) = c2(z);
        end
    end
end
[u,u1] = size(arr);
% Write arr to a text file, one matrix row per line
f4 = fopen('netfile.txt', 'w');
for i = 1:u
    for j = 1:u1
        fprintf(f4, '%d ', arr(i,j));
    end
    fprintf(f4, '\n');
end
fclose(f4);
thanks

Best Answer

  • What version of MATLAB are you using? It looks like arr is growing inside your loop. In releases before R2011a (I think), preallocating a variable can speed things up. If you do not know the final size, reallocating in large chunks can speed things up.
    Where are the files saved (locally, or on a network drive, flash drive, or external hard drive)? A fast internal hard drive will give you the fastest read times.
    Have you tried using the profiler to find bottlenecks in the code?
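
To make the "reallocating in large chunks" idea concrete, here is a minimal sketch. It is not the asker's code: the per-file data is made up so the script runs on its own, and the names chunk, rows and n are just placeholders. The last two lines go beyond what the answer says and show one possible way to match equal c{1} values across all files in a single pass with unique and accumarray, instead of calling find for every value.

```matlab
% Sketch only: grow a buffer in large chunks instead of one row at a time.
% The per-"file" data below is invented; in the real code it would come from textscan.
chunk = 1000;                 % rows added per reallocation (assumed value, tune it)
rows  = zeros(chunk, 3);      % columns: [c{1} value, file index, c{2} value]
n     = 0;                    % number of rows filled so far

for i = 1:5                   % stand-in for the loop over fileinfo
    c1 = (1:700)' + 100*i;    % stand-in for c{1}
    c2 = mod(c1, 5) + 1;      % stand-in for c{2}
    m  = numel(c1);

    % Reallocate only when the buffer is full, and then by at least one whole chunk.
    if n + m > size(rows, 1)
        extra = max(chunk, n + m - size(rows, 1));
        rows  = [rows; zeros(extra, 3)];
    end

    rows(n+1:n+m, :) = [c1, repmat(i, m, 1), c2];
    n = n + m;
end

rows = rows(1:n, :);          % trim the unused tail of the buffer

% Group rows that share the same first-column value, all files at once.
[ids, ~, g] = unique(rows(:, 1));   % g(k) is the group number of row k
counts = accumarray(g, 1);          % number of rows found for each id
```

With the rows collected like this, rows(g == k, 2:3) gives every (file index, c{2} value) pair for ids(k). To see whether the time really goes into the growing array or into reading the files, run the original script between profile on and profile viewer.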