Wednesday, April 15, 2009

Reading multiple files in multiple folders

Here is another common problem faced by those reading in data generated by other packages, especially climate data, which arrive in multiple files, sometimes in multiple folders. To be more concrete, suppose there are "y" folders, each of which contains "x_y" files (allowing the number of files to vary by folder). Assuming that all files are of the same type (text, csv, MATLAB, etc.) and have the same number of columns, the problem is to aggregate these files into one file (i.e. to generate one large file containing all the data).
Here is how one might go about it for numeric csv files, running the code from the folder that contains the data folders:

dirs = dir('*');
% keep only real subfolders; '.' and '..' are not guaranteed to be the first two entries
dirs = dirs([dirs.isdir] & ~ismember({dirs.name}, {'.', '..'}));
folderData = cell(numel(dirs), 1); % one cell per folder

for i = 1:numel(dirs)
    cd(dirs(i).name) % change into the required folder (no eval needed)
    filesStruct = dir('*.csv'); % collect list of all csv files in the folder
    files = {filesStruct.name};
    dataCell = cell(numel(files), 1); % one cell per file in that folder
    for j = 1:numel(files)
        dataCell{j} = load('-ascii', files{j}); % load data (numeric only, no header lines)
    end
    folderData{i} = cat(1, dataCell{:}); % stack data one file below the other
    cd ..
end
data = cat(1, folderData{:}); % stack data one folder below the other
csvwrite('aggregate.csv', data); % written to the top-level folder
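
The same idea can also be sketched without changing directories at all, by building full paths with fullfile. This is only a rough variant, not a replacement for the code above. It assumes the files are purely numeric csv with the same number of columns, and it swaps load for csvread. The folder and file names (folderA, a1.csv and so on) are made up so that the snippet builds its own small test tree in the temp folder and can be run as is:

```matlab
% Build a small throwaway folder tree (hypothetical names, for illustration only)
rootDir = tempname;
mkdir(fullfile(rootDir, 'folderA'));
mkdir(fullfile(rootDir, 'folderB'));
csvwrite(fullfile(rootDir, 'folderA', 'a1.csv'), [1 2; 3 4]);
csvwrite(fullfile(rootDir, 'folderA', 'a2.csv'), [5 6]);
csvwrite(fullfile(rootDir, 'folderB', 'b1.csv'), [7 8]);

% Keep only real subfolders, skipping '.' and '..' by name rather than by position
entries = dir(rootDir);
isSub = [entries.isdir] & ~ismember({entries.name}, {'.', '..'});
subNames = sort({entries(isSub).name});

blocks = {}; % one cell per file, in folder order and then file order
for i = 1:numel(subNames)
    listing = dir(fullfile(rootDir, subNames{i}, '*.csv'));
    fileNames = sort({listing.name});
    for j = 1:numel(fileNames)
        % read via the full path instead of cd-ing into the folder
        blocks{end+1, 1} = csvread(fullfile(rootDir, subNames{i}, fileNames{j}));
    end
end

data = cat(1, blocks{:}); % fails if the column counts differ between files
csvwrite(fullfile(rootDir, 'aggregate.csv'), data);
```

To use it on real data, drop the lines that create the test tree and point rootDir at the folder that holds the data folders. Since nothing changes the current directory, an error halfway through does not leave MATLAB sitting in one of the subfolders.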

The post above is really a modification of the one here
