MATLAB: Reading a large .dat file or some parts of it

Tags: .dat, fopen, fread, fscanf, large file, textscan

Dear All,
I have a very large DAT file (almost 16 GB). It contains the electricity usage of 8000 customers over about 4 years, recorded every 30 minutes (so it has roughly 8000*4*365*24*2 rows, about 560 million!).
MS Excel lets me open this file, but it obviously loads only part of it. From that part I could work out that the format is something like this:
990814, 246745, 0, 2012-07-22 20:00:00, 3.25, 0,0,0,0
which corresponds with:
CUSTOMER_KEY, CALENDAR_KEY, EVENT_KEY, READING_DATETIME, GENERAL_SUPPLY_KWH, CONTROLLED_LOAD_KWH, GROSS_GENERATION_KWH, NET_GENERATION_KWH, OTHER_KWH
My main problem is that MATLAB cannot load the file because it runs out of memory.
I read about fopen, fread, fscanf, textscan, etc. However, I couldn't figure out whether it is possible to read only part of this DAT file instead of the whole thing. Is there a command to read, for example, only rows 100 to 1000 of the file without loading all of it into memory?
I only need the usage of about 1000 customers for one month.
Thanks in advance for your help.

Best Answer

  • The calling sequence for textscan is:
    textscan(SOURCE, FORMAT, COUNT, OPTIONS...)
    where SOURCE is either a file identifier or a string, FORMAT is a string, and COUNT is the maximum number of times to apply the FORMAT.
    So to read a particular portion of the file, use the 'HeaderLines' option to skip every line before the portion you want, and use COUNT to give the number of lines to process.
    COUNT is not exactly a number of lines, though: unless you have chosen your options carefully, an empty line is treated as leading whitespace and skipped without incrementing the count. More precisely, provided there is enough data, COUNT is the number of rows of data returned.
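
A minimal sketch of the approach described above, reading only rows 4 to 6 of a file. It is not a definitive implementation: the temporary file, its ten fake rows, the format string and the variable names are assumptions made for illustration. The format string is a guess based on the sample line in the question, with READING_DATETIME kept as text.

```matlab
% Build a small sample file so the sketch is self-contained.
% The file name and the ten fake rows are made up for illustration.
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
for k = 1:10
    fprintf(fid, '%d, 246745, 0, 2012-07-22 20:00:00, %.2f, 0,0,0,0\n', ...
        990800 + k, k/4);
end
fclose(fid);

% One numeric field per column, except READING_DATETIME (4th), read as text.
fmt = '%f %f %f %s %f %f %f %f %f';

firstRow = 4;   % first row wanted (1-based)
lastRow  = 6;   % last row wanted

fid = fopen(fname, 'r');

% 'HeaderLines' skips the rows before firstRow; the COUNT argument
% limits how many rows are returned.
C = textscan(fid, fmt, lastRow - firstRow + 1, ...
    'Delimiter', ',', 'HeaderLines', firstRow - 1);

% A second call on the same file identifier resumes where the first
% one stopped, so the following rows can be read as another block.
D = textscan(fid, fmt, 2, 'Delimiter', ',');

fclose(fid);
delete(fname);

customerKey = C{1};   % CUSTOMER_KEY of rows 4 to 6
readingTime = C{4};   % cell array of date/time text
kwh         = C{5};   % GENERAL_SUPPLY_KWH of rows 4 to 6
```

For the question's example of rows 100 to 1000, firstRow and lastRow would be 100 and 1000. Note that skipping header lines still means scanning past them, so on a 16 GB file it may be more practical to call textscan repeatedly with a fixed COUNT and keep only the customers and dates of interest from each block.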