MATLAB: Extract data from text file


I have a text file, 'sample data.txt', whose data are not in the right form. I need to read this file, extract the data and tabulate it in the order shown in the figure below. I am not sure how to do it.
I would really appreciate it if someone could guide me. Thank you.

Best Answer

  • That is a very badly formatted file. For example, the field delimiters are space characters, but space characters also occur within the fields, and there are no text delimiters (such as quotes) to group the words of one field together. There is no robust general solution for parsing such a poorly formatted file. In some limited cases (for example, with prior knowledge of the field contents) you might be able to parse it, but such parsing will always be fragile. On that basis I assumed that the fields contain only the kinds of text, in the number and order, that you have shown, i.e. each line contains exactly:
    1. one or two words (starting with 'Basal', 'Mid' or 'Apical', or the single word 'Apex')
    2. one number
    3. one word
    4. 'Nil' or 'Present'
    5. 'Nil' or 'Present'
    6. 'Nil', 'Full thickness' or a percentage (optionally preceded by '<' or '>')
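A minimal sketch of why splitting on spaces alone does not work. The two lines below are hypothetical, typed to resemble rows of the sample file (the file itself is not shown in the thread); strsplit splits on whitespace by default, so multi-word fields such as 'Apical Septal' or '< 50%' are broken into several pieces:

```matlab
% Two hypothetical lines typed to resemble rows of the sample file (not read from it)
lineA = 'Basal Anterior 1 Hypokinetic Nil Nil 50%';
lineB = 'Apical Septal 14 Akinetic Nil Nil < 50%';

% strsplit splits on whitespace by default, so multi-word fields are broken apart
partsA = strsplit(lineA);
partsB = strsplit(lineB);

% Neither line yields the six wanted fields, and the piece counts even differ
fprintf('%d and %d pieces, but 6 fields\n', numel(partsA), numel(partsB));
```

This is the fragility described above: without knowing which words belong together, a plain delimiter split cannot recover the fields, which is why the pattern below encodes the expected contents of each field.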
    The following regular expression matches all seventeen rows in your example data file:
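    str = fileread('sample data.txt');  % read the whole file into one char vector
    % Pattern for one row: six top-level groups, one per field
    rgx = ['(Apex|(Basal|Mid|Apical)\s+[A-Z][a-z]+)\s+(\d+)\s+([A-Z][a-z]+)',...
    '\s+(Nil|Present)\s+(Nil|Present)\s+(Nil|Full thickness|([<>]\s?)?\d+\%)'];
    tkn = regexpi(str,rgx,'tokens');  % one cell of field texts per matched row
    tkn = vertcat(tkn{:})             % stack the rows into an N-by-6 cell array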
    Giving:
    tkn =
    'Basal Anterior'         '1'     'Hypokinetic'    'Nil'        'Nil'        '50%'
    'Basal Anteroseptal'     '2'     'Dyskinetic'     'Present'    'Present'    'Full thickness'
    'Basal Inferoseptal'     '3'     'Hypokinetic'    'Present'    'Present'    '50%'
    'Basal Inferior'         '4'     'Hypokinetic'    'Nil'        'Present'    '50%'
    'Basal Inferolateral'    '5'     'Normal'         'Nil'        'Nil'        'Nil'
    'Basal Anterolateral'    '6'     'Normal'         'Nil'        'Nil'        'Nil'
    'Mid Anterior'           '7'     'Hypokinetic'    'Nil'        'Nil'        '<50%'
    'Mid Anteroseptal'       '8'     'Dyskinetic'     'Present'    'Present'    'Full thickness'
    'Mid Inferoseptal'       '9'     'Akinetic'       'Present'    'Present'    'Full thickness'
    'Mid Inferior'           '10'    'Hypokinetic'    'Nil'        'Present'    '<50%'
    'Mid Inferolateral'      '11'    'Normal'         'Nil'        'Nil'        'Nil'
    'Mid Anterolateral'      '12'    'Normal'         'Nil'        'Nil'        '<50%'
    'Apical Anterior'        '13'    'Akinetic'       'Nil'        'Nil'        '50%'
    'Apical Septal'          '14'    'Akinetic'       'Nil'        'Nil'        '< 50%'
    'Apical Inferior'        '15'    'Akinetic'       'Nil'        'Nil'        '> 50%'
    'Apical Lateral'         '16'    'Hypokinetic'    'Nil'        'Nil'        'Full thickness'
    'Apex'                   '17'    'Akinetic'       'Nil'        'Nil'        'Full thickness'
    >> size(tkn)
    ans =
    17 6
    >>
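Because the sample file is not available here, a quick way to check the pattern is to run it on a few hypothetical lines held in a string instead of the file. This sketch reuses the regular expression from this answer unchanged and only replaces fileread with lines copied from the output shown above, joined with newline characters:

```matlab
% Hypothetical stand-in for the file: three rows copied from the output above
sampleLines = {'Basal Anterior 1 Hypokinetic Nil Nil 50%', ...
               'Apical Septal 14 Akinetic Nil Nil < 50%', ...
               'Apex 17 Akinetic Nil Nil Full thickness'};
str = strjoin(sampleLines, char(10));  % join with newlines to mimic the file text

% Same pattern as in the answer
rgx = ['(Apex|(Basal|Mid|Apical)\s+[A-Z][a-z]+)\s+(\d+)\s+([A-Z][a-z]+)',...
    '\s+(Nil|Present)\s+(Nil|Present)\s+(Nil|Full thickness|([<>]\s?)?\d+\%)'];
tkn = regexpi(str, rgx, 'tokens');
tkn = vertcat(tkn{:});                 % one row per matched line, one column per field
disp(tkn)
```

The three test lines cover the one-word 'Apex' segment, the 'Full thickness' value and a percentage with a space after '<', which are the cases most likely to break a simpler pattern.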
    You can then put the cell array tkn into a table, if you want to:
    >> hdr = {'LeftVentricularSegments','No','WallMotion','PerfusionAtRest','PerfusionAtStress','DelayedGadoliniumEnhancement'};
    >> T = cell2table(tkn,'VariableNames',hdr)
    T =
    LeftVentricularSegments    No      WallMotion       PerfusionAtRest    PerfusionAtStress    DelayedGadoliniumEnhancement
    _______________________    ____    _____________    _______________    _________________    ____________________________
    'Basal Anterior'           '1'     'Hypokinetic'    'Nil'              'Nil'                '50%'
    'Basal Anteroseptal'       '2'     'Dyskinetic'     'Present'          'Present'            'Full thickness'
    'Basal Inferoseptal'       '3'     'Hypokinetic'    'Present'          'Present'            '50%'
    'Basal Inferior'           '4'     'Hypokinetic'    'Nil'              'Present'            '50%'
    'Basal Inferolateral'      '5'     'Normal'         'Nil'              'Nil'                'Nil'
    'Basal Anterolateral'      '6'     'Normal'         'Nil'              'Nil'                'Nil'
    'Mid Anterior'             '7'     'Hypokinetic'    'Nil'              'Nil'                '<50%'
    'Mid Anteroseptal'         '8'     'Dyskinetic'     'Present'          'Present'            'Full thickness'
    'Mid Inferoseptal'         '9'     'Akinetic'       'Present'          'Present'            'Full thickness'
    'Mid Inferior'             '10'    'Hypokinetic'    'Nil'              'Present'            '<50%'
    'Mid Inferolateral'        '11'    'Normal'         'Nil'              'Nil'                'Nil'
    'Mid Anterolateral'        '12'    'Normal'         'Nil'              'Nil'                '<50%'
    'Apical Anterior'          '13'    'Akinetic'       'Nil'              'Nil'                '50%'
    'Apical Septal'            '14'    'Akinetic'       'Nil'              'Nil'                '< 50%'
    'Apical Inferior'          '15'    'Akinetic'       'Nil'              'Nil'                '> 50%'
    'Apical Lateral'           '16'    'Hypokinetic'    'Nil'              'Nil'                'Full thickness'
    'Apex'                     '17'    'Akinetic'       'Nil'              'Nil'                'Full thickness'
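cell2table keeps every column as text, so the segment number in No is still a char vector. A possible follow-up, sketched here under the assumption that numeric segment numbers (and, optionally, categorical text columns) are wanted, converts those columns after building the table. To keep the sketch self-contained it uses a hypothetical two-row subset of tkn copied from the output above; with the real data you would start from the full T:

```matlab
% Hypothetical two-row subset of tkn, copied from the output above
tkn = {'Basal Anterior', '1', 'Hypokinetic', 'Nil', 'Nil', '50%'; ...
       'Apex', '17', 'Akinetic', 'Nil', 'Nil', 'Full thickness'};
hdr = {'LeftVentricularSegments','No','WallMotion','PerfusionAtRest', ...
       'PerfusionAtStress','DelayedGadoliniumEnhancement'};
T = cell2table(tkn, 'VariableNames', hdr);

% Replace the text column No with a numeric column (str2double works on cell arrays)
T.No = str2double(T.No);

% Optional: a column with a few repeated values can be stored as categorical
T.WallMotion = categorical(T.WallMotion);

% Numeric comparisons now work directly, e.g. select segments above 10
disp(T(T.No > 10, :))
```

Converting the number column is what makes sorting or filtering by segment number behave numerically (with text, '10' would sort before '2'); the categorical step is purely optional.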