macOS – Splitting a large .txt file every 100 lines and including the original header (on a Mac)

Tags: macos, script, text-editing, textwrangler

I am looking for a tool or script (TextWrangler or Terminal) that can split a large text file every 100 lines, counting from line 5 (the first 4 lines are a header), and output individual .txt files that each include the original header.

For instance, an input file like the first block below should be split into the three files that follow it:


line1 / line4   HEADER
line5 / line265 DATA


line1/line4   HEADER
line5/line104 DATA

line1/line4   HEADER
line5/line104 DATA

line1/line4   HEADER
line5/line65  DATA

The text file uses Windows line breaks (CR LF) in case that matters.

I am currently doing this manually so any suggestions that can make this process more efficient are very welcome.

Best Answer

  1. Remove the header and put it into a separate file, header.txt.
  2. Split the data with split -l 100 data.txt. This generates files of 100 lines each (the last one may be shorter), named xaa, xab, xac and so on. The --lines=100 spelling is a GNU long option; the BSD split that ships with macOS expects -l 100.
  3. Prepend the header to each file: for a in x??; do cat header.txt "$a" > "$a.txt"; done. This leaves your finished data files (with headers) named xaa.txt, xab.txt, xac.txt, ...

Two-letter suffixes give at most 676 (26 × 26) files. If the amount of data is so large (or you split on fewer lines) that this is not enough, ask split for three-letter suffixes with -a 3 and insert an extra ? in the for statement above (x???).

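To automate the extraction of the header, use head -n 4 origdata.txt > header.txt to extract the first four lines. Use tail -n +5 origdata.txt > data.txt to extract everything except the first four lines (the number after + is the first line to keep, so +4 would still include the last header line). Now you have two files, one with the header and one with the data. The CR LF line breaks should not get in the way, since every line still ends in LF and the CR is simply carried along. It should not be too hard to combine this into a script. (I have no access to bash today.)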
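The steps above could be combined roughly as follows. This is only a sketch under a few assumptions: the function name split_with_header, its argument order, and the temporary file name are made up for illustration, and it asks split for three-letter suffixes (-a 3) so that larger files do not run out of names.

```shell
#!/bin/sh
# Sketch: split a text file into chunks, repeating the header in every chunk.
# The function name, argument order and the "header.tmp" suffix are
# illustrative assumptions, not part of any standard tool.
split_with_header() {
    input=$1          # file to split, e.g. origdata.txt
    header_lines=$2   # number of header lines, e.g. 4
    chunk_lines=$3    # data lines per output file, e.g. 100
    prefix=$4         # prefix for the output files, e.g. part_
    header_file="${prefix}header.tmp"

    # Save the header once.
    head -n "$header_lines" "$input" > "$header_file"

    # Everything after the header goes to split, which reads stdin ("-")
    # and writes ${prefix}aaa, ${prefix}aab, ...
    tail -n +"$((header_lines + 1))" "$input" |
        split -l "$chunk_lines" -a 3 - "$prefix"

    # Prepend the header to every chunk and drop the headerless piece.
    for part in "$prefix"[a-z][a-z][a-z]; do
        [ -e "$part" ] || continue   # no data lines, so nothing matched
        cat "$header_file" "$part" > "$part.txt"
        rm "$part"
    done
    rm "$header_file"
}
```

Pasted into a Terminal session (or saved to a file and sourced), it could be called as split_with_header origdata.txt 4 100 part_. For the 265-line example in the question, that should leave part_aaa.txt and part_aab.txt with 104 lines each and part_aac.txt with 65.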