I have a .csv file which was exported from a proprietary electronic health records database system. I was a subscriber to their service but upon ending my subscription this is how I get to keep my data and migrate it to another system. In a nutshell, I need to import it into a database or spreadsheet so that I can then use Word mail merge to re-create the medical record documents in a readable format and print them out.

The problem is that when I open the file directly with Excel or import it with Access, formatting newlines within many of the data field entries get misinterpreted as record separators. This causes Excel or Access to start a new row in the middle of the entry, which of course screws things up. When I opened the file with Notepad++ and turned on "Show all characters", I could see that the true record separator newlines are CR LF pairs, but the formatting newlines within the entries are LF symbols by themselves. From some quick reading I learned that CSV data field entries which contain a newline should ideally be enclosed in double quotes, but the ones in my file are not, so I presume this is the source of the problem.

How should I go about fixing this? Is there a way with Access, Excel, or any open source program to get it to render the solo LFs as formatting newlines within the data field entry? Can you convert the solo LFs to some other newline type so they would be properly rendered? Even if it only ignored the LFs which were not preceded by CR, then perhaps when I later use Word mail merge to output the documents it would render the LFs properly as newlines? Another possibility I thought might work was to do some kind of find/replace operation to replace all the solo LFs with another character until I have imported the file into the database, then replace that character with the proper kind of newline once it is in the database or after I have created the document via mail merge. Or is there some type of find/replace operation which could just enclose each field entry containing a solo LF in double quotes? Heck, even if there is a way to non-selectively enclose every data field entry in the file in double quotes it should still work, right? I am just not sure of the easiest way to proceed from here and would appreciate any suggestions.

Best Answer

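In Notepad++, with the search mode set to "Regular expression", the pattern "[^\r]\n" will match LF, but not CRLF. Be aware that it also matches the character in front of the LF, so a plain replacement would delete that character, and it skips the second of two consecutive LFs. To keep that character, search for "([^\r])\n" and start the replacement with "\1", or use the negative lookbehind "(?<!\r)\n", which matches only the LF itself. Now replace this with whatever you want. This answer has useful information.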
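
If you would rather script the same replacement than do it by hand in an editor, here is a minimal Python sketch of the idea. It is only an illustration under some assumptions: the function names `replace_lone_lf` and `convert_file` and the `<LF>` placeholder are made up for this example, and the placeholder should be swapped for any string that never occurs in your data. The file is handled as raw bytes so that Python does not translate the line endings on its own.

```python
import re

# An LF that is NOT immediately preceded by CR, i.e. not part of a CRLF pair.
LONE_LF = re.compile(rb"(?<!\r)\n")


def replace_lone_lf(data: bytes, placeholder: bytes = b"<LF>") -> bytes:
    """Replace every solo LF with a placeholder, leaving CRLF pairs untouched."""
    return LONE_LF.sub(placeholder, data)


def convert_file(src_path: str, dst_path: str, placeholder: bytes = b"<LF>") -> None:
    """Read src_path as bytes, replace solo LFs, and write the result to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(replace_lone_lf(data, placeholder))


# Small in-memory demonstration: the first record has an embedded solo LF.
sample = b"1,first line\nsecond line,x\r\n2,plain,y\r\n"
print(replace_lone_lf(sample))
```

After importing the converted file, the placeholder can be turned back into whatever line break the database or Word expects, as suggested in the question.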