How to use sed to replace a pattern at the end of each line in a file with fixed text


I want to compare two files of around 40 MB each, containing comma-separated values with lines like this:

hstar,default,"T9883Z ",0d59,c801,7332,5,20120914,4.343618767

The last entry (4.343618767 in the example above) differs between the two files, but almost all the other fields are identical.

I need to diff the two files to locate the few places where the entries other than the last vary between the two files.

I think the easiest way to do this is to use sed to normalize the last field in both files: find the number after the seventh comma and replace it with a fixed string like 9.999999999 on every line. After that, a simple diff will work.

However, I'm not sure how to construct a sed command to locate the seventh comma and replace the remaining string to the end of the line with a fixed string. What would such a sed command look like? I imagine I would need to use a regular expression but am not sure how to start the pattern after the seventh comma.

Best Answer

You do not have to count commas at all. Just match the last field:

sed 's/,[^,]*$/,9.999999999/'


,    match the comma before the last field
[    beginning of a bracket expression (character class)
 ^   negation, i.e. match any character except the following ones
 ,   comma
]    end of the bracket expression
*    repeat the preceding item zero or more times
$    match the end of the line
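
Putting it together, the whole comparison might look like the sketch below. The file names (a.csv, b.csv and the .norm copies) and the two-line sample data are made up for illustration; substitute your real files. It also assumes no field other than the last can end a line, and that the last field never contains a quoted comma.

```shell
# Hypothetical sample data standing in for the two 40 MB files.
tmp=$(mktemp -d)
printf '%s\n' \
  'hstar,default,"T9883Z ",0d59,c801,7332,5,20120914,4.343618767' \
  'hstar,default,"T9884Z ",0d59,c801,7332,5,20120914,1.25' > "$tmp/a.csv"
printf '%s\n' \
  'hstar,default,"T9883Z ",0d59,c801,7332,5,20120914,4.999999999' \
  'hstar,default,"T9884Z ",0d59,c802,7332,5,20120914,1.25' > "$tmp/b.csv"

# Replace everything after the last comma on each line with a fixed string.
sed 's/,[^,]*$/,9.999999999/' "$tmp/a.csv" > "$tmp/a.norm"
sed 's/,[^,]*$/,9.999999999/' "$tmp/b.csv" > "$tmp/b.norm"

# diff exits with status 1 when the files differ, hence the `|| true`.
result=$(diff "$tmp/a.norm" "$tmp/b.norm" || true)
printf '%s\n' "$result"
```

With this sample data, only the second line is reported, because it differs in a field other than the last (c801 versus c802). In bash you can skip the intermediate files by feeding both sed commands to diff through process substitution.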