Properly downmix 5.1 to stereo using ffmpeg


I have a 5.1 audio track from a film where front left and front right contain music, and center contains dialogue. Playing the 5.1 track in VLC blends everything together nicely.

I'm trying to convert the 5.1 track to stereo using ffmpeg -ac 2, but the resulting stereo mix is much quieter than the 5.1 track played natively.

Adding -af "pan=stereo|c0=FL|c1=FR" gives the correct volume, but then there is no dialogue because the center channel is not included.

So maybe the solution is to mix left/center/right into stereo and throw out the back and subwoofer channels? (I'm guessing here…)

So the question is: How do I make ffmpeg downmix 5.1 to stereo the same way VLC does it, with the same strong volume in the end result?

Best Answer

The answers to this question have since become a bit of a mess, with many containing redundant information and others containing outright inaccuracies. This answer is an attempt to streamline the information in those answers while doing away with their problems.

Most importantly, it's worth bearing in mind that Gregory's answer, currently the top-voted answer to this question, is no different than using the -ac 2 switch - more on this below.

Downmixing a 5.1 channel audio stream to stereo with -ac 2

FFmpeg comes with built-in capabilities for downmixing a 5.1 track to stereo, and this is also the solution that FFmpeg's own documentation recommends:

Note: ffmpeg integrates a default down-mix (and up-mix) system that should be preferred (the -ac option) over the pan filter unless you have very specific needs.

The -ac 2 switch works by mixing proportions of the five full-range channels of the source's 6-channel stream (Front Left, Front Right, Front Center, Back Left and Back Right) into the Front Left and Front Right channels of the output stereo stream.

The LFE channel (the .1 in 5.1, reserved for the subwoofer and used for deep, low-frequency effects) is discarded completely when using this option.

Unfortunately, in my tests -ac 2 produced levels of both music and dialogue that differed the most from the source, making it the worst-sounding of all the formulae I tested. You may still find that it gives a perfectly adequate downmix for your needs, in which case any other formula would be overkill.

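To downmix a DTS track with -ac 2 while keeping its codec and extension the same (the audio is still re-encoded; if FFmpeg complains that the dca encoder is experimental, add -strict -2 before the output filename):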

ffmpeg -i "sourcetrack.dts" -c:a dca -ac 2 "stereotrack.dts"

As pointed out by Mephisto in his answer, if the dialogue and the music sound well balanced against each other but simply lack volume, you can downmix the stream while also increasing its volume:

ffmpeg -i "sourcetrack.dts" -c:a dca -ac 2 -vol 425 "stereotrack.dts"

For the -vol switch, 100% volume in the source is equivalent to the integer value 256, and using a larger value than this will increase the overall volume of the audio stream. However, note that doing so too much may result in distortion or artifacts, especially during its louder sections.
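
To put the 256-based scale in more familiar terms, here is a minimal Python sketch that converts a -vol value into a linear gain and into decibels. The function names are made up for illustration. The volume filter it prints does exist in FFmpeg, and to my understanding recent builds report -vol as deprecated in favour of it, but the answer above does not state this, so treat it as an assumption.

```python
import math

VOL_UNITY = 256  # the -vol value that leaves the level unchanged, per the text above


def vol_to_gain(vol):
    """Convert a -vol integer into a linear gain factor (hypothetical helper)."""
    return vol / VOL_UNITY


def gain_to_db(gain):
    """Express a linear amplitude gain in decibels."""
    return 20 * math.log10(gain)


gain = vol_to_gain(425)
print(f"linear gain: {gain:.2f}")
print(f"in decibels: {gain_to_db(gain):+.1f} dB")
# Assumed equivalent spelling using the volume audio filter:
print(f'-af "volume={gain:.2f}"')
```

So -vol 425 is a boost of about 1.66 times, or roughly 4.4 dB, which gives a sense of how much headroom the louder passages need before they clip.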

To downmix an audio stream to stereo and transcode it to the AC3 codec, for example:

ffmpeg -i "sourcetrack.dts" -c:a ac3 -ac 2 "stereotrack.ac3"

Downmixing a 5.1 channel audio stream to stereo with a custom mix algorithm

If you want a higher-quality downmix, or you absolutely must include the LFE channel in your output, you can use FFmpeg's audio filter switch (-af) to downmix the audio using a custom mix formula.

Downmixing with the ATSC formula (Gregory's answer)

As of the time of posting this answer, the top-voted answer to this question was Gregory's, which puts the formula from the ATSC specification (see section 7.8.2, Downmixing into Two Channels) into an FFmpeg audio filter. This specification is itself directly linked to by the FFmpeg documentation on the topic, indicating it's highly likely to be the same formula that FFmpeg already implements for its -ac 2 switch. If this is true, then typing out the entire formula in Gregory's answer would be no different than using the -ac 2 switch, and therefore a waste of time.

I decided to test this for certain by re-encoding the same input audio using both -ac 2 and the -af filter from Gregory's answer (the exact commands used can be seen in the footnotes to this answer).

I then compared the sizes of the resulting output files and found they were identical down to the byte.

Finally, I opened both output files in Audacity and compared their waveforms to confirm they were identical.

It therefore seems pretty conclusive that the ATSC formula detailed in Gregory's answer is the same one FFmpeg already implements. Using it is redundant: it does nothing that -ac 2 doesn't, and it makes for a much more cumbersome command.
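
The pan expression used for this test (listed under Test commands) uses < rather than =. The pan filter's documentation describes < as renormalising the gains so that they total 1, which avoids clipping. The Python sketch below works through that arithmetic for the left channel. Given the test result above, the same figures probably explain why the -ac 2 output sounds so quiet, though that is my inference rather than something the tests measured directly.

```python
def renormalise(gains):
    """Scale gains so they sum to 1, as pan is documented to do for '<'."""
    total = sum(gains.values())
    return {channel: gain / total for channel, gain in gains.items()}


# Left-channel coefficients from the ATSC test command in this answer.
atsc_left = {"FL": 1.0, "FC": 0.707, "BL": 0.707}

effective = renormalise(atsc_left)
for channel, gain in effective.items():
    print(f"{channel}: {gain:.3f}")
```

After renormalisation the front channel ends up at roughly 41% of its original amplitude and the center at roughly 29%, so every source channel is attenuated considerably compared with playing the 5.1 track on a full speaker set.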

Downmixing without discarding the LFE channel (Dave_750's answer)

Of the downmix formulae in the answers, this is the only one that mixes the LFE channel into the stereo output instead of discarding it, and as a result the one that loses the least sound from the source.

The overall volume level is higher and fuller than with -ac 2, but still lower than with the Nightmode Dialogue downmix described below. However, music levels are much closer to the source than with the Nightmode Dialogue downmix, and because the LFE channel is included, boosting the output volume with this formula can produce a stream that sounds truer to the 5.1 source than any other formula I tested.

If you have the ability, I would highly recommend encoding your audio stream(s) using both this downmix formula and the Nightmode Dialogue downmix, and carefully comparing the waveforms of the two to determine which one is better.

To downmix a 5.1 track to stereo using this formula and increase its volume level to 425 (where 256 is 100% of the original source's volume level):

ffmpeg -i "sourcetrack.dts" -c dca -vol 425 -af "pan=stereo|FL=0.5*FC+0.707*FL+0.707*BL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.5*LFE" "outputstereo.dts"
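
To make the pan expression above concrete, here is a small Python sketch that applies the same coefficients to one frame of samples and also computes the worst-case gain of the mix. The sample values and function names are invented for illustration. The matrix is copied from the command.

```python
# Gains copied from the pan expression above: output channel -> {input channel: gain}.
DAVE_750 = {
    "FL": {"FC": 0.5, "FL": 0.707, "BL": 0.707, "LFE": 0.5},
    "FR": {"FC": 0.5, "FR": 0.707, "BR": 0.707, "LFE": 0.5},
}


def downmix(frame, matrix):
    """Mix one frame (input channel -> sample) into the matrix's output channels."""
    return {
        out: sum(gain * frame.get(src, 0.0) for src, gain in gains.items())
        for out, gains in matrix.items()
    }


def worst_case_gain(matrix):
    """Largest possible output level when every input is at full scale."""
    return max(sum(abs(g) for g in gains.values()) for gains in matrix.values())


# Made-up frame: dialogue in the center, some music in front, a little LFE.
frame = {"FL": 0.2, "FR": 0.1, "FC": 0.5, "LFE": 0.1, "BL": 0.0, "BR": 0.0}
stereo = downmix(frame, DAVE_750)
worst = worst_case_gain(DAVE_750)

print(stereo)
print(f"worst-case gain: {worst:.3f}")
print(f"with -vol 425:   {worst * 425 / 256:.3f}")
```

Because this formula uses = rather than <, the gains are not renormalised. Their sum is about 2.4, and about 4 once the -vol 425 boost is applied, so a passage that is loud on every channel at once could exceed full scale. Real soundtracks rarely do that, but it is one reason to check the loudest scenes of the output for clipping.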

Downmixing with Robert Collier's Nightmode Dialogue (Shane Harrelson's answer)

The Nightmode Dialogue formula, created by Robert Collier on the Doom9 forum and sourced by Shane Harrelson in his answer, results in a far better downmix than the -ac 2 switch - instead of overly quiet dialogues, it brings them back to levels that are much closer to the source.

From Robert Collier's description of the mix:

After converting many DTS movie tracks from 5.1 to 2.0 using eac3to, I have found the default eac3to channel mappings to result in very quiet dialogues and overly loud music and action scenes. Although the eac3to channel downmix coefficients have a scientific basis, they often do not sound good in practice because of low dialogue volume. This preset is for those looking for clear dialogues with left and right channel music still being audible but more in the background.

As you can see - front center (dialogues) come in properly now and stay at the original level - while the music and explosions remain a background effect and don't overpower you. This preset solves the problem of you having to constantly fiddle with the volume knob when watching DTS 5.1 converted to 2.0 movies in order to hear dialogues. (Especially for watching movies in the night where you don't want to wake others but still want to be able to hear dialogues).

Unfortunately, the music of this downmix formula is much lower than in the 5.1 source (which was likely by design considering Collier's intention to create a "nightmode" mix) and due to complete loss of the LFE track, the overall output audio doesn't sound as full or close to source as Dave_750's formula with boosted volume.

However, if for some reason you want to avoid boosting the overall volume of the stream, then the Nightmode Dialogue would likely be your best option - though again, I would highly recommend encoding your audio stream to both and comparing the waveforms of the two carefully.

To downmix with the Nightmode Dialogue formula in FFmpeg:

ffmpeg -i "sourcetrack.dts" -c dca -af "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR" "stereotrack.dts" 
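
One way to see why this formula favours dialogue is to compare the center gain with the front gain in each of the three formulae from this answer. The Python sketch below does that in decibels. The coefficients are taken from the commands in this answer, while the dictionary layout and function name are only illustrative.

```python
import math

# (center gain, front gain) for the left output channel of each formula.
FORMULAE = {
    "ATSC / -ac 2": (0.707, 1.0),
    "Dave_750": (0.5, 0.707),
    "Nightmode Dialogue": (1.0, 0.30),
}


def dialogue_vs_music_db(center_gain, front_gain):
    """Level of the center channel relative to the front channel, in dB."""
    return 20 * math.log10(center_gain / front_gain)


ratios = {name: dialogue_vs_music_db(*gains) for name, gains in FORMULAE.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:+.1f} dB")
```

On these numbers the ATSC and Dave_750 formulae both place the center about 3 dB below each front channel, whereas Nightmode Dialogue places it roughly 10.5 dB above. Note that the center feeds both stereo outputs, so how loud dialogue is perceived also depends on that summing. The figures here only compare coefficients.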

Tarc's answer

This answer simply puts the Nightmode Dialogue downmix formula from Shane Harrelson's answer into a command to convert the audio stream in an MKV container. While the command given in this answer would work fine on such an audio stream, adapting it for a standalone audio track would give the error:

Filtering and streamcopy cannot be used together

This is because the audio codec cannot be copied when downmixing - like all other changes FFmpeg makes to an output stream, a downmix requires that the track be re-encoded for the changes to be applied.

This command also included a redundant -ac 2 switch which FFmpeg would have ignored.
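
If you script these conversions, the streamcopy restriction can be caught before FFmpeg is even started. The following is a minimal Python sketch with a hypothetical helper name. It only assembles the argument list, using the -i, -c:a and -af options already shown in this answer, and refuses the copy codec.

```python
import shlex


def build_downmix_args(source, target, pan_expr, audio_codec):
    """Return an ffmpeg argument list for a filtered downmix (hypothetical helper).

    A filter can only be applied to audio that is decoded and re-encoded,
    so 'copy' is rejected here instead of letting FFmpeg fail later.
    """
    if audio_codec == "copy":
        raise ValueError("a downmix filter needs a real encoder, not 'copy'")
    return ["ffmpeg", "-i", source, "-c:a", audio_codec, "-af", pan_expr, target]


NIGHTMODE = "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR"
args = build_downmix_args("sourcetrack.dts", "stereotrack.ac3", NIGHTMODE, "ac3")

# shlex.join quotes the pan expression so the line can be pasted into a POSIX shell.
print(shlex.join(args))
```

The list can be passed as-is to subprocess.run(args, check=True), which avoids shell quoting problems with the | characters in the pan expression.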

Test commands

To demonstrate the reliability of the tests I conducted for this answer, below are all of the commands I used to test each downmix formula.

The test command used for the -ac 2 option:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -ac 2 "Audio 1 (-ac 2).wav"

The test command used for Gregory's answer:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -af "pan=stereo|FL < 1.0*FL + 0.707*FC + 0.707*BL|FR < 1.0*FR + 0.707*FC + 0.707*BR" "Audio 2 (ATSC Algorithm Downmix).wav"
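
Comparing file sizes and eyeballing waveforms is suggestive, but the two WAV outputs above can also be compared sample for sample. Here is a minimal sketch using Python's standard wave module. The helper names are made up, and the demo uses small in-memory files as stand-ins so that it runs without the real outputs.

```python
import io
import wave


def read_pcm(source):
    """Return (channels, sample width, rate, raw frames) from a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        return (
            wav.getnchannels(),
            wav.getsampwidth(),
            wav.getframerate(),
            wav.readframes(wav.getnframes()),
        )


def same_audio(first, second):
    """True when both files share the same format and identical sample data."""
    return read_pcm(first) == read_pcm(second)


def make_wav(frames, channels=2, sample_width=2, rate=48000):
    """Build an in-memory WAV so the demo below runs without real files."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(frames)
    buffer.seek(0)
    return buffer


# Stand-ins for "Audio 1 (-ac 2).wav" and "Audio 2 (ATSC Algorithm Downmix).wav".
audio_1 = make_wav(b"\x10\x00\x20\x00" * 4)
audio_2 = make_wav(b"\x10\x00\x20\x00" * 4)
print(same_audio(audio_1, audio_2))
```

With the real files you would pass the two paths directly, for example same_audio("Audio 1 (-ac 2).wav", "Audio 2 (ATSC Algorithm Downmix).wav"). Unlike a whole-file comparison, this ignores any header or metadata differences and looks only at the format and the samples.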

The test command used for Dave_750's answer:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -vol 425 -af "pan=stereo|FL=0.5*FC+0.707*FL+0.707*BL+0.5*LFE|FR=0.5*FC+0.707*FR+0.707*BR+0.5*LFE" "Audio 4 (Dave750 Downmix).wav"

The test command used for Shane Harrelson's answer:

ffmpeg -i "signed16bitPCM.wav" -c pcm_s16le -af "pan=stereo|FL=FC+0.30*FL+0.30*BL|FR=FC+0.30*FR+0.30*BR" "Audio 3 (Nightmode Dialogue Downmix).wav"