FFmpeg add multiple audio files to video at specific points

audio, ffmpeg, video

I have two audio files, each about 3 minutes long. I want to take the first 10 seconds of each file and add them each to a video file at specific points – 0 seconds and 10 seconds. So the resulting video should be 20 seconds long.

I've got this far:

ffmpeg -i video.mov -ss 0 -t 20 -itsoffset 0 -i audio1.mp3 -itsoffset 10 -i audio2.mp3 -acodec copy -vcodec copy out.mov

…but the resulting video has 20 seconds of the first audio file only; the second audio file doesn't start at 10 seconds like it should.

Any help would be appreciated, thanks!

As requested, here is the console output:

  configuration: --cc=/opt/local/bin/clang-mp-3.3 --prefix=/Users/tessus/data/ext/ffmpeg/sw --as=yasm --extra-version=tessus --disable-shared --enable-static --disable-ffplay --enable-gpl --enable-pthreads --enable-postproc --enable-libmp3lame --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libx265 --enable-libxvid --enable-libspeex --enable-bzlib --enable-zlib --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libxavs --enable-version3 --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvpx --enable-libgsm --enable-libopus --enable-libmodplug --enable-fontconfig --enable-libfreetype --enable-libass --enable-libbluray --enable-filters --disable-indev=qtkit --enable-runtime-cpudetect
  libavutil      52. 66.100 / 52. 66.100
  libavcodec     55. 52.102 / 55. 52.102
  libavformat    55. 33.100 / 55. 33.100
  libavdevice    55. 10.100 / 55. 10.100
  libavfilter     4.  2.100 /  4.  2.100
  libswscale      2.  5.102 /  2.  5.102
  libswresample   0. 18.100 /  0. 18.100
  libpostproc    52.  3.100 / 52.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/Users/arran/Documents/video.mov':
  Metadata:
    major_brand     : qt  
    minor_version   : 537199360
    compatible_brands: qt  
    creation_time   : 2014-06-03 02:27:17
  Duration: 00:00:10.00, start: 0.000000, bitrate: 981 kb/s
    Stream #0:0(eng): Video: mpeg4 (Advanced Simple Profile) (mp4v / 0x7634706D), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 975 kb/s, 1 fps, 1 tbr, 100 tbn, 1k tbc (default)
    Metadata:
      creation_time   : 2014-06-03 02:27:17
      handler_name    : Apple Alias Data Handler
Input #1, mp3, from '/Users/arran/Documents/audio1.mp3':
  Metadata:
    title           : Audio1
    track           : 1
  Duration: 00:02:51.68, start: 0.025057, bitrate: 256 kb/s
    Stream #1:0: Audio: mp3, 44100 Hz, stereo, s16p, 256 kb/s
[mp3 @ 0x102032c00] Estimating duration from bitrate, this may be inaccurate
Input #2, mp3, from '/Users/arran/Documents/audio2.mp3':
  Metadata:
    track           : 1
    title           : Audio2
  Duration: 00:04:32.46, start: 0.000000, bitrate: 320 kb/s
    Stream #2:0: Audio: mp3, 44100 Hz, stereo, s16p, 320 kb/s
Output #0, mov, to '/Users/arran/Documents/out.mov':
  Metadata:
    major_brand     : qt  
    minor_version   : 537199360
    compatible_brands: qt  
    encoder         : Lavf55.33.100
    Stream #0:0(eng): Video: mpeg4 (mp4v / 0x7634706D), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 975 kb/s, 1 fps, 12800 tbn, 100 tbc (default)
    Metadata:
      creation_time   : 2014-06-03 02:27:17
      handler_name    : Apple Alias Data Handler
    Stream #0:1: Audio: mp3 (.mp3 / 0x33706D2E), 44100 Hz, stereo, 256 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
frame=   10 fps=0.0 q=-1.0 Lsize=    1821kB time=00:00:20.03 bitrate= 744.5kbits/s    
video:1190kB audio:626kB subtitle:0 data:0 global headers:0kB muxing overhead 0.245487%

Best Answer

  • This is similar to: ffmpeg add new audio in video (mixing 2 audio), but I guess it's different enough not to be an exact duplicate.

    atrim filter method

    You can use the atrim audio filter to select your desired segments, the asetpts filter to reset timestamps to 0, and the concat filter to concatenate or join them together:

    ffmpeg -i video.mov -i audio1.mp3 -i audio2.mp3 -filter_complex \
    "[1:a]atrim=end=10,asetpts=PTS-STARTPTS[a1]; \
     [2:a]atrim=end=10,asetpts=PTS-STARTPTS[a2]; \
     [a1][a2]concat=n=2:v=0:a=1[a]" \
    -map 0:v -map "[a]" -codec:v copy -codec:a libmp3lame -shortest output.mov
    

    One advantage of this method is that it is a single command, and additional filters can be added to the filtergraph if desired. The downside is that filtering requires the audio to be re-encoded, although the video can still be stream copied (re-muxed).
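    The concat approach places the segments back to back, which matches the 0 and 10 second case in the question. The stream mapping in the question's console output lists only one audio stream (#1:0), so the original command never combined the two files. When the start points are not adjacent, one possible variation (not part of the answer above) is to delay each trimmed segment with the adelay filter and combine them with amix. The sketch below only prints such a filtergraph; the function name build_offset_graph is hypothetical, and it assumes input 0 is the video, inputs 1..N are stereo audio files, and start times are whole seconds. Note that amix may change volume levels depending on the FFmpeg version, so check the result by ear.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the answer): print a filtergraph
# that trims each audio input to its first N seconds and delays it to a start time.
# Assumes input 0 is the video and inputs 1..N are stereo audio files.
build_offset_graph() {
    seconds=$1; shift         # segment length taken from each audio input
    graph=""; labels=""; i=1
    for start in "$@"; do     # one start time (whole seconds) per audio input
        ms=$((start * 1000))  # adelay takes milliseconds, one value per channel
        graph="${graph}[${i}:a]atrim=end=${seconds},asetpts=PTS-STARTPTS,adelay=${ms}|${ms}[a${i}];"
        labels="${labels}[a${i}]"
        i=$((i + 1))
    done
    printf '%s%samix=inputs=%s[a]\n' "$graph" "$labels" "$((i - 1))"
}

# Print the graph for two 10-second segments starting at 0 s and 10 s.
build_offset_graph 10 0 10

# Possible use with the files from the question (requires ffmpeg):
# ffmpeg -i video.mov -i audio1.mp3 -i audio2.mp3 \
#   -filter_complex "$(build_offset_graph 10 0 10)" \
#   -map 0:v -map "[a]" -codec:v copy -codec:a libmp3lame output.mov
```

    Building the graph as a string keeps the ffmpeg call itself unchanged when more audio inputs or different start times are needed.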

    multiple command method

    Using the concat demuxer:

    $ ffmpeg -i audio1.mp3 -i audio2.mp3 -map 0:a -t 10 -c copy temp1.mp3 \
      -map 1:a -t 10 -c copy temp2.mp3
    
    $ echo "file 'temp1.mp3'" > input.txt
    
    $ echo "file 'temp2.mp3'" >> input.txt
    
    $ ffmpeg -i video.mov -f concat -i input.txt -map 0:v -map 1:a -c copy \
      -shortest output.mov
    

    Four commands, but there is no re-encoding, so the quality will be retained.
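    If the list file is generated by a script, it may help to write it from scratch on every run so that stale entries from an earlier run are not concatenated again. The following is a minimal sketch under that assumption; the function name write_concat_list is hypothetical, and it does not escape file names that themselves contain single quotes.

```shell
#!/bin/sh
# Hypothetical helper: write a concat demuxer list file from scratch.
# Usage: write_concat_list LIST_FILE SEGMENT...
write_concat_list() {
    list=$1; shift
    : > "$list"               # truncate first, so reruns do not append duplicates
    for f in "$@"; do
        # One "file '<name>'" line per segment, in playback order.
        printf "file '%s'\n" "$f" >> "$list"
    done
}

# Recreate the list used by the commands above and show it.
write_concat_list input.txt temp1.mp3 temp2.mp3
cat input.txt
```

    The resulting input.txt can then be passed to the final command with -f concat -i input.txt as shown above.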

    Also see