Music & Sound Effects Best Practices

When you dub a video, the background music and sound effects don't disappear — Dubly automatically separates them from the voice and keeps them in the final mix. This article explains how that works and how to record source audio so you get clean results.

What Dubly does automatically

Every dub goes through stem separation as a standard step. The pipeline splits your source audio into two tracks:

A voice track (just the speech), used for transcription, translation, and voice synthesis.
A music-and-effects track (everything that isn't voice), preserved untouched.

The rendered dub contains the dubbed voice on top of the original music and sound effects, mixed automatically. You don't need to upload a separate music track, and you don't need to edit anything afterwards.

For balance, the pipeline also keeps a very quiet, filtered version of the original voice in the background. It makes the dubbed audio feel more grounded in the scene without competing with the translation, and it is quiet enough that you won't consciously notice it.
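
To make the mix described above concrete, here is a minimal sketch in Python. It is not Dubly's actual implementation: the function names, the -30 dB residual level, and the clamping step are all assumptions chosen for illustration, and the filtering applied to the original voice is left out entirely. Tracks are plain lists of mono samples in the range -1.0 to 1.0.

```python
def db_to_gain(db: float) -> float:
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def mix_dub(dubbed_voice, music_fx, original_voice, residual_db=-30.0):
    """Sum three equal-length mono tracks sample by sample.

    residual_db is a hypothetical level for the quiet original voice;
    the value Dubly really uses is not documented in this article.
    """
    if not (len(dubbed_voice) == len(music_fx) == len(original_voice)):
        raise ValueError("all tracks must have the same length")

    residual_gain = db_to_gain(residual_db)
    mixed = []
    for dub, mfx, orig in zip(dubbed_voice, music_fx, original_voice):
        sample = dub + mfx + residual_gain * orig
        # Clamp to the valid sample range so the sum cannot clip.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

The point of the sketch is the structure of the sum: the music-and-effects track goes in at full level, the dubbed voice sits on top, and the original voice contributes only a small fraction of its amplitude.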

Where separation struggles

Stem separation is automatic but not magic. Quality drops when:

Music and voice occupy the same frequency range — especially mid-range vocals over a busy mix.
Music is louder than the speaker — the separator tries to isolate the voice, but aggressive music bleeds through.
The speaker's audio carries heavy effects: thick reverb, echo, auto-tune, or telephone filters confuse the model.
Sound effects overlap speech: gunshots, screams, or laughter right on top of dialogue can leak into the wrong track.

A typical symptom of a bad separation is a dubbed voice that sounds thin or faint, because part of the original voice stayed in the music track.

How to get the best results from the source

If you control the original recording:

Record dialogue with a dedicated microphone. Lavalier or shotgun, close to the mouth, directly to its own track.
Keep music 12 to 18 dB below the dialogue (that is, at –18 to –12 dB relative to it) during speaking passages. This is the single biggest improvement you can make.
Avoid heavy post-effects on the voice track (deep reverb, chorus, telephone filters). Clean dry voice separates cleanly.
If possible, supply a video in which the music is already ducked whenever someone is talking. Dubly handles the rest automatically.
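
If you have the dialogue and music on separate tracks before export, you can check the level guideline above with a few lines of Python. This is a rough sketch under stated assumptions: it uses plain RMS level over lists of mono samples, whereas real loudness meters use perceptual weighting, and the function names are hypothetical rather than part of any Dubly tool.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def music_offset_db(dialogue, music):
    """Music level relative to dialogue in dB (negative means quieter)."""
    dialogue_level, music_level = rms(dialogue), rms(music)
    if dialogue_level == 0.0 or music_level == 0.0:
        raise ValueError("both tracks must contain non-silent audio")
    return 20.0 * math.log10(music_level / dialogue_level)


def music_in_target_range(dialogue, music, low_db=-18.0, high_db=-12.0):
    """True if the music sits inside the range recommended in this article."""
    return low_db <= music_offset_db(dialogue, music) <= high_db
```

For example, music at one fifth of the dialogue's amplitude comes out at about -14 dB, which is inside the range. Measure only passages where someone is actually speaking, since the guideline applies to speaking passages.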

If the source is already produced and you can't re-record:

For ad spots and trailers where music is crucial, check the final dub carefully. If the music sounds thin, consider uploading a separate music-and-voice version or contacting support.
For podcasts, interviews, and vlogs with light background music, results are usually solid out of the box.

What you can't do

Dubly does not expose manual sliders for music vs. voice balance, per-segment ducking, or mute-music options. The mix is fully automatic. If you need precise control over the final mix, export the dubbed voice-only version from the dub detail page and re-mix it yourself in your editor.
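
If you do re-mix the exported voice-only track yourself, the core of per-segment ducking is simple enough to sketch. The Python below is an illustration only, not a Dubly feature: the function name, window size, threshold, and -15 dB duck level are assumptions, and in practice you would normally do this with your editor's ducking or sidechain tools rather than by hand.

```python
import math


def duck_and_mix(voice, music, window=4, threshold=0.05, duck_db=-15.0):
    """Mix voice over music, lowering the music wherever the voice is active.

    The tracks are equal-length lists of mono samples. Each window of the
    voice track is measured; if its RMS level exceeds the threshold, the
    music in that window is attenuated by duck_db before summing.
    """
    if len(voice) != len(music):
        raise ValueError("voice and music must have the same length")

    duck_gain = 10.0 ** (duck_db / 20.0)
    mixed = []
    for start in range(0, len(voice), window):
        v = voice[start:start + window]
        m = music[start:start + window]
        level = math.sqrt(sum(s * s for s in v) / len(v))
        # Hard switching between gains can click; real tools fade smoothly.
        gain = duck_gain if level > threshold else 1.0
        mixed.extend(vs + gain * ms for vs, ms in zip(v, m))
    return mixed
```

With real audio the window would span tens of milliseconds of samples rather than four, and the gain change would be smoothed with attack and release times.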

Videos with no music at all

Stem separation still runs, but the "music" track contains no music and is effectively silent apart from the original ambient sound (room tone, etc.). The dub is unaffected: you'll hear just the dubbed voice against that ambience, as in the source.