Thomas 2b81d5843a nwe files

some files

2026-01-29 10:51:19 +01:00

Design Rationale

This document explains why Auto Clip makes certain musical and engineering choices.

Why “bars” instead of arbitrary seconds?

Electronic music (especially trance) is structured in bars and phrases:

Most tracks are in 4/4
Changes happen on predictable boundaries (every 4/8/16/32 bars)

Cutting on bar boundaries reduces:

awkward mid-kick edits
off-grid transitions
“why does this feel wrong?” moments

That’s why the CLI exposes:

--bars (2 bars for rollcall, 4 bars for mini-mix feel)
--preroll-bars (start a bar earlier so the listener hears the groove before the highlight)
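To make the bar-based units concrete, here is a minimal sketch of the bar-to-seconds arithmetic. It assumes a constant tempo and 4/4 time; the function names are illustrative and not taken from the Auto Clip code.

```python
def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
    """Duration of one bar in seconds at a constant tempo."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    # One beat lasts 60 / bpm seconds; a 4/4 bar holds four of them.
    return beats_per_bar * 60.0 / bpm


def bars_to_seconds(bars: float, bpm: float, beats_per_bar: int = 4) -> float:
    """Length in seconds of a clip spanning `bars` bars."""
    return bars * bar_seconds(bpm, beats_per_bar)
```

At 120 BPM a bar lasts 2 seconds, so a 4-bar clip runs 8 seconds. The same clip at a typical trance tempo of 138 BPM is shorter, which is why a fixed number of seconds cannot stay on the grid across tracks.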

Why “pre-roll bars”?

Highlights often occur at an impact moment:

a stab
a fill
a drop hit

If you cut exactly at the highlight, the listener misses the lead-in groove. Pre-roll gives the ear context, so the transition feels like a DJ brought it in.

Practical defaults:

Rollcall: --bars 2 --preroll-bars 1
Mini-mix: --bars 4 --preroll-bars 1
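The effect of pre-roll on the cut points can be sketched as follows. This is an illustration under stated assumptions, not Auto Clip's actual logic: it assumes the highlight has already been snapped to a bar start, that --bars is the total clip length including the pre-roll, and that the tempo is constant. The name clip_window is hypothetical.

```python
def clip_window(highlight_bar_start: float, bpm: float, bars: int,
                preroll_bars: int, beats_per_bar: int = 4) -> tuple[float, float]:
    """Return (start_seconds, duration_seconds) for one clip.

    Assumption: `bars` is the total clip length and already includes
    the pre-roll bars.
    """
    bar_len = beats_per_bar * 60.0 / bpm
    # Shift the start back by the pre-roll, but never before the track start.
    start = max(0.0, highlight_bar_start - preroll_bars * bar_len)
    duration = bars * bar_len
    return start, duration
```

With the rollcall defaults at 120 BPM, a highlight at 60.0 s gives a clip that starts at 58.0 s and lasts 4.0 s: one bar of groove, then the highlight bar.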

Why energy + onset for highlight detection?

In EDM, “interesting” moments correlate with:

higher RMS energy (loudness/drive)
strong transient activity (onset strength)

A simple weighted sum (with robust normalization) is:

fast
local-only
reasonably effective across many tracks

It’s not perfect (pads/breakdowns can confuse it), but it’s a strong baseline.
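A minimal sketch of such a score, assuming per-frame RMS and onset-strength values have already been computed elsewhere. The percentile-based scaling stands in for "robust normalization", which this document does not define further, and the equal weights are placeholders rather than Auto Clip's real values.

```python
def _percentile(sorted_vals: list[float], q: float) -> float:
    """Linearly interpolated percentile of an already sorted list (q in [0, 1])."""
    idx = q * (len(sorted_vals) - 1)
    lo = int(idx)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = idx - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def robust_normalize(values: list[float], lo_q: float = 0.05,
                     hi_q: float = 0.95) -> list[float]:
    """Scale values to [0, 1] between two percentiles so outliers cannot dominate."""
    if not values:
        return []
    s = sorted(values)
    lo, hi = _percentile(s, lo_q), _percentile(s, hi_q)
    if hi <= lo:
        return [0.0] * len(values)
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]


def highlight_scores(rms: list[float], onset: list[float],
                     w_rms: float = 0.5, w_onset: float = 0.5) -> list[float]:
    """Weighted sum of normalized energy and onset strength, one score per frame."""
    if len(rms) != len(onset):
        raise ValueError("rms and onset must have the same number of frames")
    r, o = robust_normalize(rms), robust_normalize(onset)
    return [w_rms * a + w_onset * b for a, b in zip(r, o)]


def best_frame(rms: list[float], onset: list[float]) -> int:
    """Index of the highest-scoring frame."""
    scores = highlight_scores(rms, onset)
    if not scores:
        raise ValueError("no frames to score")
    return max(range(len(scores)), key=scores.__getitem__)
```

Because both features are clipped to the same [0, 1] range before weighting, a single very loud frame cannot swamp the onset term. A long, loud pad can still score well on the energy term alone, which matches the weakness noted above.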

Why Camelot (harmonic mixing)?

DJ transitions feel smoother when keys are compatible. The Camelot wheel provides a practical rule-of-thumb:

Same number A<->B (relative major/minor)
Same letter, number +/-1, wrapping from 12 back to 1 (adjacent harmonies)

Auto Clip uses best-effort key detection and then maps to Camelot to:

reduce harmonic clashes
keep the teaser musically “coherent”
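The two rules above, plus the key-to-Camelot mapping, can be sketched in a few lines. This is an illustrative version rather than Auto Clip's implementation: the function names are hypothetical, and the key is assumed to arrive as a tonic pitch class (0 = C through 11 = B) plus a major/minor flag.

```python
def camelot_code(pitch_class: int, is_minor: bool) -> str:
    """Map a tonic pitch class (0 = C ... 11 = B) and mode to a Camelot code."""
    # A minor key shares its number with its relative major, 3 semitones up.
    major_pc = (pitch_class + 3) % 12 if is_minor else pitch_class % 12
    # Multiplying by 7 (mod 12) counts fifths above C; C major sits at 8B.
    number = (major_pc * 7 + 7) % 12 + 1
    return f"{number}{'A' if is_minor else 'B'}"


def _parse(code: str) -> tuple[int, str]:
    number, letter = int(code[:-1]), code[-1].upper()
    if not 1 <= number <= 12 or letter not in ("A", "B"):
        raise ValueError(f"not a Camelot code: {code!r}")
    return number, letter


def compatible(a: str, b: str) -> bool:
    """True if two Camelot codes follow the rule of thumb above."""
    na, la = _parse(a)
    nb, lb = _parse(b)
    if na == nb:
        # Same number: identical key, or relative major/minor (A <-> B).
        return True
    if la == lb:
        # Same letter: neighbours on the wheel, with 12 wrapping to 1.
        return (na - nb) % 12 in (1, 11)
    return False
```

For example, A minor (8A) is compatible with C major (8B), E minor (9A), and D minor (7A), but not with G major (9B), which differs in both number and letter.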

Caveats:

Key detection can be unreliable on pad-heavy sections, noise, or breakdowns
That’s why V3 calls it best-effort and V4 plans a confidence-based fallback

Why “downbeat-ish” snap instead of full ML downbeat detection?

True downbeat detection often needs:

trained ML models
more complex pipelines
sometimes stems / better separation

Auto Clip stays local and lightweight, so it approximates the downbeat using:

a beat-tracking grid
onset accent scoring at bar starts (kick/transient emphasis)

This typically yields:

better bar-aligned cuts than “nearest beat” snapping
no heavy dependencies
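One way to picture the approach: given beat times from any beat tracker and an accent value per beat (for example, onset strength sampled at each beat), pick the beat offset whose every-fourth beats are most accented and treat those as bar starts. The sketch below assumes 4/4 and a stable grid; the function names are hypothetical and the scoring is deliberately simple.

```python
def estimate_downbeat_phase(beat_accents: list[float], beats_per_bar: int = 4) -> int:
    """Return the beat offset (0..beats_per_bar-1) with the strongest mean accent."""
    best_phase, best_score = 0, float("-inf")
    for phase in range(beats_per_bar):
        picks = beat_accents[phase::beats_per_bar]
        if not picks:
            continue
        score = sum(picks) / len(picks)
        if score > best_score:
            best_phase, best_score = phase, score
    return best_phase


def snap_to_bar(time_s: float, beat_times: list[float],
                beat_accents: list[float], beats_per_bar: int = 4) -> float:
    """Snap a time to the nearest estimated bar start."""
    if not beat_times or len(beat_times) != len(beat_accents):
        raise ValueError("need one accent value per beat")
    phase = estimate_downbeat_phase(beat_accents, beats_per_bar)
    bar_starts = beat_times[phase::beats_per_bar]
    return min(bar_starts, key=lambda t: abs(t - time_s))
```

On a 120 BPM grid where the accented beats fall at 0.5 s, 2.5 s, 4.5 s and so on, a highlight at 3.1 s snaps to 2.5 s. A plain nearest-beat snap would pick 3.0 s, which lands mid-bar.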

Why 2-pass loudnorm?

When you cut from different tracks:

perceived loudness can jump wildly
the teaser feels amateur even if the edits are good

FFmpeg’s loudnorm filter supports a 2-pass workflow (measure first, then apply the measured values), which:

improves consistency
reduces clipping risk
keeps the teaser “radio ready” (for a promo)

That’s why V3 uses 2-pass loudnorm per clip.
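As a rough sketch of the two passes, the helpers below build the FFmpeg argument lists and parse the measurement JSON that loudnorm prints to stderr when print_format=json is set. The target values (-16 LUFS, -1.5 dBTP, LRA 11) and the helper names are assumptions for illustration, not Auto Clip's actual settings.

```python
import json


def measure_args(src: str, i: float = -16.0, tp: float = -1.5,
                 lra: float = 11.0) -> list[str]:
    """Pass 1: analyse only, discard the audio, print stats as JSON on stderr."""
    flt = f"loudnorm=I={i}:TP={tp}:LRA={lra}:print_format=json"
    return ["ffmpeg", "-hide_banner", "-nostats", "-i", src,
            "-af", flt, "-f", "null", "-"]


def parse_measurement(stderr_text: str) -> dict:
    """Extract the flat JSON object that loudnorm appends to stderr."""
    start, end = stderr_text.rfind("{"), stderr_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no loudnorm JSON found in ffmpeg output")
    return json.loads(stderr_text[start:end + 1])


def apply_args(src: str, dst: str, m: dict, i: float = -16.0,
               tp: float = -1.5, lra: float = 11.0) -> list[str]:
    """Pass 2: feed the measured values back so the filter can normalize linearly."""
    flt = (
        f"loudnorm=I={i}:TP={tp}:LRA={lra}"
        f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
        f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}:linear=true"
    )
    return ["ffmpeg", "-hide_banner", "-y", "-i", src, "-af", flt, dst]
```

In use, the first list would be run with something like subprocess.run(measure_args(path), capture_output=True, text=True), its stderr passed to parse_measurement, and the resulting dict handed to apply_args for the second run. Running both passes per clip means each excerpt is measured on its own content rather than on the full track.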

Why does this repo have V_1 / V_2 / V_3?

Keeping the versions side by side makes the role of each stage explicit:

V_1: minimal baseline
V_2: practical CLI + selection features
V_3: trance/DJ quality logic

It also makes it easy for contributors to:

understand evolution
debug regressions

V4 aims to unify this into a single stable CLI while retaining clarity.
