DJ_Teaser_Clipper/DESIGN_RATIONALE.md
Thomas 2b81d5843a nwe files
some files
2026-01-29 10:51:19 +01:00


Design Rationale

This document explains why Auto Clip makes certain musical and engineering choices.


Why “bars” instead of arbitrary seconds?

Electronic music (especially trance) is structured in bars and phrases:

  • Most tracks are in 4/4
  • Changes happen on predictable boundaries (every 4/8/16/32 bars)

Cutting on bar boundaries reduces:

  • awkward mid-kick edits
  • off-grid transitions
  • “why does this feel wrong?” moments

That's why the CLI exposes:

  • --bars (2 bars for rollcall, 4 bars for mini-mix feel)
  • --preroll-bars (start a bar earlier so the listener hears the groove before the highlight)
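As a rough illustration of what a bar count means in seconds, here is a minimal sketch assuming a constant tempo and 4/4 time. The function names are hypothetical and not taken from the Auto Clip source.

```python
def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
    """Duration of one bar in seconds at a constant tempo."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return beats_per_bar * 60.0 / bpm


def bars_to_seconds(bars: int, bpm: float, beats_per_bar: int = 4) -> float:
    """Duration of `bars` bars in seconds (hypothetical helper)."""
    return bars * bar_seconds(bpm, beats_per_bar)


if __name__ == "__main__":
    # Compare a 2-bar rollcall clip with a 4-bar mini-mix clip at 138 BPM.
    for bars in (2, 4):
        print(bars, "bars =", round(bars_to_seconds(bars, 138.0), 3), "s")
```

Because the clip length scales with tempo, the same --bars value gives shorter clips on faster tracks, which is the point of thinking in bars rather than seconds.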

Why “pre-roll bars”?

Highlights often occur at an impact moment:

  • a stab
  • a fill
  • a drop hit

If you cut exactly at the highlight, the listener misses the lead-in groove. Pre-roll gives the ear context, so the transition feels like a DJ brought it in.

Practical defaults:

  • Rollcall: --bars 2 --preroll-bars 1
  • Mini-mix: --bars 4 --preroll-bars 1
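The sketch below shows one way a pre-rolled clip window could be derived from a bar grid. It assumes that pre-roll moves the start earlier while the total clip length stays at --bars; the actual Auto Clip behaviour may differ, and `clip_window` is a hypothetical name.

```python
import bisect


def clip_window(highlight_s, bar_starts, bars=2, preroll_bars=1):
    """Return (start_s, end_s) for a clip aligned to the bar grid.

    bar_starts is a sorted list of bar start times in seconds.
    Assumption: the clip lasts `bars` bars in total and begins
    `preroll_bars` bars before the bar containing the highlight.
    """
    if len(bar_starts) < 2:
        raise ValueError("need at least two bar starts")
    # Bar containing the highlight: the last bar start <= highlight_s.
    idx = max(bisect.bisect_right(bar_starts, highlight_s) - 1, 0)
    # Step back by the pre-roll, clamped to the start of the track.
    start_idx = max(idx - preroll_bars, 0)
    # Clamp the end to the last known bar boundary.
    end_idx = min(start_idx + bars, len(bar_starts) - 1)
    return bar_starts[start_idx], bar_starts[end_idx]


if __name__ == "__main__":
    grid = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # 120 BPM, 4/4
    print(clip_window(6.5, grid, bars=2, preroll_bars=1))
    print(clip_window(6.5, grid, bars=4, preroll_bars=1))
```

Both the start and the end land on bar boundaries, so the cut never falls mid-kick even when the highlight itself is off-grid.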

Why energy + onset for highlight detection?

In EDM, “interesting” moments correlate with:

  • higher RMS energy (loudness/drive)
  • strong transient activity (onset strength)

A simple weighted sum (with robust normalization) is:

  • fast
  • local-only
  • works reasonably across many tracks

It's not perfect (pads and breakdowns can confuse it), but it's a strong baseline.
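A weighted sum with robust normalization might look like the sketch below. It assumes per-frame RMS and onset-strength values are already available (for example from an audio analysis library). The percentile-based scaling and the 0.6/0.4 weights are illustrative assumptions, not the values Auto Clip uses.

```python
def robust_normalize(values, lo_q=0.05, hi_q=0.95):
    """Scale values to [0, 1] using percentiles instead of min/max,
    so a single extreme frame does not squash everything else."""
    if not values:
        return []
    s = sorted(values)
    lo = s[round(lo_q * (len(s) - 1))]
    hi = s[round(hi_q * (len(s) - 1))]
    if hi <= lo:
        return [0.0] * len(values)
    return [min(max((v - lo) / (hi - lo), 0.0), 1.0) for v in values]


def highlight_scores(rms, onset, w_energy=0.6, w_onset=0.4):
    """Per-frame score: weighted sum of normalized energy and onset strength."""
    if len(rms) != len(onset):
        raise ValueError("rms and onset must have the same length")
    energy_n = robust_normalize(rms)
    onset_n = robust_normalize(onset)
    return [w_energy * e + w_onset * o for e, o in zip(energy_n, onset_n)]


def best_frame(scores):
    """Index of the highest-scoring frame."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rms = [0.1, 0.2, 0.9, 0.3, 0.1]
    onset = [0.0, 0.1, 0.8, 0.2, 0.0]
    print(best_frame(highlight_scores(rms, onset)))
```

A sustained pad with high RMS but no transients only earns the energy share of the score, which shows why pad-heavy sections can still mislead a detector of this kind.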


Why Camelot (harmonic mixing)?

DJ transitions feel smoother when keys are compatible. The Camelot wheel provides a practical rule-of-thumb:

  • Same number A<->B (relative major/minor)
  • Same letter, number +/-1 (adjacent keys on the wheel, a fifth apart; 12 wraps around to 1)

Auto Clip uses best-effort key detection and then maps to Camelot to:

  • reduce harmonic clashes
  • keep the teaser musically “coherent”
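The Camelot rule of thumb can be expressed as a small compatibility check. This is a sketch of the rule as stated here, with hypothetical function names. It assumes keys are already given as Camelot codes such as "8A", and it treats identical codes as compatible.

```python
def parse_camelot(code: str) -> tuple[int, str]:
    """Split a Camelot code like '8A' into (8, 'A')."""
    code = code.strip().upper()
    number, letter = int(code[:-1]), code[-1]
    if not 1 <= number <= 12 or letter not in ("A", "B"):
        raise ValueError(f"not a Camelot code: {code!r}")
    return number, letter


def camelot_compatible(a: str, b: str) -> bool:
    """True if two Camelot codes follow the rule of thumb."""
    n1, l1 = parse_camelot(a)
    n2, l2 = parse_camelot(b)
    if n1 == n2:
        # Same number: identical key, or relative major/minor (A <-> B).
        return True
    if l1 == l2:
        # Same letter: neighbours on the wheel, wrapping 12 -> 1.
        return (n1 - n2) % 12 in (1, 11)
    return False


if __name__ == "__main__":
    for pair in [("8A", "8B"), ("8A", "9A"), ("12A", "1A"), ("8A", "10A")]:
        print(pair, camelot_compatible(*pair))
```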

Caveats:

  • Key detection can be unreliable on pad-heavy sections, noise, or breakdowns
  • That's why V3 calls it best-effort and V4 plans a confidence-based fallback

Why “downbeat-ish” snap instead of full ML downbeat detection?

True downbeat detection often needs:

  • trained ML models
  • more complex pipelines
  • sometimes stems / better separation

Auto Clip stays local and lightweight, so it approximates the downbeat with:

  • a beat-tracking grid
  • onset accent scoring at bar starts (kick/transient emphasis)

This typically yields:

  • better bar-aligned cuts than snapping to the “nearest beat”
  • no heavy dependencies
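One way to read "onset accent scoring at bar starts" is sketched below: try each possible beat offset within the bar and keep the one whose beats carry the strongest average accent. This is an assumed interpretation, not the code Auto Clip ships. It expects one accent value per tracked beat, and the function names are hypothetical.

```python
def pick_downbeat_phase(beat_accents, beats_per_bar=4):
    """Return the beat offset (0..beats_per_bar-1) whose beats have the
    highest mean accent, taken as the likely downbeat position."""
    if len(beat_accents) < beats_per_bar:
        raise ValueError("need at least one full bar of beats")
    best_phase, best_score = 0, float("-inf")
    for phase in range(beats_per_bar):
        picks = beat_accents[phase::beats_per_bar]
        score = sum(picks) / len(picks)
        if score > best_score:
            best_phase, best_score = phase, score
    return best_phase


def estimate_bar_starts(beat_times, beat_accents, beats_per_bar=4):
    """Bar start times: every beats_per_bar-th beat from the chosen phase."""
    phase = pick_downbeat_phase(beat_accents, beats_per_bar)
    return beat_times[phase::beats_per_bar]


if __name__ == "__main__":
    times = [i * 0.5 for i in range(12)]  # beats at 120 BPM
    accents = [0.2, 0.3, 0.9, 0.2, 0.1, 0.3, 1.0, 0.2, 0.2, 0.4, 0.8, 0.3]
    print(estimate_bar_starts(times, accents))
```

The heuristic only picks among the tracked beats, so it can never be worse aligned than the beat grid itself, but it will pick the wrong phase when the strongest accents fall off the downbeat.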

Why 2-pass loudnorm?

When you cut from different tracks:

  • perceived loudness can jump wildly
  • the teaser feels amateur even if the edits are good

FFmpeg's loudnorm filter supports 2-pass operation (measure first, then apply), which:

  • improves consistency
  • reduces clipping risk
  • keeps the teaser “radio ready” (for a promo)

That's why V3 uses 2-pass loudnorm per clip.
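The two passes can be sketched as building two filter strings: the first asks loudnorm to print its measurements as JSON, the second feeds those measurements back in. The target values below (-14 LUFS, -1.5 dBTP, LRA 11) are assumptions for illustration, not Auto Clip's settings, and the helper names are hypothetical.

```python
import json

# Assumed targets; pick values that suit the destination platform.
TARGET = {"I": -14.0, "TP": -1.5, "LRA": 11.0}


def first_pass_filter(target=TARGET):
    """Filter string for the measurement pass.
    Run as: ffmpeg -i clip.wav -af <filter> -f null -
    """
    return "loudnorm=I={I}:TP={TP}:LRA={LRA}:print_format=json".format(**target)


def parse_loudnorm_stats(stderr_text):
    """Extract the JSON object that loudnorm prints at the end of stderr."""
    start = stderr_text.rindex("{")
    end = stderr_text.rindex("}") + 1
    return json.loads(stderr_text[start:end])


def second_pass_filter(stats, target=TARGET):
    """Filter string for the apply pass, using the measured values.
    Run as: ffmpeg -i clip.wav -af <filter> clip_normalized.wav
    """
    return (
        "loudnorm=I={I}:TP={TP}:LRA={LRA}"
        ":measured_I={input_i}:measured_TP={input_tp}"
        ":measured_LRA={input_lra}:measured_thresh={input_thresh}"
        ":offset={target_offset}:linear=true"
    ).format(**target, **stats)
```

Measuring each clip separately matters because clips come from different tracks: each one gets its own correction toward the same target, which is what evens out the level jumps between them.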


Why does this repo have V_1 / V_2 / V_3?

Keeping versions side-by-side has benefits:

  • V_1: minimal baseline
  • V_2: practical CLI + selection features
  • V_3: trance/DJ quality logic

It also makes it easy for contributors to:

  • understand evolution
  • debug regressions

V4 aims to unify this into a single stable CLI while retaining clarity.