# Design Rationale

This document explains *why* Auto Clip makes certain musical and engineering choices.

---

## Why “bars” instead of arbitrary seconds?

Electronic music (especially trance) is structured in **bars** and **phrases**:

- Most tracks are in **4/4**
- Changes happen on predictable boundaries (every 4/8/16/32 bars)

Cutting on bar boundaries reduces:

- awkward mid-kick edits
- off-grid transitions
- “why does this feel wrong?” moments

That’s why the CLI exposes:

- `--bars` (2 bars for rollcall, 4 bars for mini-mix feel)
- `--preroll-bars` (start a bar earlier so the listener hears the groove before the highlight)

---

## Why “pre-roll bars”?

Highlights often occur at an impact moment:

- a stab
- a fill
- a drop hit

If you cut *exactly* at the highlight, the listener misses the *lead-in groove*.
Pre-roll gives the ear context, so the transition feels like a DJ brought it in.

Practical defaults:

- Rollcall: `--bars 2 --preroll-bars 1`
- Mini-mix: `--bars 4 --preroll-bars 1`

---

## Why energy + onset for highlight detection?

In EDM, “interesting” moments correlate with:

- higher RMS energy (loudness/drive)
- strong transient activity (onset strength)

A simple weighted sum (with robust normalization) is:

- fast
- local-only
- works reasonably across many tracks

It’s not perfect (pads/breakdowns can confuse it), but it’s a strong baseline.

---

## Why Camelot (harmonic mixing)?

DJ transitions feel smoother when keys are compatible.
The Camelot wheel provides a practical rule-of-thumb:

- Same number A<->B (relative major/minor)
- Same letter, number +/-1 (adjacent harmonies)

Auto Clip uses **best-effort** key detection and then maps to Camelot to:

- reduce harmonic clashes
- keep the teaser musically “coherent”

Caveats:

- Key detection can be unreliable on pad-heavy sections, noise, or breakdowns
- That’s why V3 calls it best-effort and V4 plans confidence-based fallback

---

## Why “downbeat-ish” snap instead of full ML downbeat detection?
True downbeat detection often needs:

- trained ML models
- more complex pipelines
- sometimes stems / better separation

Auto Clip stays local and lightweight.
So we approximate downbeat by:

- beat tracking grid
- onset accent scoring at bar starts (kick/transient emphasis)

This typically yields:

- better bar-aligned cuts than “nearest beat”
- without heavy dependencies

---

## Why 2-pass loudnorm?

When you cut from different tracks:

- perceived loudness can jump wildly
- the teaser feels amateur even if the edits are good

FFmpeg’s loudnorm supports 2-pass measurement + apply, which:

- improves consistency
- reduces clipping risk
- keeps the teaser “radio ready” (for a promo)

That’s why V3 uses 2-pass loudnorm per clip.

---

## Why this repo has V_1 / V_2 / V_3?

Keeping versions side-by-side has benefits:

- V_1: minimal baseline
- V_2: practical CLI + selection features
- V_3: trance/DJ quality logic

It also makes it easy for contributors to:

- understand evolution
- debug regressions

V4 aims to unify this into a single stable CLI while retaining clarity.
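## Sketch: bar and pre-roll arithmetic

The reasoning behind `--bars` and `--preroll-bars` comes down to a small amount of arithmetic. The sketch below is illustrative only and is not taken from the Auto Clip source: the function names are hypothetical, and it assumes 4/4 time, a constant tempo, a highlight time that already sits on the bar grid, and that `--bars` is the total clip length including the pre-roll.

```python
def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
    """Duration of one bar in seconds (4/4 assumed by default)."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return beats_per_bar * 60.0 / bpm


def clip_window(highlight_s: float, bpm: float,
                bars: int = 2, preroll_bars: int = 1) -> tuple[float, float]:
    """Return (start, end) in seconds for a clip around a highlight.

    Assumption: the clip is `bars` bars long in total and begins
    `preroll_bars` bars before the highlight, clamped at the track start.
    """
    bar = bar_seconds(bpm)
    start = max(0.0, highlight_s - preroll_bars * bar)
    end = start + bars * bar
    return start, end
```

At 120 BPM a bar lasts 2 seconds, so `clip_window(60.0, 120.0, bars=2, preroll_bars=1)` returns `(58.0, 62.0)`: one bar of groove before the highlight, one bar after it.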
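## Sketch: energy + onset highlight score

The weighted sum of RMS energy and onset strength can be sketched in a few lines. This is a minimal illustration rather than Auto Clip's actual scoring code: it assumes the per-frame RMS and onset-strength values have already been computed elsewhere, uses median/MAD scaling as one possible form of "robust normalization", and the equal weights are placeholder values.

```python
from statistics import median


def robust_norm(values):
    """Center on the median and scale by the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = mad if mad > 0 else 1.0  # avoid dividing by zero on flat input
    return [(v - med) / scale for v in values]


def highlight_scores(rms, onset, w_rms=0.5, w_onset=0.5):
    """Weighted sum of robustly normalized RMS and onset strength, per frame.

    The weights are hypothetical defaults, not values from the project.
    """
    if len(rms) != len(onset):
        raise ValueError("rms and onset must have the same number of frames")
    return [w_rms * r + w_onset * o
            for r, o in zip(robust_norm(rms), robust_norm(onset))]


def best_frame(rms, onset):
    """Index of the frame with the highest combined score."""
    scores = highlight_scores(rms, onset)
    return max(range(len(scores)), key=scores.__getitem__)
```

Median-based scaling keeps a single very loud hit from flattening every other frame's score, which is the usual motivation for robust normalization over plain min/max scaling.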
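## Sketch: Camelot compatibility check

The two Camelot rules listed earlier (same number with A<->B, or same letter with number +/-1) translate directly into a small predicate. The function name is hypothetical and this is only a sketch of the rule-of-thumb, not the project's implementation. It treats identical codes as compatible and wraps around the wheel, so 12 and 1 count as adjacent.

```python
def camelot_compatible(a: str, b: str) -> bool:
    """True if two Camelot codes such as "8A" and "9A" are compatible.

    Compatible means: identical, same number with A<->B (relative
    major/minor), or same letter with adjacent numbers on the 12-step wheel.
    """
    num_a, let_a = int(a[:-1]), a[-1].upper()
    num_b, let_b = int(b[:-1]), b[-1].upper()
    for num, let in ((num_a, let_a), (num_b, let_b)):
        if not 1 <= num <= 12 or let not in ("A", "B"):
            raise ValueError(f"not a Camelot code: {num}{let}")

    if num_a == num_b:
        return True  # same key, or relative major/minor
    if let_a == let_b:
        # A difference of 1 or 11 modulo 12 means neighbours on the wheel.
        return (num_a - num_b) % 12 in (1, 11)
    return False
```

For example, `camelot_compatible("8A", "8B")` and `camelot_compatible("12A", "1A")` are both `True`, while `camelot_compatible("8A", "9B")` is `False` because it changes both number and letter.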
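## Sketch: 2-pass loudnorm

The measure-then-apply flow of FFmpeg's `loudnorm` filter can be sketched as below. This is an illustration under stated assumptions, not the V3 code: the helper names are hypothetical, the targets (`I=-14`, `TP=-1`, `LRA=11`) are placeholder values rather than project defaults, and `ffmpeg` is assumed to be on `PATH`. The first pass asks `loudnorm` to print its measurements as JSON on stderr, and the second pass feeds those measurements back in.

```python
import json
import subprocess


def first_pass_filter(i=-14.0, tp=-1.0, lra=11.0) -> str:
    """Filter string for the measurement pass (targets are placeholders)."""
    return f"loudnorm=I={i}:TP={tp}:LRA={lra}:print_format=json"


def parse_measurement(stderr_text: str) -> dict:
    """Extract the flat JSON object that loudnorm prints at the end of stderr."""
    start = stderr_text.rindex("{")
    end = stderr_text.rindex("}") + 1
    return json.loads(stderr_text[start:end])


def second_pass_filter(m: dict, i=-14.0, tp=-1.0, lra=11.0) -> str:
    """Filter string for the apply pass, using the first-pass measurements."""
    return (
        f"loudnorm=I={i}:TP={tp}:LRA={lra}"
        f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
        f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}:linear=true"
    )


def normalize(src: str, dst: str) -> None:
    """Run both passes on one clip (requires ffmpeg to be installed)."""
    probe = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", src,
         "-af", first_pass_filter(), "-f", "null", "-"],
        capture_output=True, text=True, check=True,
    )
    measured = parse_measurement(probe.stderr)
    subprocess.run(
        ["ffmpeg", "-hide_banner", "-y", "-i", src,
         "-af", second_pass_filter(measured), dst],
        check=True,
    )
```

Running this per clip, as the 2-pass section describes, means each clip is measured on its own audio before the gain is applied, so clips cut from differently mastered tracks land at the same target loudness.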