new files

some files
2026-01-29 10:51:19 +01:00
parent abf2109171
commit 2b81d5843a
7 changed files with 485 additions and 0 deletions
--- a/DESIGN_RATIONALE.md
+++ b/DESIGN_RATIONALE.md
@@ -0,0 +1,117 @@
+# Design Rationale
+
+This document explains *why* Auto Clip makes certain musical and engineering choices.
+
+---
+
+## Why “bars” instead of arbitrary seconds?
+
+Electronic music (especially trance) is structured in **bars** and **phrases**:
+- Most tracks are in **4/4**
+- Changes happen on predictable boundaries (every 4/8/16/32 bars)
+
+Cutting on bar boundaries reduces:
+- awkward mid-kick edits
+- off-grid transitions
+- “why does this feel wrong?” moments
+
+That’s why the CLI exposes:
+- `--bars`: clip length in bars (2 for a rollcall, 4 for a mini-mix feel)
+- `--preroll-bars`: bars of lead-in before the highlight, so the listener hears the groove before the highlight lands
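+
+As a rough guide to what those flags mean in seconds: in 4/4, one bar is four beats, so its length is `4 * 60 / BPM`. The sketch below assumes a constant tempo. The function names are hypothetical and are not part of the Auto Clip CLI.
+
+```python
+def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
+    """Length of one bar in seconds, assuming constant tempo (4/4 by default)."""
+    return beats_per_bar * 60.0 / bpm
+
+
+def clip_seconds(bpm: float, bars: int, preroll_bars: int = 0) -> float:
+    """Total clip length if pre-roll is added on top of --bars (an assumption)."""
+    return (bars + preroll_bars) * bar_seconds(bpm)
+
+
+# At 138 BPM (a common trance tempo), one bar is roughly 1.74 s,
+# so a 2-bar rollcall clip with 1 bar of pre-roll lasts roughly 5.2 s.
+print(round(bar_seconds(138), 2), round(clip_seconds(138, 2, 1), 2))
+```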
+
+---
+
+## Why “pre-roll bars”?
+
+Highlights often occur at an impact moment:
+- a stab
+- a fill
+- a drop hit
+
+If you cut *exactly* at the highlight, the listener misses the *lead-in groove*.
+Pre-roll gives the ear context, so the transition feels like a DJ brought it in.
+
+Practical defaults:
+- Rollcall: `--bars 2 --preroll-bars 1`
+- Mini-mix: `--bars 4 --preroll-bars 1`
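+
+Put together, a clip window can be derived from a highlight time and a bar grid. This is a minimal sketch under several assumptions: 4/4 at a constant tempo, a known time for one bar start (`grid_offset_s`, e.g. the first downbeat), and pre-roll bars added before the highlight's bar rather than counted inside `--bars`. `clip_window` is a hypothetical helper, not Auto Clip's actual code.
+
+```python
+import math
+
+
+def clip_window(highlight_s, bpm, bars, preroll_bars, grid_offset_s=0.0):
+    """Return (start_s, end_s) of a bar-aligned clip around a highlight (hypothetical)."""
+    bar_s = 4 * 60.0 / bpm  # one 4/4 bar in seconds
+    # Index of the bar that contains the highlight (clamped to the first bar).
+    bar_index = max(0, math.floor((highlight_s - grid_offset_s) / bar_s))
+    # Use fewer pre-roll bars if the highlight is too close to the first bar.
+    preroll = min(preroll_bars, bar_index)
+    start = grid_offset_s + (bar_index - preroll) * bar_s
+    end = grid_offset_s + (bar_index + bars) * bar_s
+    return start, end
+
+
+# Rollcall defaults at 120 BPM (2.0 s per bar) with the grid starting at 0.5 s:
+# the highlight at 9.3 s falls in the bar starting at 8.5 s.
+print(clip_window(9.3, bpm=120, bars=2, preroll_bars=1, grid_offset_s=0.5))  # prints (6.5, 12.5)
+```
+
+Working in bar indices rather than raw seconds keeps both edges of the clip on the grid, which is the point of cutting on bars.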
+
+---
+
+## Why energy + onset for highlight detection?
+
+In EDM, “interesting” moments correlate with:
+- higher RMS energy (loudness/drive)
+- strong transient activity (onset strength)
+
+A simple weighted sum of the two features (each robustly normalized) is:
+- fast
+- local-only
+- reasonably reliable across many tracks
+
+It’s not perfect (pads/breakdowns can confuse it), but it’s a strong baseline.
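+
+The scoring idea fits in a few lines. The document does not say which robust normalization is used, so this sketch assumes median/MAD scaling and illustrative weights (0.6 energy, 0.4 onset). `rms` and `onset` are per-frame feature values that a real pipeline would compute with an audio analysis library; the helper names are hypothetical.
+
+```python
+from statistics import median
+
+
+def robust_normalize(values: list[float]) -> list[float]:
+    """Center on the median and scale by the median absolute deviation (MAD)."""
+    med = median(values)
+    mad = median(abs(v - med) for v in values)
+    scale = mad if mad > 0 else 1.0  # flat feature: avoid dividing by zero
+    return [(v - med) / scale for v in values]
+
+
+def best_frame(rms: list[float], onset: list[float],
+               w_energy: float = 0.6, w_onset: float = 0.4) -> int:
+    """Index of the frame with the highest weighted energy + onset score."""
+    energy = robust_normalize(rms)
+    attack = robust_normalize(onset)
+    scores = [w_energy * e + w_onset * o for e, o in zip(energy, attack)]
+    return max(range(len(scores)), key=scores.__getitem__)
+
+
+rms = [0.10, 0.20, 0.90, 0.30, 0.20]
+onset = [0.00, 0.10, 0.80, 0.20, 0.10]
+print(best_frame(rms, onset))  # prints 2
+```
+
+Median/MAD is a reasonable choice here because a single very loud frame barely moves the median, whereas it would stretch a min/max scale and flatten every other frame. The returned frame index would then be converted to seconds (frame index × hop length ÷ sample rate) before snapping to the bar grid.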
+
+---
+
+## Why Camelot (harmonic mixing)?
+
+DJ transitions feel smoother when keys are compatible.
+The Camelot wheel provides a practical rule-of-thumb:
+
+- Same number, A<->B: relative minor/major (e.g. 8A A minor <-> 8B C major)
+- Same letter, number +/-1: adjacent keys a fifth apart (wrapping 12 <-> 1)
+
+Auto Clip uses **best-effort** key detection and then maps to Camelot to:
+- reduce harmonic clashes
+- keep the teaser musically “coherent”
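+
+Both rules, plus the key-to-Camelot mapping, can be sketched briefly. This assumes the key detector reports a tonic name and a mode (`"major"`/`"minor"`); the helper names are hypothetical. The mapping relies on each step around the Camelot wheel being a perfect fifth, with C major at 8B and its relative minor, A minor, at 8A.
+
+```python
+# Hypothetical helpers; assumes the key detector reports a tonic name and a mode.
+NOTE_TO_PC = {
+    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
+    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
+}
+
+
+def to_camelot(tonic: str, mode: str) -> tuple[int, str]:
+    """Map e.g. ("A", "minor") to (8, "A") and ("C", "major") to (8, "B")."""
+    pc = NOTE_TO_PC[tonic]
+    if mode == "minor":
+        pc = (pc + 3) % 12  # a minor key shares its wheel number with its relative major
+    fifths_from_c = (pc * 7) % 12  # position on the circle of fifths, C = 0
+    number = (fifths_from_c + 7) % 12 + 1  # shift so that C major lands on 8
+    return number, "A" if mode == "minor" else "B"
+
+
+def camelot_compatible(a: tuple[int, str], b: tuple[int, str]) -> bool:
+    """Same number (identical key or A<->B), or same letter with number +/-1."""
+    (num_a, letter_a), (num_b, letter_b) = a, b
+    if num_a == num_b:
+        return True
+    # One step in either direction around the wheel, so 12 and 1 are neighbours.
+    return letter_a == letter_b and (num_a - num_b) % 12 in (1, 11)
+
+
+print(to_camelot("F", "minor"), camelot_compatible((12, "A"), (1, "A")))  # prints (4, 'A') True
+```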
+
+Caveats:
+- Key detection can be unreliable on pad-heavy sections, noisy material, or breakdowns
+- That’s why V3 calls it best-effort and V4 plans a confidence-based fallback
+
+---
+
+## Why “downbeat-ish” snap instead of full ML downbeat detection?
+
+True downbeat detection often needs:
+- trained ML models
+- more complex pipelines
+- sometimes source separation (stems)
+
+Auto Clip stays local and lightweight, so it approximates downbeats with:
+- a beat-tracking grid
+- onset accent scoring at candidate bar starts (kick/transient emphasis)
+
+This typically yields better bar-aligned cuts than snapping to the nearest beat, without heavy dependencies.
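+
+A minimal sketch of that approximation, assuming 4/4 and that a beat tracker has already produced beat times plus an onset-strength value sampled at each beat (both are hypothetical inputs here, as are the function names): try each of the four possible bar phases, keep the phase whose candidate bar starts carry the strongest average accent, and snap to the nearest of those bar starts.
+
+```python
+def downbeat_phase(beat_onsets: list[float], beats_per_bar: int = 4) -> int:
+    """Return the beat offset (0..beats_per_bar-1) whose beats carry the most accent.
+
+    beat_onsets[i] is an onset-strength value sampled at beat i (assumed input).
+    """
+    best_phase, best_score = 0, float("-inf")
+    for phase in range(beats_per_bar):
+        accents = beat_onsets[phase::beats_per_bar]
+        if not accents:
+            continue
+        score = sum(accents) / len(accents)  # mean accent on candidate bar starts
+        if score > best_score:
+            best_phase, best_score = phase, score
+    return best_phase
+
+
+def snap_to_bar(t: float, beat_times: list[float], beat_onsets: list[float],
+                beats_per_bar: int = 4) -> float:
+    """Snap time t to the nearest estimated bar start ("downbeat-ish")."""
+    phase = downbeat_phase(beat_onsets, beats_per_bar)
+    bar_starts = beat_times[phase::beats_per_bar]
+    if not bar_starts:
+        return t  # not enough beats to form a grid; leave t unchanged
+    return min(bar_starts, key=lambda b: abs(b - t))
+
+
+# 120 BPM beat grid where every fourth beat, starting at index 1, is accented.
+beats = [0.5 * i for i in range(16)]
+onsets = [0.2, 1.0, 0.3, 0.4] * 4
+print(snap_to_bar(4.0, beats, onsets))  # prints 4.5
+```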
+
+---
+
+## Why 2-pass loudnorm?
+
+When you cut from different tracks:
+- perceived loudness can jump wildly
+- the teaser feels amateur even if the edits are good
+
+FFmpeg’s `loudnorm` filter supports a two-pass workflow (measure first, then apply using the measured values), which:
+- makes loudness more consistent from clip to clip
+- reduces clipping risk
+- keeps the teaser “radio ready” (for a promo)
+
+That’s why V3 uses 2-pass loudnorm per clip.
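+
+The two passes map onto two FFmpeg invocations. This sketch assumes `ffmpeg` is on `PATH`; the targets (-14 LUFS integrated, -1 dBTP true peak, 11 LU loudness range), the 48 kHz output rate, and the helper names are illustrative assumptions, not the values V3 actually uses. Pass 1 runs `loudnorm` with `print_format=json` and discards the audio; pass 2 feeds the measurements back through the `measured_*` and `offset` options.
+
+```python
+import json
+import re
+import subprocess
+
+# Illustrative targets (assumed): integrated LUFS, true peak dBTP, loudness range LU.
+TARGET = {"I": -14.0, "TP": -1.0, "LRA": 11.0}
+
+
+def measure_cmd(src: str) -> list[str]:
+    """Pass 1: analyze only; loudnorm prints its measurements as JSON on stderr."""
+    af = "loudnorm=I={I}:TP={TP}:LRA={LRA}:print_format=json".format(**TARGET)
+    return ["ffmpeg", "-hide_banner", "-i", src, "-af", af, "-f", "null", "-"]
+
+
+def parse_loudnorm_json(stderr_text: str) -> dict:
+    """Return the last {...} block in ffmpeg's stderr (the loudnorm report)."""
+    blocks = re.findall(r"\{[^{}]*\}", stderr_text)
+    if not blocks:
+        raise ValueError("no loudnorm JSON found in ffmpeg output")
+    return json.loads(blocks[-1])
+
+
+def apply_cmd(src: str, dst: str, m: dict) -> list[str]:
+    """Pass 2: reuse the pass-1 measurements instead of estimating on the fly."""
+    af = (
+        "loudnorm=I={I}:TP={TP}:LRA={LRA}".format(**TARGET)
+        + f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
+        + f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
+        + f":offset={m['target_offset']}:linear=true"
+    )
+    # loudnorm upsamples internally, so set the output sample rate explicitly.
+    return ["ffmpeg", "-hide_banner", "-y", "-i", src, "-af", af, "-ar", "48000", dst]
+
+
+def normalize_clip(src: str, dst: str) -> None:
+    """Run both passes on one clip (requires ffmpeg on PATH)."""
+    first = subprocess.run(measure_cmd(src), capture_output=True, text=True, check=True)
+    subprocess.run(apply_cmd(src, dst, parse_loudnorm_json(first.stderr)), check=True)
+```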
+
+---
+
+## Why does this repo have V_1 / V_2 / V_3?
+
+The repo keeps each version side by side:
+- V_1: minimal baseline
+- V_2: practical CLI + selection features
+- V_3: trance/DJ quality logic
+
+This makes it easy for contributors to:
+- understand how the tool evolved
+- debug regressions
+
+V4 aims to unify this into a single stable CLI while retaining clarity.