DESIGN_RATIONALE.md
# Design Rationale

This document explains *why* Auto Clip makes certain musical and engineering choices.

---

## Why “bars” instead of arbitrary seconds?

Electronic music (especially trance) is structured in **bars** and **phrases**:
- Most tracks are in **4/4**
- Arrangement changes tend to land on predictable boundaries (every 4, 8, 16, or 32 bars)

Cutting on bar boundaries reduces:
- awkward mid-kick edits
- off-grid transitions
- “why does this feel wrong?” moments

That’s why the CLI exposes:
- `--bars` (2 bars for rollcall, 4 bars for mini-mix feel)
- `--preroll-bars` (start a bar earlier so the listener hears the groove before the highlight)
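To make the bar-based reasoning concrete, here is a minimal sketch of how a bar count could translate into seconds. It assumes a constant tempo and 4/4 time; the function names `bar_seconds` and `clip_seconds` are illustrative and not taken from Auto Clip's code. At 128 BPM, a 2-bar rollcall clip comes out at 3.75 s and a 4-bar mini-mix clip at 7.5 s.

```python
def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
    """Duration of one bar in seconds, assuming a constant tempo."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return beats_per_bar * 60.0 / bpm


def clip_seconds(bpm: float, bars: int, beats_per_bar: int = 4) -> float:
    """Length in seconds of a clip that spans `bars` whole bars."""
    return bars * bar_seconds(bpm, beats_per_bar)


if __name__ == "__main__":
    print(clip_seconds(128.0, 2))  # 2-bar rollcall clip at 128 BPM
    print(clip_seconds(128.0, 4))  # 4-bar mini-mix clip at 128 BPM
```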

---

## Why “pre-roll bars”?

Highlights often occur at an impact moment:
- a stab
- a fill
- a drop hit

If you cut *exactly* at the highlight, the listener misses the *lead-in groove*.
Pre-roll gives the ear context, so the transition feels like a DJ brought it in.

Practical defaults:
- Rollcall: `--bars 2 --preroll-bars 1`
- Mini-mix: `--bars 4 --preroll-bars 1`
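One way to picture pre-roll is as a shift of the clip start on the bar grid. The sketch below is an assumption about the semantics, not a description of Auto Clip's actual code: it treats `--bars` as the total clip length and lets `--preroll-bars` only move the start earlier, clamped at the beginning of the track. The name `clip_window` and its parameters are hypothetical.

```python
def clip_window(highlight_bar: int, bars: int, preroll_bars: int,
                bpm: float, beats_per_bar: int = 4) -> tuple[float, float]:
    """Return (start_s, end_s) for a clip that starts `preroll_bars` before the highlight.

    Assumption: the clip is `bars` long in total; pre-roll only shifts the start.
    """
    bar_len = beats_per_bar * 60.0 / bpm
    # Never start before the first bar of the track.
    start_bar = max(0, highlight_bar - preroll_bars)
    start_s = start_bar * bar_len
    return start_s, start_s + bars * bar_len


if __name__ == "__main__":
    # Mini-mix defaults: highlight on bar 32 of a 128 BPM track.
    print(clip_window(highlight_bar=32, bars=4, preroll_bars=1, bpm=128.0))
```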

---

## Why energy + onset for highlight detection?

In EDM, “interesting” moments correlate with:
- higher RMS energy (loudness/drive)
- strong transient activity (onset strength)

A simple weighted sum of the two (with robust normalization) is:
- fast
- local-only
- reasonably accurate across many tracks

It’s not perfect (pads/breakdowns can confuse it), but it’s a strong baseline.
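The weighted-sum idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than Auto Clip's implementation: "robust normalization" is taken here to mean centering on the median and scaling by the median absolute deviation, the equal 0.5/0.5 weights are placeholders, and the per-frame `rms` and `onset` lists are assumed to come from an upstream audio analysis step.

```python
from statistics import median


def robust_normalize(values: list[float]) -> list[float]:
    """Center on the median and scale by the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = mad if mad > 0 else 1.0  # avoid dividing by zero on flat input
    return [(v - med) / scale for v in values]


def highlight_scores(rms: list[float], onset: list[float],
                     w_energy: float = 0.5, w_onset: float = 0.5) -> list[float]:
    """Weighted sum of normalized energy and onset strength, one score per frame."""
    if len(rms) != len(onset):
        raise ValueError("rms and onset must have the same number of frames")
    energy_n = robust_normalize(rms)
    onset_n = robust_normalize(onset)
    return [w_energy * e + w_onset * o for e, o in zip(energy_n, onset_n)]


def best_frame(rms: list[float], onset: list[float]) -> int:
    """Index of the frame with the highest combined score."""
    scores = highlight_scores(rms, onset)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    print(best_frame([0.1, 0.2, 0.9, 0.3, 0.2], [1.0, 1.0, 5.0, 2.0, 1.0]))
```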

---

## Why Camelot (harmonic mixing)?

DJ transitions feel smoother when keys are compatible.
The Camelot wheel provides a practical rule-of-thumb:

- Same number, A <-> B (relative major/minor)
- Same letter, number +/-1 (adjacent keys on the wheel; 12 wraps around to 1)

Auto Clip uses **best-effort** key detection and then maps to Camelot to:
- reduce harmonic clashes
- keep the teaser musically “coherent”

Caveats:
- Key detection can be unreliable on pad-heavy sections, noise, or breakdowns
- That’s why V3 calls it best-effort and V4 plans a confidence-based fallback
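The Camelot rule of thumb from this section is small enough to express directly. The sketch below assumes keys arrive as Camelot codes such as `8A` or `12B`; the helper names are illustrative, and treating two identical codes as compatible is an addition that the rule implies but does not state.

```python
def parse_camelot(code: str) -> tuple[int, str]:
    """Split a Camelot code such as '8A' into (number, letter)."""
    code = code.strip().upper()
    number, letter = int(code[:-1]), code[-1]
    if not 1 <= number <= 12 or letter not in ("A", "B"):
        raise ValueError(f"not a Camelot code: {code!r}")
    return number, letter


def camelot_compatible(a: str, b: str) -> bool:
    """Apply the two rules: same number (A <-> B), or same letter with number +/-1."""
    num_a, let_a = parse_camelot(a)
    num_b, let_b = parse_camelot(b)
    if num_a == num_b:
        return True  # identical key, or relative major/minor
    if let_a == let_b:
        # The wheel is circular, so 12 and 1 are neighbours.
        return (num_a - num_b) % 12 in (1, 11)
    return False


if __name__ == "__main__":
    for pair in [("8A", "8B"), ("8A", "9A"), ("12A", "1A"), ("8A", "10A")]:
        print(pair, camelot_compatible(*pair))
```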

---

## Why “downbeat-ish” snap instead of full ML downbeat detection?

True downbeat detection often needs:
- trained ML models
- more complex pipelines
- sometimes stems / better separation

Auto Clip stays local and lightweight.
So it approximates the downbeat with:
- a beat-tracking grid
- onset accent scoring at candidate bar starts (kick/transient emphasis)

This typically yields better bar-aligned cuts than a “nearest beat” snap, without heavy dependencies.
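A rough sketch of the "downbeat-ish" idea: given one onset-strength value per tracked beat, try each of the four possible bar phases, keep the one whose beats carry the strongest average accent, and snap cut points to that phase. This is an assumed reading of the approach, not Auto Clip's code; `estimate_bar_phase` and `snap_to_bar_start` are hypothetical names, and the beat times and strengths are assumed to come from an upstream beat tracker.

```python
def estimate_bar_phase(beat_strengths: list[float], beats_per_bar: int = 4) -> int:
    """Return the beat offset (0..beats_per_bar-1) most likely to be the downbeat."""
    best_phase, best_score = 0, float("-inf")
    for phase in range(beats_per_bar):
        # Every beats_per_bar-th beat, starting at this candidate phase.
        accents = beat_strengths[phase::beats_per_bar]
        if not accents:
            continue
        score = sum(accents) / len(accents)
        if score > best_score:
            best_phase, best_score = phase, score
    return best_phase


def snap_to_bar_start(beat_times: list[float], target_time: float,
                      phase: int, beats_per_bar: int = 4) -> float:
    """Snap a time to the nearest estimated bar start instead of the nearest beat."""
    bar_starts = beat_times[phase::beats_per_bar]
    return min(bar_starts, key=lambda t: abs(t - target_time))


if __name__ == "__main__":
    strengths = [0.2, 0.9, 0.3, 0.2, 0.1, 1.0, 0.2, 0.3, 0.2, 0.8]
    times = [i * 0.5 for i in range(len(strengths))]  # beats at 120 BPM
    phase = estimate_bar_phase(strengths)
    print(phase, snap_to_bar_start(times, 2.2, phase))
```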

---

## Why 2-pass loudnorm?

When you cut from different tracks:
- perceived loudness can jump wildly
- the teaser feels amateur even if the edits are good

FFmpeg’s `loudnorm` filter supports a 2-pass workflow (measure first, then apply), which:
- improves consistency
- reduces clipping risk
- keeps the teaser “radio ready” (for a promo)

That’s why V3 uses 2-pass loudnorm per clip.
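For readers unfamiliar with the two passes, the sketch below builds the two FFmpeg command lines and parses the JSON that the first pass prints to stderr. The `loudnorm` options used (`I`, `TP`, `LRA`, `print_format`, `measured_*`, `offset`, `linear`) are real filter options, but the target values (-16 LUFS, -1.5 dBTP, LRA 11) and the helper names are assumptions, not Auto Clip's actual settings. The functions only build argument lists; running them (for example with `subprocess.run(..., capture_output=True, text=True)`) is left to the caller.

```python
import json

# Assumed targets for illustration; Auto Clip's real values may differ.
TARGET = {"I": -16.0, "TP": -1.5, "LRA": 11.0}


def measure_cmd(src: str) -> list[str]:
    """Pass 1: analyse loudness only and print the measurements as JSON on stderr."""
    af = "loudnorm=I={I}:TP={TP}:LRA={LRA}:print_format=json".format(**TARGET)
    return ["ffmpeg", "-hide_banner", "-i", src, "-af", af, "-f", "null", "-"]


def parse_measurement(stderr_text: str) -> dict:
    """Extract the JSON object that loudnorm appends to FFmpeg's stderr output."""
    start = stderr_text.rindex("{")
    end = stderr_text.rindex("}") + 1
    return json.loads(stderr_text[start:end])


def apply_cmd(src: str, dst: str, measured: dict) -> list[str]:
    """Pass 2: normalize using the values measured in pass 1."""
    af = (
        "loudnorm=I={I}:TP={TP}:LRA={LRA}"
        ":measured_I={input_i}:measured_TP={input_tp}"
        ":measured_LRA={input_lra}:measured_thresh={input_thresh}"
        ":offset={target_offset}:linear=true"
    ).format(**TARGET, **measured)
    return ["ffmpeg", "-hide_banner", "-y", "-i", src, "-af", af, dst]
```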

---

## Why does this repo have V_1 / V_2 / V_3?

Keeping versions side-by-side has benefits:
- V_1: minimal baseline
- V_2: practical CLI + selection features
- V_3: trance/DJ quality logic

It also makes it easy for contributors to:
- understand how the tool evolved
- debug regressions

V4 aims to unify this into a single stable CLI while retaining clarity.