DJ_Teaser_Clipper/DESIGN_RATIONALE.md
Thomas 2b81d5843a nwe files
some files
2026-01-29 10:51:19 +01:00

# Design Rationale
This document explains *why* Auto Clip makes certain musical and engineering choices.
---
## Why “bars” instead of arbitrary seconds?
Electronic music (especially trance) is structured in **bars** and **phrases**:
- Most tracks are in **4/4**
- Changes happen on predictable boundaries (every 4/8/16/32 bars)
Cutting on bar boundaries reduces:
- awkward mid-kick edits
- off-grid transitions
- “why does this feel wrong?” moments
That's why the CLI exposes:
- `--bars` (2 bars for rollcall, 4 bars for mini-mix feel)
- `--preroll-bars` (start a bar earlier so the listener hears the groove before the highlight)
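
As a rough illustration of why bars are a convenient unit, the conversion from bars to seconds depends only on tempo. This is a minimal sketch that assumes 4/4 and a constant BPM; the function names `bar_seconds` and `bars_to_seconds` are hypothetical and not taken from the Auto Clip code.

```python
def bar_seconds(bpm: float, beats_per_bar: int = 4) -> float:
    """Duration of one bar in seconds (assumes a constant tempo)."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return beats_per_bar * 60.0 / bpm


def bars_to_seconds(bars: float, bpm: float, beats_per_bar: int = 4) -> float:
    """Convert a length in bars to seconds (hypothetical helper)."""
    return bars * bar_seconds(bpm, beats_per_bar)


if __name__ == "__main__":
    # At 120 BPM in 4/4, one bar lasts 4 * 60 / 120 = 2.0 seconds.
    print(bars_to_seconds(2, 120))  # 2-bar rollcall clip
    print(bars_to_seconds(4, 138))  # 4-bar mini-mix clip at a trance tempo
```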
---
## Why “pre-roll bars”?
Highlights often occur at an impact moment:
- a stab
- a fill
- a drop hit
If you cut *exactly* at the highlight, the listener misses the *lead-in groove*.
Pre-roll gives the ear context, so the transition feels like a DJ brought it in.
Practical defaults:
- Rollcall: `--bars 2 --preroll-bars 1`
- Mini-mix: `--bars 4 --preroll-bars 1`
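
The effect of pre-roll on the cut window can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: it assumes 4/4, a constant BPM, that the clip length stays at `--bars`, and that pre-roll only shifts the start earlier. The name `clip_window` is hypothetical.

```python
def clip_window(highlight_s: float, bpm: float, bars: int, preroll_bars: int,
                beats_per_bar: int = 4) -> tuple[float, float]:
    """Return (start, end) in seconds for a clip around a highlight.

    Assumption: total clip length is `bars`; pre-roll moves the start
    earlier by `preroll_bars`, clamped so it never precedes the track start.
    """
    bar = beats_per_bar * 60.0 / bpm
    start = max(0.0, highlight_s - preroll_bars * bar)
    end = start + bars * bar
    return start, end


if __name__ == "__main__":
    # Rollcall defaults: --bars 2 --preroll-bars 1, highlight at 30 s, 120 BPM.
    print(clip_window(30.0, 120.0, bars=2, preroll_bars=1))
```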
---
## Why energy + onset for highlight detection?
In EDM, “interesting” moments correlate with:
- higher RMS energy (loudness/drive)
- strong transient activity (onset strength)
A simple weighted sum (with robust normalization) is:
- fast
- local-only
- reasonably effective across many tracks
It's not perfect (pads/breakdowns can confuse it), but it's a strong baseline.
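
The weighted-sum idea can be sketched in plain Python. The specifics here are assumptions for illustration: median/MAD scaling stands in for "robust normalization", the weights are equal, and the per-frame `rms` and `onset` lists are taken as already computed. The function names are hypothetical.

```python
from statistics import median


def robust_normalize(values: list[float]) -> list[float]:
    """Center on the median and scale by the MAD so outliers do not dominate."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = mad if mad > 0 else 1.0  # avoid division by zero on flat input
    return [(v - med) / scale for v in values]


def highlight_scores(rms: list[float], onset: list[float],
                     w_energy: float = 0.5, w_onset: float = 0.5) -> list[float]:
    """Weighted sum of normalized energy and onset strength per frame."""
    if len(rms) != len(onset):
        raise ValueError("rms and onset must have the same length")
    energy_n = robust_normalize(rms)
    onset_n = robust_normalize(onset)
    return [w_energy * e + w_onset * o for e, o in zip(energy_n, onset_n)]


def best_frame(rms: list[float], onset: list[float]) -> int:
    """Index of the frame with the highest combined score."""
    scores = highlight_scores(rms, onset)
    return max(range(len(scores)), key=scores.__getitem__)
```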
---
## Why Camelot (harmonic mixing)?
DJ transitions feel smoother when keys are compatible.
The Camelot wheel provides a practical rule-of-thumb:
- Same number A<->B (relative major/minor)
- Same letter, number +/-1 (adjacent harmonies)
Auto Clip uses **best-effort** key detection and then maps to Camelot to:
- reduce harmonic clashes
- keep the teaser musically “coherent”
Caveats:
- Key detection can be unreliable on pad-heavy sections, noise, or breakdowns
- That's why V3 calls it best-effort and V4 plans a confidence-based fallback
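
The two Camelot rules above translate into a small compatibility check. This is a sketch, not the project's code: the names `parse_camelot` and `camelot_compatible` are hypothetical, and treating an identical key as compatible and wrapping 12 to 1 around the wheel are assumptions added here.

```python
def parse_camelot(code: str) -> tuple[int, str]:
    """Split a Camelot code such as '8A' into (8, 'A')."""
    code = code.strip().upper()
    number, letter = int(code[:-1]), code[-1]
    if not 1 <= number <= 12 or letter not in ("A", "B"):
        raise ValueError(f"not a Camelot code: {code!r}")
    return number, letter


def camelot_compatible(a: str, b: str) -> bool:
    """True if two Camelot codes are a rule-of-thumb harmonic match."""
    num_a, let_a = parse_camelot(a)
    num_b, let_b = parse_camelot(b)
    if num_a == num_b:
        # Same number: identical key, or A<->B (relative major/minor).
        return True
    if let_a == let_b:
        # Same letter, number +/-1, wrapping around the wheel (12 <-> 1).
        return (num_a - num_b) % 12 in (1, 11)
    return False
```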
---
## Why “downbeat-ish” snap instead of full ML downbeat detection?
True downbeat detection often needs:
- trained ML models
- more complex pipelines
- sometimes stems / better separation
Auto Clip stays local and lightweight.
So we approximate downbeat by:
- beat tracking grid
- onset accent scoring at bar starts (kick/transient emphasis)
This typically yields:
- better bar-aligned cuts than “nearest beat”
- without heavy dependencies
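
One way to picture the "downbeat-ish" approximation: given a beat grid and an accent value per beat, pick the beat offset within the bar that carries the most accent on average, then snap cut points to beats at that offset. This is a simplified sketch that assumes 4/4 and precomputed beat times and accents; the function names are hypothetical and the actual scoring in Auto Clip may differ.

```python
def estimate_bar_phase(accents: list[float], beats_per_bar: int = 4) -> int:
    """Return the beat offset (0..beats_per_bar-1) with the highest mean accent."""
    n_phases = min(beats_per_bar, len(accents))
    means = []
    for phase in range(n_phases):
        group = accents[phase::beats_per_bar]  # every beat at this offset
        means.append(sum(group) / len(group))
    return max(range(n_phases), key=means.__getitem__)


def snap_to_bar_start(t: float, beat_times: list[float], accents: list[float],
                      beats_per_bar: int = 4) -> float:
    """Snap time t (seconds) to the nearest estimated bar start."""
    if not beat_times or len(beat_times) != len(accents):
        raise ValueError("beat_times and accents must be non-empty and equal length")
    phase = estimate_bar_phase(accents, beats_per_bar)
    bar_starts = beat_times[phase::beats_per_bar]
    return min(bar_starts, key=lambda b: abs(b - t))
```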
---
## Why 2-pass loudnorm?
When you cut from different tracks:
- perceived loudness can jump wildly
- the teaser feels amateur even if the edits are good
FFmpeg's `loudnorm` filter supports a 2-pass workflow (measure, then apply), which:
- improves consistency
- reduces clipping risk
- keeps the teaser “radio ready” (for a promo)
That's why V3 uses 2-pass loudnorm per clip.
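
The 2-pass flow can be sketched as: run `loudnorm` once with `print_format=json` to measure, then feed the measured values back into a second run. The target values below (`I=-14`, `TP=-1.5`, `LRA=11`) are placeholder assumptions, not Auto Clip's settings, and the helper names are hypothetical. The sketch only builds arguments and parses output; it does not invoke FFmpeg.

```python
import json


def first_pass_args(path: str, i: float = -14.0, tp: float = -1.5,
                    lra: float = 11.0) -> list[str]:
    """Command line for the measurement pass (audio is decoded, output discarded)."""
    af = f"loudnorm=I={i}:TP={tp}:LRA={lra}:print_format=json"
    return ["ffmpeg", "-hide_banner", "-i", path, "-af", af, "-f", "null", "-"]


def parse_measurement(stderr_text: str) -> dict:
    """Extract the flat JSON object that loudnorm prints at the end of stderr."""
    start = stderr_text.rfind("{")
    end = stderr_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no loudnorm JSON found in FFmpeg output")
    return json.loads(stderr_text[start:end + 1])


def second_pass_filter(m: dict, i: float = -14.0, tp: float = -1.5,
                       lra: float = 11.0) -> str:
    """Filter string for the apply pass, using the first-pass measurements."""
    return (
        f"loudnorm=I={i}:TP={tp}:LRA={lra}"
        f":measured_I={m['input_i']}:measured_TP={m['input_tp']}"
        f":measured_LRA={m['input_lra']}:measured_thresh={m['input_thresh']}"
        f":offset={m['target_offset']}:linear=true"
    )
```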
---
## Why does this repo have V_1 / V_2 / V_3?
Keeping versions side-by-side has benefits:
- V_1: minimal baseline
- V_2: practical CLI + selection features
- V_3: trance/DJ quality logic
It also makes it easy for contributors to:
- understand evolution
- debug regressions
V4 aims to unify this into a single stable CLI while retaining clarity.