Intro/Outro Detection
Overview
The marker detection system automatically identifies intro and outro sequences for TV episodes by clustering audio fingerprints. Episodes that share similar fingerprints in the first N minutes are flagged as having a common intro; similarly for the last M minutes (outro).
How It Works
Fingerprint Clustering Algorithm
The system uses Jaccard similarity to compare audio fingerprints:
- For each show, we collect all fingerprinted episodes
- We compare fingerprints pairwise using Jaccard similarity
- Episodes with similarity >= 85% are grouped together
- The largest group is selected as the canonical intro/outro
Jaccard Similarity Formula:
J(A, B) = |A ∩ B| / |A ∪ B|Where A and B are the character sets of two fingerprint strings.
Confidence Scoring
Confidence is calculated based on:
- Group size (50% weight): More episodes matching = higher confidence
- Average similarity (50% weight): Higher similarity within group = higher confidence
Final score: min(100, (size_score) + (similarity_score))
Configuration
Edit config/marker_detection.php:
return [
'intro_start_seconds' => 0,
'intro_max_duration' => 180, // Max intro length in seconds
'outro_max_duration' => 180, // Max outro length in seconds
'similarity_threshold' => 0.85, // Jaccard threshold (0.0–1.0)
'min_episodes_for_detection' => 3, // Minimum episodes needed
'job_queue_dir' => '/tmp/phlix_marker_jobs',
'worker_interval' => 30, // Worker poll interval
];Running the Background Worker
Start the detection worker as a separate process:
php scripts/run-marker-detection-worker.phpThe worker runs continuously and processes shows from the queue at the configured interval.
Queue Management
The worker uses a file-based queue in /tmp/phlix_marker_jobs/:
- Each show being processed has a
.lockfile - Use
MarkerCandidateStoreto manage the queue programmatically - The worker auto-removes lock files when processing completes
Data Flow
1. Library scan enqueues shows needing detection
2. BackgroundDetectorWorker polls queue
3. For each show:
a. IntroDetectionJob fetches all episodes
b. FingerprintClusterer groups similar fingerprints
c. MarkerCandidateRepository stores candidates in metadata_json
d. Lock file is removed (job complete)
4. F.3 API consumes marker candidates from metadata_jsonOutput Format
Detection results are stored in media_items.metadata_json:
{
"intro_candidate": {
"start_seconds": 0,
"end_seconds": 90,
"fingerprint": "representative_fingerprint",
"confidence": 85
},
"outro_candidate": {
"start_seconds": 2310,
"end_seconds": 2400,
"fingerprint": "representative_fingerprint",
"confidence": 80
}
}Classes
FingerprintClusterer- Jaccard-based clustering algorithmIntroDetectionJob- Orchestrates detection for a showMarkerCandidateStore- File-based job queueBackgroundDetectorWorker- Queue consumer loopMarkerCandidateRepository- Persists candidates to database
Testing
Run the test suite:
./vendor/bin/phpunit --testsuite UnitRun with coverage:
./vendor/bin/phpunit --coverage-textCoverage targets:
IntroDetectionJob>= 85%FingerprintClusterer>= 85%MarkerCandidateStore>= 85%