VT_Player/DEV_SPEC_FRAME_ACCURATE_PLAYBACK.md
Stu Leak 5e2171a95e Fix video loading and improve player UI
2025-12-05 10:01:59 -05:00


VT_Player Development Specification: Frame-Accurate Playback & Lossless Cutting

Project: VT_Player
Target: Lightweight frame-accurate video player with keyframe navigation for lossless cutting
Goal: Provide LosslessCut-style functionality using FFmpeg suite exclusively
Performance: Competitive with existing tools, optimized for lightweight operation
Status: Foundation exists, keyframe features need implementation


Design Philosophy

  • Lightweight First: Minimize memory footprint, avoid bloat
  • FFmpeg Suite Only: Use ffplay, ffmpeg, ffprobe - no external players
  • Hardware Accelerated: Leverage GPU when available, graceful fallback to CPU
  • Responsive UI: All operations feel instant (<100ms response time)
  • Smart Caching: Cache intelligently, clean up aggressively

Current State Analysis

What's Working ✓

  1. Basic Player Controller (internal/player/controller_linux.go)

    • ffplay integration with stdin control
    • Play/pause/seek/volume controls
    • Window embedding via xdotool/SDL
    • ~50MB memory footprint for playback
  2. Player UI (main.go:555-800)

    • Video loading and playlist
    • Basic controls and time display
    • Slider-based seeking
  3. Video Metadata (videoSource struct)

    • FFprobe metadata extraction
    • Duration/resolution/framerate parsing

What's Missing ✗

  1. No keyframe detection - Cannot identify I-frames
  2. No frame-by-frame navigation - Only time-based seeking
  3. No timeline visualization - No keyframe markers
  4. No in/out point marking - Cannot mark cut points
  5. No lossless cut functionality - No stream copy cutting
  6. No frame counter - Only shows time

Implementation Plan

Phase 1: Lightweight Keyframe Detection

Goal: <5s detection time for 1-hour video, <10MB memory overhead

Create internal/keyframe/detector.go

Strategy: Sparse Keyframe Index

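package keyframe

import (
    "fmt"
    "os/exec"
    "sort"
    "strconv"
    "strings"
)

// Keyframe represents an I-frame position
type Keyframe struct {
    FrameNum  int     // Frame number (estimated from Timestamp and FrameRate)
    Timestamp float64 // Time in seconds
}

// Index holds keyframe positions (I-frames only, not all frames)
type Index struct {
    Keyframes   []Keyframe // Only I-frames, sorted by Timestamp (~1KB per minute of video)
    TotalFrames int
    Duration    float64
    FrameRate   float64
}

// DetectKeyframes uses FFprobe to find I-frames only. It fills in the timestamps;
// FrameNum, TotalFrames, Duration and FrameRate are left to the caller (from existing metadata).
func DetectKeyframes(videoPath string) (*Index, error) {
    // ffprobe -v error -skip_frame nokey -select_streams v:0 \
    //   -show_entries frame=best_effort_timestamp_time -of csv=p=0 video.mp4
    //
    // -skip_frame nokey = Only I-frames (5-10x faster than scanning all frames)
    // Returns: 0.000000, 2.002000, 4.004000, ... (one value per line)
    // Note: frame=pkt_pts_time is not reported by recent FFmpeg releases.
    cmd := exec.Command("ffprobe",
        "-v", "error",
        "-skip_frame", "nokey", // KEY OPTIMIZATION: Only I-frames
        "-select_streams", "v:0",
        "-show_entries", "frame=best_effort_timestamp_time",
        "-of", "csv=p=0",
        videoPath,
    )
    out, err := cmd.Output()
    if err != nil {
        return nil, fmt.Errorf("ffprobe keyframe scan: %w", err)
    }

    // Parse output, build Keyframe array
    // Memory budget: ~100 bytes per keyframe; 1-hour video @ 2s GOP = ~1800 keyframes = ~180KB
    idx := &Index{}
    for _, line := range strings.Split(string(out), "\n") {
        field := strings.TrimRight(strings.TrimSpace(line), ",") // some rows end in ","
        ts, err := strconv.ParseFloat(field, 64)
        if err != nil {
            continue // blank line or "N/A"
        }
        idx.Keyframes = append(idx.Keyframes, Keyframe{Timestamp: ts})
    }
    sort.Slice(idx.Keyframes, func(a, b int) bool {
        return idx.Keyframes[a].Timestamp < idx.Keyframes[b].Timestamp
    })
    return idx, nil
}

// FindNearestKeyframe returns the closest I-frame to timestamp, or nil if none exists.
// direction: "before" (at or before), "after" (at or after), "nearest"
func (idx *Index) FindNearestKeyframe(timestamp float64, direction string) *Keyframe {
    kfs := idx.Keyframes
    // Binary search (O(log n)): i = first keyframe with Timestamp >= timestamp
    i := sort.Search(len(kfs), func(j int) bool { return kfs[j].Timestamp >= timestamp })
    var prev, next *Keyframe
    if i < len(kfs) {
        next = &kfs[i]
    }
    if next != nil && next.Timestamp == timestamp {
        prev = next
    } else if i > 0 {
        prev = &kfs[i-1]
    }
    switch direction {
    case "before":
        return prev
    case "after":
        return next
    }
    // "nearest": pick the closer side
    if next == nil || (prev != nil && timestamp-prev.Timestamp <= next.Timestamp-timestamp) {
        return prev
    }
    return next
}

// EstimateFrameNumber calculates frame # from timestamp
func (idx *Index) EstimateFrameNumber(timestamp float64) int {
    return int(timestamp*idx.FrameRate + 0.5)
}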

Performance Targets:

  • 1-hour video detection: <5 seconds
  • Memory usage: <1MB for index
  • Cache size: <100KB per video

Cache Strategy:

// Cache in memory during playback, persist to disk
// Location: ~/.cache/vt_player/<video-hash>.kf
// Format: Binary (timestamp as float64, 8 bytes per keyframe)
// Invalidate if: video modified time changes
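
A minimal sketch of the binary cache format described above, assuming little-endian byte order and no file header (neither is fixed by this spec). The helper names encodeTimestamps and decodeTimestamps are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeTimestamps serializes keyframe timestamps as consecutive
// little-endian float64 values (8 bytes per keyframe).
// Hypothetical helper: byte order and layout are assumptions.
func encodeTimestamps(ts []float64) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, ts); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeTimestamps reverses encodeTimestamps and rejects data whose
// length is not a whole number of 8-byte values (e.g. a truncated file).
func decodeTimestamps(data []byte) ([]float64, error) {
	if len(data)%8 != 0 {
		return nil, fmt.Errorf("cache size %d is not a multiple of 8", len(data))
	}
	ts := make([]float64, len(data)/8)
	if err := binary.Read(bytes.NewReader(data), binary.LittleEndian, ts); err != nil {
		return nil, err
	}
	return ts, nil
}
```

At 8 bytes per keyframe, the ~1800 keyframes of a 1-hour video with a 2s GOP come to roughly 14KB on disk, well inside the <100KB per-video cache target.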

Phase 2: Frame-Accurate Seeking with ffplay

Goal: Precise navigation using existing ffplay controller

Extend internal/player/controller_linux.go

Position Tracking:

type ffplayController struct {
    // ... existing fields ...

    // NEW: Position tracking
    lastKnownPos  float64   // Last seek position
    lastKnownTime time.Time // When position was updated
    playState     bool      // true = playing, false = paused
}

// GetCurrentPosition estimates current position
func (c *ffplayController) GetCurrentPosition() float64 {
    c.mu.Lock()
    defer c.mu.Unlock()

    if !c.playState {
        // Paused: return last position
        return c.lastKnownPos
    }

    // Playing: estimate based on elapsed time
    elapsed := time.Since(c.lastKnownTime).Seconds()
    return c.lastKnownPos + elapsed
}

// SeekToFrame seeks to specific frame number
func (c *ffplayController) SeekToFrame(frameNum int, frameRate float64) error {
    timestamp := float64(frameNum) / frameRate
    return c.Seek(timestamp)
}

Frame Stepping Strategy:

// For single-frame steps: Use ffplay's built-in frame step
// ffplay keyboard command: 's' = step to next frame

var (
    keyStepForward = []byte{'s'} // Frame step
)

func (c *ffplayController) StepFrame(direction int) error {
    // Ensure paused so the position estimate stays fixed while stepping
    if !c.paused {
        c.Pause()
    }

    frameDur := 1.0 / c.frameRate // frameRate stored from metadata

    if direction > 0 {
        // Step forward: Use 's' key
        if err := c.send(keyStepForward); err != nil {
            return err
        }
        // Keep the position estimate in sync with the step
        c.mu.Lock()
        c.lastKnownPos += frameDur
        c.mu.Unlock()
        return nil
    }

    // Step backward: ffplay has no reverse step key, so seek back 1 frame
    currentPos := c.GetCurrentPosition()
    return c.Seek(math.Max(0, currentPos-frameDur))
}

Memory Impact: +40 bytes per controller instance
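
The frame/timestamp arithmetic behind SeekToFrame and backward stepping can be sketched as follows. This assumes a constant frame rate; the helper names frameToTime, timeToFrame and stepTarget are illustrative and not part of the existing controller.

```go
package main

import "math"

// frameToTime returns the presentation time in seconds of frame n,
// assuming a constant frame rate (variable-frame-rate files would need
// real per-frame timestamps instead).
func frameToTime(n int, fps float64) float64 {
	return float64(n) / fps
}

// timeToFrame rounds to the nearest frame number, so that
// timeToFrame(frameToTime(n)) == n despite floating-point error.
func timeToFrame(t, fps float64) int {
	return int(math.Round(t * fps))
}

// stepTarget returns the timestamp to seek to when moving delta frames
// from pos: snapped to a frame boundary and clamped to [0, duration].
func stepTarget(pos, fps, duration float64, delta int) float64 {
	n := timeToFrame(pos, fps) + delta
	if n < 0 {
		n = 0
	}
	return math.Min(frameToTime(n, fps), duration)
}
```

Snapping to a whole frame number before converting back to seconds keeps repeated single-frame steps from accumulating drift, which matters for NTSC-style rates such as 30000/1001.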


Phase 3: Custom Timeline Widget with Keyframe Markers

Goal: Visual timeline, smooth interaction, minimal redraw overhead

Create internal/ui/timeline.go

Custom Fyne Widget:

package ui

import (
    "image/color"
    "fyne.io/fyne/v2"
    "fyne.io/fyne/v2/canvas"
    "fyne.io/fyne/v2/widget"
)

// TimelineWidget shows video timeline with keyframe markers
type TimelineWidget struct {
    widget.BaseWidget

    duration    float64      // Total duration
    position    float64      // Current position
    keyframes   []float64    // Keyframe timestamps
    inPoint     *float64     // In-point marker
    outPoint    *float64     // Out-point marker
    onChange    func(float64) // Callback on seek

    // Rendering cache (updated only on resize/data change)
    cachedBackground *canvas.Rectangle
    cachedKeyframes  []*canvas.Line
    cachedScrubber   *canvas.Line
}

// CreateRenderer implements fyne.Widget
func (t *TimelineWidget) CreateRenderer() fyne.WidgetRenderer {
    // Draw once, update only scrubber position on drag
    // Keyframe markers: 1px vertical yellow lines
    // In-point: 2px blue line
    // Out-point: 2px red line
    // Scrubber: 3px white line
}

// Lightweight: Only redraw scrubber on position change
// Full redraw only on resize or keyframe data change

Memory Impact: ~2KB per timeline widget
Rendering: <5ms for 1000 keyframes
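
The renderer needs a mapping between timestamps and horizontal pixel offsets, both for placing markers and for turning a click into a seek. A minimal sketch, assuming a linear timeline with no zoom; timeToX, xToTime and markerColumns are hypothetical helpers, and float32 is used for widths because Fyne sizes are float32.

```go
package main

// timeToX maps a timestamp to a horizontal offset within a timeline
// that is width pixels wide. Out-of-range times clamp to the edges.
func timeToX(t, duration float64, width float32) float32 {
	if duration <= 0 {
		return 0
	}
	frac := t / duration
	if frac < 0 {
		frac = 0
	} else if frac > 1 {
		frac = 1
	}
	return float32(frac) * width
}

// xToTime is the inverse mapping, used when the user clicks or drags.
func xToTime(x, width float32, duration float64) float64 {
	if width <= 0 {
		return 0
	}
	frac := float64(x / width)
	if frac < 0 {
		frac = 0
	} else if frac > 1 {
		frac = 1
	}
	return frac * duration
}

// markerColumns returns one x offset per occupied pixel column, so
// densely packed keyframes do not produce several lines on one pixel.
// keyframes must be sorted in ascending order.
func markerColumns(keyframes []float64, duration float64, width float32) []float32 {
	var xs []float32
	lastCol := -1
	for _, kf := range keyframes {
		x := timeToX(kf, duration, width)
		if col := int(x); col != lastCol {
			xs = append(xs, x)
			lastCol = col
		}
	}
	return xs
}
```

Collapsing markers per pixel column bounds the number of canvas lines by the widget width rather than by the keyframe count, which helps keep redraw cost flat for long videos.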


Phase 4: Enhanced Player UI

Goal: LosslessCut-style controls, keyboard-driven workflow

Update main.go showPlayerView (lines 555-800)

Layout:

┌──────────────────────────────────────────────┐
│  Video Display (ffplay window)  960x540     │
└──────────────────────────────────────────────┘
┌──────────────────────────────────────────────┐
│  [Timeline with keyframe markers]           │
│  [====|==|====I====|==O====|===]            │
└──────────────────────────────────────────────┘
┌──────────────────────────────────────────────┐
│  Frame: 1234 / 15000  Time: 0:41.400        │
│  [<<KF] [<Frame] [Play] [Frame>] [KF>>]     │
└──────────────────────────────────────────────┘
┌──────────────────────────────────────────────┐
│  [Set In] [Set Out] [Clear] [Export Cut]    │
└──────────────────────────────────────────────┘

Components:

// Frame counter (updates every 100ms when playing)
frameLabel := widget.NewLabel("Frame: 0 / 0")
timeLabel := widget.NewLabel("Time: 0:00.000")

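// Frame navigation buttons
// kfEpsilon nudges the search past the current keyframe so that repeated
// presses keep moving instead of re-selecting the same keyframe.
const kfEpsilon = 0.001 // seconds

btnPrevKF := widget.NewButton("<<", func() {
    // Jump to previous keyframe (no-op if there is none)
    pos := s.player.GetCurrentPosition()
    if kf := s.keyframeIndex.FindNearestKeyframe(pos-kfEpsilon, "before"); kf != nil {
        s.player.Seek(kf.Timestamp)
    }
})

btnPrevFrame := widget.NewButton("<", func() {
    // Step back 1 frame
    s.player.StepFrame(-1)
})

btnNextFrame := widget.NewButton(">", func() {
    // Step forward 1 frame
    s.player.StepFrame(1)
})

btnNextKF := widget.NewButton(">>", func() {
    // Jump to next keyframe (no-op if there is none)
    pos := s.player.GetCurrentPosition()
    if kf := s.keyframeIndex.FindNearestKeyframe(pos+kfEpsilon, "after"); kf != nil {
        s.player.Seek(kf.Timestamp)
    }
})

// In/Out controls (cut points are *float64; nil means "not set")
btnSetIn := widget.NewButton("Set In [I]", func() {
    pos := s.player.GetCurrentPosition()
    s.cutInPoint = &pos
    timelineWidget.SetInPoint(s.cutInPoint)
})

btnSetOut := widget.NewButton("Set Out [O]", func() {
    pos := s.player.GetCurrentPosition()
    s.cutOutPoint = &pos
    timelineWidget.SetOutPoint(s.cutOutPoint)
})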

btnClear := widget.NewButton("Clear [X]", func() {
    s.cutInPoint = nil
    s.cutOutPoint = nil
    timelineWidget.ClearPoints()
})

Keyboard Shortcuts:

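c := s.window.Canvas()

// Plain keys (fyne.KeyEvent carries no modifier state)
c.SetOnTypedKey(func(ke *fyne.KeyEvent) {
    switch ke.Name {
    case fyne.KeySpace:
        togglePlayPause()
    case fyne.KeyLeft:
        stepBackOneFrame()
    case fyne.KeyRight:
        stepForwardOneFrame()
    case fyne.KeyI:
        setInPoint()
    case fyne.KeyO:
        setOutPoint()
    case fyne.KeyX:
        clearInOutPoints()
    case fyne.KeyE:
        exportCut()
    }
})

// Shift+Left / Shift+Right: registered as shortcuts
// (desktop is "fyne.io/fyne/v2/driver/desktop")
c.AddShortcut(
    &desktop.CustomShortcut{KeyName: fyne.KeyLeft, Modifier: fyne.KeyModifierShift},
    func(fyne.Shortcut) { jumpToPreviousKeyframe() },
)
c.AddShortcut(
    &desktop.CustomShortcut{KeyName: fyne.KeyRight, Modifier: fyne.KeyModifierShift},
    func(fyne.Shortcut) { jumpToNextKeyframe() },
)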

Memory Impact: +2KB for UI components
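
The frame counter and time display from the layout above ("Frame: 1234 / 15000  Time: 0:41.400") could be formatted roughly like this. The helper names formatTime and formatFrameLabel are illustrative, and the M:SS.mmm format without an hours field is an assumption taken from the mock-up.

```go
package main

import (
	"fmt"
	"math"
)

// formatTime renders seconds as M:SS.mmm. Rounding to whole
// milliseconds first avoids artifacts such as "0:41.399" caused by
// floating-point error.
func formatTime(sec float64) string {
	if sec < 0 {
		sec = 0
	}
	ms := int64(math.Round(sec * 1000))
	return fmt.Sprintf("%d:%02d.%03d", ms/60000, ms/1000%60, ms%1000)
}

// formatFrameLabel renders the "Frame: current / total" label text.
func formatFrameLabel(current, total int) string {
	return fmt.Sprintf("Frame: %d / %d", current, total)
}
```

The resulting strings would be passed to timeLabel.SetText and frameLabel.SetText from the 100ms update loop.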


Phase 5: Lossless Cut Export

Goal: Fast, zero-quality-loss cutting using FFmpeg stream copy

Create internal/cut/export.go

Core Functionality:

package cut

import (
    "fmt"
    "math"
    "os/exec"

    "git.leaktechnologies.dev/stu/VT_Player/internal/keyframe"
)

// ExportOptions configures export
type ExportOptions struct {
    InputPath  string
    OutputPath string
    InTime     float64
    OutTime    float64
    AutoSnap   bool // Snap in-point to nearest keyframe
}

// Export performs lossless cut
func Export(opts ExportOptions, idx *keyframe.Index,
            progress func(float64)) error {

    inTime := opts.InTime
    outTime := opts.OutTime

    // Validate/snap in-point to keyframe
    if opts.AutoSnap {
        kf := idx.FindNearestKeyframe(inTime, "before")
        if kf != nil && math.Abs(kf.Timestamp-inTime) > 0.1 {
            inTime = kf.Timestamp
        }
    }

    // FFmpeg stream copy (no re-encoding)
    args := []string{
        "-hide_banner",
        "-loglevel", "error",
        "-progress", "pipe:1",  // Progress reporting
        "-i", opts.InputPath,
        "-ss", fmt.Sprintf("%.6f", inTime),
        "-to", fmt.Sprintf("%.6f", outTime),
        "-c", "copy",  // Stream copy = lossless
        "-avoid_negative_ts", "make_zero",
        "-y",  // Overwrite
        opts.OutputPath,
    }

    cmd := exec.Command("ffmpeg", args...)

    // Parse progress output
    // Call progress(percentage) callback

    return cmd.Run()
}

// Validate checks if cut points are valid
func Validate(inTime, outTime float64, idx *keyframe.Index) error {
    // Check if in-point is close to a keyframe
    kf := idx.FindNearestKeyframe(inTime, "nearest")
    if kf == nil {
        return fmt.Errorf("no keyframes found")
    }

    diff := math.Abs(kf.Timestamp - inTime)
    if diff > 0.5 {
        return fmt.Errorf("in-point not near keyframe (%.2fs away)", diff)
    }

    if outTime <= inTime {
        return fmt.Errorf("out-point must be after in-point")
    }

    return nil
}
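
Export leaves "Parse progress output" open. One possible approach, sketched under the assumption that FFmpeg's -progress stream consists of key=value lines including out_time_us (output position in microseconds) and progress=continue|end. The names progressFraction and reportProgress are hypothetical, and clipSeconds is assumed to be outTime - inTime.

```go
package main

import (
	"bufio"
	"io"
	"strconv"
	"strings"
)

// progressFraction interprets one -progress line. It returns a value in
// [0, 1] and true when the line carries progress information.
func progressFraction(line string, clipSeconds float64) (float64, bool) {
	key, val, found := strings.Cut(strings.TrimSpace(line), "=")
	if !found {
		return 0, false
	}
	switch key {
	case "progress":
		if val == "end" {
			return 1, true
		}
	case "out_time_us":
		us, err := strconv.ParseInt(val, 10, 64)
		if err != nil || clipSeconds <= 0 {
			return 0, false // e.g. "N/A" before the first packet
		}
		frac := float64(us) / 1e6 / clipSeconds
		if frac < 0 {
			frac = 0
		}
		if frac > 1 {
			frac = 1
		}
		return frac, true
	}
	return 0, false
}

// reportProgress reads the -progress stream from r until EOF and
// forwards each parsed fraction to the callback.
func reportProgress(r io.Reader, clipSeconds float64, progress func(float64)) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if f, ok := progressFraction(sc.Text(), clipSeconds); ok {
			progress(f)
		}
	}
	return sc.Err()
}
```

To wire this into Export, the pipe from cmd.StdoutPipe() would be handed to reportProgress between cmd.Start() and cmd.Wait(), in place of the single cmd.Run() call. The 0 to 1 range matches the default range of a Fyne progress bar.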

Export UI Integration:

// performExport runs the cut in the background and reports progress
performExport := func(outputPath string, autoSnap bool) {
    // Show progress dialog
    progress := widget.NewProgressBar()
    dlg := dialog.NewCustom("Exporting...", "Cancel", progress, s.window)
    dlg.Show()

    go func() {
        err := cut.Export(cut.ExportOptions{
            InputPath:  s.source.Path,
            OutputPath: outputPath,
            InTime:     *s.cutInPoint,
            OutTime:    *s.cutOutPoint,
            AutoSnap:   autoSnap,
        }, s.keyframeIndex, func(pct float64) {
            progress.SetValue(pct)
        })

        dlg.Hide()

        if err != nil {
            dialog.ShowError(err, s.window)
        } else {
            dialog.ShowInformation("Success", "Cut exported", s.window)
        }
    }()
}

// chooseOutputAndExport asks for the output path, then starts the export
chooseOutputAndExport := func(autoSnap bool) {
    dialog.ShowFileSave(func(uc fyne.URIWriteCloser, err error) {
        if err != nil || uc == nil {
            return
        }
        outputPath := uc.URI().Path()
        uc.Close()

        // Export with progress
        performExport(outputPath, autoSnap)
    }, s.window)
}

exportBtn := widget.NewButton("Export Cut [E]", func() {
    if s.cutInPoint == nil || s.cutOutPoint == nil {
        dialog.ShowError(errors.New("Set in/out points first"), s.window)
        return
    }

    // Validate
    err := cut.Validate(*s.cutInPoint, *s.cutOutPoint, s.keyframeIndex)
    if err != nil {
        // Show error with option to auto-snap
        dialog.ShowConfirm(
            "Invalid Cut Point",
            fmt.Sprintf("%v\n\nSnap to nearest keyframe?", err),
            func(snap bool) {
                if snap {
                    // Auto-snap and retry
                    chooseOutputAndExport(true)
                }
            },
            s.window,
        )
        return
    }

    // Cut points are valid: pick the output file and export
    chooseOutputAndExport(false)
})

Performance:

  • Export speed: At least real-time; stream copy is typically I/O-bound (1-hour video exports in ~30 seconds)
  • No quality loss (bit-perfect copy)
  • Memory usage: <50MB during export

Performance Optimizations

Keyframe Detection

// 1. Parallel processing for multiple videos
// (run from a background goroutine: wg.Wait blocks until all scans finish)
var wg sync.WaitGroup
for _, video := range videos {
    wg.Add(1)
    go func(v string) {
        defer wg.Done()
        if _, err := DetectKeyframes(v); err != nil {
            log.Printf("keyframe detection failed for %s: %v", v, err)
        }
    }(video)
}
wg.Wait()

// 2. Incremental loading: Show UI before detection completes
go func() {
    idx, err := DetectKeyframes(videoPath)
    if err != nil {
        return // Timeline stays usable, just without keyframe markers
    }
    // Update UI when ready
    timeline.SetKeyframes(idx.Keyframes)
}()

// 3. Cache aggressively
cacheKey := fmt.Sprintf("%s-%d", videoPath, fileInfo.ModTime().Unix())
if cached := loadFromCache(cacheKey); cached != nil {
    return cached
}

Memory Management

// 1. Sparse keyframe storage (I-frames only)
// 1-hour video: ~180KB vs 10MB for all frames

// 2. Limit cached indices
const maxCachedIndices = 10
if len(indexCache) > maxCachedIndices {
    // Remove oldest
    delete(indexCache, oldestKey)
}

// 3. Timeline rendering: Canvas reuse
// Don't recreate canvas objects, update positions only
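
The "limit cached indices" snippet above leaves oldestKey undefined. A minimal sketch of one way to track it, using insertion order (FIFO eviction, not true LRU). The indexCache type and its methods are hypothetical, the cached value is simplified to a timestamp slice, and the type is not safe for concurrent use without a mutex.

```go
package main

// indexCache keeps at most max keyframe indices in memory and evicts
// the oldest inserted entry when the limit is exceeded.
type indexCache struct {
	max   int
	order []string             // keys in insertion order, oldest first
	items map[string][]float64 // cache key -> keyframe timestamps
}

func newIndexCache(max int) *indexCache {
	return &indexCache{max: max, items: make(map[string][]float64)}
}

// put stores or replaces an entry, then evicts until the limit holds.
// Replacing an existing key does not change its position in the order.
func (c *indexCache) put(key string, keyframes []float64) {
	if _, exists := c.items[key]; !exists {
		c.order = append(c.order, key)
	}
	c.items[key] = keyframes
	for len(c.order) > c.max {
		oldest := c.order[0]
		c.order = c.order[1:]
		delete(c.items, oldest)
	}
}

// get returns the cached timestamps for key, if present.
func (c *indexCache) get(key string) ([]float64, bool) {
	kf, ok := c.items[key]
	return kf, ok
}
```

With maxCachedIndices = 10 and a budget of ~180KB per index, the in-memory cache stays under 2MB.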

UI Responsiveness

// 1. Debounce position updates
var updateTimer *time.Timer
func updatePosition(pos float64) {
    if updateTimer != nil {
        updateTimer.Stop()
    }
    updateTimer = time.AfterFunc(50*time.Millisecond, func() {
        frameLabel.SetText(formatFrame(pos))
    })
}

// 2. Background goroutines for heavy operations
go detectKeyframes()
go exportCut()
// Never block UI thread

// 3. Efficient timeline redraw
// Only redraw scrubber, not entire timeline

Testing Strategy

Performance Benchmarks

# Target: Competitive with LosslessCut
# Keyframe detection: <5s for 1-hour video
# Frame stepping: <50ms response
# Export: Real-time speed (1x)
# Memory: <100MB total (including ffplay)

# Test suite:
go test ./internal/keyframe -bench=. -benchtime=10s
go test ./internal/cut -bench=. -benchtime=10s

Test Videos

# 1. Generate test video with known keyframe intervals
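ffmpeg -f lavfi -i testsrc=duration=60:size=1280x720:rate=30 \
       -c:v libx264 -g 60 -keyint_min 60 -sc_threshold 0 \
       test_2s_keyframes.mp4
# -sc_threshold 0 disables scene-cut keyframes so the interval stays at exactly 60 frames (2s)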

# 2. Various formats
# - H.264, H.265, VP9, AV1
# - Different GOP sizes
# - Variable framerate

Validation

# Verify cut accuracy
ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 cut_output.mp4

# Verify no re-encoding (check codec)
ffprobe -v error -select_streams v:0 \
        -show_entries stream=codec_name cut_output.mp4
# Should match original codec exactly

Resource Usage Targets

Memory:

  • Base application: ~30MB
  • Video playback (ffplay): ~50MB
  • Keyframe index (1-hour): ~1MB
  • UI components: ~5MB
  • Total: <100MB for typical use

CPU:

  • Idle: <1%
  • Playback: 5-15% (ffplay + UI updates)
  • Keyframe detection: 100% single core for <5s
  • Export: 20-40% (FFmpeg stream copy)

Disk:

  • Cache per video: <100KB
  • Total cache limit: 50MB (500 videos)
  • Auto-cleanup on startup

Success Criteria

Performance Parity with LosslessCut:

  • Keyframe detection: <5s for 1-hour video
  • Frame stepping: <50ms response time
  • Timeline rendering: <5ms for 1000 keyframes
  • Export speed: Real-time (1x)
  • Memory usage: <100MB total

Feature Completeness:

  • Frame-by-frame navigation
  • Keyframe visualization
  • In/out point marking
  • Lossless export
  • Keyboard-driven workflow
  • Progress reporting

Quality:

  • Bit-perfect lossless cuts
  • Frame-accurate positioning
  • No UI lag or stuttering
  • Stable under heavy use

Integration with VideoTools

Reusable Components:

  1. internal/keyframe/ - Copy directly
  2. internal/cut/ - Copy directly
  3. internal/ui/timeline.go - Adapt for Trim module

VideoTools Trim Module:

  • Use VT_Player's proven code
  • Add batch trimming
  • Integrate with queue system

Maintenance:

  • VT_Player = Standalone lightweight player
  • VideoTools = Full suite including trim capability
  • Share core keyframe/cut code between projects

Implementation Priority

Week 1:

  • Phase 1: Keyframe detection with caching
  • Test performance (<5s target)

Week 2:

  • Phase 2: Frame-accurate seeking
  • Phase 3: Timeline widget
  • Test responsiveness

Week 3:

  • Phase 4: Enhanced UI + keyboard shortcuts
  • Phase 5: Lossless cut export
  • Integration testing

Week 4:

  • Performance optimization
  • Documentation
  • Prepare for VideoTools integration

Next Steps

  1. Start with Phase 1 (keyframe detection)
  2. Benchmark against 1-hour test video
  3. Verify <5s detection time and <1MB memory
  4. Move to Phase 2 once performance validated
  5. Iterate rapidly, test continuously

Goal: Lightweight, powerful, competitive with industry tools.