Goondex Tagging System Architecture

Vision

Enable ML-driven search and tagging, such as:

  • "3 black men in a scene where a blonde milf wears pink panties and black heels"
  • Image-based scene detection and recommendation
  • Auto-tagging from PornPics image imports

Core Requirements

1. Tag Categories (Hierarchical Structure)

Tags need to be organized by category for efficient filtering and ML training:

performers/
  └─ [already implemented via performers table]

people/
  ├─ count/          (1, 2, 3, 4, 5+, orgy, etc.)
  ├─ ethnicity/      (black, white, asian, latina, etc.)
  ├─ age_category/   (teen, milf, mature, etc.)
  ├─ body_type/      (slim, athletic, curvy, bbw, etc.)
  └─ hair/
      ├─ color/      (blonde, brunette, redhead, etc.)
      └─ length/     (short, long, bald, etc.)

clothing/
  ├─ type/           (lingerie, uniform, casual, etc.)
  ├─ color/          (pink, black, red, white, etc.)
  └─ specific/
      ├─ top/        (bra, corset, tank_top, etc.)
      ├─ bottom/     (panties, skirt, jeans, etc.)
      └─ footwear/   (heels, boots, stockings, etc.)

position/
  ├─ category/       (standing, lying, sitting, etc.)
  └─ specific/       (missionary, doggy, cowgirl, etc.)

action/
  ├─ sexual/         (oral, penetration, etc.)
  └─ non_sexual/     (kissing, undressing, etc.)

setting/
  ├─ location/       (bedroom, office, outdoor, etc.)
  └─ time/           (day, night, etc.)

production/
  ├─ quality/        (hd, 4k, vr, etc.)
  └─ style/          (pov, amateur, professional, etc.)
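
Because tag_categories stores full paths such as "people/hair/color" together with a parent_id, seeding it means creating every ancestor path first. The sketch below shows one way to expand a path into that root-first list. The function name CategoryAncestors is hypothetical and not part of the existing codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// CategoryAncestors expands a category path such as "people/hair/color"
// into every path that must exist in tag_categories, root first:
// "people", "people/hair", "people/hair/color".
// Empty segments (leading, trailing, or doubled slashes) are skipped.
// Hypothetical helper, shown for illustration only.
func CategoryAncestors(path string) []string {
	var out []string
	current := ""
	for _, seg := range strings.Split(path, "/") {
		seg = strings.TrimSpace(seg)
		if seg == "" {
			continue
		}
		if current == "" {
			current = seg
		} else {
			current += "/" + seg
		}
		out = append(out, current)
	}
	return out
}

func main() {
	// Each printed path would be inserted in order, using the previous
	// row's id as parent_id.
	for _, p := range CategoryAncestors("people/hair/color") {
		fmt.Println(p)
	}
}
```

Inserting the paths in this order lets each row reference the id of the row created just before it.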

2. Database Schema Extensions

Enhanced Tags Table

CREATE TABLE IF NOT EXISTS tag_categories (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,          -- e.g., "clothing/color"
    parent_id INTEGER,                   -- for hierarchical categories
    description TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    FOREIGN KEY (parent_id) REFERENCES tag_categories(id) ON DELETE CASCADE
);

CREATE TABLE IF NOT EXISTS tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,                  -- e.g., "pink"
    category_id INTEGER NOT NULL,        -- links to "clothing/color"
    aliases TEXT,                        -- comma-separated: "hot pink,rose"
    description TEXT,
    source TEXT,                         -- tpdb, user, ml
    source_id TEXT,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(category_id, name),
    FOREIGN KEY (category_id) REFERENCES tag_categories(id) ON DELETE CASCADE
);

-- Enhanced scene-tag junction with ML confidence
CREATE TABLE IF NOT EXISTS scene_tags (
    scene_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    confidence REAL DEFAULT 1.0,         -- 0.0-1.0 for ML predictions
    source TEXT NOT NULL DEFAULT 'user', -- 'user', 'ml', 'tpdb'
    verified BOOLEAN DEFAULT 0,          -- human verification flag
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    PRIMARY KEY (scene_id, tag_id),
    FOREIGN KEY (scene_id) REFERENCES scenes(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

-- Track images associated with scenes (for ML training)
CREATE TABLE IF NOT EXISTS scene_images (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    scene_id INTEGER NOT NULL,
    image_url TEXT NOT NULL,
    image_path TEXT,                     -- local storage path
    source TEXT,                         -- pornpics, tpdb, user
    source_id TEXT,
    width INTEGER,
    height INTEGER,
    file_size INTEGER,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    FOREIGN KEY (scene_id) REFERENCES scenes(id) ON DELETE CASCADE
);

-- ML model predictions for future reference
CREATE TABLE IF NOT EXISTS ml_predictions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    scene_id INTEGER,
    image_id INTEGER,
    model_version TEXT NOT NULL,         -- track which ML model made prediction
    predictions TEXT NOT NULL,            -- JSON: [{"tag_id": 123, "confidence": 0.95}, ...]
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    FOREIGN KEY (scene_id) REFERENCES scenes(id) ON DELETE CASCADE,
    FOREIGN KEY (image_id) REFERENCES scene_images(id) ON DELETE CASCADE
);
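
The predictions column holds a JSON array, so application code has to decode and validate it before writing rows to scene_tags. A minimal sketch of that step, assuming the JSON shape shown in the column comment; the TagPrediction fields and the ParsePredictions function are illustrative names, not an existing API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TagPrediction mirrors one element of ml_predictions.predictions.
// The field layout is an assumption based on the schema comment.
type TagPrediction struct {
	TagID      int64   `json:"tag_id"`
	Confidence float64 `json:"confidence"`
}

// ParsePredictions decodes the JSON array and rejects any confidence
// outside the 0.0-1.0 range used by scene_tags.confidence.
func ParsePredictions(raw string) ([]TagPrediction, error) {
	var preds []TagPrediction
	if err := json.Unmarshal([]byte(raw), &preds); err != nil {
		return nil, fmt.Errorf("decode predictions: %w", err)
	}
	for _, p := range preds {
		if p.Confidence < 0 || p.Confidence > 1 {
			return nil, fmt.Errorf("tag %d: confidence %v outside [0, 1]", p.TagID, p.Confidence)
		}
	}
	return preds, nil
}

func main() {
	preds, err := ParsePredictions(`[{"tag_id": 123, "confidence": 0.95}]`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d prediction(s), first tag_id=%d\n", len(preds), preds[0].TagID)
}
```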

Indexes for ML Performance

-- Tag search performance
CREATE INDEX IF NOT EXISTS idx_tags_category ON tags(category_id);
CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);

-- Scene tag filtering (critical for complex queries)
CREATE INDEX IF NOT EXISTS idx_scene_tags_tag ON scene_tags(tag_id);
CREATE INDEX IF NOT EXISTS idx_scene_tags_confidence ON scene_tags(confidence);
CREATE INDEX IF NOT EXISTS idx_scene_tags_verified ON scene_tags(verified);

-- Image processing
CREATE INDEX IF NOT EXISTS idx_scene_images_scene ON scene_images(scene_id);
CREATE INDEX IF NOT EXISTS idx_scene_images_source ON scene_images(source, source_id);

3. Complex Query Architecture

For queries like "3 black men + blonde milf + pink panties + black heels":

-- Step 1: Find scenes with all required tags
WITH required_tags AS (
    SELECT st.scene_id, COUNT(DISTINCT st.tag_id) AS tag_count
    FROM scene_tags st
    JOIN tags t ON st.tag_id = t.id
    WHERE (
        -- the OR list is parenthesized so the filters below apply to every branch
        (t.name = 'black' AND t.category_id = (SELECT id FROM tag_categories WHERE name = 'people/ethnicity'))
        OR (t.name = 'blonde' AND t.category_id = (SELECT id FROM tag_categories WHERE name = 'people/hair/color'))
        OR (t.name = 'pink' AND t.category_id = (SELECT id FROM tag_categories WHERE name = 'clothing/color'))
        -- etc.
    )
    -- accept human-verified tags, or ML predictions above the threshold
    AND (st.verified = 1 OR st.confidence >= 0.8)
    GROUP BY st.scene_id
    HAVING COUNT(DISTINCT st.tag_id) >= 4  -- set to the number of required tags listed above
)
SELECT s.*
FROM scenes s
JOIN required_tags rt ON s.id = rt.scene_id;
-- Additional filtering for performer count, etc.
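
Writing this query by hand for every search does not scale, so the statement would normally be generated from a list of required tags. The following is a rough sketch of such a builder using positional placeholders. TagRef and BuildRequiredTagsQuery are hypothetical names, the builder assumes the required list contains no duplicates, and it treats a tag as usable when it is verified or meets the confidence threshold.

```go
package main

import (
	"fmt"
	"strings"
)

// TagRef names one required tag by category path and tag name.
// Hypothetical type, used only for this sketch.
type TagRef struct {
	CategoryPath string
	Name         string
}

// BuildRequiredTagsQuery returns SQL plus positional arguments selecting
// scenes that carry every tag in required (assumed free of duplicates).
// A tag counts when it is verified or its confidence >= minConfidence.
func BuildRequiredTagsQuery(required []TagRef, minConfidence float64) (string, []interface{}) {
	if len(required) == 0 {
		// No tag constraints: every scene qualifies.
		return "SELECT s.* FROM scenes s", nil
	}
	conds := make([]string, 0, len(required))
	args := make([]interface{}, 0, 2*len(required)+2)
	for _, r := range required {
		conds = append(conds, "(t.name = ? AND c.name = ?)")
		args = append(args, r.Name, r.CategoryPath)
	}
	args = append(args, minConfidence, len(required))

	query := `SELECT s.*
FROM scenes s
JOIN (
    SELECT st.scene_id
    FROM scene_tags st
    JOIN tags t ON st.tag_id = t.id
    JOIN tag_categories c ON t.category_id = c.id
    WHERE (` + strings.Join(conds, " OR ") + `)
      AND (st.verified = 1 OR st.confidence >= ?)
    GROUP BY st.scene_id
    HAVING COUNT(DISTINCT st.tag_id) = ?
) rt ON s.id = rt.scene_id`
	return query, args
}

func main() {
	q, args := BuildRequiredTagsQuery([]TagRef{
		{CategoryPath: "people/hair/color", Name: "blonde"},
		{CategoryPath: "clothing/color", Name: "pink"},
	}, 0.8)
	fmt.Println(q)
	fmt.Println(args...)
}
```

The returned string and argument slice are meant to be passed to a database/sql Query call. Keeping values in placeholders avoids building SQL from user input.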

4. ML Integration Points

Phase 1: Data Collection (Current)

  • Import scenes from TPDB with metadata
  • Import images from PornPics
  • Manual tagging to build training dataset

Phase 2: Tag Suggestion (Future)

  • ML model suggests tags based on images
  • Store predictions with confidence scores
  • Human verification workflow

Phase 3: Auto-tagging (Future)

  • High-confidence predictions auto-applied
  • Periodic retraining with verified data
  • Confidence thresholds per tag category
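
The split between Phase 2 (suggest) and Phase 3 (auto-apply) comes down to comparing a prediction's confidence against per-category thresholds. A minimal sketch of that decision, assuming two cut-offs per category and a fallback for categories without their own setting. All names and threshold values here are hypothetical.

```go
package main

import "fmt"

// Decision is what happens to one ML prediction.
type Decision string

const (
	Discard   Decision = "discard"    // too uncertain to show
	Suggest   Decision = "suggest"    // queue for human verification
	AutoApply Decision = "auto_apply" // write to scene_tags with source = 'ml'
)

// Thresholds holds the two cut-offs for one tag category.
// Hypothetical structure; the values used in main are placeholders.
type Thresholds struct {
	Suggest   float64
	AutoApply float64
}

// Classify picks the thresholds for the category (or the fallback when
// the category has none) and maps the confidence to a Decision.
func Classify(category string, confidence float64, perCategory map[string]Thresholds, fallback Thresholds) Decision {
	th, ok := perCategory[category]
	if !ok {
		th = fallback
	}
	switch {
	case confidence >= th.AutoApply:
		return AutoApply
	case confidence >= th.Suggest:
		return Suggest
	default:
		return Discard
	}
}

func main() {
	perCategory := map[string]Thresholds{
		"clothing/color": {Suggest: 0.5, AutoApply: 0.9},
	}
	fallback := Thresholds{Suggest: 0.6, AutoApply: 0.95}

	fmt.Println(Classify("clothing/color", 0.92, perCategory, fallback))
	fmt.Println(Classify("people/count", 0.92, perCategory, fallback))
}
```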

5. Data Quality Safeguards

Prevent Tag Spam:

  • Tag category constraints (can't tag "bedroom" as "clothing/color")
  • Minimum confidence thresholds
  • Rate limiting on ML predictions

Ensure Consistency:

  • Tag aliases for variations (pink/rose/hot_pink)
  • Batch tag operations
  • Tag merging/splitting tools
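
Alias handling can be sketched as a lookup from any normalized spelling to the canonical tag name. The example below assumes aliases are stored comma-separated, as in the tags.aliases column, and that an alias is unique within its category. The function names and the normalization rules (case-insensitive, spaces and hyphens treated as underscores) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize makes "Hot Pink", "hot-pink", and "hot_pink" compare equal.
func normalize(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	return strings.NewReplacer(" ", "_", "-", "_").Replace(s)
}

// BuildAliasIndex maps every normalized name and alias to its canonical
// tag name. The input mirrors tags.name -> tags.aliases for one category.
// Canonical names are written last so they win over a colliding alias.
func BuildAliasIndex(canonicalToAliases map[string]string) map[string]string {
	index := make(map[string]string)
	for name, aliases := range canonicalToAliases {
		for _, a := range strings.Split(aliases, ",") {
			if key := normalize(a); key != "" {
				index[key] = name
			}
		}
	}
	for name := range canonicalToAliases {
		index[normalize(name)] = name
	}
	return index
}

// Resolve returns the canonical tag name for user or ML input.
func Resolve(index map[string]string, input string) (string, bool) {
	name, ok := index[normalize(input)]
	return name, ok
}

func main() {
	index := BuildAliasIndex(map[string]string{
		"pink": "hot pink,rose",
		"red":  "",
	})
	name, ok := Resolve(index, "Hot_Pink")
	fmt.Println(name, ok)
}
```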

Human Oversight:

  • Verification workflow for ML tags
  • Tag dispute resolution
  • Quality metrics per tagger (user/ml)

6. API Design (Future)

// TagService interface
type TagService interface {
    // Basic CRUD
    CreateTag(categoryID int64, name string, aliases []string) (*Tag, error)
    GetTagByID(id int64) (*Tag, error)
    SearchTags(query string, categoryID *int64) ([]Tag, error)

    // Scene tagging
    AddTagToScene(sceneID, tagID int64, source string, confidence float64) error
    RemoveTagFromScene(sceneID, tagID int64) error
    GetSceneTags(sceneID int64, verified bool) ([]Tag, error)

    // Complex queries
    SearchScenesByTags(requirements TagRequirements) ([]Scene, error)

    // ML integration
    StorePrediction(sceneID int64, predictions []TagPrediction) error
    VerifyTag(sceneID, tagID int64) error
    BulkVerifyTags(sceneID int64, tagIDs []int64) error
}

type TagRequirements struct {
    Required []TagFilter  // must have ALL
    Optional []TagFilter  // nice to have (scoring)
    Excluded []TagFilter  // must NOT have
    MinConfidence float64
    VerifiedOnly bool
}

type TagFilter struct {
    CategoryPath string  // "clothing/color"
    Value string         // "pink"
    Operator string      // "equals", "contains", "gt", "lt"
}
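
To make the intended semantics of TagRequirements concrete, here is a small in-memory sketch that evaluates the filters against the tags of a single scene. AppliedTag and Evaluate are hypothetical names. The sketch assumes that VerifiedOnly overrides MinConfidence, that Excluded uses the same confidence gate as Required, and it handles only the "equals" and "contains" operators.

```go
package main

import (
	"fmt"
	"strings"
)

type TagFilter struct {
	CategoryPath string // "clothing/color"
	Value        string // "pink"
	Operator     string // "equals", "contains", "gt", "lt"
}

type TagRequirements struct {
	Required      []TagFilter
	Optional      []TagFilter
	Excluded      []TagFilter
	MinConfidence float64
	VerifiedOnly  bool
}

// AppliedTag is one scene_tags row joined with its tag and category.
// Hypothetical type for this sketch.
type AppliedTag struct {
	CategoryPath string
	Name         string
	Confidence   float64
	Verified     bool
}

// usable applies the confidence/verification gate (assumed semantics).
func usable(req TagRequirements, t AppliedTag) bool {
	if req.VerifiedOnly {
		return t.Verified
	}
	return t.Verified || t.Confidence >= req.MinConfidence
}

// matches compares one filter with one tag; "gt" and "lt" are not handled here.
func matches(f TagFilter, t AppliedTag) bool {
	if f.CategoryPath != t.CategoryPath {
		return false
	}
	switch f.Operator {
	case "", "equals":
		return t.Name == f.Value
	case "contains":
		return strings.Contains(t.Name, f.Value)
	default:
		return false
	}
}

func hasAny(req TagRequirements, f TagFilter, tags []AppliedTag) bool {
	for _, t := range tags {
		if usable(req, t) && matches(f, t) {
			return true
		}
	}
	return false
}

// Evaluate reports whether the scene qualifies and how many Optional
// filters it satisfies (a simple score for ranking).
func Evaluate(req TagRequirements, tags []AppliedTag) (bool, int) {
	for _, f := range req.Required {
		if !hasAny(req, f, tags) {
			return false, 0
		}
	}
	for _, f := range req.Excluded {
		if hasAny(req, f, tags) {
			return false, 0
		}
	}
	score := 0
	for _, f := range req.Optional {
		if hasAny(req, f, tags) {
			score++
		}
	}
	return true, score
}

func main() {
	tags := []AppliedTag{
		{CategoryPath: "people/hair/color", Name: "blonde", Confidence: 1.0, Verified: true},
		{CategoryPath: "clothing/specific/footwear", Name: "heels", Confidence: 0.85},
	}
	req := TagRequirements{
		Required: []TagFilter{
			{CategoryPath: "people/hair/color", Value: "blonde", Operator: "equals"},
			{CategoryPath: "clothing/specific/footwear", Value: "heels", Operator: "equals"},
		},
		MinConfidence: 0.8,
	}
	ok, score := Evaluate(req, tags)
	fmt.Println(ok, score)
}
```

A production SearchScenesByTags would push this logic into SQL. The in-memory version is mainly useful as a reference for tests.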

Implementation Roadmap

v0.2.0: Enhanced Tagging Foundation

  1. Fix NULL handling (completed)
  2. Implement tag_categories table and seed data
  3. Update tags table with category_id foreign key
  4. Enhance scene_tags with confidence/source/verified
  5. Add scene_images table for PornPics integration
  6. Create TagService with basic CRUD

v0.3.0: Complex Tag Queries

  1. Implement complex tag query builder
  2. Add tag filtering UI/CLI commands
  3. Performance optimization with proper indexes
  4. Tag statistics and reporting

v0.4.0: ML Preparation

  1. Image import from PornPics
  2. ML prediction storage table
  3. Tag verification workflow
  4. Training dataset export

v0.5.0: ML Integration

  1. Image classification model
  2. Auto-tagging pipeline
  3. Confidence threshold tuning
  4. Retraining automation

Notes

  • Backwards Compatibility: The current tags table can be migrated by assigning every existing tag to a default "general" category through category_id
  • Storage Consideration: Images may require significant disk space; consider cloud storage integration
  • Privacy: All personal data remains local unless explicitly synced
  • Performance: Proper indexing critical - complex queries with 10+ tags need optimization

Example User Flow

  1. User imports scene from TPDB → Basic metadata populated
  2. User uploads/links images from PornPics → scene_images populated
  3. ML model scans images → scene_tags created with confidence < 1.0, source = 'ml'
  4. User reviews suggestions → verified = 1 for accepted tags
  5. User searches "blonde + heels" → Query filters by verified tags or confidence > 0.9
  6. System returns ranked results based on tag match confidence
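
Step 6 leaves the ranking function open. One simple option, shown here purely as an assumption, is to score each scene by the mean confidence of its matched tags (with verified tags stored at 1.0) and sort in descending order. SceneMatch, Score, and Rank are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// SceneMatch carries the confidence of every tag that matched the query
// for one scene. Hypothetical type for this sketch.
type SceneMatch struct {
	SceneID     int64
	Confidences []float64
}

// Score is the mean confidence of the matched tags (0 when there are none).
func Score(m SceneMatch) float64 {
	if len(m.Confidences) == 0 {
		return 0
	}
	sum := 0.0
	for _, c := range m.Confidences {
		sum += c
	}
	return sum / float64(len(m.Confidences))
}

// Rank returns a copy sorted by descending score, breaking ties by
// ascending scene ID so the order is deterministic.
func Rank(matches []SceneMatch) []SceneMatch {
	ranked := make([]SceneMatch, len(matches))
	copy(ranked, matches)
	sort.SliceStable(ranked, func(i, j int) bool {
		si, sj := Score(ranked[i]), Score(ranked[j])
		if si != sj {
			return si > sj
		}
		return ranked[i].SceneID < ranked[j].SceneID
	})
	return ranked
}

func main() {
	ranked := Rank([]SceneMatch{
		{SceneID: 1, Confidences: []float64{0.9, 0.92}},
		{SceneID: 2, Confidences: []float64{1.0, 1.0}},
		{SceneID: 3, Confidences: []float64{0.95}},
	})
	for _, m := range ranked {
		fmt.Println(m.SceneID)
	}
}
```

Other scoring choices, such as the minimum confidence or a weighted sum that favors verified tags, would slot into Score without changing Rank.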