Goondex Tagging System Architecture
Vision
Enable ML-driven search and tagging, for example:
- "3 black men in a scene where a blonde milf wears pink panties and black heels"
- Image-based scene detection and recommendation
- Auto-tagging from PornPics image imports
Core Requirements
1. Tag Categories (Hierarchical Structure)
Tags need to be organized by category for efficient filtering and ML training:
performers/
└─ [already implemented via performers table]
people/
├─ count/ (1, 2, 3, 4, 5+, orgy, etc.)
├─ ethnicity/ (black, white, asian, latina, etc.)
├─ age_category/ (teen, milf, mature, etc.)
├─ body_type/ (slim, athletic, curvy, bbw, etc.)
└─ hair/
   ├─ color/ (blonde, brunette, redhead, etc.)
   └─ length/ (short, long, bald, etc.)
clothing/
├─ type/ (lingerie, uniform, casual, etc.)
├─ color/ (pink, black, red, white, etc.)
└─ specific/
   ├─ top/ (bra, corset, tank_top, etc.)
   ├─ bottom/ (panties, skirt, jeans, etc.)
   └─ footwear/ (heels, boots, stockings, etc.)
position/
├─ category/ (standing, lying, sitting, etc.)
└─ specific/ (missionary, doggy, cowgirl, etc.)
action/
├─ sexual/ (oral, penetration, etc.)
└─ non_sexual/ (kissing, undressing, etc.)
setting/
├─ location/ (bedroom, office, outdoor, etc.)
└─ time/ (day, night, etc.)
production/
├─ quality/ (hd, 4k, vr, etc.)
└─ style/ (pov, amateur, professional, etc.)
2. Database Schema Extensions
Enhanced Tags Table
CREATE TABLE IF NOT EXISTS tag_categories (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE, -- e.g., "clothing/color"
parent_id INTEGER, -- for hierarchical categories
description TEXT,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
FOREIGN KEY (parent_id) REFERENCES tag_categories(id) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS tags (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL, -- e.g., "pink"
category_id INTEGER NOT NULL, -- links to "clothing/color"
aliases TEXT, -- comma-separated: "hot pink,rose"
description TEXT,
source TEXT, -- tpdb, user, ml
source_id TEXT,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now')),
UNIQUE(category_id, name),
FOREIGN KEY (category_id) REFERENCES tag_categories(id) ON DELETE CASCADE
);
-- Enhanced scene-tag junction with ML confidence
CREATE TABLE IF NOT EXISTS scene_tags (
scene_id INTEGER NOT NULL,
tag_id INTEGER NOT NULL,
confidence REAL DEFAULT 1.0, -- 0.0-1.0 for ML predictions
source TEXT NOT NULL DEFAULT 'user', -- 'user', 'ml', 'tpdb'
verified BOOLEAN DEFAULT 0, -- human verification flag
created_at TEXT NOT NULL DEFAULT (datetime('now')),
PRIMARY KEY (scene_id, tag_id),
FOREIGN KEY (scene_id) REFERENCES scenes(id) ON DELETE CASCADE,
FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);
-- Track images associated with scenes (for ML training)
CREATE TABLE IF NOT EXISTS scene_images (
id INTEGER PRIMARY KEY AUTOINCREMENT,
scene_id INTEGER NOT NULL,
image_url TEXT NOT NULL,
image_path TEXT, -- local storage path
source TEXT, -- pornpics, tpdb, user
source_id TEXT,
width INTEGER,
height INTEGER,
file_size INTEGER,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
FOREIGN KEY (scene_id) REFERENCES scenes(id) ON DELETE CASCADE
);
-- ML model predictions for future reference
CREATE TABLE IF NOT EXISTS ml_predictions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
scene_id INTEGER,
image_id INTEGER,
model_version TEXT NOT NULL, -- track which ML model made prediction
predictions TEXT NOT NULL, -- JSON: [{"tag_id": 123, "confidence": 0.95}, ...]
created_at TEXT NOT NULL DEFAULT (datetime('now')),
FOREIGN KEY (scene_id) REFERENCES scenes(id) ON DELETE CASCADE,
FOREIGN KEY (image_id) REFERENCES scene_images(id) ON DELETE CASCADE
);
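The predictions column holds a JSON array, so readers need to decode and sanity-check it. A minimal sketch, assuming the element shape shown in the column comment (`tag_id`, `confidence`); the `TagPrediction` struct and `ParsePredictions` function are illustrative names, not confirmed parts of the codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TagPrediction mirrors one element of the JSON stored in
// ml_predictions.predictions (assumed shape, taken from the schema comment).
type TagPrediction struct {
	TagID      int64   `json:"tag_id"`
	Confidence float64 `json:"confidence"`
}

// ParsePredictions decodes the predictions column and rejects any
// confidence outside the 0.0-1.0 range used by scene_tags.confidence.
func ParsePredictions(raw string) ([]TagPrediction, error) {
	var preds []TagPrediction
	if err := json.Unmarshal([]byte(raw), &preds); err != nil {
		return nil, fmt.Errorf("decode predictions: %w", err)
	}
	for _, p := range preds {
		if p.Confidence < 0 || p.Confidence > 1 {
			return nil, fmt.Errorf("tag %d: confidence %v outside [0, 1]", p.TagID, p.Confidence)
		}
	}
	return preds, nil
}

func main() {
	preds, err := ParsePredictions(`[{"tag_id": 123, "confidence": 0.95}]`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(preds), preds[0].TagID, preds[0].Confidence)
}
```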
Indexes for ML Performance
-- Tag search performance
CREATE INDEX IF NOT EXISTS idx_tags_category ON tags(category_id);
CREATE INDEX IF NOT EXISTS idx_tags_name ON tags(name);
-- Scene tag filtering (critical for complex queries)
CREATE INDEX IF NOT EXISTS idx_scene_tags_tag ON scene_tags(tag_id);
CREATE INDEX IF NOT EXISTS idx_scene_tags_confidence ON scene_tags(confidence);
CREATE INDEX IF NOT EXISTS idx_scene_tags_verified ON scene_tags(verified);
-- Image processing
CREATE INDEX IF NOT EXISTS idx_scene_images_scene ON scene_images(scene_id);
CREATE INDEX IF NOT EXISTS idx_scene_images_source ON scene_images(source, source_id);
3. Complex Query Architecture
For queries like "3 black men + blonde milf + pink panties + black heels":
-- Step 1: Find scenes that carry every required tag
WITH required_tags AS (
  SELECT st.scene_id, COUNT(DISTINCT st.tag_id) AS tag_count
  FROM scene_tags st
  JOIN tags t ON st.tag_id = t.id
  WHERE
    ( -- one OR branch per required tag; the group must be parenthesized
      (t.name = 'black' AND t.category_id = (SELECT id FROM tag_categories WHERE name = 'people/ethnicity'))
      OR (t.name = 'blonde' AND t.category_id = (SELECT id FROM tag_categories WHERE name = 'people/hair/color'))
      OR (t.name = 'pink' AND t.category_id = (SELECT id FROM tag_categories WHERE name = 'clothing/color'))
      -- etc.
    )
    -- AND binds tighter than OR, so without the parentheses above these
    -- filters would apply only to the last branch
    AND (st.verified = 1          -- human-verified tags
         OR st.confidence >= 0.8) -- or ML predictions above threshold
  GROUP BY st.scene_id
  HAVING COUNT(DISTINCT st.tag_id) = 4 -- must equal the number of required-tag branches
)
SELECT s.*
FROM scenes s
JOIN required_tags rt ON s.id = rt.scene_id;
-- Additional filtering for performer count, etc.
4. ML Integration Points
Phase 1: Data Collection (Current)
- Import scenes from TPDB with metadata
- Import images from PornPics
- Manual tagging to build training dataset
Phase 2: Tag Suggestion (Future)
- ML model suggests tags based on images
- Store predictions with confidence scores
- Human verification workflow
Phase 3: Auto-tagging (Future)
- High-confidence predictions auto-applied
- Periodic retraining with verified data
- Confidence thresholds per tag category
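The split between Phase 2 (suggest) and Phase 3 (auto-apply) can be expressed as a per-category threshold lookup. This is a rough sketch only: the `Action` and `Thresholds` types, the `Decide` function, and all numeric values are assumptions chosen for illustration, not tuned settings.

```go
package main

import "fmt"

// Action is what to do with a single ML prediction.
type Action int

const (
	Discard   Action = iota // below the suggestion threshold
	Suggest                 // queue for human verification
	AutoApply               // write to scene_tags without review
)

// Thresholds holds the cut-offs for one tag category (hypothetical type).
type Thresholds struct {
	Suggest   float64
	AutoApply float64
}

// Decide picks an action for a prediction, using the category's own
// thresholds when configured and the fallback otherwise.
func Decide(categoryPath string, confidence float64, perCategory map[string]Thresholds, fallback Thresholds) Action {
	th, ok := perCategory[categoryPath]
	if !ok {
		th = fallback
	}
	switch {
	case confidence >= th.AutoApply:
		return AutoApply
	case confidence >= th.Suggest:
		return Suggest
	default:
		return Discard
	}
}

func main() {
	// Illustrative values only.
	perCategory := map[string]Thresholds{
		"clothing/color": {Suggest: 0.5, AutoApply: 0.9},
	}
	fallback := Thresholds{Suggest: 0.6, AutoApply: 0.95}

	fmt.Println(Decide("clothing/color", 0.92, perCategory, fallback) == AutoApply)
	fmt.Println(Decide("setting/location", 0.92, perCategory, fallback) == Suggest)
}
```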
5. Data Quality Safeguards
Prevent Tag Spam:
- Tag category constraints (can't tag "bedroom" as "clothing/color")
- Minimum confidence thresholds
- Rate limiting on ML predictions
Ensure Consistency:
- Tag aliases for variations (pink/rose/hot_pink)
- Batch tag operations
- Tag merging/splitting tools
Human Oversight:
- Verification workflow for ML tags
- Tag dispute resolution
- Quality metrics per tagger (user/ml)
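Alias handling is the safeguard that most directly affects lookups, so here is a small sketch of how the comma-separated aliases column could be resolved to a canonical tag name. It assumes one index per category (tag names are only unique within a category) and that case, underscores, and hyphens are not significant; `AliasIndex` and `normalize` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize lowercases a name and treats '_', '-' and runs of whitespace
// as a single space, so "Hot_Pink" and "hot  pink" compare equal.
func normalize(s string) string {
	s = strings.NewReplacer("_", " ", "-", " ").Replace(strings.ToLower(s))
	return strings.Join(strings.Fields(s), " ")
}

// AliasIndex maps normalized names and aliases to a canonical tag name
// within a single category (hypothetical type).
type AliasIndex map[string]string

// Add registers a canonical name together with its comma-separated aliases,
// in the format of the tags.aliases column.
func (idx AliasIndex) Add(canonical, aliases string) {
	idx[normalize(canonical)] = canonical
	for _, a := range strings.Split(aliases, ",") {
		if n := normalize(a); n != "" {
			idx[n] = canonical
		}
	}
}

// Resolve returns the canonical name for a user- or ML-supplied value.
func (idx AliasIndex) Resolve(name string) (string, bool) {
	c, ok := idx[normalize(name)]
	return c, ok
}

func main() {
	colors := AliasIndex{}
	colors.Add("pink", "hot pink,rose")
	fmt.Println(colors.Resolve("Hot_Pink"))
}
```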
6. API Design (Future)
// TagService interface
type TagService interface {
// Basic CRUD
CreateTag(categoryID int64, name string, aliases []string) (*Tag, error)
GetTagByID(id int64) (*Tag, error)
SearchTags(query string, categoryID *int64) ([]Tag, error)
// Scene tagging
AddTagToScene(sceneID, tagID int64, source string, confidence float64) error
RemoveTagFromScene(sceneID, tagID int64) error
GetSceneTags(sceneID int64, verified bool) ([]Tag, error)
// Complex queries
SearchScenesByTags(requirements TagRequirements) ([]Scene, error)
// ML integration
StorePrediction(sceneID int64, predictions []TagPrediction) error
VerifyTag(sceneID, tagID int64) error
BulkVerifyTags(sceneID int64, tagIDs []int64) error
}
type TagRequirements struct {
Required []TagFilter // must have ALL
Optional []TagFilter // nice to have (scoring)
Excluded []TagFilter // must NOT have
MinConfidence float64
VerifiedOnly bool
}
type TagFilter struct {
CategoryPath string // "clothing/color"
Value string // "pink"
Operator string // "equals", "contains", "gt", "lt"
}
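To make the intended semantics of TagRequirements concrete, the following sketch evaluates the Required, Excluded, and Optional filters against one scene in memory. It is a simplification under stated assumptions: `SceneTag` and `Match` are hypothetical, the Operator field is ignored and every filter is treated as exact equality, and a tag below MinConfidence is treated as absent for all three lists.

```go
package main

import "fmt"

// TagFilter and TagRequirements repeat the definitions above.
type TagFilter struct {
	CategoryPath string
	Value        string
	Operator     string // ignored in this sketch; everything is "equals"
}

type TagRequirements struct {
	Required      []TagFilter
	Optional      []TagFilter
	Excluded      []TagFilter
	MinConfidence float64
	VerifiedOnly  bool
}

// SceneTag is a hypothetical flattened view of one scene_tags row joined
// with its tag and category.
type SceneTag struct {
	CategoryPath string
	Name         string
	Confidence   float64
	Verified     bool
}

// has reports whether the scene carries a trusted tag matching the filter.
func has(tags []SceneTag, f TagFilter, req TagRequirements) bool {
	for _, t := range tags {
		if req.VerifiedOnly && !t.Verified {
			continue
		}
		if t.Confidence < req.MinConfidence {
			continue
		}
		if t.CategoryPath == f.CategoryPath && t.Name == f.Value {
			return true
		}
	}
	return false
}

// Match reports whether a scene satisfies the requirements and, if so,
// how many Optional filters it also satisfies (usable as a ranking score).
func Match(tags []SceneTag, req TagRequirements) (bool, int) {
	for _, f := range req.Required {
		if !has(tags, f, req) {
			return false, 0
		}
	}
	for _, f := range req.Excluded {
		if has(tags, f, req) {
			return false, 0
		}
	}
	score := 0
	for _, f := range req.Optional {
		if has(tags, f, req) {
			score++
		}
	}
	return true, score
}

func main() {
	tags := []SceneTag{
		{CategoryPath: "clothing/color", Name: "pink", Confidence: 0.95},
		{CategoryPath: "people/hair/color", Name: "blonde", Confidence: 1.0, Verified: true},
	}
	req := TagRequirements{
		Required:      []TagFilter{{CategoryPath: "clothing/color", Value: "pink"}},
		Optional:      []TagFilter{{CategoryPath: "people/hair/color", Value: "blonde"}},
		MinConfidence: 0.8,
	}
	fmt.Println(Match(tags, req))
}
```

A real SearchScenesByTags would push this logic into SQL; the in-memory form is mainly useful as a reference for tests.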
Implementation Roadmap
v0.2.0: Enhanced Tagging Foundation
- ✅ Fix NULL handling (completed)
- Implement tag_categories table and seed data
- Update tags table with category_id foreign key
- Enhance scene_tags with confidence/source/verified
- Add scene_images table for PornPics integration
- Create TagService with basic CRUD
v0.3.0: Advanced Search
- Implement complex tag query builder
- Add tag filtering UI/CLI commands
- Performance optimization with proper indexes
- Tag statistics and reporting
v0.4.0: ML Preparation
- Image import from PornPics
- ML prediction storage table
- Tag verification workflow
- Training dataset export
v0.5.0: ML Integration
- Image classification model
- Auto-tagging pipeline
- Confidence threshold tuning
- Retraining automation
Notes
- Backwards Compatibility: The current tags table can be migrated by adding category_id and assigning every existing tag to a catch-all "general" category
- Storage Consideration: Images may require significant disk space - consider cloud storage integration
- Privacy: All personal data remains local unless explicitly synced
- Performance: Proper indexing critical - complex queries with 10+ tags need optimization
Example User Flow
- User imports scene from TPDB → Basic metadata populated
- User uploads/links images from PornPics → scene_images populated
- ML model scans images → scene_tags created with confidence < 1.0, source = 'ml'
- User reviews suggestions → verified = 1 for accepted tags
- User searches "blonde + heels" → Query filters by verified tags or confidence > 0.9
- System returns ranked results based on tag match confidence
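The final ranking step could be as simple as ordering scenes by the mean confidence of their matched tags. This is only one plausible scoring rule, not the one the system is committed to; `ScoredScene` and `Rank` are hypothetical names, and ties are broken by scene ID so the order is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// ScoredScene pairs a scene with the confidences of the tags that matched
// the search (hypothetical type).
type ScoredScene struct {
	SceneID     int64
	Confidences []float64
}

// Score is the mean confidence of the matched tags, or 0 if none matched.
func (s ScoredScene) Score() float64 {
	if len(s.Confidences) == 0 {
		return 0
	}
	sum := 0.0
	for _, c := range s.Confidences {
		sum += c
	}
	return sum / float64(len(s.Confidences))
}

// Rank returns a copy of scenes ordered by descending score, with ties
// broken by ascending scene ID. The input slice is left untouched.
func Rank(scenes []ScoredScene) []ScoredScene {
	out := append([]ScoredScene(nil), scenes...)
	sort.SliceStable(out, func(i, j int) bool {
		si, sj := out[i].Score(), out[j].Score()
		if si != sj {
			return si > sj
		}
		return out[i].SceneID < out[j].SceneID
	})
	return out
}

func main() {
	ranked := Rank([]ScoredScene{
		{SceneID: 1, Confidences: []float64{1.0, 0.9}},
		{SceneID: 2, Confidences: []float64{1.0, 1.0}}, // all tags verified
		{SceneID: 3, Confidences: []float64{0.92}},
	})
	for _, s := range ranked {
		fmt.Println(s.SceneID)
	}
}
```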