Goondex/README.md
Stu Leak 3b8adad57d 🚀 Goondex v0.1.0-dev3 - Comprehensive ML-Powered Search & Import System
MAJOR FEATURES ADDED:
======================

🤖 ML Analysis System:
- Comprehensive scene image analysis with per-scene predictions
- Enhanced database schema with scene_ml_analysis table
- Advanced detection for clothing colors, body types, age categories, positions, settings
- Support for multiple prediction types (clothing, body, sexual acts, etc.)
- Confidence scoring and ML source tracking

🧠 Enhanced Search Capabilities:
- Natural language parser for complex queries (e.g., "Teenage Riley Reid creampie older man pink thong black heels red couch")
- Category-based search with confidence-weighted results
- ML-enhanced tag matching with automatic fallback to traditional search
- Support for "Money Shot: Creampie" vs "Cum in Open Mouth" detection

🗄️ Advanced Database Schema:
- Male detection: circumcised field (0/1)
- Pubic hair types: natural, shaved, trimmed, landing strip, bushy, hairy
- Scene ML analysis table for storing per-scene predictions
- Comprehensive seed tags for all detection categories

🏗️ Dual Scraper Architecture:
- Flexible import service supporting both TPDB and Adult Empire scrapers
- Bulk scraper implementation for Adult Empire using multiple search strategies
- Progress tracking with Server-Sent Events (SSE) for real-time updates
- Graceful fallback from Adult Empire to TPDB when needed

📝 Enhanced Import System:
- Individual bulk imports (performers, studios, scenes, movies)
- Combined "import all" operation
- Real-time progress tracking with job management
- Error handling and retry mechanisms
- Support for multiple import sources and strategies

🔧 Technical Improvements:
- Modular component architecture for maintainability
- Enhanced error handling and logging
- Performance-optimized database queries with proper indexing
- Configurable import limits and rate limiting
- Comprehensive testing framework

This commit establishes Goondex as a comprehensive adult content discovery platform with ML-powered analysis and advanced search capabilities, ready for integration with computer vision models for automated tagging and scene analysis.
2025-12-30 21:52:25 -05:00


Goondex

Fast, local-first media indexer for adult content.

Goondex ingests metadata from external sources (ThePornDB, Adult Empire, and others), normalizes it, and stores it in a small SQLite database. The CLI/TUI searches that database directly, and background daemon tasks work against the same data.

Version

v0.1.0-dev2 - TPDB Integration Release

Note: TPDB import/sync commands are temporarily disabled. Use the Adult Empire commands (goondex adultemp ...) to build your directory for now.

Features (v0.1.0-dev2)

  • SQLite database with WAL mode for performers, studios, scenes, and tags
  • Full TPDB scraper integration with real API calls
  • CLI import commands - Fetch data directly from ThePornDB
  • CLI search commands for local database queries
  • Automatic relationship management (scenes ↔ performers, scenes ↔ tags)
  • Pluggable scraper architecture
  • Configuration via YAML files
  • ML-Powered Scene Analysis: Automatic image analysis and tagging system
  • Advanced Natural Language Search: Complex query parsing ("Teenage Riley Reid creampie older man pink thong black heels red couch")
  • Comprehensive Tag System: Body types, clothing colors, pubic hair styles, positions, settings
  • Dual Scraper Support: TPDB + Adult Empire bulk import capabilities
  • Performer Detection: Male/Female classification and circumcised detection
  • Sex Act Classification: Creampie vs Cum in Open Mouth detection
  • Enhanced Database Schema: ML analysis tables with confidence scoring
  • Stash-inspired metadata resolution strategies (coming in v0.2.x)

Architecture

Scrapers (TPDB, AE, etc.)
    ↓
Metadata Resolver (field strategies, merge rules)
    ↓
SQLite DB (performers, studios, scenes, tags, scene_ml_analysis)
    ↓
ML Analysis Service
    ↓
Advanced Search Engine
    ↓
Bulk Import Manager

CLI/TUI + Daemon (search, identify, sync)


## Installation

```bash
# Clone the repository
git clone <repository-url>
cd Goondex

# Build the CLI
go build -o bin/goondex ./cmd/goondex

# (Optional) Build the daemon
go build -o bin/goondexd ./cmd/goondexd
```

Configuration

Set your API keys as environment variables:

export TPDB_API_KEY="your-tpdb-api-key"

Edit configuration files in config/:

  • goondex.yml - Runtime settings (database path, cache dir, timeouts)
  • metadata.yml - Field strategies (MERGE/OVERWRITE/IGNORE)
  • source.yml - Scraper sources and priorities

Usage

Quick Start

# 1. Set your TPDB API key
export TPDB_API_KEY="your-api-key-here"

# 2. Import some data from ThePornDB (temporarily disabled; see the note under Version)
./bin/goondex import performer "Riley Reid"
./bin/goondex import studio "Brazzers"
./bin/goondex import scene "Big Wet Butts"

# 3. Search your local database
./bin/goondex performer-search "Riley"
./bin/goondex studio-search "Brazzers"
./bin/goondex scene-search "Big Wet"

All Commands

Import from ThePornDB (requires TPDB_API_KEY; currently disabled, use goondex adultemp ... in the meantime):

./bin/goondex import performer [query]  # Import performers
./bin/goondex import studio [query]     # Import studios
./bin/goondex import scene [query]      # Import scenes (+ performers, tags, studio)

Search Local Database:

./bin/goondex performer-search [query]  # Search performers
./bin/goondex studio-search [query]     # Search studios
./bin/goondex scene-search [query]      # Search scenes

Other:

./bin/goondex version                   # Show version
./bin/goondex --help                    # Show help

See CLI Reference for complete documentation.

Database Schema

  • performers: id, name, aliases, nationality, country, gender, images, bio
  • studios: id, name, parent_id, images, description
  • tags: id, name
  • scenes: id, title, code, date, studio_id, description, images, director, url
  • scene_performers: junction table for scenes ↔ performers
  • scene_tags: junction table for scenes ↔ tags
  • scene_ml_analysis: per-scene ML predictions with confidence scores and ML source tracking
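
To illustrate how a junction table keeps scene-to-performer links free of duplicates, here is a small in-memory sketch in Go. The `ScenePerformer` struct, its field names, and `linkPerformers` are hypothetical; the real column names are not listed in this README. In SQLite itself the same effect is usually achieved with a composite primary key and `INSERT OR IGNORE`.

```go
package main

import "fmt"

// ScenePerformer models one row of the scene_performers junction table.
// The field names are assumed for illustration.
type ScenePerformer struct {
	SceneID     int64
	PerformerID int64
}

// linkPerformers returns the rows that must be added so that sceneID is
// linked to every ID in performerIDs. Pairs already present in existing,
// and repeats within performerIDs, are skipped.
func linkPerformers(existing []ScenePerformer, sceneID int64, performerIDs []int64) []ScenePerformer {
	have := make(map[ScenePerformer]bool, len(existing))
	for _, row := range existing {
		have[row] = true
	}
	var added []ScenePerformer
	for _, pid := range performerIDs {
		row := ScenePerformer{SceneID: sceneID, PerformerID: pid}
		if !have[row] {
			have[row] = true
			added = append(added, row)
		}
	}
	return added
}

func main() {
	existing := []ScenePerformer{{SceneID: 1, PerformerID: 10}}
	added := linkPerformers(existing, 1, []int64{10, 11, 11})
	fmt.Println(added)
}
```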

Documentation

Comprehensive documentation is available in the docs/ directory.

Roadmap

v0.1.x (Current)

  • CLI search commands
  • SQLite stores for all entities
  • TPDB scraper implementation with real API integration
  • Import commands (performer, studio, scene)
  • Comprehensive documentation
  • Image cache

v0.2.x

  • Identify/import commands
  • TUI list + preview
  • Alias normalization
  • Full-text search (FTS5)

v0.3.x

  • Daemon (goondexd) with schedules
  • Incremental updates
  • Duplicate scene detection
  • Preview sprite generation

v0.4.x

  • Plugin scrapers
  • Headless HTTP API
  • Web UI

Development

# Run tests (when available)
go test ./...

# Build for development
go build -o bin/goondex ./cmd/goondex

# Run without installing
go run ./cmd/goondex performer-search "test"

Scripts

  • source scripts/env.sh - Pin Go caches inside the repo (recommended before building)
  • source scripts/load-env.sh - Load API keys from .env.local (or .env) without hardcoding them
  • scripts/build.sh - Build the CLI (bin/goondex)
  • ADDR=localhost:8788 scripts/run.sh - Build (if needed) and start the web UI
  • scripts/test.sh - Run go test ./cmd/... ./internal/...

Building & Rebuilding the CLI

The Goondex binary is not rebuilt automatically. Whenever you change Go files (especially under cmd/goondex or internal/*), rebuild before re-running commands.

# Clean out any previous binary (prevents running stale code)
rm -f goondex bin/goondex

# Build the latest CLI (same output path as in Installation)
go build -o bin/goondex ./cmd/goondex

After rebuilding, rerun ./bin/goondex so new commands like import all become available. Repeat the build whenever:

  • You pull new commits (e.g., moving to v0.1.0-dev5)
  • CLI command definitions change
  • Shared packages under internal/ are modified
  • You switch Go versions or modules are updated (go mod tidy, go get, etc.)

Contributing

This is a personal project, but contributions are welcome! Please open an issue before submitting large changes.

License

[Your License Here]

Acknowledgments

Inspired by Stash and its metadata identification flow.