File: docs/README.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex

------------------------------------------------------------
Goondex - PornPics Importer & ML Pipeline
------------------------------------------------------------

A modular, documented gallery importer for PornPics.com, forming the
foundation of the Goondex ecosystem. Supports importing, tagging,
metadata enrichment, and generation of ML-ready datasets for semantic
search and classification.

------------------------------------------------------------
1. Project Overview
------------------------------------------------------------

Goondex automates the process of:

- Downloading and organizing galleries from PornPics.com
- Generating structured metadata and tag inference
- Enriching galleries via ThePornDB (TPDB) performer API
- Building machine-learning datasets and embeddings
- Enabling semantic, hybrid (text + image) search

All operations are handled locally: no cloud dependencies or external
databases are required. The system is modular, transparent, and
designed for research and personal archival use.

------------------------------------------------------------
2. Project Structure
------------------------------------------------------------

src/
├── importer/                     → Core importer logic and CLI tools
│   ├── cli.py                    → Unified CLI entrypoint (goondex command)
│   ├── gallery_importer.py       → Gallery parser and downloader
│   ├── tag_gallery.py            → Tag inference and YAML management
│   ├── reports/                  → Auto-generated validation and tag stats
│   ├── db/                       → TPDB performer cache and local databases
│   ├── secrets/                  → Local-only API keys (ignored by Git)
│   └── tag_dictionaries/         → Modular YAML tag dictionaries
│
├── ml/                           → Machine learning and semantic search
│   ├── ml_dataset_builder.py     → Builds JSONL dataset for embeddings
│   ├── ml_embeddings.py          → Generates CLIP + text hybrid vectors
│   ├── ml_dataset_inspector.py   → (planned) visual dataset viewer
│   └── ml_vision_detector.py     → (planned) DINO + SAM visual tagging
│
├── docs/                         → Documentation, changelogs, and brand files
├── tests/                        → Unit and integration testing suite
└── assets/                       → Static samples and test assets

------------------------------------------------------------
3. Environment Setup
------------------------------------------------------------

Create a virtual environment and install dependencies:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Set the source path for development:

    export PYTHONPATH=src

------------------------------------------------------------
4. Quick Start
------------------------------------------------------------

Import a gallery from PornPics:

    goondex import "https://www.pornpics.com/galleries/example-id/"

Automatically:

- Downloads images and metadata
- Saves to Galleries/__/
- Generates metadata.json
- Runs auto-tagging (refresh-one)
- Updates the central gallery index

------------------------------------------------------------
5. CLI Overview
------------------------------------------------------------

All commands are run via:

    goondex <command> [args...]
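
For readers curious how a single entrypoint of the form shown above can
route subcommands, here is a minimal sketch using Python's standard
argparse module. It is not the actual contents of cli.py: the functions
build_parser and describe are hypothetical, and only three of the
command names from this README are wired up, with no real behavior
behind them.

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering three commands named in this README."""
    parser = argparse.ArgumentParser(prog="goondex")
    # required=True makes argparse reject an invocation with no subcommand.
    sub = parser.add_subparsers(dest="command", required=True)

    # Command with no positional arguments.
    sub.add_parser("refresh-all")

    # Command taking a single gallery folder.
    refresh_one = sub.add_parser("refresh-one")
    refresh_one.add_argument("folder")

    # Command taking a folder and a tag name.
    add = sub.add_parser("add")
    add.add_argument("folder")
    add.add_argument("tag")
    return parser


def describe(argv: Sequence[str]) -> str:
    """Parse argv and return a summary string instead of doing real work."""
    args = build_parser().parse_args(list(argv))
    if args.command == "refresh-one":
        return f"refresh-one folder={args.folder}"
    if args.command == "add":
        return f"add folder={args.folder} tag={args.tag}"
    return args.command


if __name__ == "__main__":
    print(describe(["add", "Galleries/example", "TagName"]))
```

In a real entrypoint each branch would call into the importer or tagging
modules rather than return a string; the sketch only illustrates the
dispatch pattern.
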
Examples:

    goondex refresh-all
    goondex refresh-one "<folder>"
    goondex validate-tags
    goondex tag-stats
    goondex list-tags "<folder>"
    goondex add "<folder>" "TagName"
    goondex source bulk set "PornPics"

The CLI automatically detects YAML tag dictionaries and applies them
during refresh or import.

------------------------------------------------------------
6. Machine Learning Pipeline
------------------------------------------------------------

Build dataset:

    python -m ml.ml_dataset_builder

Output:

    ML/porndex_dataset.jsonl

Each record includes:

    {
      "gallery_id": "...",
      "title": "...",
      "models": ["..."],
      "tags": ["..."],
      "categories": ["..."],
      "image_paths": ["..."]
    }

Build embeddings:

    python -m ml.ml_embeddings build --img-samples 8 --device auto

Output:

    ML/embeddings/<gallery_id>.npz
    ML/embeddings_index.jsonl

Search:

    python -m ml.ml_embeddings search "asian redhead solo"

Modes:

- semantic (default): hybrid vector cosine similarity
- text: text-only search
- strict: literal keyword matching

Verify:

    python -m ml.ml_embeddings verify

------------------------------------------------------------
7. Development Guidelines
------------------------------------------------------------

- Use descriptive variable names and structured commits
- Avoid emojis in code and commit messages
- Always document new features in docs/CHANGELOG.md
- Keep CLI text synchronized with docs/CLI_USAGE.md
- Use version tagging for all major commits

------------------------------------------------------------
8. Roadmap Summary
------------------------------------------------------------

Stage        Feature                           Description
-----------  --------------------------------  ----------------------------------
✅ v0.3.x    Stable CLI & Tagging              Unified CLI and YAML cleanup
⚙️ v0.4.x    ML Embeddings & Dataset Builder   Build hybrid vectors for search
⏳ v0.5.x    Visual Intelligence               DINO + SAM + attribute detection
🔜 v0.6.x    Local Web UI                      Lightweight gallery browser
🚀 v1.0.0    Full Stable Release               Plugin importers + visual ML tools

------------------------------------------------------------
9. Licensing
------------------------------------------------------------

License: Research-Use MIT Variant
Author: Leak Technologies
Maintainer: Stu Leak

For personal, non-commercial, and research use only.

------------------------------------------------------------
End of File
------------------------------------------------------------