
File: docs/README.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex


Goondex — PornPics Importer & ML Pipeline

A modular, documented gallery importer for PornPics.com, forming the foundation of the Goondex ecosystem.
Supports importing, tagging, metadata enrichment, and generation of ML-ready datasets for semantic search and classification.


  1. Project Overview

Goondex automates the process of:

  • Downloading and organizing galleries from PornPics.com
  • Generating structured metadata and inferring tags
  • Enriching galleries via ThePornDB (TPDB) performer API
  • Building machine-learning datasets and embeddings
  • Enabling semantic, hybrid (text + image) search

All processing and storage happen locally: beyond fetching galleries from PornPics.com and querying the TPDB API, no cloud services or external databases are required.
The system is modular, transparent, and designed for research and personal archival use.
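
To make the idea of hybrid (text + image) search more concrete, here is a minimal sketch that ranks items by a weighted blend of text and image cosine similarity. It is an illustration only: the function names, the `alpha` weight, and the toy two-dimensional vectors are assumptions made for this example, not Goondex's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(query_text, query_image, item_text, item_image, alpha=0.5):
    """Blend text and image similarity; alpha (hypothetical) weights the text part."""
    text_sim = cosine(query_text, item_text)
    image_sim = cosine(query_image, item_image)
    return alpha * text_sim + (1 - alpha) * image_sim


def rank(query, items, alpha=0.5):
    """Return item ids sorted by descending hybrid score."""
    scored = [
        (hybrid_score(query["text"], query["image"], item["text"], item["image"], alpha), item["id"])
        for item in items
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored]


# Toy data: item "a" matches the query in both modalities, item "b" in neither.
query = {"text": [1.0, 0.0], "image": [0.0, 1.0]}
items = [
    {"id": "b", "text": [0.0, 1.0], "image": [1.0, 0.0]},
    {"id": "a", "text": [1.0, 0.0], "image": [0.0, 1.0]},
]
ranking = rank(query, items)
```

Raising `alpha` toward 1 makes the ranking depend more on text similarity; lowering it favours image similarity.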


  2. Project Structure

    src/
    ├── importer/                    → Core importer logic and CLI tools
    │   ├── cli.py                   → Unified CLI entrypoint (goondex command)
    │   ├── gallery_importer.py      → Gallery parser and downloader
    │   ├── tag_gallery.py           → Tag inference and YAML management
    │   ├── reports/                 → Auto-generated validation and tag stats
    │   ├── db/                      → TPDB performer cache and local databases
    │   ├── secrets/                 → Local-only API keys (ignored by Git)
    │   └── tag_dictionaries/        → Modular YAML tag dictionaries
    │
    ├── ml/                          → Machine learning and semantic search
    │   ├── ml_dataset_builder.py    → Builds JSONL dataset for embeddings
    │   ├── ml_embeddings.py         → Generates CLIP + text hybrid vectors
    │   ├── ml_dataset_inspector.py  → (planned) visual dataset viewer
    │   └── ml_vision_detector.py    → (planned) DINO + SAM visual tagging
    │
    ├── docs/                        → Documentation, changelogs, and brand files
    ├── tests/                       → Unit and integration testing suite
    └── assets/                      → Static samples and test assets
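
The tree above notes that ml_dataset_builder.py builds a JSONL dataset for embeddings. As a rough sketch of what a JSONL (one JSON object per line) dataset looks like in practice, the snippet below writes and reads such a file with the standard library. The record fields (`gallery_id`, `tags`, `images`) and the helper names are hypothetical and are not taken from the Goondex source.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line; returns the output path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical records; the real dataset schema may differ.
sample = [
    {"gallery_id": "example-id", "tags": ["tag-a", "tag-b"], "images": ["001.jpg"]},
    {"gallery_id": "another-id", "tags": [], "images": ["001.jpg", "002.jpg"]},
]

with tempfile.TemporaryDirectory() as tmp:
    out_path = write_jsonl(sample, Path(tmp) / "dataset.jsonl")
    line_count = len(out_path.read_text(encoding="utf-8").splitlines())
    loaded = read_jsonl(out_path)
```

Because each line is an independent JSON document, such a file can be streamed record by record when generating embeddings, without loading the whole dataset into memory.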


  3. Environment Setup

Create a virtual environment and install dependencies:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
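
If it is unclear whether the virtual environment is actually active, a quick check from Python can help. The sketch below relies on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside a venv; the helper name `in_virtualenv` is made up for this example and is not part of Goondex.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base installation.

    The optional arguments exist only so the logic can be exercised with
    explicit values; by default the running interpreter is inspected.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```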

Set the source path for development:

    export PYTHONPATH=src
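
Setting PYTHONPATH adds the listed directories to Python's module search path (`sys.path`), which is what lets packages under src/ be imported during development. The following sketch, with a hypothetical helper name, checks whether a directory is on that search path.

```python
import sys
from pathlib import Path


def is_on_path(directory, entries=None):
    """Return True if `directory` resolves to one of the search-path entries.

    `entries` defaults to sys.path; an empty entry is treated as the current
    working directory.
    """
    entries = sys.path if entries is None else entries
    target = Path(directory).resolve()
    return any(Path(entry or ".").resolve() == target for entry in entries)


if __name__ == "__main__":
    print("src on module search path:", is_on_path("src"))
```

Run from the repository root after the export, the script should report that src is on the search path; note that the variable only applies to the shell session in which it was exported.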


  4. Quick Start

Import a gallery from PornPics:

    goondex import "https://www.pornpics.com/galleries/example-id/"

This command automatically:

  • Downloads images and metadata
  • Saves to Galleries/