# 🧠 PornPics Gallery Importer (Porndex System)

**Version 0.4.2 - Unified Importer & ML Pipeline**

A modular and well-documented gallery importer for [PornPics.com](https://www.pornpics.com) built for the **Porndex** ecosystem. Supports importing, tagging, metadata enrichment, and machine-learning-ready dataset generation.

---

## 📂 Project Structure

```
src/                              → Core source
├── importer/                     → Gallery importers, tag tools, and TPDB bridge
│   ├── cli.py                    → Unified CLI (porndex-importer)
│   ├── gallery_importer.py       → Gallery parsing/downloading
│   ├── tag_gallery.py            → Tag management & YAML dictionaries
│   ├── reports/                  → Tag and enrichment summaries
│   ├── db/                       → Cached sources & enrichment data
│   ├── secrets/                  → API keys and credentials (ignored in Git)
│   └── tag_dictionaries/         → YAML-based tag definitions
│
├── ml/                           → Machine learning modules
│   ├── ml_dataset_builder.py     → Build JSONL dataset
│   ├── ml_embeddings.py          → Generate CLIP+Text embeddings
│   ├── ml_dataset_inspector.py   → Inspect or visualize dataset (planned)
│   └── ml_vision_detector.py     → GroundingDINO + SAM integration (planned)
│
├── docs/                         → Documentation & changelogs
├── tests/                        → Unit and integration tests
└── assets/                       → Static data or sample media
```

---

## ⚙️ Setup

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Then, from the root of the project:

```bash
export PYTHONPATH=src
```

## 🚀 Quick Start

### Import a Gallery

```bash
porndex-importer import "https://www.pornpics.com/galleries/example-gallery-id/"
```

Automatically:

- Downloads images and metadata
- Saves to `Galleries/__/`
- Creates `metadata.json`
- Runs auto-tagging (`refresh-one`)
- Updates library index

## 🧩 Core Features

| Feature | Description |
| --- | --- |
| Importer | Downloads and parses galleries from PornPics |
| Auto-Tagging | Generates tags based on YAML dictionaries |
| Metadata Refresh | Updates all galleries with new metadata |
| Source Management | Track and bulk-update content sources |
| CLI Tool | Unified command: `porndex-importer` |
| TPDB Bridge | Enrich performers and metadata via ThePornDB API |
| ML Dataset Builder | Generates a unified dataset (JSONL) |
| Hybrid Embeddings | Builds combined CLIP + text vectors for semantic search |

## 🤖 Machine Learning Pipeline

### 1️⃣ Build Dataset

```bash
python -m ml.ml_dataset_builder
```

Creates:

```bash
ML/porndex_dataset.jsonl
```

Each record includes title, models, tags, and full image paths (no file duplication).

### 2️⃣ Build Embeddings

```bash
python -m ml.ml_embeddings build --img-samples 8 --device auto
```

Generates:

```bash
ML/embeddings/<gallery_id>.npz
ML/embeddings_index.jsonl
```

Uses:

- SentenceTransformer for text
- OpenCLIP (ViT-B/32) for images

and produces a combined hybrid vector.

### 3️⃣ Search Your Library

```bash
# Semantic search (default)
python -m ml.ml_embeddings search "japanese redhead creampie"

# Strict literal search
python -m ml.ml_embeddings search "interracial bbc" --mode strict
```

### 4️⃣ Verify Integrity

```bash
python -m ml.ml_embeddings verify
```

Displays:

- Total indexed records
- Images sampled
- NPZ validation summary

## 🧠 Development Guidelines

- No emojis in code or commits.
- Use descriptive variable names.
- Commit only verified working features.
- Document all new features in `docs/CHANGELOG.md`.
- Keep docs and CLI output in sync with `docs/CLI_USAGE.md`.

## 🗺️ Roadmap (v0.4.x → v0.5.x)

| Stage | Feature | Description |
| --- | --- | --- |
| ✅ | ML Embedding Search | Hybrid text+image similarity |
| ⚙️ | Gender & Ethnicity Detection | Person-level classification |
| ⏳ | GroundingDINO Integration | Object/region localization |
| ⏳ | Grounded SAM + BLIP | Visual attribute extraction (clothing, actions) |
| 🔜 | Active Learning | Re-train from gallery metadata and tags |

## 📄 License

MIT - Internal Research Use Only

Author: Leak Technologies
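
As a note on the Build Dataset step: the dataset is described as one JSONL record per gallery, holding title, models, tags, and full image paths rather than copied files. The sketch below shows one way such a record could be assembled from a gallery folder's `metadata.json`. The key names (`title`, `models`, `tags`, `image_paths`), the helper names, and the image suffix list are assumptions for illustration, not the actual schema or code of `ml_dataset_builder.py`.

```python
import json
from pathlib import Path
from typing import Dict, Iterable

# Assumption: which file suffixes count as gallery images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_record(gallery_dir: Path) -> Dict[str, object]:
    """Build one dataset record from a gallery folder (hypothetical schema)."""
    meta_path = gallery_dir / "metadata.json"
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    # Store absolute paths to the existing files instead of copying them.
    image_paths = sorted(
        str(path.resolve())
        for path in gallery_dir.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
    return {
        "title": meta.get("title", ""),
        "models": meta.get("models", []),
        "tags": meta.get("tags", []),
        "image_paths": image_paths,
    }


def write_jsonl(records: Iterable[Dict[str, object]], out_path: Path) -> int:
    """Write one JSON object per line and return the number of records."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out_path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Storing paths keeps the dataset file small and avoids a second copy of the images, at the cost of the records becoming stale if galleries are moved.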
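
As a note on the Build Embeddings step: the README says text and image embeddings are merged into a combined hybrid vector but does not say how. One common approach, shown here purely as an assumption, is to average the sampled image vectors, L2-normalize both parts, weight them, and concatenate. The function names and the `text_weight` parameter are hypothetical and are not taken from `ml_embeddings.py`; plain Python lists stand in for the real model outputs.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [float(x) for x in vec]
    return [x / norm for x in vec]


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average several equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hybrid_vector(
    text_vec: Sequence[float],
    image_vecs: Sequence[Sequence[float]],
    text_weight: float = 0.5,  # assumption: equal weighting by default
) -> List[float]:
    """Concatenate weighted, normalized text and mean-image vectors."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    text_part = [text_weight * x for x in l2_normalize(text_vec)]
    image_mean = l2_normalize(mean_vector(image_vecs))
    image_part = [(1.0 - text_weight) * x for x in image_mean]
    # Normalize again so a dot product between hybrids is a cosine similarity.
    return l2_normalize(text_part + image_part)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))
```

Concatenation keeps the two modalities in separate dimensions, so the text and image encoders do not need matching output sizes; a query would have to be embedded into the same layout before comparing it with `cosine`.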