🧠 PornPics Gallery Importer (Porndex System)
Version 0.4.2 — Unified Importer & ML Pipeline
A modular and well-documented gallery importer for PornPics.com built for the Porndex ecosystem.
Supports importing, tagging, metadata enrichment, and machine learning–ready dataset generation.
📂 Project Structure
src/                              → Core source
├── importer/                     → Gallery importers, tag tools, and TPDB bridge
│   ├── cli.py                    → Unified CLI (porndex-importer)
│   ├── gallery_importer.py       → Gallery parsing/downloading
│   ├── tag_gallery.py            → Tag management & YAML dictionaries
│   ├── reports/                  → Tag and enrichment summaries
│   ├── db/                       → Cached sources & enrichment data
│   ├── secrets/                  → API keys and credentials (ignored in Git)
│   └── tag_dictionaries/         → YAML-based tag definitions
│
├── ml/                           → Machine learning modules
│   ├── ml_dataset_builder.py     → Build JSONL dataset
│   ├── ml_embeddings.py          → Generate CLIP+Text embeddings
│   ├── ml_dataset_inspector.py   → Inspect or visualize dataset (planned)
│   └── ml_vision_detector.py     → GroundingDINO + SAM integration (planned)
│
├── docs/                         → Documentation & changelogs
├── tests/                        → Unit and integration tests
└── assets/                       → Static data or sample media
⚙️ Setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Then, from the root of the project:
export PYTHONPATH=src
🚀 Quick Start
Import a Gallery
porndex-importer import "https://www.pornpics.com/galleries/example-gallery-id/"
This command automatically:
Downloads images and metadata
Saves to Galleries/<timestamp>_<model>_<title>/
Creates metadata.json
Runs auto-tagging (refresh-one)
Updates library index
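The folder naming and metadata step above can be sketched as follows. This is a minimal illustration, not the importer's actual code: the helper names (`slugify`, `gallery_dir_name`, `write_metadata`), the timestamp format, and the slug rules are all assumptions made for the example.

```python
import json
import re
from datetime import datetime
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def gallery_dir_name(timestamp: datetime, model: str, title: str) -> str:
    """Build a <timestamp>_<model>_<title> folder name (format is assumed)."""
    return f"{timestamp:%Y%m%d-%H%M%S}_{slugify(model)}_{slugify(title)}"


def write_metadata(root: Path, name: str, metadata: dict) -> Path:
    """Create the gallery folder under root and write metadata.json into it."""
    gallery_dir = root / name
    gallery_dir.mkdir(parents=True, exist_ok=True)
    path = gallery_dir / "metadata.json"
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path
```

Slugging the model and title keeps folder names safe across filesystems, and leading with the timestamp makes a plain directory listing sort by import time.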
🧩 Core Features
| Feature | Description |
| --- | --- |
| Importer | Downloads and parses galleries from PornPics |
| Auto-Tagging | Generates tags based on YAML dictionaries |
| Metadata Refresh | Updates all galleries with new metadata |
| Source Management | Track and bulk-update content sources |
| CLI Tool | Unified command: porndex-importer |
| TPDB Bridge | Enrich performers and metadata via ThePornDB API |
| ML Dataset Builder | Generates a unified dataset (JSONL) |
| Hybrid Embeddings | Builds combined CLIP + text vectors for semantic search |
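As a rough idea of how dictionary-based auto-tagging can work, the sketch below maps each tag to a list of trigger keywords and assigns a tag when any keyword appears as a whole word in the gallery text. The `auto_tag` function and the dictionary layout are hypothetical; the real YAML schema in `tag_dictionaries/` may differ, and the dictionary is written here as a plain Python dict instead of being loaded from YAML.

```python
import re


def auto_tag(text: str, dictionary: dict[str, list[str]]) -> list[str]:
    """Return the sorted tags whose keywords appear as whole words in text.

    Only single-word keywords are handled in this sketch.
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(
        tag
        for tag, keywords in dictionary.items()
        if any(keyword.lower() in words for keyword in keywords)
    )


# Hypothetical dictionary: tag name -> trigger keywords.
EXAMPLE_DICTIONARY = {
    "outdoor": ["outdoor", "beach", "garden"],
    "studio": ["studio"],
}
```

Matching on whole words rather than substrings avoids false positives such as a short keyword firing inside a longer, unrelated word.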
🤖 Machine Learning Pipeline
1️⃣ Build Dataset
python -m ml.ml_dataset_builder
Creates:
ML/porndex_dataset.jsonl
Each record includes the gallery title, models, tags, and full image paths. Images are referenced in place, so no files are duplicated.
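To make the JSONL layout concrete, here is a small sketch of writing and reading such a dataset with one JSON object per line. The field names (`title`, `models`, `tags`, `images`) are assumed from the description above and may not match the builder's exact keys; `write_jsonl` and `read_jsonl` are illustrative helpers, not part of the project.

```python
import json
from pathlib import Path


def write_jsonl(records: list[dict], path: Path) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because each line is an independent JSON document, the dataset can be streamed or appended to without loading the whole file.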
2️⃣ Build Embeddings
python -m ml.ml_embeddings build --img-samples 8 --device auto
Generates:
ML/embeddings/<gallery_id>.npz
ML/embeddings_index.jsonl
Uses:
SentenceTransformer for text
OpenCLIP (ViT-B/32) for images
The text and image embeddings are then combined into a single hybrid vector.
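The README does not say how the two embeddings are combined. One common approach, shown here purely as an assumption, is to mean-pool the sampled image vectors, L2-normalize both sides, and concatenate them. The functions `l2_normalize` and `hybrid_vector` are hypothetical, and plain Python lists stand in for the real model outputs.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def hybrid_vector(text_vec: list[float], image_vecs: list[list[float]]) -> list[float]:
    """Concatenate the normalized text vector with the normalized mean image vector.

    Assumes at least one image vector and equal dimensions across image vectors.
    """
    dim = len(image_vecs[0])
    mean_image = [
        sum(vec[i] for vec in image_vecs) / len(image_vecs) for i in range(dim)
    ]
    return l2_normalize(text_vec) + l2_normalize(mean_image)
```

Normalizing each part before concatenation keeps either modality from dominating similarity scores just because its raw vectors have larger magnitudes.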
3️⃣ Search Your Library
# Semantic search (default)
python -m ml.ml_embeddings search "japanese redhead creampie"
# Strict literal search
python -m ml.ml_embeddings search "interracial bbc" --mode strict
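The difference between the two modes can be pictured with the sketch below. It assumes that semantic mode ranks galleries by cosine similarity between a query embedding and stored vectors, and that strict mode requires every query term to match a tag literally. Both behaviors are guesses from the mode names; the function names and record layout (`id`, `tags`) are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(query_vec: list[float], index: dict[str, list[float]],
                    top_k: int = 5) -> list[str]:
    """Return the ids of the top_k vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), gallery_id) for gallery_id, vec in index.items()]
    return [gallery_id for _, gallery_id in sorted(scored, reverse=True)[:top_k]]


def strict_search(query: str, records: list[dict]) -> list[str]:
    """Return ids of records whose tags contain every query term literally."""
    terms = query.lower().split()
    return [
        record["id"]
        for record in records
        if all(term in {tag.lower() for tag in record["tags"]} for term in terms)
    ]
```

Semantic ranking can surface galleries that never use the query words, while the strict variant trades recall for predictable, exact matches.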
4️⃣ Verify Integrity
python -m ml.ml_embeddings verify
Displays:
Total indexed records
Images sampled
NPZ validation summary
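A check along these lines could be sketched as below. It walks the index file and confirms that each referenced `.npz` file exists and is a readable zip archive (the `.npz` format is a zip of `.npy` arrays). The `gallery_id` key and the `verify_embeddings` function are assumptions for illustration; the real `verify` command may inspect more, such as array shapes.

```python
import json
import zipfile
from pathlib import Path


def verify_embeddings(index_path: Path, embeddings_dir: Path) -> dict:
    """Count index records and classify each NPZ file as valid, missing, or corrupt."""
    summary = {"records": 0, "valid": 0, "missing": [], "corrupt": []}
    with index_path.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            gallery_id = json.loads(line)["gallery_id"]  # assumed key name
            summary["records"] += 1
            npz_path = embeddings_dir / f"{gallery_id}.npz"
            if not npz_path.exists():
                summary["missing"].append(gallery_id)
            elif not zipfile.is_zipfile(npz_path):
                summary["corrupt"].append(gallery_id)
            else:
                summary["valid"] += 1
    return summary
```

Checking the container format with the standard library keeps this pass cheap; loading each array is only needed when deeper validation is wanted.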
🧠 Development Guidelines
No emojis in code or commits.
Use descriptive variable names.
Commit only verified working features.
Document all new features in docs/CHANGELOG.md.
Keep docs and CLI output in sync with docs/CLI_USAGE.md.
🗺️ Roadmap (v0.4.x → v0.5.x)
| Stage | Feature | Description |
| --- | --- | --- |
| ✅ | ML Embedding Search | Hybrid text+image similarity |
| ⚙️ | Gender & Ethnicity Detection | Person-level classification |
| ⏳ | GroundingDINO Integration | Object/region localization |
| ⏳ | Grounded SAM + BLIP | Visual attribute extraction (clothing, actions) |
| 🔜 | Active Learning | Re-train from gallery metadata and tags |
📄 License
MIT — Internal Research Use Only
Author: Leak Technologies