🎩 PornPics Importer — CLI Usage Guide

Version 0.4.2 — Import, Auto-Tag & ML Integration


📦 Overview

Tooling to:

  • Import & refresh PornPics.com galleries
  • Auto-tag via YAML dictionaries and unified CLI
  • Manage sources/tags and gallery index
  • Build ML datasets & hybrid (text+image) embeddings
  • Run semantic / strict search over your library

Project root: ~/Projects/PD/PornPics_Importer/Porndex_PornpicsImporter/


🧬 1) Importing Galleries

Quick Import (preferred)

```bash
porndex-importer import "https://www.pornpics.com/galleries/<gallery-id>/"
```

What happens:

  • Saves to Galleries/<timestamp>_<models>_<title>/
  • Downloads images (threaded) and writes metadata.json
  • Auto-tags the gallery (refresh-one) and rebuilds the index
  • Prints a colorized gallery summary
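
The first two import steps can be sketched roughly as follows. This is a minimal illustration, not the importer's actual code: `slug`, `gallery_folder_name`, `import_gallery`, the injected `fetch` callable, and the `images` key in the metadata are all hypothetical, and the exact folder-name format is assumed from the `<timestamp>_<models>_<title>` pattern above.

```python
import json
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def slug(text: str) -> str:
    """Collapse every run of non-alphanumeric characters into one underscore."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")


def gallery_folder_name(timestamp: str, models: list[str], title: str) -> str:
    """Build a <timestamp>_<models>_<title> folder name (exact format is assumed)."""
    return "_".join([timestamp, slug(" ".join(models)), slug(title)])


def import_gallery(root: Path, folder: str, meta: dict, image_urls: list[str],
                   fetch: Callable[[str], bytes], workers: int = 4) -> Path:
    """Download images concurrently, then write metadata.json next to them."""
    images_dir = root / folder / "images"
    images_dir.mkdir(parents=True, exist_ok=True)

    def save(job: tuple[int, str]) -> str:
        number, url = job
        target = images_dir / f"{number:03d}.jpg"
        target.write_bytes(fetch(url))  # fetch is injected, e.g. an HTTP GET
        return target.name

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so file names stay aligned with URLs.
        saved = list(pool.map(save, enumerate(image_urls, start=1)))

    record = dict(meta, images=saved)  # "images" key is an assumption
    (root / folder / "metadata.json").write_text(
        json.dumps(record, indent=2), encoding="utf-8"
    )
    return root / folder
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real importer would plug in its own HTTP client there.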

Legacy (direct script)

```bash
python src/importer/gallery_importer.py "https://www.pornpics.com/galleries/<gallery-id>/"
```

🔁 2) Refreshing Metadata

Refresh all galleries

```bash
python src/importer/gallery_importer.py --refresh-all
```

  • Re-fetches metadata for every gallery with source_url
  • Merges fields (preserves local tags)
  • Auto-reapplies tag inference
  • Updates Galleries/index.json
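
The "merges fields (preserves local tags)" step could look something like the sketch below. The function name `merge_metadata` and the exact merge rule (remote fields win, tag lists are unioned with local tags first) are assumptions for illustration; the importer may resolve conflicts differently.

```python
def merge_metadata(local: dict, remote: dict) -> dict:
    """Overlay freshly fetched fields on the local record, keeping local tags."""
    merged = {**local, **remote}  # remote values replace stale local ones

    # Assumed rule: union the tag lists, local tags first, without duplicates.
    tags = list(local.get("tags", []))
    for tag in remote.get("tags", []):
        if tag not in tags:
            tags.append(tag)
    merged["tags"] = tags
    return merged
```

With this rule, a tag added by hand survives a refresh even when the remote page no longer lists it, and fields that exist only locally (such as `source_url`) are carried over unchanged.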

🎟️ 3) Tag Management (via unified CLI)

```bash
porndex-importer <command> [args...]
```

Common commands:

| Action | Command |
| --- | --- |
| Refresh all | `porndex-importer refresh-all` |
| Refresh one | `porndex-importer refresh-one "<folder>"` |
| Validate YAML dictionaries | `porndex-importer validate-tags` |
| Tag statistics (reports written to src/importer/reports/) | `porndex-importer tag-stats` |
| List galleries | `porndex-importer list` |
| List tags (one) | `porndex-importer list-tags "<folder>"` |
| Add tag | `porndex-importer add "<folder>" "TagName"` |
| Remove tag | `porndex-importer remove "<folder>" "TagName"` |
| Add multiple | `porndex-importer add-multi "<folder>" "Tag1,Tag2"` |
| Show metadata | `porndex-importer show-metadata "<folder>"` |
| Set source (single) | `porndex-importer source "<folder>" set "Brazzers"` |
| Set source (bulk) | `porndex-importer source bulk set "PornPics"` |

Tag inference uses YAML dictionaries under src/importer/tag_dictionaries/ (clothing, acts, body, context, etc.).
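
As a rough illustration of dictionary-driven tag inference, the sketch below matches keywords against a gallery title. The `TAG_DICTIONARIES` structure (category, then tag, then trigger keywords) is a hypothetical stand-in for the parsed YAML files, whose real schema is not documented here, and `infer_tags` is an illustrative name.

```python
import re

# Stand-in for parsed YAML: category -> tag -> trigger keywords (all hypothetical).
TAG_DICTIONARIES = {
    "clothing": {"Stockings": ["stockings", "thigh highs"], "Bikini": ["bikini"]},
    "context": {"Outdoor": ["beach", "outdoor", "poolside"]},
}


def infer_tags(text: str, dictionaries: dict) -> list[str]:
    """Return every tag whose keyword appears as a whole word or phrase in text."""
    haystack = text.lower()
    found = set()
    for category in dictionaries.values():
        for tag, keywords in category.items():
            # \b anchors prevent partial hits such as "bikini" inside "bikinis".
            if any(re.search(rf"\b{re.escape(k.lower())}\b", haystack) for k in keywords):
                found.add(tag)
    return sorted(found)
```

For example, `infer_tags("Blonde in a red bikini at the beach", TAG_DICTIONARIES)` returns `["Bikini", "Outdoor"]`. Whole-word matching is a design choice in this sketch; the real dictionaries may also support plurals or aliases.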

🧫 4) TPDB Performer Bridge (optional)

```bash
python -m performers.tpdb_bridge <cmd> [flags]
```

Highlights:

  • check-key, fetch, fill-index, enrich, sync-all
  • list-sources, add-source, delete-source
  • verify-enrichment --export-json

Stores the database at src/importer/db/performers.db and writes reports to src/importer/reports/.

⚙️ 5) Example Workflow

```bash
# Import a gallery
porndex-importer import "https://www.pornpics.com/galleries/<id>/"

# Refresh tags for one folder (if you edited metadata)
porndex-importer refresh-one "<folder-name>"

# Validate YAML dictionaries
porndex-importer validate-tags

# Build tag stats
porndex-importer tag-stats
```

🤖 6) Machine Learning (ML) Pipeline

Build dataset (reads from Galleries/, no file moves)

```bash
python -m ml.ml_dataset_builder
```

Creates ML/porndex_dataset.jsonl with records of this shape (pretty-printed here; a JSONL file stores one record per line):

```json
{
  "gallery_id": "...",
  "title": "...",
  "models": ["..."],
  "tags": ["..."],
  "categories": ["..."],
  "image_paths": [".../Galleries/.../images/001.jpg", "..."]
}
```

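
A dataset builder producing records of this shape might be sketched as below. This is not the code behind `ml.ml_dataset_builder`: the function name `build_dataset`, the `IMAGE_SUFFIXES` set, and the fallback of using the folder name as `gallery_id` are assumptions. Files under Galleries/ are only read, never moved.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}  # assumed set of image types


def build_dataset(galleries: Path, out_file: Path) -> int:
    """Write one JSON record per gallery folder; source files are only read."""
    count = 0
    out_file.parent.mkdir(parents=True, exist_ok=True)
    # Mode "w" truncates the file, so rebuilding replaces rather than appends.
    with out_file.open("w", encoding="utf-8") as out:
        for meta_path in sorted(galleries.glob("*/metadata.json")):
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
            images = sorted(
                str(p) for p in (meta_path.parent / "images").glob("*")
                if p.suffix.lower() in IMAGE_SUFFIXES
            )
            record = {
                # Assumption: fall back to the folder name when no id is stored.
                "gallery_id": meta.get("gallery_id", meta_path.parent.name),
                "title": meta.get("title", ""),
                "models": meta.get("models", []),
                "tags": meta.get("tags", []),
                "categories": meta.get("categories", []),
                "image_paths": images,
            }
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Because the output is rewritten from scratch and galleries are visited in sorted order, running the function twice on an unchanged library yields byte-identical files.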
Build hybrid embeddings (text + image)

```bash
python -m ml.ml_embeddings build --img-samples 8 --device auto
```

Outputs:

  • ML/embeddings/<gallery_id>.npz (text / image / combined vectors)
  • ML/embeddings_index.jsonl
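
This guide does not say how the combined vector is derived from the text and image vectors. One common approach, shown here purely as an assumption, is to L2-normalize each vector and concatenate them with square-root weights. The names `l2_normalize`, `combine`, `dot`, and the `text_weight` parameter are illustrative and are not flags or functions of `ml.ml_embeddings`.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine(text_vec: list[float], image_vec: list[float],
            text_weight: float = 0.5) -> list[float]:
    """Concatenate unit-length text and image vectors with square-root weights."""
    wt = math.sqrt(text_weight)
    wi = math.sqrt(1.0 - text_weight)
    return ([wt * x for x in l2_normalize(text_vec)]
            + [wi * x for x in l2_normalize(image_vec)])


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))
```

With this weighting, a combined vector built from two non-zero inputs has unit length, and the dot product of two combined vectors equals `text_weight` times the text cosine similarity plus `1 - text_weight` times the image cosine similarity, which makes the hybrid score easy to interpret.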

Search (semantic / text / strict)

```bash
# Hybrid semantic (default)
python -m ml.ml_embeddings search "japanese redhead creampie"

# Text-only space
python -m ml.ml_embeddings search "japanese redhead creampie" --index text

# Strict keyword pre-filter (title/tags must include all tokens)
python -m ml.ml_embeddings search "interracial bbc" --mode strict
```
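
The strict pre-filter rule (title/tags must include all tokens) can be sketched as follows. The helper names `tokens` and `strict_filter`, the tokenization regex, and the record layout are assumptions based on the dataset fields described in this guide; the actual search code may tokenize differently.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def strict_filter(query: str, records: list[dict]) -> list[dict]:
    """Keep records whose title and tags together contain every query token."""
    wanted = tokens(query)
    kept = []
    for rec in records:
        available = tokens(rec.get("title", ""))
        for tag in rec.get("tags", []):
            available |= tokens(tag)
        if wanted <= available:  # subset test: all query tokens are present
            kept.append(rec)
    return kept
```

In a pipeline like the one described here, only the surviving records would then be ranked by embedding similarity, so strict mode trades recall for exact keyword matches.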

Verify

```bash
python -m ml.ml_embeddings verify
```

🗂️ 7) Data Locations

| Path | Purpose |
| --- | --- |
| Galleries/ | Imported galleries (images + metadata) |
| Galleries/index.json | Library index |
| src/importer/reports/ | Tag stats & TPDB reports |
| ML/porndex_dataset.jsonl | ML dataset source |
| ML/embeddings/ | NPZ vectors |
| ML/embeddings_index.jsonl | Search index |

🧭 8) Roadmap (post-v0.4.2)

  • GroundingDINO + Grounded-SAM for localized detections (people, clothing)
  • Attribute heads for gender → ethnicity → clothing brand (e.g., socks)
  • Active-learning loop to leverage existing metadata as weak labels

🧾 Notes

  • All commands are local/offline friendly.
  • Rebuilding dataset/embeddings is safe and idempotent.
  • Importer auto-tags on import/refresh using the YAML dictionaries.

Author: Leak Technologies • License: MIT (internal research)