Goondex/docs/CLI_USAGE.md

# 🎩 PornPics Importer — CLI Usage Guide
### Version 0.4.2 — Import, Auto-Tag & ML Integration
---
## 📦 Overview
Tooling to:
- Import & refresh PornPics.com galleries
- Auto-tag via YAML dictionaries and unified CLI
- Manage sources/tags and gallery index
- Build ML datasets & hybrid (text+image) embeddings
- Run semantic / strict search over your library

Project root: `~/Projects/PD/PornPics_Importer/Porndex_PornpicsImporter/`

---
## 🧬 1) Importing Galleries
### Quick Import (preferred)
```bash
porndex-importer import "https://www.pornpics.com/galleries/<gallery-id>/"
```

What happens:

- Saves to `Galleries/<timestamp>_<models>_<title>/`
- Downloads images (threaded) and writes `metadata.json`
- Auto-tags the gallery (`refresh-one`) and rebuilds the index
- Prints a colorized gallery summary
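
To illustrate the `<timestamp>_<models>_<title>` folder pattern above, here is a minimal sketch of how such a name could be assembled. The helper names (`slugify`, `gallery_folder_name`), the timestamp format, and the slug rules are all assumptions for illustration; the importer's real naming code may differ.

```python
import re
from datetime import datetime


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def gallery_folder_name(models: list[str], title: str, when: datetime) -> str:
    """Build a '<timestamp>_<models>_<title>' folder name (hypothetical helper)."""
    # Assumed timestamp format; the importer's actual format is not documented here.
    stamp = when.strftime("%Y%m%d-%H%M%S")
    # Fall back to 'unknown' when no model names are available.
    model_part = "-".join(slugify(m) for m in models) or "unknown"
    return f"{stamp}_{model_part}_{slugify(title)}"


if __name__ == "__main__":
    print(gallery_folder_name(["Jane Doe"], "Summer Shoot!", datetime.now()))
```

Keeping the timestamp first means folders sort chronologically in a plain directory listing.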
### Legacy (direct script)

```bash
python src/importer/gallery_importer.py "https://www.pornpics.com/galleries/<gallery-id>/"
```
## 🔁 2) Refreshing Metadata
### Refresh all galleries

```bash
python src/importer/gallery_importer.py --refresh-all
```

- Re-fetches metadata for every gallery with a `source_url`
- Merges fields (preserves local tags)
- Auto-reapplies tag inference
- Updates `Galleries/index.json`
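
The "merges fields, preserves local tags" behaviour can be pictured with a short sketch. This is not the importer's actual merge routine: `merge_metadata` is a hypothetical helper, and it assumes that re-fetched fields win on conflict while the tag lists are unioned.

```python
def merge_metadata(local: dict, remote: dict) -> dict:
    """Layer freshly fetched fields over local ones without losing local tags."""
    # Assumption: remote values overwrite local values for the same key.
    merged = {**local, **remote}
    # Union of tags: local tags first, original order kept, duplicates dropped.
    combined = list(local.get("tags", [])) + list(remote.get("tags", []))
    merged["tags"] = list(dict.fromkeys(combined))
    return merged


if __name__ == "__main__":
    local = {"title": "Old title", "tags": ["Outdoor", "MyCustomTag"]}
    remote = {"title": "New title", "tags": ["Outdoor", "Dress"]}
    print(merge_metadata(local, remote))
```

With a union like this, a refresh can add tags but never silently removes one that was added by hand.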
## 🎟️ 3) Tag Management (via unified CLI)

```bash
porndex-importer <command> [args...]
```

Common commands:

| Action | Command |
|---|---|
| Refresh all | `porndex-importer refresh-all` |
| Refresh one | `porndex-importer refresh-one "<folder>"` |
| Validate YAML dictionaries | `porndex-importer validate-tags` |
| Tag statistics (reports to `src/importer/reports/`) | `porndex-importer tag-stats` |
| List galleries | `porndex-importer list` |
| List tags (one) | `porndex-importer list-tags "<folder>"` |
| Add tag | `porndex-importer add "<folder>" "TagName"` |
| Remove tag | `porndex-importer remove "<folder>" "TagName"` |
| Add multiple | `porndex-importer add-multi "<folder>" "Tag1,Tag2"` |
| Show metadata | `porndex-importer show-metadata "<folder>"` |
| Set source (single) | `porndex-importer source "<folder>" set "Brazzers"` |
| Set source (bulk) | `porndex-importer source bulk set "PornPics"` |

Tag inference uses YAML dictionaries under `src/importer/tag_dictionaries/` (clothing, acts, body, context, etc.).
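
As a rough picture of dictionary-based tag inference, the sketch below matches keywords against a gallery title. The in-memory `TAG_DICTIONARIES` mapping stands in for the loaded YAML files; its layout (category, then tag, then keyword list), its entries, and the `infer_tags` helper are assumptions, not the project's actual schema or code.

```python
import re

# Stand-in for dictionaries loaded from YAML. The category names come from
# this guide; the tags and keywords are hypothetical examples.
TAG_DICTIONARIES = {
    "clothing": {"Stockings": ["stockings", "thigh highs"], "Dress": ["dress"]},
    "context": {"Outdoor": ["outdoor", "beach"]},
}


def infer_tags(text: str, dictionaries: dict) -> list[str]:
    """Return the sorted tags whose keywords appear as whole words in text."""
    lowered = text.lower()
    found = set()
    for category in dictionaries.values():
        for tag, keywords in category.items():
            # Word boundaries keep 'dress' from matching inside 'address'.
            if any(re.search(rf"\b{re.escape(k)}\b", lowered) for k in keywords):
                found.add(tag)
    return sorted(found)


if __name__ == "__main__":
    print(infer_tags("Red dress at the beach", TAG_DICTIONARIES))
```

Whole-word matching is a deliberate choice in this sketch: substring matching is simpler but produces false positives on short keywords.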
## 🧫 4) TPDB Performer Bridge (optional)

```bash
python -m performers.tpdb_bridge <cmd> [flags]
```

Highlights:

- `check-key`, `fetch`, `fill-index`, `enrich`, `sync-all`
- `list-sources`, `add-source`, `delete-source`
- `verify-enrichment --export-json`

Stores the DB at `src/importer/db/performers.db` and writes reports to `src/importer/reports/`.
## ⚙️ 5) Example Workflow

```bash
# Import a gallery
porndex-importer import "https://www.pornpics.com/galleries/<id>/"

# Refresh tags for one folder (if you edited metadata)
porndex-importer refresh-one "<folder-name>"

# Validate YAML dictionaries
porndex-importer validate-tags

# Build tag stats
porndex-importer tag-stats
```
## 🤖 6) Machine Learning (ML) Pipeline
### Build dataset (reads from Galleries/, no file moves)

```bash
python -m ml.ml_dataset_builder
```

Creates `ML/porndex_dataset.jsonl` with one JSON record per line (pretty-printed here for readability):

```json
{
  "gallery_id": "...",
  "title": "...",
  "models": ["..."],
  "tags": ["..."],
  "categories": ["..."],
  "image_paths": [".../Galleries/.../images/001.jpg", "..."]
}
```
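
Because the dataset is plain JSONL, it can be inspected with the standard library alone. The sketch below streams records and counts tag frequencies. `read_dataset` and `tag_counts` are hypothetical helper names; only the `tags` field shown in the record above is assumed.

```python
import json
from pathlib import Path


def read_dataset(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def tag_counts(records):
    """Count how many records carry each tag."""
    counts = {}
    for record in records:
        for tag in record.get("tags", []):
            counts[tag] = counts.get(tag, 0) + 1
    return counts


if __name__ == "__main__":
    dataset = Path("ML/porndex_dataset.jsonl")
    if dataset.exists():
        print(tag_counts(read_dataset(dataset)))
```

Reading line by line keeps memory use flat even for large libraries, since no more than one record is parsed at a time.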
### Build hybrid embeddings (text + image)

```bash
python -m ml.ml_embeddings build --img-samples 8 --device auto
```

Outputs:

- `ML/embeddings/<gallery_id>.npz` (text / image / combined vectors)
- `ML/embeddings_index.jsonl`
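
This guide does not say how the combined vector is derived from the text and image vectors. One common approach, shown here purely as an assumption, is to L2-normalize each part, weight them, concatenate, and normalize again. The function names and the `text_weight` parameter are hypothetical, and plain lists are used instead of NumPy arrays to keep the sketch dependency-free.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine(text_vec, image_vec, text_weight=0.5):
    """Concatenate weighted, normalized text and image vectors (assumed scheme)."""
    text_part = [text_weight * x for x in l2_normalize(text_vec)]
    image_part = [(1.0 - text_weight) * x for x in l2_normalize(image_vec)]
    return l2_normalize(text_part + image_part)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


if __name__ == "__main__":
    hybrid = combine([3.0, 4.0], [1.0, 0.0])
    print(len(hybrid), cosine(hybrid, hybrid))
```

Normalizing each part before weighting stops the modality with larger raw magnitudes from dominating the similarity score.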
### Search (semantic / text / strict)

```bash
# Hybrid semantic (default)
python -m ml.ml_embeddings search "japanese redhead creampie"

# Text-only space
python -m ml.ml_embeddings search "japanese redhead creampie" --index text

# Strict keyword pre-filter (title/tags must include all tokens)
python -m ml.ml_embeddings search "interracial bbc" --mode strict
```
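
The strict mode's rule, that the title and tags must include every query token, can be sketched as a simple pre-filter applied before any semantic ranking. `passes_strict_filter` is a hypothetical helper; whole-word, case-insensitive matching is an assumption, and the real tool may tokenize differently.

```python
def passes_strict_filter(query: str, record: dict) -> bool:
    """Return True when every query token appears in the title or tags."""
    tokens = query.lower().split()
    # Pool the title and all tags into one lowercase set of words.
    haystack = " ".join([record.get("title", ""), *record.get("tags", [])])
    words = set(haystack.lower().split())
    # Note: an empty query has no tokens, so it passes every record.
    return all(token in words for token in tokens)


if __name__ == "__main__":
    record = {"title": "Beach day", "tags": ["Outdoor", "Red Dress"]}
    print(passes_strict_filter("beach dress", record))
    print(passes_strict_filter("beach indoor", record))
```

Filtering first and ranking second keeps strict results predictable: a gallery that lacks any token is dropped however similar its embedding is.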
### Verify

```bash
python -m ml.ml_embeddings verify
```
## 🗂️ 7) Data Locations

| Path | Purpose |
|---|---|
| `Galleries/` | Imported galleries (images + metadata) |
| `Galleries/index.json` | Library index |
| `src/importer/reports/` | Tag stats & TPDB reports |
| `ML/porndex_dataset.jsonl` | ML dataset source |
| `ML/embeddings/` | NPZ vectors |
| `ML/embeddings_index.jsonl` | Search index |
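
A quick way to see which of these locations exist in a checkout is a small path check. This is a convenience sketch, not part of the CLI: `missing_paths` is a hypothetical helper, and it assumes all paths are relative to the project root.

```python
from pathlib import Path

# Paths from the table above, relative to the project root.
EXPECTED_PATHS = [
    "Galleries",
    "Galleries/index.json",
    "src/importer/reports",
    "ML/porndex_dataset.jsonl",
    "ML/embeddings",
    "ML/embeddings_index.jsonl",
]


def missing_paths(root):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```

Missing `ML/` entries usually just mean the dataset or embeddings have not been built yet.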
## 🧭 8) Roadmap (post-v0.4.2)

- GroundingDINO + Grounded-SAM for localized detections (people, clothing)
- Attribute heads for gender → ethnicity → clothing brand (e.g., socks)
- Active-learning loop to leverage existing metadata as weak labels
## 🧾 Notes

- All commands are local/offline friendly.
- Rebuilding dataset/embeddings is safe and idempotent.
- Importer auto-tags on import/refresh using the YAML dictionaries.

Author: Leak Technologies • License: MIT (internal research)