# 🎩 PornPics Importer — CLI Usage Guide

### Version 0.4.2 — Import, Auto-Tag & ML Integration

---

## 📦 Overview

Tooling to:

- Import & refresh PornPics.com galleries
- Auto-tag via YAML dictionaries and unified CLI
- Manage sources/tags and gallery index
- Build ML datasets & hybrid (text+image) embeddings
- Run semantic / strict search over your library

Project root: `~/Projects/PD/PornPics_Importer/Porndex_PornpicsImporter/`

---

## 🧬 1) Importing Galleries

### Quick Import (preferred)

```bash
porndex-importer import "https://www.pornpics.com/galleries/<gallery-id>/"
```

What happens:

- Saves to `Galleries/<timestamp>_<models>_<title>/`
- Downloads images (threaded) and writes `metadata.json`
- Auto-tags the gallery (`refresh-one`) and rebuilds the index
- Prints a colorized gallery summary

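
The folder pattern `<timestamp>_<models>_<title>` can be illustrated with a short sketch. This is not the importer's actual code: the helper names `slugify` and `gallery_folder_name`, the timestamp format, and the `unknown` fallback are all assumptions made for illustration.

```python
import re
from datetime import datetime


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def gallery_folder_name(title: str, models: list[str], when: datetime) -> str:
    """Build '<timestamp>_<models>_<title>'. The timestamp format is assumed."""
    timestamp = when.strftime("%Y%m%d-%H%M%S")
    # Hypothetical fallback for galleries without any listed model.
    model_part = "-".join(slugify(m) for m in models) or "unknown"
    return f"{timestamp}_{model_part}_{slugify(title)}"


print(gallery_folder_name("Sunset Beach Shoot", ["Jane Doe"], datetime(2024, 1, 2, 3, 4, 5)))
# prints: 20240102-030405_jane-doe_sunset-beach-shoot
```

Slugifying each part keeps folder names safe on common filesystems and keeps `_` free as the separator between the three fields.
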
### Legacy (direct script)

```bash
python src/importer/gallery_importer.py "https://www.pornpics.com/galleries/<gallery-id>/"
```

## 🔁 2) Refreshing Metadata

### Refresh all galleries

```bash
python src/importer/gallery_importer.py --refresh-all
```

- Re-fetches metadata for every gallery with `source_url`
- Merges fields (preserves local tags)
- Auto-reapplies tag inference
- Updates `Galleries/index.json`

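
A merge that preserves local tags might look roughly like the sketch below. The function name `merge_metadata` and the exact rule (remote fields overwrite local ones, tags are unioned with local tags first) are assumptions, not the importer's confirmed behaviour.

```python
def merge_metadata(local: dict, remote: dict) -> dict:
    """Overlay freshly fetched fields on the local record, keeping local tags."""
    merged = {**local, **remote}
    # Union of tags: local tags first, order preserved, duplicates dropped.
    combined = local.get("tags", []) + remote.get("tags", [])
    merged["tags"] = list(dict.fromkeys(combined))
    return merged


local = {"title": "Old title", "tags": ["custom", "outdoor"], "source_url": "https://example.invalid/g/1"}
remote = {"title": "New title", "tags": ["outdoor", "beach"]}
print(merge_metadata(local, remote)["tags"])
# prints: ['custom', 'outdoor', 'beach']
```

Fields that exist only locally, such as `source_url` here, survive the refresh because the remote record is layered on top rather than replacing the local one.
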
## 🎟️ 3) Tag Management (via unified CLI)

```bash
porndex-importer <command> [args...]
```

Common commands:

| Action | Command |
| --- | --- |
| Refresh all | `porndex-importer refresh-all` |
| Refresh one | `porndex-importer refresh-one "<folder>"` |
| Validate YAML dictionaries | `porndex-importer validate-tags` |
| Tag statistics (reports to `src/importer/reports/`) | `porndex-importer tag-stats` |
| List galleries | `porndex-importer list` |
| List tags (one) | `porndex-importer list-tags "<folder>"` |
| Add tag | `porndex-importer add "<folder>" "TagName"` |
| Remove tag | `porndex-importer remove "<folder>" "TagName"` |
| Add multiple | `porndex-importer add-multi "<folder>" "Tag1,Tag2"` |
| Show metadata | `porndex-importer show-metadata "<folder>"` |
| Set source (single) | `porndex-importer source "<folder>" set "Brazzers"` |
| Set source (bulk) | `porndex-importer source bulk set "PornPics"` |

Tag inference uses YAML dictionaries under `src/importer/tag_dictionaries/` (clothing, acts, body, context, etc.).

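
Dictionary-based tag inference can be sketched as a keyword lookup over the gallery text. The dictionary layout below (category, then tag, then keyword list), the sample entries, and the function `infer_tags` are invented for illustration. The real YAML schema may differ, and the dictionaries are inlined as a Python dict so the example runs without a YAML parser.

```python
import re

# Stand-in for dictionaries loaded from YAML; categories and keywords are invented.
TAG_DICTIONARIES = {
    "clothing": {"Stockings": ["stockings", "thigh highs"], "Dress": ["dress", "gown"]},
    "context": {"Outdoor": ["outdoor", "beach", "garden"]},
}


def infer_tags(text: str, dictionaries: dict) -> list[str]:
    """Return sorted tag names whose keywords occur as whole words or phrases."""
    haystack = text.lower()
    found = set()
    for tags in dictionaries.values():
        for tag, keywords in tags.items():
            # One matching keyword is enough to assign the tag.
            if any(re.search(rf"\b{re.escape(kw)}\b", haystack) for kw in keywords):
                found.add(tag)
    return sorted(found)


print(infer_tags("Red dress on the beach", TAG_DICTIONARIES))
# prints: ['Dress', 'Outdoor']
```

Matching on word boundaries avoids false positives such as `dress` firing on "address".
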
## 🧫 4) TPDB Performer Bridge (optional)

```bash
python -m performers.tpdb_bridge <cmd> [flags]
```

Highlights:

- `check-key`, `fetch`, `fill-index`, `enrich`, `sync-all`
- `list-sources`, `add-source`, `delete-source`
- `verify-enrichment --export-json`

Stores DB at `src/importer/db/performers.db` and reports to `src/importer/reports/`.

## ⚙️ 5) Example Workflow

```bash
# Import a gallery
porndex-importer import "https://www.pornpics.com/galleries/<id>/"

# Refresh tags for one folder (if you edited metadata)
porndex-importer refresh-one "<folder-name>"

# Validate YAML dictionaries
porndex-importer validate-tags

# Build tag stats
porndex-importer tag-stats
```

## 🤖 6) Machine Learning (ML) Pipeline

### Build dataset (reads from Galleries/, no file moves)

```bash
python -m ml.ml_dataset_builder
```

Creates `ML/porndex_dataset.jsonl` with records:

```json
{
  "gallery_id": "...",
  "title": "...",
  "models": ["..."],
  "tags": ["..."],
  "categories": ["..."],
  "image_paths": [".../Galleries/.../images/001.jpg", "..."]
}
```

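
To consume the dataset from your own scripts, a reader along these lines should be enough. It assumes one JSON object per line and treats the six fields shown in the record above as required. The helper name `load_dataset` is hypothetical and not part of the `ml` package.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("gallery_id", "title", "models", "tags", "categories", "image_paths")


def load_dataset(path):
    """Yield records from a JSONL file, skipping blank lines and checking fields."""
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            missing = [f for f in REQUIRED_FIELDS if f not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            yield record
```

For example, `sum(1 for _ in load_dataset("ML/porndex_dataset.jsonl"))` counts the records without holding the whole file in memory.
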
### Build hybrid embeddings (text + image)

```bash
python -m ml.ml_embeddings build --img-samples 8 --device auto
```

Outputs:

- `ML/embeddings/<gallery_id>.npz` (text / image / combined vectors)
- `ML/embeddings_index.jsonl`

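
This guide does not say how the combined vector is derived from the text and image vectors. One common approach, shown here only as an assumption, is a weighted concatenation of L2-normalized vectors. The function names and the `text_weight` parameter are hypothetical, and plain lists stand in for the NumPy arrays an `.npz` file would hold.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine(text_vec, image_vec, text_weight=0.5):
    """Weighted concatenation; unit length when both inputs are non-zero."""
    wt = math.sqrt(text_weight)
    wi = math.sqrt(1.0 - text_weight)
    return [wt * x for x in l2_normalize(text_vec)] + [wi * x for x in l2_normalize(image_vec)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

With square-root weights, the cosine similarity of two combined vectors equals `text_weight` times the text similarity plus `1 - text_weight` times the image similarity, so a single parameter controls how much each modality contributes to ranking.
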
### Search (semantic / text / strict)

```bash
# Hybrid semantic (default)
python -m ml.ml_embeddings search "japanese redhead creampie"

# Text-only space
python -m ml.ml_embeddings search "japanese redhead creampie" --index text

# Strict keyword pre-filter (title/tags must include all tokens)
python -m ml.ml_embeddings search "interracial bbc" --mode strict
```

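
The strict pre-filter rule (title/tags must include all tokens) can be sketched as follows. Whole-word, case-insensitive matching is an assumption, and the names `strict_match` and `strict_filter` are hypothetical. The actual `--mode strict` implementation may tokenize differently.

```python
def strict_match(query: str, record: dict) -> bool:
    """True if every query token occurs as a word in the record's title or tags."""
    searchable = " ".join([record.get("title", ""), *record.get("tags", [])])
    words = set(searchable.lower().split())
    return all(token in words for token in query.lower().split())


def strict_filter(query: str, records: list[dict]) -> list[dict]:
    """Keep only records that pass the strict keyword check."""
    return [r for r in records if strict_match(query, r)]


records = [
    {"gallery_id": "a", "title": "Beach Sunset", "tags": ["Outdoor", "Red Dress"]},
    {"gallery_id": "b", "title": "Studio Portrait", "tags": ["Indoor"]},
]
print([r["gallery_id"] for r in strict_filter("red beach", records)])
# prints: ['a']
```

Running this filter before the embedding search narrows the candidate set, so the semantic ranking only orders galleries that already contain every keyword.
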
### Verify

```bash
python -m ml.ml_embeddings verify
```

## 🗂️ 7) Data Locations

| Path | Purpose |
| --- | --- |
| `Galleries/` | Imported galleries (images + metadata) |
| `Galleries/index.json` | Library index |
| `src/importer/reports/` | Tag stats & TPDB reports |
| `ML/porndex_dataset.jsonl` | ML dataset source |
| `ML/embeddings/` | NPZ vectors |
| `ML/embeddings_index.jsonl` | Search index |

## 🧭 8) Roadmap (post-v0.4.2)

- GroundingDINO + Grounded-SAM for localized detections (people, clothing)
- Attribute heads for gender → ethnicity → clothing brand (e.g., socks)
- Active-learning loop to leverage existing metadata as weak labels

## 🧾 Notes

- All commands are local/offline friendly.
- Rebuilding dataset/embeddings is safe and idempotent.
- Importer auto-tags on import/refresh using the YAML dictionaries.

Author: Leak Technologies • License: MIT (internal research)