File: docs/CLI_USAGE.md
Version: v0.4.2
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
------------------------------------------------------------
Goondex CLI Usage Guide
------------------------------------------------------------
Purpose:
Provide a full command reference for importing, tagging, validating, and searching PornPics galleries using the Goondex command-line interface.
------------------------------------------------------------
Overview
------------------------------------------------------------
The Goondex CLI provides a unified workflow to:
1. Import and refresh PornPics galleries
2. Automatically tag galleries using YAML dictionaries
3. Manage sources and metadata through a single command entrypoint
4. Generate statistics and validation reports
5. Build and search machine learning datasets (hybrid text + image)
Project root:
~/Projects/PD/PornPics_Importer/Porndex_PornpicsImporter/
------------------------------------------------------------
1. Importing Galleries
------------------------------------------------------------
Quick Import (preferred):
goondex import "https://www.pornpics.com/galleries/<gallery-id>/"
Process:
- Creates a new folder at Galleries/<timestamp>_<models>_<title>/
- Downloads all images (threaded)
- Saves metadata.json
- Auto-tags the gallery using refresh-one
- Rebuilds the global index
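The folder naming pattern above can be illustrated with a short sketch. The helper name gallery_folder_name, the timestamp format, and the slug rules are all assumptions for illustration; the importer's actual conventions may differ.

```python
import re
from datetime import datetime

def gallery_folder_name(models, title, when):
    """Build '<timestamp>_<models>_<title>' from filesystem-safe parts."""
    def slug(text):
        # Replace each run of non-alphanumeric characters with one hyphen.
        return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")

    stamp = when.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
    model_part = "-".join(slug(m) for m in models) or "unknown"
    return f"{stamp}_{model_part}_{slug(title) or 'untitled'}"

name = gallery_folder_name(["Jane Doe"], "Sample Gallery!",
                           datetime(2025, 11, 1, 12, 30, 0))
print(name)  # 20251101-123000_Jane-Doe_Sample-Gallery
```

Keeping the timestamp first means a plain alphabetical listing of Galleries/ is also chronological.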
Legacy method:
python src/importer/gallery_importer.py "https://www.pornpics.com/galleries/<gallery-id>/"
------------------------------------------------------------
2. Refreshing Metadata
------------------------------------------------------------
Refresh all galleries:
python src/importer/gallery_importer.py --refresh-all
Function:
- Re-fetches metadata for every gallery that has a source_url
- Merges new fields without overwriting local tags
- Automatically re-applies tag inference
- Rebuilds Galleries/index.json
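The "merge without overwriting local tags" behaviour can be sketched as follows. This is a minimal illustration, not the importer's code: the function name merge_metadata, the protected parameter, and the sample field values are hypothetical.

```python
def merge_metadata(local, fetched, protected=("tags",)):
    """Return a copy of `local` updated with `fetched`, leaving protected keys untouched."""
    merged = dict(local)
    for key, value in fetched.items():
        if key in protected and key in local:
            continue  # keep the locally edited value
        merged[key] = value
    return merged

# Hypothetical sample data; real metadata.json files may hold other fields.
local = {"title": "Old title", "tags": ["manual-tag"],
         "source_url": "https://example.invalid/g/1"}
fetched = {"title": "New title", "tags": ["site-tag"], "categories": ["example"]}

merged = merge_metadata(local, fetched)
print(merged["title"], merged["tags"], merged["categories"])
```

In this sketch a protected key is still filled in from the fetched data when the local file has no value for it, so a gallery that was never tagged picks up the remote tags.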
------------------------------------------------------------
3. Tag Management
------------------------------------------------------------
Unified syntax:
goondex <command> [args...]
Common operations:
refresh-all → refresh tags for all galleries
refresh-one "<folder>" → refresh tags for a single gallery
validate-tags → validate YAML tag dictionaries
tag-stats → generate frequency report (saved to src/importer/reports/)
list → list all galleries
list-tags "<folder>" → show tags for one gallery
add "<folder>" "Tag" → add a tag manually
remove "<folder>" "Tag" → remove a tag manually
add-multi "<folder>" "Tag1,Tag2" → add multiple tags at once
show-metadata "<folder>" → view metadata.json content
source "<folder>" set "Source" → set a single source
source bulk set "Source" → set the same source for all galleries
Tag inference uses YAML dictionaries stored under:
src/importer/tag_dictionaries/
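As a rough illustration of dictionary-driven tag inference, the sketch below matches trigger phrases against a gallery title. The dictionary shape (canonical tag mapped to a list of phrases) is an assumption; the real YAML schema is not described here. A plain Python dict stands in for a parsed YAML file so that the example has no third-party dependencies.

```python
import re

# Stand-in for the parsed contents of one YAML dictionary file.
# Assumed shape: canonical tag -> list of trigger phrases.
TAG_DICTIONARY = {
    "Outdoor": ["outdoor", "outside", "garden"],
    "Redhead": ["redhead", "red hair"],
}

def infer_tags(text, dictionary):
    """Return the sorted canonical tags whose trigger phrases occur in `text`."""
    lowered = text.lower()
    found = set()
    for tag, phrases in dictionary.items():
        for phrase in phrases:
            # Word boundaries stop "garden" from matching inside "gardener".
            if re.search(r"\b" + re.escape(phrase.lower()) + r"\b", lowered):
                found.add(tag)
                break
    return sorted(found)

print(infer_tags("Redhead posing in the garden", TAG_DICTIONARY))
# ['Outdoor', 'Redhead']
```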
------------------------------------------------------------
4. TPDB Performer Bridge (optional)
------------------------------------------------------------
Command:
python -m performers.tpdb_bridge <cmd> [flags]
Common subcommands:
check-key, fetch, fill-index, enrich, sync-all
list-sources, add-source, delete-source
verify-enrichment --export-json
Database:
src/importer/db/performers.db
Reports:
src/importer/reports/
------------------------------------------------------------
5. Example Workflow
------------------------------------------------------------
Import a gallery:
goondex import "https://www.pornpics.com/galleries/<id>/"
Refresh tags for one folder:
goondex refresh-one "<folder-name>"
Validate YAML dictionaries:
goondex validate-tags
Generate tag statistics:
goondex tag-stats
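For intuition about what a tag frequency report contains, here is a minimal sketch that counts tags across a list of metadata dicts. The function name tag_frequencies and the assumption that each gallery's metadata carries a "tags" list are illustrative; the report format written by tag-stats is not specified here.

```python
from collections import Counter

def tag_frequencies(galleries):
    """Count how many galleries carry each tag; `galleries` is a list of metadata dicts."""
    counts = Counter()
    for meta in galleries:
        # set() so a tag repeated within one gallery is counted once.
        counts.update(set(meta.get("tags", [])))
    return counts

sample = [
    {"title": "A", "tags": ["Outdoor", "Redhead"]},
    {"title": "B", "tags": ["Outdoor"]},
    {"title": "C"},  # a gallery with no tags yet
]

# Most frequent first, ties broken alphabetically.
for tag, n in sorted(tag_frequencies(sample).items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{tag}\t{n}")
```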
------------------------------------------------------------
6. Machine Learning (ML) Pipeline
------------------------------------------------------------
Dataset builder:
python -m ml.ml_dataset_builder
Creates file:
ML/porndex_dataset.jsonl
Example entry (pretty-printed here; the file stores one JSON object per line):
{
  "gallery_id": "...",
  "title": "...",
  "models": ["..."],
  "tags": ["..."],
  "categories": ["..."],
  "image_paths": [".../Galleries/.../001.jpg"]
}
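A dataset in this shape can be read and sanity-checked with the standard library alone. The sketch below assumes that every entry carries the six keys shown in the example; the helper name read_dataset and the sample values are hypothetical.

```python
import io
import json

REQUIRED_KEYS = {"gallery_id", "title", "models", "tags", "categories", "image_paths"}

def read_dataset(lines):
    """Yield (line_number, entry) for each non-blank line; raise ValueError on missing keys."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"line {number}: missing keys {sorted(missing)}")
        yield number, entry

# In-memory stand-in for the dataset file; pass an open file handle in real use.
demo = io.StringIO(
    '{"gallery_id": "g1", "title": "Sample", "models": [], "tags": ["Outdoor"], '
    '"categories": [], "image_paths": ["Galleries/sample/001.jpg"]}\n'
)
entries = list(read_dataset(demo))
print(len(entries), entries[0][1]["gallery_id"])  # 1 g1
```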
Build hybrid embeddings:
python -m ml.ml_embeddings build --img-samples 8 --device auto
Outputs:
ML/embeddings/<gallery_id>.npz
ML/embeddings_index.jsonl
Search modes:
python -m ml.ml_embeddings search "japanese redhead creampie"
python -m ml.ml_embeddings search "japanese redhead creampie" --index text
python -m ml.ml_embeddings search "interracial bbc" --mode strict
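To give a feel for what a hybrid text + image search might compute, the sketch below ranks galleries by a weighted mix of two cosine similarities. Everything here is an assumption for illustration: the function names, the text_weight parameter, and the premise that the query vector is comparable to both the text and image vectors. It does not describe how ml.ml_embeddings scores results or what --mode strict does.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_rank(query_vec, galleries, text_weight=0.5):
    """Rank gallery ids by a weighted mix of text and image similarity, highest first."""
    scored = []
    for gallery_id, (text_vec, image_vec) in galleries.items():
        score = (text_weight * cosine(query_vec, text_vec)
                 + (1.0 - text_weight) * cosine(query_vec, image_vec))
        scored.append((score, gallery_id))
    return [gid for score, gid in sorted(scored, reverse=True)]

# Toy two-dimensional vectors: gallery id -> (text vector, image vector).
toy = {
    "g1": ([1.0, 0.0], [1.0, 0.0]),
    "g2": ([0.0, 1.0], [0.0, 1.0]),
}
print(hybrid_rank([1.0, 0.1], toy))  # ['g1', 'g2']
```

Setting text_weight to 1.0 in this sketch reduces it to a text-only ranking, loosely analogous to restricting a search to a text index.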
Verify embedding integrity:
python -m ml.ml_embeddings verify
------------------------------------------------------------
7. Data Locations
------------------------------------------------------------
Galleries/ → imported galleries and images
Galleries/index.json → master index of all galleries
src/importer/reports/ → YAML validation and statistics reports
ML/porndex_dataset.jsonl → ML dataset definition
ML/embeddings/ → embedding vector files
ML/embeddings_index.jsonl → search index for semantic lookups
------------------------------------------------------------
8. Roadmap (post-v0.4.2)
------------------------------------------------------------
- Integrate GroundingDINO + Grounded-SAM for localized object detection
- Add attribute heads for gender, ethnicity, and clothing
- Develop an active-learning loop to refine weakly-labeled data
- Introduce interactive tag editor for review and correction
------------------------------------------------------------
Notes
------------------------------------------------------------
Apart from commands that fetch remote data (import and metadata refresh), all commands operate locally and offline.
Rebuilding datasets and embeddings is safe and idempotent.
Importer auto-tags new galleries using YAML dictionaries by default.
All modules adhere to the clean modular design outlined in ARCHITECTURE.md.
This document is versioned alongside the code; the version in the header identifies the CLI release it describes.
------------------------------------------------------------
End of File
------------------------------------------------------------