File: docs/CLI_USAGE.md
Version: v0.4.2
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex

------------------------------------------------------------
Goondex CLI Usage Guide
------------------------------------------------------------

Purpose:
Provide a full command reference for importing, tagging, validating,
and searching PornPics galleries using the Goondex command-line interface.

------------------------------------------------------------
Overview
------------------------------------------------------------

The Goondex CLI provides a unified workflow to:

1. Import and refresh PornPics galleries
2. Automatically tag galleries using YAML dictionaries
3. Manage sources and metadata through a single command entrypoint
4. Generate statistics and validation reports
5. Build and search machine learning datasets (hybrid text + image)

Project root:
    ~/Projects/PD/PornPics_Importer/Porndex_PornpicsImporter/

------------------------------------------------------------
1. Importing Galleries
------------------------------------------------------------

Quick Import (preferred):
    goondex import "https://www.pornpics.com/galleries/<gallery-id>/"

Process:
- Creates a new folder in Galleries/__/
- Downloads all images (threaded)
- Saves metadata.json
- Auto-tags the gallery using refresh-one
- Rebuilds the global index

Legacy method:
    python src/importer/gallery_importer.py "https://www.pornpics.com/galleries/<gallery-id>/"

------------------------------------------------------------
2. Refreshing Metadata
------------------------------------------------------------

Refresh all galleries:
    python src/importer/gallery_importer.py --refresh-all

Function:
- Re-fetches metadata for every gallery that has a source_url
- Merges new fields without overwriting local tags
- Automatically re-applies tag inference
- Rebuilds Galleries/index.json

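As an illustration of the merge step described under Refreshing Metadata, the sketch below updates a local metadata dictionary with freshly fetched fields while leaving local tags alone. It is a minimal sketch, not Goondex's actual implementation: the function name merge_metadata, the protected-key tuple, and the sample field values are all assumptions made for the example, and it assumes fetched values replace every field that is not protected.

```python
from typing import Any


def merge_metadata(
    local: dict[str, Any],
    fetched: dict[str, Any],
    protected: tuple[str, ...] = ("tags",),  # hypothetical: keys never overwritten
) -> dict[str, Any]:
    """Return a copy of `local` updated with `fetched`, keeping protected keys."""
    merged = dict(local)
    for key, value in fetched.items():
        if key in protected and key in local:
            continue  # a local value exists, so the fetched one is ignored
        merged[key] = value
    return merged


# Hypothetical sample data; real metadata.json fields may differ.
local = {
    "title": "Old title",
    "tags": ["manual-tag"],
    "source_url": "https://example.invalid/galleries/demo/",
}
fetched = {"title": "New title", "tags": ["remote-tag"], "categories": ["example"]}

merged = merge_metadata(local, fetched)
print(merged["title"], merged["tags"], merged["categories"])
```

Returning a new dictionary instead of mutating the local one keeps the original metadata available, for instance to compare the two before metadata.json is rewritten.
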
------------------------------------------------------------
3. Tag Management
------------------------------------------------------------

Unified syntax:
    goondex <command> [args...]

Common operations:
    refresh-all                       → refresh tags for all galleries
    refresh-one "<folder>"            → refresh tags for a single gallery
    validate-tags                     → validate YAML tag dictionaries
    tag-stats                         → generate frequency report (saved to src/importer/reports)
    list                              → list all galleries
    list-tags "<folder>"              → show tags for one gallery
    add "<folder>" "Tag"              → add a tag manually
    remove "<folder>" "Tag"           → remove a tag manually
    add-multi "<folder>" "Tag1,Tag2"  → add multiple tags at once
    show-metadata "<folder>"          → view metadata.json content
    source "<folder>" set "Source"    → set the source for one gallery
    source bulk set "Source"          → set the same source for all galleries

Tag inference uses YAML dictionaries stored under:
    src/importer/tag_dictionaries/

------------------------------------------------------------
4. TPDB Performer Bridge (optional)
------------------------------------------------------------

Command:
    python -m performers.tpdb_bridge <cmd> [flags]

Common commands and flags:
    check-key, fetch, fill-index, enrich, sync-all
    list-sources, add-source, delete-source
    verify-enrichment --export-json

Database:
    src/importer/db/performers.db

Reports:
    src/importer/reports/

------------------------------------------------------------
5. Example Workflow
------------------------------------------------------------

Import a gallery:
    goondex import "https://www.pornpics.com/galleries/<id>/"

Refresh tags for one folder:
    goondex refresh-one "<folder-name>"

Validate YAML dictionaries:
    goondex validate-tags

Generate tag statistics:
    goondex tag-stats

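The four steps of the example workflow can also be chained from a script. The sketch below only assembles and optionally runs the commands listed in this guide; it assumes that goondex is on the PATH and exits with a non-zero status on failure, which this document does not confirm. The helper names workflow_commands and run_workflow are hypothetical.

```python
import shlex
import subprocess


def workflow_commands(gallery_url: str, folder: str) -> list[list[str]]:
    """Return the argv lists for the four workflow steps, in order."""
    return [
        ["goondex", "import", gallery_url],
        ["goondex", "refresh-one", folder],
        ["goondex", "validate-tags"],
        ["goondex", "tag-stats"],
    ]


def run_workflow(gallery_url: str, folder: str, dry_run: bool = True) -> list[str]:
    """Print each step and run it unless dry_run is set.

    Returns the shell-quoted command lines that were printed.
    """
    lines = []
    for argv in workflow_commands(gallery_url, folder):
        line = shlex.join(argv)  # quoted form, safe to paste into a shell
        print(line)
        lines.append(line)
        if not dry_run:
            # check=True stops the chain at the first failing step
            # (assumes goondex signals failure through its exit status).
            subprocess.run(argv, check=True)
    return lines


# Dry run with the placeholders used in this guide; nothing is executed.
run_workflow("https://www.pornpics.com/galleries/<id>/", "<folder-name>")
```

Passing each command as an argument list avoids shell quoting problems with folder names that contain spaces; the dry-run default makes it possible to review the commands before anything is downloaded.
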
------------------------------------------------------------
6. Machine Learning (ML) Pipeline
------------------------------------------------------------

Dataset builder:
    python -m ml.ml_dataset_builder

Creates file:
    ML/porndex_dataset.jsonl

Example entry:
    {
      "gallery_id": "...",
      "title": "...",
      "models": ["..."],
      "tags": ["..."],
      "categories": ["..."],
      "image_paths": [".../Galleries/.../001.jpg"]
    }

Build hybrid embeddings:
    python -m ml.ml_embeddings build --img-samples 8 --device auto

Outputs:
    ML/embeddings/<gallery_id>.npz
    ML/embeddings_index.jsonl

Search modes:
    python -m ml.ml_embeddings search "japanese redhead creampie"
    python -m ml.ml_embeddings search "japanese redhead creampie" --index text
    python -m ml.ml_embeddings search "interracial bbc" --mode strict

Verify embedding integrity:
    python -m ml.ml_embeddings verify

------------------------------------------------------------
7. Data Locations
------------------------------------------------------------

    Galleries/                 → imported galleries and images
    Galleries/index.json       → master index of all galleries
    src/importer/reports/      → YAML validation and statistics reports
    ML/porndex_dataset.jsonl   → ML dataset definition
    ML/embeddings/             → embedding vector files
    ML/embeddings_index.jsonl  → search index for semantic lookups

------------------------------------------------------------
8. Roadmap (post-v0.4.2)
------------------------------------------------------------

- Integrate GroundingDINO + Grounded-SAM for localized object detection
- Add attribute heads for gender, ethnicity, and clothing
- Develop an active-learning loop to refine weakly-labeled data
- Introduce interactive tag editor for review and correction

------------------------------------------------------------
Notes
------------------------------------------------------------

All commands operate locally and offline.
Rebuilding datasets and embeddings is safe and idempotent.
Importer auto-tags new galleries using YAML dictionaries by default.
All modules adhere to the clean modular design outlined in ARCHITECTURE.md.
Documentation is versioned with the code so that each guide matches the CLI release it describes.

------------------------------------------------------------
End of File
------------------------------------------------------------