From 35142587c5690105fdfe83ad0874dd089f585bca Mon Sep 17 00:00:00 2001 From: Team Goon Date: Thu, 6 Nov 2025 14:29:21 -0500 Subject: [PATCH] v0.3.4-docs-update: finalized documentation suite and version file --- LICENSE | 0 README.md | 145 ---------- assets/logo/GOONDEX_logo.svg | 46 ++++ docs/ARCHITECTURE.md | 142 ++++++++++ docs/BRANDING.md | 109 ++++++++ docs/CHANGELOG.md | 503 +++++++++++------------------------ docs/CLI_USAGE.md | 254 +++++++++--------- docs/GALLERIES.md | 197 ++++++++++++++ docs/HISTORY.md | 76 ++++++ docs/LICENSE | 69 +++++ docs/README.md | 176 ++++++++++++ docs/ROADMAP.md | 160 +++++++++++ docs/TAGGING.md | 180 +++++++++++++ 13 files changed, 1447 insertions(+), 610 deletions(-) delete mode 100644 LICENSE delete mode 100644 README.md create mode 100644 assets/logo/GOONDEX_logo.svg create mode 100644 docs/ARCHITECTURE.md create mode 100644 docs/BRANDING.md create mode 100644 docs/GALLERIES.md create mode 100644 docs/HISTORY.md create mode 100644 docs/LICENSE create mode 100644 docs/README.md create mode 100644 docs/ROADMAP.md create mode 100644 docs/TAGGING.md diff --git a/LICENSE b/LICENSE deleted file mode 100644 index e69de29..0000000 diff --git a/README.md b/README.md deleted file mode 100644 index 5709763..0000000 --- a/README.md +++ /dev/null @@ -1,145 +0,0 @@ -# 🧠 PornPics Gallery Importer (Porndex System) -**Version 0.4.2 β€” Unified Importer & ML Pipeline** - -A modular and well-documented gallery importer for [PornPics.com](https://www.pornpics.com) built for the **Porndex** ecosystem. -Supports importing, tagging, metadata enrichment, and machine learning–ready dataset generation. 
- ---- - -## πŸ“‚ Project Structure - -src/ β†’ Core source -β”œβ”€β”€ importer/ β†’ Gallery importers, tag tools, and TPDB bridge -β”‚ β”œβ”€β”€ cli.py β†’ Unified CLI (porndex-importer) -β”‚ β”œβ”€β”€ gallery_importer.py β†’ Gallery parsing/downloading -β”‚ β”œβ”€β”€ tag_gallery.py β†’ Tag management & YAML dictionaries -β”‚ β”œβ”€β”€ reports/ β†’ Tag and enrichment summaries -β”‚ β”œβ”€β”€ db/ β†’ Cached sources & enrichment data -β”‚ β”œβ”€β”€ secrets/ β†’ API keys and credentials (ignored in Git) -β”‚ └── tag_dictionaries/ β†’ YAML-based tag definitions -β”‚ -β”œβ”€β”€ ml/ β†’ Machine learning modules -β”‚ β”œβ”€β”€ ml_dataset_builder.py β†’ Build JSONL dataset -β”‚ β”œβ”€β”€ ml_embeddings.py β†’ Generate CLIP+Text embeddings -β”‚ β”œβ”€β”€ ml_dataset_inspector.py β†’ Inspect or visualize dataset (planned) -β”‚ └── ml_vision_detector.py β†’ GroundingDINO + SAM integration (planned) -β”‚ -β”œβ”€β”€ docs/ β†’ Documentation & changelogs -β”œβ”€β”€ tests/ β†’ Unit and integration tests -└── assets/ β†’ Static data or sample media - -yaml -Copy code - ---- - -## βš™οΈ Setup - -```bash -python3 -m venv .venv -source .venv/bin/activate -pip install -r requirements.txt -Then, from the root of the project: - -bash -Copy code -export PYTHONPATH=src -πŸš€ Quick Start -Import a Gallery -bash -Copy code -porndex-importer import "https://www.pornpics.com/galleries/example-gallery-id/" -Automatically: - -Downloads images and metadata - -Saves to Galleries/__/ - -Creates metadata.json - -Runs auto-tagging (refresh-one) - -Updates library index - -🧩 Core Features -Feature Description -Importer Downloads and parses galleries from PornPics -Auto-Tagging Generates tags based on YAML dictionaries -Metadata Refresh Updates all galleries with new metadata -Source Management Track and bulk-update content sources -CLI Tool Unified command: porndex-importer -TPDB Bridge Enrich performers and metadata via ThePornDB API -ML Dataset Builder Generates a unified dataset (JSONL) -Hybrid 
Embeddings Builds combined CLIP + text vectors for semantic search - -πŸ€– Machine Learning Pipeline -1️⃣ Build Dataset -bash -Copy code -python -m ml.ml_dataset_builder -Creates: - -bash -Copy code -ML/porndex_dataset.jsonl -Each record includes title, models, tags, and full image paths (no file duplication). - -2️⃣ Build Embeddings -bash -Copy code -python -m ml.ml_embeddings build --img-samples 8 --device auto -Generates: - -bash -Copy code -ML/embeddings/<gallery_id>.npz -ML/embeddings_index.jsonl -Uses: - -SentenceTransformer for text - -OpenCLIP (ViT-B/32) for images -and produces a combined hybrid vector. - -3️⃣ Search Your Library -bash -Copy code -# Semantic search (default) -python -m ml.ml_embeddings search "japanese redhead creampie" - -# Strict literal search -python -m ml.ml_embeddings search "interracial bbc" --mode strict -4️⃣ Verify Integrity -bash -Copy code -python -m ml.ml_embeddings verify -Displays: - -Total indexed records - -Images sampled - -NPZ validation summary - -🧠 Development Guidelines -No emojis in code or commits. - -Use descriptive variable names. - -Commit only verified working features. - -Document all new features in docs/CHANGELOG.md. - -Keep docs and CLI output in sync with docs/CLI_USAGE.md. 
- -πŸ—ΊοΈ Roadmap (v0.4.x β†’ v0.5.x) -Stage Feature Description -βœ… ML Embedding Search Hybrid text+image similarity -βš™οΈ Gender & Ethnicity Detection Person-level classification -⏳ GroundingDINO Integration Object/region localization -⏳ Grounded SAM + BLIP Visual attribute extraction (clothing, actions) -πŸ”œ Active Learning Re-train from gallery metadata and tags - -πŸ“„ License -MIT β€” Internal Research Use Only -Author: Leak Technologies \ No newline at end of file diff --git a/assets/logo/GOONDEX_logo.svg b/assets/logo/GOONDEX_logo.svg new file mode 100644 index 0000000..dfb7360 --- /dev/null +++ b/assets/logo/GOONDEX_logo.svg @@ -0,0 +1,46 @@ +<?xml version="1.0" encoding="UTF-8" standalone="no"?> +<!-- Created with Inkscape (http://www.inkscape.org/) --> + +<svg + width="600" + height="180" + viewBox="0 0 600 180" + version="1.1" + id="svg1" + inkscape:version="1.4.2 (ebf0e940d0, 2025-05-08)" + sodipodi:docname="GOONDEX_logo.svg" + xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape" + xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd" + xmlns="http://www.w3.org/2000/svg" + xmlns:svg="http://www.w3.org/2000/svg"> + <sodipodi:namedview + id="namedview1" + pagecolor="#505050" + bordercolor="#eeeeee" + borderopacity="1" + inkscape:showpageshadow="0" + inkscape:pageopacity="0" + inkscape:pagecheckerboard="0" + inkscape:deskcolor="#505050" + inkscape:document-units="px" + inkscape:zoom="1.5483025" + inkscape:cx="269.00429" + inkscape:cy="227.02282" + inkscape:window-width="1920" + inkscape:window-height="1080" + inkscape:window-x="0" + inkscape:window-y="0" + inkscape:window-maximized="1" + inkscape:current-layer="layer1" /> + <defs + id="defs1" /> + <g + inkscape:label="Layer 1" + inkscape:groupmode="layer" + id="layer1"> + <path + id="path26" + style="font-size:48px;font-family:'Gmarket Sans';-inkscape-font-specification:'Gmarket Sans';fill:#ff5fa2;fill-opacity:1;stroke-width:0;stroke-linecap:round;paint-order:markers fill 
stroke" + d="m 89.6071,50.4885 c -23.10416,0 -39.60743,16.69024 -39.60743,39.51149 0,22.82124 16.50328,39.51151 39.60743,39.51151 22.06683,0 39.04336,-15.65212 39.04336,-37.90755 v -3.39592 h -40.6473 v 8.57996 h 30.08354 c -1.13163,13.67388 -12.63408,23.85962 -28.0997,23.85962 -17.6346,0 -30.08354,-12.82442 -30.08354,-30.64762 0,-17.72891 12.44569,-30.45956 29.89167,-30.45956 12.91947,0 23.57566,7.07122 26.68766,16.59581 h 10.75176 C 123.27385,61.70793 108.84487,50.48851 89.6071,50.4885 Z m 83.73122,0 c -22.91553,0 -39.79544,16.69024 -39.79544,39.51149 0,22.82124 16.97586,39.51151 39.89139,39.51151 11.29137,0 21.11695,-4.05219 28.18029,-10.89376 7.08654,6.84157 16.93497,10.89376 28.22632,10.89376 22.91556,0 39.79544,-16.69026 39.79544,-39.51151 0,-22.82124 -16.97196,-39.51149 -39.88752,-39.51149 -11.29138,0 -21.11698,4.05217 -28.18029,10.89376 -7.08655,-6.84159 -16.9388,-10.89376 -28.23019,-10.89376 z m 156.5227,1.32 v 59.3152 L 284.12556,52.28048 h -9.23996 v 75.53498 h 9.89995 V 68.50023 l 45.73537,58.84324 h 9.3359 V 51.80846 Z m 18.51061,0.47198 v 75.53499 h 26.7796 c 26.0276,0 41.1193,-15.1812 41.1193,-37.71954 0,-0.52548 -0.01,-1.04807 -0.027,-1.56558 -0.041,-1.2646 -0.1283,-2.50346 -0.2647,-3.71824 h -0.01 c -2.2059,-19.60648 -16.8839,-32.53165 -40.82,-32.53165 z m 74.6754,0 v 75.53499 h 54.5072 v -8.77182 h -44.6073 V 93.5839 h 40.4593 v -8.77179 h -40.4593 V 61.04843 h 43.7593 v -8.76795 z m 60.6582,0 26.3116,37.34349 -27.2555,38.1915 h 11.5998 l 21.9717,-30.93156 21.8797,30.93156 h 11.7878 l -27.3476,-38.47545 26.4036,-37.05954 h -11.5039 l -21.0277,29.79956 -20.8436,-29.79956 z m -310.36688,7.25996 c 9.05961,0 16.87419,3.48312 22.25184,9.3704 -3.60762,5.99921 -5.63683,13.17524 -5.63683,21.08915 0,7.89949 2.03062,15.06595 5.64066,21.05848 -5.36636,5.89579 -13.15175,9.40112 -22.15972,9.40112 -17.2574,0 -29.98763,-12.73069 -29.98763,-30.4596 0,-17.8232 12.63428,-30.45955 29.89168,-30.45955 z m 56.41046,0 c 17.25739,0 29.98763,12.63635 29.98763,30.45955 
0,17.72891 -12.73244,30.45958 -29.89553,30.45958 -9.05209,0 -16.85927,-3.50401 -22.23648,-9.39344 3.59921,-5.99469 5.62531,-13.16204 5.62531,-21.06614 0,-7.91897 -2.04458,-15.09915 -5.67135,-21.10067 5.35263,-5.88029 13.13673,-9.35888 22.19042,-9.35888 z m 128.52272,1.60391 h 17.1637 c 17.2271,0 28.4182,8.55424 30.4864,23.66776 h -23.3339 v 8.77179 h 23.5335 c 0.098,-1.12825 0.1497,-2.29113 0.1497,-3.48797 0,1.19665 -0.052,2.35989 -0.1497,3.48797 -1.4059,16.20741 -12.7883,25.27173 -30.686,25.27173 h -17.1637 z M 201.58388,79.12157 c 1.1334,3.32559 1.74589,6.97798 1.74589,10.87842 0,3.87376 -0.60919,7.50886 -1.73821,10.82471 -1.13003,-3.31585 -1.73825,-6.95095 -1.73825,-10.82471 0,-3.90044 0.60423,-7.55286 1.73057,-10.87842 z m -28.99762,21.55347 c -2.1037,1.2e-4 -3.80849,1.70661 -3.80647,3.81032 9e-5,2.10223 1.70425,3.80637 3.80647,3.80649 2.10221,-1.2e-4 3.80637,-1.70426 3.80651,-3.80649 0.002,-2.10371 -1.70281,-3.8102 -3.80651,-3.81032 z m 56.4642,0 c -2.10372,1.2e-4 -3.80849,1.70661 -3.80651,3.81032 1.3e-4,2.10223 1.70425,3.80637 3.80651,3.80649 2.10222,-1.2e-4 3.80637,-1.70426 3.80649,-3.80649 0.002,-2.10371 -1.70279,-3.8102 -3.80649,-3.81032 z" /> + </g> +</svg> diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md new file mode 100644 index 0000000..cb8ee64 --- /dev/null +++ b/docs/ARCHITECTURE.md @@ -0,0 +1,142 @@ +File: docs/ARCHITECTURE.md +Version: v0.3.4 +Last updated: November 2025 +Maintainer: Leak Technologies +Project: Goondex + +------------------------------------------------------------ +Goondex System Architecture Overview +------------------------------------------------------------ + +Purpose: +This document outlines the internal structure, key modules, and data flow of Goondex. It defines how importer, tagging, and metadata systems interact, ensuring consistent development practices and clear separation of responsibilities across the codebase. 
+ +------------------------------------------------------------ +High-Level Overview +------------------------------------------------------------ + +Goondex is a modular image importer and metadata indexer designed primarily for PornPics galleries. +Its long-term goal is to evolve into a general-purpose adult media cataloguing and tagging framework. + +Core functions: +1. Gallery importing and metadata storage +2. Automated tag inference from titles and descriptions +3. Performer and source enrichment (via ThePornDB) +4. Semantic search and ML dataset generation (planned v0.4.x) +5. CLI-driven operations with simple developer alias support + +------------------------------------------------------------ +Primary Directories +------------------------------------------------------------ + +src/importer/ + cli.py - Main command entrypoint for all user-facing operations. + gallery_importer.py - Handles gallery downloading, metadata management, and index updates. + tag_gallery.py - Manages tagging logic, inference, and validation. + tag_utils.py - Shared utilities for YAML parsing and tag validation. + index_builder.py - Rebuilds gallery index files after import or refresh. + fetch_gallery_metadata.py - Scrapes PornPics galleries for metadata and image URLs. + tpdb_bridge.py - Integrates with ThePornDB API for performer enrichment. + config/ - Contains YAML and JSON config templates for environment paths. + reports/ - Stores generated statistics, tag summaries, and validation logs. + tag_dictionaries/ - Modular YAML tag dictionaries (body, acts, clothing, context). + +docs/ + BRANDING.md - Defines visual and branding identity. + ARCHITECTURE.md - This file. + CHANGELOG.md - Version history and release notes. + ROADMAP.md - Planned features and milestones. + +------------------------------------------------------------ +Core Workflow +------------------------------------------------------------ + +1. 
Import + - User runs `goondex import <url>` or `python -m src.importer.cli import <url>`. + - The system fetches gallery metadata and image URLs. + - Metadata is saved to disk under Galleries/<timestamp>_<model_name>/. + - Images are downloaded using threaded requests. + +2. Tagging + - On import completion, automatic tagging runs via tag_gallery.py. + - Inferred tags are based on YAML dictionaries and keyword matches. + - Users can adjust tags manually or re-run `goondex refresh-one`. + +3. Indexing + - After import, index_builder.py rebuilds a global index for CLI listing. + - Index entries include title, models, source, and folder references. + +4. Enrichment (optional) + - ThePornDB bridge pulls performer metadata and merges it with local entries. + - Data is stored in a lightweight SQLite database for reusability. + +5. Validation + - Tag dictionaries are validated via `goondex validate-tags`. + - Reports are saved to /src/importer/reports/ for long-term tracking. + +------------------------------------------------------------ +Data Structure +------------------------------------------------------------ + +Each gallery folder includes: + metadata.json - Core descriptive data and tags. + failed_downloads.json (optional) - Log of skipped or failed images. + inferred_tags - Automatically detected tags stored separately from user edits. + source_url - Original import link for refresh operations. + +Metadata fields (core schema): + title + models + categories + tags + image_urls + source_url + views + rating + last_refreshed + +------------------------------------------------------------ +CLI Commands (as of v0.3.4) +------------------------------------------------------------ + +import <url> - Import new gallery from PornPics. +refresh-all - Refresh tags for all galleries. +refresh-one <folder> - Refresh tags for one gallery. +validate-tags - Validate YAML tag dictionaries. +tag-stats - Generate tag frequency report. +list - List all galleries. 
+list-tags <folder> - List tags for one gallery.
+add <folder> <tag> - Add a tag manually.
+remove <folder> <tag> - Remove a tag manually.
+add-multi - Add multiple tags at once.
+show-metadata - Display metadata.json contents.
+source set - Set or bulk-set gallery source.
+
+------------------------------------------------------------
+Planned Evolution
+------------------------------------------------------------
+
+v0.4.x
+- Introduce ML dataset builder for hybrid text and image embeddings.
+- Implement semantic search with CLIP model integration.
+- Support multiple site importers beyond PornPics.
+- Add confidence scoring for auto-tagging accuracy.
+
+v0.5.x
+- Implement Web UI with search, tag filters, and visual gallery grid.
+- Introduce local model inference (GroundingDINO + SAM).
+- Build API layer for remote clients.
+
+------------------------------------------------------------
+Design Philosophy
+------------------------------------------------------------
+
+Keep it modular, transparent, and locally maintainable.
+Every import should leave a clean, readable data trail.
+Avoid hard dependencies — keep Python standard library primary, with only essential external libraries (requests, tqdm, yaml).
+All scripts must remain executable from both CLI and within the src context.
+Maintain clean commit history and clearly versioned documentation (as done with this file).
+
+------------------------------------------------------------
+End of File
+------------------------------------------------------------
diff --git a/docs/BRANDING.md b/docs/BRANDING.md
new file mode 100644
index 0000000..fe95944
--- /dev/null
+++ b/docs/BRANDING.md
@@ -0,0 +1,109 @@
+File: docs/BRANDING.md
+Version: v0.3.4
+Last updated: November 2025
+Maintainer: Leak Technologies
+Project: Goondex
+
+------------------------------------------------------------
+Goondex Branding Guide
+------------------------------------------------------------
+
+Purpose:
+Define the current visual identity and colour palette for Goondex, including logo interpretation, UI application, and consistency notes. This file is versioned and updated alongside major CLI and UI revisions.
+
+------------------------------------------------------------
+Colour Palette
+------------------------------------------------------------
+
+Primary Accent: Flamingo Pulse
+Hex: #FF5FA2
+Usage: Core brand colour used in the logo, buttons, and key highlights.
+
+Secondary Accent: Electric Plum
+Hex: #8C2F5C
+Usage: Shadow tone and hover states. Complements Flamingo Pulse.
+
+Highlight Glow: Neon Rose
+Hex: #FF9BCB
+Usage: Used sparingly for glow or gradient highlight effects.
+
+Background (Dark): Deep Charcoal
+Hex: #1E1E1E
+Usage: Main dark-mode background for UI and presentation.
+
+Text (Primary): Soft Porcelain
+Hex: #E9E9E9
+Usage: Default readable text colour on dark backgrounds.
+
+Text (Muted): Ash Gray
+Hex: #B0B0B0
+Usage: Secondary text, metadata, or inactive UI states.
+
+Print Match (CMYK):
+Flamingo Pulse → C:0 M:70 Y:15 K:0
+
+------------------------------------------------------------
+Logo Guidelines
+------------------------------------------------------------
+
+Wordmark:
+Custom type featuring interlinked "OO" characters that form a playful, anatomical suggestion of breasts.
+The "D" subtly mirrors a phallic shape, continuing the tongue-in-cheek theme.
+The logo remains clean and geometric, ensuring the humour is implied, not explicit.
+
+Primary Colour:
+Flamingo Pulse (#FF5FA2)
+
+Backgrounds:
+Best contrast on Deep Charcoal (#1E1E1E) or near-black backgrounds.
+
+Minimum Spacing:
+Maintain clear space around the logo equal to the height of the "G" on all sides.
+
+Optional Glow:
+Apply Neon Rose (#FF9BCB) outer glow at 10–15% opacity for digital assets only.
+Avoid glow in print use.
+
+------------------------------------------------------------
+Typography
+------------------------------------------------------------
+
+Preferred typefaces:
+- Montserrat
+- Poppins
+- Satoshi
+
+Fallbacks:
+- Sans-serif system fonts (Arial, Helvetica, etc.)
+
+All-caps for primary titles and logos.
+Mixed case for documentation and UI elements.
+
+------------------------------------------------------------
+UI Application
+------------------------------------------------------------
+
+Buttons:
+Flamingo Pulse base colour, with Electric Plum overlay on hover (10–20% opacity).
+
+Links and Highlights:
+Flamingo Pulse.
+
+Cards and Panels:
+Deep Charcoal background with 1px border in Electric Plum (20% opacity).
+
+Headers and Dividers:
+Use Soft Porcelain for text and Electric Plum for thin divider accents.
+
+------------------------------------------------------------
+Notes
+------------------------------------------------------------
+
+Keep the palette consistent; avoid introducing new accent colours.
+Ensure contrast ratios meet accessibility guidelines.
+When printing, verify Flamingo Pulse tone accuracy using the CMYK approximation listed above.
+Maintain a professional tone with playful undertones. The logo and colour scheme should feel adult, confident, and minimalist — never explicit.
+ +------------------------------------------------------------ +End of File +------------------------------------------------------------ diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md index 6c4c9a0..a56ded1 100644 --- a/docs/CHANGELOG.md +++ b/docs/CHANGELOG.md @@ -1,344 +1,159 @@ -# πŸ“œ Goondex β€” Full Changelog -> **Repository:** Leak Technologies -> **Branch:** main -> **Version Line:** v0.3.x Development Cycle -> _Formerly: Porndex Importer (PornPics Importer Module)_ - - ---- - -## [v0.3.3] β€” Stable CLI Alias & Import Path Fix (2025-11-06) - -### ✨ Added -- Introduced unified `goondex` CLI alias, now functional across Fish, Bash, and Zsh shells. -- Added `--help` and `--version` flags with consistent colorized output. -- Standardized usage and examples block for user clarity. - -### πŸ›  Changed -- Refactored all internal imports to use absolute `src.importer.*` paths for compatibility. -- Updated `gallery_importer.py` to call `src.importer.tag_gallery` for subprocess calls. -- Simplified alias setup scripts under `/src/utils/install_alias.sh` and `/src/utils/install_alias.fish`. - -### 🧹 Maintenance -- Rebuilt virtual environment (`.venv`) and dependency tree under Python 3.13. -- Verified clean CLI operation with `goondex --version` and `goondex --help`. -- Confirmed consistent behavior across development and installed modes. - ---- - -## [v0.3.2-rebuild] β€” Repository Cleanup & Stabilization (2025-11-02) - -### ✨ Added -- Introduced project-wide `.gitignore` to exclude gallery media and model weights. -- Added `VERSION` file (v0.3.2) for synchronized CLI and metadata versioning. -- Implemented environment fix for Fish-shell virtualenv activation. -- Ensured unified `porndex` CLI entrypoint under `/src/importer/cli.py`. - -### 🧹 Maintenance -- Removed redundant and outdated tags (v0.3.0–v0.4.1) from remote. -- Normalized repository tree and re-pushed clean 4.6 GiB β†’ base v0.3.2. -- Prepared groundwork for `--help` and `--version` CLI arguments. 
- ---- - -## [v0.3.0] β€” Modular Tagging Framework Foundation (2025-10-18) - -### ✨ Added -- Introduced **YAML-based Tag Dictionaries** stored under `/src/importer/tagging/` for modular, human-readable tag definitions. -- Implemented initial **`refresh-all`** and **`refresh-one`** commands for reapplying tag inference to galleries. -- Added **persistent `inferred_tags` field** in `metadata.json` to differentiate between automated and manual tags. -- Implemented **automatic source inference** for known networks (e.g., Brazzers, FTV Girls, PornPics). -- Enhanced CLI output with colorized progress indicators and summary totals. - -### πŸ›  Changed -- Refactored `tag_gallery.py` for modular tagging architecture. -- Centralized configuration paths to `/src/importer/config/` for easier project-wide access. - -### 🧹 Maintenance -- Improved exception handling for missing or malformed tag dictionaries. -- Added consistent emoji/logging system across CLI commands. - ---- - -## [v0.3.1] β€” CLI Polishing & Dictionary Improvements (2025-10-19) - -### ✨ Added -- Introduced **CLI argument parsing** with `argparse` for a unified user interface. -- Added `--verbose` flag for detailed debugging output. -- Added **metadata validation** to ensure all tag dictionaries contain unique keywords. - -### πŸ›  Changed -- Adjusted internal path resolution to work from both installed and development environments. -- Improved `load_all_tag_maps()` with caching and better error resilience. - -### 🧹 Maintenance -- Cleaned duplicate mappings within YAML files. -- Improved documentation and inline docstrings throughout importer modules. - ---- - -## [v0.3.2] β€” TPDB Bridge Integration (2025-10-21) - -### ✨ Added -- Introduced **`tpdb_bridge.py`** for importing performer data from *ThePornDB* API. -- Added local **SQLite performer database** under `/src/importer/db/performers.db`. -- Added commands: - - `fetch` β€” Import performers in a single batch. 
- - `fill-index` β€” Continuously pull until a limit is reached. - - `enrich` β€” Fetch and merge extended performer metadata. - - `sync-all` β€” Hybrid incremental fetch + enrich loop. -- Introduced **local API key management** using `tpdb_api_key.txt` under `/secrets/`. - -### 🧹 Maintenance -- Verified importer against TPDB rate limits and ensured safe error recovery. -- Added initial test data exports to `/src/importer/reports/`. - ---- - -## [v0.3.3] β€” YAML Tag Inference Update (2025-10-20) - -### ✨ Added -- Dynamic **YAML tag dictionary loader** for modular tag categories. -- Introduced **automatic source inference** for common networks. -- Added **`refresh-all`** bulk operation to reapply tag inference globally. - -### πŸ›  Changed -- Refactored `infer_tags()` to merge results from multiple YAML files dynamically. -- Enhanced progress and summary reporting for tag inference. - -### 🧹 Maintenance -- Fixed `AttributeError: 'int' object has no attribute 'lower'` when parsing numeric YAML values. -- Standardized internal naming conventions. - ---- - -## [v0.3.4] β€” Tag Dictionary Validation & Cleanup (2025-10-20) - -### ✨ Added -- **`validate-tags`** CLI command for verifying YAML tag dictionaries. - - Detects duplicates, empty entries, and conflicting keywords. - - Outputs detailed summaries with per-keyword conflict listings. - -### πŸ›  Changed -- Standardized YAML structure enforcement (consistent key capitalization and layout). -- Added human-readable validation summaries. - -### 🧹 Maintenance -- General code cleanup and consistent logging system updates. - ---- - -## [v0.3.5] β€” Tag Statistics & Unified CLI Update (2025-10-20) - -### ✨ Added -- **Tag Statistics System** - - Introduced `tag-stats` command to generate frequency analytics across all gallery metadata. - - Produces both console summaries and saved reports: - - `reports/tag_stats.json` β€” JSON-formatted tag counts. - - `reports/tag_stats_sorted.txt` β€” human-readable ranked list. 
-- **Unified CLI Interface (`cli.py`)** - - Consolidated all tagging and maintenance operations into a single entrypoint: - - `refresh-all`, `refresh-one`, `validate-tags`, `tag-stats`, `list`, `list-tags`, `add`, `remove`, `add-multi`, `show-metadata`, `source` - - Standardized command syntax and output formatting across all operations. - -### πŸ›  Changed -- Centralized tag frequency logic into `tag_gallery.py`. -- Refactored CLI dispatch system for scalability and better error handling. -- Standardized output style (headers, dividers, alignment). - -### 🧹 Maintenance -- Automatic creation of `/src/importer/reports/` when missing. -- Verified all tag operations across 60+ galleries. -- Unified terminology and capitalization across CLI help text and docstrings. - -### 🧭 Next Steps -- Add color-coded CLI output for readability. -- Implement `--export-csv` flag for `tag-stats` output. -- Begin roadmap for **v0.4.0** introducing ML-based tag confidence scoring and category weighting. - ---- - -## [v0.3.6] β€” Enrichment Verification & Freshness Tracking (2025-10-26) - -### ✨ Added -- **verify-enrichment command** - - Scans performer database for missing metadata (e.g., `url`, `last_updated`). - - Reports enriched vs incomplete entries, with preview via `--show-missing`. -- **Freshness tracking** - - Displays oldest and most recent enrichment timestamps. - - Warns if data is older than the freshness threshold. -- **Automatic TPDB key validation** - - Checks for valid API key and provides setup help if missing. - -### πŸ›  Changed -- Enrichment logic now guarantees `url` and `last_updated` fields for all performers. -- Improved emoji-based CLI logs for clarity. -- CLI outputs enrichment stats after each batch during `sync-all`. - -### 🧹 Maintenance -- Cleanup and refactor of `tpdb_bridge.py` for readability and modular design. -- Verified completeness: **5,087 performers enriched** and up to date. 
-- Improved sleep timing and network error recovery during long sync runs. - -### 🧭 Next Steps -- Add `--stale-days` CLI flag for user-defined freshness thresholds. -- Implement automatic enrichment scheduling via cron or systemd. -- Add shortcut alias `porndex-importer verify` for database status checks. - ---- - -[v0.3.7] β€” Scene-Based Enrichment & Channel Auto-Upgrade (2025-10-26) -✨ Added - -Scene-based enrichment system - -New flag --use-scenes enables intelligent inference of performer studios/channels using recent scene data from ThePornDB. - -Automatically scans /performers/{id}/scenes for studio, site, or network fields when direct metadata is missing. - -Dynamically upgrades performer entries from β€œUnknown” to valid channel names (e.g., β€œDesire Room”, β€œI Want Clips: Princess Chanel”). - -Enhanced enrichment diagnostics - ---debug-channels now outputs detailed channel inference logs with origin type (e.g., β€œvia scene” or β€œvia performer metadata”). - -Emoji-coded output for improved clarity: - -🎞 Scene-based upgrades - -🎬 Direct metadata - -⚫ Missing channel info - -Progress verification - -verify-enrichment now reports precise completion percentages and lists the most recent 20 upgraded performers. - -πŸ›  Changed - -Enrichment process now performs automatic in-place upgrades of performer_sources without overwriting other fields. - -Optimized query logic to prioritize unverified performers and handle large datasets efficiently. - -Added fine-grained sleep control between API requests to stay compliant with TPDB rate limits. - -🧹 Maintenance - -Refactored enrichment functions for modularity: - -_fetch_studio_from_scenes() introduced for scene scanning. - -Simplified argument handling and enriched exception tracing. - -Verified enrichment stability across 100 performers with 44% successful channel discovery in live test. - -Improved timestamp consistency in verification logs and upgraded database schema resilience. 
- -[v0.4.2] β€” Unified Importer, ML Pipeline, and Semantic Search (2025-10-27) -✨ Added - -Unified Importer CLI (porndex-importer) - -Replaces legacy multi-script workflow with a single command entrypoint. - -Introduced import, refresh-all, refresh-one, validate-tags, tag-stats, and source subcommands. - -Includes colorized CLI summaries and consistent emoji headers. - -Machine Learning Dataset Builder - -New module: ml/ml_dataset_builder.py - -Generates structured dataset in ML/porndex_dataset.jsonl from all indexed galleries. - -Each record includes title, models, tags, and image paths for hybrid ML ingestion. - -Embedding Generation Module - -Added ml/ml_embeddings.py to create hybrid text + image embeddings. - -Builds per-gallery NPZ files under ML/embeddings/ and a consolidated embeddings_index.jsonl. - -Supports configurable --img-samples and automatic device detection (--device auto). - -Semantic & Strict Search - -search command supports three modes: - -semantic: CLIP + text hybrid cosine similarity (default) - -text: text-only vector space search - -strict: literal match filtering before vector ranking - -Results show top-ranked galleries, confidence scores, and gallery IDs. - -ML Verification Command - -verify confirms index consistency, embedding count, and file integrity. - -Directory Auto-Creation - -Automatically generates ML/embeddings/ and ML/ if missing. - -πŸ›  Changed - -Importer Pipeline Refactor - -Moved all CLI handling into src/importer/cli.py. - -Centralized environment setup and config loading. - -Replaced direct Python script calls with porndex-importer entrypoint. - -Tagging System - -Unified YAML dictionary loading for clothing, acts, body, and context. - -Improved tag inference logging and duplicate suppression. - -Output Formatting - -Standardized headers, dividers, and indentation across all CLI commands. - -Added readable time and path indicators for long-running operations. 
- -🧹 Maintenance - -Verified full ML dataset build across 150 test galleries (100% JSONL completion). - -Added fallback for empty or missing image lists in dataset builder. - -Improved error handling for partial downloads and interrupted imports. - -Streamlined path resolution for consistent operation across dev and installed modes. - -Updated documentation: - -/docs/CLI_USAGE.md rewritten for v0.4.2. - -/README.md modernized with full project tree and ML pipeline overview. - -🧭 Next Steps - -Begin v0.4.3–v0.5.x roadmap: - -Integrate GroundingDINO + GroundedSAM for visual region detection. - -Implement attribute extraction (gender β†’ ethnicity β†’ clothing). - -Build visual verification tool (ml_dataset_inspector.py). - -Add tag-confidence weighting system. - -Extend TPDB bridge to cross-link enriched performer metadata into ML training records. - -🧩 Summary of Current State (as of v0.4.2) - -βœ… Fully unified CLI under porndex-importer -βœ… Stable YAML tagging + validation -βœ… Complete ML dataset and embedding generation workflow -βœ… Working hybrid semantic search -βœ… Verified 150-gallery dataset index - -Β© 2025 Leak Technologies β€” Porndex Importer Project \ No newline at end of file +File: docs/CHANGELOG.md +Version: v0.3.4 +Last updated: November 2025 +Maintainer: Leak Technologies +Project: Goondex + +------------------------------------------------------------ +πŸ“œ Goondex β€” Full Changelog +------------------------------------------------------------ +Repository: Leak Technologies +Branch: main +Version Line: v0.3.x Development Cycle + +------------------------------------------------------------ +[v0.3.4] β€” Tagging Logic & Documentation Overhaul (2025-11-07) +------------------------------------------------------------ + +πŸ›  Fixes +- Rebuilt tag parsing and inference logic for PornPics galleries. +- Prevented accidental tag overwrites during gallery refreshes. +- Improved duplicate tag merging, case normalization, and category alignment. 
+
+📘 Documentation
+- Added `TAGGING.md` with full explanation of YAML tag inference.
+- Added `ARCHITECTURE.md` outlining Goondex’s module hierarchy.
+- Added `GALLERIES.md` detailing folder naming and metadata schema.
+- Updated `README.md`, `ROADMAP.md`, and `BRANDING.md` to match v0.3.4 project direction.
+
+🧹 Maintenance
+- Added `--show-tags` debug flag to CLI for verifying tag inference results.
+- Improved path handling, error catching, and overall refresh stability.
+- Normalized tag capitalization across YAML dictionaries and output summaries.
+
+🧭 Next Steps
+- Implement category weighting for tag relevance scoring.
+- Introduce interactive tag inspector CLI (`goondex inspect-tags`).
+- Begin work on validation of inferred vs manual tag consistency.
+
+------------------------------------------------------------
+[v0.3.3] — Stable CLI Alias & Import Path Fix (2025-11-06)
+------------------------------------------------------------
+
+✨ Added
+- Added unified `goondex` CLI alias for Fish, Bash, and Zsh environments.
+- Introduced `--help` and `--version` flags with consistent colorized output.
+- Improved help text readability and formatting for all commands.
+
+🛠 Changed
+- Standardized all imports to `src.importer.*` format for compatibility.
+- Updated subprocess calls to use `src.importer.tag_gallery`.
+- Simplified alias setup scripts under `/src/utils/install_alias.sh` and `/src/utils/install_alias.fish`.
+
+🧹 Maintenance
+- Rebuilt virtual environment under Python 3.13.
+- Verified clean execution of CLI across all supported shells.
+- Confirmed consistent `goondex --version` and `goondex --help` output.
+
+------------------------------------------------------------
+[v0.3.2-rebuild] — Repository Cleanup & Stabilization (2025-11-02)
+------------------------------------------------------------
+
+✨ Added
+- Introduced `.gitignore` to exclude gallery media and ML assets.
+- Added `VERSION` file for synchronized CLI and metadata versioning.
+- Fixed Fish-shell virtualenv activation behavior.
+
+🧹 Maintenance
+- Removed redundant commits and outdated files from older Porndex lineage.
+- Normalized repository structure to a clean, modular state.
+- Established new base branch for Goondex v0.3.x development.
+
+------------------------------------------------------------
+[v0.3.1] — CLI Polishing & Internal Improvements (2025-10-19)
+------------------------------------------------------------
+
+✨ Added
+- Added unified CLI argument parsing with `argparse`.
+- Introduced verbose mode for debugging (`--verbose`).
+
+🛠 Changed
+- Improved internal path resolution for dev vs installed modes.
+- Enhanced YAML loader fault tolerance and caching.
+
+🧹 Maintenance
+- Cleaned redundant imports and improved logging consistency.
+- Added internal docstrings for importer functions.
+
+------------------------------------------------------------
+[v0.3.0] — Goondex Framework Foundation (2025-10-18)
+------------------------------------------------------------
+
+✨ Added
+- Established base project structure under `/src/importer/`.
+- Implemented initial gallery importer for PornPics.com.
+- Introduced modular YAML tag dictionary system.
+- Added basic CLI commands:
+  - `import <url>`
+  - `refresh-all`
+  - `refresh-one`
+  - `validate-tags`
+  - `tag-stats`
+  - `list`, `list-tags`, `add`, `remove`, `show-metadata`, `source set`
+
+🛠 Changed
+- Reorganized importer modules for clarity and testability.
+
+🧹 Maintenance
+- Set up `docs/`, `reports/`, and `assets/` directories.
+- Created initial `CHANGELOG.md` and `README.md`.
+
+------------------------------------------------------------
+📦 Legacy Development — Porndex Importer (2024–2025)
+------------------------------------------------------------
+_The following section documents the earlier development cycle
+that led to the creation of Goondex. The system used a different
+tagging and metadata architecture before the full rebuild in
+late 2025. These entries remain for historical and archival
+purposes only._
+
+------------------------------------------------------------
+[v0.2.x] — TPDB Integration & Enrichment Phase (2025-04 → 2025-10)
+------------------------------------------------------------
+- Integrated ThePornDB API for performer enrichment.
+- Added `fetch`, `fill-index`, `enrich`, and `sync-all` commands.
+- Created SQLite database for performer metadata.
+- Added automatic API key validation and freshness checks.
+- Verified thousands of enriched performers with timestamp logging.
+
+------------------------------------------------------------
+[v0.1.x] — Modular Tagging System Prototype (2024-12 → 2025-03)
+------------------------------------------------------------
+- Introduced first YAML-based tag dictionaries for clothing, acts, and body type.
+- Implemented prototype tag inference pipeline using keyword heuristics.
+- Added basic CLI interface for gallery tagging and metadata refresh.
+- Created `refresh-all`, `refresh-one`, and `validate-tags` operations.
+- Implemented early tag frequency statistics and conflict validation.
+
+------------------------------------------------------------
+[v0.0.x] — PornPics Importer Foundations (2024)
+------------------------------------------------------------
+- Built initial gallery importer for PornPics.com.
+- Implemented threaded image downloading with metadata.json output.
+- Added local caching and source indexing system.
+- Developed basic tag extraction based on gallery titles and captions.
+- Established early directory structure under `/src/importer/`.
+
+------------------------------------------------------------
+🧩 Legacy Summary
+------------------------------------------------------------
+The Porndex Importer laid the groundwork for gallery parsing,
+basic tagging, and performer enrichment, but its architecture
+was replaced by the more robust, modular Goondex framework in
+October–November 2025. Goondex introduces a new YAML-based
+tagging model, a cleaner CLI, and improved documentation
+standards across all modules.
+
+------------------------------------------------------------
+© 2025 Leak Technologies — Goondex Project
+------------------------------------------------------------
diff --git a/docs/CLI_USAGE.md b/docs/CLI_USAGE.md
index a25aa38..6f60e72 100644
--- a/docs/CLI_USAGE.md
+++ b/docs/CLI_USAGE.md
@@ -1,172 +1,184 @@
-# 🎩 PornPics Importer — CLI Usage Guide
-### Version 0.4.2 — Import, Auto-Tag & ML Integration
+File: docs/CLI_USAGE.md
+Version: v0.4.2
+Last updated: November 2025
+Maintainer: Leak Technologies
+Project: Goondex
----
+------------------------------------------------------------
+Goondex CLI Usage Guide
+------------------------------------------------------------
-## 📦 Overview
+Purpose:
+Provide a full command reference for importing, tagging, validating, and searching PornPics galleries using the Goondex command-line interface.
-Tooling to:
+------------------------------------------------------------
+Overview
+------------------------------------------------------------
-- Import & refresh PornPics.com galleries
-- Auto-tag via YAML dictionaries and unified CLI
-- Manage sources/tags and gallery index
-- Build ML datasets & hybrid (text+image) embeddings
-- Run semantic / strict search over your library
+The Goondex CLI provides a unified workflow to:
+
+1. Import and refresh PornPics galleries
+2. Automatically tag galleries using YAML dictionaries
+3. Manage sources and metadata through a single command entrypoint
+4. Generate statistics and validation reports
+5. Build and search machine learning datasets (hybrid text + image)
 Project root: ~/Projects/PD/PornPics_Importer/Porndex_PornpicsImporter/
-yaml
-Copy code
+------------------------------------------------------------
+1. Importing Galleries
+------------------------------------------------------------
----
+Quick Import (preferred):
+goondex import "https://www.pornpics.com/galleries/<gallery-id>/"
-## 🧬 1) Importing Galleries
+Process:
+- Creates a new folder in Galleries/<timestamp>_<models>_<title>/
+- Downloads all images (threaded)
+- Saves metadata.json
+- Auto-tags the gallery using refresh-one
+- Rebuilds the global index
-### Quick Import (preferred)
-
-```bash
-porndex-importer import "https://www.pornpics.com/galleries/<gallery-id>/"
-What happens:
-
-Saves to Galleries/<timestamp>_<models>_<title>/
-
-Downloads images (threaded) and writes metadata.json
-
-Auto-tags the gallery (refresh-one) and rebuilds the index
-
-Prints a colorized gallery summary
-
-Legacy (direct script)
-bash
-Copy code
+Legacy method:
 python src/importer/gallery_importer.py "https://www.pornpics.com/galleries/<gallery-id>/"
-🔁 2) Refreshing Metadata
-Refresh all galleries
-bash
-Copy code
+
+------------------------------------------------------------
+2. Refreshing Metadata
+------------------------------------------------------------
+
+Refresh all galleries:
 python src/importer/gallery_importer.py --refresh-all
-Re-fetches metadata for every gallery with source_url
-Merges fields (preserves local tags)
+Function:
+- Re-fetches metadata for every gallery that has a source_url
+- Merges new fields without overwriting local tags
+- Automatically re-applies tag inference
+- Rebuilds Galleries/index.json
-Auto-reapplies tag inference
+------------------------------------------------------------
+3. Tag Management
+------------------------------------------------------------
-Updates Galleries/index.json
+Unified syntax:
+goondex <command> [args...]
-🎟️ 3) Tag Management (via unified CLI)
-bash
-Copy code
-porndex-importer <command> [args...]
-Common commands:
+Common operations:
+refresh-all → refresh tags for all galleries
+refresh-one "<folder>" → refresh tags for a single gallery
+validate-tags → validate YAML tag dictionaries
+tag-stats → generate frequency report (saved to src/importer/reports)
+list → list all galleries
+list-tags "<folder>" → show tags for one gallery
+add "<folder>" "Tag" → add a tag manually
+remove "<folder>" "Tag" → remove a tag manually
+add-multi "<folder>" "Tag1,Tag2" → add multiple tags at once
+show-metadata "<folder>" → view metadata.json content
+source "<folder>" set "Source" → set a single source
+source bulk set "Source" → set the same source for all galleries
-Action Command
-Refresh all porndex-importer refresh-all
-Refresh one porndex-importer refresh-one "<folder>"
-Validate YAML dictionaries porndex-importer validate-tags
-Tag statistics (reports to /src/importer/reports) porndex-importer tag-stats
-List galleries porndex-importer list
-List tags (one) porndex-importer list-tags "<folder>"
-Add tag porndex-importer add "<folder>" "TagName"
-Remove tag porndex-importer remove "<folder>" "TagName"
-Add multiple porndex-importer add-multi "<folder>" "Tag1,Tag2"
-Show metadata porndex-importer show-metadata "<folder>"
-Set source (single) porndex-importer source "<folder>" set "Brazzers"
-Set source (bulk) porndex-importer source bulk set "PornPics"
+Tag inference uses YAML dictionaries stored under:
+src/importer/tag_dictionaries/
-Tag inference uses YAML dictionaries under src/importer/tag_dictionaries/ (clothing, acts, body, context, etc.).
+------------------------------------------------------------
+4. TPDB Performer Bridge (optional)
+------------------------------------------------------------
-🧫 4) TPDB Performer Bridge (optional)
-bash
-Copy code
+Command:
 python -m performers.tpdb_bridge <cmd> [flags]
-Highlights:
-check-key, fetch, fill-index, enrich, sync-all
+Common flags:
+check-key, fetch, fill-index, enrich, sync-all
+list-sources, add-source, delete-source
+verify-enrichment --export-json
-list-sources, add-source, delete-source
+Database:
+src/importer/db/performers.db
-verify-enrichment --export-json
+Reports:
+src/importer/reports/
-Stores DB at src/importer/db/performers.db and reports to src/importer/reports/.
+------------------------------------------------------------
+5. Example Workflow
+------------------------------------------------------------
-⚙️ 5) Example Workflow
-bash
-Copy code
-# Import a gallery
-porndex-importer import "https://www.pornpics.com/galleries/<id>/"
+Import a gallery:
+goondex import "https://www.pornpics.com/galleries/<id>/"
-# Refresh tags for one folder (if you edited metadata)
-porndex-importer refresh-one "<folder-name>"
+Refresh tags for one folder:
+goondex refresh-one "<folder-name>"
-# Validate YAML dictionaries
-porndex-importer validate-tags
+Validate YAML dictionaries:
+goondex validate-tags
-# Build tag stats
-porndex-importer tag-stats
-🤖 6) Machine Learning (ML) Pipeline
-Build dataset (reads from Galleries/, no file moves)
-bash
-Copy code
+Generate tag statistics:
+goondex tag-stats
+
+------------------------------------------------------------
+6. Machine Learning (ML) Pipeline
+------------------------------------------------------------
+
+Dataset builder:
 python -m ml.ml_dataset_builder
-Creates ML/porndex_dataset.jsonl with records:
-json
-Copy code
+Creates file:
+ML/porndex_dataset.jsonl
+
+Example entry:
 {
   "gallery_id": "...",
   "title": "...",
   "models": ["..."],
   "tags": ["..."],
   "categories": ["..."],
-  "image_paths": [".../Galleries/.../images/001.jpg", "..."]
+  "image_paths": [".../Galleries/.../001.jpg"]
 }
-Build hybrid embeddings (text + image)
-bash
-Copy code
+
+Build hybrid embeddings:
 python -m ml.ml_embeddings build --img-samples 8 --device auto
+
 Outputs:
-
-ML/embeddings/<gallery_id>.npz (text / image / combined vectors)
-
+ML/embeddings/<gallery_id>.npz
 ML/embeddings_index.jsonl
-Search (semantic / text / strict)
-bash
-Copy code
-# Hybrid semantic (default)
-python -m ml.ml_embeddings search "japanese redhead creampie"
-
-# Text-only space
-python -m ml.ml_embeddings search "japanese redhead creampie" --index text
-
-# Strict keyword pre-filter (title/tags must include all tokens)
+Search modes:
+python -m ml.ml_embeddings search "japanese redhead creampie"
+python -m ml.ml_embeddings search "japanese redhead creampie" --index text
 python -m ml.ml_embeddings search "interracial bbc" --mode strict
-Verify
-bash
-Copy code
+
+Verify embedding integrity:
 python -m ml.ml_embeddings verify
-🗂️ 7) Data Locations
-Path Purpose
-Galleries/ Imported galleries (images + metadata)
-Galleries/index.json Library index
-src/importer/reports/ Tag stats & TPDB reports
-ML/porndex_dataset.jsonl ML dataset source
-ML/embeddings/ NPZ vectors
-ML/embeddings_index.jsonl Search index
-🧭 8) Roadmap (post-v0.4.2)
-GroundingDINO + Grounded-SAM for localized detections (people, clothing)
+------------------------------------------------------------
+7. Data Locations
+------------------------------------------------------------
-Attribute heads for gender → ethnicity → clothing brand (e.g., socks)
+Galleries/ → imported galleries and images
+Galleries/index.json → master index of all galleries
+src/importer/reports/ → YAML validation and statistics reports
+ML/porndex_dataset.jsonl → ML dataset definition
+ML/embeddings/ → embedding vector files
+ML/embeddings_index.jsonl → search index for semantic lookups
-Active-learning loop to leverage existing metadata as weak labels
+------------------------------------------------------------
+8. Roadmap (post-v0.4.2)
+------------------------------------------------------------
-🧾 Notes
-All commands are local/offline friendly.
+- Integrate GroundingDINO + Grounded-SAM for localized object detection
+- Add attribute heads for gender, ethnicity, and clothing
+- Develop an active-learning loop to refine weakly-labeled data
+- Introduce interactive tag editor for review and correction
-Rebuilding dataset/embeddings is safe and idempotent.
+------------------------------------------------------------
+Notes
+------------------------------------------------------------
-Importer auto-tags on import/refresh using the YAML dictionaries.
+All commands operate locally and offline.
+Rebuilding datasets and embeddings is safe and idempotent.
+Importer auto-tags new galleries using YAML dictionaries by default.
+All modules adhere to the clean modular design outlined in ARCHITECTURE.md.
+Versioned documentation ensures clarity between CLI and code versions.
-Author: Leak Technologies • License: MIT (internal research)
\ No newline at end of file
+------------------------------------------------------------
+End of File
+------------------------------------------------------------
diff --git a/docs/GALLERIES.md b/docs/GALLERIES.md
new file mode 100644
index 0000000..f4a2e3f
--- /dev/null
+++ b/docs/GALLERIES.md
@@ -0,0 +1,197 @@
+File: docs/GALLERIES.md
+Version: v0.3.4
+Last updated: November 2025
+Maintainer: Leak Technologies
+Project: Goondex
+
+------------------------------------------------------------
+Goondex Gallery Structure and Metadata Specification
+------------------------------------------------------------
+
+Purpose:
+Define how galleries are stored, named, and structured within the Goondex system.
+This document standardizes the folder layout, metadata schema, and indexing process to ensure all galleries are consistent and compatible with importer, tagger, and ML modules.
+
+------------------------------------------------------------
+1. Directory Structure
+------------------------------------------------------------
+
+All galleries are stored under:
+~/Projects/PD/Goondex/Galleries/
+
+Each imported gallery is placed in its own folder:
+Galleries/<timestamp>_<model(s)>_<short_title>/
+
+Example:
+Galleries/20251106_1032_Mariella_Sun_Takes_A_Shower/
+
+Inside each gallery folder:
+metadata.json → Core metadata record
+001.jpg, 002.jpg, ... → Sequentially numbered images
+failed_downloads.json (opt.) → List of images that failed to download
+thumbnail.jpg (future) → Designated cover image for previews
+
+A global index is maintained at:
+Galleries/index.json
+
+------------------------------------------------------------
+2. Naming Convention
+------------------------------------------------------------
+
+The importer automatically generates folder names using:
+<timestamp>_<models>_<shortened_title>
+
+Rules:
+- Timestamp format: YYYYMMDD_HHMM (UTC local)
+- Model names separated by underscores
+- Title truncated to 40 characters max
+- Illegal filesystem characters replaced with underscores
+- Spaces are converted to underscores
+
+This ensures folder names remain unique, sortable, and descriptive.
+
+------------------------------------------------------------
+3. Metadata Specification (metadata.json)
+------------------------------------------------------------
+
+Each gallery includes a metadata.json file containing descriptive fields.
+
+Example:
+{
+    "title": "Mariella Sun Takes A Shower",
+    "models": ["Mariella Sun"],
+    "categories": ["Amateur", "Shower", "Solo"],
+    "tags": ["Blonde", "Teen", "Wet", "Shower", "Outdoor"],
+    "inferred_tags": ["Amateur", "Solo", "Wet"],
+    "source_url": "https://www.pornpics.com/galleries/12345/",
+    "source": { "network": "PornPics", "channel": null },
+    "views": 5421,
+    "rating": 4.8,
+    "image_count": 52,
+    "image_urls": [
+        "https://cdn.pornpics.com/2025/11/12345_001.jpg",
+        "https://cdn.pornpics.com/2025/11/12345_002.jpg"
+    ],
+    "import_path": "~/Projects/PD/Goondex/Galleries/20251106_1032_Mariella_Sun_Takes_A_Shower",
+    "last_refreshed": "2025-11-06T15:40:21Z"
+}
+
+------------------------------------------------------------
+4. Field Definitions
+------------------------------------------------------------
+
+title
+→ Human-readable title as extracted from the source site.
+
+models
+→ List of performer names detected or scraped from metadata.
+
+categories
+→ Source site’s categorical labels (if available).
+
+tags
+→ All user and inferred tags combined.
+
+inferred_tags
+→ Tags automatically added by the tag_gallery.py module.
+
+source_url
+→ The original URL used for import.
+
+source
+→ Object with optional "network" and "channel" fields.
+  Example: { "network": "PornPics", "channel": null }
+
+views
+→ Scraped view count from the source (if available).
+
+rating
+→ Normalized 0–5 rating (float).
+
+image_count
+→ Number of valid images downloaded.
+
+image_urls
+→ Full list of image URLs (for re-download or verification).
+
+import_path
+→ Absolute path where this gallery is stored locally.
+
+last_refreshed
+→ ISO 8601 timestamp marking last metadata update.
+
+------------------------------------------------------------
+5. Index File
+------------------------------------------------------------
+
+Galleries/index.json is rebuilt after each import or refresh operation.
+It includes essential details for quick CLI lookups and searches.
+
+Example structure:
+{
+    "galleries": [
+        {
+            "folder": "20251106_1032_Mariella_Sun_Takes_A_Shower",
+            "title": "Mariella Sun Takes A Shower",
+            "models": ["Mariella Sun"],
+            "tags": ["Blonde", "Teen", "Shower"],
+            "source": "PornPics",
+            "image_count": 52,
+            "last_refreshed": "2025-11-06T15:40:21Z"
+        }
+    ]
+}
+
+------------------------------------------------------------
+6. Refresh and Rebuild Process
+------------------------------------------------------------
+
+When running:
+goondex import <url>
+→ Imports gallery, creates metadata.json, downloads images, auto-tags.
+
+goondex refresh-one <folder>
+→ Re-runs tag inference and updates metadata fields.
+
+goondex refresh-all
+→ Applies inference and updates to all galleries under Galleries/.
+
+After any import or refresh, index_builder.py:
+- Scans all folders for metadata.json
+- Builds Galleries/index.json
+- Removes stale entries
+- Reports summary to console
+
+------------------------------------------------------------
+7. Cache and Error Handling
+------------------------------------------------------------
+
+failed_downloads.json
+→ Written if any images fail to download.
+→ Contains URL and error message for each failure.
+
+Re-importing a gallery with the same title merges metadata:
+- Local tags are preserved
+- Missing fields (views, rating, etc.) are filled
+- Downloaded images are skipped if already present
+
+------------------------------------------------------------
+8. Future Enhancements (planned for v0.4.x)
+------------------------------------------------------------
+
+- thumbnail.jpg generation from first image
+- Gallery-level previews for upcoming Web UI
+- Extended metadata (dominant colours, detected subjects)
+- Support for multi-site import with normalized schema
+
+------------------------------------------------------------
+9. Developer Notes
+------------------------------------------------------------
+
+Do not manually rename or move gallery folders after import.
+Always rebuild the index via CLI if metadata is manually edited.
+Each metadata.json should remain human-readable and formatted with indent=4.
+
+------------------------------------------------------------
+End of File
+------------------------------------------------------------
diff --git a/docs/HISTORY.md b/docs/HISTORY.md
new file mode 100644
index 0000000..1082fc1
--- /dev/null
+++ b/docs/HISTORY.md
@@ -0,0 +1,76 @@
+File: docs/HISTORY.md
+Version: v0.3.4
+Last updated: November 2025
+Maintainer: Leak Technologies
+Project: Goondex
+
+------------------------------------------------------------
+📖 Goondex — Project History
+------------------------------------------------------------
+
+### Overview
+Goondex is the modern evolution of the former **Porndex Importer**, a Python-based gallery indexing and tagging system originally focused on PornPics.com.
+Between 2024 and 2025, the project underwent a complete rebuild to address technical debt, unify its codebase, and formalize documentation standards under the Leak Technologies ecosystem.
+
+This file provides a concise historical overview of that transition, outlining the legacy systems, lessons learned, and the motivations behind the current Goondex architecture.
+
+------------------------------------------------------------
+🕰️ 2024 — Origins: PornPics Importer
+------------------------------------------------------------
+The earliest version of the system, built throughout 2024, began as a lightweight tool for automatically downloading and organizing PornPics galleries.
+It featured:
+- Threaded image fetching with metadata export to `metadata.json`.
+- Simple folder-based organization by model and network.
+- Prototype keyword-based tagging based solely on gallery titles.
+
+While functional, it lacked flexibility, configuration, and reliable error recovery.
+This groundwork eventually became the foundation for Porndex.
+
+------------------------------------------------------------
+⚙️ 2024–2025 — Porndex Importer Era
+------------------------------------------------------------
+The project expanded rapidly under the name **Porndex**, introducing YAML-based tag dictionaries and the first CLI-driven workflows.
+Porndex introduced:
+- Early modular tagging using keyword dictionaries.
+- Performer metadata enrichment through ThePornDB API.
+- SQLite-backed performer database for indexing and updates.
+- Validation commands for tag consistency and statistics reporting.
+
+By mid-2025, the system had grown in complexity and scope, but the architecture was increasingly brittle.
+The need for a modular, self-contained, and shell-friendly design became clear — leading to the creation of **Goondex**.
+
+------------------------------------------------------------
+🚀 Late 2025 — The Goondex Rebuild
+------------------------------------------------------------
+In October 2025, the codebase was completely restructured and relaunched as **Goondex**.
+This marked the transition from an experimental importer to a formalized, maintainable platform.
+
+Key advancements introduced in the Goondex rebuild:
+- **Unified CLI:** A single entrypoint (`goondex`) for all operations.
+- **YAML Tagging Framework:** Refined dictionaries for acts, clothing, and body descriptors.
+- **Improved Error Handling:** Safe path operations and better exception tracing.
+- **Cross-Shell Compatibility:** Alias scripts for Fish, Bash, and Zsh environments.
+- **Documentation Suite:** Full set of Markdown docs — `ARCHITECTURE.md`, `TAGGING.md`, `GALLERIES.md`, `ROADMAP.md`, and `BRANDING.md`.
+
+The rebuild also focused on subtlety and modular design — retaining the underlying functionality of the PornPics importer while shedding the “Porndex” branding in favor of a more neutral, system-oriented identity.
+
+------------------------------------------------------------
+🧭 Present — Goondex v0.3.x Development Line
+------------------------------------------------------------
+The v0.3.x cycle focuses on:
+- Robust tagging accuracy and metadata stability.
+- Clean CLI interface and cross-environment consistency.
+- Proper documentation, logging, and version traceability.
+- Preparing the foundation for future ML-assisted tagging modules.
+
+As of **v0.3.4 (November 2025)**, Goondex features a fully functional tagging system, stable CLI aliasing, and a clearly documented repository structure.
+
+------------------------------------------------------------
+🧩 Legacy Acknowledgement
+------------------------------------------------------------
+Goondex owes its foundation to the original Porndex Importer developed in 2024–2025.
+While the old tagging and enrichment systems are no longer active, their core ideas continue to influence Goondex’s modern design philosophy — emphasizing modularity, transparency, and resilience.
+
+------------------------------------------------------------
+© 2025 Leak Technologies — Goondex Project
+------------------------------------------------------------
diff --git a/docs/LICENSE b/docs/LICENSE
new file mode 100644
index 0000000..975fb80
--- /dev/null
+++ b/docs/LICENSE
@@ -0,0 +1,69 @@
+File: LICENSE
+Version: v0.3.4
+Last updated: November 2025
+Maintainer: Leak Technologies
+Project: Goondex
+
+------------------------------------------------------------
+Goondex Project License
+------------------------------------------------------------
+
+Copyright (c) 2025 Leak Technologies
+All rights reserved.
+
+Developed and maintained by Stu Leak and contributors.
+Goondex is a locally hosted research and archival utility designed for automated metadata analysis, machine learning dataset preparation, and intelligent gallery indexing.
+
+------------------------------------------------------------
+Permission and Usage
+------------------------------------------------------------
+
+1. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to use, copy, modify, and merge copies of the Software for **personal, educational, or research purposes only**, subject to the following conditions:
+
+   - The Software **must not** be sold, sublicensed, or distributed commercially.
+   - The Software **must not** be used for or in connection with commercial adult entertainment platforms or profit-seeking ventures.
+   - All redistributions or modifications must retain this notice in full.
+   - Attribution to “Leak Technologies” must remain intact in all derivative works.
+
+2. The Software may contain open-source components licensed under their respective terms.
+   Users are responsible for complying with any additional conditions imposed by such third-party licenses.
+
+------------------------------------------------------------
+Limitations
+------------------------------------------------------------
+
+- The Software is provided strictly for **archival and research** use.
+- Leak Technologies assumes **no liability** for misuse, redistribution, or any legal consequences resulting from the use of the Software.
+- The Software is **not intended for production deployment** on public servers or within commercial frameworks.
+- Any attempt to utilize this system for monetized indexing or distribution of copyrighted materials is **expressly prohibited**.
+
+------------------------------------------------------------
+Warranty Disclaimer
+------------------------------------------------------------
+
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+------------------------------------------------------------
+Ethical Statement
+------------------------------------------------------------
+
+Goondex operates on the principle of **privacy, consent, and local autonomy**.
+It is not a crawler, data harvester, or redistributor.
+Users are expected to import only legally obtained material and respect all content ownership rights.
+
+Any usage that violates laws concerning data ownership, explicit consent, or content distribution invalidates this license.
+ +------------------------------------------------------------ +Summary +------------------------------------------------------------ + +βœ” Personal, private, or research use β€” permitted +✘ Commercial use, resale, redistribution β€” prohibited +βœ” Modification for local use β€” permitted +✘ Cloud or API resale integration β€” prohibited +βœ” Educational publication citing Goondex β€” permitted with attribution + +------------------------------------------------------------ +End of File +------------------------------------------------------------ diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..5c929fe --- /dev/null +++ b/docs/README.md @@ -0,0 +1,176 @@ +File: docs/README.md +Version: v0.3.4 +Last updated: November 2025 +Maintainer: Leak Technologies +Project: Goondex + +------------------------------------------------------------ +Goondex β€” PornPics Importer & ML Pipeline +------------------------------------------------------------ + +A modular, documented gallery importer for PornPics.com, forming the foundation of the Goondex ecosystem. +Supports importing, tagging, metadata enrichment, and generation of ML-ready datasets for semantic search and classification. + +------------------------------------------------------------ +1. Project Overview +------------------------------------------------------------ + +Goondex automates the process of: +- Downloading and organizing galleries from PornPics.com +- Generating structured metadata and tag inference +- Enriching galleries via ThePornDB (TPDB) performer API +- Building machine-learning datasets and embeddings +- Enabling semantic, hybrid (text + image) search + +All operations are handled locally β€” no cloud dependencies or external databases are required. +The system is modular, transparent, and designed for research and personal archival use. + +------------------------------------------------------------ +2. 
Project Structure
+------------------------------------------------------------
+
+src/
+ ├── importer/                    → Core importer logic and CLI tools
+ │   ├── cli.py                   → Unified CLI entrypoint (goondex command)
+ │   ├── gallery_importer.py      → Gallery parser and downloader
+ │   ├── tag_gallery.py           → Tag inference and YAML management
+ │   ├── reports/                 → Auto-generated validation and tag stats
+ │   ├── db/                      → TPDB performer cache and local databases
+ │   ├── secrets/                 → Local-only API keys (ignored by Git)
+ │   └── tag_dictionaries/        → Modular YAML tag dictionaries
+ │
+ ├── ml/                          → Machine learning and semantic search
+ │   ├── ml_dataset_builder.py    → Builds JSONL dataset for embeddings
+ │   ├── ml_embeddings.py         → Generates CLIP + text hybrid vectors
+ │   ├── ml_dataset_inspector.py  → (planned) visual dataset viewer
+ │   └── ml_vision_detector.py    → (planned) DINO + SAM visual tagging
+ │
+ ├── docs/                        → Documentation, changelogs, and brand files
+ ├── tests/                       → Unit and integration testing suite
+ └── assets/                      → Static samples and test assets
+
+------------------------------------------------------------
+3. Environment Setup
+------------------------------------------------------------
+
+Create a virtual environment and install dependencies:
+
+bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+
+Set the source path for development:
+
+bash
+export PYTHONPATH=src
+
+------------------------------------------------------------
+4.
Quick Start
+------------------------------------------------------------
+
+Import a gallery from PornPics:
+
+bash
+goondex import "https://www.pornpics.com/galleries/example-id/"
+
+Automatically:
+- Downloads images and metadata
+- Saves to Galleries/<timestamp>_<model>_<title>/
+- Generates metadata.json
+- Runs auto-tagging (refresh-one)
+- Updates the central gallery index
+
+------------------------------------------------------------
+5. CLI Overview
+------------------------------------------------------------
+
+All commands are run via:
+goondex <command> [args...]
+
+Examples:
+goondex refresh-all
+goondex refresh-one "<folder>"
+goondex validate-tags
+goondex tag-stats
+goondex list-tags "<folder>"
+goondex add "<folder>" "TagName"
+goondex source bulk set "PornPics"
+
+The CLI automatically detects YAML tag dictionaries and applies them during refresh or import.
+
+------------------------------------------------------------
+6. Machine Learning Pipeline
+------------------------------------------------------------
+
+Build dataset:
+bash
+python -m ml.ml_dataset_builder
+
+Output:
+ML/porndex_dataset.jsonl
+
+Each record includes:
+{
+  "gallery_id": "...",
+  "title": "...",
+  "models": ["..."],
+  "tags": ["..."],
+  "categories": ["..."],
+  "image_paths": ["..."]
+}
+
+Build embeddings:
+bash
+python -m ml.ml_embeddings build --img-samples 8 --device auto
+
+Output:
+ML/embeddings/<gallery_id>.npz
+ML/embeddings_index.jsonl
+
+Search:
+bash
+python -m ml.ml_embeddings search "asian redhead solo"
+Modes:
+- semantic (default) - hybrid vector cosine similarity
+- text - text-only search
+- strict - literal keyword matching
+
+Verify:
+bash
+python -m ml.ml_embeddings verify
+
+------------------------------------------------------------
+7.
Development Guidelines
+------------------------------------------------------------
+
+- Use descriptive variable names and structured commits
+- Avoid emojis in code and commit messages
+- Always document new features in docs/CHANGELOG.md
+- Keep CLI text synchronized with docs/CLI_USAGE.md
+- Use version tagging for all major commits
+
+------------------------------------------------------------
+8. Roadmap Summary
+------------------------------------------------------------
+
+Stage        Feature                           Description
+-----------  --------------------------------  -----------------------------
+✅ v0.3.x    Stable CLI & Tagging              Unified CLI and YAML cleanup
+⚙️ v0.4.x    ML Embeddings & Dataset Builder   Build hybrid vectors for search
+⏳ v0.5.x    Visual Intelligence               DINO + SAM + attribute detection
+🔜 v0.6.x    Local Web UI                      Lightweight gallery browser
+🚀 v1.0.0    Full Stable Release               Plugin importers + visual ML tools
+
+------------------------------------------------------------
+9. Licensing
+------------------------------------------------------------
+
+License: Research-Use MIT Variant
+Author: Leak Technologies
+Maintainer: Stu Leak
+For personal, non-commercial, and research use only.
+
+------------------------------------------------------------
+End of File
+------------------------------------------------------------
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
new file mode 100644
index 0000000..5ab3626
--- /dev/null
+++ b/docs/ROADMAP.md
@@ -0,0 +1,160 @@
+File: docs/ROADMAP.md
+Version: v0.3.4
+Last updated: November 2025
+Maintainer: Leak Technologies
+Project: Goondex
+
+------------------------------------------------------------
+Goondex Development Roadmap
+------------------------------------------------------------
+
+Purpose:
+Outline upcoming milestones, version objectives, and long-term development goals for the Goondex ecosystem.
+This roadmap provides an overview of feature direction, architectural priorities, and research-driven enhancements.
+
+------------------------------------------------------------
+1. Project Vision
+------------------------------------------------------------
+
+Goondex is designed as an automated, privacy-respecting adult content cataloguer, focused on:
+- Intelligent tagging and metadata curation
+- Machine-learning assisted gallery organization
+- Local-first, offline-friendly operation
+- Open, modular, and human-readable data formats
+
+The system evolves through iterative versioning with strong emphasis on stability, transparency, and reproducibility.
+
+------------------------------------------------------------
+2. Version Milestones
+------------------------------------------------------------
+
+v0.3.x - Consolidation Phase
+------------------------------------
+Status: Active
+Goals:
+- Finalize CLI alias and stable import structure
+- Standardize metadata.json schema and YAML dictionaries
+- Document all core systems (CLI, Galleries, Tagging, Branding)
+- Implement validation tools for dictionaries and index integrity
+- Ensure consistency across all module imports (src.importer.*)
+- Establish internal branding and developer documentation standards
+
+v0.4.x - Machine Learning Integration
+------------------------------------
+Planned Start: December 2025
+Goals:
+- Introduce ML dataset builder and embedding engine
+- Add hybrid (text + image) search support
+- Implement GroundingDINO + Grounded-SAM detection pipeline
+- Build attribute heads for ethnicity, gender, and clothing
+- Introduce semantic tag inference based on contextual cues
+- Develop auto-thumbnail generator for galleries
+- Establish foundation for future “Goondex ML Core”
+
+v0.5.x - Visual Intelligence and Automation
+------------------------------------
+Planned Start: Q1 2026
+Goals:
+- Expand ML integration to support local fine-tuning
+- Train local model for visual tagging (SAM, CLIP, BLIP2)
+- Enable partial face and body region detection
+- Add scene clustering (e.g., “bathroom
scenes”, “studio sets”)
+- Improve NLP-based title parsing for better model recognition
+- Integrate hybrid similarity search (image-to-gallery)
+
+v0.6.x - Web Interface & UX
+------------------------------------
+Planned Start: Q2 2026
+Goals:
+- Create lightweight local Web UI for browsing and search
+- Add thumbnail preview grid for galleries
+- Support filtering by tag, performer, or source
+- Allow tag editing via UI (writes to metadata.json)
+- Visualize ML embeddings as clusters or heatmaps
+- Introduce color-coded category icons based on tag domains
+
+v0.7.x - Multi-Source Expansion
+------------------------------------
+Planned Start: Q3 2026
+Goals:
+- Add support for multiple import sources (e.g., TheHun, Fapello)
+- Normalize cross-site metadata into unified schema
+- Introduce per-site tag mappings for source-specific categories
+- Develop rate-limiting, retries, and error resilience for scraping
+- Expand YAML dictionaries to include new tag categories
+
+v0.8.x - Semantic Intelligence & AI Curation
+------------------------------------
+Planned Start: Q4 2026
+Goals:
+- Train in-house multimodal model for semantic gallery tagging
+- Support “smart tagging” with probabilistic tag confidence
+- Implement user feedback learning loop for refinement
+- Add multilingual tag inference (English, French, German)
+- Develop automatic duplicate detection and merge logic
+- Add story-based inference (scene context across images)
+
+v0.9.x - Optimization & Deployment
+------------------------------------
+Planned Start: 2027
+Goals:
+- Package as standalone application with installer
+- Implement database indexing for instant search
+- Optimize YAML and JSON parsing for large collections
+- Introduce CLI subcommands for advanced maintenance tasks
+- Add backup, restore, and migration tools
+- Begin Linux packaging (PKGBUILD, Flatpak manifest)
+
+v1.0.0 - Stable Release
+------------------------------------
+Planned Start: 2027
+Goals:
+- Fully
modular architecture with plugin-based importers
+- Complete Web UI parity with CLI functionality
+- Documented API endpoints for local integrations
+- Export system for JSONL / CSV / ML dataset sync
+- Full automated test coverage and build pipeline
+- Public release of “Goondex ML Core” dataset format
+
+------------------------------------------------------------
+3. Research & Experimental Branches
+------------------------------------------------------------
+
+ML-Research Branch:
+- Embedding fusion experiments (text–image hybrid)
+- Visual attribute detection fine-tuning using CLIP variants
+- Performance benchmark on local consumer GPUs
+
+Tag-Lab Branch:
+- Dynamic tag clustering using sentence-transformers
+- Contextual tagging prototype (scene recognition)
+- Human-assisted tag correction feedback loop
+
+Web-UI Branch:
+- Minimalist grid-based gallery explorer
+- Tag filters with real-time search
+- RESTful interface backed by FastAPI
+
+------------------------------------------------------------
+4. Long-Term Goals
+------------------------------------------------------------
+
+- Local inference pipeline fully independent from cloud APIs
+- Optional privacy layer for encrypted gallery indexing
+- On-device fine-tuning for user-specific preferences
+- Extend beyond adult content into broader visual media indexing
+- Formalize Goondex Metadata Specification (GMS 1.0) for interoperability
+
+------------------------------------------------------------
+5.
Development Philosophy
+------------------------------------------------------------
+
+- Local-first: all functions must work offline
+- Transparent: all data stored in readable YAML/JSON
+- Modular: each subsystem must be independently testable
+- Ethical: prioritizes privacy and non-exploitative content handling
+- Accessible: written with clear documentation and open interfaces
+
+------------------------------------------------------------
+End of File
+------------------------------------------------------------
diff --git a/docs/TAGGING.md b/docs/TAGGING.md
new file mode 100644
index 0000000..f548a26
--- /dev/null
+++ b/docs/TAGGING.md
@@ -0,0 +1,180 @@
+File: docs/TAGGING.md
+Version: v0.3.4
+Last updated: November 2025
+Maintainer: Leak Technologies
+Project: Goondex
+
+------------------------------------------------------------
+Goondex Tagging System Documentation
+------------------------------------------------------------
+
+Purpose:
+Define how Goondex handles tagging, tag inference, YAML dictionaries, and validation.
+This document standardizes how tags are generated, stored, and maintained for galleries within the Goondex framework.
+
+------------------------------------------------------------
+1. Overview
+------------------------------------------------------------
+
+The tagging system in Goondex provides:
+- Automatic tag inference based on keywords, metadata, and categories.
+- Human-editable tags for user-defined labeling.
+- YAML dictionaries for consistent terminology and modular configuration.
+- Validation and reporting tools to prevent duplication or conflicting tags.
+
+All tagging logic is implemented in:
+src/importer/tag_gallery.py
+src/importer/tag_utils.py
+src/importer/tag_dictionaries/
+
+------------------------------------------------------------
+2. Tag Categories
+------------------------------------------------------------
+
+Tags are divided into modular YAML dictionaries for clarity and maintainability.
+Each dictionary focuses on a single thematic domain:
+
+tag_dictionaries/
+ body.yml         → physical descriptors (e.g. Blonde, Curvy, Muscular)
+ acts.yml         → sexual acts or positions (e.g. Blowjob, Anal, Doggystyle)
+ clothing.yml     → garments and accessories (e.g. Lingerie, Socks, Latex)
+ context.yml      → settings or environments (e.g. Beach, Office, Shower)
+ fetish.yml       → specific fetish content (e.g. BDSM, Pee Fetish, Bondage)
+ orientation.yml  → sexual orientation or group type (e.g. Straight, Lesbian, Gay)
+
+All dictionaries share a simple key–value structure:
+"keyword": "TagName"
+
+Example (clothing.yml):
+socks: Socks
+panties: Panties
+lingerie: Lingerie
+stockings: Stockings
+
+------------------------------------------------------------
+3. Tag Inference Logic
+------------------------------------------------------------
+
+Automatic tagging is handled by infer_tags() in tag_gallery.py.
+The system scans text data extracted from metadata.json:
+
+- title
+- categories
+- tags (pre-existing)
+- source network and channel
+- optional inferred fields
+
+Process:
+1. Combine text fields into one lowercase text blob.
+2. Search all keywords from every YAML dictionary.
+3. For each keyword match, add the corresponding tag.
+4. Merge inferred tags with existing manual tags.
+5. Save to metadata.json under inferred_tags.
+
+Example:
+Input metadata.title → “Busty Blonde Rides Hard”
+Detected → ["Busty", "Blonde", "Riding", "Hardcore"]
+
+The result:
+"tags": ["Busty", "Blonde", "Riding", "Hardcore"]
+
+------------------------------------------------------------
+4. Manual Tagging
+------------------------------------------------------------
+
+Users can add or remove tags manually through the CLI.
+
+Examples:
+goondex add "<folder>" "Outdoor"
+goondex remove "<folder>" "Solo"
+goondex add-multi "<folder>" "Amateur, Teen, Shaved"
+
+Manual tags are stored in metadata.json under "tags".
+Inferred tags are stored separately under "inferred_tags" to maintain clarity.
+
+------------------------------------------------------------
+5. Tag Validation
+------------------------------------------------------------
+
+Validation ensures all YAML tag dictionaries remain consistent and free of errors.
+
+Run:
+goondex validate-tags
+
+Checks performed:
+- Duplicate keywords within a dictionary
+- Conflicting or identical values across multiple dictionaries
+- Empty entries or malformed YAML
+- Case inconsistencies between similar entries
+
+Outputs:
+src/importer/reports/tag_validation.json
+src/importer/reports/tag_conflicts.txt
+
+CLI Summary Example:
+Files loaded: 6
+Keywords total: 421
+Conflicts: 2
+Duplicates: 4
+Empty entries: 0
+[✓] Validation finished.
+
+------------------------------------------------------------
+6. Tag Statistics
+------------------------------------------------------------
+
+Generate tag frequency statistics across all galleries:
+goondex tag-stats
+
+Outputs:
+src/importer/reports/tag_stats.json
+src/importer/reports/tag_stats_sorted.txt
+
+CLI displays top tags and usage counts:
+1. Teen 42
+2. Blonde 37
+3. Outdoor 29
+4. Lingerie 25
+
+------------------------------------------------------------
+7. Known Limitations
+------------------------------------------------------------
+
+- Keyword overlap between categories can cause false positives (e.g. “English” being inferred from “British”).
+- Contextual interpretation (e.g. “Wet Hair” vs. “Wet”) is not yet implemented.
+- Case-insensitive matching may include unintended words (e.g. “Daddy” vs. “daddy issues”).
+- YAML entries are static - dynamic NLP inference is planned for v0.5.x.
+
+------------------------------------------------------------
+8. Best Practices
+------------------------------------------------------------
+
+- Keep YAML entries lowercase on the left-hand keyword.
+- Use concise and consistent tag names on the right-hand side.
+- Avoid ambiguous single-word tags (e.g. “hot”, “nice”, “pretty”).
+- Run goondex validate-tags before each commit.
+- Do not edit inferred_tags manually - always refresh via CLI.
+- Use add-multi for efficient manual tagging after bulk imports.
+
+------------------------------------------------------------
+9. Future Enhancements (v0.4.x–v0.5.x)
+------------------------------------------------------------
+
+- Implement weighted tagging confidence using NLP models.
+- Integrate GroundingDINO + SAM for visual tagging assistance.
+- Introduce “tag confidence scores” to help refine inference reliability.
+- Develop cross-source tag normalization for multiple site importers.
+- Support user-defined alias groups (e.g. “Ass” = “Butt” = “Booty”).
+
+------------------------------------------------------------
+10. Developer Notes
+------------------------------------------------------------
+
+All tag inference should remain human-readable and reversible.
+The YAML system was chosen for transparency and editability.
+Tags should serve as both descriptive metadata and ML training features.
+Avoid unnecessary expansion - focus on clarity and accuracy over volume.
+
+------------------------------------------------------------
+End of File
+------------------------------------------------------------