v0.3.4-docs-update: finalized documentation suite and version file

Team Goon 2025-11-06 14:29:21 -05:00
parent 89027acd21
commit 35142587c5
13 changed files with 1447 additions and 610 deletions


145
README.md

@ -1,145 +0,0 @@
# 🧠 PornPics Gallery Importer (Porndex System)
**Version 0.4.2 — Unified Importer & ML Pipeline**
A modular and well-documented gallery importer for [PornPics.com](https://www.pornpics.com) built for the **Porndex** ecosystem.
Supports importing, tagging, metadata enrichment, and machine learning-ready dataset generation.
---
## 📂 Project Structure
src/ → Core source
├── importer/ → Gallery importers, tag tools, and TPDB bridge
│ ├── cli.py → Unified CLI (porndex-importer)
│ ├── gallery_importer.py → Gallery parsing/downloading
│ ├── tag_gallery.py → Tag management & YAML dictionaries
│ ├── reports/ → Tag and enrichment summaries
│ ├── db/ → Cached sources & enrichment data
│ ├── secrets/ → API keys and credentials (ignored in Git)
│ └── tag_dictionaries/ → YAML-based tag definitions
├── ml/ → Machine learning modules
│ ├── ml_dataset_builder.py → Build JSONL dataset
│ ├── ml_embeddings.py → Generate CLIP+Text embeddings
│ ├── ml_dataset_inspector.py → Inspect or visualize dataset (planned)
│ └── ml_vision_detector.py → GroundingDINO + SAM integration (planned)
├── docs/ → Documentation & changelogs
├── tests/ → Unit and integration tests
└── assets/ → Static data or sample media
---
## ⚙️ Setup
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
Then, from the root of the project:
```bash
export PYTHONPATH=src
```
---
## 🚀 Quick Start
### Import a Gallery
```bash
porndex-importer import "https://www.pornpics.com/galleries/example-gallery-id/"
```
Automatically:
- Downloads images and metadata
- Saves to `Galleries/<timestamp>_<model>_<title>/`
- Creates `metadata.json`
- Runs auto-tagging (`refresh-one`)
- Updates library index
---
## 🧩 Core Features
| Feature | Description |
|---|---|
| Importer | Downloads and parses galleries from PornPics |
| Auto-Tagging | Generates tags based on YAML dictionaries |
| Metadata Refresh | Updates all galleries with new metadata |
| Source Management | Track and bulk-update content sources |
| CLI Tool | Unified command: `porndex-importer` |
| TPDB Bridge | Enrich performers and metadata via ThePornDB API |
| ML Dataset Builder | Generates a unified dataset (JSONL) |
| Hybrid Embeddings | Builds combined CLIP + text vectors for semantic search |
---
## 🤖 Machine Learning Pipeline
### 1. Build Dataset
```bash
python -m ml.ml_dataset_builder
```
Creates:
```text
ML/porndex_dataset.jsonl
```
Each record includes title, models, tags, and full image paths (no file duplication).
### 2. Build Embeddings
```bash
python -m ml.ml_embeddings build --img-samples 8 --device auto
```
Generates:
```text
ML/embeddings/<gallery_id>.npz
ML/embeddings_index.jsonl
```
Uses:
- SentenceTransformer for text
- OpenCLIP (ViT-B/32) for images

and produces a combined hybrid vector.
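
The README does not say how the text and image embeddings are combined. As a rough sketch of one common approach, assuming each embedding is L2-normalized, weighted, and concatenated, the idea looks like this. The function names and the `text_weight` parameter are hypothetical and are not taken from `ml_embeddings.py`.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [float(x) for x in vec]
    return [x / norm for x in vec]


def hybrid_vector(
    text_vec: Sequence[float],
    image_vec: Sequence[float],
    text_weight: float = 0.5,  # hypothetical knob, not a documented option
) -> List[float]:
    """Concatenate weighted unit-length text and image embeddings (assumed scheme)."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    text_part = [text_weight * x for x in l2_normalize(text_vec)]
    image_part = [(1.0 - text_weight) * x for x in l2_normalize(image_vec)]
    return l2_normalize(text_part + image_part)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

With this scheme a query embedded the same way can be ranked against every gallery by `cosine_similarity`, and `text_weight` shifts the balance between title/tag text and image content.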
### 3. Search Your Library
```bash
# Semantic search (default)
python -m ml.ml_embeddings search "japanese redhead creampie"
# Strict literal search
python -m ml.ml_embeddings search "interracial bbc" --mode strict
```
### 4. Verify Integrity
```bash
python -m ml.ml_embeddings verify
```
Displays:
- Total indexed records
- Images sampled
- NPZ validation summary
---
## 🧠 Development Guidelines
- No emojis in code or commits.
- Use descriptive variable names.
- Commit only verified working features.
- Document all new features in `docs/CHANGELOG.md`.
- Keep docs and CLI output in sync with `docs/CLI_USAGE.md`.
---
## 🗺️ Roadmap (v0.4.x → v0.5.x)
| Stage | Feature | Description |
|---|---|---|
| ✅ | ML Embedding Search | Hybrid text+image similarity |
| ⚙️ | Gender & Ethnicity Detection | Person-level classification |
| ⏳ | GroundingDINO Integration | Object/region localization |
| ⏳ | Grounded SAM + BLIP | Visual attribute extraction (clothing, actions) |
| 🔜 | Active Learning | Re-train from gallery metadata and tags |
---
## 📄 License
MIT — Internal Research Use Only
Author: Leak Technologies


@ -0,0 +1,46 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg
width="600"
height="180"
viewBox="0 0 600 180"
version="1.1"
id="svg1"
inkscape:version="1.4.2 (ebf0e940d0, 2025-05-08)"
sodipodi:docname="GOONDEX_logo.svg"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
<sodipodi:namedview
id="namedview1"
pagecolor="#505050"
bordercolor="#eeeeee"
borderopacity="1"
inkscape:showpageshadow="0"
inkscape:pageopacity="0"
inkscape:pagecheckerboard="0"
inkscape:deskcolor="#505050"
inkscape:document-units="px"
inkscape:zoom="1.5483025"
inkscape:cx="269.00429"
inkscape:cy="227.02282"
inkscape:window-width="1920"
inkscape:window-height="1080"
inkscape:window-x="0"
inkscape:window-y="0"
inkscape:window-maximized="1"
inkscape:current-layer="layer1" />
<defs
id="defs1" />
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1">
<path
id="path26"
style="font-size:48px;font-family:'Gmarket Sans';-inkscape-font-specification:'Gmarket Sans';fill:#ff5fa2;fill-opacity:1;stroke-width:0;stroke-linecap:round;paint-order:markers fill stroke"
d="m 89.6071,50.4885 c -23.10416,0 -39.60743,16.69024 -39.60743,39.51149 0,22.82124 16.50328,39.51151 39.60743,39.51151 22.06683,0 39.04336,-15.65212 39.04336,-37.90755 v -3.39592 h -40.6473 v 8.57996 h 30.08354 c -1.13163,13.67388 -12.63408,23.85962 -28.0997,23.85962 -17.6346,0 -30.08354,-12.82442 -30.08354,-30.64762 0,-17.72891 12.44569,-30.45956 29.89167,-30.45956 12.91947,0 23.57566,7.07122 26.68766,16.59581 h 10.75176 C 123.27385,61.70793 108.84487,50.48851 89.6071,50.4885 Z m 83.73122,0 c -22.91553,0 -39.79544,16.69024 -39.79544,39.51149 0,22.82124 16.97586,39.51151 39.89139,39.51151 11.29137,0 21.11695,-4.05219 28.18029,-10.89376 7.08654,6.84157 16.93497,10.89376 28.22632,10.89376 22.91556,0 39.79544,-16.69026 39.79544,-39.51151 0,-22.82124 -16.97196,-39.51149 -39.88752,-39.51149 -11.29138,0 -21.11698,4.05217 -28.18029,10.89376 -7.08655,-6.84159 -16.9388,-10.89376 -28.23019,-10.89376 z m 156.5227,1.32 v 59.3152 L 284.12556,52.28048 h -9.23996 v 75.53498 h 9.89995 V 68.50023 l 45.73537,58.84324 h 9.3359 V 51.80846 Z m 18.51061,0.47198 v 75.53499 h 26.7796 c 26.0276,0 41.1193,-15.1812 41.1193,-37.71954 0,-0.52548 -0.01,-1.04807 -0.027,-1.56558 -0.041,-1.2646 -0.1283,-2.50346 -0.2647,-3.71824 h -0.01 c -2.2059,-19.60648 -16.8839,-32.53165 -40.82,-32.53165 z m 74.6754,0 v 75.53499 h 54.5072 v -8.77182 h -44.6073 V 93.5839 h 40.4593 v -8.77179 h -40.4593 V 61.04843 h 43.7593 v -8.76795 z m 60.6582,0 26.3116,37.34349 -27.2555,38.1915 h 11.5998 l 21.9717,-30.93156 21.8797,30.93156 h 11.7878 l -27.3476,-38.47545 26.4036,-37.05954 h -11.5039 l -21.0277,29.79956 -20.8436,-29.79956 z m -310.36688,7.25996 c 9.05961,0 16.87419,3.48312 22.25184,9.3704 -3.60762,5.99921 -5.63683,13.17524 -5.63683,21.08915 0,7.89949 2.03062,15.06595 5.64066,21.05848 -5.36636,5.89579 -13.15175,9.40112 -22.15972,9.40112 -17.2574,0 -29.98763,-12.73069 -29.98763,-30.4596 0,-17.8232 12.63428,-30.45955 29.89168,-30.45955 z m 56.41046,0 c 17.25739,0 29.98763,12.63635 29.98763,30.45955 0,17.72891 
-12.73244,30.45958 -29.89553,30.45958 -9.05209,0 -16.85927,-3.50401 -22.23648,-9.39344 3.59921,-5.99469 5.62531,-13.16204 5.62531,-21.06614 0,-7.91897 -2.04458,-15.09915 -5.67135,-21.10067 5.35263,-5.88029 13.13673,-9.35888 22.19042,-9.35888 z m 128.52272,1.60391 h 17.1637 c 17.2271,0 28.4182,8.55424 30.4864,23.66776 h -23.3339 v 8.77179 h 23.5335 c 0.098,-1.12825 0.1497,-2.29113 0.1497,-3.48797 0,1.19665 -0.052,2.35989 -0.1497,3.48797 -1.4059,16.20741 -12.7883,25.27173 -30.686,25.27173 h -17.1637 z M 201.58388,79.12157 c 1.1334,3.32559 1.74589,6.97798 1.74589,10.87842 0,3.87376 -0.60919,7.50886 -1.73821,10.82471 -1.13003,-3.31585 -1.73825,-6.95095 -1.73825,-10.82471 0,-3.90044 0.60423,-7.55286 1.73057,-10.87842 z m -28.99762,21.55347 c -2.1037,1.2e-4 -3.80849,1.70661 -3.80647,3.81032 9e-5,2.10223 1.70425,3.80637 3.80647,3.80649 2.10221,-1.2e-4 3.80637,-1.70426 3.80651,-3.80649 0.002,-2.10371 -1.70281,-3.8102 -3.80651,-3.81032 z m 56.4642,0 c -2.10372,1.2e-4 -3.80849,1.70661 -3.80651,3.81032 1.3e-4,2.10223 1.70425,3.80637 3.80651,3.80649 2.10222,-1.2e-4 3.80637,-1.70426 3.80649,-3.80649 0.002,-2.10371 -1.70279,-3.8102 -3.80649,-3.81032 z" />
</g>
</svg>


142
docs/ARCHITECTURE.md Normal file

@ -0,0 +1,142 @@
File: docs/ARCHITECTURE.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
------------------------------------------------------------
Goondex System Architecture Overview
------------------------------------------------------------
Purpose:
This document outlines the internal structure, key modules, and data flow of Goondex. It defines how importer, tagging, and metadata systems interact, ensuring consistent development practices and clear separation of responsibilities across the codebase.
------------------------------------------------------------
High-Level Overview
------------------------------------------------------------
Goondex is a modular image importer and metadata indexer designed primarily for PornPics galleries.
Its long-term goal is to evolve into a general-purpose adult media cataloguing and tagging framework.
Core functions:
1. Gallery importing and metadata storage
2. Automated tag inference from titles and descriptions
3. Performer and source enrichment (via ThePornDB)
4. Semantic search and ML dataset generation (planned v0.4.x)
5. CLI-driven operations with simple developer alias support
------------------------------------------------------------
Primary Directories
------------------------------------------------------------
src/importer/
cli.py - Main command entrypoint for all user-facing operations.
gallery_importer.py - Handles gallery downloading, metadata management, and index updates.
tag_gallery.py - Manages tagging logic, inference, and validation.
tag_utils.py - Shared utilities for YAML parsing and tag validation.
index_builder.py - Rebuilds gallery index files after import or refresh.
fetch_gallery_metadata.py - Scrapes PornPics galleries for metadata and image URLs.
tpdb_bridge.py - Integrates with ThePornDB API for performer enrichment.
config/ - Contains YAML and JSON config templates for environment paths.
reports/ - Stores generated statistics, tag summaries, and validation logs.
tag_dictionaries/ - Modular YAML tag dictionaries (body, acts, clothing, context).
docs/
BRANDING.md - Defines visual and branding identity.
ARCHITECTURE.md - This file.
CHANGELOG.md - Version history and release notes.
ROADMAP.md - Planned features and milestones.
------------------------------------------------------------
Core Workflow
------------------------------------------------------------
1. Import
- User runs `goondex import <url>` or `python -m src.importer.cli import <url>`.
- The system fetches gallery metadata and image URLs.
- Metadata is saved to disk under Galleries/<timestamp>_<model_name>/.
- Images are downloaded using threaded requests.
2. Tagging
- On import completion, automatic tagging runs via tag_gallery.py.
- Inferred tags are based on YAML dictionaries and keyword matches.
- Users can adjust tags manually or re-run `goondex refresh-one`.
3. Indexing
- After import, index_builder.py rebuilds a global index for CLI listing.
- Index entries include title, models, source, and folder references.
4. Enrichment (optional)
- ThePornDB bridge pulls performer metadata and merges it with local entries.
- Data is stored in a lightweight SQLite database for reusability.
5. Validation
- Tag dictionaries are validated via `goondex validate-tags`.
- Reports are saved to /src/importer/reports/ for long-term tracking.
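
The keyword matching described in step 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in tag_gallery.py: the `TAG_DICTIONARIES` mapping is a hypothetical stand-in for whatever is loaded from the YAML files in tag_dictionaries/, and whole-word, case-insensitive matching is an assumed rule.

```python
import re
from typing import Dict, List, Set

# Hypothetical stand-in for parsed YAML dictionaries:
# category -> tag -> keywords that trigger the tag.
TAG_DICTIONARIES: Dict[str, Dict[str, List[str]]] = {
    "clothing": {
        "Lingerie": ["lingerie", "lace"],
        "Stockings": ["stockings", "stocking"],
    },
    "context": {
        "Outdoor": ["outdoor", "beach"],
        "Office": ["office"],
    },
}


def infer_tags(text: str, dictionaries: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return the sorted tags whose keywords appear as whole words in the text."""
    lowered = text.lower()
    found: Set[str] = set()
    for tags in dictionaries.values():
        for tag, keywords in tags.items():
            for keyword in keywords:
                # str() guards against numeric values parsed from YAML.
                pattern = r"\b" + re.escape(str(keyword).lower()) + r"\b"
                if re.search(pattern, lowered):
                    found.add(tag)
                    break  # one matching keyword is enough for this tag
    return sorted(found)
```

Matching on word boundaries keeps a keyword such as "beach" from firing on "beachfront"; whether the real importer is that strict is not stated in this document.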
------------------------------------------------------------
Data Structure
------------------------------------------------------------
Each gallery folder includes:
metadata.json - Core descriptive data and tags.
failed_downloads.json (optional) - Log of skipped or failed images.
Two metadata.json fields support refresh operations:
inferred_tags - Automatically detected tags, stored separately from user edits.
source_url - Original import link, used by refresh operations.
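
Keeping inferred tags apart from user edits means a refresh can rewrite one list without touching the other. A minimal sketch of that idea follows; the helper names `merge_tags` and `refresh_metadata` are hypothetical, and the case-insensitive de-duplication rule is an assumption.

```python
from typing import Any, Dict, Iterable, List


def merge_tags(manual: Iterable[str], inferred: Iterable[str]) -> List[str]:
    """Manual tags first, then inferred tags not already present (case-insensitive)."""
    merged: List[str] = []
    seen = set()
    for tag in list(manual) + list(inferred):
        cleaned = tag.strip()
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged


def refresh_metadata(metadata: Dict[str, Any], new_inferred: Iterable[str]) -> Dict[str, Any]:
    """Return a copy with inferred_tags replaced; the manual `tags` list is left as-is."""
    updated = dict(metadata)
    updated["inferred_tags"] = merge_tags([], new_inferred)
    return updated
```

A combined view for display or search can then be produced on demand with `merge_tags(meta["tags"], meta["inferred_tags"])`.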
Metadata fields (core schema):
title
models
categories
tags
image_urls
source_url
views
rating
last_refreshed
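
For illustration, the core schema can be modelled as a small dataclass and serialized to the shape of metadata.json. The field names come from the list above; the types, the defaults, and the ISO 8601 format for last_refreshed are assumptions rather than documented behaviour.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class GalleryMetadata:
    """Core metadata.json fields; types and defaults are assumed."""
    title: str
    source_url: str
    models: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    image_urls: List[str] = field(default_factory=list)
    views: Optional[int] = None
    rating: Optional[float] = None
    last_refreshed: Optional[str] = None  # assumed ISO 8601 timestamp


def to_json(meta: GalleryMetadata) -> str:
    """Serialize the record with stable key order for readable diffs."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)
```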
------------------------------------------------------------
CLI Commands (as of v0.3.4)
------------------------------------------------------------
import <url> - Import new gallery from PornPics.
refresh-all - Refresh tags for all galleries.
refresh-one <folder> - Refresh tags for one gallery.
validate-tags - Validate YAML tag dictionaries.
tag-stats - Generate tag frequency report.
list - List all galleries.
list-tags <folder> - List tags for one gallery.
add <folder> <tag> - Add a tag manually.
remove <folder> <tag> - Remove a tag manually.
add-multi - Add multiple tags at once.
show-metadata - Display metadata.json contents.
source set - Set or bulk-set gallery source.
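
A subcommand layout like the one above maps naturally onto `argparse` sub-parsers. The sketch below wires up only a few of the listed commands to show the shape; it is not the contents of cli.py, and the argument names (`url`, `folder`, `tag`) are assumptions based on the placeholders in the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a subset of the commands listed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="goondex")
    subcommands = parser.add_subparsers(dest="command", required=True)

    import_cmd = subcommands.add_parser("import", help="Import new gallery")
    import_cmd.add_argument("url")

    subcommands.add_parser("refresh-all", help="Refresh tags for all galleries")

    refresh_one = subcommands.add_parser("refresh-one", help="Refresh tags for one gallery")
    refresh_one.add_argument("folder")

    add_cmd = subcommands.add_parser("add", help="Add a tag manually")
    add_cmd.add_argument("folder")
    add_cmd.add_argument("tag")

    return parser
```

Calling `build_parser().parse_args(["add", "some_folder", "Outdoor"])` yields a namespace whose `command`, `folder`, and `tag` attributes can be dispatched to the matching handler.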
------------------------------------------------------------
Planned Evolution
------------------------------------------------------------
v0.4.x
- Introduce ML dataset builder for hybrid text and image embeddings.
- Implement semantic search with CLIP model integration.
- Support multiple site importers beyond PornPics.
- Add confidence scoring for auto-tagging accuracy.
v0.5.x
- Implement Web UI with search, tag filters, and visual gallery grid.
- Introduce local model inference (GroundingDINO + SAM).
- Build API layer for remote clients.
------------------------------------------------------------
Design Philosophy
------------------------------------------------------------
Keep it modular, transparent, and locally maintainable.
Every import should leave a clean, readable data trail.
Avoid hard dependencies: rely primarily on the Python standard library, with only essential external libraries (requests, tqdm, PyYAML).
All scripts must remain executable from both CLI and within the src context.
Maintain clean commit history and clearly versioned documentation (as done with this file).
------------------------------------------------------------
End of File
------------------------------------------------------------

109
docs/BRANDING.md Normal file

@ -0,0 +1,109 @@
File: docs/BRANDING.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
------------------------------------------------------------
Goondex Branding Guide
------------------------------------------------------------
Purpose:
Define the current visual identity and colour palette for Goondex, including logo interpretation, UI application, and consistency notes. This file is versioned and updated alongside major CLI and UI revisions.
------------------------------------------------------------
Colour Palette
------------------------------------------------------------
Primary Accent: Flamingo Pulse
Hex: #FF5FA2
Usage: Core brand colour used in the logo, buttons, and key highlights.
Secondary Accent: Electric Plum
Hex: #8C2F5C
Usage: Shadow tone and hover states. Complements Flamingo Pulse.
Highlight Glow: Neon Rose
Hex: #FF9BCB
Usage: Used sparingly for glow or gradient highlight effects.
Background (Dark): Deep Charcoal
Hex: #1E1E1E
Usage: Main dark-mode background for UI and presentation.
Text (Primary): Soft Porcelain
Hex: #E9E9E9
Usage: Default readable text colour on dark backgrounds.
Text (Muted): Ash Gray
Hex: #B0B0B0
Usage: Secondary text, metadata, or inactive UI states.
Print Match (CMYK):
Flamingo Pulse → C:0 M:70 Y:15 K:0
------------------------------------------------------------
Logo Guidelines
------------------------------------------------------------
Wordmark:
Custom type featuring interlinked "OO" characters that form a playful, anatomical suggestion of breasts.
The "D" subtly mirrors a phallic shape, continuing the tongue-in-cheek theme.
The logo remains clean and geometric, ensuring the humour is implied, not explicit.
Primary Colour:
Flamingo Pulse (#FF5FA2)
Backgrounds:
Best contrast on Deep Charcoal (#1E1E1E) or near-black backgrounds.
Minimum Spacing:
Maintain clear space around the logo equal to the height of the "G" on all sides.
Optional Glow:
Apply Neon Rose (#FF9BCB) outer glow at 10-15% opacity for digital assets only.
Avoid glow in print use.
------------------------------------------------------------
Typography
------------------------------------------------------------
Preferred typefaces:
- Montserrat
- Poppins
- Satoshi
Fallbacks:
- Sans-serif system fonts (Arial, Helvetica, etc.)
All-caps for primary titles and logos.
Mixed case for documentation and UI elements.
------------------------------------------------------------
UI Application
------------------------------------------------------------
Buttons:
Flamingo Pulse base colour, with Electric Plum overlay on hover (10-20% opacity).
Links and Highlights:
Flamingo Pulse.
Cards and Panels:
Deep Charcoal background with 1px border in Electric Plum (20% opacity).
Headers and Dividers:
Use Soft Porcelain for text and Electric Plum for thin divider accents.
------------------------------------------------------------
Notes
------------------------------------------------------------
Keep the palette consistent; avoid introducing new accent colours.
Ensure contrast ratios meet accessibility guidelines.
When printing, verify Flamingo Pulse tone accuracy using the CMYK approximation listed above.
Maintain a professional tone with playful undertones. The logo and colour scheme should feel adult, confident, and minimalist — never explicit.
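
The contrast note can be checked numerically. The sketch below applies the WCAG 2.x relative-luminance and contrast-ratio formulas to hex colours; the function names are illustrative, and reading "accessibility guidelines" as WCAG is an assumption.

```python
def _channel_to_linear(value: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = value / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour given as '#RRGGBB'."""
    digits = hex_colour.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a colour in #RRGGBB form")
    red, green, blue = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (
        0.2126 * _channel_to_linear(red)
        + 0.7152 * _channel_to_linear(green)
        + 0.0722 * _channel_to_linear(blue)
    )


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    first = relative_luminance(foreground)
    second = relative_luminance(background)
    lighter, darker = max(first, second), min(first, second)
    return (lighter + 0.05) / (darker + 0.05)
```

For example, `contrast_ratio("#E9E9E9", "#1E1E1E")` checks Soft Porcelain on Deep Charcoal, and `contrast_ratio("#FF5FA2", "#1E1E1E")` checks Flamingo Pulse on the same background. WCAG 2.x level AA asks for at least 4.5:1 for normal-size text.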
------------------------------------------------------------
End of File
------------------------------------------------------------


@ -1,344 +1,159 @@
# 📜 Goondex — Full Changelog
> **Repository:** Leak Technologies
> **Branch:** main
> **Version Line:** v0.3.x Development Cycle
> _Formerly: Porndex Importer (PornPics Importer Module)_
---
## [v0.3.3] — Stable CLI Alias & Import Path Fix (2025-11-06)
### ✨ Added
- Introduced unified `goondex` CLI alias, now functional across Fish, Bash, and Zsh shells.
- Added `--help` and `--version` flags with consistent colorized output.
- Standardized usage and examples block for user clarity.
### 🛠 Changed
- Refactored all internal imports to use absolute `src.importer.*` paths for compatibility.
- Updated `gallery_importer.py` to call `src.importer.tag_gallery` for subprocess calls.
- Simplified alias setup scripts under `/src/utils/install_alias.sh` and `/src/utils/install_alias.fish`.
### 🧹 Maintenance
- Rebuilt virtual environment (`.venv`) and dependency tree under Python 3.13.
- Verified clean CLI operation with `goondex --version` and `goondex --help`.
- Confirmed consistent behavior across development and installed modes.
---
## [v0.3.2-rebuild] — Repository Cleanup & Stabilization (2025-11-02)
### ✨ Added
- Introduced project-wide `.gitignore` to exclude gallery media and model weights.
- Added `VERSION` file (v0.3.2) for synchronized CLI and metadata versioning.
- Implemented environment fix for Fish-shell virtualenv activation.
- Ensured unified `porndex` CLI entrypoint under `/src/importer/cli.py`.
### 🧹 Maintenance
- Removed redundant and outdated tags (v0.3.0 to v0.4.1) from remote.
- Normalized repository tree and re-pushed clean 4.6 GiB → base v0.3.2.
- Prepared groundwork for `--help` and `--version` CLI arguments.
---
## [v0.3.0] — Modular Tagging Framework Foundation (2025-10-18)
### ✨ Added
- Introduced **YAML-based Tag Dictionaries** stored under `/src/importer/tagging/` for modular, human-readable tag definitions.
- Implemented initial **`refresh-all`** and **`refresh-one`** commands for reapplying tag inference to galleries.
- Added **persistent `inferred_tags` field** in `metadata.json` to differentiate between automated and manual tags.
- Implemented **automatic source inference** for known networks (e.g., Brazzers, FTV Girls, PornPics).
- Enhanced CLI output with colorized progress indicators and summary totals.
### 🛠 Changed
- Refactored `tag_gallery.py` for modular tagging architecture.
- Centralized configuration paths to `/src/importer/config/` for easier project-wide access.
### 🧹 Maintenance
- Improved exception handling for missing or malformed tag dictionaries.
- Added consistent emoji/logging system across CLI commands.
---
## [v0.3.1] — CLI Polishing & Dictionary Improvements (2025-10-19)
### ✨ Added
- Introduced **CLI argument parsing** with `argparse` for a unified user interface.
- Added `--verbose` flag for detailed debugging output.
- Added **metadata validation** to ensure all tag dictionaries contain unique keywords.
### 🛠 Changed
- Adjusted internal path resolution to work from both installed and development environments.
- Improved `load_all_tag_maps()` with caching and better error resilience.
### 🧹 Maintenance
- Cleaned duplicate mappings within YAML files.
- Improved documentation and inline docstrings throughout importer modules.
---
## [v0.3.2] — TPDB Bridge Integration (2025-10-21)
### ✨ Added
- Introduced **`tpdb_bridge.py`** for importing performer data from *ThePornDB* API.
- Added local **SQLite performer database** under `/src/importer/db/performers.db`.
- Added commands:
- `fetch` — Import performers in a single batch.
- `fill-index` — Continuously pull until a limit is reached.
- `enrich` — Fetch and merge extended performer metadata.
- `sync-all` — Hybrid incremental fetch + enrich loop.
- Introduced **local API key management** using `tpdb_api_key.txt` under `/secrets/`.
### 🧹 Maintenance
- Verified importer against TPDB rate limits and ensured safe error recovery.
- Added initial test data exports to `/src/importer/reports/`.
---
## [v0.3.3] — YAML Tag Inference Update (2025-10-20)
### ✨ Added
- Dynamic **YAML tag dictionary loader** for modular tag categories.
- Introduced **automatic source inference** for common networks.
- Added **`refresh-all`** bulk operation to reapply tag inference globally.
### 🛠 Changed
- Refactored `infer_tags()` to merge results from multiple YAML files dynamically.
- Enhanced progress and summary reporting for tag inference.
### 🧹 Maintenance
- Fixed `AttributeError: 'int' object has no attribute 'lower'` when parsing numeric YAML values.
- Standardized internal naming conventions.
---
## [v0.3.4] — Tag Dictionary Validation & Cleanup (2025-10-20)
### ✨ Added
- **`validate-tags`** CLI command for verifying YAML tag dictionaries.
- Detects duplicates, empty entries, and conflicting keywords.
- Outputs detailed summaries with per-keyword conflict listings.
### 🛠 Changed
- Standardized YAML structure enforcement (consistent key capitalization and layout).
- Added human-readable validation summaries.
### 🧹 Maintenance
- General code cleanup and consistent logging system updates.
---
## [v0.3.5] — Tag Statistics & Unified CLI Update (2025-10-20)
### ✨ Added
- **Tag Statistics System**
- Introduced `tag-stats` command to generate frequency analytics across all gallery metadata.
- Produces both console summaries and saved reports:
- `reports/tag_stats.json` — JSON-formatted tag counts.
- `reports/tag_stats_sorted.txt` — human-readable ranked list.
- **Unified CLI Interface (`cli.py`)**
- Consolidated all tagging and maintenance operations into a single entrypoint:
- `refresh-all`, `refresh-one`, `validate-tags`, `tag-stats`, `list`, `list-tags`, `add`, `remove`, `add-multi`, `show-metadata`, `source`
- Standardized command syntax and output formatting across all operations.
### 🛠 Changed
- Centralized tag frequency logic into `tag_gallery.py`.
- Refactored CLI dispatch system for scalability and better error handling.
- Standardized output style (headers, dividers, alignment).
### 🧹 Maintenance
- Automatic creation of `/src/importer/reports/` when missing.
- Verified all tag operations across 60+ galleries.
- Unified terminology and capitalization across CLI help text and docstrings.
### 🧭 Next Steps
- Add color-coded CLI output for readability.
- Implement `--export-csv` flag for `tag-stats` output.
- Begin roadmap for **v0.4.0** introducing ML-based tag confidence scoring and category weighting.
---
## [v0.3.6] — Enrichment Verification & Freshness Tracking (2025-10-26)
### ✨ Added
- **verify-enrichment command**
- Scans performer database for missing metadata (e.g., `url`, `last_updated`).
- Reports enriched vs incomplete entries, with preview via `--show-missing`.
- **Freshness tracking**
- Displays oldest and most recent enrichment timestamps.
- Warns if data is older than the freshness threshold.
- **Automatic TPDB key validation**
- Checks for valid API key and provides setup help if missing.
### 🛠 Changed
- Enrichment logic now guarantees `url` and `last_updated` fields for all performers.
- Improved emoji-based CLI logs for clarity.
- CLI outputs enrichment stats after each batch during `sync-all`.
### 🧹 Maintenance
- Cleanup and refactor of `tpdb_bridge.py` for readability and modular design.
- Verified completeness: **5,087 performers enriched** and up to date.
- Improved sleep timing and network error recovery during long sync runs.
### 🧭 Next Steps
- Add `--stale-days` CLI flag for user-defined freshness thresholds.
- Implement automatic enrichment scheduling via cron or systemd.
- Add shortcut alias `porndex-importer verify` for database status checks.
---
## [v0.3.7] — Scene-Based Enrichment & Channel Auto-Upgrade (2025-10-26)
### ✨ Added
- **Scene-based enrichment system**
  - New flag `--use-scenes` enables intelligent inference of performer studios/channels using recent scene data from ThePornDB.
  - Automatically scans `/performers/{id}/scenes` for studio, site, or network fields when direct metadata is missing.
  - Dynamically upgrades performer entries from “Unknown” to valid channel names (e.g., “Desire Room”, “I Want Clips: Princess Chanel”).
- **Enhanced enrichment diagnostics**
  - `--debug-channels` now outputs detailed channel inference logs with origin type (e.g., “via scene” or “via performer metadata”).
  - Emoji-coded output for improved clarity:
    - 🎞 Scene-based upgrades
    - 🎬 Direct metadata
    - ⚫ Missing channel info
- **Progress verification**
  - `verify-enrichment` now reports precise completion percentages and lists the most recent 20 upgraded performers.
### 🛠 Changed
- Enrichment process now performs automatic in-place upgrades of `performer_sources` without overwriting other fields.
- Optimized query logic to prioritize unverified performers and handle large datasets efficiently.
- Added fine-grained sleep control between API requests to stay compliant with TPDB rate limits.
### 🧹 Maintenance
- Refactored enrichment functions for modularity:
  - `_fetch_studio_from_scenes()` introduced for scene scanning.
  - Simplified argument handling and enriched exception tracing.
- Verified enrichment stability across 100 performers with 44% successful channel discovery in live test.
- Improved timestamp consistency in verification logs and upgraded database schema resilience.
---
## [v0.4.2] — Unified Importer, ML Pipeline, and Semantic Search (2025-10-27)
### ✨ Added
- **Unified Importer CLI (`porndex-importer`)**
  - Replaces legacy multi-script workflow with a single command entrypoint.
  - Introduced `import`, `refresh-all`, `refresh-one`, `validate-tags`, `tag-stats`, and `source` subcommands.
  - Includes colorized CLI summaries and consistent emoji headers.
- **Machine Learning Dataset Builder**
  - New module: `ml/ml_dataset_builder.py`
  - Generates structured dataset in `ML/porndex_dataset.jsonl` from all indexed galleries.
  - Each record includes title, models, tags, and image paths for hybrid ML ingestion.
- **Embedding Generation Module**
  - Added `ml/ml_embeddings.py` to create hybrid text + image embeddings.
  - Builds per-gallery NPZ files under `ML/embeddings/` and a consolidated `embeddings_index.jsonl`.
  - Supports configurable `--img-samples` and automatic device detection (`--device auto`).
- **Semantic & Strict Search**
  - `search` command supports three modes:
    - `semantic`: CLIP + text hybrid cosine similarity (default)
    - `text`: text-only vector space search
    - `strict`: literal match filtering before vector ranking
  - Results show top-ranked galleries, confidence scores, and gallery IDs.
- **ML Verification Command**
  - `verify` confirms index consistency, embedding count, and file integrity.
- **Directory Auto-Creation**
  - Automatically generates `ML/embeddings/` and `ML/` if missing.
### 🛠 Changed
- **Importer Pipeline Refactor**
  - Moved all CLI handling into `src/importer/cli.py`.
  - Centralized environment setup and config loading.
  - Replaced direct Python script calls with `porndex-importer` entrypoint.
- **Tagging System**
  - Unified YAML dictionary loading for clothing, acts, body, and context.
  - Improved tag inference logging and duplicate suppression.
- **Output Formatting**
  - Standardized headers, dividers, and indentation across all CLI commands.
  - Added readable time and path indicators for long-running operations.
### 🧹 Maintenance
- Verified full ML dataset build across 150 test galleries (100% JSONL completion).
- Added fallback for empty or missing image lists in dataset builder.
- Improved error handling for partial downloads and interrupted imports.
- Streamlined path resolution for consistent operation across dev and installed modes.
- Updated documentation:
  - `/docs/CLI_USAGE.md` rewritten for v0.4.2.
  - `/README.md` modernized with full project tree and ML pipeline overview.
### 🧭 Next Steps
- Begin v0.4.3 to v0.5.x roadmap:
  - Integrate GroundingDINO + GroundedSAM for visual region detection.
  - Implement attribute extraction (gender → ethnicity → clothing).
  - Build visual verification tool (`ml_dataset_inspector.py`).
  - Add tag-confidence weighting system.
  - Extend TPDB bridge to cross-link enriched performer metadata into ML training records.
### 🧩 Summary of Current State (as of v0.4.2)
- ✅ Fully unified CLI under `porndex-importer`
- ✅ Stable YAML tagging + validation
- ✅ Complete ML dataset and embedding generation workflow
- ✅ Working hybrid semantic search
- ✅ Verified 150-gallery dataset index
© 2025 Leak Technologies — Porndex Importer Project
File: docs/CHANGELOG.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
------------------------------------------------------------
📜 Goondex — Full Changelog
------------------------------------------------------------
Repository: Leak Technologies
Branch: main
Version Line: v0.3.x Development Cycle
------------------------------------------------------------
[v0.3.4] — Tagging Logic & Documentation Overhaul (2025-11-07)
------------------------------------------------------------
🛠 Fixes
- Rebuilt tag parsing and inference logic for PornPics galleries.
- Prevented accidental tag overwrites during gallery refreshes.
- Improved duplicate tag merging, case normalization, and category alignment.
📘 Documentation
- Added `TAGGING.md` with full explanation of YAML tag inference.
- Added `ARCHITECTURE.md` outlining Goondex's module hierarchy.
- Added `GALLERIES.md` detailing folder naming and metadata schema.
- Updated `README.md`, `ROADMAP.md`, and `BRANDING.md` to match v0.3.4 project direction.
🧹 Maintenance
- Added `--show-tags` debug flag to CLI for verifying tag inference results.
- Improved path handling, error catching, and overall refresh stability.
- Normalized tag capitalization across YAML dictionaries and output summaries.
🧭 Next Steps
- Implement category weighting for tag relevance scoring.
- Introduce interactive tag inspector CLI (`goondex inspect-tags`).
- Begin work on validation of inferred vs manual tag consistency.
------------------------------------------------------------
[v0.3.3] — Stable CLI Alias & Import Path Fix (2025-11-06)
------------------------------------------------------------
✨ Added
- Added unified `goondex` CLI alias for Fish, Bash, and Zsh environments.
- Introduced `--help` and `--version` flags with consistent colorized output.
- Improved help text readability and formatting for all commands.
🛠 Changed
- Standardized all imports to `src.importer.*` format for compatibility.
- Updated subprocess calls to use `src.importer.tag_gallery`.
- Simplified alias setup scripts under `/src/utils/install_alias.sh` and `/src/utils/install_alias.fish`.
🧹 Maintenance
- Rebuilt virtual environment under Python 3.13.
- Verified clean execution of CLI across all supported shells.
- Confirmed consistent `goondex --version` and `goondex --help` output.
------------------------------------------------------------
[v0.3.2-rebuild] — Repository Cleanup & Stabilization (2025-11-02)
------------------------------------------------------------
✨ Added
- Introduced `.gitignore` to exclude gallery media and ML assets.
- Added `VERSION` file for synchronized CLI and metadata versioning.
- Fixed Fish-shell virtualenv activation behavior.
🧹 Maintenance
- Removed redundant commits and outdated files from older Porndex lineage.
- Normalized repository structure to a clean, modular state.
- Established new base branch for Goondex v0.3.x development.
------------------------------------------------------------
[v0.3.1] — CLI Polishing & Internal Improvements (2025-10-19)
------------------------------------------------------------
✨ Added
- Added unified CLI argument parsing with `argparse`.
- Introduced verbose mode for debugging (`--verbose`).
🛠 Changed
- Improved internal path resolution for dev vs installed modes.
- Enhanced YAML loader fault tolerance and caching.
🧹 Maintenance
- Cleaned redundant imports and improved logging consistency.
- Added internal docstrings for importer functions.
------------------------------------------------------------
[v0.3.0] — Goondex Framework Foundation (2025-10-18)
------------------------------------------------------------
✨ Added
- Established base project structure under `/src/importer/`.
- Implemented initial gallery importer for PornPics.com.
- Introduced modular YAML tag dictionary system.
- Added basic CLI commands:
- `import <url>`
- `refresh-all`
- `refresh-one`
- `validate-tags`
- `tag-stats`
- `list`, `list-tags`, `add`, `remove`, `show-metadata`, `source set`
🛠 Changed
- Reorganized importer modules for clarity and testability.
🧹 Maintenance
- Set up `docs/`, `reports/`, and `assets/` directories.
- Created initial `CHANGELOG.md` and `README.md`.
------------------------------------------------------------
📦 Legacy Development: Porndex Importer (2024-2025)
------------------------------------------------------------
_The following section documents the earlier development cycle
that led to the creation of Goondex. The system used a different
tagging and metadata architecture before the full rebuild in
late 2025. These entries remain for historical and archival
purposes only._
------------------------------------------------------------
[v0.2.x] — TPDB Integration & Enrichment Phase (2025-04 → 2025-10)
------------------------------------------------------------
- Integrated ThePornDB API for performer enrichment.
- Added `fetch`, `fill-index`, `enrich`, and `sync-all` commands.
- Created SQLite database for performer metadata.
- Added automatic API key validation and freshness checks.
- Verified thousands of enriched performers with timestamp logging.
------------------------------------------------------------
[v0.1.x] — Modular Tagging System Prototype (2024-12 → 2025-03)
------------------------------------------------------------
- Introduced first YAML-based tag dictionaries for clothing, acts, and body type.
- Implemented prototype tag inference pipeline using keyword heuristics.
- Added basic CLI interface for gallery tagging and metadata refresh.
- Created `refresh-all`, `refresh-one`, and `validate-tags` operations.
- Implemented early tag frequency statistics and conflict validation.
------------------------------------------------------------
[v0.0.x] — PornPics Importer Foundations (2024)
------------------------------------------------------------
- Built initial gallery importer for PornPics.com.
- Implemented threaded image downloading with metadata.json output.
- Added local caching and source indexing system.
- Developed basic tag extraction based on gallery titles and captions.
- Established early directory structure under `/src/importer/`.
------------------------------------------------------------
🧩 Legacy Summary
------------------------------------------------------------
The Porndex Importer laid the groundwork for gallery parsing,
basic tagging, and performer enrichment, but its architecture
was replaced by the more robust, modular Goondex framework in
October-November 2025. Goondex introduces a new YAML-based
tagging model, a cleaner CLI, and improved documentation
standards across all modules.
------------------------------------------------------------
© 2025 Leak Technologies — Goondex Project
------------------------------------------------------------


@ -1,172 +1,184 @@
# 🎩 PornPics Importer — CLI Usage Guide
### Version 0.4.2 — Import, Auto-Tag & ML Integration
File: docs/CLI_USAGE.md
Version: v0.4.2
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
---
------------------------------------------------------------
Goondex CLI Usage Guide
------------------------------------------------------------
## 📦 Overview
Purpose:
Provide a full command reference for importing, tagging, validating, and searching PornPics galleries using the Goondex command-line interface.
Tooling to:
------------------------------------------------------------
Overview
------------------------------------------------------------
- Import & refresh PornPics.com galleries
- Auto-tag via YAML dictionaries and unified CLI
- Manage sources/tags and gallery index
- Build ML datasets & hybrid (text+image) embeddings
- Run semantic / strict search over your library
The Goondex CLI provides a unified workflow to:
1. Import and refresh PornPics galleries
2. Automatically tag galleries using YAML dictionaries
3. Manage sources and metadata through a single command entrypoint
4. Generate statistics and validation reports
5. Build and search machine learning datasets (hybrid text + image)
Project root:
~/Projects/PD/PornPics_Importer/Porndex_PornpicsImporter/
------------------------------------------------------------
1. Importing Galleries
------------------------------------------------------------
---
Quick Import (preferred):
goondex import "https://www.pornpics.com/galleries/<gallery-id>/"
## 🧬 1) Importing Galleries
Process:
- Creates a new folder in Galleries/<timestamp>_<models>_<title>/
- Downloads all images (threaded)
- Saves metadata.json
- Auto-tags the gallery using refresh-one
- Rebuilds the global index
### Quick Import (preferred)
```bash
porndex-importer import "https://www.pornpics.com/galleries/<gallery-id>/"
```
What happens:
Saves to Galleries/<timestamp>_<models>_<title>/
Downloads images (threaded) and writes metadata.json
Auto-tags the gallery (refresh-one) and rebuilds the index
Prints a colorized gallery summary
Legacy (direct script)
Legacy method:
python src/importer/gallery_importer.py "https://www.pornpics.com/galleries/<gallery-id>/"
🔁 2) Refreshing Metadata
Refresh all galleries
------------------------------------------------------------
2. Refreshing Metadata
------------------------------------------------------------
Refresh all galleries:
python src/importer/gallery_importer.py --refresh-all
Re-fetches metadata for every gallery with source_url
Merges fields (preserves local tags)
Function:
- Re-fetches metadata for every gallery that has a source_url
- Merges new fields without overwriting local tags
- Automatically re-applies tag inference
- Rebuilds Galleries/index.json
Auto-reapplies tag inference
------------------------------------------------------------
3. Tag Management
------------------------------------------------------------
Updates Galleries/index.json
Unified syntax:
goondex <command> [args...]
🎟️ 3) Tag Management (via unified CLI)
porndex-importer <command> [args...]
Common commands:
Common operations:
refresh-all → refresh tags for all galleries
refresh-one "<folder>" → refresh tags for a single gallery
validate-tags → validate YAML tag dictionaries
tag-stats → generate frequency report (saved to src/importer/reports)
list → list all galleries
list-tags "<folder>" → show tags for one gallery
add "<folder>" "Tag" → add a tag manually
remove "<folder>" "Tag" → remove a tag manually
add-multi "<folder>" "Tag1,Tag2" → add multiple tags at once
show-metadata "<folder>" → view metadata.json content
source "<folder>" set "Source" → set a single source
source bulk set "Source" → set the same source for all galleries
Action Command
Refresh all porndex-importer refresh-all
Refresh one porndex-importer refresh-one "<folder>"
Validate YAML dictionaries porndex-importer validate-tags
Tag statistics (reports to /src/importer/reports) porndex-importer tag-stats
List galleries porndex-importer list
List tags (one) porndex-importer list-tags "<folder>"
Add tag porndex-importer add "<folder>" "TagName"
Remove tag porndex-importer remove "<folder>" "TagName"
Add multiple porndex-importer add-multi "<folder>" "Tag1,Tag2"
Show metadata porndex-importer show-metadata "<folder>"
Set source (single) porndex-importer source "<folder>" set "Brazzers"
Set source (bulk) porndex-importer source bulk set "PornPics"
Tag inference uses YAML dictionaries stored under:
src/importer/tag_dictionaries/
Tag inference uses YAML dictionaries under src/importer/tag_dictionaries/ (clothing, acts, body, context, etc.).
------------------------------------------------------------
4. TPDB Performer Bridge (optional)
------------------------------------------------------------
🧫 4) TPDB Performer Bridge (optional)
Command:
python -m performers.tpdb_bridge <cmd> [flags]
Highlights:
check-key, fetch, fill-index, enrich, sync-all
Common flags:
check-key, fetch, fill-index, enrich, sync-all
list-sources, add-source, delete-source
verify-enrichment --export-json
list-sources, add-source, delete-source
Database:
src/importer/db/performers.db
verify-enrichment --export-json
Reports:
src/importer/reports/
Stores DB at src/importer/db/performers.db and reports to src/importer/reports/.
------------------------------------------------------------
5. Example Workflow
------------------------------------------------------------
⚙️ 5) Example Workflow
# Import a gallery
porndex-importer import "https://www.pornpics.com/galleries/<id>/"
Import a gallery:
goondex import "https://www.pornpics.com/galleries/<id>/"
# Refresh tags for one folder (if you edited metadata)
porndex-importer refresh-one "<folder-name>"
Refresh tags for one folder:
goondex refresh-one "<folder-name>"
# Validate YAML dictionaries
porndex-importer validate-tags
Validate YAML dictionaries:
goondex validate-tags
# Build tag stats
porndex-importer tag-stats
🤖 6) Machine Learning (ML) Pipeline
Build dataset (reads from Galleries/, no file moves)
Generate tag statistics:
goondex tag-stats
------------------------------------------------------------
6. Machine Learning (ML) Pipeline
------------------------------------------------------------
Dataset builder:
python -m ml.ml_dataset_builder
Creates ML/porndex_dataset.jsonl with records:
Creates file:
ML/porndex_dataset.jsonl
Example entry:
{
"gallery_id": "...",
"title": "...",
"models": ["..."],
"tags": ["..."],
"categories": ["..."],
"image_paths": [".../Galleries/.../images/001.jpg", "..."]
"image_paths": [".../Galleries/.../001.jpg"]
}
Build hybrid embeddings (text + image)
Build hybrid embeddings:
python -m ml.ml_embeddings build --img-samples 8 --device auto
Outputs:
ML/embeddings/<gallery_id>.npz (text / image / combined vectors)
ML/embeddings/<gallery_id>.npz
ML/embeddings_index.jsonl
Search (semantic / text / strict)
# Hybrid semantic (default)
python -m ml.ml_embeddings search "japanese redhead creampie"
# Text-only space
python -m ml.ml_embeddings search "japanese redhead creampie" --index text
# Strict keyword pre-filter (title/tags must include all tokens)
Search modes:
python -m ml.ml_embeddings search "japanese redhead creampie"
python -m ml.ml_embeddings search "japanese redhead creampie" --index text
python -m ml.ml_embeddings search "interracial bbc" --mode strict
Verify
Verify embedding integrity:
python -m ml.ml_embeddings verify
🗂️ 7) Data Locations
Path Purpose
Galleries/ Imported galleries (images + metadata)
Galleries/index.json Library index
src/importer/reports/ Tag stats & TPDB reports
ML/porndex_dataset.jsonl ML dataset source
ML/embeddings/ NPZ vectors
ML/embeddings_index.jsonl Search index
🧭 8) Roadmap (post-v0.4.2)
GroundingDINO + Grounded-SAM for localized detections (people, clothing)
------------------------------------------------------------
7. Data Locations
------------------------------------------------------------
Attribute heads for gender → ethnicity → clothing brand (e.g., socks)
Galleries/ → imported galleries and images
Galleries/index.json → master index of all galleries
src/importer/reports/ → YAML validation and statistics reports
ML/porndex_dataset.jsonl → ML dataset definition
ML/embeddings/ → embedding vector files
ML/embeddings_index.jsonl → search index for semantic lookups
Active-learning loop to leverage existing metadata as weak labels
------------------------------------------------------------
8. Roadmap (post-v0.4.2)
------------------------------------------------------------
🧾 Notes
All commands are local/offline friendly.
- Integrate GroundingDINO + Grounded-SAM for localized object detection
- Add attribute heads for gender, ethnicity, and clothing
- Develop an active-learning loop to refine weakly-labeled data
- Introduce interactive tag editor for review and correction
Rebuilding dataset/embeddings is safe and idempotent.
------------------------------------------------------------
Notes
------------------------------------------------------------
Importer auto-tags on import/refresh using the YAML dictionaries.
All commands operate locally and offline.
Rebuilding datasets and embeddings is safe and idempotent.
Importer auto-tags new galleries using YAML dictionaries by default.
All modules adhere to the clean modular design outlined in ARCHITECTURE.md.
Versioned documentation ensures clarity between CLI and code versions.
Author: Leak Technologies • License: MIT (internal research)
------------------------------------------------------------
End of File
------------------------------------------------------------

197
docs/GALLERIES.md Normal file

@ -0,0 +1,197 @@
File: docs/GALLERIES.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
------------------------------------------------------------
Goondex Gallery Structure and Metadata Specification
------------------------------------------------------------
Purpose:
Define how galleries are stored, named, and structured within the Goondex system.
This document standardizes the folder layout, metadata schema, and indexing process to ensure all galleries are consistent and compatible with importer, tagger, and ML modules.
------------------------------------------------------------
1. Directory Structure
------------------------------------------------------------
All galleries are stored under:
~/Projects/PD/Goondex/Galleries/
Each imported gallery is placed in its own folder:
Galleries/<timestamp>_<model(s)>_<short_title>/
Example:
Galleries/20251106_1032_Mariella_Sun_Takes_A_Shower/
Inside each gallery folder:
metadata.json → Core metadata record
001.jpg, 002.jpg, ... → Sequentially numbered images
failed_downloads.json (opt.) → List of images that failed to download
thumbnail.jpg (future) → Designated cover image for previews
A global index is maintained at:
Galleries/index.json
------------------------------------------------------------
2. Naming Convention
------------------------------------------------------------
The importer automatically generates folder names using:
<timestamp>_<models>_<shortened_title>
Rules:
- Timestamp format: YYYYMMDD_HHMM (UTC local)
- Model names separated by underscores
- Title truncated to 40 characters max
- Illegal filesystem characters replaced with underscores
- Spaces are converted to underscores
This ensures folder names remain unique, sortable, and descriptive.
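The naming rules above can be sketched in a few lines of Python. This is an illustration only, not the importer's actual code: the helper names `sanitize` and `build_folder_name` are hypothetical, the set of "illegal" characters is assumed to be everything outside ASCII letters, digits, `_` and `-`, and collapsing repeated underscores is an added choice the rules do not state.

```python
import re
from datetime import datetime

MAX_TITLE_LEN = 40  # "Title truncated to 40 characters max"


def sanitize(part: str) -> str:
    """Convert spaces to underscores and replace characters assumed to be illegal."""
    part = part.strip().replace(" ", "_")
    # Assumption: anything outside ASCII letters, digits, '_' and '-' is replaced.
    part = re.sub(r"[^A-Za-z0-9_-]", "_", part)
    # Assumption: runs of underscores are collapsed to keep names readable.
    return re.sub(r"_+", "_", part).strip("_")


def build_folder_name(models: list[str], title: str, when: datetime) -> str:
    """Build <timestamp>_<models>_<shortened_title> from the three inputs."""
    timestamp = when.strftime("%Y%m%d_%H%M")
    model_part = "_".join(sanitize(m) for m in models)
    title_part = sanitize(title[:MAX_TITLE_LEN])
    # Empty parts (for example, no known models) are skipped.
    return "_".join(p for p in (timestamp, model_part, title_part) if p)


print(build_folder_name(["Jane Doe"], "Sample: Gallery/Title", datetime(2025, 11, 6, 10, 32)))
# → 20251106_1032_Jane_Doe_Sample_Gallery_Title
```

Because the timestamp comes first, a plain alphabetical sort of the Galleries/ directory is also a chronological sort.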
------------------------------------------------------------
3. Metadata Specification (metadata.json)
------------------------------------------------------------
Each gallery includes a metadata.json file containing descriptive fields.
Example:
{
"title": "Mariella Sun Takes A Shower",
"models": ["Mariella Sun"],
"categories": ["Amateur", "Shower", "Solo"],
"tags": ["Blonde", "Teen", "Wet", "Shower", "Outdoor"],
"inferred_tags": ["Amateur", "Solo", "Wet"],
"source_url": "https://www.pornpics.com/galleries/12345/",
"source": { "network": "PornPics", "channel": null },
"views": 5421,
"rating": 4.8,
"image_count": 52,
"image_urls": [
"https://cdn.pornpics.com/2025/11/12345_001.jpg",
"https://cdn.pornpics.com/2025/11/12345_002.jpg"
],
"import_path": "~/Projects/PD/Goondex/Galleries/20251106_1032_Mariella_Sun_Takes_A_Shower",
"last_refreshed": "2025-11-06T15:40:21Z"
}
------------------------------------------------------------
4. Field Definitions
------------------------------------------------------------
title
→ Human-readable title as extracted from the source site.
models
→ List of performer names detected or scraped from metadata.
categories
→ Source site's categorical labels (if available).
tags
→ All user and inferred tags combined.
inferred_tags
→ Tags automatically added by the tag_gallery.py module.
source_url
→ The original URL used for import.
source
→ Object with optional "network" and "channel" fields.
Example: { "network": "PornPics", "channel": null }
views
→ Scraped view count from the source (if available).
rating
→ Normalized 0-5 rating (float).
image_count
→ Number of valid images downloaded.
image_urls
→ Full list of image URLs (for re-download or verification).
import_path
→ Absolute path where this gallery is stored locally.
last_refreshed
→ ISO 8601 timestamp marking last metadata update.
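A quick consistency check over these fields might look like the sketch below. The field names are taken from the definitions above; the expected Python types are inferred from the example record and are an assumption rather than a published schema, and `check_metadata` is a hypothetical helper, not part of the Goondex CLI.

```python
import json
from pathlib import Path

# Field names come from the specification; the types are inferred from the example.
EXPECTED_TYPES = {
    "title": str,
    "models": list,
    "categories": list,
    "tags": list,
    "inferred_tags": list,
    "source_url": str,
    "source": dict,
    "image_count": int,
    "image_urls": list,
    "import_path": str,
    "last_refreshed": str,
}


def check_metadata(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks consistent."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    # views and rating are treated as optional here because the source may not expose them.
    rating = record.get("rating")
    if rating is not None and not (isinstance(rating, (int, float)) and 0 <= rating <= 5):
        problems.append("rating: expected a number between 0 and 5")
    return problems


def check_metadata_file(path: Path) -> list[str]:
    """Load one metadata.json and run the same checks on it."""
    return check_metadata(json.loads(path.read_text(encoding="utf-8")))
```

Returning a list of messages instead of raising on the first problem makes it easy to report every issue in a gallery at once.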
------------------------------------------------------------
5. Index File
------------------------------------------------------------
Galleries/index.json is rebuilt after each import or refresh operation.
It includes essential details for quick CLI lookups and searches.
Example structure:
{
"galleries": [
{
"folder": "20251106_1032_Mariella_Sun_Takes_A_Shower",
"title": "Mariella Sun Takes A Shower",
"models": ["Mariella Sun"],
"tags": ["Blonde", "Teen", "Shower"],
"source": "PornPics",
"image_count": 52,
"last_refreshed": "2025-11-06T15:40:21Z"
}
]
}
------------------------------------------------------------
6. Refresh and Rebuild Process
------------------------------------------------------------
When running:
goondex import <url>
→ Imports gallery, creates metadata.json, downloads images, auto-tags.
goondex refresh-one <folder>
→ Re-runs tag inference and updates metadata fields.
goondex refresh-all
→ Applies inference and updates to all galleries under Galleries/.
After any import or refresh, index_builder.py:
- Scans all folders for metadata.json
- Builds Galleries/index.json
- Removes stale entries
- Reports summary to console
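The rebuild steps listed above could be sketched as follows. This is not the contents of index_builder.py, which is not shown here; `build_index` and `INDEX_FIELDS` are hypothetical names, and the entry layout simply mirrors the index example in section 5.

```python
import json
from pathlib import Path

# Fields copied from metadata.json into each index entry (per the section 5 example).
INDEX_FIELDS = ("title", "models", "tags", "image_count", "last_refreshed")


def build_index(galleries_root: Path) -> dict:
    """Scan every gallery folder for metadata.json and rewrite index.json."""
    entries = []
    for meta_path in sorted(galleries_root.glob("*/metadata.json")):
        try:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            print(f"skipping {meta_path.parent.name}: {exc}")
            continue
        entry = {"folder": meta_path.parent.name}
        entry.update({key: meta.get(key) for key in INDEX_FIELDS})
        # The index example stores the source as a plain network name.
        source = meta.get("source")
        entry["source"] = source.get("network") if isinstance(source, dict) else source
        entries.append(entry)
    index = {"galleries": entries}
    # Rebuilding from scratch means entries for deleted folders disappear on their own.
    (galleries_root / "index.json").write_text(json.dumps(index, indent=4), encoding="utf-8")
    print(f"indexed {len(entries)} galleries")
    return index
```

Writing the whole index on every run is slower than patching it, but it keeps the file a pure function of what is on disk.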
------------------------------------------------------------
7. Cache and Error Handling
------------------------------------------------------------
failed_downloads.json
→ Written if any images fail to download.
→ Contains URL and error message for each failure.
Re-importing a gallery with the same title merges metadata:
- Local tags are preserved
- Missing fields (views, rating, etc.) are filled
- Downloaded images are skipped if already present
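The merge behaviour described above can be illustrated with a small sketch. Both helpers are hypothetical and rest on stated assumptions: fetched values only fill fields that are missing or empty locally, fetched tags are appended after local ones with case-insensitive de-duplication, and image files are numbered in the same order as image_urls.

```python
from pathlib import Path


def merge_metadata(local: dict, fetched: dict) -> dict:
    """Fill gaps in the local record without overwriting anything already set."""
    merged = dict(local)
    for key, value in fetched.items():
        if key == "tags":
            continue  # tags are merged separately below
        if merged.get(key) in (None, "", [], {}):
            merged[key] = value
    # Keep every local tag, then append fetched tags not already present.
    tags = list(local.get("tags") or [])
    seen = {tag.casefold() for tag in tags}
    for tag in fetched.get("tags") or []:
        if tag.casefold() not in seen:
            tags.append(tag)
            seen.add(tag.casefold())
    merged["tags"] = tags
    return merged


def pending_downloads(folder: Path, image_urls: list[str]) -> list[tuple[str, Path]]:
    """Return (url, target) pairs for images that are not on disk yet."""
    # Assumption: 001.jpg corresponds to the first URL, 002.jpg to the second, and so on.
    targets = ((url, folder / f"{i:03d}.jpg") for i, url in enumerate(image_urls, start=1))
    return [(url, path) for url, path in targets if not path.exists()]
```

Only filling empty fields is the conservative reading of "missing fields are filled"; a variant that always refreshes volatile values such as views would be an equally reasonable design.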
------------------------------------------------------------
8. Future Enhancements (planned for v0.4.x)
------------------------------------------------------------
- thumbnail.jpg generation from first image
- Gallery-level previews for upcoming Web UI
- Extended metadata (dominant colours, detected subjects)
- Support for multi-site import with normalized schema
------------------------------------------------------------
9. Developer Notes
------------------------------------------------------------
Do not manually rename or move gallery folders after import.
Always rebuild the index via CLI if metadata is manually edited.
Each metadata.json should remain human-readable and formatted with indent=4.
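A minimal sketch of writing metadata.json in that format is shown below. The indent=4 setting comes from the note above; the temporary-file step and the `write_metadata` name are assumptions added for illustration, chosen so that an interrupted write cannot leave a half-written file behind.

```python
import json
from pathlib import Path


def write_metadata(folder: Path, record: dict) -> Path:
    """Write metadata.json with indent=4, replacing the old file in one step."""
    path = folder / "metadata.json"
    tmp = folder / "metadata.json.tmp"
    text = json.dumps(record, indent=4, ensure_ascii=False) + "\n"
    tmp.write_text(text, encoding="utf-8")
    tmp.replace(path)  # rename over the previous version
    return path
```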
------------------------------------------------------------
End of File
------------------------------------------------------------

76
docs/HISTORY.md Normal file

@ -0,0 +1,76 @@
File: docs/HISTORY.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
------------------------------------------------------------
📖 Goondex — Project History
------------------------------------------------------------
### Overview
Goondex is the modern evolution of the former **Porndex Importer**, a Python-based gallery indexing and tagging system originally focused on PornPics.com.
Between 2024 and 2025, the project underwent a complete rebuild to address technical debt, unify its codebase, and formalize documentation standards under the Leak Technologies ecosystem.
This file provides a concise historical overview of that transition, outlining the legacy systems, lessons learned, and the motivations behind the current Goondex architecture.
------------------------------------------------------------
🕰️ 2024 — Origins: PornPics Importer
------------------------------------------------------------
The earliest version of the system, built throughout 2024, began as a lightweight tool for automatically downloading and organizing PornPics galleries.
It featured:
- Threaded image fetching with metadata export to `metadata.json`.
- Simple folder-based organization by model and network.
- Prototype keyword-based tagging based solely on gallery titles.
While functional, it lacked flexibility, configuration, and reliable error recovery.
This groundwork eventually became the foundation for Porndex.
------------------------------------------------------------
⚙️ 2024-2025: Porndex Importer Era
------------------------------------------------------------
The project expanded rapidly under the name **Porndex**, introducing YAML-based tag dictionaries and the first CLI-driven workflows.
Porndex introduced:
- Early modular tagging using keyword dictionaries.
- Performer metadata enrichment through ThePornDB API.
- SQLite-backed performer database for indexing and updates.
- Validation commands for tag consistency and statistics reporting.
By mid-2025, the system had grown in complexity and scope, but the architecture was increasingly brittle.
The need for a modular, self-contained, and shell-friendly design became clear — leading to the creation of **Goondex**.
------------------------------------------------------------
🚀 Late 2025 — The Goondex Rebuild
------------------------------------------------------------
In October 2025, the codebase was completely restructured and relaunched as **Goondex**.
This marked the transition from an experimental importer to a formalized, maintainable platform.
Key advancements introduced in the Goondex rebuild:
- **Unified CLI:** A single entrypoint (`goondex`) for all operations.
- **YAML Tagging Framework:** Refined dictionaries for acts, clothing, and body descriptors.
- **Improved Error Handling:** Safe path operations and better exception tracing.
- **Cross-Shell Compatibility:** Alias scripts for Fish, Bash, and Zsh environments.
- **Documentation Suite:** Full set of Markdown docs — `ARCHITECTURE.md`, `TAGGING.md`, `GALLERIES.md`, `ROADMAP.md`, and `BRANDING.md`.
The rebuild also focused on subtlety and modular design — retaining the underlying functionality of the PornPics importer while shedding the “Porndex” branding in favor of a more neutral, system-oriented identity.
------------------------------------------------------------
🧭 Present — Goondex v0.3.x Development Line
------------------------------------------------------------
The v0.3.x cycle focuses on:
- Robust tagging accuracy and metadata stability.
- Clean CLI interface and cross-environment consistency.
- Proper documentation, logging, and version traceability.
- Preparing the foundation for future ML-assisted tagging modules.
As of **v0.3.4 (November 2025)**, Goondex features a fully functional tagging system, stable CLI aliasing, and a clearly documented repository structure.
------------------------------------------------------------
🧩 Legacy Acknowledgement
------------------------------------------------------------
Goondex owes its foundation to the original Porndex Importer developed in 2024-2025.
While the old tagging and enrichment systems are no longer active, their core ideas continue to influence Goondex's modern design philosophy, emphasizing modularity, transparency, and resilience.
------------------------------------------------------------
© 2025 Leak Technologies — Goondex Project
------------------------------------------------------------

69
docs/LICENSE Normal file

@ -0,0 +1,69 @@
File: LICENSE
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
------------------------------------------------------------
Goondex Project License
------------------------------------------------------------
Copyright (c) 2025 Leak Technologies
All rights reserved.
Developed and maintained by Stu Leak and contributors.
Goondex is a locally hosted research and archival utility designed for automated metadata analysis, machine learning dataset preparation, and intelligent gallery indexing.
------------------------------------------------------------
Permission and Usage
------------------------------------------------------------
1. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to use, copy, modify, and merge copies of the Software for **personal, educational, or research purposes only**, subject to the following conditions:
- The Software **must not** be sold, sublicensed, or distributed commercially.
- The Software **must not** be used for or in connection with commercial adult entertainment platforms or profit-seeking ventures.
- All redistributions or modifications must retain this notice in full.
- Attribution to “Leak Technologies” must remain intact in all derivative works.
2. The Software may contain open-source components licensed under their respective terms.
Users are responsible for complying with any additional conditions imposed by such third-party licenses.
------------------------------------------------------------
Limitations
------------------------------------------------------------
- The Software is provided strictly for **archival and research** use.
- Leak Technologies assumes **no liability** for misuse, redistribution, or any legal consequences resulting from the use of the Software.
- The Software is **not intended for production deployment** on public servers or within commercial frameworks.
- Any attempt to utilize this system for monetized indexing or distribution of copyrighted materials is **expressly prohibited**.
------------------------------------------------------------
Warranty Disclaimer
------------------------------------------------------------
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
------------------------------------------------------------
Ethical Statement
------------------------------------------------------------
Goondex operates on the principle of **privacy, consent, and local autonomy**.
It is not a crawler, data harvester, or redistributor.
Users are expected to import only legally obtained material and respect all content ownership rights.
Any usage that violates laws concerning data ownership, explicit consent, or content distribution invalidates this license.
------------------------------------------------------------
Summary
------------------------------------------------------------
✔ Personal, private, or research use — permitted
✘ Commercial use, resale, redistribution — prohibited
✔ Modification for local use — permitted
✘ Cloud or API resale integration — prohibited
✔ Educational publication citing Goondex — permitted with attribution
------------------------------------------------------------
End of File
------------------------------------------------------------

176
docs/README.md Normal file

@ -0,0 +1,176 @@
File: docs/README.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
------------------------------------------------------------
Goondex — PornPics Importer & ML Pipeline
------------------------------------------------------------
A modular, documented gallery importer for PornPics.com, forming the foundation of the Goondex ecosystem.
Supports importing, tagging, metadata enrichment, and generation of ML-ready datasets for semantic search and classification.
------------------------------------------------------------
1. Project Overview
------------------------------------------------------------
Goondex automates the process of:
- Downloading and organizing galleries from PornPics.com
- Generating structured metadata and tag inference
- Enriching galleries via ThePornDB (TPDB) performer API
- Building machine-learning datasets and embeddings
- Enabling semantic, hybrid (text + image) search
All operations are handled locally — no cloud dependencies or external databases are required.
The system is modular, transparent, and designed for research and personal archival use.
------------------------------------------------------------
2. Project Structure
------------------------------------------------------------
src/
├── importer/ → Core importer logic and CLI tools
│ ├── cli.py → Unified CLI entrypoint (goondex command)
│ ├── gallery_importer.py → Gallery parser and downloader
│ ├── tag_gallery.py → Tag inference and YAML management
│ ├── reports/ → Auto-generated validation and tag stats
│ ├── db/ → TPDB performer cache and local databases
│ ├── secrets/ → Local-only API keys (ignored by Git)
│ └── tag_dictionaries/ → Modular YAML tag dictionaries
├── ml/ → Machine learning and semantic search
│ ├── ml_dataset_builder.py → Builds JSONL dataset for embeddings
│ ├── ml_embeddings.py → Generates CLIP + text hybrid vectors
│ ├── ml_dataset_inspector.py → (planned) visual dataset viewer
│ └── ml_vision_detector.py → (planned) DINO + SAM visual tagging
├── docs/ → Documentation, changelogs, and brand files
├── tests/ → Unit and integration testing suite
└── assets/ → Static samples and test assets
------------------------------------------------------------
3. Environment Setup
------------------------------------------------------------
Create a virtual environment and install dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Set the source path for development:
export PYTHONPATH=src
------------------------------------------------------------
4. Quick Start
------------------------------------------------------------
Import a gallery from PornPics:
goondex import "https://www.pornpics.com/galleries/example-id/"
Automatically:
- Downloads images and metadata
- Saves to Galleries/<timestamp>_<model>_<title>/
- Generates metadata.json
- Runs auto-tagging (refresh-one)
- Updates the central gallery index
------------------------------------------------------------
5. CLI Overview
------------------------------------------------------------
All commands are run via:
goondex <command> [args...]
Examples:
goondex refresh-all
goondex refresh-one "<folder>"
goondex validate-tags
goondex tag-stats
goondex list-tags "<folder>"
goondex add "<folder>" "TagName"
goondex source bulk set "PornPics"
The CLI automatically detects YAML tag dictionaries and applies them during refresh or import.
------------------------------------------------------------
6. Machine Learning Pipeline
------------------------------------------------------------
Build dataset:
bash
python -m ml.ml_dataset_builder
Output:
ML/porndex_dataset.jsonl
Each record includes:
{
"gallery_id": "...",
"title": "...",
"models": ["..."],
"tags": ["..."],
"categories": ["..."],
"image_paths": ["..."]
}
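A minimal sketch of reading such records, assuming one JSON object per line with exactly the keys shown above. The helper name `iter_records` is hypothetical and not part of the ml package.

```python
import json

REQUIRED_KEYS = {"gallery_id", "title", "models", "tags", "categories", "image_paths"}


def iter_records(lines):
    """Yield one dict per non-empty JSONL line, checking the documented keys."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {number}: missing keys {sorted(missing)}")
        yield record


# In-memory sample; a real run would pass an open file handle instead.
sample = [
    '{"gallery_id": "g1", "title": "t", "models": [], '
    '"tags": ["A"], "categories": [], "image_paths": []}',
]
for record in iter_records(sample):
    print(record["gallery_id"], record["tags"])
```

Because a file object iterates line by line, passing the handle from `open("ML/porndex_dataset.jsonl", encoding="utf-8")` should work without loading the whole dataset into memory.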
Build embeddings:
bash
python -m ml.ml_embeddings build --img-samples 8 --device auto
Output:
ML/embeddings/<gallery_id>.npz
ML/embeddings_index.jsonl
Search:
bash
python -m ml.ml_embeddings search "asian redhead solo"
Modes:
- semantic (default) — hybrid vector cosine similarity
- text — text-only search
- strict — literal keyword matching
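For intuition about the semantic mode, cosine-similarity ranking can be sketched in pure Python. This is not the code in ml_embeddings.py: the function names and the toy vectors are assumptions, and the real vectors would come from the stored embedding files.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_galleries(query_vec, gallery_vecs, top_k=5):
    """Return up to top_k (gallery_id, score) pairs, best match first."""
    scored = [(gid, cosine_similarity(query_vec, vec)) for gid, vec in gallery_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy two-dimensional vectors, for illustration only.
vectors = {"gallery_a": [1.0, 0.0], "gallery_b": [0.0, 1.0]}
print(rank_galleries([0.9, 0.1], vectors, top_k=1))
```

Cosine similarity ignores vector length, so only the direction of the embedding affects the ranking.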
Verify:
bash
python -m ml.ml_embeddings verify
------------------------------------------------------------
7. Development Guidelines
------------------------------------------------------------
- Use descriptive variable names and structured commits
- Avoid emojis in code and commit messages
- Always document new features in docs/CHANGELOG.md
- Keep CLI text synchronized with docs/CLI_USAGE.md
- Use version tagging for all major commits
------------------------------------------------------------
8. Roadmap Summary
------------------------------------------------------------
Stage Feature Description
----------- -------------------------------- -----------------------------
✅ v0.3.x Stable CLI & Tagging Unified CLI and YAML cleanup
⚙️ v0.4.x ML Embeddings & Dataset Builder Build hybrid vectors for search
⏳ v0.5.x Visual Intelligence DINO + SAM + attribute detection
🔜 v0.6.x Local Web UI Lightweight gallery browser
🚀 v1.0.0 Full Stable Release Plugin importers + visual ML tools
------------------------------------------------------------
9. Licensing
------------------------------------------------------------
License: Research-Use MIT Variant
Author: Leak Technologies
Maintainer: Stu Leak
For personal, non-commercial, and research use only.
------------------------------------------------------------
End of File
------------------------------------------------------------

160
docs/ROADMAP.md Normal file

@ -0,0 +1,160 @@
File: docs/ROADMAP.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
------------------------------------------------------------
Goondex Development Roadmap
------------------------------------------------------------
Purpose:
Outline upcoming milestones, version objectives, and long-term development goals for the Goondex ecosystem.
This roadmap provides an overview of feature direction, architectural priorities, and research-driven enhancements.
------------------------------------------------------------
1. Project Vision
------------------------------------------------------------
Goondex is designed as an automated, privacy-respecting adult content cataloguer, focused on:
- Intelligent tagging and metadata curation
- Machine-learning assisted gallery organization
- Local-first, offline-friendly operation
- Open, modular, and human-readable data formats
The system evolves through iterative versioning with strong emphasis on stability, transparency, and reproducibility.
------------------------------------------------------------
2. Version Milestones
------------------------------------------------------------
v0.3.x — Consolidation Phase
------------------------------------
Status: Active
Goals:
- Finalize CLI alias and stable import structure
- Standardize metadata.json schema and YAML dictionaries
- Document all core systems (CLI, Galleries, Tagging, Branding)
- Implement validation tools for dictionaries and index integrity
- Ensure consistency across all module imports (src.importer.*)
- Establish internal branding and developer documentation standards
v0.4.x — Machine Learning Integration
------------------------------------
Planned Start: December 2025
Goals:
- Introduce ML dataset builder and embedding engine
- Add hybrid (text + image) search support
- Implement GroundingDINO + Grounded-SAM detection pipeline
- Build attribute heads for ethnicity, gender, and clothing
- Introduce semantic tag inference based on contextual cues
- Develop auto-thumbnail generator for galleries
- Establish foundation for future “Goondex ML Core”
v0.5.x — Visual Intelligence and Automation
------------------------------------
Planned Start: Q1 2026
Goals:
- Expand ML integration to support local fine-tuning
- Train local model for visual tagging (SAM, CLIP, BLIP2)
- Enable partial face and body region detection
- Add scene clustering (e.g., “bathroom scenes”, “studio sets”)
- Improve NLP-based title parsing for better model recognition
- Integrate hybrid similarity search (image-to-gallery)
v0.6.x — Web Interface & UX
------------------------------------
Planned Start: Q2 2026
Goals:
- Create lightweight local Web UI for browsing and search
- Add thumbnail preview grid for galleries
- Support filtering by tag, performer, or source
- Allow tag editing via UI (writes to metadata.json)
- Visualize ML embeddings as clusters or heatmaps
- Introduce color-coded category icons based on tag domains
v0.7.x — Multi-Source Expansion
------------------------------------
Planned Start: Q3 2026
Goals:
- Add support for multiple import sources (e.g., TheHun, Fapello)
- Normalize cross-site metadata into unified schema
- Introduce per-site tag mappings for source-specific categories
- Develop rate-limiting, retries, and error resilience for scraping
- Expand YAML dictionaries to include new tag categories
v0.8.x — Semantic Intelligence & AI Curation
------------------------------------
Planned Start: Q4 2026
Goals:
- Train in-house multimodal model for semantic gallery tagging
- Support “smart tagging” with probabilistic tag confidence
- Implement user feedback learning loop for refinement
- Add multilingual tag inference (English, French, German)
- Develop automatic duplicate detection and merge logic
- Add story-based inference (scene context across images)
v0.9.x — Optimization & Deployment
------------------------------------
Planned Start: 2027
Goals:
- Package as standalone application with installer
- Implement database indexing for instant search
- Optimize YAML and JSON parsing for large collections
- Introduce CLI subcommands for advanced maintenance tasks
- Add backup, restore, and migration tools
- Begin Linux packaging (PKGBUILD, Flatpak manifest)
v1.0.0 — Stable Release
------------------------------------
Planned Start: 2027
Goals:
- Fully modular architecture with plugin-based importers
- Complete Web UI parity with CLI functionality
- Documented API endpoints for local integrations
- Export system for JSONL / CSV / ML dataset sync
- Full automated test coverage and build pipeline
- Public release of “Goondex ML Core” dataset format
------------------------------------------------------------
3. Research & Experimental Branches
------------------------------------------------------------
ML-Research Branch:
- Embedding fusion experiments (text-image hybrid)
- Visual attribute detection fine-tuning using CLIP variants
- Performance benchmark on local consumer GPUs
Tag-Lab Branch:
- Dynamic tag clustering using sentence-transformers
- Contextual tagging prototype (scene recognition)
- Human-assisted tag correction feedback loop
Web-UI Branch:
- Minimalist grid-based gallery explorer
- Tag filters with real-time search
- RESTful interface backed by FastAPI
------------------------------------------------------------
4. Long-Term Goals
------------------------------------------------------------
- Local inference pipeline fully independent from cloud APIs
- Optional privacy layer for encrypted gallery indexing
- On-device fine-tuning for user-specific preferences
- Extend beyond adult content into broader visual media indexing
- Formalize Goondex Metadata Specification (GMS 1.0) for interoperability
------------------------------------------------------------
5. Development Philosophy
------------------------------------------------------------
- Local-first: all functions must work offline
- Transparent: all data stored in readable YAML/JSON
- Modular: each subsystem must be independently testable
- Ethical: prioritizes privacy and non-exploitative content handling
- Accessible: written with clear documentation and open interfaces
------------------------------------------------------------
End of File
------------------------------------------------------------

180
docs/TAGGING.md Normal file

@ -0,0 +1,180 @@
File: docs/TAGGING.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
------------------------------------------------------------
Goondex Tagging System Documentation
------------------------------------------------------------
Purpose:
Define how Goondex handles tagging, tag inference, YAML dictionaries, and validation.
This document standardizes how tags are generated, stored, and maintained for galleries within the Goondex framework.
------------------------------------------------------------
1. Overview
------------------------------------------------------------
The tagging system in Goondex provides:
- Automatic tag inference based on keywords, metadata, and categories.
- Human-editable tags for user-defined labeling.
- YAML dictionaries for consistent terminology and modular configuration.
- Validation and reporting tools to prevent duplication or conflicting tags.
All tagging logic is implemented in:
src/importer/tag_gallery.py
src/importer/tag_utils.py
src/importer/tag_dictionaries/
------------------------------------------------------------
2. Tag Categories
------------------------------------------------------------
Tags are divided into modular YAML dictionaries for clarity and maintainability.
Each dictionary focuses on a single thematic domain:
tag_dictionaries/
body.yml → physical descriptors (e.g. Blonde, Curvy, Muscular)
acts.yml → sexual acts or positions (e.g. Blowjob, Anal, Doggystyle)
clothing.yml → garments and accessories (e.g. Lingerie, Socks, Latex)
context.yml → settings or environments (e.g. Beach, Office, Shower)
fetish.yml → specific fetish content (e.g. BDSM, Pee Fetish, Bondage)
orientation.yml → sexual orientation or group type (e.g. Straight, Lesbian, Gay)
All dictionaries share a simple key-value structure:
"keyword": "TagName"
Example (clothing.yml):
socks: Socks
panties: Panties
lingerie: Lingerie
stockings: Stockings
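Because each dictionary is a flat list of keyword and tag pairs, loading one can be sketched with the standard library alone. This is an assumption-laden simplification: it handles only single-line `keyword: TagName` entries, the helper name `parse_flat_dictionary` is hypothetical, and the project itself may well use a full YAML parser.

```python
def parse_flat_dictionary(text):
    """Parse flat 'keyword: TagName' lines into a dict with lowercase keys."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        keyword, separator, tag = line.partition(":")
        if not separator:
            raise ValueError(f"not a key-value line: {raw!r}")
        # Strip optional quotes and normalize the keyword to lowercase.
        mapping[keyword.strip().strip("\"'").lower()] = tag.strip().strip("\"'")
    return mapping


clothing = parse_flat_dictionary("socks: Socks\nlingerie: Lingerie\n")
print(clothing)
```

Lowercasing the keyword at load time means later matching code can compare against lowercase text without further normalization.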
------------------------------------------------------------
3. Tag Inference Logic
------------------------------------------------------------
Automatic tagging is handled by infer_tags() in tag_gallery.py.
The system scans text data extracted from metadata.json:
- title
- categories
- tags (pre-existing)
- source network and channel
- optional inferred fields
Process:
1. Combine text fields into one lowercase text blob.
2. Search all keywords from every YAML dictionary.
3. For each keyword match, add the corresponding tag.
4. Merge inferred tags with existing manual tags.
5. Save to metadata.json under inferred_tags.
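The steps above can be sketched as follows. This is an illustration under stated assumptions, not the actual infer_tags() from tag_gallery.py: the signature, the plain substring matching, and the two small dictionaries are hypothetical.

```python
def infer_tags(metadata, dictionaries):
    """Return tags whose keywords occur in the gallery's text fields."""
    # Step 1: combine the text fields into one lowercase blob.
    parts = [metadata.get("title", "")]
    parts += metadata.get("categories", [])
    parts += metadata.get("tags", [])
    blob = " ".join(parts).lower()

    # Steps 2-3: check every keyword from every dictionary.
    inferred = []
    for dictionary in dictionaries:
        for keyword, tag in dictionary.items():
            if keyword in blob and tag not in inferred:
                inferred.append(tag)
    return inferred


# Hypothetical dictionaries standing in for the YAML files.
body = {"busty": "Busty", "blonde": "Blonde"}
acts = {"rides": "Riding", "hard": "Hardcore"}

metadata = {"title": "Busty Blonde Rides Hard", "categories": [], "tags": []}
# Step 5: store the result under inferred_tags. The exact merge rule of
# step 4 is not specified in this document, so it is left out here.
metadata["inferred_tags"] = infer_tags(metadata, [body, acts])
print(metadata["inferred_tags"])
```

Plain substring matching keeps the sketch short, but it also matches keywords that appear inside longer words.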
Example:
Input metadata.title → “Busty Blonde Rides Hard”
Detected → ["Busty", "Blonde", "Riding", "Hardcore"]
The result:
"inferred_tags": ["Busty", "Blonde", "Riding", "Hardcore"]
------------------------------------------------------------
4. Manual Tagging
------------------------------------------------------------
Users can add or remove tags manually through the CLI.
Examples:
goondex add "<folder>" "Outdoor"
goondex remove "<folder>" "Solo"
goondex add-multi "<folder>" "Amateur, Teen, Shaved"
Manual tags are stored in metadata.json under "tags".
Inferred tags are stored separately under "inferred_tags" to maintain clarity.
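The separation between the two keys can be illustrated with a small sketch. The helper names `add_tags` and `remove_tag` are hypothetical and do not describe how the CLI is implemented. The only assumption taken from the text is that manual edits touch "tags" and leave "inferred_tags" alone.

```python
def add_tags(metadata, raw):
    """Add comma-separated tag names to metadata['tags'], skipping duplicates."""
    tags = metadata.setdefault("tags", [])
    for name in raw.split(","):
        name = name.strip()
        if name and name not in tags:
            tags.append(name)
    return metadata


def remove_tag(metadata, name):
    """Remove one manual tag; inferred_tags is deliberately not modified."""
    metadata["tags"] = [tag for tag in metadata.get("tags", []) if tag != name]
    return metadata


metadata = {"tags": ["Solo"], "inferred_tags": ["Blonde"]}
add_tags(metadata, "Amateur, Teen, Shaved")
remove_tag(metadata, "Solo")
print(metadata)
```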
------------------------------------------------------------
5. Tag Validation
------------------------------------------------------------
Validation ensures all YAML tag dictionaries remain consistent and free of errors.
Run:
goondex validate-tags
Checks performed:
- Duplicate keywords within a dictionary
- Conflicting or identical values across multiple dictionaries
- Empty entries or malformed YAML
- Case inconsistencies between similar entries
Outputs:
src/importer/reports/tag_validation.json
src/importer/reports/tag_conflicts.txt
CLI Summary Example:
Files loaded: 6
Keywords total: 421
Conflicts: 2
Duplicates: 4
Empty entries: 0
[✓] Validation finished.
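The checks listed in this section can be sketched roughly as below. This is one possible reading of them, not the validate-tags implementation: the function name, the report keys, and the input shape (file name mapped to raw keyword and tag pairs) are all assumptions.

```python
from collections import defaultdict


def validate(dictionaries):
    """Check raw (keyword, tag) pairs per file and return a report dict."""
    report = {"duplicates": [], "conflicts": [], "empty": [], "case": []}
    keyword_tags = defaultdict(set)   # keyword -> tags seen across all files
    tag_spellings = defaultdict(set)  # lowercase tag -> spellings seen
    for name, pairs in dictionaries.items():
        seen = set()
        for keyword, tag in pairs:
            if not keyword or not tag:
                report["empty"].append((name, keyword, tag))
                continue
            if keyword in seen:
                report["duplicates"].append((name, keyword))
            seen.add(keyword)
            keyword_tags[keyword].add(tag)
            tag_spellings[tag.lower()].add(tag)
    # A keyword mapped to more than one tag is reported as a conflict.
    report["conflicts"] = sorted(k for k, tags in keyword_tags.items() if len(tags) > 1)
    # Tags that differ only by letter case are reported together.
    report["case"] = sorted(sorted(s) for s in tag_spellings.values() if len(s) > 1)
    return report


# Hypothetical input, for illustration only.
data = {
    "body.yml": [("blonde", "Blonde"), ("blonde", "Blonde"), ("curvy", "")],
    "context.yml": [("blonde", "blonde"), ("beach", "Beach")],
}
print(validate(data))
```

The sketch takes raw pairs rather than parsed dicts because a dict would silently collapse duplicate keywords before they could be counted.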
------------------------------------------------------------
6. Tag Statistics
------------------------------------------------------------
Generate tag frequency statistics across all galleries:
goondex tag-stats
Outputs:
src/importer/reports/tag_stats.json
src/importer/reports/tag_stats_sorted.txt
CLI displays top tags and usage counts:
1. Teen 42
2. Blonde 37
3. Outdoor 29
4. Lingerie 25
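A frequency count like the one above can be sketched with collections.Counter. Whether tag-stats counts manual tags, inferred tags, or both is not stated here. This sketch assumes both, counted once per gallery, and the function name `tag_counts` is hypothetical.

```python
from collections import Counter


def tag_counts(galleries):
    """Count how many galleries carry each tag (manual or inferred)."""
    counts = Counter()
    for metadata in galleries:
        # A set ensures a tag present in both lists is counted once per gallery.
        unique = set(metadata.get("tags", [])) | set(metadata.get("inferred_tags", []))
        counts.update(unique)
    return counts


# Hypothetical metadata, for illustration only.
galleries = [
    {"tags": ["Outdoor", "Blonde"]},
    {"tags": ["Outdoor"], "inferred_tags": ["Outdoor", "Lingerie"]},
    {"inferred_tags": ["Outdoor", "Blonde"]},
]
for rank, (tag, count) in enumerate(tag_counts(galleries).most_common(3), start=1):
    print(f"{rank}. {tag} {count}")
```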
------------------------------------------------------------
7. Known Limitations
------------------------------------------------------------
- Keyword overlap between categories can cause false positives (e.g. “English” being inferred from “British”).
- Contextual interpretation (e.g. “Wet Hair” vs. “Wet”) is not yet implemented.
- Case-insensitive matching may include unintended words (e.g. “Daddy” vs. “daddy issues”).
- YAML entries are static — dynamic NLP inference is planned for v0.5.x.
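One common way to reduce substring false positives is to match keywords only at word boundaries. The sketch below is a suggestion rather than current behavior: `keyword_matches` is a hypothetical helper, and it does not address the contextual cases (such as “Wet Hair” vs. “Wet”) listed above.

```python
import re


def keyword_matches(keyword, text):
    """True if keyword occurs in text as a whole word or phrase, ignoring case."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print("hard" in "orchard")                  # plain substring test matches
print(keyword_matches("hard", "orchard"))   # word-boundary test does not
print(keyword_matches("wet", "Wet hair"))   # whole word, different case
```

re.escape keeps keywords containing punctuation from being interpreted as regular expression syntax.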
------------------------------------------------------------
8. Best Practices
------------------------------------------------------------
- Keep YAML entries lowercase on the left-hand keyword.
- Use concise and consistent tag names on the right-hand side.
- Avoid ambiguous single-word tags (e.g. “hot”, “nice”, “pretty”).
- Run goondex validate-tags before each commit.
- Do not edit inferred_tags manually — always refresh via CLI.
- Use add-multi for efficient manual tagging after bulk imports.
------------------------------------------------------------
9. Future Enhancements (v0.4.x to v0.5.x)
------------------------------------------------------------
- Implement weighted tagging confidence using NLP models.
- Integrate GroundingDINO + SAM for visual tagging assistance.
- Introduce “tag confidence scores” to help refine inference reliability.
- Develop cross-source tag normalization for multiple site importers.
- Support user-defined alias groups (e.g. “Ass” = “Butt” = “Booty”).
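As a rough idea of how alias groups might work once implemented, tags could be mapped to one canonical name per group. Everything here is hypothetical: the data layout, the choice of canonical name, and both function names.

```python
def build_alias_index(groups):
    """Map every lowercase name in a group to that group's canonical tag."""
    index = {}
    for canonical, aliases in groups.items():
        for name in [canonical, *aliases]:
            index[name.lower()] = canonical
    return index


def normalize_tags(tags, index):
    """Replace aliases with canonical tags, keeping order and dropping repeats."""
    result = []
    for tag in tags:
        canonical = index.get(tag.lower(), tag)  # unknown tags pass through
        if canonical not in result:
            result.append(canonical)
    return result


# Hypothetical alias group based on the example above.
index = build_alias_index({"Ass": ["Butt", "Booty"]})
print(normalize_tags(["Butt", "Booty", "Blonde"], index))
```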
------------------------------------------------------------
10. Developer Notes
------------------------------------------------------------
All tag inference should remain human-readable and reversible.
The YAML system was chosen for transparency and editability.
Tags should serve as both descriptive metadata and ML training features.
Avoid unnecessary expansion — focus on clarity and accuracy over volume.
------------------------------------------------------------
End of File
------------------------------------------------------------