
📜 Goondex — Full Changelog

Repository: Leak Technologies
Branch: main
Version Line: v0.3.x Development Cycle
Formerly: Porndex Importer (PornPics Importer Module)


[v0.3.3] — Stable CLI Alias & Import Path Fix (2025-11-06)

Added

  • Introduced unified goondex CLI alias, now functional across Fish, Bash, and Zsh shells.
  • Added --help and --version flags with consistent colorized output.
  • Standardized usage and examples block for user clarity.
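A minimal sketch of how `--help` and `--version` flags like these can be wired with the standard `argparse` module. The `build_parser` name and the version string passed in are illustrative assumptions, not Goondex's actual code; the real CLI may read the string from its VERSION file.

```python
import argparse


def build_parser(version: str) -> argparse.ArgumentParser:
    """Build a hypothetical goondex parser; argparse adds -h/--help automatically."""
    parser = argparse.ArgumentParser(
        prog="goondex",
        description="Gallery importer and tagging CLI (illustrative sketch).",
    )
    # action="version" prints "<prog> <version>" to stdout, then exits with status 0.
    parser.add_argument("--version", action="version", version=f"%(prog)s {version}")
    return parser


if __name__ == "__main__":
    args = build_parser("v0.3.3").parse_args()
    print(args)
```

Running the sketch with `--version` would print `goondex v0.3.3` and exit.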

🛠 Changed

  • Refactored all internal imports to use absolute src.importer.* paths for compatibility.
  • Updated gallery_importer.py to invoke src.importer.tag_gallery via subprocess.
  • Simplified the alias setup scripts /src/utils/install_alias.sh and /src/utils/install_alias.fish.

🧹 Maintenance

  • Rebuilt virtual environment (.venv) and dependency tree under Python 3.13.
  • Verified clean CLI operation with goondex --version and goondex --help.
  • Confirmed consistent behavior across development and installed modes.

[v0.3.2-rebuild] — Repository Cleanup & Stabilization (2025-11-02)

Added

  • Introduced project-wide .gitignore to exclude gallery media and model weights.
  • Added VERSION file (v0.3.2) for synchronized CLI and metadata versioning.
  • Implemented environment fix for Fish-shell virtualenv activation.
  • Ensured unified porndex CLI entrypoint under /src/importer/cli.py.

🧹 Maintenance

  • Removed redundant and outdated tags (v0.3.0 through v0.4.1) from the remote.
  • Normalized repository tree and re-pushed clean 4.6 GiB → base v0.3.2.
  • Prepared groundwork for --help and --version CLI arguments.

[v0.3.0] — Modular Tagging Framework Foundation (2025-10-18)

Added

  • Introduced YAML-based Tag Dictionaries stored under /src/importer/tagging/ for modular, human-readable tag definitions.
  • Implemented initial refresh-all and refresh-one commands for reapplying tag inference to galleries.
  • Added persistent inferred_tags field in metadata.json to differentiate between automated and manual tags.
  • Implemented automatic source inference for known networks (e.g., Brazzers, FTV Girls, PornPics).
  • Enhanced CLI output with colorized progress indicators and summary totals.
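The split between automated and manual tags can be illustrated with a small sketch of a refresh step. It assumes, hypothetically, that manual tags live under a `tags` key and automated ones under `inferred_tags`, and that inference is a function from a gallery title to a list of tags; only `inferred_tags` is rewritten, so manual edits survive a refresh.

```python
from typing import Callable


def refresh_gallery(metadata: dict, infer: Callable[[str], list[str]]) -> dict:
    """Recompute automated tags for one gallery while leaving manual tags alone.

    Hypothetical layout: manual tags under "tags", automated ones under
    "inferred_tags". Only "inferred_tags" is rewritten.
    """
    updated = dict(metadata)  # shallow copy: the caller's dict is not modified
    # dict.fromkeys drops duplicate tags while keeping their first-seen order.
    updated["inferred_tags"] = list(dict.fromkeys(infer(updated.get("title", ""))))
    return updated


def refresh_all(galleries: list[dict], infer: Callable[[str], list[str]]) -> list[dict]:
    """Apply refresh_gallery to every gallery (a refresh-all style bulk pass)."""
    return [refresh_gallery(meta, infer) for meta in galleries]
```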

🛠 Changed

  • Refactored tag_gallery.py for modular tagging architecture.
  • Centralized configuration paths to /src/importer/config/ for easier project-wide access.

🧹 Maintenance

  • Improved exception handling for missing or malformed tag dictionaries.
  • Added consistent emoji/logging system across CLI commands.

[v0.3.1] — CLI Polishing & Dictionary Improvements (2025-10-19)

Added

  • Introduced CLI argument parsing with argparse for a unified user interface.
  • Added --verbose flag for detailed debugging output.
  • Added metadata validation to ensure all tag dictionaries contain unique keywords.

🛠 Changed

  • Adjusted internal path resolution to work from both installed and development environments.
  • Improved load_all_tag_maps() with caching and better error resilience.

🧹 Maintenance

  • Cleaned duplicate mappings within YAML files.
  • Improved documentation and inline docstrings throughout importer modules.

[v0.3.2] — TPDB Bridge Integration (2025-10-21)

Added

  • Introduced tpdb_bridge.py for importing performer data from ThePornDB API.
  • Added local SQLite performer database under /src/importer/db/performers.db.
  • Added commands:
    • fetch — Import performers in a single batch.
    • fill-index — Continuously pull until a limit is reached.
    • enrich — Fetch and merge extended performer metadata.
    • sync-all — Hybrid incremental fetch + enrich loop.
  • Introduced local API key management using tpdb_api_key.txt under /secrets/.
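A local performer store like performers.db can be sketched with the standard `sqlite3` module. The table layout below is an assumption: only `url` and `last_updated` are named elsewhere in this changelog, and the `id`/`name` columns and the upsert strategy are illustrative. The `COALESCE` keeps an existing URL when a later fetch returns none, so re-fetching does not blank out enriched fields. `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS performers (
    id           TEXT PRIMARY KEY,  -- performer ID from the API (assumed string)
    name         TEXT NOT NULL,
    url          TEXT,
    last_updated TEXT               -- ISO-8601 UTC timestamp
)
"""


def upsert_performer(conn: sqlite3.Connection, performer: dict) -> None:
    """Insert a performer, or update the existing row if the ID is already stored."""
    conn.execute(
        """
        INSERT INTO performers (id, name, url, last_updated)
        VALUES (:id, :name, :url, :last_updated)
        ON CONFLICT(id) DO UPDATE SET
            name = excluded.name,
            url = COALESCE(excluded.url, performers.url),
            last_updated = excluded.last_updated
        """,
        {
            "id": performer["id"],
            "name": performer["name"],
            "url": performer.get("url"),
            "last_updated": datetime.now(timezone.utc).isoformat(),
        },
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a real setup would open a file such as performers.db
    conn.execute(SCHEMA)
    upsert_performer(conn, {"id": "p1", "name": "Example Performer"})
    print(conn.execute("SELECT id, name FROM performers").fetchall())
```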

🧹 Maintenance

  • Verified importer against TPDB rate limits and ensured safe error recovery.
  • Added initial test data exports to /src/importer/reports/.

[v0.3.3] — YAML Tag Inference Update (2025-10-20)

Added

  • Dynamic YAML tag dictionary loader for modular tag categories.
  • Introduced automatic source inference for common networks.
  • Added refresh-all bulk operation to reapply tag inference globally.

🛠 Changed

  • Refactored infer_tags() to merge results from multiple YAML files dynamically.
  • Enhanced progress and summary reporting for tag inference.

🧹 Maintenance

  • Fixed AttributeError: 'int' object has no attribute 'lower' when parsing numeric YAML values.
  • Standardized internal naming conventions.
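The merge across YAML files and the numeric-value fix can be sketched together. `tag_maps` mirrors an assumed shape of several parsed dictionaries, `{category: {tag: [keyword, ...]}}`, and matching is a naive case-insensitive substring test chosen for illustration. YAML parses an unquoted keyword such as `1080` as an int, and calling `.lower()` on an int raises the `AttributeError` quoted above; coercing with `str()` first avoids it.

```python
def infer_tags(text: str, tag_maps: dict[str, dict[str, list]]) -> list[str]:
    """Return tags whose keywords appear in `text`, merged across all categories.

    Keywords may be ints when YAML leaves them unquoted (e.g. 1080), so each
    one is passed through str() before .lower().
    """
    haystack = text.lower()
    found: dict[str, None] = {}  # dict keeps insertion order and drops duplicates
    for category in tag_maps.values():
        for tag, keywords in category.items():
            # `or []` tolerates empty YAML entries, which load as None.
            if any(str(kw).lower() in haystack for kw in keywords or []):
                found[tag] = None
    return list(found)
```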

[v0.3.4] — Tag Dictionary Validation & Cleanup (2025-10-20)

Added

  • validate-tags CLI command for verifying YAML tag dictionaries.
    • Detects duplicates, empty entries, and conflicting keywords.
    • Outputs detailed summaries with per-keyword conflict listings.
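The three checks listed for validate-tags (empty entries, duplicates, conflicting keywords) can be sketched as a single pass over the dictionaries. The input shape and the report format are assumptions; the real command's output may differ. Keywords are normalized with `str().strip().lower()`, so `beach` and `Beach` count as duplicates.

```python
from collections import defaultdict


def validate_tag_maps(tag_maps: dict[str, dict[str, list]]) -> dict[str, list[str]]:
    """Report empty tags, duplicate keywords within a tag, and cross-tag conflicts."""
    report: dict[str, list[str]] = {"empty": [], "duplicates": [], "conflicts": []}
    owners: dict[str, set[str]] = defaultdict(set)  # keyword -> tags that use it
    for category, tags in tag_maps.items():
        for tag, keywords in tags.items():
            if not keywords:
                report["empty"].append(f"{category}/{tag}")
                continue
            seen: set[str] = set()
            for kw in keywords:
                norm = str(kw).strip().lower()
                if norm in seen:
                    report["duplicates"].append(f"{category}/{tag}: {norm}")
                seen.add(norm)
                owners[norm].add(tag)
    # A keyword claimed by more than one tag is a conflict; list every owner.
    for kw, tag_set in sorted(owners.items()):
        if len(tag_set) > 1:
            report["conflicts"].append(f"{kw}: {', '.join(sorted(tag_set))}")
    return report
```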

🛠 Changed

  • Standardized YAML structure enforcement (consistent key capitalization and layout).
  • Added human-readable validation summaries.

🧹 Maintenance

  • General code cleanup and consistent logging system updates.

[v0.3.5] — Tag Statistics & Unified CLI Update (2025-10-20)

Added

  • Tag Statistics System
    • Introduced tag-stats command to generate frequency analytics across all gallery metadata.
    • Produces both console summaries and saved reports:
      • reports/tag_stats.json — JSON-formatted tag counts.
      • reports/tag_stats_sorted.txt — human-readable ranked list.
  • Unified CLI Interface (cli.py)
    • Consolidated all tagging and maintenance operations into a single entrypoint:
      • refresh-all, refresh-one, validate-tags, tag-stats, list, list-tags, add, remove, add-multi, show-metadata, source
    • Standardized command syntax and output formatting across all operations.

🛠 Changed

  • Centralized tag frequency logic into tag_gallery.py.
  • Refactored CLI dispatch system for scalability and better error handling.
  • Standardized output style (headers, dividers, alignment).

🧹 Maintenance

  • Added automatic creation of /src/importer/reports/ when missing.
  • Verified all tag operations across 60+ galleries.
  • Unified terminology and capitalization across CLI help text and docstrings.

🧭 Next Steps

  • Add color-coded CLI output for readability.
  • Implement --export-csv flag for tag-stats output.
  • Begin roadmap for v0.4.0 introducing ML-based tag confidence scoring and category weighting.

[v0.3.6] — Enrichment Verification & Freshness Tracking (2025-10-26)

Added

  • verify-enrichment command
    • Scans performer database for missing metadata (e.g., url, last_updated).
    • Reports enriched vs incomplete entries, with preview via --show-missing.
  • Freshness tracking
    • Displays oldest and most recent enrichment timestamps.
    • Warns if data is older than the freshness threshold.
  • Automatic TPDB key validation
    • Checks for valid API key and provides setup help if missing.
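The completeness and freshness checks can be sketched as one summary function. It assumes each performer is a dict with `url` and `last_updated` (the fields named above), that timestamps are ISO-8601 strings with a UTC offset, and that 30 days is a placeholder threshold; the `stale_days` parameter loosely mirrors the `--stale-days` flag proposed under Next Steps.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REQUIRED_FIELDS = ("url", "last_updated")


def verify_enrichment(
    performers: list[dict], stale_days: int = 30, now: Optional[datetime] = None
) -> dict:
    """Summarize enriched vs incomplete entries and the age of the data."""
    now = now or datetime.now(timezone.utc)
    # A performer is incomplete if any required field is missing or empty.
    missing = [
        p.get("name", "?") for p in performers if any(not p.get(f) for f in REQUIRED_FIELDS)
    ]
    stamps = [datetime.fromisoformat(p["last_updated"]) for p in performers if p.get("last_updated")]
    cutoff = now - timedelta(days=stale_days)
    return {
        "enriched": len(performers) - len(missing),
        "incomplete": missing,
        "oldest": min(stamps) if stamps else None,
        "newest": max(stamps) if stamps else None,
        "stale": sum(1 for s in stamps if s < cutoff),
    }
```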

🛠 Changed

  • Enrichment logic now guarantees url and last_updated fields for all performers.
  • Improved emoji-based CLI logs for clarity.
  • CLI outputs enrichment stats after each batch during sync-all.

🧹 Maintenance

  • Cleanup and refactor of tpdb_bridge.py for readability and modular design.
  • Verified completeness: 5,087 performers enriched and up to date.
  • Improved sleep timing and network error recovery during long sync runs.
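Network error recovery during long sync runs is often done with retries and exponential backoff. This is a generic sketch of that technique, not the importer's actual logic: it retries only on the built-in `ConnectionError` and `TimeoutError` (an HTTP client library would raise its own exception types), and `sleep` is injectable so tests can skip real waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on connection errors with delays of 1s, 2s, 4s, ...

    The error from the final attempt is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```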

🧭 Next Steps

  • Add --stale-days CLI flag for user-defined freshness thresholds.
  • Implement automatic enrichment scheduling via cron or systemd.
  • Add shortcut alias porndex-importer verify for database status checks.

[v0.3.7] — Scene-Based Enrichment & Channel Auto-Upgrade (2025-10-26)

Added

  • Scene-based enrichment system
    • New flag --use-scenes infers performer studios/channels from recent scene data on ThePornDB.
    • Automatically scans /performers/{id}/scenes for studio, site, or network fields when direct metadata is missing.
    • Dynamically upgrades performer entries from “Unknown” to valid channel names (e.g., “Desire Room”, “I Want Clips: Princess Chanel”).
  • Enhanced enrichment diagnostics
    • --debug-channels now outputs detailed channel inference logs with origin type (e.g., “via scene” or “via performer metadata”).
    • Emoji-coded output for improved clarity:
      • 🎞 Scene-based upgrades
      • 🎬 Direct metadata
      • Missing channel info
  • Progress verification
    • verify-enrichment now reports precise completion percentages and lists the 20 most recently upgraded performers.
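The scene-based fallback can be sketched as two small functions. The scene dict shape is hypothetical (each of `studio`, `site`, `network` may be a string, a dict with a `name` key, or absent) and is not a documented ThePornDB response format; scenes are assumed to be pre-sorted newest first, and the first name found wins. `studio_from_scenes` and `upgrade_channel` are illustrative names, not the project's `_fetch_studio_from_scenes()`. The upgrade only replaces a missing or "Unknown" channel, matching the in-place, non-overwriting behavior described in this release.

```python
from typing import Optional

FIELDS = ("studio", "site", "network")  # checked in this order (an assumption)


def _name_of(value) -> Optional[str]:
    """Accept a plain string or a dict with a "name" key; ignore blanks."""
    if isinstance(value, dict):
        value = value.get("name")
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def studio_from_scenes(scenes: list[dict]) -> Optional[str]:
    """Return the first studio/site/network name found, scanning newest scenes first."""
    for scene in scenes:
        for field in FIELDS:
            name = _name_of(scene.get(field))
            if name:
                return name
    return None


def upgrade_channel(current: Optional[str], scenes: list[dict]) -> Optional[str]:
    """Replace only a missing or "Unknown" channel; never overwrite a known one."""
    if current and current != "Unknown":
        return current
    return studio_from_scenes(scenes) or current
```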

🛠 Changed

  • Enrichment process now performs automatic in-place upgrades of performer_sources without overwriting other fields.
  • Optimized query logic to prioritize unverified performers and handle large datasets efficiently.
  • Added fine-grained sleep control between API requests to stay compliant with TPDB rate limits.

🧹 Maintenance

  • Refactored enrichment functions for modularity:
    • _fetch_studio_from_scenes() introduced for scene scanning.
  • Simplified argument handling and added more detailed exception tracing.
  • Verified enrichment stability across 100 performers, with 44% successful channel discovery in a live test.
  • Improved timestamp consistency in verification logs and strengthened database schema resilience.

[v0.4.2] — Unified Importer, ML Pipeline, and Semantic Search (2025-10-27)

Added

  • Unified Importer CLI (porndex-importer)
    • Replaces the legacy multi-script workflow with a single command entrypoint.
    • Introduced import, refresh-all, refresh-one, validate-tags, tag-stats, and source subcommands.
    • Includes colorized CLI summaries and consistent emoji headers.
  • Machine Learning Dataset Builder
    • New module: ml/ml_dataset_builder.py
    • Generates a structured dataset in ML/porndex_dataset.jsonl from all indexed galleries.
    • Each record includes title, models, tags, and image paths for hybrid ML ingestion.
  • Embedding Generation Module
    • Added ml/ml_embeddings.py to create hybrid text + image embeddings.
    • Builds per-gallery NPZ files under ML/embeddings/ and a consolidated embeddings_index.jsonl.
    • Supports configurable --img-samples and automatic device detection (--device auto).
  • Semantic & Strict Search
    • The search command supports three modes:
      • semantic: CLIP + text hybrid cosine similarity (default)
      • text: text-only vector space search
      • strict: literal match filtering before vector ranking
    • Results show top-ranked galleries, confidence scores, and gallery IDs.
  • ML Verification Command
    • verify confirms index consistency, embedding count, and file integrity.
  • Directory Auto-Creation
    • Automatically creates ML/ and ML/embeddings/ if missing.
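The three search modes can be sketched in pure Python. Several details are assumptions for illustration: each index entry is a dict with hypothetical `id`, `title`, `text_vec`, and `image_vec` keys; the query vector and both gallery vectors share one embedding space (as CLIP's text and image encoders do); the semantic score is an equal-weight blend; and strict mode filters on a literal title match before ranking. The real implementation likely uses NumPy and may weight or filter differently.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(
    query_vec: Vector,
    query_text: str,
    index: list[dict],
    mode: str = "semantic",
    top_k: int = 5,
    text_weight: float = 0.5,
) -> list[tuple[float, str]]:
    """Return (score, gallery_id) pairs, best first."""
    if mode not in ("semantic", "text", "strict"):
        raise ValueError(f"unknown mode: {mode}")
    candidates = index
    if mode == "strict":
        # Literal filter first, then rank the survivors by text similarity.
        needle = query_text.lower()
        candidates = [g for g in index if needle in g["title"].lower()]
    scored = []
    for g in candidates:
        text_score = cosine(query_vec, g["text_vec"])
        if mode == "semantic":
            image_score = cosine(query_vec, g["image_vec"])
            score = text_weight * text_score + (1 - text_weight) * image_score
        else:
            score = text_score
        scored.append((score, g["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```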

🛠 Changed

  • Importer Pipeline Refactor
    • Moved all CLI handling into src/importer/cli.py.
    • Centralized environment setup and config loading.
    • Replaced direct Python script calls with the porndex-importer entrypoint.
  • Tagging System
    • Unified YAML dictionary loading for clothing, acts, body, and context.
    • Improved tag inference logging and duplicate suppression.
  • Output Formatting
    • Standardized headers, dividers, and indentation across all CLI commands.
    • Added readable time and path indicators for long-running operations.

🧹 Maintenance

  • Verified full ML dataset build across 150 test galleries (100% JSONL completion).
  • Added fallback for empty or missing image lists in the dataset builder.
  • Improved error handling for partial downloads and interrupted imports.
  • Streamlined path resolution for consistent operation across dev and installed modes.
  • Updated documentation:
    • /docs/CLI_USAGE.md rewritten for v0.4.2.
    • /README.md modernized with full project tree and ML pipeline overview.
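The dataset builder's JSONL output and its fallback for empty or missing image lists can be sketched as follows. The input gallery keys (`id`, `title`, `models`, `tags`, `inferred_tags`, `images`) are assumptions; the output fields follow the record contents listed for v0.4.2 (title, models, tags, image paths). A missing or `None` image list becomes `[]` so every line stays valid JSON with the same schema.

```python
import json
from pathlib import Path


def build_dataset(galleries: list[dict], out_path: Path) -> int:
    """Write one JSON record per gallery to a JSONL file; return the record count."""
    out_path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create ML/ if missing
    count = 0
    with out_path.open("w", encoding="utf-8") as fh:
        for g in galleries:
            record = {
                "id": g.get("id"),
                "title": g.get("title", ""),
                "models": g.get("models") or [],
                # Merge manual and inferred tags; sorting makes the output deterministic.
                "tags": sorted(set(g.get("tags") or []) | set(g.get("inferred_tags") or [])),
                "images": g.get("images") or [],  # fallback for None or missing lists
            }
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Called with a path such as `Path("ML/porndex_dataset.jsonl")`, the sketch creates the parent directory and writes one line per gallery.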

🧭 Next Steps

  • Begin the v0.4.3 through v0.5.x roadmap:
    • Integrate GroundingDINO + GroundedSAM for visual region detection.
    • Implement attribute extraction (gender → ethnicity → clothing).
    • Build visual verification tool (ml_dataset_inspector.py).
    • Add tag-confidence weighting system.
    • Extend TPDB bridge to cross-link enriched performer metadata into ML training records.

🧩 Summary of Current State (as of v0.4.2)

  • Fully unified CLI under porndex-importer
  • Stable YAML tagging + validation
  • Complete ML dataset and embedding generation workflow
  • Working hybrid semantic search
  • Verified 150-gallery dataset index

© 2025 Leak Technologies — Porndex Importer Project