docs: update changelog for v0.3.2-rebuild and Goondex rename

Parent: 48592148fc
Commit: 24c56d2221
File: docs/CHANGELOG.md (new file, 325 lines)

# 📜 Goondex — Full Changelog

> **Repository:** Leak Technologies
>
> **Branch:** main
>
> **Version Line:** v0.3.x Development Cycle
>
> _Formerly: Porndex Importer (PornPics Importer Module)_

---

## [v0.3.2-rebuild] — Repository Cleanup & Stabilization (2025-11-02)

### ✨ Added

- Introduced a project-wide `.gitignore` to exclude gallery media and model weights.
- Added a `VERSION` file (v0.3.2) for synchronized CLI and metadata versioning.
- Fixed virtualenv activation under the fish shell.
- Ensured a unified `porndex` CLI entrypoint in `/src/importer/cli.py`.

### 🧹 Maintenance

- Removed redundant and outdated tags (v0.3.0–v0.4.1) from the remote.
- Normalized the repository tree and re-pushed clean 4.6 GiB → base v0.3.2.
- Prepared groundwork for `--help` and `--version` CLI arguments.

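A minimal sketch of how the `VERSION` file could feed the planned `--version` flag. It assumes the file holds a single version string such as `v0.3.2`; `read_version` and `build_parser` are hypothetical helpers, not code taken from `cli.py`. `argparse` provides `--help` automatically.

```python
import argparse
import tempfile
from pathlib import Path


def read_version(version_file: Path) -> str:
    """Return the version stored in a VERSION file, without a leading "v"."""
    # "v0.3.2\n" and "0.3.2" both normalize to "0.3.2".
    return version_file.read_text(encoding="utf-8").strip().lstrip("v")


def build_parser(version: str) -> argparse.ArgumentParser:
    """Build a parser whose --version output comes from the VERSION file."""
    parser = argparse.ArgumentParser(prog="porndex")
    # The built-in "version" action prints the string and exits with status 0.
    parser.add_argument("--version", action="version", version=f"%(prog)s {version}")
    return parser


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        version_file = Path(tmp) / "VERSION"
        version_file.write_text("v0.3.2\n", encoding="utf-8")
        print(read_version(version_file))  # prints 0.3.2
```

Reading the version from one file at startup keeps the CLI output and any metadata stamps in sync without hard-coding the string in several places.
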
---

## [v0.3.0] — Modular Tagging Framework Foundation (2025-10-18)

### ✨ Added

- Introduced **YAML-based tag dictionaries**, stored under `/src/importer/tagging/`, for modular, human-readable tag definitions.
- Implemented initial **`refresh-all`** and **`refresh-one`** commands for reapplying tag inference to galleries.
- Added a **persistent `inferred_tags` field** in `metadata.json` to distinguish automated tags from manual ones.
- Implemented **automatic source inference** for known networks (e.g., Brazzers, FTV Girls, PornPics).
- Enhanced CLI output with colorized progress indicators and summary totals.

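The split between manual tags and the `inferred_tags` field can be sketched as below. This is illustrative only: it assumes each YAML dictionary parses (for example with PyYAML's `yaml.safe_load`) into a `{tag: [keywords]}` mapping, and `TAG_MAP`, `SOURCE_MAP`, `match_tags`, `infer_source`, and `refresh_metadata` are hypothetical names with placeholder data, not the project's actual code.

```python
import json
import re

# Placeholder data shaped like a parsed YAML tag dictionary: {tag: [keywords]}.
TAG_MAP = {
    "outdoor": ["outdoor", "outside", "garden"],
    "beach": ["beach", "seaside"],
}

# Hypothetical domain -> source-name table for automatic source inference.
SOURCE_MAP = {"example.com": "Example Network"}


def match_tags(title: str, tag_map: dict) -> list:
    """Return tags whose keywords appear as whole words in the title."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return sorted(tag for tag, keywords in tag_map.items() if words & set(keywords))


def infer_source(url: str, source_map: dict) -> str:
    """Map a gallery URL to a known source name, or "Unknown"."""
    for domain, name in source_map.items():
        if domain in url:
            return name
    return "Unknown"


def refresh_metadata(meta: dict, tag_map: dict, source_map: dict) -> dict:
    """Recompute inferred_tags and fill a missing source; manual tags are untouched."""
    meta["inferred_tags"] = match_tags(meta.get("title", ""), tag_map)
    meta.setdefault("tags", [])  # hand-curated tags are never rewritten
    if not meta.get("source"):
        meta["source"] = infer_source(meta.get("url", ""), source_map)
    return meta


if __name__ == "__main__":
    meta = {"title": "Sunny Garden Shoot", "url": "https://example.com/g/1", "tags": ["favorite"]}
    print(json.dumps(refresh_metadata(meta, TAG_MAP, SOURCE_MAP), indent=2))
```

Because `refresh_metadata` only rewrites `inferred_tags`, running it repeatedly (as a bulk refresh would) never clobbers hand-curated `tags`.
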
### 🛠 Changed

- Refactored `tag_gallery.py` for a modular tagging architecture.
- Centralized configuration paths in `/src/importer/config/` for easier project-wide access.

### 🧹 Maintenance

- Improved exception handling for missing or malformed tag dictionaries.
- Added a consistent emoji/logging system across CLI commands.

---

## [v0.3.1] — CLI Polishing & Dictionary Improvements (2025-10-19)

### ✨ Added

- Introduced **CLI argument parsing** with `argparse` for a unified user interface.
- Added a `--verbose` flag for detailed debugging output.
- Added **metadata validation** to ensure all tag dictionaries contain unique keywords.

### 🛠 Changed

- Adjusted internal path resolution to work in both installed and development environments.
- Improved `load_all_tag_maps()` with caching and better error resilience.

### 🧹 Maintenance

- Cleaned up duplicate mappings within YAML files.
- Improved documentation and inline docstrings throughout the importer modules.

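A sketch of a `--verbose` flag wired to `logging` levels alongside `argparse` subcommands. The subcommand names come from this changelog; the `gallery` positional argument, the program name, and the log messages are assumptions for illustration.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="porndex", description="Gallery tagging tools")
    # Global flag: raises log verbosity from INFO to DEBUG.
    parser.add_argument("--verbose", action="store_true", help="show debugging output")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("refresh-all", help="reapply tag inference to every gallery")
    one = sub.add_parser("refresh-one", help="reapply tag inference to one gallery")
    one.add_argument("gallery", help="gallery folder name (hypothetical argument)")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    logging.basicConfig(
        level=logging.DEBUG if args.verbose else logging.INFO,
        format="%(levelname)s %(message)s",
    )
    logging.debug("parsed arguments: %s", args)
    if args.command == "refresh-all":
        logging.info("refreshing all galleries")
    else:
        logging.info("refreshing %s", args.gallery)
    return 0


if __name__ == "__main__":
    # Demo invocation; a real entrypoint would call main() and read sys.argv.
    main(["--verbose", "refresh-one", "example_gallery"])
```

Because `--verbose` is defined on the top-level parser, it has to come before the subcommand, e.g. `--verbose refresh-one example_gallery`.
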
---

## [v0.3.2] — TPDB Bridge Integration (2025-10-21)

### ✨ Added

- Introduced **`tpdb_bridge.py`** for importing performer data from the *ThePornDB* API.
- Added a local **SQLite performer database** at `/src/importer/db/performers.db`.
- Added commands:
  - `fetch` — Import performers in a single batch.
  - `fill-index` — Continuously pull until a limit is reached.
  - `enrich` — Fetch and merge extended performer metadata.
  - `sync-all` — Hybrid incremental fetch + enrich loop.
- Introduced **local API key management** using `tpdb_api_key.txt` under `/secrets/`.

### 🧹 Maintenance

- Verified the importer against TPDB rate limits and ensured safe error recovery.
- Added initial test data exports to `/src/importer/reports/`.

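A minimal sketch of a local performer table and a `sync-all` style fetch + enrich loop using Python's built-in `sqlite3`. The schema, the `fetch_page` callback (a stand-in for the real ThePornDB request), and the `example.org` URLs are all hypothetical; the real `tpdb_bridge.py` may be organized differently. The `ON CONFLICT ... DO UPDATE` upsert requires SQLite 3.24 or newer.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS performers (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    url TEXT,
    last_updated TEXT
)
"""


def upsert_performers(conn: sqlite3.Connection, rows: list) -> None:
    """Insert new performers; refresh the name of ones already stored."""
    conn.executemany(
        "INSERT INTO performers (id, name) VALUES (:id, :name) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        rows,
    )
    conn.commit()


def enrich(conn: sqlite3.Connection, performer_id: str, url: str) -> None:
    """Record extended metadata plus an enrichment timestamp (UTC, ISO 8601)."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    conn.execute(
        "UPDATE performers SET url = ?, last_updated = ? WHERE id = ?",
        (url, now, performer_id),
    )
    conn.commit()


def sync_all(conn: sqlite3.Connection, fetch_page, limit: int) -> int:
    """Fetch pages until `limit` performers are stored or the source runs dry."""
    page = 1
    while conn.execute("SELECT COUNT(*) FROM performers").fetchone()[0] < limit:
        batch = fetch_page(page)  # hypothetical stand-in for a TPDB API call
        if not batch:
            break
        upsert_performers(conn, batch)
        for row in batch:
            # Placeholder URL; a real enrich step would merge API fields here.
            enrich(conn, row["id"], f"https://example.org/performers/{row['id']}")
        page += 1
    return conn.execute("SELECT COUNT(*) FROM performers").fetchone()[0]


if __name__ == "__main__":
    pages = {1: [{"id": "p1", "name": "Performer One"}, {"id": "p2", "name": "Performer Two"}],
             2: [{"id": "p3", "name": "Performer Three"}]}
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    print(sync_all(conn, lambda page: pages.get(page, []), limit=10))  # prints 3
```
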
---

## [v0.3.3] — YAML Tag Inference Update (2025-10-20)

### ✨ Added

- Dynamic **YAML tag dictionary loader** for modular tag categories.
- Introduced **automatic source inference** for common networks.
- Added the **`refresh-all`** bulk operation to reapply tag inference globally.

### 🛠 Changed

- Refactored `infer_tags()` to merge results from multiple YAML files dynamically.
- Enhanced progress and summary reporting for tag inference.

### 🧹 Maintenance

- Fixed `AttributeError: 'int' object has no attribute 'lower'` when parsing numeric YAML values.
- Standardized internal naming conventions.

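The numeric-value bug above arises because YAML parsers such as PyYAML load an unquoted `1970` as an `int`, which has no `.lower()`. A sketch of a normalizer that coerces every scalar to a lowercase string, plus a merge across several parsed dictionaries; `normalize_keywords` and `merge_tag_maps` are hypothetical names, and the input dicts stand in for already-parsed YAML files.

```python
def normalize_keywords(raw) -> list:
    """Coerce a YAML value (str, int, float, list, or None) to lowercase strings."""
    if raw is None:
        return []
    if not isinstance(raw, list):
        raw = [raw]  # a lone scalar such as 1970 or "denim jacket"
    # str() first, so ints and floats never reach .lower() directly.
    return [str(item).strip().lower() for item in raw if item is not None and str(item).strip()]


def merge_tag_maps(*maps: dict) -> dict:
    """Union the keywords of several parsed YAML dictionaries, tag by tag."""
    merged = {}
    for tag_map in maps:
        for tag, raw in (tag_map or {}).items():
            merged.setdefault(str(tag), set()).update(normalize_keywords(raw))
    return {tag: sorted(keywords) for tag, keywords in merged.items()}


if __name__ == "__main__":
    clothing = {"Denim": ["jeans", "Denim"], "Vintage": [1970, "retro"]}  # 1970 loads as int
    context = {"Denim": "denim jacket"}
    print(merge_tag_maps(clothing, context))
```
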
---

## [v0.3.4] — Tag Dictionary Validation & Cleanup (2025-10-20)

### ✨ Added

- **`validate-tags`** CLI command for verifying YAML tag dictionaries.
  - Detects duplicates, empty entries, and conflicting keywords.
  - Outputs detailed summaries with per-keyword conflict listings.

### 🛠 Changed

- Enforced a standardized YAML structure (consistent key capitalization and layout).
- Added human-readable validation summaries.

### 🧹 Maintenance

- General code cleanup and logging consistency updates.

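One way the three `validate-tags` checks could work, sketched over an already-parsed `{tag: [keywords]}` mapping. Here a conflict means the same keyword appears under more than one tag; that definition, the report layout, and `validate_tag_map` itself are assumptions rather than the project's implementation.

```python
from collections import defaultdict


def validate_tag_map(tag_map: dict) -> dict:
    """Collect duplicate keywords, empty entries, and cross-tag keyword conflicts."""
    report = {"duplicates": [], "empty": [], "conflicts": []}
    owners = defaultdict(list)  # keyword -> tags that list it
    for tag, keywords in tag_map.items():
        if not keywords:
            report["empty"].append(tag)  # tag with no keywords at all
            continue
        seen = set()
        for raw in keywords:
            kw = "" if raw is None else str(raw).strip().lower()
            if not kw:
                report["empty"].append(tag)  # blank keyword inside a tag
            elif kw in seen:
                report["duplicates"].append((tag, kw))
            else:
                seen.add(kw)
                owners[kw].append(tag)
    report["conflicts"] = sorted((kw, tags) for kw, tags in owners.items() if len(tags) > 1)
    return report


if __name__ == "__main__":
    tags = {"beach": ["sand", "sea", "Sand"], "outdoor": ["sea", "park"], "night": []}
    for kind, items in validate_tag_map(tags).items():
        print(f"{kind}: {items}")
```
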
---

## [v0.3.5] — Tag Statistics & Unified CLI Update (2025-10-20)

### ✨ Added

- **Tag Statistics System**
  - Introduced the `tag-stats` command to generate frequency analytics across all gallery metadata.
  - Produces both console summaries and saved reports:
    - `reports/tag_stats.json` — JSON-formatted tag counts.
    - `reports/tag_stats_sorted.txt` — human-readable ranked list.
- **Unified CLI Interface (`cli.py`)**
  - Consolidated all tagging and maintenance operations into a single entrypoint:
    - `refresh-all`, `refresh-one`, `validate-tags`, `tag-stats`, `list`, `list-tags`, `add`, `remove`, `add-multi`, `show-metadata`, `source`
  - Standardized command syntax and output formatting across all operations.

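A sketch of the two `tag-stats` reports using `collections.Counter`. It assumes each gallery has a `metadata.json` with `tags` and `inferred_tags` lists and that a tag is counted at most once per gallery; both are assumptions, as is the `tag_stats` function name. The output file names match the ones listed above.

```python
import json
from collections import Counter
from pathlib import Path


def tag_stats(metadata_files, reports_dir: Path) -> Counter:
    """Count tag frequency across galleries and write JSON + ranked text reports."""
    counts = Counter()
    for path in metadata_files:
        meta = json.loads(Path(path).read_text(encoding="utf-8"))
        # Union manual and inferred tags so each tag counts once per gallery.
        counts.update(set(meta.get("tags", [])) | set(meta.get("inferred_tags", [])))

    reports_dir.mkdir(parents=True, exist_ok=True)  # create reports/ when missing
    (reports_dir / "tag_stats.json").write_text(
        json.dumps(dict(counts), indent=2, sort_keys=True), encoding="utf-8"
    )
    # Rank by count (descending), then alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    lines = [f"{count:>5}  {tag}" for tag, count in ranked]
    (reports_dir / "tag_stats_sorted.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return counts
```

Sorting on `(-count, tag)` keeps the ranked text file deterministic when several tags share a count, which makes reports easy to diff between runs.
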
### 🛠 Changed

- Centralized tag frequency logic in `tag_gallery.py`.
- Refactored the CLI dispatch system for scalability and better error handling.
- Standardized output style (headers, dividers, alignment).

### 🧹 Maintenance

- `/src/importer/reports/` is now created automatically when missing.
- Verified all tag operations across 60+ galleries.
- Unified terminology and capitalization across CLI help text and docstrings.

### 🧭 Next Steps

- Add color-coded CLI output for readability.
- Implement an `--export-csv` flag for `tag-stats` output.
- Begin the roadmap for **v0.4.0**, introducing ML-based tag confidence scoring and category weighting.

---

## [v0.3.6] — Enrichment Verification & Freshness Tracking (2025-10-26)

### ✨ Added

- **`verify-enrichment` command**
  - Scans the performer database for missing metadata (e.g., `url`, `last_updated`).
  - Reports enriched vs. incomplete entries, with a preview via `--show-missing`.
- **Freshness tracking**
  - Displays the oldest and most recent enrichment timestamps.
  - Warns if data is older than the freshness threshold.
- **Automatic TPDB key validation**
  - Checks for a valid API key and provides setup help if it is missing.

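A sketch of the completeness and freshness checks behind `verify-enrichment`. It assumes performer rows are dicts whose `last_updated` values are ISO 8601 strings carrying a UTC offset; the 30-day default for `stale_after_days` is a placeholder, since the changelog does not state the actual threshold, and `verify_enrichment` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = ("url", "last_updated")


def verify_enrichment(performers, stale_after_days=30, now=None):
    """Split performers into enriched/incomplete and report timestamp freshness."""
    now = now or datetime.now(timezone.utc)
    complete, missing = [], []
    for performer in performers:
        target = complete if all(performer.get(f) for f in REQUIRED_FIELDS) else missing
        target.append(performer)

    stamps = sorted(datetime.fromisoformat(p["last_updated"]) for p in complete)
    summary = {
        "enriched": len(complete),
        "incomplete": len(missing),
        "oldest": stamps[0] if stamps else None,
        "newest": stamps[-1] if stamps else None,
    }
    # Flag staleness when the oldest enrichment exceeds the threshold.
    summary["stale"] = bool(stamps) and now - stamps[0] > timedelta(days=stale_after_days)
    return summary, missing
```

The second return value plays the role of a `--show-missing` preview: the caller can print the first few incomplete rows.
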
### 🛠 Changed

- Enrichment logic now guarantees `url` and `last_updated` fields for all performers.
- Improved emoji-based CLI logs for clarity.
- The CLI now prints enrichment stats after each batch during `sync-all`.

### 🧹 Maintenance

- Cleaned up and refactored `tpdb_bridge.py` for readability and modular design.
- Verified completeness: **5,087 performers enriched** and up to date.
- Improved sleep timing and network error recovery during long sync runs.

### 🧭 Next Steps

- Add a `--stale-days` CLI flag for user-defined freshness thresholds.
- Implement automatic enrichment scheduling via cron or systemd.
- Add a shortcut alias, `porndex-importer verify`, for database status checks.

---

## [v0.3.7] — Scene-Based Enrichment & Channel Auto-Upgrade (2025-10-26)

### ✨ Added

- **Scene-based enrichment system**
  - New `--use-scenes` flag enables inference of performer studios/channels from recent scene data on ThePornDB.
  - When direct metadata is missing, automatically scans `/performers/{id}/scenes` for `studio`, `site`, or `network` fields.
  - Dynamically upgrades performer entries from “Unknown” to valid channel names (e.g., “Desire Room”, “I Want Clips: Princess Chanel”).
- **Enhanced enrichment diagnostics**
  - `--debug-channels` now outputs detailed channel inference logs with the origin type (e.g., “via scene” or “via performer metadata”).
  - Emoji-coded output for improved clarity:
    - 🎞 Scene-based upgrades
    - 🎬 Direct metadata
    - ⚫ Missing channel info
- **Progress verification**
  - `verify-enrichment` now reports precise completion percentages and lists the 20 most recently upgraded performers.

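A sketch of the fallback order described above: keep a direct channel when one exists, otherwise take the most frequent studio/site/network name across a performer's scenes. The scene dict shape, the `channel` key, the majority vote, and both function names are assumptions (this is not the project's `_fetch_studio_from_scenes()`); only the origin labels mirror the `--debug-channels` wording.

```python
from collections import Counter
from typing import Optional, Tuple

# Scanned in this order per scene; the order itself is an assumption.
SCENE_FIELDS = ("studio", "site", "network")


def channel_from_scenes(scenes: list) -> Optional[str]:
    """Return the most frequent channel name found across scene records."""
    counts = Counter()
    for scene in scenes:
        for field in SCENE_FIELDS:
            value = scene.get(field)
            if isinstance(value, dict):  # tolerate {"name": ...} objects
                value = value.get("name")
            if value:
                counts[value] += 1
                break  # first populated field wins for this scene
    return counts.most_common(1)[0][0] if counts else None


def resolve_channel(performer: dict, scenes: list) -> Tuple[str, str]:
    """Prefer direct metadata, fall back to scenes, else report missing."""
    direct = performer.get("channel")
    if direct and direct != "Unknown":
        return direct, "via performer metadata"
    inferred = channel_from_scenes(scenes)
    if inferred:
        return inferred, "via scene"
    return "Unknown", "missing"
```

Using a majority vote rather than the first scene makes a single crossover appearance less likely to relabel a performer's channel.
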
### 🛠 Changed

- The enrichment process now upgrades `performer_sources` in place without overwriting other fields.
- Optimized query logic to prioritize unverified performers and handle large datasets efficiently.
- Added fine-grained sleep control between API requests to stay compliant with TPDB rate limits.

### 🧹 Maintenance

- Refactored enrichment functions for modularity:
  - Introduced `_fetch_studio_from_scenes()` for scene scanning.
- Simplified argument handling and enriched exception tracing.
- Verified enrichment stability across 100 performers, with 44% successful channel discovery in a live test.
- Improved timestamp consistency in verification logs and made the database schema more resilient.

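The fine-grained sleep control can be sketched as a small pacer that enforces a minimum gap between requests. `RequestPacer` and any `min_interval` value are hypothetical; the changelog does not state TPDB's actual limits. The clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class RequestPacer:
    """Keep at least `min_interval` seconds between consecutive API requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Call before each request; sleeps only for the remaining gap."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay:
                self._sleep(delay)
        self._last = now + delay  # time at which this request is released
        return delay
```

Calling `pacer.wait()` immediately before each HTTP request means slow responses eat into the gap instead of adding to it, so long sync runs stay close to the target rate.
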
---

## [v0.4.2] — Unified Importer, ML Pipeline, and Semantic Search (2025-10-27)

### ✨ Added

- **Unified Importer CLI (`porndex-importer`)**
  - Replaces the legacy multi-script workflow with a single command entrypoint.
  - Introduced `import`, `refresh-all`, `refresh-one`, `validate-tags`, `tag-stats`, and `source` subcommands.
  - Includes colorized CLI summaries and consistent emoji headers.
- **Machine Learning Dataset Builder**
  - New module: `ml/ml_dataset_builder.py`.
  - Generates a structured dataset in `ML/porndex_dataset.jsonl` from all indexed galleries.
  - Each record includes the title, models, tags, and image paths for hybrid ML ingestion.
- **Embedding Generation Module**
  - Added `ml/ml_embeddings.py` to create hybrid text + image embeddings.
  - Builds per-gallery NPZ files under `ML/embeddings/` and a consolidated `embeddings_index.jsonl`.
  - Supports configurable `--img-samples` and automatic device detection (`--device auto`).
- **Semantic & Strict Search**
  - The `search` command supports three modes:
    - `semantic`: CLIP + text hybrid cosine similarity (default)
    - `text`: text-only vector space search
    - `strict`: literal match filtering before vector ranking
  - Results show top-ranked galleries, confidence scores, and gallery IDs.
- **ML Verification Command**
  - `verify` confirms index consistency, embedding count, and file integrity.
- **Directory Auto-Creation**
  - Automatically creates `ML/` and `ML/embeddings/` if missing.

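A pure-Python sketch of the three search modes. It assumes each indexed gallery carries a text vector and an image vector, and that the query has already been embedded into both spaces (with CLIP, the text encoder can produce the image-space query vector). The `alpha` blend weight, the index layout, and the `search` signature are assumptions, not the project's implementation.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_text, query_vecs, index, mode="semantic", alpha=0.5, top_k=5):
    """Rank galleries; returns (score, gallery_id) pairs, best first.

    index: [{"id", "tags", "text_vec", "image_vec"}, ...]  (hypothetical layout)
    query_vecs: {"text": [...], "image": [...]}  (query embedded in both spaces)
    """
    candidates = index
    if mode == "strict":
        # Literal filter first: every query word must be one of the gallery's tags.
        terms = set(query_text.lower().split())
        candidates = [g for g in index if terms <= {t.lower() for t in g["tags"]}]
    results = []
    for g in candidates:
        text_score = cosine(query_vecs["text"], g["text_vec"])
        if mode == "text":
            score = text_score
        else:
            # semantic and strict both blend text and image similarity.
            image_score = cosine(query_vecs["image"], g["image_vec"])
            score = alpha * text_score + (1 - alpha) * image_score
        results.append((score, g["id"]))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_k]
```

Filtering before ranking in `strict` mode means the vector scores only order galleries that already match literally, which trades recall for precision.
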
### 🛠 Changed

- **Importer Pipeline Refactor**
  - Moved all CLI handling into `src/importer/cli.py`.
  - Centralized environment setup and config loading.
  - Replaced direct Python script calls with the `porndex-importer` entrypoint.
- **Tagging System**
  - Unified YAML dictionary loading for clothing, acts, body, and context.
  - Improved tag inference logging and duplicate suppression.
- **Output Formatting**
  - Standardized headers, dividers, and indentation across all CLI commands.
  - Added readable time and path indicators for long-running operations.

### 🧹 Maintenance

- Verified a full ML dataset build across 150 test galleries (100% JSONL completion).
- Added a fallback for empty or missing image lists in the dataset builder.
- Improved error handling for partial downloads and interrupted imports.
- Streamlined path resolution for consistent operation across dev and installed modes.
- Updated documentation:
  - `/docs/CLI_USAGE.md` rewritten for v0.4.2.
  - `/README.md` modernized with the full project tree and an ML pipeline overview.

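A sketch of a JSONL dataset build with the empty-image fallback noted above. It assumes each gallery is a folder containing `metadata.json` plus image files; the `models` key, the image suffix list, and both function names are assumptions rather than the contents of `ml/ml_dataset_builder.py`. Galleries without images still get a record with `"images": []`, so downstream code can fall back to text-only features.

```python
import json
from pathlib import Path
from typing import Optional

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_record(gallery_dir: Path) -> Optional[dict]:
    """Turn one gallery folder into a dataset record, or None if unindexed."""
    meta_path = gallery_dir / "metadata.json"
    if not meta_path.is_file():
        return None
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    images = sorted(
        str(p) for p in gallery_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    return {
        "id": gallery_dir.name,
        "title": meta.get("title", ""),
        "models": meta.get("models") or [],
        "tags": sorted(set(meta.get("tags", [])) | set(meta.get("inferred_tags", []))),
        "images": images,  # fallback: [] rather than a missing key
    }


def build_dataset(galleries_root: Path, out_path: Path) -> int:
    """Write one JSON object per line; returns the number of records written."""
    out_path.parent.mkdir(parents=True, exist_ok=True)  # e.g. creates ML/
    written = 0
    with out_path.open("w", encoding="utf-8") as fh:
        for gallery_dir in sorted(p for p in galleries_root.iterdir() if p.is_dir()):
            record = build_record(gallery_dir)
            if record is None:
                continue
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```
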
### 🧭 Next Steps

- Begin the v0.4.3–v0.5.x roadmap:
  - Integrate GroundingDINO + GroundedSAM for visual region detection.
  - Implement attribute extraction (gender → ethnicity → clothing).
  - Build a visual verification tool (`ml_dataset_inspector.py`).
  - Add a tag-confidence weighting system.
  - Extend the TPDB bridge to cross-link enriched performer metadata into ML training records.

### 🧩 Summary of Current State (as of v0.4.2)

- ✅ Fully unified CLI under `porndex-importer`
- ✅ Stable YAML tagging + validation
- ✅ Complete ML dataset and embedding generation workflow
- ✅ Working hybrid semantic search
- ✅ Verified 150-gallery dataset index

© 2025 Leak Technologies — Porndex Importer Project