File: docs/TAGGING.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
Goondex Tagging System Documentation
Purpose:
Define how Goondex handles tagging, tag inference, YAML dictionaries, and validation.
This document standardizes how tags are generated, stored, and maintained for galleries within the Goondex framework.
- Overview
The tagging system in Goondex provides:
- Automatic tag inference based on keywords, metadata, and categories.
- Human-editable tags for user-defined labeling.
- YAML dictionaries for consistent terminology and modular configuration.
- Validation and reporting tools to prevent duplication or conflicting tags.
All tagging logic is implemented in:
src/importer/tag_gallery.py
src/importer/tag_utils.py
src/importer/tag_dictionaries/
- Tag Categories
Tags are divided into modular YAML dictionaries for clarity and maintainability.
Each dictionary focuses on a single thematic domain:
tag_dictionaries/
body.yml → physical descriptors (e.g. Blonde, Curvy, Muscular)
acts.yml → sexual acts or positions (e.g. Blowjob, Anal, Doggystyle)
clothing.yml → garments and accessories (e.g. Lingerie, Socks, Latex)
context.yml → settings or environments (e.g. Beach, Office, Shower)
fetish.yml → specific fetish content (e.g. BDSM, Pee Fetish, Bondage)
orientation.yml → sexual orientation or group type (e.g. Straight, Lesbian, Gay)
All dictionaries share a simple key–value structure: "keyword": "TagName"
Example (clothing.yml):
socks: Socks
panties: Panties
lingerie: Lingerie
stockings: Stockings
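To illustrate the key-value structure, the sketch below parses flat `keyword: TagName` lines into a Python mapping. It is a simplified stand-in for a real YAML parser and handles only one-level pairs, blank lines, and `#` comments. The name `parse_flat_dictionary` is hypothetical and is not taken from tag_utils.py; the actual loader may well use a YAML library instead.

```python
def parse_flat_dictionary(text: str) -> dict[str, str]:
    """Parse flat `keyword: TagName` lines into a dict.

    Assumption: a minimal stand-in for a YAML parser, for illustration only.
    """
    mapping: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        keyword, sep, tag = line.partition(":")
        if not sep:
            continue  # not a key-value line
        # Drop optional quotes and normalise the keyword to lowercase.
        keyword = keyword.strip().strip("\"'").lower()
        tag = tag.strip().strip("\"'")
        if keyword and tag:
            mapping[keyword] = tag
    return mapping


clothing = parse_flat_dictionary("""
socks: Socks
lingerie: Lingerie
stockings: Stockings
""")
print(clothing["socks"])  # Socks
```

Lowercasing the keyword at load time keeps the later text search simple, since the text blob is lowercased as well.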
- Tag Inference Logic
Automatic tagging is handled by infer_tags() in tag_gallery.py.
The system scans text data extracted from metadata.json:
- title
- categories
- tags (pre-existing)
- source network and channel
- optional inferred fields
Process:
- Combine text fields into one lowercase text blob.
- Search all keywords from every YAML dictionary.
- For each keyword match, add the corresponding tag.
- Merge inferred tags with existing manual tags.
- Save to metadata.json under inferred_tags.
Example:
Input metadata.title → “Busty Blonde Rides Hard”
Detected → ["Busty", "Blonde", "Riding", "Hardcore"]
The result: "inferred_tags": ["Busty", "Blonde", "Riding", "Hardcore"]
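The process above can be sketched roughly as follows. This is not the actual `infer_tags()` from tag_gallery.py: the function name `infer_tags_sketch`, its signature, the subset of metadata fields it reads, and the whole-word matching rule are all assumptions made for illustration.

```python
import re


def infer_tags_sketch(metadata: dict, dictionaries: list[dict[str, str]]) -> list[str]:
    """Return tags whose keywords occur in the gallery's text fields."""
    # Step 1: combine the text fields into one lowercase blob.
    parts = [str(metadata.get("title", ""))]
    parts += [str(c) for c in metadata.get("categories", [])]
    parts += [str(t) for t in metadata.get("tags", [])]
    blob = " ".join(parts).lower()

    # Steps 2-3: check every keyword from every dictionary.
    inferred: list[str] = []
    for dictionary in dictionaries:
        for keyword, tag in dictionary.items():
            # Whole-word match (an assumption; the real matcher may differ).
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            if re.search(pattern, blob) and tag not in inferred:
                inferred.append(tag)
    return inferred


metadata = {"title": "Beach Day in Striped Socks", "categories": ["Outdoor"], "tags": []}
dictionaries = [{"socks": "Socks"}, {"beach": "Beach", "office": "Office"}]

# Final step: store the result under inferred_tags.
metadata["inferred_tags"] = infer_tags_sketch(metadata, dictionaries)
print(metadata["inferred_tags"])  # ['Socks', 'Beach']
```

Tags come back in dictionary order and are never repeated, which keeps the stored list stable between runs.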
- Manual Tagging
Users can add or remove tags manually through the CLI.
Examples:
goondex add "" "Outdoor"
goondex remove "" "Solo"
goondex add-multi "" "Amateur, Teen, Shaved"
Manual tags are stored in metadata.json under "tags".
Inferred tags are stored separately under "inferred_tags" to maintain clarity.
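As a rough picture of how manual edits might touch metadata.json, the sketch below works on an in-memory dict with the two keys named above. The helpers `add_manual_tags` and `remove_manual_tag` are hypothetical and do not reflect the actual CLI implementation. Splitting on commas mirrors the `add-multi` argument format shown in the examples.

```python
def add_manual_tags(metadata: dict, raw: str) -> dict:
    """Add comma-separated tags to metadata["tags"], skipping duplicates."""
    tags = metadata.setdefault("tags", [])
    for name in raw.split(","):
        name = name.strip()
        if name and name not in tags:
            tags.append(name)
    return metadata


def remove_manual_tag(metadata: dict, name: str) -> dict:
    """Remove one tag from metadata["tags"]; inferred_tags is left untouched."""
    metadata["tags"] = [t for t in metadata.get("tags", []) if t != name]
    return metadata


meta = {"tags": ["Solo"], "inferred_tags": ["Beach"]}
add_manual_tags(meta, "Outdoor, Amateur, Outdoor")
remove_manual_tag(meta, "Solo")
print(meta)  # {'tags': ['Outdoor', 'Amateur'], 'inferred_tags': ['Beach']}
```

Neither helper touches `inferred_tags`, so re-running inference cannot overwrite a manual edit.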
- Tag Validation
Validation ensures all YAML tag dictionaries remain consistent and free of errors.
Run: goondex validate-tags
Checks performed:
- Duplicate keywords within a dictionary
- Conflicting or identical values across multiple dictionaries
- Empty entries or malformed YAML
- Case inconsistencies between similar entries
Outputs:
src/importer/reports/tag_validation.json
src/importer/reports/tag_conflicts.txt
CLI Summary Example:
Files loaded: 6
Keywords total: 421
Conflicts: 2
Duplicates: 4
Empty entries: 0
[✓] Validation finished.
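The checks listed above could be approximated as in the sketch below. It is a simplification under stated assumptions: entries are passed in as raw (keyword, tag) pairs so that duplicates are still visible, a "conflict" is taken to mean one keyword mapped to different tags, and the function name and report keys are hypothetical rather than the format of tag_validation.json.

```python
from collections import Counter, defaultdict


def validate_sketch(files: dict[str, list[tuple[str, str]]]) -> dict:
    """Run simplified checks over (keyword, tag) pairs grouped by file name."""
    report = {"duplicates": [], "conflicts": [], "empty": [], "case_variants": []}
    tags_by_keyword = defaultdict(set)  # keyword -> tags seen in any file
    spellings = defaultdict(set)        # lowercased tag -> spellings seen

    for filename, pairs in files.items():
        # Duplicate keywords within one dictionary.
        counts = Counter(keyword for keyword, _ in pairs)
        report["duplicates"] += [(filename, k) for k, n in counts.items() if n > 1]
        for keyword, tag in pairs:
            if not keyword.strip() or not tag.strip():
                report["empty"].append((filename, keyword))
                continue
            tags_by_keyword[keyword].add(tag)
            spellings[tag.lower()].add(tag)

    # One keyword mapped to more than one tag (assumed meaning of "conflict").
    report["conflicts"] = sorted(k for k, t in tags_by_keyword.items() if len(t) > 1)
    # Tags that differ only in letter case.
    report["case_variants"] = sorted(sorted(s) for s in spellings.values() if len(s) > 1)
    return report


report = validate_sketch({
    "clothing.yml": [("socks", "Socks"), ("socks", "Socks"), ("latex", "Latex")],
    "context.yml": [("shower", "Shower"), ("latex", "latex"), ("wet", "")],
})
print(report["duplicates"])     # [('clothing.yml', 'socks')]
print(report["conflicts"])      # ['latex']
print(report["empty"])          # [('context.yml', 'wet')]
print(report["case_variants"])  # [['Latex', 'latex']]
```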
- Tag Statistics
Generate tag frequency statistics across all galleries: goondex tag-stats
Outputs:
src/importer/reports/tag_stats.json
src/importer/reports/tag_stats_sorted.txt
CLI displays top tags and usage counts:
- Teen 42
- Blonde 37
- Outdoor 29
- Lingerie 25
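A frequency count of this kind can be sketched with `collections.Counter`. The function name is hypothetical. It is also an assumption that manual and inferred tags are counted together and that each tag counts once per gallery; the real `tag-stats` command may count differently.

```python
from collections import Counter


def tag_stats_sketch(galleries: list[dict]) -> list[tuple[str, int]]:
    """Count how many galleries carry each tag, most frequent first."""
    counts: Counter = Counter()
    for metadata in galleries:
        # Count each tag once per gallery across manual and inferred tags.
        unique = set(metadata.get("tags", [])) | set(metadata.get("inferred_tags", []))
        counts.update(unique)
    # Sort by count (descending), then by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


galleries = [
    {"tags": ["Outdoor"], "inferred_tags": ["Blonde"]},
    {"tags": ["Blonde", "Outdoor"]},
    {"inferred_tags": ["Blonde", "Lingerie"]},
]
for tag, count in tag_stats_sketch(galleries):
    print(tag, count)
# Blonde 3
# Outdoor 2
# Lingerie 1
```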
- Known Limitations
- Keyword overlap between categories can cause false positives (e.g. “English” being inferred from “British”).
- Contextual interpretation (e.g. “Wet Hair” vs. “Wet”) is not yet implemented.
- Case-insensitive matching may match unintended phrases (e.g. the keyword “Daddy” also matches “daddy issues”).
- YAML entries are static — dynamic NLP inference is planned for v0.5.x.
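The false-positive risk is easiest to see by comparing two matching strategies side by side. The source does not say which one Goondex uses, so both helpers below are hypothetical illustrations, not the project's matcher.

```python
import re


def substring_match(keyword: str, text: str) -> bool:
    """Naive check: the keyword appears anywhere, even inside a longer word."""
    return keyword.lower() in text.lower()


def whole_word_match(keyword: str, text: str) -> bool:
    """Stricter check: the keyword must sit between word boundaries."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


title = "Officer on Duty"
print(substring_match("office", title))   # True  (false positive)
print(whole_word_match("office", title))  # False
```

Whole-word matching removes this class of error, but it does not address the contextual cases listed above, where the keyword is a genuine word inside a longer phrase.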
- Best Practices
- Keep the left-hand keyword of each YAML entry lowercase.
- Use concise and consistent tag names on the right-hand side.
- Avoid ambiguous single-word tags (e.g. “hot”, “nice”, “pretty”).
- Run goondex validate-tags before each commit.
- Do not edit inferred_tags manually — always refresh via CLI.
- Use add-multi for efficient manual tagging after bulk imports.
- Future Enhancements (v0.4.x–v0.5.x)
- Implement weighted tagging confidence using NLP models.
- Integrate GroundingDINO + SAM for visual tagging assistance.
- Introduce “tag confidence scores” to help refine inference reliability.
- Develop cross-source tag normalization for multiple site importers.
- Support user-defined alias groups (e.g. “Ass” = “Butt” = “Booty”).
- Developer Notes
All tag inference should remain human-readable and reversible.
The YAML system was chosen for transparency and editability.
Tags should serve as both descriptive metadata and ML training features.
Avoid unnecessary expansion — focus on clarity and accuracy over volume.