Goondex/docs/TAGGING.md

6.5 KiB
Raw Blame History

File: docs/TAGGING.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex


Goondex Tagging System Documentation

Purpose:
Define how Goondex handles tagging, tag inference, YAML dictionaries, and validation.
This document standardizes how tags are generated, stored, and maintained for galleries within the Goondex framework.


  1. Overview

The tagging system in Goondex provides:

  • Automatic tag inference based on keywords, metadata, and categories.
  • Human-editable tags for user-defined labeling.
  • YAML dictionaries for consistent terminology and modular configuration.
  • Validation and reporting tools to prevent duplication or conflicting tags.

All tagging logic is implemented in: src/importer/tag_gallery.py
src/importer/tag_utils.py
src/importer/tag_dictionaries/


  1. Tag Categories

Tags are divided into modular YAML dictionaries for clarity and maintainability.
Each dictionary focuses on a single thematic domain:

tag_dictionaries/ body.yml → physical descriptors (e.g. Blonde, Curvy, Muscular)
acts.yml → sexual acts or positions (e.g. Blowjob, Anal, Doggystyle)
clothing.yml → garments and accessories (e.g. Lingerie, Socks, Latex)
context.yml → settings or environments (e.g. Beach, Office, Shower)
fetish.yml → specific fetish content (e.g. BDSM, Pee Fetish, Bondage)
orientation.yml → sexual orientation or group type (e.g. Straight, Lesbian, Gay)

All dictionaries share a simple keyvalue structure: "keyword": "TagName"

Example (clothing.yml): socks: Socks
panties: Panties
lingerie: Lingerie
stockings: Stockings


  1. Tag Inference Logic

Automatic tagging is handled by infer_tags() in tag_gallery.py.
The system scans text data extracted from metadata.json:

  • title
  • categories
  • tags (pre-existing)
  • source network and channel
  • optional inferred fields

Process:

  1. Combine text fields into one lowercase text blob.
  2. Search all keywords from every YAML dictionary.
  3. For each keyword match, add the corresponding tag.
  4. Merge inferred tags with existing manual tags.
  5. Save to metadata.json under inferred_tags.

Example: Input metadata.title → “Busty Blonde Rides Hard”
Detected → ["Busty", "Blonde", "Riding", "Hardcore"]

The result: "tags": ["Busty", "Blonde", "Riding", "Hardcore"]


  1. Manual Tagging

Users can add or remove tags manually through the CLI.

Examples: goondex add "" "Outdoor"
goondex remove "" "Solo"
goondex add-multi "" "Amateur, Teen, Shaved"

Manual tags are stored in metadata.json under "tags".
Inferred tags are stored separately under "inferred_tags" to maintain clarity.


  1. Tag Validation

Validation ensures all YAML tag dictionaries remain consistent and free of errors.

Run: goondex validate-tags

Checks performed:

  • Duplicate keywords within a dictionary
  • Conflicting or identical values across multiple dictionaries
  • Empty entries or malformed YAML
  • Case inconsistencies between similar entries

Outputs: src/importer/reports/tag_validation.json
src/importer/reports/tag_conflicts.txt

CLI Summary Example: Files loaded: 6
Keywords total: 421
Conflicts: 2
Duplicates: 4
Empty entries: 0
[✓] Validation finished.


  1. Tag Statistics

Generate tag frequency statistics across all galleries: goondex tag-stats

Outputs: src/importer/reports/tag_stats.json
src/importer/reports/tag_stats_sorted.txt

CLI displays top tags and usage counts:

  1. Teen 42
  2. Blonde 37
  3. Outdoor 29
  4. Lingerie 25

  1. Known Limitations

  • Keyword overlap between categories can cause false positives (e.g. “English” being inferred from “British”).
  • Contextual interpretation (e.g. “Wet Hair” vs. “Wet”) is not yet implemented.
  • Case-insensitive matching may include unintended words (e.g. “Daddy” vs. “daddy issues”).
  • YAML entries are static — dynamic NLP inference is planned for v0.5.x.

  1. Best Practices

  • Keep YAML entries lowercase on the left-hand keyword.
  • Use concise and consistent tag names on the right-hand side.
  • Avoid ambiguous single-word tags (e.g. “hot”, “nice”, “pretty”).
  • Run goondex validate-tags before each commit.
  • Do not edit inferred_tags manually — always refresh via CLI.
  • Use add-multi for efficient manual tagging after bulk imports.

  1. Future Enhancements (v0.4.xv0.5.x)

  • Implement weighted tagging confidence using NLP models.
  • Integrate GroundingDINO + SAM for visual tagging assistance.
  • Introduce “tag confidence scores” to help refine inference reliability.
  • Develop cross-source tag normalization for multiple site importers.
  • Support user-defined alias groups (e.g. “Ass” = “Butt” = “Booty”).

  1. Developer Notes

All tag inference should remain human-readable and reversible.
The YAML system was chosen for transparency and editability.
Tags should serve as both descriptive metadata and ML training features.
Avoid unnecessary expansion — focus on clarity and accuracy over volume.


End of File