File: docs/ARCHITECTURE.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex
Goondex System Architecture Overview
Purpose:
This document outlines the internal structure, key modules, and data flow of Goondex. It defines how the importer, tagging, and metadata systems interact, so that development practices stay consistent and responsibilities stay clearly separated across the codebase.
High-Level Overview
Goondex is a modular image importer and metadata indexer designed primarily for PornPics galleries.
Its long-term goal is to evolve into a general-purpose adult media cataloguing and tagging framework.
Core functions:
- Gallery importing and metadata storage
- Automated tag inference from titles and descriptions
- Performer and source enrichment (via ThePornDB)
- Semantic search and ML dataset generation (planned v0.4.x)
- CLI-driven operations with simple developer alias support
Primary Directories
src/importer/
cli.py - Main command entrypoint for all user-facing operations.
gallery_importer.py - Handles gallery downloading, metadata management, and index updates.
tag_gallery.py - Manages tagging logic, inference, and validation.
tag_utils.py - Shared utilities for YAML parsing and tag validation.
index_builder.py - Rebuilds gallery index files after import or refresh.
fetch_gallery_metadata.py - Scrapes PornPics galleries for metadata and image URLs.
tpdb_bridge.py - Integrates with ThePornDB API for performer enrichment.
config/ - Contains YAML and JSON config templates for environment paths.
reports/ - Stores generated statistics, tag summaries, and validation logs.
tag_dictionaries/ - Modular YAML tag dictionaries (body, acts, clothing, context).
docs/
BRANDING.md - Defines visual and branding identity.
ARCHITECTURE.md - This file.
CHANGELOG.md - Version history and release notes.
ROADMAP.md - Planned features and milestones.
Core Workflow
Import
- User runs goondex import <url> or python -m src.importer.cli import <url>.
- The system fetches gallery metadata and image URLs.
- Metadata is saved to disk under Galleries/_<model_name>/.
- Images are downloaded using threaded requests.
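The threaded download step could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name download_all, the injected fetch callable, and the rule of naming files after the last URL segment are all hypothetical and not taken from gallery_importer.py. Failed URLs are collected rather than raised, which matches the idea of a failed_downloads.json log.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable, List


def download_all(
    image_urls: Iterable[str],
    dest_dir: Path,
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
) -> List[str]:
    """Download every URL into dest_dir and return the URLs that failed.

    `fetch` is a hypothetical callable (url -> bytes). In a real importer it
    could wrap something like requests.get(url, timeout=30).content.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    failed: List[str] = []

    def _save(url: str) -> None:
        # Assumption: the last path segment of the URL is a usable file name.
        name = url.rstrip("/").rsplit("/", 1)[-1]
        (dest_dir / name).write_bytes(fetch(url))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(_save, url): url for url in image_urls}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception:
                # Record the failure instead of aborting the whole import.
                failed.append(futures[future])
    return sorted(failed)
```

Passing the fetch function in as a parameter keeps the sketch independent of any HTTP library and makes it easy to exercise with a stub.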
Tagging
- On import completion, automatic tagging runs via tag_gallery.py.
- Inferred tags are based on YAML dictionaries and keyword matches.
- Users can adjust tags manually or re-run goondex refresh-one.
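Keyword-based inference of this kind can be sketched as follows. The dictionary layout (dictionary name, then tag, then a list of keywords) and the function name infer_tags are assumptions made for illustration; the actual YAML schema and the logic in tag_gallery.py may differ. The dictionaries are shown as already-parsed Python data so the sketch needs no YAML library.

```python
import re
from typing import Dict, List, Set


def infer_tags(text: str, dictionaries: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return the sorted tags whose keywords appear as whole words in text.

    `dictionaries` maps a dictionary name (e.g. "clothing") to
    {tag: [keywords]}. This layout is an assumption, not the project's schema.
    """
    lowered = text.lower()
    found: Set[str] = set()
    for entries in dictionaries.values():
        for tag, keywords in entries.items():
            for keyword in keywords:
                # \b anchors avoid matching a keyword inside a longer word.
                if re.search(rf"\b{re.escape(keyword.lower())}\b", lowered):
                    found.add(tag)
                    break  # one matching keyword is enough for this tag
    return sorted(found)


# Hypothetical dictionaries, standing in for the parsed YAML files.
dictionaries = {
    "clothing": {"lingerie": ["lingerie", "lace set"], "stockings": ["stockings"]},
    "context": {"outdoor": ["outdoor", "beach"]},
}
print(infer_tags("Lace set on the beach", dictionaries))  # ['lingerie', 'outdoor']
```

Whole-word matching is a deliberate choice here: a plain substring test would tag "beachfront" as outdoor, which is the kind of false positive a validation report would later have to catch.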
Indexing
- After import, index_builder.py rebuilds a global index for CLI listing.
- Index entries include title, models, source, and folder references.
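An index rebuild along these lines might be sketched as below. The function name build_index, the recursive scan for metadata.json, and the mapping of the "source" entry to the source_url field are assumptions for illustration; the actual index format written by index_builder.py is not specified here.

```python
import json
from pathlib import Path
from typing import Dict, List


def build_index(galleries_root: Path) -> List[Dict]:
    """Collect one index entry per folder that contains a metadata.json."""
    entries = []
    for meta_path in sorted(galleries_root.rglob("metadata.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        entries.append(
            {
                "title": meta.get("title", ""),
                "models": meta.get("models", []),
                # Assumption: the index "source" comes from source_url.
                "source": meta.get("source_url", ""),
                # Folder reference relative to the galleries root.
                "folder": meta_path.parent.relative_to(galleries_root).as_posix(),
            }
        )
    return entries
```

Rebuilding from the metadata files on every run, rather than patching the index in place, keeps the index a pure derivative of what is on disk.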
Enrichment (optional)
- ThePornDB bridge pulls performer metadata and merges it with local entries.
- Data is stored in a lightweight SQLite database for reusability.
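The caching side of enrichment could be sketched with the standard library sqlite3 module as below. The table name performers, the JSON-in-a-text-column layout, the "performers" key added to the metadata, and all function names are hypothetical. The sketch deliberately leaves out the ThePornDB request itself, since its API details are not described in this document.

```python
import json
import sqlite3
from typing import Dict, Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the performer cache; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS performers ("
        "name TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    return conn


def store_performer(conn: sqlite3.Connection, name: str, data: Dict) -> None:
    # INSERT OR REPLACE keeps exactly one row per performer name.
    conn.execute(
        "INSERT OR REPLACE INTO performers (name, data) VALUES (?, ?)",
        (name, json.dumps(data)),
    )
    conn.commit()


def load_performer(conn: sqlite3.Connection, name: str) -> Optional[Dict]:
    row = conn.execute(
        "SELECT data FROM performers WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def enrich(metadata: Dict, conn: sqlite3.Connection) -> Dict:
    """Return a copy of metadata with cached performer data merged in."""
    enriched = dict(metadata)
    performers = {}
    for name in metadata.get("models", []):
        info = load_performer(conn, name)
        if info is not None:
            performers[name] = info
    enriched["performers"] = performers  # hypothetical field name
    return enriched
```

Because the cache is keyed by performer name, a performer who appears in many galleries only needs to be fetched once.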
Validation
- Tag dictionaries are validated via goondex validate-tags.
- Reports are saved to /src/importer/reports/ for long-term tracking.
Data Structure
Each gallery folder includes:
metadata.json - Core descriptive data and tags.
failed_downloads.json (optional) - Log of skipped or failed images.
inferred_tags - Automatically detected tags stored separately from user edits.
source_url - Original import link for refresh operations.
Metadata fields (core schema):
title
models
categories
tags
image_urls
source_url
views
rating
last_refreshed
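For illustration, the core schema could be modelled as a dataclass that round-trips through JSON. The field names come from the list above; the types, default values, and the ISO 8601 format for last_refreshed are assumptions, as are the class and method names.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class GalleryMetadata:
    # Field names follow the core schema; types and defaults are assumed.
    title: str
    source_url: str
    models: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    image_urls: List[str] = field(default_factory=list)
    views: int = 0
    rating: float = 0.0
    last_refreshed: str = ""  # assumed to be an ISO 8601 timestamp

    def to_json(self) -> str:
        """Serialize to the text that would be written to metadata.json."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "GalleryMetadata":
        """Rebuild an instance from metadata.json content."""
        return cls(**json.loads(text))


example = GalleryMetadata(
    title="Example gallery",
    source_url="https://example.com/gallery/1",
    models=["Example Model"],
    tags=["outdoor"],
    last_refreshed="2025-11-01T12:00:00",
)
print(example.to_json())
```

A typed model like this makes a missing or misspelled field fail loudly at load time instead of surfacing later during tagging or indexing.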
CLI Commands (as of v0.3.4)
import - Import new gallery from PornPics.
refresh-all - Refresh tags for all galleries.
refresh-one - Refresh tags for one gallery.
validate-tags - Validate YAML tag dictionaries.
tag-stats - Generate tag frequency report.
list - List all galleries.
list-tags - List tags for one gallery.
add - Add a tag manually.
remove - Remove a tag manually.
add-multi - Add multiple tags at once.
show-metadata - Display metadata.json contents.
source set - Set or bulk-set gallery source.
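A subcommand layout like the one above maps naturally onto argparse subparsers. The sketch below covers only a few of the commands and assumes argparse is used at all; the positional argument name gallery is hypothetical, and only import <url> is taken from the workflow description.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a subset of the commands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="goondex")
    sub = parser.add_subparsers(dest="command", required=True)

    p_import = sub.add_parser("import", help="Import new gallery")
    p_import.add_argument("url")

    sub.add_parser("refresh-all", help="Refresh tags for all galleries")

    p_one = sub.add_parser("refresh-one", help="Refresh tags for one gallery")
    p_one.add_argument("gallery")  # hypothetical argument name

    sub.add_parser("validate-tags", help="Validate YAML tag dictionaries")
    sub.add_parser("list", help="List all galleries")
    return parser


args = build_parser().parse_args(["import", "https://example.com/gallery/1"])
print(args.command, args.url)  # import https://example.com/gallery/1
```

Keeping parser construction in its own function lets the same definition serve both the goondex alias and python -m src.importer.cli.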
Planned Evolution
v0.4.x
- Introduce ML dataset builder for hybrid text and image embeddings.
- Implement semantic search with CLIP model integration.
- Support multiple site importers beyond PornPics.
- Add confidence scoring for auto-tagging accuracy.
v0.5.x
- Implement Web UI with search, tag filters, and visual gallery grid.
- Introduce local model inference (GroundingDINO + SAM).
- Build API layer for remote clients.
Design Philosophy
Keep it modular, transparent, and locally maintainable.
Every import should leave a clean, readable data trail.
Avoid hard dependencies: rely primarily on the Python standard library, with only essential external libraries (requests, tqdm, yaml).
All scripts must remain executable from both CLI and within the src context.
Maintain clean commit history and clearly versioned documentation (as done with this file).