File: docs/README.md
Version: v0.3.4
Last updated: November 2025
Maintainer: Leak Technologies
Project: Goondex

------------------------------------------------------------
Goondex: PornPics Importer & ML Pipeline
------------------------------------------------------------

A modular, documented gallery importer for PornPics.com that forms the foundation of the Goondex ecosystem.
It supports importing, tagging, metadata enrichment, and generation of ML-ready datasets for semantic search and classification.

------------------------------------------------------------
1. Project Overview
------------------------------------------------------------

Goondex automates the process of:
- Downloading and organizing galleries from PornPics.com
- Generating structured metadata and inferring tags
- Enriching galleries via ThePornDB (TPDB) performer API
- Building machine-learning datasets and embeddings
- Enabling semantic, hybrid (text + image) search

All processing and storage happen locally; no cloud services or external databases are required. Network access is used to download galleries from PornPics.com and to query the TPDB API.
The system is modular, transparent, and designed for research and personal archival use.

------------------------------------------------------------
2. Project Structure
------------------------------------------------------------

```text
src/
├── importer/                    → Core importer logic and CLI tools
│   ├── cli.py                   → Unified CLI entrypoint (goondex command)
│   ├── gallery_importer.py      → Gallery parser and downloader
│   ├── tag_gallery.py           → Tag inference and YAML management
│   ├── reports/                 → Auto-generated validation and tag stats
│   ├── db/                      → TPDB performer cache and local databases
│   ├── secrets/                 → Local-only API keys (ignored by Git)
│   └── tag_dictionaries/        → Modular YAML tag dictionaries
│
├── ml/                          → Machine learning and semantic search
│   ├── ml_dataset_builder.py    → Builds JSONL dataset for embeddings
│   ├── ml_embeddings.py         → Generates CLIP + text hybrid vectors
│   ├── ml_dataset_inspector.py  → (planned) visual dataset viewer
│   └── ml_vision_detector.py    → (planned) DINO + SAM visual tagging
│
├── docs/                        → Documentation, changelogs, and brand files
├── tests/                       → Unit and integration testing suite
└── assets/                      → Static samples and test assets
```

------------------------------------------------------------
3. Environment Setup
------------------------------------------------------------

Create a virtual environment and install dependencies:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Set the source path for development:

```bash
export PYTHONPATH=src
```

------------------------------------------------------------
4. Quick Start
------------------------------------------------------------

Import a gallery from PornPics:

```bash
goondex import "https://www.pornpics.com/galleries/example-id/"
```

The import command automatically:
- Downloads images and metadata
- Saves them to Galleries/<timestamp>_<model>_<title>/
- Generates metadata.json
- Runs auto-tagging (refresh-one)
- Updates the central gallery index
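
The folder pattern above, `<timestamp>_<model>_<title>`, could be produced along these lines. This is only a sketch: the helper names `slugify` and `gallery_folder_name`, the timestamp format, and the slug rules are assumptions for illustration, not the importer's actual behaviour.

```python
import re
from datetime import datetime


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    # Fall back to a placeholder so the folder name never has an empty part.
    return slug or "unknown"


def gallery_folder_name(model: str, title: str, when: datetime) -> str:
    """Build a '<timestamp>_<model>_<title>' folder name (format is assumed)."""
    stamp = when.strftime("%Y%m%d-%H%M%S")  # assumed timestamp layout
    return f"{stamp}_{slugify(model)}_{slugify(title)}"


# Hypothetical model and title values, used only to exercise the helper.
example = gallery_folder_name(
    "Jane Doe", "Summer Shoot #2", datetime(2025, 11, 3, 14, 30, 0)
)
print(example)
```

Keeping the timestamp first means a plain alphabetical listing of `Galleries/` is also a chronological one.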

------------------------------------------------------------
5. CLI Overview
------------------------------------------------------------

All commands are run via:

```bash
goondex <command> [args...]
```

Examples:

```bash
goondex refresh-all
goondex refresh-one "<folder>"
goondex validate-tags
goondex tag-stats
goondex list-tags "<folder>"
goondex add "<folder>" "TagName"
goondex source bulk set "PornPics"
```

The CLI automatically detects YAML tag dictionaries and applies them during refresh or import.
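
For orientation, a single `goondex <command> [args...]` entrypoint of this shape is commonly built with `argparse` subcommands. The sketch below wires up only a few of the commands listed above and does not reflect the real `cli.py`; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a subcommand parser mirroring a few of the documented commands."""
    parser = argparse.ArgumentParser(prog="goondex")
    commands = parser.add_subparsers(dest="command", required=True)

    # Commands without arguments.
    commands.add_parser("refresh-all", help="Re-tag every gallery (assumed meaning)")

    # Commands that operate on a single gallery folder.
    refresh_one = commands.add_parser("refresh-one", help="Re-tag one gallery")
    refresh_one.add_argument("folder")

    list_tags = commands.add_parser("list-tags", help="Show tags for one gallery")
    list_tags.add_argument("folder")

    # 'add' takes a folder and the tag to attach to it.
    add_tag = commands.add_parser("add", help="Add a tag to one gallery")
    add_tag.add_argument("folder")
    add_tag.add_argument("tag")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["add", "Galleries/example", "TagName"])
print(args.command, args.folder, args.tag)
```

Each subparser owns its own arguments, so adding a new command does not affect the others.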

------------------------------------------------------------
6. Machine Learning Pipeline
------------------------------------------------------------

Build dataset:

```bash
python -m ml.ml_dataset_builder
```

Output:
ML/porndex_dataset.jsonl

Each record includes:

```json
{
  "gallery_id": "...",
  "title": "...",
  "models": ["..."],
  "tags": ["..."],
  "categories": ["..."],
  "image_paths": ["..."]
}
```
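
Because the dataset is JSONL (one JSON object per line), it can be read and sanity-checked with the standard library alone. The sketch below checks that each record carries the six fields shown above; the function name `parse_records` and the sample record are hypothetical and not part of the project.

```python
import json

# Field names taken from the record layout shown above.
REQUIRED_FIELDS = {"gallery_id", "title", "models", "tags", "categories", "image_paths"}


def parse_records(lines):
    """Parse JSONL lines and verify that every record has the expected fields."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing fields {sorted(missing)}")
        records.append(record)
    return records


# In-memory sample with made-up values; a real run would iterate over an open file.
sample_lines = [
    json.dumps({
        "gallery_id": "g1",
        "title": "Example",
        "models": ["Model A"],
        "tags": ["tag-one"],
        "categories": ["category-one"],
        "image_paths": ["Galleries/example/001.jpg"],
    }),
    "",
]
records = parse_records(sample_lines)
print(len(records), records[0]["gallery_id"])
```

To check the real file, pass an open handle instead of the list, for example `parse_records(open("ML/porndex_dataset.jsonl", encoding="utf-8"))`.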

Build embeddings:

```bash
python -m ml.ml_embeddings build --img-samples 8 --device auto
```

Output:
ML/embeddings/<gallery_id>.npz
ML/embeddings_index.jsonl

Search:

```bash
python -m ml.ml_embeddings search "asian redhead solo"
```

Modes:
- semantic (default): hybrid vector cosine similarity
- text: text-only search
- strict: literal keyword matching
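
The semantic mode is described as cosine similarity over hybrid vectors. The sketch below shows only that ranking step, in pure Python on toy three-dimensional vectors. The gallery ids, the vectors, and the function names are made up for illustration; real embeddings would come from the `.npz` files, whose internal layout is not documented here.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as no match
    return dot / (norm_a * norm_b)


def rank(query, index, top_k=3):
    """Score every gallery vector against the query and keep the best top_k."""
    scored = [(gallery_id, cosine_similarity(query, vector))
              for gallery_id, vector in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index: hypothetical gallery ids mapped to made-up embedding vectors.
toy_index = {
    "gallery-a": [1.0, 0.0, 0.0],
    "gallery-b": [0.6, 0.8, 0.0],
    "gallery-c": [0.0, 0.0, 1.0],
}
results = rank([1.0, 0.0, 0.0], toy_index, top_k=2)
for gallery_id, score in results:
    print(f"{gallery_id}: {score:.2f}")
```

Cosine similarity ignores vector length, so galleries are compared by the direction of their embeddings rather than by magnitude.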

Verify:

```bash
python -m ml.ml_embeddings verify
```

------------------------------------------------------------
7. Development Guidelines
------------------------------------------------------------

- Use descriptive variable names and structured commits
- Avoid emojis in code and commit messages
- Always document new features in docs/CHANGELOG.md
- Keep CLI text synchronized with docs/CLI_USAGE.md
- Use version tagging for all major commits

------------------------------------------------------------
8. Roadmap Summary
------------------------------------------------------------

Stage        Feature                           Description
-----------  --------------------------------  ----------------------------------
✅ v0.3.x    Stable CLI & Tagging              Unified CLI and YAML cleanup
⚙️ v0.4.x    ML Embeddings & Dataset Builder   Build hybrid vectors for search
⏳ v0.5.x    Visual Intelligence               DINO + SAM + attribute detection
🔜 v0.6.x    Local Web UI                      Lightweight gallery browser
🚀 v1.0.0    Full Stable Release               Plugin importers + visual ML tools

------------------------------------------------------------
9. Licensing
------------------------------------------------------------

License: Research-Use MIT Variant
Author: Leak Technologies
Maintainer: Stu Leak
For personal, non-commercial, and research use only.

------------------------------------------------------------
End of File
------------------------------------------------------------