Stu Leak 16fb407a3c v0.1.0-dev4: Add web frontend with UI component library

- Implement full web interface with Go html/template server
- Add GX component library (buttons, dialogs, tables, forms, etc.)
- Create scene/performer/studio/movie detail and listing pages
- Add Adult Empire scraper for additional metadata sources
- Implement movie support with database schema
- Add import and sync services for data management
- Include comprehensive API and frontend documentation
- Add custom color scheme and responsive layout

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-11-17 10:47:30 -05:00

9.0 KiB


Adult Empire Scraper Integration

Version: v0.1.0-dev4 Last Updated: 2025-11-16

Overview

Goondex now includes an Adult Empire scraper modeled on the Stash app's scraping architecture. It fetches scene metadata, cover art, and performer information directly from Adult Empire (adultdvdempire.com).

Features

✅ Scene Scraping

Extract scene title, description, release date
Download cover art/thumbnails
Retrieve studio information
Get performer lists
Extract tags/categories
Scene code/SKU
Director information

✅ Performer Scraping

Extract performer name, aliases
Download profile images
Retrieve birthdate, ethnicity, nationality
Physical attributes (height, measurements, hair/eye color)
Biography text

✅ Search Functionality

Search scenes by title
Search performers by name
Get search results with thumbnails

Architecture

The Adult Empire scraper is implemented in /internal/scraper/adultemp/ with the following components:

Files

types.go - Data structures for scraped content
client.go - HTTP client with cookie/session management
xpath.go - XPath parsing utilities for HTML extraction
scraper.go - Main scraper implementation

Components

┌─────────────────┐
│  Scraper API    │  - ScrapeSceneByURL()
│                 │  - ScrapePerformerByURL()
│                 │  - SearchScenesByName()
│                 │  - SearchPerformersByName()
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  HTTP Client    │  - Cookie jar for sessions
│                 │  - Age verification
│                 │  - Auth token support
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  XPath Parser   │  - Extract data from HTML
│                 │  - Parse dates, heights
│                 │  - Clean text content
└─────────────────┘

Usage

Authentication (Optional)

For full access to Adult Empire content, you can set an authentication token:

scraper, err := adultemp.NewScraper()
if err != nil {
    log.Fatal(err)
}

// Optional: Set your Adult Empire session token
scraper.SetAuthToken("your-etoken-here")

Getting your etoken:

Log into adultdvdempire.com
Open browser DevTools (F12)
Go to Application → Cookies → adultdvdempire.com
Copy the value of the etoken cookie
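
Because the token is an ordinary cookie, one plausible way for an HTTP client to carry it is a standard-library cookie jar. The sketch below is not the code in client.go. The helper names (newSessionClient, setSiteCookie, siteCookie) are hypothetical, and it assumes the token only needs to be sent as a cookie named etoken on the www host.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/cookiejar"
	"net/url"
	"time"
)

// siteURL is the site root named in this document.
const siteURL = "https://www.adultdvdempire.com"

// newSessionClient returns a client whose jar keeps cookies between requests.
// (Hypothetical helper; the timeout value is an arbitrary choice.)
func newSessionClient() (*http.Client, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	return &http.Client{Jar: jar, Timeout: 30 * time.Second}, nil
}

// setSiteCookie stores a cookie that the jar will send on later requests to siteURL.
func setSiteCookie(c *http.Client, name, value string) error {
	u, err := url.Parse(siteURL)
	if err != nil {
		return err
	}
	c.Jar.SetCookies(u, []*http.Cookie{{Name: name, Value: value, Path: "/"}})
	return nil
}

// siteCookie reports the value the jar would send for the named cookie.
func siteCookie(c *http.Client, name string) (string, bool) {
	u, err := url.Parse(siteURL)
	if err != nil {
		return "", false
	}
	for _, ck := range c.Jar.Cookies(u) {
		if ck.Name == name {
			return ck.Value, true
		}
	}
	return "", false
}

func main() {
	client, err := newSessionClient()
	if err != nil {
		log.Fatal(err)
	}
	if err := setSiteCookie(client, "etoken", "your-etoken-here"); err != nil {
		log.Fatal(err)
	}
	value, ok := siteCookie(client, "etoken")
	fmt.Println(value, ok)
}
```

Keeping the token in the jar rather than in a per-request header means any cookies the site sets later in the session travel with it automatically.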

Scrape a Scene by URL

ctx := context.Background()
sceneData, err := scraper.ScrapeSceneByURL(ctx, "https://www.adultdvdempire.com/12345/scene-name")
if err != nil {
    log.Fatal(err)
}

// Convert to Goondex model
scene := scraper.ConvertSceneToModel(sceneData)

// Save to database
// db.Scenes.Create(scene)

Search for Scenes

results, err := scraper.SearchScenesByName(ctx, "scene title")
if err != nil {
    log.Fatal(err)
}

for _, result := range results {
    fmt.Printf("Title: %s\n", result.Title)
    fmt.Printf("URL: %s\n", result.URL)
    fmt.Printf("Image: %s\n", result.Image)
}

Scrape a Performer

performerData, err := scraper.ScrapePerformerByURL(ctx, "https://www.adultdvdempire.com/performer/12345/name")
if err != nil {
    log.Fatal(err)
}

// Convert to Goondex model
performer := scraper.ConvertPerformerToModel(performerData)

Search for Performers

results, err := scraper.SearchPerformersByName(ctx, "performer name")
if err != nil {
    log.Fatal(err)
}

for _, result := range results {
    fmt.Printf("Name: %s\n", result.Title)
    fmt.Printf("URL: %s\n", result.URL)
}

Data Structures

SceneData

type SceneData struct {
    Title       string      // Scene title
    URL         string      // Adult Empire URL
    Date        string      // Release date
    Studio      string      // Studio name
    Image       string      // Cover image URL
    Description string      // Synopsis/description
    Performers  []string    // List of performer names
    Tags        []string    // Categories/tags
    Code        string      // Scene code/SKU
    Director    string      // Director name
}

PerformerData

type PerformerData struct {
    Name         string      // Performer name
    URL          string      // Adult Empire URL
    Image        string      // Profile image URL
    Birthdate    string      // Date of birth
    Ethnicity    string      // Ethnicity
    Country      string      // Country of origin
    Height       string      // Height (converted to cm)
    Measurements string      // Body measurements
    HairColor    string      // Hair color
    EyeColor     string      // Eye color
    Biography    string      // Bio text
    Aliases      []string    // Alternative names
}

XPath Selectors

The scraper uses XPath to extract data from Adult Empire pages. Key selectors include:

Scene Selectors

Title: //h1[@class='title']
Date: //div[@class='release-date']/text()
Studio: //a[contains(@href, '/studio/')]/text()
Image: //div[@class='item-image']//img/@src
Description: //div[@class='synopsis']
Performers: //a[contains(@href, '/performer/')]/text()
Tags: //a[contains(@href, '/category/')]/text()

Performer Selectors

Name: //h1[@class='performer-name']
Image: //div[@class='performer-image']//img/@src
Birthdate: //span[@class='birthdate']/text()
Height: //span[@class='height']/text()
Bio: //div[@class='bio']

Note: Adult Empire may change their HTML structure. If scraping fails, XPath selectors in scraper.go may need updates.
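
One way to soften the impact of markup changes is to try several selectors in order and keep the first that yields text. The sketch below illustrates that pattern only and is not how scraper.go is known to work. The evalFunc type and firstMatch helper are hypothetical, and the page is simulated with a map so the example runs without an XPath library.

```go
package main

import (
	"fmt"
	"strings"
)

// evalFunc evaluates one XPath expression against a parsed page and returns
// the matched text. It is a hypothetical stand-in for a real XPath engine.
type evalFunc func(xpath string) (string, bool)

// firstMatch tries each selector in order and returns the first non-empty
// result together with the selector that produced it.
func firstMatch(eval evalFunc, selectors ...string) (value, used string, ok bool) {
	for _, sel := range selectors {
		if v, found := eval(sel); found && strings.TrimSpace(v) != "" {
			return strings.TrimSpace(v), sel, true
		}
	}
	return "", "", false
}

func main() {
	// Simulated page: only the broader selector matches, as if the
	// class attribute had been renamed upstream.
	page := map[string]string{"//h1": "  Example Title  "}
	eval := func(xpath string) (string, bool) {
		v, ok := page[xpath]
		return v, ok
	}

	title, used, ok := firstMatch(eval, "//h1[@class='title']", "//h1")
	fmt.Println(title, used, ok)
}
```

Logging which selector matched makes it easier to notice when the preferred one has stopped working.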

Utilities

Date Parsing

dateStr := ParseDate("Jan 15, 2024")  // Handles various formats
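
As an illustration of how multi-format date handling can be done, the sketch below tries a list of layouts with time.Parse and normalizes to YYYY-MM-DD. The layout list, the output format, and the empty-string return on failure are assumptions. The real ParseDate may accept different formats.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// dateLayouts lists the input formats this sketch accepts (assumed set).
var dateLayouts = []string{
	"Jan 2, 2006",
	"January 2, 2006",
	"01/02/2006",
	"2006-01-02",
}

// ParseDate returns the date as YYYY-MM-DD, or "" if no layout matches.
func ParseDate(s string) string {
	s = strings.TrimSpace(s)
	for _, layout := range dateLayouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t.Format("2006-01-02")
		}
	}
	return ""
}

func main() {
	fmt.Println(ParseDate("Jan 15, 2024"))    // 2024-01-15
	fmt.Println(ParseDate("January 5, 2024")) // 2024-01-05
	fmt.Println(ParseDate("not a date") == "")
}
```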

Height Conversion

heightCm := ParseHeight("5'6\"")  // Converts feet/inches to cm (168)
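
The conversion itself is simple arithmetic: total inches times 2.54, rounded to the nearest centimeter. A minimal sketch follows, assuming an integer return value and a 0 result for unrecognized input. The actual signature in xpath.go is not shown in this document.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strconv"
)

// feetInches matches strings such as 5'6" or 6' (inches are optional).
var feetInches = regexp.MustCompile(`(\d+)\s*'\s*(\d+)?`)

// ParseHeight converts a feet/inches string to whole centimeters.
// It returns 0 when the input does not look like a feet/inches value.
func ParseHeight(s string) int {
	m := feetInches.FindStringSubmatch(s)
	if m == nil {
		return 0
	}
	feet, err := strconv.Atoi(m[1])
	if err != nil {
		return 0
	}
	inches := 0
	if m[2] != "" {
		if inches, err = strconv.Atoi(m[2]); err != nil {
			return 0
		}
	}
	// 1 inch = 2.54 cm; 5'6" is 66 in = 167.64 cm, which rounds to 168.
	return int(math.Round(float64(feet*12+inches) * 2.54))
}

func main() {
	fmt.Println(ParseHeight("5'6\"")) // 168
	fmt.Println(ParseHeight("6'"))    // 183
	fmt.Println(ParseHeight("tall"))  // 0
}
```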

Text Cleaning

cleanedText := CleanText(rawHTML)  // Removes "Show More/Less" and extra whitespace
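
A rough sketch of the cleaning described above: drop the "Show More" / "Show Less" toggle labels, then collapse runs of whitespace. It assumes the input is already extracted text, so it does not strip HTML tags. The real CleanText may do more.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// showToggle matches the expand/collapse labels, case-insensitively.
var showToggle = regexp.MustCompile(`(?i)show\s+(more|less)`)

// CleanText removes toggle labels and collapses all whitespace runs
// (spaces, tabs, newlines) into single spaces.
func CleanText(s string) string {
	s = showToggle.ReplaceAllString(s, " ")
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	raw := "  A long\n synopsis...  Show More "
	fmt.Printf("%q\n", CleanText(raw)) // "A long synopsis..."
}
```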

URL Normalization

fullURL := ExtractURL("/path/to/scene", "https://www.adultdvdempire.com")
// Returns: "https://www.adultdvdempire.com/path/to/scene"
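
Relative-to-absolute resolution of this kind can be done with net/url from the standard library. The sketch below assumes the argument order shown above and returns the input unchanged when either URL fails to parse. That error behavior is a guess, not documented behavior.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ExtractURL resolves ref against base. Absolute refs are returned as-is,
// and protocol-relative refs (//host/path) inherit the base scheme.
func ExtractURL(ref, base string) string {
	b, err := url.Parse(base)
	if err != nil {
		return ref
	}
	r, err := url.Parse(strings.TrimSpace(ref))
	if err != nil {
		return ref
	}
	return b.ResolveReference(r).String()
}

func main() {
	base := "https://www.adultdvdempire.com"
	fmt.Println(ExtractURL("/path/to/scene", base))
	fmt.Println(ExtractURL("//imgs.example.com/cover.jpg", base))
	fmt.Println(ExtractURL("https://cdn.example.com/a.jpg", base))
}
```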

Integration with Goondex

The Adult Empire scraper plugs into the existing Goondex architecture in four steps:

Scrape data from Adult Empire using the scraper
Convert to Goondex models using converter functions
Save to the database using existing stores
Display in the web UI with cover art and metadata

Example Workflow

// 1. Search for a scene
results, err := scraper.SearchScenesByName(ctx, "scene name")
if err != nil {
    log.Fatal(err)
}
if len(results) == 0 {
    log.Fatal("no results found")
}

// 2. Pick the first result and scrape full details
sceneData, err := scraper.ScrapeSceneByURL(ctx, results[0].URL)
if err != nil {
    log.Fatal(err)
}

// 3. Convert to Goondex model
scene := scraper.ConvertSceneToModel(sceneData)

// 4. Save to database
sceneStore := db.NewSceneStore(database)
sceneStore.Create(scene)

// 5. The scene now appears in the web UI

Future Enhancements

Planned improvements for the Adult Empire scraper:

⏳ Bulk Import - Import entire studios or series
⏳ Auto-Update - Periodically refresh metadata
⏳ Image Caching - Download and cache cover art locally
⏳ Duplicate Detection - Avoid importing the same scene twice
⏳ Advanced Search - Filter by studio, date range, tags
⏳ Web UI Integration - Search and import from the dashboard

Troubleshooting

"Failed to parse HTML"

The Adult Empire page structure may have changed
Update XPath selectors in scraper.go

"Request failed: 403 Forbidden"

Adult Empire may be blocking automated or unauthenticated requests
Set a valid etoken with SetAuthToken (see Authentication above)

"No results found"

Check that the search query is correct
Adult Empire may list the title or name under a different spelling
Try broader search terms

Scene/Performer data incomplete

Some fields may not be present on all pages
XPath selectors may need adjustment
Check the raw HTML to verify field availability
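
To see at a glance which fields a scrape left empty, a small check over the SceneData struct can help. This is a diagnostic sketch only. The missingFields helper is hypothetical and not part of the scraper package, and the struct is copied from the Data Structures section so the example is self-contained.

```go
package main

import (
	"fmt"
	"strings"
)

// SceneData mirrors the struct shown under Data Structures.
type SceneData struct {
	Title       string
	URL         string
	Date        string
	Studio      string
	Image       string
	Description string
	Performers  []string
	Tags        []string
	Code        string
	Director    string
}

// missingFields returns the names of fields that are empty after scraping.
func missingFields(s SceneData) []string {
	var missing []string
	check := func(name, value string) {
		if strings.TrimSpace(value) == "" {
			missing = append(missing, name)
		}
	}
	check("Title", s.Title)
	check("URL", s.URL)
	check("Date", s.Date)
	check("Studio", s.Studio)
	check("Image", s.Image)
	check("Description", s.Description)
	check("Code", s.Code)
	check("Director", s.Director)
	if len(s.Performers) == 0 {
		missing = append(missing, "Performers")
	}
	if len(s.Tags) == 0 {
		missing = append(missing, "Tags")
	}
	return missing
}

func main() {
	scene := SceneData{Title: "Example", URL: "https://www.adultdvdempire.com/12345/scene-name"}
	fmt.Println(missingFields(scene))
}
```

A field that is missing on every page points to a stale selector. A field missing on only some pages is more likely absent from the source.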

Comparison with TPDB Scraper

Feature	TPDB	Adult Empire
API	✅ Official JSON API	❌ HTML scraping
Auth	✅ API key	⚠️ Session cookie
Rate Limits	✅ Documented	⚠️ Unknown
Stability	✅ Stable schema	⚠️ May change
Coverage	✅ Comprehensive	✅ Comprehensive
Images	✅ High quality	✅ High quality

Recommendation: Use TPDB as the primary source and Adult Empire as a fallback or supplemental source.
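
The primary/fallback arrangement can be sketched as a field-by-field merge: keep every value the primary source provided and fill only the gaps from the fallback. The example assumes both sources have already been mapped into the SceneData shape from this document. The mergeScene helper is hypothetical, and how Goondex actually represents TPDB results is not covered here.

```go
package main

import "fmt"

// SceneData mirrors the struct shown under Data Structures.
type SceneData struct {
	Title       string
	URL         string
	Date        string
	Studio      string
	Image       string
	Description string
	Performers  []string
	Tags        []string
	Code        string
	Director    string
}

// mergeScene returns primary with its empty fields filled from fallback.
// Values already present in primary are never overwritten.
func mergeScene(primary, fallback SceneData) SceneData {
	out := primary
	fill := func(dst *string, src string) {
		if *dst == "" {
			*dst = src
		}
	}
	fill(&out.Title, fallback.Title)
	fill(&out.URL, fallback.URL)
	fill(&out.Date, fallback.Date)
	fill(&out.Studio, fallback.Studio)
	fill(&out.Image, fallback.Image)
	fill(&out.Description, fallback.Description)
	fill(&out.Code, fallback.Code)
	fill(&out.Director, fallback.Director)
	if len(out.Performers) == 0 {
		out.Performers = fallback.Performers
	}
	if len(out.Tags) == 0 {
		out.Tags = fallback.Tags
	}
	return out
}

func main() {
	fromTPDB := SceneData{Title: "Example", Date: "2024-01-15"}
	fromAdultEmpire := SceneData{Title: "Example (DVD)", Director: "Jane Doe", Tags: []string{"feature"}}

	merged := mergeScene(fromTPDB, fromAdultEmpire)
	fmt.Printf("%+v\n", merged)
}
```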

Contributing

To improve Adult Empire scraping:

Update XPath selectors if Adult Empire changes their HTML
Add support for additional fields
Improve date/height parsing
Add more robust error handling

Version History

v0.1.0-dev4 (2025-11-16): Initial Adult Empire scraper implementation
- HTTP client with cookie support
- XPath parsing utilities
- Scene and performer scraping
- Search functionality
- Model conversion utilities

Last Updated: 2025-11-16 Maintainer: Goondex Team
