Goondex/docs/ADULT_EMPIRE_SCRAPER.md

Adult Empire Scraper Integration

Version: v0.1.0-dev4
Last Updated: 2025-11-16

Overview

Goondex includes an Adult Empire scraper modeled on the Stash app's scraping architecture. It fetches metadata, cover art, and performer information directly from Adult Empire (adultdvdempire.com).

Features

Scene Scraping

  • Extract scene title, description, release date
  • Download cover art/thumbnails
  • Retrieve studio information
  • Get performer lists
  • Extract tags/categories
  • Extract scene code/SKU
  • Extract director information

Performer Scraping

  • Extract performer name, aliases
  • Download profile images
  • Retrieve birthdate, ethnicity, nationality
  • Extract physical attributes (height, measurements, hair/eye color)
  • Extract biography text

Search Functionality

  • Search scenes by title
  • Search performers by name
  • Get search results with thumbnails

Architecture

The Adult Empire scraper is implemented in /internal/scraper/adultemp/ with the following components:

Files

  1. types.go - Data structures for scraped content
  2. client.go - HTTP client with cookie/session management
  3. xpath.go - XPath parsing utilities for HTML extraction
  4. scraper.go - Main scraper implementation

Components

┌─────────────────┐
│  Scraper API    │  - ScrapeSceneByURL()
│                 │  - ScrapePerformerByURL()
│                 │  - SearchScenesByName()
│                 │  - SearchPerformersByName()
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  HTTP Client    │  - Cookie jar for sessions
│                 │  - Age verification
│                 │  - Auth token support
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  XPath Parser   │  - Extract data from HTML
│                 │  - Parse dates, heights
│                 │  - Clean text content
└─────────────────┘
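
The HTTP client layer in the diagram can be sketched with the standard library alone. This is a minimal illustration, not the code in client.go: the function names newSessionClient and cookieValue are hypothetical, and the age-verification cookie name "ageConfirmed" is an assumption that should be checked against the live site. Only the etoken cookie name comes from this document.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
	"time"
)

// newSessionClient returns an HTTP client whose cookie jar is pre-seeded for
// baseURL. The function itself is a hypothetical sketch.
func newSessionClient(baseURL, etoken string) (*http.Client, error) {
	jar, err := cookiejar.New(nil)
	if err != nil {
		return nil, err
	}
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	cookies := []*http.Cookie{
		// Assumed name for the age-verification cookie; verify before relying on it.
		{Name: "ageConfirmed", Value: "true", Path: "/"},
	}
	if etoken != "" {
		// "etoken" is the session cookie named in the Authentication section.
		cookies = append(cookies, &http.Cookie{Name: "etoken", Value: etoken, Path: "/"})
	}
	jar.SetCookies(u, cookies)
	return &http.Client{Jar: jar, Timeout: 30 * time.Second}, nil
}

// cookieValue reports the value the client would send for the named cookie
// when requesting rawURL, or "" if the jar holds no such cookie.
func cookieValue(client *http.Client, rawURL, name string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	for _, c := range client.Jar.Cookies(u) {
		if c.Name == name {
			return c.Value
		}
	}
	return ""
}

func main() {
	client, err := newSessionClient("https://www.adultdvdempire.com", "example-token")
	if err != nil {
		panic(err)
	}
	page := "https://www.adultdvdempire.com/12345/scene-name"
	fmt.Println("etoken sent:", cookieValue(client, page, "etoken") != "")
}
```

Seeding the jar once means every later request through the same client carries the session cookies, so the scraper API layer does not need to handle cookies itself.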

Usage

Authentication (Optional)

For full access to Adult Empire content, you can set an authentication token:

scraper, err := adultemp.NewScraper()
if err != nil {
    log.Fatal(err)
}

// Optional: Set your Adult Empire session token
scraper.SetAuthToken("your-etoken-here")

Getting your etoken:

  1. Log into adultdvdempire.com
  2. Open browser DevTools (F12)
  3. Go to Application → Cookies → adultdvdempire.com
  4. Copy the value of the etoken cookie
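
Because the etoken is a session credential, it may be preferable to keep it out of source code. The sketch below reads it from an environment variable; the variable name ADULTEMPIRE_ETOKEN and the helper lookupEtoken are assumptions made for illustration, not something Goondex defines.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// etokenEnvVar is a hypothetical variable name chosen for this example.
const etokenEnvVar = "ADULTEMPIRE_ETOKEN"

// lookupEtoken reads the token through getenv (normally os.Getenv), trims
// surrounding whitespace, and reports whether a non-empty value was found.
func lookupEtoken(getenv func(string) string) (string, bool) {
	token := strings.TrimSpace(getenv(etokenEnvVar))
	return token, token != ""
}

func main() {
	token, ok := lookupEtoken(os.Getenv)
	if !ok {
		fmt.Println("no etoken set; continuing without authentication")
		return
	}
	// With the real scraper, scraper.SetAuthToken(token) would be called here.
	fmt.Printf("loaded etoken (%d characters)\n", len(token))
}
```

Passing the lookup function as a parameter keeps the helper easy to test without touching the process environment.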

Scrape a Scene by URL

ctx := context.Background()
sceneData, err := scraper.ScrapeSceneByURL(ctx, "https://www.adultdvdempire.com/12345/scene-name")
if err != nil {
    log.Fatal(err)
}

// Convert to Goondex model
scene := scraper.ConvertSceneToModel(sceneData)

// Save to database
// db.Scenes.Create(scene)

Search for Scenes

results, err := scraper.SearchScenesByName(ctx, "scene title")
if err != nil {
    log.Fatal(err)
}

for _, result := range results {
    fmt.Printf("Title: %s\n", result.Title)
    fmt.Printf("URL: %s\n", result.URL)
    fmt.Printf("Image: %s\n", result.Image)
}

Scrape a Performer

performerData, err := scraper.ScrapePerformerByURL(ctx, "https://www.adultdvdempire.com/performer/12345/name")
if err != nil {
    log.Fatal(err)
}

// Convert to Goondex model
performer := scraper.ConvertPerformerToModel(performerData)

Search for Performers

results, err := scraper.SearchPerformersByName(ctx, "performer name")
if err != nil {
    log.Fatal(err)
}

for _, result := range results {
    fmt.Printf("Name: %s\n", result.Title)
    fmt.Printf("URL: %s\n", result.URL)
}

Data Structures

SceneData

type SceneData struct {
    Title       string      // Scene title
    URL         string      // Adult Empire URL
    Date        string      // Release date
    Studio      string      // Studio name
    Image       string      // Cover image URL
    Description string      // Synopsis/description
    Performers  []string    // List of performer names
    Tags        []string    // Categories/tags
    Code        string      // Scene code/SKU
    Director    string      // Director name
}

PerformerData

type PerformerData struct {
    Name         string      // Performer name
    URL          string      // Adult Empire URL
    Image        string      // Profile image URL
    Birthdate    string      // Date of birth
    Ethnicity    string      // Ethnicity
    Country      string      // Country of origin
    Height       string      // Height (converted to cm)
    Measurements string      // Body measurements
    HairColor    string      // Hair color
    EyeColor     string      // Eye color
    Biography    string      // Bio text
    Aliases      []string    // Alternative names
}

XPath Selectors

The scraper uses XPath to extract data from Adult Empire pages. Key selectors include:

Scene Selectors

  • Title: //h1[@class='title']
  • Date: //div[@class='release-date']/text()
  • Studio: //a[contains(@href, '/studio/')]/text()
  • Image: //div[@class='item-image']//img/@src
  • Description: //div[@class='synopsis']
  • Performers: //a[contains(@href, '/performer/')]/text()
  • Tags: //a[contains(@href, '/category/')]/text()

Performer Selectors

  • Name: //h1[@class='performer-name']
  • Image: //div[@class='performer-image']//img/@src
  • Birthdate: //span[@class='birthdate']/text()
  • Height: //span[@class='height']/text()
  • Bio: //div[@class='bio']

Note: Adult Empire may change their HTML structure. If scraping fails, XPath selectors in scraper.go may need updates.

Utilities

Date Parsing

dateStr := ParseDate("Jan 15, 2024")  // Handles various formats
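
One common way to handle several date formats is to try a list of layouts in order. The sketch below shows that approach with the standard time package; parseDateSketch, the layout list, and the ISO 8601 output format are assumptions, and the real ParseDate may behave differently.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// dateLayouts are tried in order. The set is an assumption about which
// formats appear on scraped pages.
var dateLayouts = []string{
	"Jan 2, 2006",
	"January 2, 2006",
	"01/02/2006",
	"2006-01-02",
}

// parseDateSketch returns the date as YYYY-MM-DD, or "" if no layout matches.
func parseDateSketch(raw string) string {
	raw = strings.TrimSpace(raw)
	for _, layout := range dateLayouts {
		if t, err := time.Parse(layout, raw); err == nil {
			return t.Format("2006-01-02")
		}
	}
	return ""
}

func main() {
	for _, in := range []string{"Jan 15, 2024", "March 3, 2023", "12/25/2023", "unknown"} {
		fmt.Printf("%q -> %q\n", in, parseDateSketch(in))
	}
}
```

Returning an empty string for unparseable input lets callers treat the date as missing instead of failing the whole scrape.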

Height Conversion

heightCm := ParseHeight("5'6\"")  // Converts feet/inches to cm (168)
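
The conversion itself is simple arithmetic: total inches times 2.54, rounded to the nearest centimeter, so 5'6" is 66 in × 2.54 = 167.64, which rounds to 168. A minimal sketch follows; parseHeightSketch and its regular expression are illustrative assumptions and not the actual ParseHeight implementation.

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"strconv"
)

// feetInches matches forms such as 5'6", 5' 6" and 5' (inches optional).
var feetInches = regexp.MustCompile(`^\s*(\d+)\s*'\s*(\d+)?\s*"?\s*$`)

// parseHeightSketch converts a feet/inches string to whole centimeters.
// The boolean is false when the input does not look like feet/inches.
func parseHeightSketch(raw string) (int, bool) {
	m := feetInches.FindStringSubmatch(raw)
	if m == nil {
		return 0, false
	}
	feet, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	inches := 0
	if m[2] != "" {
		inches, err = strconv.Atoi(m[2])
		if err != nil {
			return 0, false
		}
	}
	totalInches := float64(feet*12 + inches)
	return int(math.Round(totalInches * 2.54)), true
}

func main() {
	cm, ok := parseHeightSketch(`5'6"`)
	fmt.Println(cm, ok)
}
```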

Text Cleaning

cleanedText := CleanText(rawHTML)  // Removes "Show More/Less" and extra whitespace
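
The two cleaning steps described in the comment can be sketched as below. cleanTextSketch is a hypothetical stand-in: it assumes the toggle labels appear literally as "Show More" and "Show Less", and it does not strip HTML tags.

```go
package main

import (
	"fmt"
	"strings"
)

// toggleLabels removes the expand/collapse labels that leak into scraped text.
var toggleLabels = strings.NewReplacer("Show More", "", "Show Less", "")

// cleanTextSketch drops the toggle labels, then collapses every run of
// whitespace (spaces, tabs, newlines) into a single space and trims the ends.
func cleanTextSketch(raw string) string {
	withoutToggles := toggleLabels.Replace(raw)
	return strings.Join(strings.Fields(withoutToggles), " ")
}

func main() {
	fmt.Printf("%q\n", cleanTextSketch("  A  great\n\tscene.  Show More "))
}
```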

URL Normalization

fullURL := ExtractURL("/path/to/scene", "https://www.adultdvdempire.com")
// Returns: "https://www.adultdvdempire.com/path/to/scene"
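
Relative-to-absolute resolution of this kind is available in the standard net/url package. The sketch below shows one way to do it; resolveURLSketch is a hypothetical helper and is not claimed to match how ExtractURL is written.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveURLSketch resolves ref against base. Absolute references are
// returned unchanged; protocol-relative ones inherit the base scheme.
func resolveURLSketch(ref, base string) (string, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	refURL, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	return baseURL.ResolveReference(refURL).String(), nil
}

func main() {
	full, err := resolveURLSketch("/path/to/scene", "https://www.adultdvdempire.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(full)
}
```

Using ResolveReference instead of string concatenation avoids doubled or missing slashes and also covers image URLs that start with "//".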

Integration with Goondex

The Adult Empire scraper plugs into the existing Goondex architecture in four steps:

  1. Scrape data from Adult Empire using the scraper
  2. Convert to Goondex models using converter functions
  3. Save to the database using existing stores
  4. Display in the web UI with cover art and metadata

Example Workflow

// 1. Search for a scene
results, err := scraper.SearchScenesByName(ctx, "scene name")
if err != nil {
    log.Fatal(err)
}
if len(results) == 0 {
    log.Fatal("no results found")
}

// 2. Pick the first result and scrape full details
sceneData, err := scraper.ScrapeSceneByURL(ctx, results[0].URL)
if err != nil {
    log.Fatal(err)
}

// 3. Convert to Goondex model
scene := scraper.ConvertSceneToModel(sceneData)

// 4. Save to database
sceneStore := db.NewSceneStore(database)
sceneStore.Create(scene)

// 5. The scene now appears in the web UI

Future Enhancements

Planned improvements for the Adult Empire scraper:

  • Bulk Import - Import entire studios or series
  • Auto-Update - Periodically refresh metadata
  • Image Caching - Download and cache cover art locally
  • Duplicate Detection - Avoid importing the same scene twice
  • Advanced Search - Filter by studio, date range, tags
  • Web UI Integration - Search and import from the dashboard

Troubleshooting

"Failed to parse HTML"

  • The Adult Empire page structure may have changed
  • Update XPath selectors in scraper.go

"Request failed: 403 Forbidden"

  • Adult Empire may be blocking automated requests
  • Some content may require a logged-in session; set a valid etoken with SetAuthToken()

"No results found"

  • Check that the search query is correct
  • Adult Empire may list the title or name under a different spelling
  • Try broader search terms

Scene/Performer data incomplete

  • Some fields may not be present on all pages
  • XPath selectors may need adjustment
  • Check the raw HTML to verify field availability
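
When diagnosing incomplete data, it can help to list exactly which fields came back empty before inspecting the HTML. The sketch below repeats the SceneData struct from the Data Structures section so that it compiles on its own; missingSceneFields is a hypothetical helper, not part of the scraper package.

```go
package main

import (
	"fmt"
	"strings"
)

// SceneData is copied from the Data Structures section.
type SceneData struct {
	Title       string
	URL         string
	Date        string
	Studio      string
	Image       string
	Description string
	Performers  []string
	Tags        []string
	Code        string
	Director    string
}

// missingSceneFields returns the names of fields that are blank or empty,
// in struct order.
func missingSceneFields(s SceneData) []string {
	var missing []string
	note := func(name string, empty bool) {
		if empty {
			missing = append(missing, name)
		}
	}
	blank := func(v string) bool { return strings.TrimSpace(v) == "" }

	note("Title", blank(s.Title))
	note("URL", blank(s.URL))
	note("Date", blank(s.Date))
	note("Studio", blank(s.Studio))
	note("Image", blank(s.Image))
	note("Description", blank(s.Description))
	note("Performers", len(s.Performers) == 0)
	note("Tags", len(s.Tags) == 0)
	note("Code", blank(s.Code))
	note("Director", blank(s.Director))
	return missing
}

func main() {
	scene := SceneData{Title: "Example", URL: "https://www.adultdvdempire.com/12345/scene-name"}
	fmt.Println("missing:", strings.Join(missingSceneFields(scene), ", "))
}
```

A field that is missing on every page points to a stale selector, while one that is missing only occasionally is more likely absent from the page itself.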

Comparison with TPDB Scraper

| Feature     | TPDB              | Adult Empire      |
|-------------|-------------------|-------------------|
| API         | Official JSON API | HTML scraping     |
| Auth        | API key           | ⚠️ Session cookie |
| Rate Limits | Documented        | ⚠️ Unknown        |
| Stability   | Stable schema     | ⚠️ May change     |
| Coverage    | Comprehensive     | Comprehensive     |
| Images      | High quality      | High quality      |

Recommendation: Use TPDB as the primary source and Adult Empire as a fallback or supplemental source.

Contributing

To improve Adult Empire scraping:

  1. Update XPath selectors if Adult Empire changes their HTML
  2. Add support for additional fields
  3. Improve date/height parsing
  4. Add more robust error handling

Version History

  • v0.1.0-dev4 (2025-11-16): Initial Adult Empire scraper implementation
    • HTTP client with cookie support
    • XPath parsing utilities
    • Scene and performer scraping
    • Search functionality
    • Model conversion utilities

Last Updated: 2025-11-16
Maintainer: Goondex Team