# Adult Empire Scraper Integration

**Version**: v0.1.0-dev5
**Last Updated**: 2025-11-17

## Overview

Goondex now includes a full-featured Adult Empire scraper based on the Stash app's scraping architecture. This allows you to fetch metadata, cover art, and performer information directly from Adult Empire (adultdvdempire.com).

## Features

### ✅ Scene Scraping

- Extract scene title, description, release date
- Download cover art/thumbnails
- Retrieve studio information
- Get performer lists
- Extract tags/categories
- Scene code/SKU
- Director information

### ✅ Performer Scraping

- Extract performer name, aliases
- Download profile images
- Retrieve birthdate, ethnicity, nationality
- Physical attributes (height, measurements, hair/eye color)
- Biography text

### ✅ Search Functionality

- Search scenes by title
- Search performers by name
- Get search results with thumbnails

## Architecture

The Adult Empire scraper is implemented in `/internal/scraper/adultemp/` with the following components:

### Files

1. **`types.go`** - Data structures for scraped content
2. **`client.go`** - HTTP client with cookie/session management
3. **`xpath.go`** - XPath parsing utilities for HTML extraction
4. **`scraper.go`** - Main scraper implementation

### Components

```
┌─────────────────┐
│  Scraper API    │  - ScrapeSceneByURL()
│                 │  - ScrapePerformerByURL()
│                 │  - SearchScenesByName()
│                 │  - SearchPerformersByName()
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  HTTP Client    │  - Cookie jar for sessions
│                 │  - Age verification
│                 │  - Auth token support
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  XPath Parser   │  - Extract data from HTML
│                 │  - Parse dates, heights
│                 │  - Clean text content
└─────────────────┘
```

## Usage

### Authentication (Optional)

For full access to Adult Empire content, you can set an authentication token:

```go
scraper, err := adultemp.NewScraper()
if err != nil {
	log.Fatal(err)
}

// Optional: Set your Adult Empire session token
scraper.SetAuthToken("your-etoken-here")
```

**Getting your etoken:**

1. Log into adultdvdempire.com
2. Open browser DevTools (F12)
3. Go to Application → Cookies → adultdvdempire.com
4. Copy the value of the `etoken` cookie

### Scrape a Scene by URL

```go
ctx := context.Background()
sceneData, err := scraper.ScrapeSceneByURL(ctx, "https://www.adultdvdempire.com/12345/scene-name")
if err != nil {
	log.Fatal(err)
}

// Convert to Goondex model
scene := scraper.ConvertSceneToModel(sceneData)

// Save to database
// db.Scenes.Create(scene)
```

### Search for Scenes

```go
results, err := scraper.SearchScenesByName(ctx, "scene title")
if err != nil {
	log.Fatal(err)
}

for _, result := range results {
	fmt.Printf("Title: %s\n", result.Title)
	fmt.Printf("URL: %s\n", result.URL)
	fmt.Printf("Image: %s\n", result.Image)
}
```

### Scrape a Performer

```go
performerData, err := scraper.ScrapePerformerByURL(ctx, "https://www.adultdvdempire.com/performer/12345/name")
if err != nil {
	log.Fatal(err)
}

// Convert to Goondex model
performer := scraper.ConvertPerformerToModel(performerData)
```

### Search for Performers

```go
results, err := scraper.SearchPerformersByName(ctx, "performer name")
if err != nil {
	log.Fatal(err)
}

for _, result := range results {
	fmt.Printf("Name: %s\n", result.Title)
	fmt.Printf("URL: %s\n", result.URL)
}
```

## Data Structures

### SceneData

```go
type SceneData struct {
	Title       string   // Scene title
	URL         string   // Adult Empire URL
	Date        string   // Release date
	Studio      string   // Studio name
	Image       string   // Cover image URL
	Description string   // Synopsis/description
	Performers  []string // List of performer names
	Tags        []string // Categories/tags
	Code        string   // Scene code/SKU
	Director    string   // Director name
}
```

### PerformerData

```go
type PerformerData struct {
	Name         string   // Performer name
	URL          string   // Adult Empire URL
	Image        string   // Profile image URL
	Birthdate    string   // Date of birth
	Ethnicity    string   // Ethnicity
	Country      string   // Country of origin
	Height       string   // Height (converted to cm)
	Measurements string   // Body measurements
	HairColor    string   // Hair color
	EyeColor     string   // Eye color
	Biography    string   // Bio text
	Aliases      []string // Alternative names
}
```

## XPath Selectors

The scraper uses XPath to extract data from Adult Empire pages. Key selectors include:

### Scene Selectors

- **Title**: `//h1[@class='title']`
- **Date**: `//div[@class='release-date']/text()`
- **Studio**: `//a[contains(@href, '/studio/')]/text()`
- **Image**: `//div[@class='item-image']//img/@src`
- **Description**: `//div[@class='synopsis']`
- **Performers**: `//a[contains(@href, '/performer/')]/text()`
- **Tags**: `//a[contains(@href, '/category/')]/text()`

### Performer Selectors

- **Name**: `//h1[@class='performer-name']`
- **Image**: `//div[@class='performer-image']//img/@src`
- **Birthdate**: `//span[@class='birthdate']/text()`
- **Height**: `//span[@class='height']/text()`
- **Bio**: `//div[@class='bio']`

**Note**: Adult Empire may change their HTML structure. If scraping fails, XPath selectors in `scraper.go` may need updates.
## Utilities

### Date Parsing

```go
dateStr := ParseDate("Jan 15, 2024") // Handles various formats
```

### Height Conversion

```go
heightCm := ParseHeight("5'6\"") // Converts feet/inches to cm (168)
```

### Text Cleaning

```go
cleanedText := CleanText(rawHTML) // Removes "Show More/Less" and extra whitespace
```

### URL Normalization

```go
fullURL := ExtractURL("/path/to/scene", "https://www.adultdvdempire.com")
// Returns: "https://www.adultdvdempire.com/path/to/scene"
```

## Integration with Goondex

The Adult Empire scraper fits into the existing Goondex architecture:

1. **Scrape** data from Adult Empire using the scraper
2. **Convert** to Goondex models using converter functions
3. **Save** to the database using existing stores
4. **Display** in the web UI with cover art and metadata

### Example Workflow

```go
// 1. Search for a scene
results, err := scraper.SearchScenesByName(ctx, "scene name")
if err != nil {
	log.Fatal(err)
}
if len(results) == 0 {
	log.Fatal("no results found")
}

// 2. Pick the first result and scrape full details
sceneData, err := scraper.ScrapeSceneByURL(ctx, results[0].URL)
if err != nil {
	log.Fatal(err)
}

// 3. Convert to Goondex model
scene := scraper.ConvertSceneToModel(sceneData)

// 4. Save to database
sceneStore := db.NewSceneStore(database)
sceneStore.Create(scene)

// 5. Now it appears in the web UI!
```

## Future Enhancements

Planned improvements for the Adult Empire scraper:

- ⏳ **Bulk Import** - Import entire studios or series
- ⏳ **Auto-Update** - Periodically refresh metadata
- ⏳ **Image Caching** - Download and cache cover art locally
- ⏳ **Duplicate Detection** - Avoid importing the same scene twice
- ⏳ **Advanced Search** - Filter by studio, date range, tags
- ⏳ **Web UI Integration** - Search and import from the dashboard

## Troubleshooting

### "Failed to parse HTML"

- The Adult Empire page structure may have changed
- Update XPath selectors in `scraper.go`

### "Request failed: 403 Forbidden"

- You may need to set an auth token
- Adult Empire may be blocking automated requests
- Try setting a valid `etoken` cookie

### "No results found"

- Check that the search query is correct
- Adult Empire may spell the title or name differently
- Try broader search terms

### Scene/Performer data incomplete

- Some fields may not be present on all pages
- XPath selectors may need adjustment
- Check the raw HTML to verify field availability

## Comparison with TPDB Scraper

| Feature | TPDB | Adult Empire |
|---------|------|--------------|
| **API** | ✅ Official JSON API | ❌ HTML scraping |
| **Auth** | ✅ API key | ⚠️ Session cookie |
| **Rate Limits** | ✅ Documented | ⚠️ Unknown |
| **Stability** | ✅ Stable schema | ⚠️ May change |
| **Coverage** | ✅ Comprehensive | ✅ Comprehensive |
| **Images** | ✅ High quality | ✅ High quality |

**Recommendation**: Use TPDB as the primary source and Adult Empire as a fallback or supplemental source.

## Contributing

To improve Adult Empire scraping:

1. Update XPath selectors if Adult Empire changes their HTML
2. Add support for additional fields
3. Improve date/height parsing
4. Add more robust error handling

## Version History

- **v0.1.0-dev5** (2025-11-17): Documentation refresh for TPDB bulk-import release
  - Updated version metadata and changelog references
  - Clarified rebuild steps for the CLI additions
- **v0.1.0-dev4** (2025-11-16): Initial Adult Empire scraper implementation
  - HTTP client with cookie support
  - XPath parsing utilities
  - Scene and performer scraping
  - Search functionality
  - Model conversion utilities

---

**Last Updated**: 2025-11-17
**Maintainer**: Goondex Team