WordPress Media Copy Functionality

This document describes the mechanisms for copying images and slideshows between WordPress instances in the Aleteia network. The system automatically synchronizes media assets across multiple locales (en, es, fr, it, pt, pl, si, ar) to ensure content consistency.

Architecture Overview

The media copy system consists of two primary entry points (webhooks) that trigger background jobs to process and replicate media across WordPress sites.

Key Components

Component | Type | Purpose
ScrapeController#do | Webhook | Handles post creation/update events
WpMediaController#media_added | Webhook | Handles new media uploads
SchedulePostAttachmentJob | Job | Processes post attachments
ProcessWordpressPictureJob | Job | Retrieves and stores attachment info
CopyWordpressPictureJob | Job | Copies master pictures to other sites
SaveMasterPictureMetadataJob | Job | Saves supplier and author metadata
WordpressPicture | Model | Stores picture information and relationships
WordpressAuthor | Model | Stores author information
WordpressAPI | Service | Interfaces with WordPress XML-RPC and REST APIs

Webhooks

ScrapeController#do

This webhook is called when a post is created or updated in any WordPress instance. It enqueues several background jobs, including media attachment processing.

Endpoint: POST /scrape

Parameters:

  • post_url - The URL of the created/updated post
  • ID - The WordPress post ID

Behavior:

  1. Validates the incoming URL (skips staging/frontity URLs)
  2. Enqueues FacebookScrapeJob for social media scraping
  3. Enqueues SchedulePostAttachmentJob for attachment processing
  4. Schedules Algolia indexing via AlgoliaIndexJob
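
The behavior above can be sketched in plain Ruby. This is a minimal illustration, not the controller's actual code: the marker list, `scrapable_url?`, and `jobs_for_scrape` are hypothetical names, and matching "staging" or "frontity" against the host is an assumption about how the skip rule works.

```ruby
require 'uri'

# Hypothetical marker list; the source only says staging/frontity URLs are skipped.
SKIPPED_HOST_MARKERS = %w[staging frontity].freeze

# True when the URL parses as HTTP(S) and its host contains no skipped marker.
def scrapable_url?(post_url)
  uri = URI.parse(post_url.to_s)
  return false unless uri.is_a?(URI::HTTP) && uri.host

  SKIPPED_HOST_MARKERS.none? { |marker| uri.host.include?(marker) }
rescue URI::InvalidURIError
  false
end

# Names of the jobs the webhook would enqueue, in the documented order.
# A real controller would call the job classes instead of returning names.
def jobs_for_scrape(post_url)
  return [] unless scrapable_url?(post_url)

  %w[FacebookScrapeJob SchedulePostAttachmentJob AlgoliaIndexJob]
end
```

Returning job names instead of enqueuing keeps the sketch runnable without a queue backend.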

WpMediaController#media_added

This webhook is called when new media is uploaded to WordPress. It uses JWT authentication for security.

Endpoint: POST /wp_media/media_added

Authentication: JWT Bearer token

JWT Payload:

  • attachment_id - The WordPress attachment ID
  • locale - The site locale

Behavior:

  1. Validates JWT token
  2. Enqueues ProcessWordpressPictureJob with immediate copy flag
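
As an illustration of the token check, the sketch below verifies a JWT using only the Ruby standard library. It assumes an HS256 (HMAC-SHA256) signature and a shared secret, neither of which the source specifies. All method names are hypothetical, and `encode_jwt` stands in for the WordPress side. Claims such as expiry are not checked here.

```ruby
require 'openssl'
require 'json'

# Base64url without padding, as used for JWT segments.
def b64url_encode(data)
  [data].pack('m0').tr('+/', '-_').delete('=')
end

def b64url_decode(segment)
  padded = segment.tr('-_', '+/') + '=' * ((4 - segment.length % 4) % 4)
  padded.unpack1('m')
end

def hs256(signing_input, secret)
  OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA256'), secret, signing_input)
end

# Builds a signed token; stands in for the sender (assumed HS256).
def encode_jwt(payload, secret)
  head = b64url_encode(JSON.generate({ 'alg' => 'HS256', 'typ' => 'JWT' }))
  body = b64url_encode(JSON.generate(payload))
  signing_input = "#{head}.#{body}"
  "#{signing_input}.#{b64url_encode(hs256(signing_input, secret))}"
end

# Returns the attachment_id/locale claims, or nil when the token is invalid.
def decode_media_added_token(token, secret)
  head, body, signature = token.to_s.split('.')
  return nil unless head && body && signature

  expected = b64url_encode(hs256("#{head}.#{body}", secret))
  return nil unless OpenSSL.secure_compare(expected, signature)

  claims = JSON.parse(b64url_decode(body))
  return nil unless claims.is_a?(Hash) && claims.key?('attachment_id') && claims.key?('locale')

  claims.slice('attachment_id', 'locale')
rescue JSON::ParserError
  nil
end
```

A production service would more likely rely on an established JWT library than on hand-rolled verification.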

Background Jobs

ProcessWordpressPictureJob

Retrieves attachment information from WordPress and creates or updates the local WordpressPicture record.

Queue: wordpress_pictures

Parameters:

  • id - WordPress attachment ID
  • locale - Site locale
  • check_copy_flag - Whether to check for immediate copy flag (default: false)

Behavior:

  1. If check_copy_flag is true and the attachment doesn't have the "copy now" flag, reschedules for 1 hour later
  2. Retrieves attachment data and custom fields from WordPress API
  3. Creates or updates WordpressPicture record via from_remote
  4. Schedules SaveMasterPictureMetadataJob for metadata processing
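
Step 1 amounts to a small decision function, sketched here. The field name comes from the Custom Fields table in this document. The shape of `custom_fields` (an array of hashes with 'key' and 'value') and the rule that an empty or "0" value means "not set" are assumptions, and `copy_now?` and `next_step` are hypothetical names.

```ruby
CF_COPY_NOW = 'aleteia_media_copy_now'
RESCHEDULE_DELAY_SECONDS = 60 * 60 # "1 hour later" in the behavior list

# custom_fields: array of { 'key' => ..., 'value' => ... } hashes (assumed shape).
def copy_now?(custom_fields)
  custom_fields.any? do |field|
    value = field['value'].to_s
    field['key'] == CF_COPY_NOW && !value.empty? && value != '0'
  end
end

# Decides whether the job should process the attachment now or retry later.
def next_step(custom_fields, check_copy_flag: false)
  if check_copy_flag && !copy_now?(custom_fields)
    { action: :reschedule, wait: RESCHEDULE_DELAY_SECONDS }
  else
    { action: :process }
  end
end
```

For example, `next_step([], check_copy_flag: true)` yields a reschedule with a 3600-second wait, while the same call with `check_copy_flag: false` proceeds to processing.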

CopyWordpressPictureJob

Copies a master picture to other WordPress sites. Only master pictures (those without a parent) can be copied.

Queue: wordpress_pictures

Parameters:

  • picture - The WordpressPicture instance to copy
  • locale - Target locale (optional; if nil, copies to all active sites)

Behavior:

  1. Validates the picture is a master picture
  2. If no locale specified, schedules individual jobs for each active site
  3. If locale specified, uploads the image to the target site via WordPress API
  4. Creates child WordpressPicture record linked to the master
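
The fan-out in steps 1 to 3 might look roughly like the following. `Picture` is a plain Struct standing in for the WordpressPicture model, `plan_copy` is a hypothetical name, and the site list mirrors ACTIVE_SITES as given in this document. The source does not say whether the master's own locale is excluded from the fan-out, so this sketch includes every active site.

```ruby
ACTIVE_SITES = {
  en: 'aleteia.org',
  es: 'es.aleteia.org',
  fr: 'fr.aleteia.org',
  pl: 'pl.aleteia.org',
  pt: 'pt.aleteia.org',
  si: 'si.aleteia.org'
}.freeze

# Stand-in for the WordpressPicture model (hypothetical, in-memory only).
Picture = Struct.new(:attachment_id, :locale, :parent, keyword_init: true)

# Returns the work the job would do: one :enqueue entry per active site when
# no locale is given, or a single :upload entry for the requested locale.
def plan_copy(picture, locale = nil)
  raise ArgumentError, 'only master pictures can be copied' unless picture.parent.nil?

  if locale.nil?
    ACTIVE_SITES.keys.map { |site| [:enqueue, picture.attachment_id, site] }
  else
    [[:upload, picture.attachment_id, locale]]
  end
end
```

Splitting the work into one job per locale means a failure on one site does not block copies to the others.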

SchedulePostAttachmentJob

Processes post attachments when posts are created or updated.

Queue: default

Parameters:

  • post_url - The WordPress post URL
  • post_id - The WordPress post ID

Behavior:

  1. Extracts locale from URL
  2. Retrieves post data from WordPress API
  3. If the post has a thumbnail attachment, looks it up in the local database
  4. Schedules SaveMasterPictureMetadataJob if the picture exists locally
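
Step 1 could be implemented by matching the URL host against the site list, as in this sketch. The host-to-locale mapping mirrors ACTIVE_SITES as given in this document. `locale_for` is a hypothetical helper, and the real job may derive the locale differently.

```ruby
require 'uri'

ACTIVE_SITES = {
  en: 'aleteia.org',
  es: 'es.aleteia.org',
  fr: 'fr.aleteia.org',
  pl: 'pl.aleteia.org',
  pt: 'pt.aleteia.org',
  si: 'si.aleteia.org'
}.freeze

# Returns the locale symbol whose site host matches the URL, or nil if none does.
def locale_for(post_url)
  host = URI.parse(post_url.to_s).host
  ACTIVE_SITES.key(host)
rescue URI::InvalidURIError
  nil
end
```

With this mapping, a URL on es.aleteia.org resolves to `:es`, and a URL on an unlisted host resolves to `nil`.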

SaveMasterPictureMetadataJob

Retrieves and saves metadata for pictures including supplier information and author data.

Queue: wordpress_pictures

Parameters:

  • picture_or_start_id - Either a WordpressPicture instance or a starting ID for batch processing

Behavior:

  • For individual pictures: fetches supplier, author, and publication date from WordPress
  • For batch processing: iterates through pictures and schedules individual jobs

Data Models

WordpressPicture

Stores information about WordPress media attachments and their relationships across locales.

Key Attributes:

  • attachment_id - WordPress attachment ID
  • locale - Site locale
  • file_hash - SHA256 hash of the image content
  • url - Image URL
  • title, caption, description - Image metadata
  • supplier - Photo supplier (enum)
  • published - Whether the image is attached to a published post
  • published_at - Publication timestamp

Relationships:

  • parent - Master picture (for copied images)
  • children - Copied pictures in other locales
  • wordpress_author - The author who uploaded the image

Scopes:

  • master - Pictures without a parent (original uploads)
  • derivatives - Pictures that are copies of master pictures
  • published - Pictures attached to published posts
  • photo_team - Pictures uploaded by the photo team
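
The parent/child relationship and the master and derivatives scopes can be illustrated with an in-memory stand-in. This is not the model's code: `Picture` is a plain Struct, and the three helper methods are hypothetical equivalents of the scopes and the children relationship described above.

```ruby
# In-memory stand-in for WordpressPicture (hypothetical).
Picture = Struct.new(:attachment_id, :locale, :parent, :published, keyword_init: true) do
  # A master picture is an original upload, so it has no parent.
  def master?
    parent.nil?
  end
end

# Equivalent of the master scope.
def masters(pictures)
  pictures.select(&:master?)
end

# Equivalent of the derivatives scope.
def derivatives(pictures)
  pictures.reject(&:master?)
end

# Equivalent of the children relationship for one master.
def children_of(master, pictures)
  pictures.select { |picture| picture.parent.equal?(master) }
end
```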

WordpressAuthor

Stores information about WordPress users who upload media.

Key Attributes:

  • user_id - WordPress user ID
  • locale - Site locale
  • user_login - WordPress username
  • display_name - Display name
  • photo_team - Whether the user is part of the photo team

Complete Picture Copy Flow

The following diagram shows the complete flow from a new image upload to its replication across all sites:

Configuration

Environment Variables

Variable | Description
WORDPRESS_USERNAME | WordPress API username
WORDPRESS_PASSWORD | WordPress API password
WORDPRESS_USE_SSL | Enable SSL for WordPress connections

Active Sites

The system copies media to all active sites defined in WordpressAPI::ACTIVE_SITES:

ACTIVE_SITES = {
  en: 'aleteia.org',
  es: 'es.aleteia.org',
  fr: 'fr.aleteia.org',
  pl: 'pl.aleteia.org',
  pt: 'pt.aleteia.org',
  si: 'si.aleteia.org',
}

Note: The ar and it sites are not in the list above and may be disabled from active synchronization.

Custom Fields

The system uses several WordPress custom fields to track media:

Field | Constant | Purpose
aleteia_media_copyright | CF_COPYRIGHT | Copyright notice
aleteia_media_copyright_link | CF_COPYRIGHT_LINK | Link to copyright source
aleteia_media_source | CF_IMAGE_SUPPLIER | Photo supplier
aleteia_media_image_source | CF_IMAGE_SOURCE | Tracks copied image origin
aleteia_media_original_uploader | CF_IMAGE_UPLOADER | Original uploader
aleteia_media_copy_now | CF_COPY_NOW | Flag for immediate copy
_wp_attachment_image_alt | CF_ALT_TEXT | Alternative text

Error Handling

The jobs implement various error handling mechanisms:

  • RestClient::NotFound: Picture deleted in WordPress - logged and skipped
  • XMLRPC::FaultException (404): Attachment not found - skipped
  • RestClient::NotAcceptable: VIP considers file dangerous - skipped
  • Duplicate hash: If WordPress modifies the uploaded file so that its hash changes, a random hash is assigned

Duplicate Detection

The system uses SHA256 file hashes to detect duplicate images:

  1. Before copying, checks if a picture with the same file_hash exists in the target locale
  2. If found, skips the copy operation
  3. If WordPress changes the hash during upload, assigns a random hash to allow storage
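
The first two steps can be sketched with the standard library's SHA256 digest. The record shape (hashes with :locale and :file_hash keys) and the method names are assumptions for illustration; the real check would query WordpressPicture records.

```ruby
require 'digest'

# SHA256 hex digest of the raw image bytes, matching the file_hash attribute.
def file_hash(bytes)
  Digest::SHA256.hexdigest(bytes)
end

# True when a picture with the same hash already exists in the target locale.
def duplicate_in_locale?(existing, hash, locale)
  existing.any? { |record| record[:locale] == locale && record[:file_hash] == hash }
end

# Returns :skip when the copy would be a duplicate, :copy otherwise.
def copy_decision(existing, bytes, locale)
  duplicate_in_locale?(existing, file_hash(bytes), locale) ? :skip : :copy
end
```

Scoping the comparison to the target locale allows the same image to exist once per site, which is what the master/child relationship requires.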