Analytics

Sync Orders

Sending Order Information to Klevu

Order analytics are sent to the analytics collect service; this is the same endpoint used for client-side analytics in earlier versions of the module, and replaces the product tracking service previously used for server-side processing.

When a sync is initiated, all eligible orders are retrieved and passed through an ETL pipeline which converts and sends the data to Klevu’s analytics services. Unsuccessful orders are re-queued to be retried on future pipeline executions.

Interfaces

Sync can be initiated via two methods, both of which are interfaces to the same pipeline process.

Automatically via Cron

As with all cron tasks added by version 4.x of the Klevu integration, order sync cron tasks use the dedicated klevu group, allowing granular configuration of scheduling and run options.

We recommend using the native Magento cron to run these jobs, configuring schedules if necessary, rather than invoking the CLI command via a custom server-side cron implementation.

The klevu_analytics_sync_orders cron job runs every 5 minutes by default, though this is customisable via the global config setting Klevu > Data Sync > Order Analytics > Order Sync Frequency. To ensure that orders can be reliably associated with the user session which triggered them, we recommend running the job at least once every half hour.

Each time the cron job runs, it will attempt to process all queued orders (including retries) from sync enabled stores * across your installation. Orders are processed from oldest to most recent, regardless of whether they are newly queued or awaiting retry, and regardless of the store on which the order was placed.

Manually via Command Line

Order sync can also be triggered via the klevu:analytics:sync-orders CLI command. Executing via the terminal provides more control over the orders to be synced, allowing you to define a list of order ids and/or store ids to handle. Note that even if order ids are provided, only queued orders (including retries) will be processed - to resend a synced or stuck order, you must manually requeue it before processing.

By default, only sync enabled stores * are processed, even where store ids are explicitly passed; however, this check can be bypassed with the --ignore-sync-enabled-flag option.

Another subtle difference from the cron process is that the console command will attempt to process all failed orders (ie, those with a retry status) first, before moving on to new orders.
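
As a rough illustration of that ordering, the sketch below sorts a hypothetical set of queued order records so that retry-status records come first, followed by the remaining queue from oldest to newest. The array shape is purely illustrative and is not the module’s internal representation.

    <?php
    // Hypothetical queue records; keys are illustrative only.
    $queuedOrders = [
        ['order_id' => 101, 'status' => 'queued', 'created_at' => '2024-01-02 10:00:00'],
        ['order_id' => 99,  'status' => 'retry',  'created_at' => '2024-01-01 09:00:00'],
        ['order_id' => 100, 'status' => 'queued', 'created_at' => '2024-01-01 12:00:00'],
    ];

    // Console ordering described above: retry-status records first, then the
    // remaining queued records from oldest to most recent.
    usort($queuedOrders, static function (array $a, array $b): int {
        $retryFirst = (int)($b['status'] === 'retry') <=> (int)($a['status'] === 'retry');

        return $retryFirst ?: strcmp($a['created_at'], $b['created_at']);
    });

    print_r(array_column($queuedOrders, 'order_id')); // 99, 100, 101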

* The extension defines a “sync enabled store” by the following criteria (illustrated in the sketch after this list):

  • The store is active in Magento
  • The store is integrated with Klevu (ie, API keys have been saved for the store view)
  • The config setting Klevu > Data Sync > Order Analytics > Order Sync Enabled is set to “Enable” for this store
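
For illustration, a minimal sketch of those three checks is shown below. The array keys are hypothetical stand-ins for the store’s active flag, its saved Klevu API keys, and the Order Sync Enabled setting; they are not the module’s actual configuration paths.

    <?php
    function isSyncEnabledStore(array $store): bool
    {
        return ($store['is_active'] ?? false)               // active in Magento
            && ($store['klevu_js_api_key'] ?? '') !== ''    // integrated: API keys saved
            && ($store['klevu_rest_auth_key'] ?? '') !== ''
            && ($store['order_sync_enabled'] ?? false);     // Order Sync Enabled = "Enable"
    }

    var_dump(isSyncEnabledStore([
        'is_active' => true,
        'klevu_js_api_key' => 'klevu-1234567890',
        'klevu_rest_auth_key' => 'ABCDE12345',
        'order_sync_enabled' => true,
    ])); // bool(true)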

Process

Execution of order sync is handled by a virtualType of the ProcessEvents service from the general Analytics module. This virtualType (defined as Klevu\AnalyticsOrderSync\Service\ProcessOrderEvents) uses Order Analytics services to provide context and payload data to the pipeline, as well as defining a dedicated sync_orders.yml pipeline configuration file. See klevu/php-pipelines documentation for more information on defining pipeline objects.

The initial pipeline payload is provided by \Klevu\AnalyticsOrderSync\Service\Provider\OrderSyncEventsDataProvider::get(), a Generator method yielding Magento Order objects based upon the search criteria provided by the calling interface (see above). In addition to order ID; store ID; and sync status filtering, if a value has been added to Klevu > Data Sync > Order Analytics > Exclude Statuses From Sync, that restriction will be applied at this stage. This setting can be defined at a store scope.
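
A simplified sketch of this kind of Generator-based provider is shown below. It operates on plain arrays with illustrative keys; the real provider yields Magento order objects matched against the calling interface’s search criteria.

    <?php
    // Simplified payload provider: yields each order that is not excluded by
    // the (illustrative) list of excluded order statuses.
    function provideOrderSyncEvents(array $orders, array $excludedStatuses): \Generator
    {
        foreach ($orders as $order) {
            if (in_array($order['status'], $excludedStatuses, true)) {
                continue; // honours Exclude Statuses From Sync
            }

            yield $order;
        }
    }

    $orders = [
        ['entity_id' => 1, 'status' => 'complete'],
        ['entity_id' => 2, 'status' => 'fraud'],
    ];
    foreach (provideOrderSyncEvents($orders, ['fraud']) as $order) {
        echo $order['entity_id'], PHP_EOL; // outputs only 1
    }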

Each order is sent in an individual API request to Klevu because the user profile information, which contains the customer’s email and IP address, is defined at an event level and not at an item level. Ref: Search Events - Storefront APIs

While each order is processed and transmitted to Klevu individually, records are still batched for processing in the interface, with the batch size controlled by the Klevu > Developer Settings > Order Analytics > Order Sync Max Batch Size configuration value (known issue in beta-1). By default this value is 250. Batching records here mitigates performance concerns with large queues, while also ensuring that later records are less likely to become outdated during long processing runs.
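
The effect of batching can be pictured with the short sketch below, where the batch size stands in for the Order Sync Max Batch Size value; the real batching happens in the interface code rather than in a loop like this.

    <?php
    $maxBatchSize = 250;             // stands in for Order Sync Max Batch Size
    $queuedOrderIds = range(1, 620); // pretend 620 orders are currently queued

    foreach (array_chunk($queuedOrderIds, $maxBatchSize) as $index => $batch) {
        // Each batch is handed to the pipeline separately; orders within a
        // batch are still sent to Klevu in individual API requests.
        printf("Batch %d: %d orders\n", $index + 1, count($batch));
    }
    // Batch 1: 250 orders; Batch 2: 250 orders; Batch 3: 120 orders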

The order batch is handed off to the analytics pipeline (as defined at etc/pipeline/sync_orders.yml), where each order undergoes a series of transformations before being sent to Klevu’s analytics endpoints.

Some key stages are outlined below

Receive Payload

A collection of orders is passed to the pipeline

Iterate Payload

The orders are looped through, allowing each order to be processed individually.

Mark Order As Processing

The order record in klevu_sync_order is updated to have a status of “processing” and its attempts value (the number of times the order has passed through the pipeline) is incremented. This action locks the order, preventing it from being processed again if a second sync is run in parallel.
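
Conceptually, this stage behaves like the sketch below, which assumes a PDO connection and illustrative column names (sync_status, attempts, order_id); the module performs the update through Magento’s resource models rather than raw SQL.

    <?php
    function markOrderAsProcessing(\PDO $db, int $orderId): bool
    {
        $statement = $db->prepare(
            'UPDATE klevu_sync_order
                SET sync_status = :new_status,
                    attempts = attempts + 1
              WHERE order_id = :order_id
                AND sync_status != :current_status'
        );
        $statement->execute([
            ':new_status' => 'processing',
            ':current_status' => 'processing',
            ':order_id' => $orderId,
        ]);

        // A parallel sync matches zero rows here, which is how the record is
        // effectively locked against being processed twice.
        return $statement->rowCount() === 1;
    }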

Inject Line Items For Grouped Products

Magento stores grouped product purchases as individual line items but, unlike configurable products, does not include a consolidated parent line item. As grouped products are indexed as single entities, we inject a spoof item into the payload for any grouped products purchased in the order.
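
The sketch below illustrates the idea using hypothetical array keys; it is not the module’s real data structure. For each grouped-product parent referenced by the order lines, one synthetic parent line is appended.

    <?php
    function injectGroupedParentLines(array $lineItems): array
    {
        $result = $lineItems;
        $seenParents = [];

        foreach ($lineItems as $item) {
            $parentId = $item['grouped_parent_id'] ?? null;
            if (null === $parentId || isset($seenParents[$parentId])) {
                continue;
            }
            $seenParents[$parentId] = true;

            // Synthetic ("spoof") line representing the grouped parent, which
            // matches how Klevu indexes grouped products as a single record.
            $result[] = [
                'item_id' => null,
                'product_id' => $parentId,
                'product_type' => 'grouped',
            ];
        }

        return $result;
    }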

Filter Invalid Order Line Items

Configurable products (the consolidated parent item) are removed, as Klevu indexes variants as individual records. Child items of Grouped and Bundle products are removed, as Klevu indexes these product types as a single record.
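
A minimal sketch of that filtering rule, again using illustrative keys and type codes:

    <?php
    function filterInvalidOrderLineItems(array $lineItems): array
    {
        return array_values(array_filter(
            $lineItems,
            static function (array $item): bool {
                // Drop consolidated configurable parent rows (variants are sent instead)
                if (($item['product_type'] ?? '') === 'configurable') {
                    return false;
                }

                // Drop children of grouped/bundle parents (sent as a single record)
                return !in_array($item['parent_product_type'] ?? null, ['grouped', 'bundle'], true);
            }
        ));
    }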

Iterate Line Items

The order lines are looped through, allowing each item to be processed individually.

Convert to Analytics Collect Event Object Ready For Send

The converted order lines array is injected into an Event object (ref klevu/php-sdk) along with the Klevu API Key; customer information; event type; and event version.
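
The shape of the resulting event can be pictured roughly as below. The field names are illustrative only; the real object is constructed through the klevu/php-sdk Event classes and may differ.

    <?php
    // Illustrative event shape only; field names may differ from the SDK's.
    $event = [
        'event' => 'order_purchase',          // event type
        'event_apikey' => 'klevu-1234567890', // store's Klevu JS API key
        'event_version' => '1.0.0',           // event version
        'user_profile' => [                   // defined per event, not per item
            'email' => 'customer@example.com',
            'ip_address' => '203.0.113.10',
        ],
        'event_data' => [
            'items' => [                      // the converted order lines array
                [
                    'order_id' => '100000123',
                    'order_line_id' => '456',
                    'item_id' => '789',
                    'unit_price' => 19.99,
                    'currency' => 'USD',
                    'units' => 2,
                ],
            ],
        ],
    ];

    echo json_encode($event, JSON_PRETTY_PRINT), PHP_EOL;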

Send Analytics Event Data to Klevu API

The consolidated data is sent as JSON to the Klevu analytics/collect endpoint, and the response is received and passed along the pipeline.
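
A bare-bones sketch of the HTTP call is shown below; in the module this is handled by the klevu/php-sdk, and the endpoint URL used here is a placeholder rather than the real service address.

    <?php
    $collectEndpoint = 'https://analytics.example.invalid/analytics/collect'; // placeholder URL
    $payload = json_encode(['event' => 'order_purchase'], JSON_THROW_ON_ERROR); // e.g. the event built above

    $curl = curl_init($collectEndpoint);
    curl_setopt_array($curl, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $payload,
        CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $responseBody = curl_exec($curl);
    $statusCode = (int)curl_getinfo($curl, CURLINFO_RESPONSE_CODE);
    curl_close($curl);

    // The status code and body are what the next pipeline stage uses to decide
    // whether the order is marked complete, retried, or errored.
    var_dump($statusCode, $responseBody);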

Mark Order As Processed Or Requeue

Based on the API response from the previous stage, the record in klevu_sync_order is updated to a “complete” status on success. On error, it is moved to “retry” if it has not yet passed the maximum retries threshold (see below), or to “error” otherwise.

Once all records in the payload have been processed, the pipeline returns an object containing the result of each order processed.

Handling Failures

Order records may not pass through the pipeline successfully for a variety of reasons, including invalid source data; system exceptions during transformation; or errors communicating with Klevu’s APIs. In all but exceptional circumstances, individual order failures will not stop the pipeline continuing with subsequent records in the payload.

Records which fail in a predictable way will be marked for retry (a status which is treated as queued when payloads are generated) and picked up the next time sync runs. Records which fail after sync has been attempted the maximum number of times (as defined by the Klevu > Developer Settings > Order Analytics > Order Record Max Sync Attempts config value) will instead be set to an “error” status and will not be picked up again unless manually re-queued.
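
The requeue decision can be summarised as in the sketch below, where the status strings and function shape are illustrative and the maximum corresponds to the Order Record Max Sync Attempts value.

    <?php
    function resolveFailedOrderStatus(int $attempts, int $maxSyncAttempts): string
    {
        // "retry" is treated as queued when the next payload is generated;
        // "error" records are only picked up again if manually re-queued.
        return ($attempts < $maxSyncAttempts) ? 'retry' : 'error';
    }

    echo resolveFailedOrderStatus(2, 5), PHP_EOL; // retry
    echo resolveFailedOrderStatus(5, 5), PHP_EOL; // error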

It is possible to re-queue orders which have exceeded the maximum sync attempts using the klevu:analytics:queue-orders-for-sync CLI command. The attempts counter will continue to increment for such records.

“Stuck” Records

While a record is being processed in the pipeline, the order’s sync status is set to “processing”. This informs both administrators and the system that the record is currently undergoing transformation and sync.

If the pipeline exits unexpectedly before the record finishes processing and has its status updated a second time, the record would never be picked up for retry on its own. While these orders can be re-queued manually using the klevu:analytics:queue-orders-for-sync CLI command, they will also be automatically moved into a “retry” status after a 15 minute threshold (configurable using the Klevu > Developer Settings > Order Analytics > Consider Processing Orders Stuck After (minutes) setting).

This action is performed by the klevu_analytics_requeue_stuck_orders cron task, which is scheduled to run on the hour. This means that records may be stuck for up to 75 minutes, depending on the last processing time, before being requeued.
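
Conceptually, the requeue task performs something like the sketch below, assuming a PDO connection and illustrative column names; the threshold maps to the Consider Processing Orders Stuck After (minutes) setting.

    <?php
    function requeueStuckOrders(\PDO $db, int $thresholdMinutes = 15): int
    {
        $cutoff = (new \DateTimeImmutable(sprintf('-%d minutes', $thresholdMinutes)))
            ->format('Y-m-d H:i:s');

        $statement = $db->prepare(
            'UPDATE klevu_sync_order
                SET sync_status = :retry
              WHERE sync_status = :processing
                AND last_updated_at < :cutoff'
        );
        $statement->execute([
            ':retry' => 'retry',
            ':processing' => 'processing',
            ':cutoff' => $cutoff,
        ]);

        return $statement->rowCount(); // number of stuck records moved to retry
    }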

Extracting Order IP Address

Magento may store multiple IP addresses against a single order, either within the same field or in different fields, depending on what data is available (for example, when a WAF or CDN is used).

The pipeline stage processOrders.iterateOrders.processOrder.execute.extractEventData.createUserProfile.createRecord.ip_address uses a dynamic extractor set within the Context object, which uses the Klevu > Developer Settings > Order Analytics > IP Address Field for Order Data configuration setting to toggle between the remote_ip and x_forwarded_for fields.

In cases where multiple IPs are saved within the same field, the pipeline will use the first one found in a comma-separated list.
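
An illustrative extractor for this behaviour is sketched below; the field names match the two order fields mentioned above, while the function itself is hypothetical.

    <?php
    function extractOrderIpAddress(array $orderData, string $configuredField = 'remote_ip'): ?string
    {
        // Read the configured field (remote_ip or x_forwarded_for) and, when it
        // holds a comma-separated list, use the first IP found.
        $rawValue = trim((string)($orderData[$configuredField] ?? ''));
        if ('' === $rawValue) {
            return null;
        }

        $firstIp = trim(explode(',', $rawValue)[0]);

        return '' !== $firstIp ? $firstIp : null;
    }

    echo extractOrderIpAddress(
        ['x_forwarded_for' => '203.0.113.10, 198.51.100.7'],
        'x_forwarded_for'
    ), PHP_EOL; // 203.0.113.10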

Duplicate Records

A concern with automatically re-queueing orders in a processing status is that duplicate data may be sent to Klevu, affecting conversion rate information.

While it is possible for the same order to be submitted multiple times (whether through stuck record re-queueing, or manual sync), the Klevu services receiving this data will filter duplicate records based upon the unique combination of Magento order and order line ids.