Constructor.io Releases

Announcement
3 years ago

Constructor Holiday Readiness Program: Ensuring peak performance during peak demand

Overview

Constructor’s conversion optimization and discovery benefits are only as good as our uptime and performance. For this reason, we maintain a robust process of performance validation and monitoring. During the holiday season and the peak demand period of Black Friday and Cyber Monday, we raise our standards in all of these areas, recognizing that this is the most important selling period for many of our customers.

Survey of peak demand for 2020

In planning our preparations for the 2021 holiday season, we looked back at daily and peak demand changes during the 2020 holiday season and reviewed how our baseline traffic has increased since then.

Last Black Friday our overall traffic increased 200% over daily baseline levels, and peaked at 500% of baseline. Not only did we maintain our 100% uptime, but performance during the peak demand periods actually improved relative to equivalent periods (due to changes in traffic patterns). Since then, average daily traffic has grown by over 383%, and the system has scaled to meet it without interruption or degradation.
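Percentages like these are easy to misread, so here is a minimal sketch of how they translate into traffic multipliers. It assumes the usual reading that "increased X% over baseline" means (1 + X/100) times baseline, while "X% of baseline" means X/100 times baseline. The function names are illustrative only and are not part of any Constructor tooling.

```python
def increase_to_multiplier(percent_increase: float) -> float:
    """An increase of X% over baseline is (1 + X/100) times baseline."""
    return 1 + percent_increase / 100


def percent_of_to_multiplier(percent_of_baseline: float) -> float:
    """X% of baseline is X/100 times baseline."""
    return percent_of_baseline / 100


# Figures quoted in the text, under the reading described above.
black_friday_overall = increase_to_multiplier(200)  # 3x daily baseline
black_friday_peak = percent_of_to_multiplier(500)   # 5x daily baseline
growth_since = increase_to_multiplier(383)          # roughly 4.83x
```

Under this reading, an overall increase of 200% corresponds to three times baseline traffic, and a peak at 500% of baseline corresponds to five times baseline.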

Performance improvements over the past year

Over the past year we have worked continuously to drive even better performance and scalability, contributing to improved latencies and zero downtime. Some example projects and outcomes include:

  • Optimized scaleout policies
  • Introduced stand-by server pools
  • Sped up instance boot and data download by ~300%
  • Increased performance of personalization service
  • Doubled performance of underlying search & browse servers
  • Decreased index update delivery times
  • Increased database read capacity

Scale-out performance testing

We have tested scaling to 2000% of current average daily traffic volume, while validating the continued performance of the following:

  • Database connections 
  • Monitoring infrastructure
  • Networking infrastructure
  • Response latencies at the median and the 90th, 95th, and 99th percentiles
  • Response latencies for each customer, and each product used by each customer
  • Data ingestion SLA times
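As an illustration of the latency checks listed above, the following is a minimal sketch of computing the median and the 90th, 95th, and 99th percentiles from a set of response-time samples. It uses Python's standard `statistics` module; the function name, the sample data, and the choice of the "inclusive" interpolation method are assumptions made for this example, not a description of Constructor's actual tooling.

```python
import statistics


def latency_summary(latencies_ms):
    """Return median, p90, p95 and p99 for a list of latency samples (ms)."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "p50": statistics.median(latencies_ms),
        "p90": cuts[89],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Synthetic samples (1 ms to 101 ms), purely for demonstration.
samples = [float(ms) for ms in range(1, 102)]
summary = latency_summary(samples)
print(summary)
```

Tail percentiles matter here because a healthy median can hide slow responses affecting a small share of requests, which is why the list above tracks several percentiles rather than a single average.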

Chaos and anti-fragility testing

We also use chaos testing to validate that catastrophic failure of supporting infrastructure does not impact critical features (primarily request/response times for search, autosuggest, browse, recommendations, and collections). The failure scenarios we test include:

  • Disabled MySQL
  • Disabled index builders
  • Disabled personalization queues
  • Disabled supplemental ranking engines
  • Availability zone and data center failures

All of the above is in addition to the rigorous performance test and rollout plan we use for every release:

  • Full test suite on every pull request (incremental code change).
  • Production traffic replay for all deployment builds (multiple times a week).
  • Rolling, risk-adjusted deployment procedures across worldwide data centers.
  • Canary deployments for changes touching the critical-path request/response lifecycle.
  • Automatic build failures if sensitive thresholds on result quality, latency, memory consumption, CPU consumption and more are breached at any of these levels.
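The automatic build failure described in the last item can be sketched as a simple threshold gate. This is a hypothetical example: the metric names, the limits, and the `gate` function are invented for illustration and do not reflect Constructor's real thresholds or pipeline. Result quality is omitted because it would need a lower bound rather than an upper one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    max_value: float


# Hypothetical limits, chosen only for this example.
THRESHOLDS = [
    Threshold("p99_latency_ms", 250.0),
    Threshold("peak_memory_mb", 2048.0),
    Threshold("cpu_utilization_pct", 80.0),
]


def find_breaches(measured, thresholds):
    """Return a message for each metric that is missing or exceeds its limit."""
    breaches = []
    for t in thresholds:
        value = measured.get(t.metric)
        if value is None:
            # Treat a missing measurement as a failure rather than a pass.
            breaches.append(f"{t.metric}: no measurement")
        elif value > t.max_value:
            breaches.append(f"{t.metric}: {value} > {t.max_value}")
    return breaches


def gate(measured):
    """Return a process exit code: 0 to pass the build, 1 to fail it."""
    breaches = find_breaches(measured, THRESHOLDS)
    for message in breaches:
        print("THRESHOLD BREACHED:", message)
    return 1 if breaches else 0
```

A build step could call `sys.exit(gate(measurements))` so that any breach produces a non-zero exit code and stops the release. Treating a missing measurement as a breach is a deliberate design choice in this sketch: a gate that passes when data is absent would fail silently.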

Standard on-call procedures

At all times we have multiple on-call schedules for the following teams:

  • Front-end and client teams
  • Data science and result quality teams
  • Core platform and response performance teams

Each of these teams has multiple fallbacks and tiered escalation policies.

Automated alerting

Alerting is automated across dozens of metrics to ensure we are aware of incidents within seconds. A few representative examples:

  • Queuing times
  • Per-service latencies
  • Memory and CPU consumption

Special holiday on-call procedures

In addition, we take special precautions during peak holiday shopping periods:

  • We over-provision all infrastructure above and beyond our typical scale-out policy.
  • We double the on-call rotation, using the above-mentioned automatic notification and escalation policies.
  • The entire account and product team monitors throughout the Black Friday / Cyber Monday period, with elevated focus during other holiday periods (such as Boxing Day).

Conclusion

At Constructor, we take uptime, performance, and service stability very seriously because the best conversion optimization and ML are moot if we don’t deliver fast and stable service consistently. The goal of this document is to provide our customers with a broad overview of our site reliability practices, as well as a specific view of our holiday readiness procedures. As always, please feel free to reach out to your Customer Success Manager if you have any further questions.

Arthur Etchells