Customer stories

How Attentive ensures flawless message delivery during peak campaigns with Gatling Enterprise Edition

About the company

Attentive is a leader in conversational commerce, helping more than 8,000 brands deliver personalized experiences through SMS, email, and mobile messaging. The company’s platform powers billions of messages every month, driving measurable revenue for retail, e-commerce, and lifestyle brands.

Every millisecond matters. When major campaigns go live, millions of notifications must reach users in real time. Reliability and performance directly impact customer trust and business outcomes.

Statistics

Industry: Software
Location: United States
Revenue: $300M+
Employees: 1,000+

Key metrics
Billions of notifications per month across 8,000+ clients
100k RPS per service
160k RPS per node
0 errors under load

Gatling Enterprise users
20+ engineers across Developer Experience, SRE, and service teams

Why performance matters at Attentive

Black Friday and Cyber Monday are the biggest events of the year for Attentive. During that period, message traffic can surge by more than ten times. Every API call for segmentation, orchestration, or delivery must remain responsive under extreme concurrency.

“When you’re sending billions of messages, you can’t afford uncertainty. We prepare our backend services for BFCM with realistic, high-scale load tests.”

— Bian Jiang, Principal Engineer, Developer Experience

Beyond the traffic spikes, Attentive must ensure that queues, pipelines, and external dependencies behave correctly under load. Many issues—like backlog buildup or gateway saturation—only appear during real traffic surges. Testing under production-like conditions is therefore essential.

Because developers at Attentive own both product code and test cases, the company needed a developer-centric load testing solution that fit naturally into their development workflow. Gatling’s test-as-code model matched that need perfectly.

Why they chose Gatling Enterprise Edition and how they are using it today

Before adopting Gatling Enterprise Edition, Attentive already used Gatling Community Edition, alongside tools such as JMeter and ghz. While powerful, these tools required Attentive to build and maintain its own coordination infrastructure—provisioning runners, managing teardown, and merging results manually.

“We had to create and maintain our own infrastructure to run JMeter and ghz. With Enterprise, that’s handled for us, which saves a lot of engineering time.”

With Gatling Enterprise Edition, Attentive gained:

  • Private Locations to generate load from inside its own network—essential for testing internal environments such as staging and dev, which are not publicly accessible

  • Realistic ramp-up and ramp-down profiles to simulate how traffic grows and declines during events like BFCM

  • A test-as-code workflow where developers build a single Java JAR and reuse it across environments with different configurations (e.g., RPS targets or patterns)

  • Automation through the Gatling Enterprise API, which Attentive uses in a command-line tool to package simulations, upload them, start runs, and fetch results automatically

This approach allows developers to iterate quickly without redeploying or waiting for long infrastructure cycles.
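The "one JAR, many environments" workflow hinges on externalizing the load profile. A minimal sketch of that idea, using plain system properties (the property names and defaults here are illustrative, not Attentive's actual configuration keys):

```java
// Hypothetical sketch: resolving a load profile from system properties so the
// same simulation JAR can be reused across environments without rebuilding.
// Property names (targetRps, rampUp, rampDown) are illustrative.
public class LoadProfile {
    final int targetRps;        // steady-state requests per second
    final int rampUpSeconds;    // time to grow from 0 to targetRps
    final int rampDownSeconds;  // time to decline back to 0

    LoadProfile(int targetRps, int rampUpSeconds, int rampDownSeconds) {
        this.targetRps = targetRps;
        this.rampUpSeconds = rampUpSeconds;
        this.rampDownSeconds = rampDownSeconds;
    }

    // e.g. java -DtargetRps=100000 -DrampUp=300 -jar simulations.jar
    static LoadProfile fromSystemProperties() {
        return new LoadProfile(
            Integer.getInteger("targetRps", 1_000),
            Integer.getInteger("rampUp", 60),
            Integer.getInteger("rampDown", 60));
    }
}
```

In a Gatling simulation, values resolved this way would feed the injection profile (for example, ramping users per second up to the target, holding, then ramping down), so dev and staging runs differ only in the properties passed at launch.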

Attentive’s testing strategy

Attentive’s Developer Experience team manages two Gatling control planes, one for dev and one for staging. The staging environment mirrors production with isolated data and is scaled up to handle realistic peak loads.

Because the company doesn’t have a dedicated testing department, service owners are responsible for testing their own services. Each team designates a contact person to coordinate dependencies and capacity with other services. This setup is reinforced by recurring “game day” tests, where multiple critical paths are validated together.

Each test combines gRPC request flows with simulated stream events, reflecting the mix of synchronous and asynchronous workloads that Attentive’s backend must handle during peak operations. Capacity goals are based on previous BFCM performance, with extra headroom added to ensure resilience and accommodate year-over-year growth.
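The capacity-goal arithmetic described above can be made concrete. A small sketch (all numbers hypothetical; Attentive's actual growth and headroom factors are not stated in this story):

```java
// Illustrative capacity math: derive this year's load-test target from last
// year's observed BFCM peak, an expected growth factor, and extra headroom.
public class CapacityTarget {
    static long targetRps(long lastYearPeakRps, double growthFactor, double headroomFactor) {
        return Math.round(lastYearPeakRps * growthFactor * headroomFactor);
    }
}
```

For example, a service that peaked at 60,000 RPS last year, with 1.5x expected growth and a 1.2x headroom margin, would be tested against a 108,000 RPS target.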

What they test

The goal is to validate the full path of message delivery: from API gateway through backend services to event consumers. Key focus areas include:

  • Throughput and latency under realistic ramp-up patterns

  • Autoscaling and backpressure behavior across Kubernetes and Istio

  • Error rates and assertions to ensure services meet success thresholds

The team also improved simulation efficiency. Early runs from a single node achieved around 6,000 RPS, but after tuning—particularly by enabling shared channels for gRPC and fixing TCP connection pooling—they reached around 160,000 RPS per node in private environments.
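The intuition behind the shared-channel gain can be shown with a toy model: each gRPC channel owns its own TCP connection(s), so giving every virtual user its own channel multiplies connection count, while a shared channel lets all users multiplex over one connection. The `Channel` type below is a counting stand-in, not grpc-java's `ManagedChannel`:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Conceptual sketch only: contrasts per-virtual-user channels with a single
// shared channel by counting simulated TCP connections.
public class ChannelSharing {
    static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

    static class Channel {
        Channel() { OPEN_CONNECTIONS.incrementAndGet(); } // one connection per channel
    }

    // Per-user channels: N virtual users open N connections.
    static int perUserConnections(int users) {
        OPEN_CONNECTIONS.set(0);
        for (int i = 0; i < users; i++) new Channel();
        return OPEN_CONNECTIONS.get();
    }

    // Shared channel: N virtual users multiplex over a single connection.
    static int sharedConnections(int users) {
        OPEN_CONNECTIONS.set(0);
        Channel shared = new Channel();
        for (int i = 0; i < users; i++) { /* every user reuses `shared` */ }
        return OPEN_CONNECTIONS.get();
    }
}
```

Fewer connections means less per-connection handshake and socket overhead on the injector, which is consistent with the jump from roughly 6,000 to roughly 160,000 RPS per node after tuning.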

Test maintenance

Each product team maintains its own simulations in Git, following a test-as-code approach. The Developer Experience team provides shared tooling, scripts, and environment management.

The same simulation JAR can run in different environments, simply by changing configuration values such as target URLs or load patterns. At the end of each run, an internal script calls the Gatling and Datadog APIs to collect data, populate dashboards, and document results in a shared table with owners and status indicators.

How Gatling Enterprise Edition helped Attentive achieve its results

  • Managed orchestration: Teams no longer need to maintain their own infrastructure for running or coordinating simulations. Gatling Enterprise handles injector lifecycle, scheduling, and reporting.
  • Developer efficiency: With their CLI built on the Gatling API, developers can now build, upload, and trigger tests from their laptops. Even a one-line change can be validated in minutes instead of redeploying test infrastructure.
  • Correlated insights: Datadog integration bridges load and application metrics. During one test, Gatling helped Attentive identify a gateway issue where 100K RPS generated by the injectors resulted in only 40–45K RPS reaching the service. After improving Istio autoscaling and protection mechanisms, full traffic parity was achieved.

  • Better visualization: Although Attentive uses Datadog for telemetry, engineers prefer the Gatling Report UI for its clear charts and quick identification of anomalies such as CPU imbalance or excessive TCP connections.

“The charts in Gatling Enterprise helped us spot our TCP connection issue.”

Reporting and collaboration

Each morning, Attentive triggers a set of automated load tests—internally known as game day tests—through the Gatling Enterprise API. Results are automatically aggregated, linked to Datadog APM traces, and shared in a central table. This gives engineering teams a consistent view of readiness and regression trends across all services.

How Gatling improved Attentive’s BFCM readiness

  • Same confidence, faster process. Attentive achieved the same level of readiness for BFCM as the previous year, but with shorter coordination time and simpler workflows.

  • Higher scalability. Injector throughput increased from 6K RPS to 160K RPS after gRPC tuning.

  • Fewer bottlenecks. Load testing revealed gateway and thread-pool constraints before production.

  • Improved developer autonomy. The automation flow allows teams to iterate locally and launch simulations without waiting for infrastructure changes.

What’s next for Attentive

Attentive plans to extend its use of Gatling Enterprise Edition in two directions:

  1. Continuous load testing: moving from pre-BFCM rehearsals to regular weekly or daily tests, or even per-release checks.

  2. Production load testing: exploring the use of multi-region locations to simulate real-world network conditions and geographic diversity.

