Outage in TSG Global

Inbound SMS to webhook customers stopped

Resolved Major
Started June 16, 2023

Outage Details

Latest Updates
RESOLVED at 06/16/2023 07:00PM

Overview
Inbound SMS traffic to webhook customers partially stopped because of database read replica lag.

What happened
An increase in our database read replica lag partially stopped inbound traffic to webhook customers. Our SMS application could not fetch messages from the database because, during the lag spike, those messages had not yet reached the read replica.

Resolution
As soon as the issue was identified, we deployed a hotfix that reconfigured all applications to read from the writer instance, which was the quickest temporary solution. A later hotfix made the writer instance a fallback only: applications read from the reader instance first and query the writer when a record is not found there, which covers any future increase in lag.
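
The fallback read described above can be sketched roughly as follows. This is a minimal illustration, not TSG Global's actual code: the function name, the callable parameters, and the in-memory dictionaries standing in for the reader and writer instances are all assumptions made for the example.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_writer_fallback(
    message_id: str,
    read_from_reader: Callable[[str], Optional[T]],
    read_from_writer: Callable[[str], Optional[T]],
) -> Optional[T]:
    """Try the reader first; query the writer only if the record is missing."""
    record = read_from_reader(message_id)
    if record is not None:
        return record
    # A miss on the reader may only mean replication has not caught up yet,
    # so ask the writer before treating the record as absent.
    return read_from_writer(message_id)


# Hypothetical stand-ins for the two database instances: the writer already
# holds "msg-2", while the lagging reader has not received it yet.
writer_store = {"msg-1": "hello", "msg-2": "world"}
reader_store = {"msg-1": "hello"}

latest = fetch_with_writer_fallback("msg-2", reader_store.get, writer_store.get)
```

Reading from the reader first keeps most load off the writer; only records missing from the reader cost an extra query.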

Root Causes
The root cause was an increase in database read replica lag. Applications were processing messages faster than records were propagated to the read replica. When an application tried to fetch a message that was not yet available, the message went into the retry queue and was delivered after a long delay.
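
To see why a modest lag spike can turn into a long delivery delay, consider the small model below. It is only a sketch: the function and every number in it (lag, time to first fetch, retry interval) are hypothetical, since the report does not give the real values.

```python
def delivery_delay_ms(replica_lag_ms: int, first_fetch_ms: int, retry_interval_ms: int) -> int:
    """Milliseconds from the write until a fetch from the read replica succeeds.

    The record becomes visible on the replica replica_lag_ms after the write.
    The application first fetches at first_fetch_ms and, on each miss, waits
    retry_interval_ms in the retry queue before trying again.
    """
    attempt_ms = first_fetch_ms
    while attempt_ms < replica_lag_ms:
        attempt_ms += retry_interval_ms  # record not replicated yet: retry later
    return attempt_ms


# Hypothetical numbers: the application fetches 200 ms after the write and
# retries every 5 minutes.
normal_delay = delivery_delay_ms(replica_lag_ms=50, first_fetch_ms=200, retry_interval_ms=300_000)
spike_delay = delivery_delay_ms(replica_lag_ms=2_000, first_fetch_ms=200, retry_interval_ms=300_000)
```

Under these assumed values, a 50 ms lag is invisible to the application, while a 2 second lag makes the first fetch miss and pushes delivery out by a full retry interval.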

Impact
Some inbound HTTP webhook traffic was delayed during the evening and early-morning hours (PST) between 6/15/23 and 6/16/23.

What did we learn?
Because the outage was only partial, our existing metrics and alarms did not catch the issue or escalate it appropriately. We have added metrics and alarms that alert on this kind of issue to help prevent it from occurring again. We will also perform database maintenance in the near future to address the root cause.
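
The report does not say which metrics and alarms were added. As one possible illustration, a check that can catch a partial outage might watch both the replica lag and the share of messages sent to the retry queue, rather than only whether traffic has stopped entirely. The function name and both thresholds below are assumptions for the sketch.

```python
def should_alert(
    replica_lag_seconds: float,
    retried_messages: int,
    total_messages: int,
    max_lag_seconds: float = 1.0,    # hypothetical threshold
    max_retry_ratio: float = 0.05,   # hypothetical threshold
) -> bool:
    """Return True if replica lag or the retry ratio exceeds its threshold."""
    retry_ratio = retried_messages / total_messages if total_messages else 0.0
    return replica_lag_seconds > max_lag_seconds or retry_ratio > max_retry_ratio


# Most traffic still flows, but 20% of messages are being retried: a partial
# outage that an "is any traffic flowing?" check would miss.
partial_outage = should_alert(replica_lag_seconds=0.2, retried_messages=200, total_messages=1000)
```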
