Don't miss another incident from AWS.
Ctatus will help you monitor AWS and all your third-party services.
All Systems Operational
Incident history: 05/11 to 05/12

Latest Incidents

Resolved Additional Information (Kinesis Event)

Nov 28, 12:05 AM PST We’d like to share more information about the Kinesis event on Wednesday November 25th. Additional details are available here. Should you have any questions, please contact AWS Support.

Resolved Increased CloudWatch API error rates

9:38 AM PST We are investigating increased error rates with the CloudWatch GetMetricStatistics and GetMetricData APIs when requesting metrics older than 3 hours in the US-EAST-1 Region. CloudWatch alarms may transition into "INSUFFICIENT_DATA" state if the period of the alarm is longer than 3 hours. Querying metrics less than 3 hours old, as well as publishing metrics, remains unimpacted.

10:42 AM PST Between 7:59 AM and 10:05 AM PST, we experienced increased error rates with the CloudWatch GetMetricStatistics and GetMetricData APIs when requesting metrics older than 3 hours in the US-EAST-1 Region. CloudWatch alarms may have transitioned into "INSUFFICIENT_DATA" state if the period of the alarm was longer than 3 hours. Querying metrics less than 3 hours old, as well as publishing metrics, was unimpacted. The issue has been resolved and the service is operating normally.
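For context, the affected calls are the kind shown in this minimal boto3 sketch, which requests data points from the older-than-3-hours window that was returning errors. The namespace, metric name, and instance ID are placeholders, not values from the incident:

    import datetime
    import boto3

    # Placeholder metric: swap in your own namespace, metric name, and dimensions.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    now = datetime.datetime.utcnow()

    resp = cloudwatch.get_metric_data(
        MetricDataQueries=[{
            "Id": "cpu",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
                },
                "Period": 300,
                "Stat": "Average",
            },
        }],
        StartTime=now - datetime.timedelta(hours=6),  # older than 3 hours: the window that saw errors
        EndTime=now - datetime.timedelta(hours=3),
    )
    print(resp["MetricDataResults"][0]["Values"])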

Resolved Increased provisioning error rates (Amazon WorkSpaces)

12:09 PM PST We are experiencing increased error rates in provisioning of new Amazon WorkSpaces in the US-EAST-1 Region.

2:06 PM PST We are continuing to experience increased error rates in the provisioning of new Amazon WorkSpaces in the US-EAST-1 Region. We recommend customers do not perform modify, migrate, reboot, rebuild and/or restore operations on existing WorkSpaces at this time. We are actively working to resolve the issue.

4:24 PM PST We continue to work towards recovery of the issue affecting provisioning of Amazon WorkSpaces in the US-EAST-1 Region. We recommend customers do not perform modify, migrate, reboot, rebuild and/or restore operations on existing WorkSpaces at this time.

7:56 PM PST We have identified the root cause of the issue affecting provisioning of Amazon WorkSpaces in the US-EAST-1 Region. We expect to see recovery once the on-going Kinesis issue is fully resolved. We recommend customers do not perform modify, migrate, reboot, rebuild and/or restore operations on existing WorkSpaces at this time.

9:22 PM PST We are beginning to see recovery on provisioning of Linux and non-BYOL Windows WorkSpaces in the US-EAST-1 Region. We recommend customers continue to not perform modify, migrate, reboot, rebuild and/or restore operations on all existing WorkSpaces until complete recovery. We continue to work toward full resolution.

11:31 PM PST We are continuing to see recovery on provisioning of Linux and Windows (including BYOL) WorkSpaces in the US-EAST-1 Region. At this time, we expect customers can modify, migrate, reboot, rebuild and restore their existing WorkSpaces normally. We continue to work towards full resolution.

Nov 26, 12:01 AM PST Between 5:15 AM PST and 11:55 PM PST, we experienced increased error rates in provisioning of new Amazon WorkSpaces in the US-EAST-1 Region. The issue has now been resolved and the service is operating normally.
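Because the guidance above was to hold off on lifecycle operations until provisioning recovered, a simple guard like the following boto3 sketch can help. The WorkSpace ID is a placeholder; this only checks that the WorkSpace reports AVAILABLE before issuing a reboot:

    import boto3

    workspaces = boto3.client("workspaces", region_name="us-east-1")
    workspace_id = "ws-0123456789"  # placeholder WorkSpace ID

    # Only reboot when the WorkSpace reports a healthy, steady state.
    desc = workspaces.describe_workspaces(WorkspaceIds=[workspace_id])
    state = desc["Workspaces"][0]["State"]
    if state == "AVAILABLE":
        workspaces.reboot_workspaces(RebootWorkspaceRequests=[{"WorkspaceId": workspace_id}])
    else:
        print(f"Skipping reboot, WorkSpace is in state {state}")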

Resolved Job State Transition Delays (AWS Batch)

11:15 AM PST We are experiencing increased error rates for job state transitions and compute environment scaling in the US-EAST-1 Region.

11:41 AM PST We continue to experience increased error rates and delays for job state transitions and compute environment scaling in the US-EAST-1 Region.

12:58 PM PST We have identified the root cause of increased error rates and delays for job state transitions and compute environment scaling in the US-EAST-1 Region and continue to work toward resolution.

1:35 PM PST We are no longer seeing compute environment scaling delays but are still experiencing elevated job state transition times in the US-EAST-1 Region and continue to work toward resolution.

3:23 PM PST We are beginning to see recovery for the elevated job state transition times in the US-EAST-1 Region, and continue to work toward full resolution.

4:11 PM PST We continue to see some recovery for the elevated job state transition times in the US-EAST-1 Region, and are working toward full resolution.

6:15 PM PST We continue to see recovery for elevated job state transition times in the US-EAST-1 Region, and are working toward full resolution. Customers can expect jobs to work correctly, but may still see issues with Multi-node Parallel workloads.

7:23 PM PST We continue to see recovery for elevated job state transition times in the US-EAST-1 Region, and are working toward full resolution. Customers can expect most jobs to work correctly, but may still see Multi-node Parallel job delays.

9:23 PM PST We have full recovery for AWS Batch jobs not using awsvpc networking mode. Customers running Multi-Node Parallel jobs may see deprovisioning delays while we wait for ECS recovery.

10:31 PM PST Between 5:17 AM and 7:20 PM PST we experienced delayed job state transitions and compute environment scaling delays in AWS Batch Jobs in the US-EAST-1 Region. The issue has been resolved and the service is operating normally.
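One way to watch the state transitions these updates describe is to poll DescribeJobs. The sketch below (the job ID is a placeholder) prints each status until the job reaches a terminal state:

    import time
    import boto3

    batch = boto3.client("batch", region_name="us-east-1")
    job_id = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"  # placeholder Batch job ID

    # Poll the job until it reaches a terminal state, printing each transition.
    while True:
        job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
        print(job["status"])  # SUBMITTED -> PENDING -> RUNNABLE -> STARTING -> RUNNING -> SUCCEEDED/FAILED
        if job["status"] in ("SUCCEEDED", "FAILED"):
            break
        time.sleep(30)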

Resolved Increased Error Rates and Latency (AWS IoT SiteWise)

11:07 AM PST We are continuing to experience elevated error rates on data ingestion and increased computation latency for IoT SiteWise auto-computed aggregates, transforms and metrics in the US-EAST-1 Region. This is also impacting IoT Events and IoT Analytics. We continue to work towards resolution.

12:24 PM PST We are continuing to experience elevated error rates on data ingestion and increased computation latency for IoT SiteWise auto-computed aggregates, transforms and metrics in the US-EAST-1 Region. This is also impacting IoT Events and IoT Analytics. We continue to work towards resolution.

3:06 PM PST We are beginning to see recovery for data ingestion and increased computation latency for IoT SiteWise auto-computed aggregates, transforms and metrics in the US-EAST-1 Region. This is also impacting IoT Events and IoT Analytics. We continue to work toward full resolution.

4:50 PM PST We are beginning to see recovery for data ingestion. Access to existing data, transforms, and metrics is unaffected. We are continuing to experience increased computation latency for IoT SiteWise auto-computed aggregates, transforms and metrics in the US-EAST-1 Region. This is also impacting IoT Events and IoT Analytics. We continue to work toward full resolution.

7:23 PM PST We have seen recovery for data ingestion. Access to existing data, transforms, and metrics is unaffected. We are continuing to experience increased computation latency for IoT SiteWise auto-computed aggregates, transforms and metrics in the US-EAST-1 Region. This is also impacting IoT Events and IoT Analytics. We continue to work toward full resolution.

10:23 PM PST We have seen recovery for data ingestion and the generation of auto-computed aggregates. Access to existing data, transforms, and metrics is unaffected. We are beginning to see recovery in the execution of computations for IoT SiteWise transforms and metrics but are still experiencing computation latency in the US-EAST-1 Region. We continue to work toward full resolution. IoT Events and IoT Analytics have recovered and those services are operating normally.

Nov 26, 12:27 AM PST We are beginning to see recovery in the execution of computations for IoT SiteWise transforms and metrics but are still experiencing computation latency in the US-EAST-1 Region. All other functions of IoT SiteWise including data ingestion and the generation of auto-computed aggregates have recovered and are operating normally. We continue to work toward full resolution.

Nov 26, 3:55 AM PST Between November 25 5:15 AM PST and November 26 3:49 AM PST, we experienced elevated error rates and increased latency on data ingestion, computation of aggregates, transforms, and metrics in the US-EAST-1 Region. The issue has been resolved and the service is operating normally. We are still executing computations for transforms and metrics on data that may have arrived during the impact window. These will appear in customers' accounts as we process them over the next few hours.
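To confirm that the backfilled aggregates mentioned in the final update have appeared, a query like this boto3 sketch can be run against an asset property. The asset and property IDs are placeholders, and the times approximate the impact window in UTC:

    import datetime
    import boto3

    sitewise = boto3.client("iotsitewise", region_name="us-east-1")

    # Placeholder asset/property IDs; query hourly averages across the impact window.
    resp = sitewise.get_asset_property_aggregates(
        assetId="11111111-2222-3333-4444-555555555555",
        propertyId="66666666-7777-8888-9999-000000000000",
        aggregateTypes=["AVERAGE"],
        resolution="1h",
        startDate=datetime.datetime(2020, 11, 25, 13, 15),  # Nov 25, 5:15 AM PST in UTC
        endDate=datetime.datetime(2020, 11, 26, 11, 49),    # Nov 26, 3:49 AM PST in UTC
    )
    for value in resp["aggregatedValues"]:
        print(value["timestamp"], value["value"].get("average"))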

Resolved Increased API Error Rates and Launch Failures (Amazon EKS)

9:22 AM PST We are investigating increased API error rates for cluster and node group operations in the US-EAST-1 region. We are also investigating increased Fargate pod launch failures. Existing EKS clusters and managed node groups are operating normally.

11:03 AM PST Customers' applications that are backed by pods already running are not impacted. We are continuing to experience increased API error rates in the US-EAST-1 region. EKS customers are experiencing errors when creating, upgrading and deleting EKS clusters and managed node groups. Existing managed node groups may experience errors scaling up or down. Customers will experience errors launching new Fargate pods.

1:55 PM PST We continue to work towards recovery of the issue affecting Amazon EKS in the US-EAST-1 Region. Customer applications that are backed by pods already running are not impacted. Also, applications and pods can be started and run on EC2 instances that are already part of the cluster. EKS customers are experiencing errors when creating, upgrading and deleting EKS clusters and managed node groups. Customers will experience errors launching new Fargate pods; running Fargate pods are not impacted.

3:10 PM PST We continue working towards recovery of the issue affecting Amazon EKS in the US-EAST-1 Region. EKS Fargate pod launches are now seeing recovery. Customer applications that are backed by pods already running are not impacted. Applications and pods can be started and run on EC2 instances that are already part of the cluster. Customers can also create Managed Node groups for existing clusters. EKS customers will still experience errors when creating, upgrading and deleting EKS clusters and managed node groups.

4:29 PM PST We continue to work towards recovery of the issue affecting Amazon EKS in the US-EAST-1 Region. EKS Fargate pod launches have now recovered and are operating normally. Customer applications that are backed by pods already running are not impacted. Applications and pods can be started and run on EC2 instances that are already part of the cluster. Customers can also create Managed Node groups for existing clusters. EKS customers will still experience errors when creating, upgrading and deleting EKS clusters and managed node groups.

6:11 PM PST We continue to observe partial recovery for EKS cluster and node group API operations in the US-EAST-1 Region and are working toward full resolution. Customers may still experience errors when creating, upgrading and deleting EKS clusters and managed node groups.

9:48 PM PST We continue to observe partial recovery for EKS cluster and node group API operations in the US-EAST-1 Region and are working towards full resolution. We expect to see complete recovery once the on-going Kinesis issue is fully resolved. Until then, customers may still experience errors when creating, upgrading and deleting EKS clusters and managed node groups.

11:04 PM PST Between 5:15 AM and 10:20 PM PST, we experienced elevated API errors for cluster and node group operations and Fargate pod launches in the US-EAST-1 Region. Existing clusters and node groups were unaffected during the event. The issue has been resolved and the service is operating normally.
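When cluster and node group mutations are failing like this, one approach is to confirm the resources report ACTIVE before retrying an operation. The cluster and node group names in this boto3 sketch are placeholders:

    import boto3

    eks = boto3.client("eks", region_name="us-east-1")
    cluster_name = "my-cluster"       # placeholder
    nodegroup_name = "my-nodegroup"   # placeholder

    cluster_status = eks.describe_cluster(name=cluster_name)["cluster"]["status"]
    nodegroup_status = eks.describe_nodegroup(
        clusterName=cluster_name, nodegroupName=nodegroup_name
    )["nodegroup"]["status"]

    # Only attempt create/upgrade/delete operations once both report ACTIVE.
    if cluster_status == "ACTIVE" and nodegroup_status == "ACTIVE":
        print("Safe to retry cluster or node group operations")
    else:
        print(f"Cluster: {cluster_status}, node group: {nodegroup_status}; hold off on mutations")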

Resolved Increased API Error Rates and Launch Failures (Amazon ECS)

10:22 AM PST We are investigating increased API error rates and delays delivering task events and metrics in the US-EAST-1 region. We are also investigating increased task launch error rates for the Fargate launch type. Running tasks are not impacted.

11:07 AM PST Currently running ECS tasks are not impacted for either the EC2 or Fargate launch types. We are continuing to experience API error rates and delays delivering task events and metrics in the US-EAST-1 region. ECS clusters are also not able to scale up or down due to task launch errors. Customers are missing metrics and events from their running tasks as ECS Insights is not able to propagate information. Task Set and Capacity Providers are also impacted. Customers using ECS on Fargate are not able to launch new tasks; running Fargate tasks are not impacted.

1:57 PM PST We continue working towards recovery for the issue impacting Amazon ECS in the US-EAST-1 region. Currently running ECS tasks are not impacted for either the EC2 or Fargate launch types. Customers are experiencing API error rates and delays delivering task events and metrics in the US-EAST-1 region. ECS clusters are also not able to scale up or down due to task launch errors. Customers are missing metrics and events from their running tasks as ECS Insights is not able to propagate information. Task Set and Capacity Providers are also impacted. Customers using ECS on Fargate are not able to launch new tasks; running Fargate tasks are not impacted.

2:13 PM PST We continue working towards recovery for the issue impacting Amazon ECS in the US-EAST-1 region. Currently running ECS tasks are not impacted for either the EC2 or Fargate launch types. We are seeing increased delivery rates for task events and metrics. ECS clusters are not able to scale up or down due to task launch errors, with Task Set and Capacity Providers impacted. Customers using ECS on Fargate are not able to launch new tasks; running Fargate tasks are not impacted.

3:04 PM PST We continue working towards recovery for the issue impacting Amazon ECS in the US-EAST-1 region. ECS on Fargate task launches are seeing recovery. Currently running ECS tasks are not impacted for either the EC2 or Fargate launch types. We are seeing increased delivery rates for task events and metrics. ECS clusters using Capacity Providers are still seeing impact.

4:24 PM PST We continue working towards recovery for the issue impacting Amazon ECS in the US-EAST-1 region. Currently running ECS tasks are not impacted for either the EC2 or Fargate launch types. ECS on Fargate task launches are continuing to see recovery. Delivery of task events and metrics is starting to catch up, and API error rates are declining.

5:36 PM PST We continue working towards recovery for the issue impacting Amazon ECS in the US-EAST-1 region. Currently running ECS tasks are not impacted for either the EC2 or Fargate launch types. ECS on Fargate task launches are continuing to see recovery with a small number of task launches failing. Task event delivery is fully recovered. There continue to be higher than normal latencies for metrics due to continued CloudWatch impact.

7:16 PM PST We continue working towards recovery for the issue impacting Amazon ECS in the US-EAST-1 region. Currently running ECS tasks are not impacted for either the EC2 or Fargate launch types. ECS on Fargate task launches are continuing to see recovery with a very small number of task launches failing. Task event delivery is fully recovered. We are investigating slow deprovisioning of tasks. Capacity Providers and metrics delivery latency are both impacted until CloudWatch recovers.

9:06 PM PST ECS is investigating slow deprovisioning of tasks causing tasks to remain in a deactivating or deprovisioning state for extended periods of time. For ECS tasks using awsvpc networking mode, including Fargate tasks, this means the ENI associated with the task remains provisioned longer than normal. CloudWatch Container Insights is now correctly showing recent metrics from ECS. CloudWatch metrics for ECS and Capacity Providers are continuing to see impact while we wait for Kinesis and CloudWatch recovery.

10:49 PM PST We are starting to see recovery for CloudWatch metrics for ECS and Capacity Providers. CloudWatch Container Insights is now showing recent metrics from ECS. ECS continues working to resolve slow task deprovisioning, which causes tasks to remain in a deactivating or deprovisioning state for extended periods of time. For ECS tasks using awsvpc networking mode, including Fargate tasks, this means the ENI associated with the task remains provisioned longer than normal.

Nov 26, 12:11 AM PST CloudWatch metrics for ECS and Capacity Providers have recovered. We continue working to resolve slow task deprovisioning which is causing tasks to remain in a deactivating or deprovisioning state for extended periods of time. For ECS tasks using awsvpc networking mode, including Fargate tasks, this means the ENI associated with the task remains provisioned longer than normal.

Nov 26, 12:29 AM PST CloudWatch metrics for ECS and Capacity Providers have recovered. We expect to see recovery on slow task deprovisioning, which is causing tasks to remain in a deactivating or deprovisioning state for extended periods of time, once CloudMap is fully recovered. For ECS tasks using awsvpc networking mode, including Fargate tasks, this means the ENI associated with the task remains provisioned longer than normal.

Nov 26, 1:17 AM PST Between November 25 5:15 AM and November 26 1:08 AM PST, we experienced elevated API and task launch error rates, delayed metrics impacting Capacity Provider scaling, CloudWatch Container Insights, and CloudWatch metrics for ECS. Running tasks were not impacted. The issue has been resolved and the service is operating normally.
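To see which tasks are stuck in the slow-deprovisioning state described above, a listing like the following boto3 sketch works. The cluster name is a placeholder:

    import boto3

    ecs = boto3.client("ecs", region_name="us-east-1")
    cluster = "my-cluster"  # placeholder cluster name

    # Tasks being stopped have desiredStatus STOPPED; check how far along they are.
    task_arns = ecs.list_tasks(cluster=cluster, desiredStatus="STOPPED")["taskArns"]
    if task_arns:
        tasks = ecs.describe_tasks(cluster=cluster, tasks=task_arns)["tasks"]
        stuck = [t["taskArn"] for t in tasks if t["lastStatus"] in ("DEACTIVATING", "DEPROVISIONING")]
        print(f"{len(stuck)} task(s) still deactivating/deprovisioning:", stuck)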

Resolved Change Propagation and Invalidations Reporting Delay (Amazon CloudFront)

9:54 AM PST We are investigating longer than usual reporting update delays for change propagation of invalidations and CloudFront configurations. Customer changes are propagating fine across our edge locations but the associated reporting is not getting updated. Also, end-user requests for content from our edge locations are not affected by this issue and are being served normally.

11:17 AM PST We are investigating longer than usual reporting update delays for change propagation of invalidations and CloudFront configurations. Customer changes are propagating fine across our edge locations but the associated reporting is not getting updated. During this time, CloudFront Access Logs, Metrics and Reporting may also be affected. End-user requests for content from our edge locations are not affected by this issue and are being served normally.

1:25 PM PST We are working towards recovery for delays in reporting updates for change propagation of invalidations and CloudFront configurations. Customer changes are propagating fine across our edge locations but the associated reporting is not getting updated. CloudFront Access Logs, Metrics and Reporting may continue to be affected. End-user requests for content from our edge locations are not affected by this issue and are being served normally.

2:50 PM PST Change propagation of CloudFront configurations and invalidations have recovered and are operating normally. However, CloudFront Access Logs, Metrics and Reporting continue to be affected. End-user requests for content from our Edge Locations are not affected by this issue and are being served normally.

4:16 PM PST We are still observing partial recovery on Access Logs, Metrics and Reports but intermittent gaps and delays exist. We continue to work toward full resolution. End-user requests for content from our Edge Locations were not affected by this issue and continue to operate normally.

6:06 PM PST We continue to work toward full resolution, and expect full recovery once the on-going Kinesis issue is resolved. Upon further recovery, we expect Access Logs, Metrics and Reports to fully recover and start backfilling over time for those queued during the impact.

9:44 PM PST CloudFront Access Logs, Metrics, and Reporting continues to be affected by the Kinesis event but we are observing improving recovery. CloudFront edge locations are serving traffic as expected. Change propagation and cache invalidation times are operating within normal time windows.

11:51 PM PST Between 5:41 AM and 2:40 PM PST, we experienced longer than usual reporting delays for Invalidations and CloudFront configurations to edge locations. Customer changes were propagating normally across our edge locations during this time but the associated reporting was not getting updated correctly. Between 5:41 AM and 11:26 PM PST, CloudFront Real-time Metrics were not available. CloudFront’s Real-time Metrics are now available in CloudWatch. The backlog of CloudFront Access Logs and Reports will be backfilled over the next few hours. During this time, all end-user requests for content from our edge locations were not affected by this issue and were being served normally.
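Since the impact here was to the reporting of propagation rather than propagation itself, one way to verify an invalidation directly is to poll its status instead of relying on the delayed reports. This boto3 sketch uses a placeholder distribution ID:

    import time
    import boto3

    cloudfront = boto3.client("cloudfront")
    distribution_id = "E1234567890ABC"  # placeholder distribution ID

    # Submit an invalidation for everything, then poll until CloudFront reports it Completed.
    inv = cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/*"]},
            "CallerReference": str(time.time()),
        },
    )["Invalidation"]

    while inv["Status"] != "Completed":
        time.sleep(30)
        inv = cloudfront.get_invalidation(
            DistributionId=distribution_id, Id=inv["Id"]
        )["Invalidation"]
    print("Invalidation completed:", inv["Id"])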

Resolved Increased API Error Rates (Amazon Kinesis Data Streams)

6:36 AM PST We are investigating increased error rates for Kinesis Data Streams APIs in the US-EAST-1 Region.

7:50 AM PST We are continuing to investigate increased Kinesis Data Streams API errors, and are working on identifying root cause.

8:12 AM PST Kinesis Data Streams customers are still experiencing increased API errors. This is also impacting other services, including ACM, Amplify Console, API Gateway, AppStream2, AppSync, Athena, CloudFormation, CloudTrail, CloudWatch, Cognito, Connect, DynamoDB, EventBridge, IoT Services, Lambda, LEX, Managed Blockchain, Resource Groups, SageMaker, Support Console, and Workspaces. We are continuing to work on identifying root cause.

8:52 AM PST The Kinesis Data Streams API is severely impaired. This is also impacting other services, including ACM, Amplify Console, API Gateway, AppStream2, AppSync, Athena, CloudFormation, CloudTrail, CloudWatch, Cognito, Connect, DynamoDB, EventBridge, IoT Services, Lambda, LEX, Managed Blockchain, Resource Groups, SageMaker, Support Console, and Workspaces. We are actively working towards resolution.

9:32 AM PST The Kinesis Data Streams API is currently impaired in the US-EAST-1 Region. As a result, customers are not able to write or read data published to Kinesis streams. CloudWatch metrics and events are also affected, with elevated PutMetricData API error rates and some delayed metrics. While EC2 instances and connectivity remain healthy, some instances are experiencing delayed instance health metrics, but remain in a healthy state. AutoScaling is also experiencing delays in scaling times due to CloudWatch metric delays. The issue is also affecting other services, including ACM, Amplify Console, API Gateway, AppMesh, AppStream2, AppSync, Athena, Batch, CloudFormation, CloudTrail, Cognito, Connect, DynamoDB, EventBridge, Glue, IoT Services, Lambda, LEX, Managed Blockchain, Marketplace, Personalize, RDS, Resource Groups, SageMaker, Support Console, Well Architected, and Workspaces. For further details on each of these services, please see the Personal Health Dashboard. Other services, like S3, remain unaffected by this event. This issue has also affected our ability to post updates to the Service Health Dashboard. We are continuing to work towards resolution.

11:23 AM PST We continue to work towards recovery of the issue affecting the Kinesis Data Streams API in the US-EAST-1 Region. For Kinesis Data Streams, the issue is affecting the subsystem that is responsible for handling incoming requests. The team has identified the root cause and we continue to make progress in addressing it. We are seeing some improvement in error rates, but continue to work towards full resolution. The issue also affects other services, or parts of these services, that utilize Kinesis Data Streams within their workflows. While features of multiple services are impacted, some services have seen broader impact and service-specific impact details are included within Recent Events on the Service Health Dashboard.

1:59 PM PST Kinesis Data Streams API requests are still significantly impaired. We have identified a mitigation for this issue, and are actively working towards resolution.

2:49 PM PST Kinesis Data Streams API requests are still impaired but are starting to see recovery. We continue to actively work towards resolution.

4:42 PM PST Kinesis Data Streams API operations are seeing gradual recovery but customers may continue to experience increased latencies and failure rates. We continue to actively work towards resolution.

6:32 PM PST We have now fully mitigated the impact to the subsystem within Kinesis that is responsible for the processing of incoming requests and are no longer seeing increased error rates or latencies. However, we are not yet taking the full traffic load and are working to relax request throttles on the service. Over the next few hours we expect to relax these throttles to previous levels. We expect customers to begin seeing recovery as these throttles are relaxed over this timeframe.

8:53 PM PST We are continuing to relax the request throttles for Kinesis Data Streams and are gradually increasing the traffic into the service. We have not yet enabled requests to Kinesis Data Streams from VPC Endpoints. The Kinesis Data Streams subsystem continues to operate normally, and we expect incremental recovery over the next few hours.

9:26 PM PST We have now enabled a subset of requests to Kinesis Data Streams using VPC Endpoints.

10:06 PM PST We have now enabled all requests to Kinesis Data Streams through Internet-facing endpoints. We are continuing to work to re-enable all requests to Kinesis Data Streams using VPC Endpoints.

11:00 PM PST We have now enabled all requests to Kinesis Data Streams through both Internet-facing endpoints and VPC Endpoints.

Nov 26, 12:03 AM PST Between 5:15 AM and 11:10 PM PST, customers experienced a significant impairment to their Amazon Kinesis Data Streams API operations. We have identified the root cause and have completed immediate actions to prevent recurrence. The issue has been resolved and the service is operating normally.
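During the throttled-recovery phase described above, producers needed to tolerate elevated errors and latencies. A minimal retry wrapper like this boto3 sketch is one common client-side pattern, not part of AWS's remediation; the stream name is a placeholder:

    import time
    import boto3
    from botocore.exceptions import ClientError

    kinesis = boto3.client("kinesis", region_name="us-east-1")
    stream_name = "my-stream"  # placeholder stream name

    def put_with_backoff(data: bytes, partition_key: str, attempts: int = 5):
        """Retry PutRecord with exponential backoff to ride out throttling or API errors."""
        for attempt in range(attempts):
            try:
                return kinesis.put_record(
                    StreamName=stream_name, Data=data, PartitionKey=partition_key
                )
            except ClientError:
                if attempt == attempts - 1:
                    raise
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s ...

    put_with_backoff(b'{"event": "example"}', "example-key")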

Stats

0 incidents in the last 7 days

23 incidents in the last 30 days

Follow AWS and other services to get alerts about incidents in real time.


Don't miss another incident from AWS!

Ctatus aggregates status pages from services so you don't have to. Follow AWS and hundreds of services and be the first to know when something is wrong.

Get started now
14-day trial / No credit card required