GitHub has published its June 2024 Availability Report on the GitHub Blog, detailing two major incidents that degraded performance across its services. The incidents affected GitHub Issues and the GitHub migration service, causing outages and delays for users.
Incident of June 5, 2024
The first incident started on June 5, 2024, at 17:05 UTC and lasted 142 minutes, ending at 19:27 UTC. During this period, the GitHub Issues service was degraded. Events related to projects were not displayed in the issue timeline, including actions such as adding issues to or removing them from projects and changing the status of issues within projects.
The root cause was identified as a misconfiguration introduced during a scheduled secret rotation. The rotation was part of an initiative to clean up and simplify service configurations to improve automation. However, a bug in the implementation left one of the configured services using an expired secret, which caused the degradation. GitHub has fixed the service configuration to mitigate the issue and expects the simplified setup to prevent similar incidents in the future.
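The failure mode described here, a service still pointing at a secret that has already expired, is the kind of thing a post-rotation audit can catch. The following is a minimal sketch of such an audit, not GitHub's tooling: the `Secret` record, the `services_using_expired_secrets` helper, and the service and secret names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Secret:
    """A hypothetical secret record: an identifier and its expiry time."""
    secret_id: str
    expires_at: datetime


def services_using_expired_secrets(service_to_secret, secrets, now):
    """Return names of services whose configured secret is expired or unknown."""
    by_id = {s.secret_id: s for s in secrets}
    stale = []
    for service, secret_id in sorted(service_to_secret.items()):
        secret = by_id.get(secret_id)
        # A missing secret is treated like an expired one: either way the
        # service would be unable to authenticate.
        if secret is None or secret.expires_at <= now:
            stale.append(service)
    return stale


# Illustrative data only; these service and secret names are made up.
now = datetime(2024, 6, 5, 17, 5, tzinfo=timezone.utc)
secrets = [
    Secret("key-v1", datetime(2024, 6, 5, 17, 0, tzinfo=timezone.utc)),  # rotated out
    Secret("key-v2", datetime(2024, 9, 5, 17, 0, tzinfo=timezone.utc)),  # current
]
config = {"timeline-events": "key-v1", "issues-api": "key-v2"}

print(services_using_expired_secrets(config, secrets, now))  # ['timeline-events']
```

Running a check like this as the last step of a rotation would flag any service that was missed before users notice the effect.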
Incident of June 27, 2024
The second incident occurred on June 27, 2024, from 20:39 UTC to 21:37 UTC, lasting 58 minutes. This incident affected the GitHub migration service, causing all ongoing migrations to fail. After detecting the increased failure rate, GitHub paused new migrations to prevent further disruptions. This resulted in longer migration times, but the team was able to resolve the issue without any additional failures.
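Pausing new work once the failure rate climbs is a common safeguard. The sketch below shows one way to express it, assuming a sliding window of recent outcomes and a fixed threshold. The `MigrationGate` class, its parameters, and its default values are hypothetical; the report does not describe how GitHub implements this.

```python
from collections import deque


class MigrationGate:
    """Pauses intake of new jobs when the recent failure rate exceeds a threshold."""

    def __init__(self, window=20, threshold=0.5, min_samples=5):
        self.results = deque(maxlen=window)  # True = success, False = failure
        self.threshold = threshold
        self.min_samples = min_samples
        self.paused = False

    def record(self, succeeded):
        """Record one job outcome and pause intake if failures dominate."""
        self.results.append(succeeded)
        # Wait for a minimum number of samples so one early failure
        # does not pause everything.
        if len(self.results) >= self.min_samples:
            failure_rate = self.results.count(False) / len(self.results)
            if failure_rate > self.threshold:
                self.paused = True

    def accepts_new_jobs(self):
        return not self.paused

    def resume(self):
        """Called by an operator once the underlying fault is fixed."""
        self.results.clear()
        self.paused = False


gate = MigrationGate(window=10, threshold=0.5, min_samples=5)
for outcome in [True, False, False, False, False]:
    gate.record(outcome)

print(gate.accepts_new_jobs())  # False: 4 of the 5 recorded jobs failed
gate.resume()
print(gate.accepts_new_jobs())  # True
```

In this sketch the gate stays closed until `resume()` is called explicitly, which mirrors the manual intervention described in the report; an automatic half-open retry would be a different design choice.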
The root cause was traced to incorrect infrastructure credentials, which required manual intervention to correct. GitHub’s first responders quickly mitigated the issue, resuming paused migrations and restoring normal service levels. To prevent a recurrence, GitHub is improving its monitoring and notification mechanisms for infrastructure credentials.
Future Prevention and Monitoring
GitHub has committed to improving its monitoring and alerting systems to prevent such incidents in the future. Users are encouraged to follow the GitHub Status page for real-time updates and post-incident summaries. For more information about GitHub’s ongoing projects and engineering efforts, visit the GitHub Engineering Blog.