edudip operations

System Status

Realtime Updates
Tue - 10/14/2025, 10:58 AM
Major Outage
Investigating

API (Dev): Replacing redis-sentinel with Dragonfly

A high-priority incident has been declared affecting the redis-sentinel service. Engineers are investigating whether redis-sentinel can be replaced with Dragonfly. Bitnami has discontinued its open-source redis-sentinel chart and images, so the current image can no longer be pulled without a paid subscription. Redis therefore needs to be replaced.
Undetermined: Kubernetes Production Environment
Major Outage: Kubernetes Development Environment
Undetermined: next API

Services

Kubernetes Production Environment

Undetermined
Kubernetes production environment

Kubernetes Development Environment

Major Outage
Kubernetes development environment

Metrics Infrastructure

Operational
Grafana, Prometheus, Alerting

Logging Infrastructure

Operational
Graylog

Harbor Registry

Operational
Harbor container registry

Continuous Delivery

Operational
GitHub Actions & Argo CD

Authentication

Operational
Keycloak single sign-on

Sentry

Operational
Sentry application monitoring

LiveKit Infrastructure

Operational
Our LiveKit media server infrastructure

next API

Undetermined

Scheduled Maintenance

No scheduled maintenance.

Past Incidents

Wed - 11/19/2025, 11:16 PM
Unaffected

Redis to Dragonfly Migration

Migration of Redis to Dragonfly in the API.
Unaffected: next API
Mon - 11/10/2025, 7:47 AM
Undetermined

Room join failures and timeouts

We are currently investigating reports of users having difficulty joining rooms, with some join attempts timing out. We are working to identify the root cause and restore normal functionality as quickly as possible.
Undetermined: next API
Tue - 11/04/2025, 5:40 PM
Major Outage

etcd / node outage in c-prod-v3

The etcd service, which is critical for cluster management, is currently experiencing an outage in the c-prod-v3 production cluster.
Major Outage: Kubernetes Production Environment
Wed - 10/15/2025, 9:00 AM
Major Outage

Graylog unavailable

Graylog is unavailable due to an outage of its MongoDB dependency, and log data is no longer being processed.
Major Outage: Logging Infrastructure
Tue - 10/14/2025, 7:58 AM
Undetermined

Critical Node Failures Causing Cluster Unreachability

Due to multiple node failures, several clusters are currently unreachable. Immediate action is required to restore cluster functionality.
© 2025 · Powered by Incidite