cakd-agent
Overview
The `cakd-agent` daemon is a real-time diagnostics runner. It listens for Alertmanager webhooks, processes incoming alerts, invokes Gemini to diagnose failures based on alert descriptions, and routes both raw alerts and AI-generated root-cause analyses to Discord.
Environment Variables

| Env Var | Type | Default | Description |
|---|---|---|---|
| `DISCORD_WEBHOOK_URL` | string | `""` | The webhook URL to post diagnostics reports to |
| `PORT` | string | `"8080"` | The port where the agent daemon listens for Alertmanager webhooks |
| `GEMINI_API_KEY` | string | `""` | The API key for authenticating with the Gemini LLM service |
| `GEMINI_MODEL` | string | `"gemini-flash-latest"` | The Gemini LLM model to use for AI analysis |
| `CAKD_AGENT_SECRET` | string | `""` | A shared secret for authenticating incoming Alertmanager webhooks |
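
The defaults in the table can be summarized in a short Python sketch. This is an illustration only, not the agent's actual code: `AgentConfig`, `load_config`, `ai_enabled`, and `auth_enabled` are hypothetical names introduced here, and only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container mirroring the documented environment variables."""
    discord_webhook_url: str
    port: str
    gemini_api_key: str
    gemini_model: str
    agent_secret: str

    @property
    def ai_enabled(self) -> bool:
        # AI analysis is described as active only when GEMINI_API_KEY is set.
        return bool(self.gemini_api_key)

    @property
    def auth_enabled(self) -> bool:
        # Webhook authentication is described as active only when
        # CAKD_AGENT_SECRET is configured.
        return bool(self.agent_secret)


def load_config(env=None) -> AgentConfig:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return AgentConfig(
        discord_webhook_url=env.get("DISCORD_WEBHOOK_URL", ""),
        port=env.get("PORT", "8080"),
        gemini_api_key=env.get("GEMINI_API_KEY", ""),
        gemini_model=env.get("GEMINI_MODEL", "gemini-flash-latest"),
        agent_secret=env.get("CAKD_AGENT_SECRET", ""),
    )
```

Passing a plain dictionary instead of `os.environ` makes it easy to check which features a given set of variables would switch on.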
How It Works
- `cakd-agent` starts an HTTP server listening on the configured `PORT`.
- It registers the `/api/v1/alerts` endpoint to receive webhooks forwarded by Prometheus Alertmanager.
- Upon receiving a POST request, it validates the `Content-Type` header and limits the request body size to 1 MB. If `CAKD_AGENT_SECRET` is configured, it validates the shared secret from the `Authorization` header (expecting a `Bearer` token) or the `secret` query parameter.
- The agent decodes the incoming Alertmanager JSON payload.
- It immediately sends an HTTP `200 OK` response and then processes the alerts asynchronously.
- It groups the received alerts by their target webhook URL, which can be the `DISCORD_WEBHOOK_URL` or a namespace-specific URL loaded from configuration.
- For each group, it formats the raw alert details (status, name, severity, description, defaulting to “No description provided” if empty) and dispatches them via the configured notifier (e.g., Discord).
- If there are “firing” alerts and a Gemini client is initialized (i.e., `GEMINI_API_KEY` is set), the agent asynchronously sends the collected alert descriptions to the Gemini LLM service for AI analysis.
- The Gemini LLM generates a concise diagnosis and troubleshooting steps based on the provided alert context.
- The AI-generated diagnosis, titled “🤖 CAKD AI Diagnosis”, is then formatted as an informational alert and sent back to the relevant target webhook URL via the notifier.
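
The authentication, grouping, and formatting steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the agent's implementation: every function name is hypothetical, the constant-time comparison and the header-before-query precedence are assumptions, the `namespace` label lookup is a guess at how namespace-specific URLs are selected, and the message layout is invented for the example. The `labels`, `annotations`, and `status` fields follow the standard Alertmanager webhook payload.

```python
import hmac
from collections import defaultdict

MAX_BODY_BYTES = 1 << 20  # the 1 MB request body limit described above


def is_authorized(headers, query, secret):
    """Accept the request if no secret is configured or the caller presents it."""
    if not secret:
        return True
    auth = headers.get("Authorization", "")
    token = auth[len("Bearer "):] if auth.startswith("Bearer ") else ""
    # Assumption: the Bearer token is checked first, then the query parameter.
    candidate = token or query.get("secret", "")
    # Assumption: constant-time comparison; the source does not say how
    # the agent compares secrets.
    return hmac.compare_digest(candidate.encode(), secret.encode())


def group_by_webhook(alerts, default_url, namespace_urls):
    """Group alerts by target URL, falling back to the default webhook."""
    groups = defaultdict(list)
    for alert in alerts:
        # Assumption: the namespace is read from a "namespace" label.
        namespace = alert.get("labels", {}).get("namespace", "")
        groups[namespace_urls.get(namespace, default_url)].append(alert)
    return dict(groups)


def describe(alert):
    """Return the alert description, or the documented fallback text."""
    return alert.get("annotations", {}).get("description") or "No description provided"


def format_alert(alert):
    """Render status, name, severity, and description on one line (layout is illustrative)."""
    labels = alert.get("labels", {})
    return "[{}] {} ({}): {}".format(
        alert.get("status", "unknown"),
        labels.get("alertname", "unknown"),
        labels.get("severity", "unknown"),
        describe(alert),
    )


def firing_descriptions(alerts):
    """Collect descriptions of firing alerts, the input for the AI analysis step."""
    return [describe(a) for a in alerts if a.get("status") == "firing"]
```

In this sketch, an empty result from `firing_descriptions` would mean the AI analysis step is skipped, matching the condition that only “firing” alerts are sent to Gemini.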
Examples
Basic usage

    export DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/..."
    cakd-agent

With AI analysis and webhook authentication

    export DISCORD_WEBHOOK_URL="https://discord.com/api/webhooks/..."
    export GEMINI_API_KEY="your-gemini-api-key"
    export CAKD_AGENT_SECRET="your-shared-secret"
    cakd-agent
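
To exercise a locally running agent without waiting for Alertmanager, a test webhook can be built by hand. The following Python sketch is an assumption-laden illustration: `build_test_request` is a hypothetical helper, the `application/json` content type is assumed to be what the agent's `Content-Type` check accepts, and the alert contents are made up. The endpoint path and the `Bearer` scheme come from the description above, and the payload fields follow the standard Alertmanager webhook format.

```python
import json
import urllib.request


def build_test_request(base_url, secret=""):
    """Build (but do not send) a POST resembling an Alertmanager webhook."""
    payload = {
        "version": "4",
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                # Example labels and annotations, invented for this test.
                "labels": {"alertname": "ExampleAlert", "severity": "warning"},
                "annotations": {"description": "Example alert for a manual test"},
            }
        ],
    }
    # Assumption: the agent expects a JSON content type.
    headers = {"Content-Type": "application/json"}
    if secret:
        headers["Authorization"] = f"Bearer {secret}"
    return urllib.request.Request(
        f"{base_url}/api/v1/alerts",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Sending is left commented out because it needs a running agent:
# request = build_test_request("http://localhost:8080", "your-shared-secret")
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the example inspectable: the URL, headers, and body can be checked before anything reaches the agent.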