{"aiPlatform":"claude-code@2025.06","category":"deployment","commandName":"/incident-response","content":"---\nname: Production Incident Response\ndescription: Respond to production incidents with coordinated agent expertise for rapid resolution\nallowed_tools:\n  - memory          # For tracking incident details and coordination between agents\n  - filesystem      # For analyzing logs, implementing fixes, and creating documentation\ntags:\n  - incident-response\n  - production\n  - emergency\n  - debugging\n  - security\n  - monitoring\n  - postmortem\n  - workflow\ncategory: operations\nversion: 2.0.0\nauthor: AI Commands Team\n---\n\nRespond to production incidents with coordinated agent expertise for rapid resolution:\n\n[Extended thinking: This workflow handles production incidents with urgency and precision. Multiple specialized agents work together to identify root causes, implement fixes, and prevent recurrence.]\n\n## Phase 1: Immediate Response\n\n### 1. Incident Assessment\n- Use Task tool with subagent_type=\"incident-responder\"\n- Prompt: \"URGENT: Assess production incident: $ARGUMENTS. Determine severity, impact, and immediate mitigation steps. Time is critical.\"\n- Output: Incident severity, impact assessment, immediate actions\n\n### 2. Initial Troubleshooting\n- Use Task tool with subagent_type=\"devops-troubleshooter\"\n- Prompt: \"Investigate production issue: $ARGUMENTS. Check logs, metrics, recent deployments, and system health. Identify potential root causes.\"\n- Output: Initial findings, suspicious patterns, potential causes\n\n## Phase 2: Root Cause Analysis\n\n### 3. Deep Debugging\n- Use Task tool with subagent_type=\"debugger\"\n- Prompt: \"Debug production issue: $ARGUMENTS using findings from initial investigation. Analyze stack traces, reproduce issue if possible, identify exact root cause.\"\n- Output: Root cause identification, reproduction steps, debug analysis\n\n### 4. 
Performance Analysis (if applicable)\n- Use Task tool with subagent_type=\"performance-engineer\"\n- Prompt: \"Analyze performance aspects of incident: $ARGUMENTS. Check for resource exhaustion, bottlenecks, or performance degradation.\"\n- Output: Performance metrics, resource analysis, bottleneck identification\n\n### 5. Database Investigation (if applicable)\n- Use Task tool with subagent_type=\"database-optimizer\"\n- Prompt: \"Investigate database-related aspects of incident: $ARGUMENTS. Check for locks, slow queries, connection issues, or data corruption.\"\n- Output: Database health report, query analysis, data integrity check\n\n## Phase 3: Resolution Implementation\n\n### 6. Fix Development\n- Use Task tool with subagent_type=\"backend-architect\"\n- Prompt: \"Design and implement fix for incident: $ARGUMENTS based on root cause analysis. Ensure fix is safe for immediate production deployment.\"\n- Output: Fix implementation, safety analysis, rollout strategy\n\n### 7. Emergency Deployment\n- Use Task tool with subagent_type=\"deployment-engineer\"\n- Prompt: \"Deploy emergency fix for incident: $ARGUMENTS. Implement with minimal risk, include rollback plan, and monitor deployment closely.\"\n- Output: Deployment execution, rollback procedures, monitoring setup\n\n## Phase 4: Stabilization and Prevention\n\n### 8. System Stabilization\n- Use Task tool with subagent_type=\"devops-troubleshooter\"\n- Prompt: \"Stabilize system after incident fix: $ARGUMENTS. Monitor system health, clear any backlogs, and ensure full recovery.\"\n- Output: System health report, recovery metrics, stability confirmation\n\n### 9. Security Review (if applicable)\n- Use Task tool with subagent_type=\"security-auditor\"\n- Prompt: \"Review security implications of incident: $ARGUMENTS. 
Check for any security breaches, data exposure, or vulnerabilities exploited.\"\n- Output: Security assessment, breach analysis, hardening recommendations\n\n## Phase 5: Post-Incident Activities\n\n### 10. Monitoring Enhancement\n- Use Task tool with subagent_type=\"devops-troubleshooter\"\n- Prompt: \"Enhance monitoring to prevent recurrence of: $ARGUMENTS. Add alerts, improve observability, and set up early warning systems.\"\n- Output: New monitoring rules, alert configurations, observability improvements\n\n### 11. Test Coverage\n- Use Task tool with subagent_type=\"test-automator\"\n- Prompt: \"Create tests to prevent regression of incident: $ARGUMENTS. Include unit tests, integration tests, and chaos engineering scenarios.\"\n- Output: Test implementations, regression prevention, chaos tests\n\n### 12. Documentation\n- Use Task tool with subagent_type=\"incident-responder\"\n- Prompt: \"Document incident postmortem for: $ARGUMENTS. Include timeline, root cause, impact, resolution, and lessons learned. 
No blame, focus on improvement.\"\n- Output: Postmortem document, action items, process improvements\n\n## Coordination Notes\n- Speed is critical in early phases - parallel agent execution where possible\n- Communication between agents must be clear and rapid\n- All changes must be safe and reversible\n- Document everything for postmortem analysis\n\nProduction incident: $ARGUMENTS","contentHash":"97af8264a3627394824ba3ee0c1cd91900ca5abb14346203ad6b0b98f55df5e3","copies":0,"createdAt":"2025-08-12T16:09:34.870Z","description":"Production incident resolution with ops subagents","github":{"repoUrl":"https://github.com/Commands-com/commands","lastSyncDirection":"from-github","metadata":{"importedFrom":"github_repository","repoPrivate":false,"repoDefaultBranch":"main","connectedAt":"2025-08-12T16:09:34.870Z"},"importedAt":"2025-08-12T16:09:34.870Z","lastSyncAt":"2025-08-17T17:57:45.781Z","fileMapping":{"license":null,"readme":null,"assets":[],"mainFile":"workflows/incident-response.md"},"selectedCommand":"incident-response","fileShas":{"mainFile":"70f74dd8ce6c539e0b80ac0f54f3449d7659cd8b","yamlPath":"313647b1fb381389da33b7913e95baf617c4b392"},"branch":"main","connectionType":"commands_yaml","connected":true,"lastSyncCommit":"01591bc061d236bde47bf23b0f47e8afcf1a5144","importSource":"repository_import","installationId":"69232615","syncStatus":"synced"},"githubRepoUrl":"https://github.com/Commands-com/commands","id":"26e21920-cd6a-479e-b02f-ea4c8896d434","inputParameters":[{"defaultValue":"P2-High","name":"severity","options":["P1-Critical","P2-High","P3-Medium","P4-Low"],"description":"Severity level of the incident","label":"Incident Severity","type":"select","required":true},{"name":"affected_service","description":"Name of the affected service","label":"Affected Service","type":"text","required":false}],"instructions":"Production incident resolution with ops 
subagents","likes":0,"mcp_search_content":"","organizationUsername":"commands-com","price":"free","search_content":"incident response production incident resolution with ops subagents /incident-response deployment claude-code@2025.06","title":"Incident Response","type":"command","updatedAt":"2025-08-17T17:57:45.781Z","userId":"W0V8NAw5AhWRwcuwSoFLOi1Yem83","visibility":"public","name":"incident-response","userInteraction":{"userHasStarred":false}}