Complete Onboarding Guide
Step-by-step guide to set up NeuBird MCP and start investigating incidents.
Overview
This guide walks you through the complete onboarding process for NeuBird MCP:
The Onboarding Journey
graph LR
A[Install MCP] --> B[Add Connections]
B --> C[Create Project]
C --> D[Run Test Investigation]
D --> E[Find & Investigate Alerts]
E --> F[Configure Instructions]
F --> G[Optimize & Refine]
Phases at a Glance
📦 Prerequisites & Installation (5-10 minutes)
- Install NeuBird MCP Server and configure your AI client
- See Installation Guide for details
🔗 Phase 1: Add Connections (10-15 minutes)
- Connect your cloud providers (AWS, Azure, GCP)
- Link monitoring tools (Datadog, PagerDuty, etc.)
- Wait for initial data sync
📁 Phase 2: Create Project (5 minutes)
- Set up your first project
- Link connections to the project
- Set as default project
🔍 Phase 3: Run First Investigation (5-10 minutes)
- Create a manual investigation to test the system
- Review the RCA (Root Cause Analysis)
- Understand the investigation output
🎯 Phase 4: Find & Investigate Real Alerts (10-15 minutes)
- List uninvestigated alerts from your systems
- Investigate a real incident
- Execute corrective actions
📋 Phase 5: Configure Instructions (15-20 minutes)
- Add SYSTEM instructions for context
- Create FILTER instructions to reduce noise
- Test RCA instructions on past incidents
Total time: 50-75 minutes
Phased Approach
You don’t need to complete everything in one sitting! Many teams split this across multiple sessions:
- Day 1: Install + Connections + Project (20-30 min)
- Day 2: First investigations + Instructions (30-45 min)
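The time estimates above can be sanity-checked with a few lines of Python. This is only a sketch: the phase names and minute ranges are copied from the list above, and the dictionary layout is an illustrative choice, not anything NeuBird provides.

```python
# Minute ranges (low, high) per phase, copied from "Phases at a Glance".
PHASES = {
    "Prerequisites & Installation": (5, 10),
    "Phase 1: Add Connections": (10, 15),
    "Phase 2: Create Project": (5, 5),
    "Phase 3: Run First Investigation": (5, 10),
    "Phase 4: Find & Investigate Real Alerts": (10, 15),
    "Phase 5: Configure Instructions": (15, 20),
}

def total_range(names):
    """Sum the low and high estimates for the given phases."""
    low = sum(PHASES[n][0] for n in names)
    high = sum(PHASES[n][1] for n in names)
    return low, high

all_phases = list(PHASES)
day1 = all_phases[:3]   # Install + Connections + Project
day2 = all_phases[3:]   # First investigations + Instructions

print("Total:", total_range(all_phases))  # (50, 75)
print("Day 1:", total_range(day1))        # (20, 30)
print("Day 2:", total_range(day2))        # (30, 45)
```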
Prerequisites
Before starting, ensure you have:
- NeuBird account (book demo if needed)
- NeuBird MCP installed (see Installation Guide)
- AI client configured (Claude Desktop, Claude Code, Cursor, or GitHub Copilot)
- Cloud provider access (AWS, Azure, or GCP credentials)
- Monitoring tool credentials (Datadog, PagerDuty, New Relic, etc.)
Installation Required
If you haven’t installed NeuBird MCP yet, follow the Installation Guide first, then return here.
Phase 1: Add Connections
Connect your cloud providers and monitoring tools to NeuBird.
Step 1.1: Understand Connections
NeuBird needs access to your infrastructure to investigate incidents:
| Connection Type | What It Provides | Required? |
|---|---|---|
| AWS | CloudWatch logs/metrics, EC2, RDS, Lambda, etc. | ✅ Recommended |
| Azure | Azure Monitor, App Insights, VMs | ✅ If using Azure |
| GCP | Cloud Logging, Monitoring, Compute | ✅ If using GCP |
| Grafana | Prometheus, Loki, Tempo | ✅ If using Grafana with Kubernetes |
| Datadog | Logs, metrics, traces, APM | 🟡 Optional but helpful |
| Dynatrace | APM, infrastructure monitoring | 🟡 Optional |
| PagerDuty | Alert management, on-call schedules | 🟡 Optional |
| ServiceNow | Alert management, on-call schedules | 🟡 Optional |
| FireHydrant | Alert management, on-call schedules | 🟡 Optional |
| Incident.io | Alert management, on-call schedules | 🟡 Optional |
Start minimal
Begin with your primary cloud provider + one monitoring tool + one incident management tool. You can add more connections later.
Step 1.2: Create AWS Connection
Ask Claude:
Create an AWS connection for my production environment
Claude will guide you through providing:
- IAM Role ARN and ExternalId
- Regions to monitor
- Connection name (e.g., “Production AWS”)
Example:
✓ Created AWS connection: Production AWS
Role ARN: arn:aws:iam::<account-id>:role/neubird-readonly
Status: Syncing (this may take 5-10 minutes)
ExternalID: <your-external-id>
Regions: us-east-1, us-west-2

Learn more about creating your AWS role here: AWS Connection Setup
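The IAM role and ExternalId requested above follow the standard AWS cross-account pattern: the role's trust policy lets an external principal call sts:AssumeRole only when it presents the agreed external ID. The sketch below builds such a trust policy as JSON. The account ID and external ID are placeholders, and the exact principal and permissions NeuBird requires are assumptions here; the AWS Connection Setup page remains the authority.

```python
import json

def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return an IAM trust policy that allows cross-account AssumeRole
    only when the caller supplies the expected ExternalId."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }

# Placeholder values (hypothetical): substitute the ones from your own setup.
policy = build_trust_policy("111122223333", "<your-external-id>")
print(json.dumps(policy, indent=2))
```

The ExternalId condition guards against the "confused deputy" problem: a third party that learns your role ARN still cannot assume the role without the shared ID.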
Step 1.3: Wait for Connection Sync
Check sync status:
Check the status of my AWS connection
Response:
Connection: Production AWS
Status: SYNCED ✓
Last sync: 2 minutes ago

Resources discovered:
- 45 EC2 instances
- 12 RDS databases
- 23 Lambda functions
- 156 CloudWatch alarms
First sync takes time
Initial sync can take 2-5 minutes depending on resource count. You can proceed to the next phase while it syncs.
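If you script the wait rather than asking Claude repeatedly, a polling loop with a timeout is the usual pattern. The sketch below is generic: `fetch_status` is a hypothetical callable standing in for whatever returns the connection status, and only the SYNCED status string is taken from the example above.

```python
import time
from typing import Callable

def wait_for_sync(fetch_status: Callable[[], str],
                  timeout_s: float = 600.0,
                  interval_s: float = 15.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll fetch_status() until it reports SYNCED or the timeout elapses.

    Returns True if the connection synced in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_status().startswith("SYNCED"):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)

# Usage with a fake status source that reports SYNCED on the third check.
statuses = iter(["Syncing", "Syncing", "SYNCED"])
print(wait_for_sync(lambda: next(statuses), sleep=lambda _: None))
```

Passing `sleep` as a parameter keeps the loop testable without real delays.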
Step 1.4: Add Monitoring Tools (Optional)
Add Datadog:
Add a Datadog connection with my API key and app key
Learn more about creating your Datadog API keys here: Datadog Connection Setup
Add PagerDuty:
Connect my PagerDuty account for alert management
Learn more about creating your PagerDuty API keys here: PagerDuty Connection Setup
Step 1.5: Verify Connections
List all my NeuBird connections
Expected output:
Found 2 connections:

1. Production AWS (AWS)
   Status: SYNCED and TRAINED ✅
   Telemetry Types: Alert, Config, Log, Metric

2. Production Datadog (Datadog)
   Status: SYNCED and TRAINED ✅
   Telemetry Types: Alert, Log, Metric
Phase 2: Create Project
Projects organize your connections, instructions, and investigation history.
Step 2.1: Understand Projects
A NeuBird project organizes:
- Connections - Which cloud/monitoring tools to use
- Instructions - How to investigate incidents
- Sessions - Investigation history
Most teams start with one project per environment (Production, Staging, etc.).
Step 2.2: Create Project
Ask Claude:
Create a new NeuBird project called "Production"
Claude will use neubird_create_project:
✓ Created project "Production"
UUID: abc-123-def-456
Status: Active

Next steps:
1. Add connections
2. Configure instructions
3. Start investigating
Save the Project UUID - You’ll need it for later steps.
Step 2.3: Set as Default Project
Set this project as your default to avoid specifying project_uuid in every command:
Set Production as my default project
Or use the UUID directly:
neubird_set_default_project(project_uuid="abc-123-def-456")
Claude will confirm:
✓ Default project set to: Production
UUID: abc-123-def-456

All operations will now use this project by default.
Benefits:
- 🎯 No need to specify project_uuid in commands
- 🔄 Easy switching between projects
- 💬 Use natural language: “Switch to Staging project”
Multiple Projects
If you create multiple projects (Production, Staging, Dev), you can quickly switch between them using neubird_set_default_project. The default persists for your entire MCP session.
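For readers curious about what happens behind the natural-language request: an MCP client invokes a tool by sending a JSON-RPC `tools/call` message to the server. The sketch below shows what such a message could look like for neubird_set_default_project. The tool name and the project_uuid argument come from this guide; the request id is arbitrary, and no other details of NeuBird's tool schema are assumed.

```python
import json

def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message, indent=2)

print(tool_call_request(
    1,
    "neubird_set_default_project",
    {"project_uuid": "abc-123-def-456"},  # example UUID from this guide
))
```

In normal use you never write this by hand; the AI client builds it from your request.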
Step 2.4: Link Connections to Project
Add my AWS and Datadog connections to the Production project
Claude will use neubird_add_connection_to_project:
✓ Added 2 connections to Production project
- Production AWS (AWS)
- Production Datadog (Datadog)

Project is now ready for investigations!
Step 2.5: Verify Project Setup
Show me details for my Production project
This uses neubird_get_project_details:
Production Project Details
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

UUID: abc-123-def-456
Status: Active
Created: Just now

Connections (2):
- Production AWS (AWS) - SYNCED
- Production Datadog (Datadog) - SYNCED

Instructions: None yet
Sessions: None yet
Phase 3: Run First Investigation
Test the system with a manual investigation before waiting for real alerts.
Step 3.1: Create a Test Investigation
Create a manual investigation to verify everything works:
Investigate potential memory leak in user-api pods that I noticed this morning.
Memory usage increased from 500MB to 1.2GB between 8am-10am UTC today.
No alerts fired yet but trending upward.
Claude will use neubird_create_manual_investigation:
✓ Created manual investigation
Session UUID: xyz-789-abc-123
Status: Running

Investigation will complete in 2-5 minutes.
Analyzing:
- CloudWatch metrics for memory usage
- Pod restart patterns
- Application logs
- Resource allocation changes
Step 3.2: Wait for Completion
You can check status:
What's the status of my investigation?
Or wait 2-5 minutes and then get the RCA.
Step 3.3: Review the RCA
Show me the root cause analysis
Claude uses neubird_get_rca:
Root Cause Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Incident: Memory Leak - user-api
Severity: P2 (High)
Duration: Ongoing (2 hours)
Status: Active

ROOT CAUSE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Memory leak caused by unclosed database connections
in the user session handler. Connection pool reached
maximum capacity, preventing cleanup of old connections.

TIMELINE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
08:00 UTC - Normal memory usage (500MB)
08:15 UTC - Gradual increase begins
09:00 UTC - Memory at 800MB (60% increase)
09:30 UTC - Connection pool at 85% capacity
10:00 UTC - Memory at 1.2GB (140% increase)
10:15 UTC - Connection pool at max (100 connections)

CORRECTIVE ACTIONS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Immediate:
1. Restart affected pods to clear connections
kubectl rollout restart deployment/user-api
2. Temporarily increase connection pool timeout
kubectl set env deployment/user-api DB_POOL_TIMEOUT=30000

Long-term fixes:
1. Fix connection leak in code (user-session.js:45)
2. Implement connection pool monitoring
3. Add automated cleanup for stale connections

TIME SAVED:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Manual investigation time: ~30 minutes
NeuBird investigation time: 2 minutes
Time saved: 28 minutes ⚡
Step 3.4: Understanding the Output
Every RCA includes:
- Incident Summary - What happened and severity
- Root Cause - Why it happened with technical details
- Timeline - Chronological event sequence
- Corrective Actions - Ready-to-execute fixes
- Time Savings - Efficiency metrics
Test Complete!
If you received a detailed RCA like the one above, your setup is working correctly. The system successfully:
- Connected to your cloud provider
- Analyzed metrics and logs
- Generated actionable insights
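When reviewing an RCA, it is worth spot-checking the numbers it reports. As a small illustration, the percentages in the sample timeline from Step 3.3 can be recomputed from the memory readings. The sketch assumes 1.2GB is read as 1200MB (decimal units).

```python
def percent_increase(start_mb: float, end_mb: float) -> float:
    """Percentage growth from start_mb to end_mb."""
    return (end_mb - start_mb) / start_mb * 100

baseline = 500  # MB at 08:00 UTC, from the sample timeline
for label, value in [("09:00", 800), ("10:00", 1200)]:
    print(f"{label} UTC: {percent_increase(baseline, value):.0f}% increase")
```

Both values match the 60% and 140% figures shown in the sample RCA.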
Phase 4: Find & Investigate Real Alerts
Now that the system is working, investigate real incidents from your infrastructure.
Step 4.1: List Uninvestigated Alerts
Show me uninvestigated incidents from the last 24 hours
This uses neubird_list_sessions with only_uninvestigated=true:
Found 3 uninvestigated incidents:

1. High API Latency - user-service
   Alert ID: /aws/cloudwatch/alerts/latency-spike-123
   Severity: P1
   Time: 2 hours ago
   Source: CloudWatch

2. Database Connection Pool Exhausted
   Alert ID: /datadog/alerts/db-pool-456
   Severity: P2
   Time: 4 hours ago
   Source: Datadog

3. Lambda Cold Start Timeout
   Alert ID: /aws/cloudwatch/alerts/lambda-789
   Severity: P3
   Time: 6 hours ago
   Source: CloudWatch
Step 4.2: Investigate an Alert
Investigate the high API latency incident
Claude will:
- Extract the alert_id
- Call neubird_investigate_alert
- Wait for completion (30-90 seconds)
- Retrieve RCA automatically
🔍 Starting investigation...
⏳ Analyzing CloudWatch logs... (20s)
⏳ Checking application traces... (15s)
⏳ Reviewing database metrics... (10s)
⏳ Correlating events... (15s)

✓ Investigation complete! (60s total)
Step 4.3: Review and Act
The RCA will provide:
- Root cause explanation
- Timeline of events
- Corrective actions (with bash commands)
- Preventive measures
Execute the recommended actions:
# Example corrective action from RCA
kubectl scale deployment/user-service --replicas=5
Step 4.4: Ask Follow-Up Questions
Why wasn't this caught in staging?
Has this happened before?
What can we do to prevent it?
Claude uses neubird_continue_investigation to provide deeper insights.
Phase 5: Configure Instructions
Fine-tune how NeuBird investigates incidents by adding instructions.
Step 5.1: Understand Instruction Types
| Type | Purpose | When to Use | Example |
|---|---|---|---|
| SYSTEM | Provide context about your architecture | Always (start with 1-2) | “We use microservices on Kubernetes” |
| FILTER | Reduce noise by filtering low-priority alerts | When getting too many alerts | “Only investigate P1 and P2 incidents” |
| RCA | Guide investigation steps for specific scenarios | For common incident types | “For database issues, check slow queries first” |
| GROUPING | Group related alerts together | When seeing duplicate alerts | “Group alerts from same service within 5 min” |
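To make the GROUPING example in the table concrete, here is one way "group alerts from same service within 5 min" could be interpreted. This is a sketch of the idea only: NeuBird instructions are written in natural language, and how the platform applies them is not documented here. Anchoring the window at the first alert of each group is an assumption, and the service names and timestamps are made up.

```python
from datetime import datetime, timedelta

def group_alerts(alerts, window=timedelta(minutes=5)):
    """Group (service, timestamp) alerts: an alert joins the latest group
    for its service if it arrives within `window` of that group's first alert."""
    groups = []       # list of groups, each a list of (service, timestamp)
    latest = {}       # service -> index of its most recent group
    for service, ts in sorted(alerts, key=lambda a: a[1]):
        idx = latest.get(service)
        if idx is not None and ts - groups[idx][0][1] <= window:
            groups[idx].append((service, ts))
        else:
            latest[service] = len(groups)
            groups.append([(service, ts)])
    return groups

t0 = datetime(2024, 1, 1, 9, 0)
alerts = [
    ("user-service", t0),
    ("user-service", t0 + timedelta(minutes=3)),
    ("db", t0 + timedelta(minutes=4)),
    ("user-service", t0 + timedelta(minutes=9)),
]
for group in group_alerts(alerts):
    print(group[0][0], len(group))
```

Here the first two user-service alerts collapse into one group, while the alert nine minutes in starts a new one.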
Step 5.2: Add SYSTEM Instruction
Provide high-level context:
Create a SYSTEM instruction for my Production project:

"Our infrastructure runs on AWS EKS with 15 microservices.
We use PostgreSQL for databases and Redis for caching.
Peak traffic is 9am-5pm EST. We have auto-scaling enabled
for all services with min 2, max 10 replicas."
Claude uses neubird_create_project_instruction:
✓ Created SYSTEM instruction
Type: SYSTEM
Status: Active

This context will be used in all future investigations
to provide more relevant analysis.
Step 5.3: Add FILTER Instruction
Reduce noise:
Create a FILTER instruction for my Production project:

"Only investigate incidents with severity P1 (Critical)
or P2 (High). Ignore P3 and P4 alerts unless they occur
more than 5 times in 10 minutes."
This prevents low-priority alerts from creating unnecessary investigations.
Step 5.4: Add RCA Instruction
Guide investigation for common scenarios:
Create an RCA instruction for my Production project:

"For database-related incidents:
1. Check for slow queries (>1 second)
2. Review connection pool usage
3. Analyze index usage with EXPLAIN
4. Check for table locks or deadlocks
5. Suggest specific index improvements"
Step 5.5: Test Instructions on Past Incidents
Before deploying instructions broadly, test them:
I want to test this new RCA instruction on the database
incident from yesterday. Apply it to that session and rerun
the investigation.
Claude will:
- Apply instruction to specific session
- Rerun the investigation
- Compare new RCA vs original
- Help you decide if it improved the results
See Testing Instructions Guide for detailed workflow.
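If you save the original and rerun RCA text yourself, a plain text diff is a quick way to see what the new instruction changed. The sketch below uses Python's standard difflib module. The RCA strings are invented for illustration and are not real NeuBird output.

```python
import difflib

def diff_rca(original: str, rerun: str) -> str:
    """Return a unified diff between two RCA texts (empty if identical)."""
    lines = difflib.unified_diff(
        original.splitlines(),
        rerun.splitlines(),
        fromfile="original_rca",
        tofile="rerun_rca",
        lineterm="",
    )
    return "\n".join(lines)

# Illustrative text only.
before = "ROOT CAUSE:\nDatabase latency increased.\n"
after = "ROOT CAUSE:\nDatabase latency increased.\nSlow query on an unindexed column.\n"
print(diff_rca(before, after))
```

Lines prefixed with `+` appear only in the rerun, which makes added findings easy to spot.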
Step 5.6: Verify Instructions
List all instructions for my Production project
Found 3 instructions:

1. Architecture Context (SYSTEM)
   Status: Active
   Created: 10 minutes ago

2. Priority Filter (FILTER)
   Status: Active
   Created: 5 minutes ago

3. Database Investigation Steps (RCA)
   Status: Active
   Created: 2 minutes ago
Optimize & Refine
Continuous improvement tips for getting the most from NeuBird.
Monitor Investigation Quality
Show me analytics for the last 7 days
Review metrics:
- MTTR (Mean Time To Resolution)
- Time Saved vs manual investigation
- Investigation Quality Scores
- Noise Reduction from filtering
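As a reference point for the first metric, MTTR is conventionally the average of resolution time minus open time across resolved incidents. The sketch below computes it from made-up sample data. How NeuBird derives its own analytics is not documented here, so treat this as the textbook definition rather than the product's formula.

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents):
    """Average of (resolved - opened) over a list of (opened, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)

# Made-up sample data for illustration only.
t = datetime(2024, 1, 1, 9, 0)
sample = [
    (t, t + timedelta(minutes=30)),
    (t, t + timedelta(minutes=50)),
    (t, t + timedelta(minutes=10)),
]
print(mean_time_to_resolution(sample))  # 0:30:00
```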
Refine Instructions
Based on investigation results:
- Too many investigations? → Add/adjust FILTER instructions
- Missing context? → Update SYSTEM instructions
- Incomplete RCA? → Add specific RCA instructions
- Duplicate alerts? → Configure GROUPING instructions
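When adjusting a FILTER instruction, it can help to reason about the rule as explicit logic. The sketch below models the rule from Step 5.3: always investigate P1 and P2, and investigate P3 and P4 only when they occur more than 5 times in 10 minutes. It is an illustration, not NeuBird's implementation. Counting occurrences per alert name and assuming timestamps arrive in order are both assumptions made for the example.

```python
from collections import deque
from datetime import datetime, timedelta

class PriorityFilter:
    """Investigate P1/P2 always; P3/P4 only if the same alert fires
    more than `threshold` times within `window`."""

    def __init__(self, threshold=5, window=timedelta(minutes=10)):
        self.threshold = threshold
        self.window = window
        self.recent = {}  # alert name -> deque of recent timestamps

    def should_investigate(self, name, severity, ts):
        if severity in ("P1", "P2"):
            return True
        times = self.recent.setdefault(name, deque())
        times.append(ts)
        # Drop occurrences that have fallen out of the sliding window.
        while ts - times[0] > self.window:
            times.popleft()
        return len(times) > self.threshold

f = PriorityFilter()
t0 = datetime(2024, 1, 1, 9, 0)
print(f.should_investigate("latency", "P1", t0))
print([f.should_investigate("cold-start", "P3", t0 + timedelta(minutes=i))
       for i in range(6)])
```

In the usage above, the P1 alert is investigated immediately, while the P3 alert only passes the filter on its sixth occurrence within the window.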
Add More Connections
As your needs grow:
Add Azure connection for our secondary region
Add New Relic for application monitoring
Create Environment-Specific Projects
Create "Staging" project
Create "Development" project
Each project can have different:
- Connections (staging vs prod resources)
- Instructions (different investigation depth)
- Alert thresholds
Next Steps
- Daily Workflows: Common investigation patterns for day-to-day use
- Testing Instructions: Learn to validate and test instructions safely
- Advanced Workflows: Power user techniques and optimization
Summary Checklist
By the end of this guide, you should have:
- Installed and configured NeuBird MCP
- Added cloud and monitoring connections (AWS, Datadog, etc.)
- Created your first project
- Set it as your default project
- Run a test investigation
- Investigated real alerts
- Configured investigation instructions
- Reviewed RCA and executed corrective actions
- Tested and refined instructions
Congratulations! You’re now ready to use NeuBird MCP for autonomous incident investigation.
Getting Help
- Inline Help: Ask Claude “How do I…” and use the guidance system
- Documentation: Browse other guides
- Examples: See real-world examples
- Support: Contact NeuBird for assistance