
Complete Onboarding

Step-by-step guide to set up NeuBird MCP and start investigating incidents.

This guide walks you through the complete onboarding process for NeuBird MCP:

```mermaid
graph LR
    A[Install MCP] --> B[Add Connections]
    B --> C[Create Project]
    C --> D[Run Test Investigation]
    D --> E[Find & Investigate Alerts]
    E --> F[Configure Instructions]
    F --> G[Optimize & Refine]
```

📦 Prerequisites & Installation (5-10 minutes)

  • Install NeuBird MCP Server and configure your AI client
  • See Installation Guide for details

🔗 Phase 1: Add Connections (10-15 minutes)

  • Connect your cloud providers (AWS, Azure, GCP)
  • Link monitoring tools (Datadog, PagerDuty, etc.)
  • Wait for initial data sync

📁 Phase 2: Create Project (5 minutes)

  • Set up your first project
  • Link connections to the project
  • Set as default project

🔍 Phase 3: Run First Investigation (5-10 minutes)

  • Create a manual investigation to test the system
  • Review the RCA (Root Cause Analysis)
  • Understand the investigation output

🎯 Phase 4: Find & Investigate Real Alerts (10-15 minutes)

  • List uninvestigated alerts from your systems
  • Investigate a real incident
  • Execute corrective actions

📋 Phase 5: Configure Instructions (15-20 minutes)

  • Add SYSTEM instructions for context
  • Create FILTER instructions to reduce noise
  • Test RCA instructions on past incidents

Total time: 50-75 minutes

Phased Approach

You don’t need to complete everything in one sitting! Many teams split this across multiple sessions:

  • Day 1: Install + Connections + Project (20-30 min)
  • Day 2: First investigations + Instructions (30-45 min)

Before starting, ensure you have:

  • NeuBird account (book demo if needed)
  • NeuBird MCP installed (see Installation Guide)
  • AI client configured (Claude Desktop, Claude Code, Cursor, or GitHub Copilot)
  • Cloud provider access (AWS, Azure, or GCP credentials)
  • Monitoring tool credentials (Datadog, PagerDuty, New Relic, etc.)

Installation Required

If you haven’t installed NeuBird MCP yet, follow the Installation Guide first, then return here.

Connect your cloud providers and monitoring tools to NeuBird.

NeuBird needs access to your infrastructure to investigate incidents:

| Connection Type | What It Provides | Required? |
| --- | --- | --- |
| AWS | CloudWatch logs/metrics, EC2, RDS, Lambda, etc. | ✅ Recommended |
| Azure | Azure Monitor, App Insights, VMs | ✅ If using Azure |
| GCP | Cloud Logging, Monitoring, Compute | ✅ If using GCP |
| Grafana | Prometheus, Loki, Tempo | ✅ If using Grafana with Kubernetes |
| Datadog | Logs, metrics, traces, APM | 🟡 Optional but helpful |
| Dynatrace | APM, infrastructure monitoring | 🟡 Optional |
| PagerDuty | Alert management, on-call schedules | 🟡 Optional |
| ServiceNow | Alert management, on-call schedules | 🟡 Optional |
| FireHydrant | Alert management, on-call schedules | 🟡 Optional |
| Incident.io | Alert management, on-call schedules | 🟡 Optional |

Start minimal

Begin with your primary cloud provider + one monitoring tool + one incident management tool. You can add more connections later.

Ask Claude:

Create an AWS connection for my production environment

Claude will guide you through providing:

  • IAM Role ARN and ExternalId
  • Regions to monitor
  • Connection name (e.g., “Production AWS”)

Example:

✓ Created AWS connection: Production AWS
Role ARN: arn:aws:iam::<account-id>:role/neubird-readonly
ExternalID: <your-external-id>
Regions: us-east-1, us-west-2
Status: Syncing (this may take 5-10 minutes)

Learn more about creating your AWS role here: AWS Connection Setup
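For reference, the trust policy on that role generally takes the shape below. This is a hedged sketch: `<neubird-account-id>` and `<your-external-id>` are placeholders, and the exact policy in the AWS Connection Setup guide is authoritative.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<neubird-account-id>:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<your-external-id>" }
      }
    }
  ]
}
```

The `sts:ExternalId` condition is what prevents the confused-deputy problem: only callers that present your ExternalId can assume the role.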

Check sync status:

Check the status of my AWS connection

Response:

Connection: Production AWS
Status: SYNCED ✓
Last sync: 2 minutes ago
Resources discovered:
- 45 EC2 instances
- 12 RDS databases
- 23 Lambda functions
- 156 CloudWatch alarms

First sync takes time

Initial sync can take 5-10 minutes depending on resource count. You can proceed to the next phase while it syncs. Be patient, it’s worth it!

Add Datadog:

Add a Datadog connection with my API key and app key

Learn more about creating your Datadog API keys here: Datadog Connection Setup

Add PagerDuty:

Connect my PagerDuty account for alert management

Learn more about creating your PagerDuty API keys here: PagerDuty Connection Setup

List all my NeuBird connections

Expected output:

Found 2 connections:
1. Production AWS (AWS)
Status: SYNCED and TRAINED ✅
Telemetry Types: Alert, Config, Log, Metric
2. Production Datadog (Datadog)
Status: SYNCED and TRAINED ✅
Telemetry Types: Alert, Log, Metric

Projects organize your connections, instructions, and investigation history.

A NeuBird project organizes:

  • Connections - Which cloud/monitoring tools to use
  • Instructions - How to investigate incidents
  • Sessions - Investigation history

Most teams start with one project per environment (Production, Staging, etc.).

Ask Claude:

Create a new NeuBird project called "Production"

Claude will use neubird_create_project:

✓ Created project "Production"
UUID: abc-123-def-456
Status: Active
Next steps:
1. Add connections
2. Configure instructions
3. Start investigating

Save the Project UUID - You’ll need it for later steps.

Set this project as your default to avoid specifying project_uuid in every command:

Set Production as my default project

Or use the UUID directly:

neubird_set_default_project(project_uuid="abc-123-def-456")

Claude will confirm:

✓ Default project set to: Production
UUID: abc-123-def-456
All operations will now use this project by default.

Benefits:

  • 🎯 No need to specify project_uuid in commands
  • 🔄 Easy switching between projects
  • 💬 Use natural language: “Switch to Staging project”

Multiple Projects

If you create multiple projects (Production, Staging, Dev), you can quickly switch between them using neubird_set_default_project. The default persists for your entire MCP session.

Add my AWS and Datadog connections to the Production project

Claude will use neubird_add_connection_to_project:

✓ Added 2 connections to Production project
- Production AWS (AWS)
- Production Datadog (Datadog)
Project is now ready for investigations!

Show me details for my Production project

This uses neubird_get_project_details:

Production Project Details
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
UUID: abc-123-def-456
Status: Active
Created: Just now
Connections (2):
- Production AWS (AWS) - SYNCED
- Production Datadog (Datadog) - SYNCED
Instructions: None yet
Sessions: None yet

Test the system with a manual investigation before waiting for real alerts.

Create a manual investigation to verify everything works:

Investigate potential memory leak in user-api pods that I noticed this morning.
Memory usage increased from 500MB to 1.2GB between 8am-10am UTC today.
No alerts fired yet but trending upward.

Claude will use neubird_create_manual_investigation:

✓ Created manual investigation
Session UUID: xyz-789-abc-123
Status: Running
Investigation will complete in 2-5 minutes.
Analyzing:
- CloudWatch metrics for memory usage
- Pod restart patterns
- Application logs
- Resource allocation changes

You can check status:

What's the status of my investigation?

Or wait 2-5 minutes and then get the RCA.

Show me the root cause analysis

Claude uses neubird_get_rca:

Root Cause Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Incident: Memory Leak - user-api
Severity: P2 (High)
Duration: Ongoing (2 hours)
Status: Active
ROOT CAUSE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Memory leak caused by unclosed database connections
in the user session handler. Connection pool reached
maximum capacity, preventing cleanup of old connections.
TIMELINE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
08:00 UTC - Normal memory usage (500MB)
08:15 UTC - Gradual increase begins
09:00 UTC - Memory at 800MB (60% increase)
09:30 UTC - Connection pool at 85% capacity
10:00 UTC - Memory at 1.2GB (140% increase)
10:15 UTC - Connection pool at max (100 connections)
CORRECTIVE ACTIONS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Immediate:
1. Restart affected pods to clear connections
kubectl rollout restart deployment/user-api
2. Temporarily increase connection pool timeout
kubectl set env deployment/user-api DB_POOL_TIMEOUT=30000
Long-term fixes:
1. Fix connection leak in code (user-session.js:45)
2. Implement connection pool monitoring
3. Add automated cleanup for stale connections
TIME SAVED:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Manual investigation time: ~30 minutes
NeuBird investigation time: 2 minutes
Time saved: 28 minutes ⚡

Every RCA includes:

  1. Incident Summary - What happened and severity
  2. Root Cause - Why it happened with technical details
  3. Timeline - Chronological event sequence
  4. Corrective Actions - Ready-to-execute fixes
  5. Time Savings - Efficiency metrics

Test Complete!

If you got a detailed RCA like above, your setup is working correctly! The system successfully:

  • Connected to your cloud provider
  • Analyzed metrics and logs
  • Generated actionable insights

Now that the system is working, investigate real incidents from your infrastructure.

Show me uninvestigated incidents from the last 24 hours

This uses neubird_list_sessions with only_uninvestigated=true:

Found 3 uninvestigated incidents:
1. High API Latency - user-service
Alert ID: /aws/cloudwatch/alerts/latency-spike-123
Severity: P1
Time: 2 hours ago
Source: CloudWatch
2. Database Connection Pool Exhausted
Alert ID: /datadog/alerts/db-pool-456
Severity: P2
Time: 4 hours ago
Source: Datadog
3. Lambda Cold Start Timeout
Alert ID: /aws/cloudwatch/alerts/lambda-789
Severity: P3
Time: 6 hours ago
Source: CloudWatch

Investigate the high API latency incident

Claude will:

  1. Extract the alert_id
  2. Call neubird_investigate_alert
  3. Wait for completion (30-90 seconds)
  4. Retrieve RCA automatically

🔍 Starting investigation...
⏳ Analyzing CloudWatch logs... (20s)
⏳ Checking application traces... (15s)
⏳ Reviewing database metrics... (10s)
⏳ Correlating events... (15s)
✓ Investigation complete! (60s total)

The RCA will provide:

  • Root cause explanation
  • Timeline of events
  • Corrective actions (with bash commands)
  • Preventive measures

Execute the recommended actions:

```sh
# Example corrective action from RCA
kubectl scale deployment/user-service --replicas=5
```

Then dig deeper with follow-up questions:

Why wasn't this caught in staging?
Has this happened before?
What can we do to prevent it?

Claude uses neubird_continue_investigation to provide deeper insights.

Fine-tune how NeuBird investigates incidents by adding instructions.

| Type | Purpose | When to Use | Example |
| --- | --- | --- | --- |
| SYSTEM | Provide context about your architecture | Always (start with 1-2) | "We use microservices on Kubernetes" |
| FILTER | Reduce noise by filtering low-priority alerts | When getting too many alerts | "Only investigate P1 and P2 incidents" |
| RCA | Guide investigation steps for specific scenarios | For common incident types | "For database issues, check slow queries first" |
| GROUPING | Group related alerts together | When seeing duplicate alerts | "Group alerts from same service within 5 min" |
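The GROUPING example above ("same service within 5 min") can be pictured as a per-service time-window bucket. This is only an illustrative sketch of the idea — NeuBird applies grouping server-side from your natural-language instruction, and none of these names are NeuBird APIs:

```python
from datetime import datetime, timedelta

def group_alerts(alerts, window=timedelta(minutes=5)):
    """Group alerts from the same service that arrive within `window`
    of the group's first alert. Illustrative only, not NeuBird's code."""
    groups = []  # each group: {"service", "start", "alerts"}
    for service, ts in sorted(alerts, key=lambda a: a[1]):
        for g in groups:
            if g["service"] == service and ts - g["start"] <= window:
                g["alerts"].append(ts)
                break
        else:
            groups.append({"service": service, "start": ts, "alerts": [ts]})
    return groups

t0 = datetime(2024, 1, 1, 8, 0)
alerts = [
    ("user-api", t0),
    ("user-api", t0 + timedelta(minutes=3)),  # within 5 min: same group
    ("user-api", t0 + timedelta(minutes=9)),  # outside window: new group
    ("billing", t0 + timedelta(minutes=1)),   # different service: own group
]
print(len(group_alerts(alerts)))  # 3
```

Four raw alerts collapse into three groups, so duplicate firings of the same incident produce one investigation instead of several.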

Provide high-level context:

Create a SYSTEM instruction for my Production project:
"Our infrastructure runs on AWS EKS with 15 microservices.
We use PostgreSQL for databases and Redis for caching.
Peak traffic is 9am-5pm EST. We have auto-scaling enabled
for all services with min 2, max 10 replicas."

Claude uses neubird_create_project_instruction:

✓ Created SYSTEM instruction
Type: SYSTEM
Status: Active
This context will be used in all future investigations
to provide more relevant analysis.

Reduce noise:

Create a FILTER instruction for my Production project:
"Only investigate incidents with severity P1 (Critical)
or P2 (High). Ignore P3 and P4 alerts unless they occur
more than 5 times in 10 minutes."

This prevents low-priority alerts from creating unnecessary investigations.
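Conceptually, that FILTER instruction behaves like the predicate below. NeuBird evaluates your instruction server-side; this Python sketch just makes the decision logic concrete, and the function name is illustrative:

```python
def should_investigate(severity: str, occurrences_in_10_min: int) -> bool:
    """Illustrative filter: P1/P2 always pass; P3/P4 only when they
    recur more than 5 times in 10 minutes."""
    if severity in ("P1", "P2"):
        return True
    return occurrences_in_10_min > 5

print(should_investigate("P2", 1))  # True: high severity always passes
print(should_investigate("P3", 1))  # False: one-off low-priority alert
print(should_investigate("P3", 6))  # True: recurring low-priority alert
```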

Guide investigation for common scenarios:

Create an RCA instruction for my Production project:
"For database-related incidents:
1. Check for slow queries (>1 second)
2. Review connection pool usage
3. Analyze index usage with EXPLAIN
4. Check for table locks or deadlocks
5. Suggest specific index improvements"

Step 5.5: Test Instructions on Past Incidents


Before deploying instructions broadly, test them:

I want to test this new RCA instruction on the database
incident from yesterday. Apply it to that session and rerun
the investigation.

Claude will:

  1. Apply instruction to specific session
  2. Rerun the investigation
  3. Compare new RCA vs original
  4. Help you decide if it improved the results

See Testing Instructions Guide for detailed workflow.

List all instructions for my Production project

Found 3 instructions:
1. Architecture Context (SYSTEM)
Status: Active
Created: 10 minutes ago
2. Priority Filter (FILTER)
Status: Active
Created: 5 minutes ago
3. Database Investigation Steps (RCA)
Status: Active
Created: 2 minutes ago

Continuous improvement tips for getting the most from NeuBird.

Show me analytics for the last 7 days

Review metrics:

  • MTTR (Mean Time To Resolution)
  • Time Saved vs manual investigation
  • Investigation Quality Scores
  • Noise Reduction from filtering
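MTTR, the headline metric, is simply the mean of per-incident resolution times. A quick illustrative calculation (not NeuBird's reporting code — the numbers are made up):

```python
from datetime import timedelta

def mttr(resolution_times: list[timedelta]) -> timedelta:
    """Mean Time To Resolution: average of per-incident durations."""
    return sum(resolution_times, timedelta()) / len(resolution_times)

incidents = [timedelta(minutes=12), timedelta(minutes=30), timedelta(minutes=18)]
print(mttr(incidents))  # 0:20:00
```

Watching this average fall week over week is the simplest signal that your instructions and filters are paying off.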

Based on investigation results:

  1. Too many investigations? → Add/adjust FILTER instructions
  2. Missing context? → Update SYSTEM instructions
  3. Incomplete RCA? → Add specific RCA instructions
  4. Duplicate alerts? → Configure GROUPING instructions

As your needs grow:

Add Azure connection for our secondary region
Add New Relic for application monitoring
Create "Staging" project
Create "Development" project

Each project can have different:

  • Connections (staging vs prod resources)
  • Instructions (different investigation depth)
  • Alert thresholds
Where to go next:

  • Daily Workflows - Common investigation patterns for day-to-day use
  • Testing Instructions - Learn to validate and test instructions safely
  • Advanced Workflows - Power user techniques and optimization

By the end of this guide, you should have:

  • Installed and configured NeuBird MCP
  • Added cloud connections (AWS, Datadog, etc.)
  • Created your first project
  • Set it as your default project
  • Run a test investigation
  • Investigated real alerts
  • Configured investigation instructions
  • Reviewed RCA and executed corrective actions
  • Tested and refined instructions

Congratulations! You’re now ready to use NeuBird MCP for autonomous incident investigation.