How It Works

NeuBird takes a fundamentally different approach to AI-powered operations. Instead of summarizing dashboards or running predefined playbooks, it performs genuine investigation — querying raw telemetry, following the evidence, and building a root cause analysis from first principles.

Predictive Analysis

NeuBird doesn’t just react to incidents — it predicts them. By continuously analyzing trends across your telemetry stack, NeuBird identifies degradation signatures before they trigger alerts.

How prediction works:

- Trend correlation: NeuBird reads metrics, logs, traces, incidents, and topology data together. It detects patterns that span multiple systems, for example a slow memory leak in one service combined with increasing queue depth in another.
- Capacity forecasting: By analyzing resource consumption trends over time, NeuBird projects when you’ll hit limits such as disk, memory, connection pools, or certificate expirations, and flags them before they become critical.
- Anomaly detection: NeuBird learns the baseline behavior of your services through historical telemetry. When it detects statistically significant deviations, even ones that haven’t yet crossed alerting thresholds, it raises them proactively.
- Change risk assessment: When a deployment or configuration change is detected, NeuBird evaluates it against historical patterns of similar changes to assess the likelihood of negative impact.
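To make the capacity forecasting idea concrete, here is a minimal sketch that fits a straight line to recent usage samples and projects when the line reaches a limit. It is an illustration only, not NeuBird's actual model: the function name `hours_until_limit`, the sample values, and the choice of a simple least-squares trend are all assumptions.

```python
from typing import Optional, Sequence


def hours_until_limit(
    hours: Sequence[float], usage: Sequence[float], limit: float
) -> Optional[float]:
    """Fit a least-squares line to (hours, usage) and project when it reaches `limit`.

    Returns the hours remaining after the last sample, or None if usage is
    flat or falling (no projected exhaustion).
    """
    n = len(hours)
    if n < 2 or n != len(usage):
        raise ValueError("need at least two paired samples")
    mean_t = sum(hours) / n
    mean_u = sum(usage) / n
    var_t = sum((t - mean_t) ** 2 for t in hours)
    if var_t == 0:
        raise ValueError("timestamps must not all be identical")
    slope = sum((t - mean_t) * (u - mean_u) for t, u in zip(hours, usage)) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    crossing = (limit - intercept) / slope  # time at which the fitted line hits the limit
    return max(0.0, crossing - hours[-1])


# Hypothetical disk usage in percent, sampled every 6 hours.
sample_hours = [0, 6, 12, 18, 24]
disk_percent = [70.0, 71.5, 73.0, 74.5, 76.0]
remaining = hours_until_limit(sample_hours, disk_percent, limit=90.0)
print(f"projected hours until 90% disk: {remaining}")
```

A linear fit is the simplest possible trend model. Real consumption curves are often seasonal or bursty, so a production forecaster would likely need something more robust.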

Predictive analysis uses the same telemetry access layer and multi-model architecture described below — the difference is timing. Instead of investigating after an alert fires, NeuBird runs continuous sweeps (via health assessments and sentinel mode) to surface issues while they’re still cheap to fix.
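The anomaly detection idea described above, flagging a statistically significant deviation before it crosses an alert threshold, can be sketched with a plain z-score against a historical baseline. This is a simplified stand-in rather than NeuBird's method: the names `deviation_score` and `is_anomalous`, the 3-sigma cutoff, and the latency values are assumptions chosen for illustration.

```python
import statistics
from typing import Sequence


def deviation_score(baseline: Sequence[float], current: float) -> float:
    """Return how many baseline standard deviations `current` sits from the baseline mean."""
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return 0.0 if current == mean else float("inf")
    return (current - mean) / stdev


def is_anomalous(baseline: Sequence[float], current: float, z_threshold: float = 3.0) -> bool:
    """Flag values at least `z_threshold` standard deviations away from the baseline."""
    return abs(deviation_score(baseline, current)) >= z_threshold


# Hypothetical p95 latency history in milliseconds.
# Assume the static alert threshold is 500 ms, far above either reading below.
latency_baseline = [120, 118, 125, 122, 119, 121, 124, 120]
print(is_anomalous(latency_baseline, 140))  # well outside the baseline spread
print(is_anomalous(latency_baseline, 123))  # within normal variation
```

In this sketch a reading of 140 ms is flagged even though it is nowhere near the assumed 500 ms alert threshold, which is the point of baseline-relative detection.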

The Investigation Process

When an incident occurs or an investigation is started, NeuBird follows a systematic process.

Hypothesis Formation

NeuBird searches a vector database (ChromaDB) for similar past incidents and proven investigation patterns. This gives it a starting point — not a rigid playbook, but hypotheses to test against the current situation.

As NeuBird investigates more incidents in your environment, this knowledge base grows, making future investigations faster and more targeted.
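The retrieval step can be pictured as a nearest-neighbor search over incident embeddings. The sketch below uses a plain in-memory cosine similarity instead of the ChromaDB API, and the incident names and three-dimensional vectors are invented for illustration. Real embeddings would come from an embedding model and have far more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    query: Sequence[float], incidents: Dict[str, Sequence[float]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Rank stored incidents by similarity to the query vector, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in incidents.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical embeddings: (memory pressure, queue depth, error rate).
past_incidents = {
    "oom-kill in checkout service": [0.9, 0.1, 0.3],
    "queue backlog after deploy": [0.1, 0.9, 0.4],
    "expired TLS certificate": [0.0, 0.1, 0.9],
}
current_incident = [0.8, 0.2, 0.2]
candidates = most_similar(current_incident, past_incidents)
for name, score in candidates:
    print(f"{score:.2f}  {name}")
```

The top matches become hypotheses to test, not conclusions. A high similarity score only says the symptoms resemble a past incident.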

Dynamic Investigation Plan

Using the hypotheses from the Hypothesis Formation step and the details of the current incident, NeuBird constructs a detailed investigation plan. The plan adapts based on:
- The type of incident (performance, availability, configuration, deployment)
- Available telemetry sources (which monitoring tools are connected)
- Your architecture (what services and dependencies exist)
- Historical patterns (what investigation strategies have worked before)

No telemetry data is included in the prompts to the LLM; only metadata about what data sources are available.

Telemetry Program Generation

This is where NeuBird’s innovation stands out. Instead of sending your sensitive telemetry data to an LLM, NeuBird generates telemetry retrieval programs — code that fetches and processes the data.

A fine-tuned LLM writes these programs using NeuBird’s RAEL (telemetry reasoning language) with LARK grammar files, ensuring the generated queries are syntactically valid and produce reliable results. The LLM only handles program logic, never the data itself.
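The value of a grammar here is that a generated program can be rejected before it ever runs. The sketch below shows the idea with a toy query language and a hand-written parser. The `FETCH ... FROM ... LAST ...` syntax is invented for this example and is not RAEL, and the parser is plain Python rather than a Lark grammar.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy grammar (hypothetical, NOT RAEL):
#   query    := "FETCH" IDENT "FROM" IDENT "LAST" DURATION
#   IDENT    := letter or underscore, then letters, digits, underscores, dots
#   DURATION := positive integer followed by "m" (minutes) or "h" (hours)
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*\Z")
DURATION = re.compile(r"([1-9][0-9]*)([mh])\Z")


@dataclass(frozen=True)
class Query:
    metric: str
    source: str
    minutes: int


def parse_query(text: str) -> Query:
    """Parse one query, raising SyntaxError if it does not match the toy grammar."""
    tokens = text.split()
    if len(tokens) != 6:
        raise SyntaxError(f"expected 6 tokens, got {len(tokens)}")
    kw_fetch, metric, kw_from, source, kw_last, duration = tokens
    if (kw_fetch, kw_from, kw_last) != ("FETCH", "FROM", "LAST"):
        raise SyntaxError("keywords must be FETCH ... FROM ... LAST ...")
    if not IDENT.match(metric) or not IDENT.match(source):
        raise SyntaxError("metric and source must be identifiers")
    found = DURATION.match(duration)
    if not found:
        raise SyntaxError(f"bad duration: {duration!r}")
    amount, unit = int(found.group(1)), found.group(2)
    return Query(metric, source, amount * (60 if unit == "h" else 1))


def accept_generated(text: str) -> Optional[Query]:
    """Return the parsed Query, or None if the generated text violates the grammar."""
    try:
        return parse_query(text)
    except SyntaxError:
        return None


print(accept_generated("FETCH cpu.usage FROM prometheus LAST 2h"))
print(accept_generated("DROP everything"))
```

Only text that parses is turned into a structured `Query`, so malformed or unexpected output from the model never reaches the execution layer in this sketch.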

Secure Data Processing

The telemetry programs execute in a secure, ephemeral runtime environment:
- Data is correlated across multiple sources
- Mathematical analysis and calculations run in isolated Python environments
- Each customer’s data is strictly memory-isolated
- All data is automatically purged after the investigation completes
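The purge-on-completion behavior in the list above can be illustrated with a context manager that hands out an in-memory scratch space and empties it on exit. This is a conceptual sketch, not a description of NeuBird's runtime: `ephemeral_workspace` is a hypothetical name, and clearing a Python dict only drops references rather than securely wiping memory.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def ephemeral_workspace() -> Iterator[Dict[str, List[float]]]:
    """Yield an in-memory scratch space that is emptied when the block exits."""
    workspace: Dict[str, List[float]] = {}
    try:
        yield workspace
    finally:
        workspace.clear()  # purge even if the analysis raised an exception


with ephemeral_workspace() as ws:
    ws["latency_ms"] = [120.0, 135.0, 410.0]  # hypothetical fetched telemetry
    peak = max(ws["latency_ms"])  # only the derived result leaves the block
    leaked_ref = ws  # a reference kept past the block, to show what remains

print(peak)
print(leaked_ref)  # empty: the raw samples are gone
```

The `finally` clause is the important part: the purge runs whether the investigation finishes normally or fails partway through.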

Real-Time Data Access

NeuBird’s data access layer uses a unified query syntax across all integrations. The fine-tuned LLM generates queries in this common syntax with high accuracy, enabling reliable cross-tool data retrieval regardless of the underlying platform.

The access layer:
- Uses temporary credentials with minimal scope
- Implements read-only access across all integrations
- Supports all major cloud providers and observability tools
- Never stores telemetry data on disk
- Uses schema-on-read technology, avoiding issues with schema drift
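Schema-on-read means structure is applied when data is queried rather than when it is ingested, so a renamed or added field does not break earlier assumptions. The sketch below illustrates the general technique with JSON lines. The alias table, field names, and sample records are hypothetical and do not reflect NeuBird's schema.

```python
import json
from typing import Any, Dict, Iterable, List, Sequence

# Hypothetical aliases: each logical field lists the raw keys it may appear under.
FIELD_ALIASES = {
    "service": ("service", "svc", "service_name"),
    "latency_ms": ("latency_ms", "duration_ms"),
}


def read_field(record: Dict[str, Any], logical_name: str) -> Any:
    """Resolve a logical field at read time; return None if no alias is present."""
    for key in FIELD_ALIASES[logical_name]:
        if key in record:
            return record[key]
    return None


def query(raw_lines: Iterable[str], fields: Sequence[str]) -> List[Dict[str, Any]]:
    """Parse raw JSON lines and project only the requested logical fields."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        rows.append({name: read_field(record, name) for name in fields})
    return rows


raw_events = [
    '{"service": "checkout", "latency_ms": 120}',
    # A producer whose field names drifted and which added an extra field.
    '{"svc": "checkout", "duration_ms": 135, "region": "eu-west-1"}',
]
rows = query(raw_events, ["service", "latency_ms"])
print(rows)
```

Both records resolve to the same logical shape even though the second one uses different key names, and the unrequested `region` field is simply ignored.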

Iterative Refinement

NeuBird doesn’t stop at the first answer. It iterates — testing hypotheses, following new leads discovered in the data, and refining its analysis. Each step is logged in a full audit trail so you can see exactly what NeuBird investigated and why.

This iterative process is driven by assertions on the processed data, allowing NeuBird to adapt without sending telemetry to the LLM.
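One way to picture assertion-driven iteration: each hypothesis is expressed as a check that runs locally against the processed data, and only the pass or fail outcome is reported back to steer the next step. The sketch below is an interpretation under that assumption. The hypothesis names, data values, and `evaluate_hypotheses` helper are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Telemetry = Dict[str, List[float]]
Assertion = Callable[[Telemetry], bool]


def evaluate_hypotheses(
    data: Telemetry, hypotheses: List[Tuple[str, Assertion]]
) -> Dict[str, bool]:
    """Run each assertion locally; only the name -> True/False map leaves this function."""
    return {name: bool(check(data)) for name, check in hypotheses}


# Hypothetical processed telemetry that stays inside the runtime.
processed = {
    "heap_mb": [512, 640, 790, 950],
    "error_rate": [0.01, 0.01, 0.02, 0.01],
}


def heap_grows(d: Telemetry) -> bool:
    """True if every heap sample is larger than the one before it."""
    return all(a < b for a, b in zip(d["heap_mb"], d["heap_mb"][1:]))


def errors_elevated(d: Telemetry) -> bool:
    """True if any error-rate sample exceeds 5%."""
    return max(d["error_rate"]) > 0.05


hypotheses = [
    ("heap grows monotonically", heap_grows),
    ("error rate above 5%", errors_elevated),
]
outcomes = evaluate_hypotheses(processed, hypotheses)
next_leads = [name for name, held in outcomes.items() if held]
print(outcomes)
print(next_leads)
```

The planner sees only which assertions held, which is enough to decide that the memory lead is worth pursuing and the error-rate lead is not.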

Root Cause Analysis

Once the investigation converges, NeuBird produces:
- Detailed root cause analysis: the specific change or event that caused the issue
- Evidence chain: every finding backed by data
- Recommended actions: specific remediation steps

The final analysis is generated by a privately hosted open-source LLM to keep telemetry data protected.

MELT+ Telemetry

Most AI operations tools amount to retrieval-augmented generation (RAG) over dashboards: they capture dashboard views and summarize what they see. This limits them to whatever a dashboard was designed to show.

NeuBird is different. It directly queries raw telemetry data:

Data Type	What NeuBird Accesses
Metrics	CPU, memory, latency, error rates, custom metrics — raw time series, not dashboard views
Events	Deployments, config changes, scaling events, alerts
Logs	Application logs, system logs, audit trails — with full-text search
Traces	Distributed traces across service boundaries
+ Config	Kubernetes manifests, infrastructure-as-code, resource definitions, deployment configurations
+ Code	Source code, deployment scripts, CI/CD pipelines

This MELT+ approach enables cross-system correlations that dashboard-based tools simply cannot perform.
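As a small example of the kind of cross-system correlation raw access allows, the sketch below joins a stream of deployment events with an error-rate time series and reports deployments that were followed by a spike. The function name, threshold, time window, and sample data are assumptions for illustration. A real correlation would also account for topology and confounding changes.

```python
from typing import List, Tuple


def deploys_preceding_spikes(
    deploys: List[Tuple[int, str]],
    error_rate: List[Tuple[int, float]],
    threshold: float,
    window_s: int,
) -> List[str]:
    """Return deploys followed within `window_s` seconds by an error-rate sample above `threshold`."""
    suspects = []
    for deploy_ts, name in deploys:
        spiked = any(
            deploy_ts <= ts <= deploy_ts + window_s and value > threshold
            for ts, value in error_rate
        )
        if spiked:
            suspects.append(name)
    return suspects


# Hypothetical event and metric streams; timestamps are in seconds.
deploys = [(1000, "checkout v42"), (5000, "search v7")]
error_rate = [(900, 0.01), (1120, 0.09), (1300, 0.12), (5100, 0.01)]
suspects = deploys_preceding_spikes(deploys, error_rate, threshold=0.05, window_s=600)
print(suspects)
```

Neither stream on its own identifies the suspect. The event stream lacks the error data, and the metric stream lacks the deployment context.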

Context Engineering

NeuBird’s core technical advantage is context engineering — the discipline of giving an LLM the right information, in the right format, at the right time.

The foundation is data virtualization: a unified schema across all observability tools (Splunk, Prometheus, Grafana, Datadog, CloudWatch, Azure Monitor, and more). This means a single investigation can correlate Datadog metrics, Splunk logs, and AWS CloudTrail events without the engineer needing to know each tool's query language.

Multi-Model Architecture

NeuBird uses different LLMs for different tasks, selecting the best model for each job:

Role	Model	Purpose
Telemetry programs	Fine-tuned Llama	Generates data retrieval programs with high accuracy
Investigation reasoning	Claude	Deep causal reasoning for complex multi-factor analysis
Memory & context	Haiku	Fast context selection and investigation digests
Final RCA	Privately hosted open-source LLM	Produces final analysis without exposing data to external services