How It Works
Neubird takes a different approach to AI-powered operations. Instead of summarizing dashboards or running predefined playbooks, it investigates: it queries raw telemetry, follows the evidence, and builds a root cause analysis from first principles.
Predictive Analysis
Neubird doesn’t just react to incidents; it also predicts them. By continuously analyzing trends across your telemetry stack, Neubird identifies degradation signatures before they trigger alerts.
How prediction works:
- Trend correlation — Neubird reads metrics, logs, traces, incidents, and topology data together. It detects patterns that span multiple systems — for example, a slow memory leak in one service combined with increasing queue depth in another.
- Capacity forecasting: By analyzing resource consumption trends over time, Neubird projects when you’ll exhaust disk, memory, or connection pools, and when certificates will expire, and flags these before they become critical.
- Anomaly detection: Neubird learns the baseline behavior of your services from historical telemetry. When it detects statistically significant deviations, even ones that haven’t yet crossed alerting thresholds, it raises them proactively.
- Change risk assessment — When a deployment or configuration change is detected, Neubird evaluates it against historical patterns of similar changes to assess the likelihood of negative impact.
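As an illustration of the capacity forecasting idea in the list above, the sketch below fits a straight line to a few usage samples and projects when the line crosses a limit. It is a minimal example under stated assumptions, not Neubird’s forecasting model: the function name `hours_until_limit`, the sample values, and the choice of a linear fit are all hypothetical.

```python
from statistics import mean

def hours_until_limit(hours, usage_pct, limit_pct=100.0):
    """Least-squares line through the samples, projected to limit_pct.

    Returns the hours remaining after the last sample, or None when
    usage is flat or falling (no projected crossing).
    """
    x_bar, y_bar = mean(hours), mean(usage_pct)
    spread = sum((x - x_bar) ** 2 for x in hours)
    if spread == 0:
        return None
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, usage_pct)) / spread
    if slope <= 0:
        return None
    intercept = y_bar - slope * x_bar
    crossing_hour = (limit_pct - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])

# Hypothetical disk-usage samples, one reading every 6 hours.
sample_hours = [0, 6, 12, 18, 24]
disk_used_pct = [70.0, 71.5, 73.0, 74.5, 76.0]
remaining = hours_until_limit(sample_hours, disk_used_pct)
print(f"Projected hours until disk is full: {remaining:.0f}")
```

A real forecaster would likely account for seasonality and noisy data; the linear fit is used here only because it keeps the projection easy to follow.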
Predictive analysis uses the same telemetry access layer and multi-model architecture described below — the difference is timing. Instead of investigating after an alert fires, Neubird runs continuous sweeps (via health assessments and sentinel mode) to surface issues while they’re still cheap to fix.
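One check a continuous sweep might run is a baseline deviation test of the kind described under anomaly detection. The sketch below scores a new reading against a historical baseline with a z-score. The threshold of 3 standard deviations, the static alert threshold, and the latency values are assumptions chosen for illustration, not values taken from Neubird.

```python
from statistics import mean, stdev

def deviation_score(baseline, value):
    """Number of standard deviations between value and the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return (value - mu) / sigma

ALERT_THRESHOLD_MS = 500.0   # hypothetical static alerting threshold
Z_LIMIT = 3.0                # hypothetical significance cutoff

baseline_latency_ms = [118, 122, 120, 119, 121, 120, 123, 117]
current_latency_ms = 160.0

score = deviation_score(baseline_latency_ms, current_latency_ms)
is_anomalous = abs(score) >= Z_LIMIT
below_alert_threshold = current_latency_ms < ALERT_THRESHOLD_MS
print(is_anomalous, below_alert_threshold)
```

Here the reading is far outside the learned baseline while still well under the static threshold, which is the situation the text describes: a deviation worth raising before any alert fires.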
The Investigation Process
When an incident occurs or an investigation is started, Neubird follows a systematic process.
1. Hypothesis Formation
Neubird searches a vector database (ChromaDB) for similar past incidents and proven investigation patterns. This gives it a starting point — not a rigid playbook, but hypotheses to test against the current situation.
As Neubird investigates more incidents in your environment, this knowledge base grows, making future investigations faster and more targeted.
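The retrieval step can be pictured as a nearest-neighbor search over incident embeddings. The sketch below uses plain cosine similarity over toy three-dimensional vectors as a stand-in for the vector database; it does not use the ChromaDB API, and the incident summaries, embeddings, and the `nearest_incidents` helper are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_incidents(query_vec, past, k=2):
    """Return the summaries of the k past incidents most similar to the query."""
    ranked = sorted(
        past,
        key=lambda item: cosine_similarity(query_vec, item["embedding"]),
        reverse=True,
    )
    return [item["summary"] for item in ranked[:k]]

# Toy embeddings; a real system would use a learned embedding model.
past_incidents = [
    {"summary": "OOM kills after memory leak in checkout", "embedding": [0.9, 0.1, 0.0]},
    {"summary": "Expired TLS certificate on gateway", "embedding": [0.0, 0.2, 0.9]},
    {"summary": "Queue backlog from slow consumer", "embedding": [0.7, 0.6, 0.1]},
]
hypotheses = nearest_incidents([0.8, 0.3, 0.0], past_incidents)
print(hypotheses)
```

The returned summaries act as starting hypotheses to test, not as conclusions.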
2. Dynamic Investigation Plan
Using the hypotheses from step 1 and the details of the current incident, Neubird constructs a detailed investigation plan. The plan adapts based on:
- The type of incident (performance, availability, configuration, deployment)
- Available telemetry sources (which monitoring tools are connected)
- Your architecture (what services and dependencies exist)
- Historical patterns (what investigation strategies have worked before)
No telemetry data is included in the prompts to the LLM — only metadata about what data sources are available.
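To make the metadata-only constraint concrete, the sketch below builds a planning prompt from descriptions of the connected sources and never touches sample values. The `SourceMetadata` type, the `build_plan_prompt` function, and the prompt layout are assumptions for illustration and are not Neubird’s actual prompt format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceMetadata:
    name: str      # e.g. the connected tool
    kind: str      # metrics, logs, traces, ...
    fields: tuple  # field names only, never field values

def build_plan_prompt(incident_type, sources, hypotheses):
    """Assemble a planning prompt that describes sources without any telemetry."""
    lines = [f"Incident type: {incident_type}", "Available sources:"]
    for src in sources:
        lines.append(f"- {src.name} ({src.kind}): fields {', '.join(src.fields)}")
    lines.append("Hypotheses to test:")
    lines.extend(f"- {h}" for h in hypotheses)
    return "\n".join(lines)

sources = [
    SourceMetadata("prometheus", "metrics", ("cpu_pct", "latency_ms")),
    SourceMetadata("splunk", "logs", ("service", "message")),
]
prompt = build_plan_prompt("performance", sources, ["memory leak in checkout"])
print(prompt)
```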
3. Telemetry Program Generation
This step is where Neubird differs most from other tools. Instead of sending your sensitive telemetry data to an LLM, Neubird generates telemetry retrieval programs: code that fetches and processes the data.
A fine-tuned LLM writes these programs in NeuBird’s RAEL (telemetry reasoning language), whose syntax is defined by LARK grammar files. Checking generated programs against the grammar ensures they are syntactically valid before they run, which makes their results more reliable. The LLM only handles program logic, never the data itself.
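The value of checking generated programs against a grammar can be shown with a toy example. RAEL’s syntax is not described here, so the query form below is invented purely for illustration, and a regular expression stands in for a real grammar and parser. The point is only that text produced by a model is accepted when it matches the defined syntax and rejected otherwise.

```python
import re

# Hypothetical query form (NOT RAEL):
#   FETCH <signal> FROM <source> WHERE <field> <op> <value> LAST <n><unit>
QUERY_PATTERN = re.compile(
    r"FETCH (?P<signal>metrics|logs|traces|events) "
    r"FROM (?P<source>[a-z_][a-z0-9_]*) "
    r"WHERE (?P<field>[a-z_][a-z0-9_.]*) (?P<op>!=|=|>|<) "
    r"(?P<value>\"[^\"]*\"|\d+(?:\.\d+)?) "
    r"LAST (?P<amount>\d+)(?P<unit>[mhd])"
)

def parse_query(text):
    """Return the parsed parts of a query, or None if it is not valid syntax."""
    match = QUERY_PATTERN.fullmatch(text.strip())
    return match.groupdict() if match else None

valid = parse_query('FETCH logs FROM splunk_prod WHERE service = "checkout" LAST 30m')
invalid = parse_query("FETCH logs FROM splunk_prod; DROP everything")
print(valid is not None, invalid is None)
```

A production system would use a full grammar so that nested expressions and richer programs can be validated, which a single regular expression cannot express.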
4. Secure Data Processing
The telemetry programs execute in a secure, ephemeral runtime environment:
- Data is correlated across multiple sources
- Mathematical analysis and calculations run in isolated Python environments
- Each customer’s data is strictly memory-isolated
- All data is automatically purged after the investigation completes
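The purge-on-completion behavior in the list above resembles a scoped resource in code. The sketch below models it with a context manager that holds data for one tenant and clears it when the investigation ends, whether it succeeds or fails. The names `ephemeral_workspace` and `investigate` are hypothetical, and an in-process dictionary is only a stand-in for an isolated runtime.

```python
from contextlib import contextmanager

@contextmanager
def ephemeral_workspace(tenant_id):
    """Hold working data for a single tenant and clear it on exit."""
    workspace = {"tenant": tenant_id, "frames": {}}
    try:
        yield workspace
    finally:
        # Runs even if the investigation raises: drop the workspace's references.
        workspace["frames"].clear()

def investigate(tenant_id, rows):
    """Compute a derived finding inside the workspace; only the finding leaves it."""
    with ephemeral_workspace(tenant_id) as ws:
        ws["frames"]["errors"] = rows
        finding = {"error_count": sum(r["errors"] for r in ws["frames"]["errors"])}
    return finding, ws

finding, workspace_after = investigate("tenant-a", [{"errors": 3}, {"errors": 4}])
print(finding, workspace_after["frames"])
```

Only the derived finding survives the `with` block; the raw rows are no longer reachable through the workspace.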
5. Real-Time Data Access
Neubird’s data access layer uses a unified query syntax across all integrations. The fine-tuned LLM generates queries in this common syntax with high accuracy, enabling reliable cross-tool data retrieval regardless of the underlying platform.
The access layer:
- Uses temporary credentials with minimal scope
- Implements read-only access across all integrations
- Supports all major cloud providers and observability tools
- Never stores telemetry data on disk
- Uses schema-on-read technology, avoiding issues with schema drift
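Schema-on-read, the last item above, means structure is applied when a record is read rather than when it is stored, so a renamed field does not break the reader. A minimal sketch follows, assuming a hypothetical alias table (`FIELD_ALIASES`) and made-up record shapes:

```python
# Logical field name -> names it has appeared under in raw records (hypothetical).
FIELD_ALIASES = {
    "service": ("service", "svc", "kubernetes.labels.app"),
    "latency_ms": ("latency_ms", "duration_ms", "elapsed"),
}

def read_field(record, logical_name):
    """Resolve a logical field against whatever keys the raw record has."""
    for alias in FIELD_ALIASES[logical_name]:
        if alias in record:
            return record[alias]
    return None  # field absent in this record; the caller decides how to handle it

# Two records for the same service, written before and after a field rename.
raw_records = [
    {"svc": "checkout", "duration_ms": 412},
    {"service": "checkout", "latency_ms": 388},
]
rows = [(read_field(r, "service"), read_field(r, "latency_ms")) for r in raw_records]
print(rows)
```

Because nothing is rewritten at ingest time, adding a new alias is enough to handle a drifted field name.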
6. Iterative Refinement
Neubird doesn’t stop at the first answer. It iterates — testing hypotheses, following new leads discovered in the data, and refining its analysis. Each step is logged in a full audit trail so you can see exactly what Neubird investigated and why.
This iterative process is driven by assertions on the processed data, allowing Neubird to adapt without sending telemetry to the LLM.
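An assertion-driven loop of this kind can be sketched as follows. Each hypothesis carries an assertion that runs against the processed data, and only the boolean outcome decides what to test next. The hypothesis structure, the `refine` function, and the evidence values are assumptions made for the example, not Neubird’s internal design.

```python
def refine(hypotheses, evidence, max_rounds=5):
    """Test hypotheses in order, descending into follow-ups when one holds.

    Returns (name of the most specific confirmed hypothesis or None, audit trail).
    """
    audit_trail = []
    queue = list(hypotheses)
    for _ in range(max_rounds):
        if not queue:
            break
        current = queue.pop(0)
        holds = current["assertion"](evidence)  # evaluated next to the data
        audit_trail.append((current["name"], holds))
        if holds:
            if not current["follow_ups"]:
                return current["name"], audit_trail
            queue = list(current["follow_ups"]) + queue
    return None, audit_trail

evidence = {"error_rate": 0.12, "deploys_last_hour": 1, "cpu_pct": 35}
hypotheses = [
    {"name": "cpu saturation", "assertion": lambda e: e["cpu_pct"] > 90, "follow_ups": []},
    {"name": "error spike", "assertion": lambda e: e["error_rate"] > 0.05, "follow_ups": [
        {"name": "recent deployment", "assertion": lambda e: e["deploys_last_hour"] > 0,
         "follow_ups": []},
    ]},
]
conclusion, trail = refine(hypotheses, evidence)
print(conclusion, trail)
```

The audit trail records every hypothesis tested and its outcome, which mirrors the logged steps described above.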
7. Root Cause Analysis
Once the investigation converges, Neubird produces:
- Detailed root cause analysis — the specific change or event that caused the issue
- Evidence chain — every finding backed by data
- Recommended actions — specific remediation steps
The final analysis is generated by a privately hosted open-source LLM to keep telemetry data protected.
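The three outputs listed above map naturally onto a small data structure. The sketch below is one possible shape, with hypothetical class names, field names, and example content. It is not Neubird’s report format.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str   # where the finding came from
    finding: str  # what the data showed

@dataclass
class RootCauseReport:
    root_cause: str
    evidence_chain: list = field(default_factory=list)
    recommended_actions: list = field(default_factory=list)

    def render(self):
        """Format the report as plain text, numbering the evidence chain."""
        lines = [f"Root cause: {self.root_cause}", "Evidence:"]
        lines += [f"  {i}. [{e.source}] {e.finding}"
                  for i, e in enumerate(self.evidence_chain, 1)]
        lines.append("Recommended actions:")
        lines += [f"  - {action}" for action in self.recommended_actions]
        return "\n".join(lines)

report = RootCauseReport(
    root_cause="Connection pool size reduced by a config change",
    evidence_chain=[
        Evidence("events", "Config change applied shortly before the first errors"),
        Evidence("logs", "Pool exhaustion errors in the affected service"),
    ],
    recommended_actions=["Revert the pool size setting"],
)
print(report.render())
```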
MELT+ Telemetry
Most AI operations tools amount to RAG over dashboards: they capture dashboard views and summarize what they see. This limits them to whatever a dashboard was designed to show.
Neubird is different. It directly queries raw telemetry data:
| Data Type | What Neubird Accesses |
|---|---|
| Metrics | CPU, memory, latency, error rates, custom metrics — raw time series, not dashboard views |
| Events | Deployments, config changes, scaling events, alerts |
| Logs | Application logs, system logs, audit trails — with full-text search |
| Traces | Distributed traces across service boundaries |
| + Config | Kubernetes manifests, infrastructure-as-code, resource definitions, deployment configurations |
| + Code | Source code, deployment scripts, CI/CD pipelines |
This MELT+ approach enables cross-system correlations that dashboard-based tools simply cannot perform.
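A simple example of such a correlation is lining up change events with the moment a metric first breaches a limit. The sketch below does this over raw time series and event records. The 15-minute window, the threshold, the helper names, and all data values are assumptions for illustration.

```python
from datetime import datetime, timedelta

def first_breach(series, threshold):
    """Return the timestamp of the first sample above threshold, or None."""
    for ts, value in series:
        if value > threshold:
            return ts
    return None

def changes_before(events, moment, window=timedelta(minutes=15)):
    """Return events that occurred within `window` before `moment`."""
    return [e for e in events if timedelta(0) <= moment - e["at"] <= window]

t0 = datetime(2024, 1, 1, 10, 0)
error_rate = [
    (t0, 0.01),
    (t0 + timedelta(minutes=5), 0.02),
    (t0 + timedelta(minutes=10), 0.19),
]
events = [
    {"at": t0 - timedelta(hours=3), "kind": "config_change", "target": "gateway"},
    {"at": t0 + timedelta(minutes=7), "kind": "deployment", "target": "checkout"},
]

spike_at = first_breach(error_rate, threshold=0.05)
suspects = changes_before(events, spike_at) if spike_at else []
print([(e["kind"], e["target"]) for e in suspects])
```

A dashboard that plots only the error rate has no way to surface the deployment; joining the two data types at the raw level does.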
Context Engineering
Neubird’s core technical advantage is context engineering: the discipline of giving an LLM the right information, in the right format, at the right time.
The foundation is data virtualization: a unified schema across all observability tools (Splunk, Prometheus, Grafana, Datadog, CloudWatch, Azure Monitor, and more). A single investigation can therefore correlate Datadog metrics, Splunk logs, and AWS CloudTrail events without the engineer needing to know each tool’s query language.
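One way to picture a unified schema is as a set of small adapters that convert each tool’s records into a common type. In the sketch below, the `UnifiedRecord` type and both input shapes are hypothetical; they are not the real Datadog or Splunk response formats.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UnifiedRecord:
    timestamp: float  # seconds since the epoch
    tool: str
    service: str
    signal: str       # "metric" or "log"
    payload: str

def from_metric(tool, point):
    # Hypothetical metric shape: millisecond timestamp, tags, name, value.
    return UnifiedRecord(point["ts_ms"] / 1000, tool, point["tags"]["service"],
                         "metric", f'{point["name"]}={point["value"]}')

def from_log(tool, line):
    # Hypothetical log shape: second-resolution timestamp, app name, message.
    return UnifiedRecord(float(line["_time"]), tool, line["app"], "log", line["message"])

records = [
    from_metric("datadog", {"ts_ms": 1700000060000, "tags": {"service": "checkout"},
                            "name": "latency_ms", "value": 910}),
    from_log("splunk", {"_time": 1700000030, "app": "checkout",
                        "message": "connection pool exhausted"}),
]
timeline = sorted(records, key=lambda r: r.timestamp)
for r in timeline:
    print(r.timestamp, r.tool, r.signal, r.payload)
```

Once every record shares one shape, ordering, filtering, and joining work the same way regardless of which tool produced the data.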
Multi-Model Architecture
Neubird uses different LLMs for different tasks, selecting the best model for each job:
| Role | Model | Purpose |
|---|---|---|
| Telemetry programs | Fine-tuned Llama | Generates data retrieval programs with high accuracy |
| Investigation reasoning | Claude | Deep causal reasoning for complex multi-factor analysis |
| Memory & context | Haiku | Fast context selection and investigation digests |
| Final RCA | Privately hosted open-source LLM | Produces final analysis without exposing data to external services |
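The table can be read as a routing map from task role to model. The sketch below expresses it that way and adds a guard reflecting the rule stated earlier that only the privately hosted model handles telemetry. The role keys, model identifiers, and the `route` function are placeholders invented for this example, and the guard is an assumption about how such a rule might be enforced.

```python
# Role -> model identifier (placeholder strings, not real endpoint names).
MODEL_ROUTES = {
    "telemetry_programs": "fine-tuned-llama",
    "investigation_reasoning": "claude",
    "memory_and_context": "haiku",
    "final_rca": "private-open-source-llm",
}

# Assumption: only the privately hosted model may receive telemetry-derived content.
TELEMETRY_SAFE_ROLES = {"final_rca"}

def route(role, contains_telemetry=False):
    """Pick the model for a role, refusing to send telemetry to other models."""
    if role not in MODEL_ROUTES:
        raise KeyError(f"unknown role: {role}")
    if contains_telemetry and role not in TELEMETRY_SAFE_ROLES:
        raise ValueError(f"role {role!r} must not receive telemetry")
    return MODEL_ROUTES[role]

print(route("investigation_reasoning"))
print(route("final_rca", contains_telemetry=True))
```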