Give your AI agent a real debugger
Empowering agents with structured debugging
25 MCP tools for GitHub Copilot that trace variables, validate hypotheses, and safety-check fixes before a single line of code changes. Works across JavaScript/TypeScript, Python, and Java.
See It in Action
Watch AgentProbe guide Copilot through a real multi-bug diagnosis
What it actually does
Six diagnostic capabilities that add structured verification to how Copilot investigates bugs — each one filling a specific gap.
Evidence Before Action
validate_hypothesis returns a hard true/false with file+line evidence tokens before a single character is changed.
Pre-flight Safety Check
check_suggestion scans each proposed fix against logic guards and known violation patterns before the edit lands.
Exact Variable Tracing
trace_value_flow traces a variable from assignment through every read/write — surfacing the exact race window.
Team-Searchable Audit Trail
Sessions saved with root cause, fix, evidence, and keywords. Searchable by any teammate months later.
Failing Fast on Scope
detect_error_pattern returns explicit "no match" signals — no time wasted chasing patterns that do not apply.
You Stay in Control
Direct the tools — "trace this, validate that, don't fix yet" — and decide when to act on the evidence.
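detect_error_pattern is described as returning an explicit "no match" when nothing applies. The toy Java matcher below illustrates that contract. It is a minimal sketch under stated assumptions, not AgentProbe's rule engine: the `ErrorPatternMatcher` class, its three regexes, and its remediation hints are hypothetical, loosely modelled on Java patterns named in the language section.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical toy matcher illustrating an explicit "no match" result.
// The patterns and remediation hints are illustrative, not AgentProbe's rule set.
class ErrorPatternMatcher {
    // Insertion-ordered so the first matching rule wins deterministically.
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    ErrorPatternMatcher() {
        rules.put(Pattern.compile("\\bNullPointerException\\b"),
                "Find the reference that is null at the failing frame and guard or initialise it.");
        rules.put(Pattern.compile("\\bClassNotFoundException\\b"),
                "Check the classpath and dependency scope for the missing class.");
        rules.put(Pattern.compile("\\bLazyInitializationException\\b"),
                "Access the lazy association inside an open session or fetch it eagerly.");
    }

    // Returns a remediation hint, or Optional.empty() when no known pattern applies.
    Optional<String> match(String errorMessage) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            if (rule.getKey().matcher(errorMessage).find()) {
                return Optional.of(rule.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ErrorPatternMatcher matcher = new ErrorPatternMatcher();
        String[] messages = {
            "java.lang.NullPointerException: Cannot invoke \"String.length()\" because \"s\" is null",
            "java.net.ConnectException: Connection refused"
        };
        for (String message : messages) {
            // Print the hint, or an explicit "no match" instead of a guess.
            System.out.println(matcher.match(message).orElse("no match"));
        }
    }
}
```

Running `main` prints the NullPointerException hint followed by "no match". Returning `Optional<String>` rather than a best-guess string forces the caller to handle the no-match case explicitly.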
Language & debug mode support
Three languages with full rule sets, version-aware context, and a dedicated debug protocol adapter for each runtime.
JavaScript & TypeScript
Debugger: CDP — Chrome DevTools Protocol
100+ error patterns
- Null / undefined access
- Async / Promise errors
- Type mismatches
- Module resolution
- Memory & stack
- Syntax errors
Python
Debugger: DAP — Debug Adapter Protocol
30+ error patterns
- None / NoneType errors
- AttributeError / KeyError
- ImportError / NameError
- Async event loop errors
- Django / ORM exceptions
- Pandas / NumPy type errors
Java
Debugger: JDWP — Java Debug Wire Protocol
15+ error patterns
- NullPointerException
- ClassNotFoundException
- LazyInitializationException
- OutOfMemoryError (heap + metaspace)
- ClassCastException
- EJB / container errors
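JDWP only helps if the target JVM was started with the debug agent enabled. A common launch option (JDK 9+ syntax) is `-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005`. The sketch below shows one way a JVM can report whether it was launched that way, using the standard `RuntimeMXBean.getInputArguments()` API. How AgentProbe itself discovers debuggable processes is not described here, and the `JdwpCheck` class is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: reports whether a list of JVM arguments enables the JDWP agent.
class JdwpCheck {
    // Matches both the modern -agentlib:jdwp form and the legacy -Xrunjdwp form.
    static Optional<String> jdwpArgument(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith("-agentlib:jdwp") || arg.startsWith("-Xrunjdwp"))
                .findFirst();
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM, excluding the main class and program arguments.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(jdwpArgument(jvmArgs)
                .map(arg -> "JDWP enabled: " + arg)
                .orElse("JDWP not enabled; relaunch with -agentlib:jdwp=..."));
    }
}
```

Launched with the flag above, `main` should print `JDWP enabled:` followed by the matching argument; otherwise it prints the relaunch hint.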
Installation Guide
Get AgentProbe running in your editor in under 2 minutes
Pre-requisites
Node.js 18+
Required for VS Code extension & CLI
Java 17+
Required for IntelliJ plugin
VS Code 1.85+
With GitHub Copilot enabled
IntelliJ IDEA 2024.3+
Community or Ultimate edition
Get Started in 2 Steps
Install from Marketplace
Open VS Code → Extensions (Ctrl+Shift+X, or Cmd+Shift+X on macOS) → search for “AgentProbe” → click Install
Start the MCP Server
Open the Command Palette (Ctrl+Shift+P, or Cmd+Shift+P on macOS) → run “AgentProbe: Start MCP Server”
That’s it. AgentProbe is now available as a tool inside GitHub Copilot Chat.
Quick Reference
Take the MetricRegistry bug from the demo. Every timer reported the same average, 8.23 ms, regardless of how long each pipeline step actually took. Here is what finding it looks like with and without AgentProbe.
// Without AgentProbe:
// Copilot opens MetricRegistry.java
// and reads through the logic manually.
void recordTimer(String key, double ms) {
    DoubleSummaryStatistics stats =
        timers.computeIfAbsent(
            key,
            k -> sharedTimerStats // suspects aliasing
        );
    stats.accept(ms);
}
// Forms a hypothesis from reading alone.
// No verification: edits immediately.

// With AgentProbe:
// Step 1: trace_value_flow("sharedTimerStats")
→ assigned : MetricRegistry.java line 12
→ aliased read : line 20 (every key)
→ all keys map to the same object
// Step 2: validate_hypothesis(
// "sharedTimerStats aliases all timers")
→ hypothesis_valid : true
→ evidence : MetricRegistry.java:20
// Step 3: check_suggestion(fix)
→ safe : true, no violations
// Now edit with hard evidence, not a guess.

The fix is the same either way: change the mapping function from k -> sharedTimerStats to k -> new DoubleSummaryStatistics() so each key gets its own statistics object. The difference is certainty. AgentProbe supplies machine-verified evidence and a safety check before a single line changes.
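A minimal, runnable sketch of the bug and its fix follows. Only `recordTimer`, `timers`, and `sharedTimerStats` come from the example above; the `perKeyStats` switch, the `averageMs` helper, and the sample durations are hypothetical, added so both versions can be compared side by side.

```java
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.Map;

// Hypothetical reconstruction of the demo's MetricRegistry (single-threaded).
class MetricRegistry {
    private final Map<String, DoubleSummaryStatistics> timers = new HashMap<>();
    private final DoubleSummaryStatistics sharedTimerStats = new DoubleSummaryStatistics();
    private final boolean perKeyStats; // hypothetical switch: false = buggy, true = fixed

    MetricRegistry(boolean perKeyStats) {
        this.perKeyStats = perKeyStats;
    }

    void recordTimer(String key, double ms) {
        DoubleSummaryStatistics stats;
        if (perKeyStats) {
            // Fixed: each key gets its own accumulator on first use.
            stats = timers.computeIfAbsent(key, k -> new DoubleSummaryStatistics());
        } else {
            // Buggy: every key is mapped to the same shared instance.
            stats = timers.computeIfAbsent(key, k -> sharedTimerStats);
        }
        stats.accept(ms);
    }

    double averageMs(String key) {
        DoubleSummaryStatistics stats = timers.get(key);
        return stats == null ? 0.0 : stats.getAverage();
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            MetricRegistry registry = new MetricRegistry(fixed);
            registry.recordTimer("parse", 2.0);
            registry.recordTimer("transform", 10.0);
            System.out.println((fixed ? "fixed" : "buggy")
                    + ": parse=" + registry.averageMs("parse")
                    + " ms, transform=" + registry.averageMs("transform") + " ms");
        }
    }
}
```

Running `main` prints `buggy: parse=6.0 ms, transform=6.0 ms` and then `fixed: parse=2.0 ms, transform=10.0 ms`. With the shared instance, every key reports the average of all recorded durations, which matches the symptom described above. The sketch is single-threaded; `DoubleSummaryStatistics` is not thread-safe, so a registry shared across threads would need additional synchronization.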
Debug sessions that don’t disappear
Root cause, fix, evidence, and keywords, saved as structured JSON and searchable by anyone on the team
{
"root_cause": "sharedTimerStats singleton aliased to all timer keys",
"file": "MetricRegistry.java",
"evidence_line": 20,
"fix_applied": "new DoubleSummaryStatistics() per key",
"keywords": ["DoubleSummaryStatistics", "alias", "metrics", "timer"],
"timestamp": "2026-03-28T14:22:00Z"
}
Searchable Knowledge Base
A new engineer hits the same bug and gets the full diagnosis in seconds.
Built-in Audit Trail
Auditable record of what was found, what evidence supported it, and what was verified safe.
Faster Incident Response
On-call engineers search sessions by symptom, turning a 5-minute investigation into a 30-second lookup.
Junior Dev Acceleration
Sessions written in plain language with file and line references — learn the pattern, not just the fix.
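The session JSON above maps naturally onto a small typed record. The sketch below is hypothetical: `DebugSession`, `SessionSearch`, and the matching rule (exact keyword or root-cause substring, case-insensitive, newest first) are assumptions for illustration, not how search_past_sessions is implemented. Records require Java 16 or later.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical in-memory model of a saved session; fields mirror the JSON keys above.
record DebugSession(
        String rootCause,
        String file,
        int evidenceLine,
        String fixApplied,
        List<String> keywords,
        Instant timestamp) {

    // Case-insensitive exact match on a keyword, or a substring match on the root cause.
    boolean matches(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return keywords.stream().anyMatch(k -> k.toLowerCase(Locale.ROOT).equals(needle))
                || rootCause.toLowerCase(Locale.ROOT).contains(needle);
    }
}

class SessionSearch {
    // Returns matching sessions, newest first.
    static List<DebugSession> search(List<DebugSession> sessions, String term) {
        return sessions.stream()
                .filter(s -> s.matches(term))
                .sorted(Comparator.comparing(DebugSession::timestamp).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        DebugSession metrics = new DebugSession(
                "sharedTimerStats singleton aliased to all timer keys",
                "MetricRegistry.java",
                20,
                "new DoubleSummaryStatistics() per key",
                List.of("DoubleSummaryStatistics", "alias", "metrics", "timer"),
                Instant.parse("2026-03-28T14:22:00Z"));
        List<DebugSession> hits = search(List.of(metrics), "TIMER");
        System.out.println(hits.size() + " match: " + hits.get(0).file());
    }
}
```

Searching for "TIMER" prints `1 match: MetricRegistry.java`, because keyword matching ignores case.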
Side-by-Side Comparison
Copilot working alone vs. Copilot with AgentProbe
| Dimension | Copilot Alone | Copilot + AgentProbe |
|---|---|---|
| How bugs are found | × Reads code, reasons manually, forms a hypothesis | ✓ trace_value_flow pinpoints write→read windows with line numbers |
| Hypothesis confidence | × Based on expertise alone; could be wrong | ✓ validate_hypothesis returns true/false plus evidence before editing |
| Safety before edit | × None; the hypothesis immediately becomes a code change | ✓ check_suggestion flags violations before any file is touched |
| Post-fix artifact | × Just a diff, with no record of why or what was ruled out | ✓ Structured session: root cause, fix, evidence, keywords, timestamp |
| Team reuse | × Next developer starts from scratch | ✓ Searchable by keyword; the same bug surfaces the full diagnosis in seconds |
| Speed | ✓ About 2 min, with fewer round-trips | ~ About 5 min; tool round-trips add latency but also certainty |
| Token cost | ✓ About 3,500–4,000 tokens | ~ About 4,800–5,500 tokens, higher due to tool payloads |
| Best for | × Known codebase, expert who understands the system | ✓ Unknown codebase, onboarding, production incidents, audit requirements |
All 25 MCP Tools
The full arsenal. Most sessions use 3–5 tools. Reach for the right one at the right depth.
gather_context (Investigate): Scans runtime, dependencies, git history, and project type to orient the investigation.
+ Copilot reads the files you point it to. AgentProbe discovers the runtime, dependency graph, and git changes automatically.
detect_error_pattern (Investigate): Matches an error message against known failure patterns and returns remediation guidance.
+ Copilot always produces a suggestion. AgentProbe can also return a confidence-bounded "no match" signal.
decompose_error (Investigate): Breaks a compound error into atomic sub-problems, each with its own investigation path.
+ Produces a structured decomposition you can step through one sub-problem at a time.
trace_call_chain (Investigate): Walks the full call stack, annotating each frame with what it reads or mutates.
+ Tracks across thread boundaries and interface dispatch, where manual reading loses the trail.
trace_value_flow (Investigate): Traces a variable from assignment through every read/write site with file+line evidence.
+ Returns structured evidence such as "assigned line 27, read in lambda line 30", making the race window explicit.
explain_logic (Investigate): Produces a structured logic map: preconditions, postconditions, branches, side effects.
+ Enumerates branches, flags side effects, and asks targeted Socratic questions.
find_breaking_change (Investigate): Scans recent commits to find the specific change that introduced the regression.
+ Fetches and correlates diffs automatically, often surfacing the culprit commit in seconds.
validate_hypothesis (Validate): Takes a plain-English hypothesis and returns a hard true/false with file+line citations.
+ Returns a structured verdict with explicit evidence tokens that can be refuted and persisted.
check_logic_guards (Validate): Scans a file for missing input guards, unchecked nulls, and invariant gaps.
+ Enumerates guard gaps proactively, without prompting.
check_suggestion (Validate): Evaluates a proposed code change against known violation patterns before the file is touched.
+ Pre-screens every fix: the file is never edited unless the safety check passes.
generate_logic_guard (Validate): Generates defensive guard code tailored to specific identified risk points.
+ Guards target specific identified gaps, not generic templates.
capture_value_snapshot (Validate): Records the value of a variable at a specific execution point for comparison across runs.
+ Bridges static reasoning and live behaviour; Copilot alone has no runtime access.
plan_breakpoints (Debug): Produces a prioritised breakpoint list with conditional expressions ready for your debugger.
+ Returns a ranked list with complete conditional expressions, so no further reasoning is needed.
agentprobe_debug (Debug): Runs a full AI-assisted debug cycle: hypothesis generation, evidence search, structured finding.
+ Structures the cycle into discrete, inspectable phases: gather, hypothesise, validate, conclude.
attach_and_inspect (Debug): Attaches to a running process and inspects heap, thread state, and live variables.
+ Live runtime introspection that a static LLM cannot replicate from source alone.
list_debuggable_processes (Debug): Enumerates all running processes that expose a debug port.
+ Surfaces debug targets automatically, with no manual jps or ps needed.
request_debug_session (Debug): Opens a structured session with a problem statement, scope, and expected outcome.
+ Forces a crisp problem statement: what is in scope, who owns it, and what "done" looks like.
create_repro (Debug): Generates a minimal reproducible test case for the identified bug.
+ Derives the repro from the structured diagnosis, so it exercises the exact failure path that was confirmed.
save_debug_session (Knowledge): Persists the full session as structured JSON to .agentprobe/sessions/.
+ The diagnosis survives after the chat closes: searchable, auditable, reusable.
search_past_sessions (Knowledge): Runs full-text and keyword search across all saved sessions.
+ Teams accumulate institutional debugging knowledge that compounds over time.
publish_summary (Knowledge): Pushes a sanitised session summary to a shared team index.
+ Broadcasts structured knowledge to the whole team automatically.
post_to_fixlore (Knowledge): Submits the fix pattern to the FixLore community knowledge base.
+ Your solution can help someone else avoid the same bug tomorrow.
summarize_session (Knowledge): Produces a consistent, human-readable summary for incident reports or PR descriptions.
+ Derived from the session schema, so summaries are complete and consistent across authors.
should_i_keep_trying (Guidance): Evaluates the investigation state and advises whether to keep digging, pivot, or escalate.
+ Gives an explicit, evidence-based inflection point instead of cycling endlessly.
search_solutions (Guidance): Searches a curated index of validated fixes, filtered by language, framework, and error type.
+ Draws on curated, validated fixes from real production incidents rather than undated training data.
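For a sense of what a targeted guard looks like in the spirit of check_logic_guards and generate_logic_guard, here is hand-written defensive code for the recordTimer method from the MetricRegistry example. It is not output from either tool. The `GuardedMetricRegistry` class and the choice of guards (non-null key, finite non-negative duration) are assumptions about what this particular method should reject.

```java
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical guarded variant of the demo's recordTimer; the guards are illustrative.
class GuardedMetricRegistry {
    private final Map<String, DoubleSummaryStatistics> timers = new HashMap<>();

    void recordTimer(String key, double ms) {
        // Guard 1: HashMap accepts a null key, which would silently create an unnamed timer.
        Objects.requireNonNull(key, "key");
        // Guard 2: NaN, infinite, or negative durations would poison the running average.
        if (!Double.isFinite(ms) || ms < 0) {
            throw new IllegalArgumentException("duration must be finite and >= 0, got " + ms);
        }
        // Per-key accumulator, so keys never share state.
        timers.computeIfAbsent(key, k -> new DoubleSummaryStatistics()).accept(ms);
    }

    double averageMs(String key) {
        DoubleSummaryStatistics stats = timers.get(key);
        return stats == null ? 0.0 : stats.getAverage();
    }
}
```

For example, `recordTimer("parse", Double.NaN)` throws `IllegalArgumentException` and `recordTimer(null, 1.0)` throws `NullPointerException`. Both guards run before the map is touched, so rejected calls leave existing statistics unchanged.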
Try it on your next stuck bug
Two-minute install. Works inside VS Code with GitHub Copilot.