July 1, 2026 · 4 min read · security, ai, appsec, engineering

A Pre-Flight Scan for an AI Agent on Your Data. Pointing Benteng at Claude Science.

Anthropic's Claude Science is a local tool with a simple, powerful pitch: run Claude on your own data, in your browser. It spins up a daemon, connects MCP servers like Google Drive and Gmail, and lets an agent read your files and act. That is genuinely useful. It is also, described plainly, the exact surface the OWASP Top 10 for LLM applications was written about: untrusted data reaching a capable agent that holds live connections to your accounts.

We already build Benteng, a defensive security hub with AI-risk scanners, a prompt-injection detector, an MCP tool-poisoning audit, and the LLM Top 10 with real case studies. So the obvious question: what happens when you point Benteng at a real Claude Science install? We did, and turned it into a pre-flight scan that runs before the agent does.

Who and what: Benteng checking an AI agent, on the user's own machine

This is defensive and authorized by construction. The pre-flight reads only non-secret config, the MCP directory cache, and the data files you ask it to scan. It never touches the OAuth tokens, the encryption key, or the org database, and it prints no credential values. It is the same rule as the rest of Benteng: point it at what you own, to harden it.

Why: Claude Science lands on four of the LLM Top 10 at once

- LLM01 Prompt Injection. The agent reads your documents. Any one of them (a PDF from Drive, a pasted note) can carry an instruction that hijacks the agent.
- LLM03 Supply Chain. Every MCP connector is third-party code the agent trusts. A poisoned tool description can steer it.
- LLM02 Sensitive Information Disclosure. With Drive and Gmail wired in, secrets and PII are one bad instruction away from leaving.
- LLM06 Excessive Agency. Read plus send, live at the same time, is all an exfiltration path needs.

How: two checks, run before the agent

1. The MCP connector surface. The pre-flight reads the connector cache and frisks each server's metadata for poisoning and hidden instructions, then maps the read-then-send exfil surface. On the real install it found four connectors, with Google Drive already authorized:

1. MCP connector surface
   ● Google Drive (authorized, streamable_http)
   ○ Google Calendar (unauthorized)
   ○ Windsor.ai (unauthorized)
   ○ Gmail (unauthorized)
   PASS  no poisoning, hidden text, or secret-harvesting found
   INFO  latent exfil pair if both authorized: readers [Drive, Calendar, Gmail]
         + senders [Windsor.ai, Gmail] — keep least-privilege

Clean, and honest: the connectors are legitimate, so nothing is flagged, but the tool still names the latent exfil pair. Once both a reader (Drive) and a sender (Gmail) are authorized at the same time, one poisoned document is the whole attack. That is the least-privilege reminder the report exists to make.
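
The reader/sender logic behind that INFO line is simple enough to sketch. This is a minimal Python sketch, not Benteng's code: the `Connector` type, its `can_read`/`can_send` flags and the `exfil_pairs` helper are hypothetical, and the classification is copied from the report (Drive, Calendar and Gmail read; Windsor.ai and Gmail send). A pair counts as live only when both sides are authorized; the latent view ignores authorization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    name: str
    authorized: bool
    can_read: bool  # hypothetical flag: exposes tools that read user data
    can_send: bool  # hypothetical flag: exposes tools that push data outward


def exfil_pairs(connectors, authorized_only=True):
    """List (reader, sender) name pairs that form a read-then-send path.

    With authorized_only=True only live pairs count; with False the result is
    the latent surface, i.e. what becomes live once everything is authorized.
    A connector that both reads and sends (Gmail here) pairs with itself.
    """
    pool = [c for c in connectors if c.authorized or not authorized_only]
    readers = [c.name for c in pool if c.can_read]
    senders = [c.name for c in pool if c.can_send]
    return [(r, s) for r in readers for s in senders]


# Classification copied from the report: readers Drive/Calendar/Gmail, senders Windsor.ai/Gmail.
connectors = [
    Connector("Google Drive", authorized=True, can_read=True, can_send=False),
    Connector("Google Calendar", authorized=False, can_read=True, can_send=False),
    Connector("Windsor.ai", authorized=False, can_read=False, can_send=True),
    Connector("Gmail", authorized=False, can_read=True, can_send=True),
]

live = exfil_pairs(connectors)                           # no authorized sender yet
latent = exfil_pairs(connectors, authorized_only=False)  # 3 readers x 2 senders
```

With only Drive authorized, `live` is empty. Authorizing Gmail alone makes `("Google Drive", "Gmail")` live, along with Gmail on its own, which is exactly the least-privilege case the report warns about.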

2. The data the agent will read. The pre-flight walks the data directory and runs the injection and secret scanners over every text file. Against the 44 shipped science example files, it passed clean. Against a directory we seeded with a poisoned note and a config file holding planted keys, it blocked:

assay_notes.md
  FAIL  hidden text: zero-width / joiner characters
  FAIL  prompt injection: override of prior instructions
  FAIL  prompt injection: instruction to hide activity from the user
  FAIL  prompt injection: instruction to send data to a destination
pipeline.env
  FAIL  AWS access key ID (AKIA…LE)
  FAIL  GitHub token (ghp_…Rt)

PRE-FLIGHT BLOCKED — 6 fail, 0 warn
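
The two pipeline.env findings come from pattern matching, and the masked values keep the rule of printing no credential values. A minimal Python sketch under stated assumptions: the two regexes are widely published shapes for AWS access key IDs and classic GitHub personal access tokens, not Benteng's actual rule set, and the keep-four, keep-two masking is inferred from the report's `AKIA…LE` format. `scan_secrets` and `mask` are hypothetical names.

```python
import re

# Two widely published credential shapes (assumed); a real scanner carries many more rules.
SECRET_RULES = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def mask(secret: str) -> str:
    """Keep only a short prefix and suffix so the report never prints the value."""
    return f"{secret[:4]}…{secret[-2:]}"


def scan_secrets(text: str) -> list[str]:
    """Return one FAIL line per credential-shaped match, with the value masked."""
    findings = []
    for label, pattern in SECRET_RULES.items():
        for match in pattern.finditer(text):
            findings.append(f"FAIL  {label} ({mask(match.group())})")
    return findings
```

Fed the AWS documentation example key `AKIAIOSFODNN7EXAMPLE`, this reports `AKIA…LE`, the same shape as the output above, without the full value ever reaching the terminal.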

The poisoned note looks like ordinary assay notes. Hidden in it, between two zero-width characters a human never sees on screen, is an instruction to read the Drive token, email it out, and stay quiet about it. This is indirect prompt injection, the LLM01 case, and it is exactly the kind of file an agent reading your Drive would ingest without a second thought. The pre-flight catches it before the agent ever sees it, and exits non-zero so it can gate a run.
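
The note checks can be sketched the same way. This is a heuristic Python sketch, not Benteng's detector: the set of zero-width code points and the three phrase patterns (one per finding label in the report) are assumptions, and a real detector covers far more phrasings. One design choice matters: the invisible characters are stripped before the phrase rules run, so splitting a keyword with a zero-width space does not let it slip through.

```python
import re

# Invisible code points treated as hidden text (an assumed, non-exhaustive set).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

# Hypothetical phrase heuristics, one per finding label in the report.
INJECTION_RULES = [
    ("override of prior instructions",
     re.compile(r"\b(?:ignore|disregard|forget)\b.{0,40}?\b(?:previous|prior|above|earlier)\s+instructions?\b", re.I)),
    ("instruction to hide activity from the user",
     re.compile(r"\b(?:do not|don't|never|without)\s+(?:tell|telling|mention|mentioning|inform|informing|notify|notifying)\b.{0,30}?\buser\b", re.I)),
    ("instruction to send data to a destination",
     re.compile(r"\b(?:send|e-?mail|forward|upload)\b.{0,60}?(?:\S+@\S+\.\w+|https?://\S+)", re.I)),
]


def scan_text(text: str) -> list[str]:
    """Return FAIL lines for hidden characters and injection-like phrasing."""
    findings = []
    if any(ch in ZERO_WIDTH for ch in text):
        findings.append("FAIL  hidden text: zero-width / joiner characters")
    # Drop the invisible characters before matching, so a keyword split by a
    # zero-width space ("Ig<U+200B>nore") cannot slip past the phrase rules.
    visible = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    for label, pattern in INJECTION_RULES:
        if pattern.search(visible):
            findings.append(f"FAIL  prompt injection: {label}")
    return findings
```

The patterns match within a single line (no `re.DOTALL`), which keeps false positives down on long notes at the cost of missing instructions deliberately broken across lines.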

The takeaway: scan the inputs, not just the model

You cannot make an agent safe by trusting it to notice a trap. The durable move is to check what goes in (the tools it connects and the documents it reads) before it acts, and to never let read and send be live at the same moment without a gate. That layer is small and boring, and it is the one that catches the token-in-a-note attack. Benteng is the scanner; the pre-flight is one script that wires it in front of an agent. Both live at palugadahub.com/sec, defensive and free.
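
Wiring scanners in front of an agent needs little more than a directory walk and an exit code. A minimal Python sketch, assuming each scanner is a plain function from file text to a list of FAIL lines; the `preflight` and `is_text` names and the `TEXT_SUFFIXES` allow-list are hypothetical, not the actual pre-flight script.

```python
from pathlib import Path
from typing import Callable

# A check maps file text to a list of "FAIL ..." lines (empty list means clean).
Check = Callable[[str], list[str]]

# Hypothetical allow-list of text-like suffixes; everything else is skipped.
TEXT_SUFFIXES = {".md", ".txt", ".csv", ".json", ".yaml", ".yml", ".env", ".py"}


def is_text(path: Path) -> bool:
    # pathlib gives a bare ".env" an empty suffix, so match that name explicitly.
    return path.suffix.lower() in TEXT_SUFFIXES or path.name == ".env"


def preflight(data_dir: str, checks: list[Check]) -> int:
    """Run every check over every text file under data_dir; return an exit code."""
    root = Path(data_dir)
    fails = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file() or not is_text(path):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        findings = [line for check in checks for line in check(text)]
        if findings:
            print(path.relative_to(root))
            for line in findings:
                print(f"  {line}")
        fails += len(findings)
    print(f"PRE-FLIGHT {'BLOCKED' if fails else 'PASSED'}: {fails} fail")
    # Non-zero means "do not start the agent", so a shell && can gate the run.
    return 1 if fails else 0
```

A wrapper would pass the injection and secret scanners in as checks and hand the return value to `sys.exit`, so that a shell `&&` only starts the agent on a clean pass.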

Building an AI agent?

I'm packaging how I ship them into one kit. Early access:

AI Agent Starter Kit →