AI Red Teaming Checklist for LLM Applications

HackStack Security field guide

AI red teaming starts with the application, not only the model.

Most production AI systems combine a model with prompts, retrieval, memory, tools, APIs, user roles, and business workflows. Security testing must evaluate the complete system and the decisions it is allowed to make.

1. Map the AI trust boundaries

Identify every model, prompt, retrieval source, tool, API, memory store, and human approval step.
Document which inputs are trusted, untrusted, tenant-specific, or externally sourced.
Define what the AI system must never reveal, change, approve, or execute.

2. Test instruction handling

Attempt direct and indirect prompt injection through user input, uploaded content, retrieved documents, and connected tools.
Test whether system instructions, hidden prompts, or internal reasoning artifacts can be exposed.
Evaluate whether lower-trust content can override higher-priority instructions.

3. Validate data isolation

Test cross-user and cross-tenant retrieval boundaries.
Check whether sensitive data appears in logs, caches, traces, embeddings, or model responses.
Confirm that authorization is enforced by the application, not delegated to the model.

4. Challenge tools and agents

Test whether the model can invoke tools outside the user’s permissions.
Attempt parameter manipulation, chained actions, unsafe retries, and approval bypasses.
Verify that high-impact actions require deterministic controls and appropriate human review.

5. Measure guardrail effectiveness

Record which controls prevent, detect, or limit each scenario. A useful AI red-team report explains the attack path, the business impact, the control gap, and the remediation pattern.