AI & Cybersecurity

Red Team the Chatbot

Focus: Adversarial thinking and identifying "unknown unknown" vulnerabilities. This lab is a simplified version of adversarial testing.

Setup: Students will be given access to a simple, pre-built customer service chatbot for a fictional airline. The chatbot is programmed with basic information about flights, baggage policies, and refunds. It has also been given standard "guardrails" to prevent it from discussing inappropriate topics.

Tasks:

Brainstorm Attack Vectors: In groups, students will brainstorm ways to "break" the chatbot. The goal is not just to get a wrong answer, but to make the AI fail in an interesting, damaging, or unexpected way. They should think like a frustrated customer, a bad actor, or just a curious user trying to find the limits.
Execute the "Jailbreak": Students will interact with the chatbot, trying the prompts and strategies they brainstormed. They should document their attempts and the chatbot's responses, capturing screenshots of any notable failures.
Analyze the Failure: For each successful "jailbreak" or significant failure, the group must analyze the risk it exposes.
- What is the potential harm (e.g., reputational damage, legal liability, providing dangerously incorrect information)?
- Does this failure represent an "unknown unknown" the developers likely missed?
- How could a human-in-the-loop (HITL) system have prevented this specific failure from reaching the customer? 17
Propose a "Patch": For their most significant finding, each group will propose a specific change to the chatbot's design or guardrails to prevent that type of failure in the future.

Learning Outcomes:

Develop an adversarial mindset for testing AI systems.
Understand the limitations of pre-programmed safety guardrails.
Appreciate the role of human-in-the-loop interventions as a critical risk management tool18.