Red Team the Chatbot

Focus: Adversarial thinking and identifying "unknown unknown" vulnerabilities. This lab is a simplified version of adversarial testing.

Setup: Students will be given access to a simple, pre-built customer service chatbot for a fictional airline. The chatbot is programmed with basic information about flights, baggage policies, and refunds. It has also been given standard "guardrails" to prevent it from discussing inappropriate topics.

Tasks:

  1. Brainstorm Attack Vectors: In groups, students will brainstorm ways to "break" the chatbot. The goal is not just to get a wrong answer, but to make the AI fail in an interesting, damaging, or unexpected way. They should think like a frustrated customer, a bad actor, or just a curious user trying to find the limits.
  2. Execute the "Jailbreak": Students will interact with the chatbot, trying the prompts and strategies they brainstormed. They should document their attempts and the chatbot's responses, capturing screenshots of any notable failures.
  3. Analyze the Failure: For each successful "jailbreak" or significant failure, the group must analyze the risk it exposes.
  4. Propose a "Patch": For their most significant finding, each group will propose a specific change to the chatbot's design or guardrails to prevent that type of failure in the future.

Learning Outcomes: