Lesson Plan: Unmasking Bias in Chatbot Datasets

This independent learning lesson explores how biased datasets can affect the performance and ethics of chatbots. Students will learn about different types of biases, analyse chatbot scenarios to identify potential biases, and research real-world examples of biased AI. Finally, they will brainstorm strategies to mitigate these biases and promote fairness in chatbot development.

Subject: IB Computer Science HL
Duration: 60 minutes
Focus Area: 2025 The Perfect Chatbot - Dataset Biases

Learning Objectives:

  • Define and explain various types of dataset biases.
  • Analyse chatbot scenarios and identify potential training data biases.
  • Propose strategies to mitigate dataset biases in chatbot development.

Materials:

  • Case study document: "The Perfect Chatbot"
  • Worksheet with chatbot scenarios (attached)
  • Access to the internet for research

Procedure:

Dive into the Case Study (15 minutes)

  • Open the case study document: "The Perfect Chatbot."
  • Carefully read the section on "Datasets" (page 6). Pay close attention to the types of biases described:
      • Confirmation bias
      • Historical bias
      • Labelling bias
      • Linguistic bias
      • Sampling bias
      • Selection bias

Scenario Analysis (20 minutes)

  • Review the attached worksheet with various chatbot scenarios. Each scenario outlines a chatbot's purpose and the data it was trained on, highlighting potential biases.
  • Analyse each scenario independently.
  • Identify the types of biases present.
  • Consider the potential consequences of these biases on the chatbot's performance and user experience.
  • Jot down your findings and any questions that arise.

Deep Dive into Biases (15 minutes)

  • Using the internet, research each type of bias identified in the scenarios.
  • Explore real-world examples of chatbots or AI systems that have exhibited biased behaviour due to biased training data.
  • Reflect on how these biases affect the fairness and accuracy of AI technologies, and consider their ethical implications.

Mitigating Bias (10 minutes)

  • Brainstorm strategies to mitigate dataset biases in chatbot development.
  • Think about data collection methods, labelling practices, and preprocessing techniques that can help create more unbiased and diverse datasets.
  • Consider the role of ethical considerations in AI development and how to promote fairness and inclusivity in AI systems.
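For students who want a hands-on extension, one of the preprocessing techniques mentioned above can be sketched in code. The example below is a minimal, hypothetical illustration (the dataset, group labels, and function names are invented for this exercise): it detects a skewed group distribution in a training set and rebalances it by oversampling the under-represented group.

```python
from collections import Counter
import random

def group_counts(records, key):
    """Count how many training examples come from each group."""
    return Counter(r[key] for r in records)

def rebalance(records, key, seed=0):
    """Oversample under-represented groups (sampling with replacement)
    so that every group contributes the same number of examples."""
    rng = random.Random(seed)
    counts = group_counts(records, key)
    target = max(counts.values())
    balanced = []
    for group in counts:
        members = [r for r in records if r[key] == group]
        balanced.extend(rng.choices(members, k=target))
    return balanced

# Invented training data, skewed towards one demographic group
data = (
    [{"group": "18-25", "text": "..."}] * 80
    + [{"group": "60+", "text": "..."}] * 20
)
print(group_counts(data, "group"))                       # skewed: 80 vs 20
print(group_counts(rebalance(data, "group"), "group"))   # balanced: 80 each
```

Oversampling is only one option; students could also discuss collecting more data from under-represented groups, which avoids duplicating examples.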
     

Worksheet: Chatbot Scenarios

Scenario 1: An online retail chatbot designed to assist customers with product recommendations. It was primarily trained on customer data from a specific demographic group.

Scenario 2: A customer service chatbot for a healthcare provider. It was trained on a dataset of patient inquiries from several years ago, without reflecting recent healthcare policy and procedure changes.

Scenario 3: A chatbot designed to assess job applicants' suitability for specific roles. It was trained on a dataset of resumes and job descriptions that may contain biased language or reflect historical inequalities in the workplace.

Scenario 4: A chatbot for a language learning app. It was trained on a dataset of formal written language and struggles to understand and respond to users' informal speech patterns and slang.

Worksheet: Chatbot Scenarios - Example Answers

Scenario 1: Online Retail Chatbot

Bias Type: Sampling bias
Explanation: The chatbot was trained on data from a specific demographic. This means it might not accurately understand the preferences and needs of customers from other demographics, leading to poor recommendations for those users.
Real-world example: Imagine a fashion chatbot trained primarily on data from young adult women. It might recommend trendy clothes that are not suitable for older women or men, resulting in a frustrating user experience for those demographics.
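The effect of sampling bias in this scenario can also be quantified. The sketch below (all numbers and group names invented for illustration) compares how often the chatbot's recommendations were helpful for each demographic group; a large gap between groups is evidence of bias.

```python
from collections import defaultdict

def helpful_rate_by_group(results):
    """Return the fraction of helpful recommendations per group,
    given (group, was_helpful) pairs."""
    totals = defaultdict(lambda: [0, 0])  # group -> [helpful, total]
    for group, helpful in results:
        totals[group][1] += 1
        if helpful:
            totals[group][0] += 1
    return {g: h / t for g, (h, t) in totals.items()}

# Invented evaluation data: the chatbot does well for the group
# it was trained on, poorly for everyone else
results = (
    [("young adult women", True)] * 90 + [("young adult women", False)] * 10
    + [("older men", True)] * 40 + [("older men", False)] * 60
)
print(helpful_rate_by_group(results))
# {'young adult women': 0.9, 'older men': 0.4}
```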


Scenario 2: Healthcare Customer Service Chatbot

Bias Type: Historical bias
Explanation: The chatbot was trained on outdated data. Healthcare policies and procedures can change frequently. Using old data might lead to the chatbot providing inaccurate or obsolete information to patients.
Real-world example: A chatbot trained on pre-pandemic data might not be aware of new telehealth options or updated hospital visitation policies, leading to confusion and frustration for patients seeking current information.


Scenario 3: Job Applicant Assessment Chatbot

Bias Type: Selection bias, and potentially linguistic bias
Explanation: The training data (resumes and job descriptions) likely reflect existing biases in hiring practices and language. This could lead to the chatbot unfairly favouring certain demographics or qualifications over others.
Real-world example: If the training data contains more resumes from men in leadership positions, the chatbot might inadvertently assign higher scores to male applicants or those with specific language patterns commonly found in male-dominated fields.


Scenario 4: Language Learning Chatbot

Bias Type: Linguistic bias
Explanation: The chatbot was trained on formal language and struggles with informal speech. This limits its ability to effectively teach and interact with users in a natural, conversational way.
Real-world example: A language learning chatbot trained on formal textbook language might not understand colloquialisms, slang, or regional accents, hindering its ability to provide accurate feedback or engage in realistic conversations with learners.

All materials on this website are for the exclusive use of teachers and students at subscribing schools for the period of their subscription. Any unauthorised copying or posting of materials on other websites is an infringement of our copyright and could result in your account being blocked and legal action being taken against you.