DS606: Advances in Safety-Critical Machine Learning

Elective, IIT Bombay, C-MInDS, 2026

Course Title: Advances in Safety-Critical Machine Learning
Instructor: Arjun Bhagoji
TA: TBD
Time: Wednesday, Friday 11:00 am-12:30 pm
Room: LT102
Office Hours: TBD

Course Description

Content: Progress in machine learning is often measured under controlled, well-understood conditions. However, safety-critical workflows in realistic settings require ML systems to be reliable even when faced with new and unexpected conditions. This field, which we broadly term safety-critical machine learning, is vast and ever-growing. For students wishing to do research in this area, there is far too much literature to absorb unaided, so this course will provide a guided tour through it. The course is broadly organized into four modules, covering aspects of robustness, privacy, and fairness, among others:

  • Module 0: Recap of basics of robust machine learning
  • Module 1: Robustness in modern machine learning paradigms
  • Module 2: Privacy and memorization
  • Module 3: User and data protection

Format: Apart from Module 0, the course will largely be driven by paper presentations by the students, which will encourage open-ended discussion and help advance research in the field. The paper discussions will involve role-playing student seminars inspired by Alec Jacobson and Colin Raffel, and by several of Aditi Raghunathan’s courses. We will adopt the following roles:

  • Positive reviewer: who advocates for the paper to be accepted at a conference (e.g., NeurIPS)
  • Negative reviewer: who advocates for the paper to be rejected at a conference (e.g., NeurIPS)
  • Archaeologist: who determines where this paper sits in the context of previous and subsequent work. They must find and report on at least one older paper cited within the current paper that substantially influenced it, and at least one newer paper that cites the current paper. Keep an eye out for follow-up work that contradicts the takeaways of the current paper
  • Academic researcher: who proposes potential follow-up projects that are not just based on the current paper but are only possible because of its existence and success
  • Visitor from the past: who is a researcher from the early 2000s. They must discuss how they comprehend the results of the paper, what they like or dislike about the settings and benchmarks considered, and what surprises them most about the presented results

Intended Audience: The intended audience for this class is graduate students working in machine learning and data science who are interested in doing research in this area. However, interested undergraduates (third year and higher) are welcome to attend as well.

Pre-requisites: There is no official prerequisite, but having taken DS603 (Robust Machine Learning) will help immensely. This is an advanced course with a research focus. Mathematical maturity will be assumed, as will the basics of algorithms, probability, linear algebra, and optimization. An introductory course in machine learning should have been taken to follow along comfortably. For the project component, familiarity with scientific programming in Python and with libraries such as NumPy and PyTorch will be beneficial.
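
For a concrete sense of the programming background assumed, the following is a minimal, illustrative PyTorch sketch of the fast gradient sign method (FGSM), a standard baseline evasion attack closely related to the papers recapped in Module 0. The model and data below are random placeholders introduced purely for illustration; they are not course material.

```python
# Minimal sketch of a gradient-based evasion (adversarial example) attack.
# FGSM: perturb the input one step in the direction of the sign of the
# loss gradient. Model and data are toy placeholders, not course material.
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                eps: float = 8 / 255) -> torch.Tensor:
    """Return an L-infinity bounded adversarial version of input batch x."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # One signed-gradient step, then clamp back to the valid pixel range.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

if __name__ == "__main__":
    # Toy stand-in: a linear "classifier" on 10 random CIFAR-sized images.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    x = torch.rand(10, 3, 32, 32)
    y = torch.randint(0, 10, (10,))
    x_adv = fgsm_attack(model, x, y)
    clean_acc = (model(x).argmax(1) == y).float().mean()
    adv_acc = (model(x_adv).argmax(1) == y).float().mean()
    print(f"accuracy: clean {clean_acc:.2f}, adversarial {adv_acc:.2f}")
```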

Course Schedule

| Week | Date (Day) | Topic | References | Notes | Comments |
|------|------------|-------|------------|-------|----------|
| 1 | 07/01 (Wed) | No class | | | |
| 1 | 09/01 (Fri) | No class | | | |
| 2 | 14/01 (Wed) | Introduction + poisoning attacks | Machine Learning Security against Data Poisoning: Are We There Yet?; Poisoning Attacks against Support Vector Machines; Stronger Data Poisoning Attacks Break Data Sanitization Defenses; Planting Undetectable Backdoors in Machine Learning Models | | Start of Module 0 on Recap |
| 2 | 16/01 (Fri) | Poisoning attacks continued | A Little Is Enough: Circumventing Defenses For Distributed Learning; Analyzing Federated Learning through an Adversarial Lens; The Hidden Vulnerability of Distributed Learning in Byzantium | | |
| 3 | 21/01 (Wed) | Evasion attacks | Intriguing Properties of Neural Networks; Towards Evaluating the Robustness of Neural Networks; Delving into Transferable Adversarial Examples and Black-box Attacks; Square Attack: A Query-Efficient Black-Box Adversarial Attack via Random Search | | |
| 3 | 23/01 (Fri) | Evasion attacks continued | Adversarial Risk via Optimal Transport and Optimal Couplings; Lower Bounds on Cross-Entropy Loss in the Presence of Test-time Adversaries | | |
| 4 | 28/01 (Wed) | Paper Presentation 1 | Jailbreaking LLMs and Agentic Systems | Jailbreaking | Start of Module 1 on Robustness |
| 4 | 30/01 (Fri) | Paper Presentation 2 | | Jailbreaking | |
| 5 | 04/02 (Wed) | Paper Presentation 3 | | Safety alignment/adversarial training | Project Milestone 0: project groups due |
| 5 | 06/02 (Fri) | Paper Presentation 4 | | Safety alignment | |
| 6 | 11/02 (Wed) | Robust generalization | Rademacher Complexity for Adversarially Robust Generalization | | |
| 6 | 13/02 (Fri) | Paper Presentation 5 | | PAC-Bayesian bounds | |
| 7 | 18/02 (Wed) | Paper Presentation 6 | | Robustness in the overparametrized regime | |
| 7 | 20/02 (Fri) | Provable defenses | Provable Defenses via the Convex Outer Adversarial Polytope | | |
| 8 | 25/02 (Wed) | Mid-sem week | | | |
| 8 | 27/02 (Fri) | Mid-sem week | | | |
| 9 | 04/03 (Wed) | Poisoning defenses | Recent Advances in Algorithmic High-Dimensional Robust Statistics | | Project Milestone 1: idea pitch to instructor |
| 9 | 06/03 (Fri) | Paper Presentation 7 | | Byzantine-resilient distributed learning | End of Module 1 on Robustness |
| 10 | 11/03 (Wed) | Intro to privacy attacks | Enhanced Membership Inference Attacks against Machine Learning Models; Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing; High-Fidelity Extraction of Neural Network Models | | Start of Module 2 on Privacy and Memorization |
| 10 | 13/03 (Fri) | Paper Presentation 8 | | Privacy leakage in foundation models | |
| 11 | 18/03 (Wed) | Paper Presentation 9 | | Memorization | |
| 11 | 20/03 (Fri) | Differential privacy | | See the Laplace-mechanism sketch after the schedule | Project Milestone 2: progress update |
| 12 | 25/03 (Wed) | Paper Presentation 10 | | Other approaches to privacy: cryptographic, etc. | End of Module 2 on Privacy |
| 12 | 27/03 (Fri) | Fairness overview | Tutorial: 21 Fairness Definitions and Their Politics; Fairness and Machine Learning, Chapter 3; Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification; Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints | | Start of Module 3 on Data and User Protection |
| 13 | 01/04 (Wed) | Paper Presentation 11 | | Unlearning 1: basics | |
| 13 | 03/04 (Fri) | Paper Presentation 12 | | Unlearning 2: generative models | |
| 14 | 08/04 (Wed) | Paper Presentation 13 | | Watermarking | |
| 14 | 10/04 (Fri) | Paper Presentation 14 | Understanding Black-box Predictions via Influence Functions; “Why Should I Trust You?”: Explaining the Predictions of Any Classifier | Interpretability: classical | |
| 15 | 15/04 (Wed) | Paper Presentation 15 | Interpretation of Neural Networks is Fragile; Impossibility Theorems for Feature Attribution | Interpretability: modern | End of Module 3 on Data and User Protection |
| 15 | 17/04 (Fri) | Course wrap-up | Fawkes: Protecting Privacy against Unauthorized Deep Learning Models; Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models; Algorithmic Collective Action in Machine Learning; MultiRobustBench: Benchmarking Robustness Against Multiple Attacks | | |
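
As a pointer for the Week 11 session on differential privacy (referenced in the schedule above), here is a minimal, illustrative sketch of the Laplace mechanism, the textbook way to answer a counting query with epsilon-differential privacy. The dataset and query below are hypothetical placeholders, not course material.

```python
# Illustrative sketch of the Laplace mechanism for a counting query.
# A counting query has L1 sensitivity 1, so adding Laplace noise with
# scale 1/epsilon yields epsilon-differential privacy.
import numpy as np

def dp_count(data: np.ndarray, predicate, epsilon: float) -> float:
    """Release a noisy count of the records satisfying `predicate`."""
    true_count = float(np.sum(predicate(data)))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

if __name__ == "__main__":
    # Hypothetical data: ages of 1,000 individuals.
    ages = np.random.randint(18, 90, size=1000)
    # Stronger privacy (smaller epsilon) means noisier answers.
    for eps in (0.1, 1.0, 10.0):
        noisy = dp_count(ages, lambda a: a >= 65, eps)
        print(f"epsilon={eps}: count of ages >= 65 is {noisy:.1f}")
```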

Resources

Supplementary Books

  1. Understanding Machine Learning: From Theory to Algorithms
  2. All of Statistics
  3. Mathematics for Machine Learning
  4. Convex Optimization: Algorithms and Complexity
  5. Convex Optimization
  6. Notes on f-divergences
  7. Computational Optimal Transport

Similar Courses

  1. Jerry Li’s course
  2. Jacob Steinhardt’s course
  3. Aditi Raghunathan’s course

Code repositories

  1. DRO
  2. Trusted AI Toolbox
  3. Cleverhans
  4. RobustBench
  5. Jailbreakbench
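
A hedged usage sketch for RobustBench from the list above: the repository provides a zoo of robustly trained models behind a load_model helper. The call below follows the repository's documented usage as best understood; treat the exact signature and model name as assumptions to verify against the current README.

```python
# Hedged sketch: loading a pretrained robust model from RobustBench's model
# zoo. The API call and model name are assumptions based on the repository's
# documented usage; verify against the current README before relying on them.
from robustbench.utils import load_model

model = load_model(model_name="Carmon2019Unlabeled",  # one CIFAR-10 Linf entry
                   dataset="cifar10",
                   threat_model="Linf")
model.eval()  # ready to evaluate, e.g., against the evasion attacks above
```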

Grading

Paper presentations (40%): Each student must take part in 1-2 paper presentations during the semester. A paper will be presented by two students, with each taking on the role of either the positive or the negative reviewer.
Final project (30%): You are expected to submit a project proposal and a final report; project presentations will be held after the end-semester exams. A publishable paper will receive the full grade; anything else will be graded at the instructor’s discretion.
Class participation (20%): You are expected to participate actively in all paper-related discussions.
Attendance (10%): You are expected to attend at least 80% of all classes to receive the full grade for this component.

Attendance Policy

You are expected to attend at least 80% of all classes to receive the full attendance grade. In addition, if you miss more than 4 classes, you must provide an explanation.

Accommodations

Students with disabilities or health issues should approach the instructor at any point during the semester to discuss accommodations. The aim of the course is to learn together, and legitimate difficulties will be resolved collaboratively.