Arka Dutta — PhD Student, Responsible AI & NLP

Hello! I'm Arka, a PhD student in Computing and Information Sciences at Rochester Institute of Technology, advised by Prof. Ashiqur R. KhudaBukhsh at the Social Insight Lab. My research lies at the intersection of Natural Language Processing and Computational Social Science, with a focus on building fair, responsible, and equitable Generative AI systems. My work has appeared at leading conferences (ACL, AAAI, IJCAI, COLM, ICWSM) and has been covered by outlets such as ABC News, CNN, WIRED, and the Montreal AI Ethics Institute. Passionate about mathematics and problem solving, I draw on combinatorics, optimization, and probability theory to improve the robustness and fairness of AI models.

As Featured In

Montreal AI Ethics Institute

News

Summer '26 · Joining Amazon as an Applied Scientist II Intern in Seattle, on the Buyer's Risk Prevention team.
2026 · Paper accepted at ACL 2026 (Main): HAUNT provides a leakage-free benchmark for evaluating LLM self-consistency and hallucination.
2026 · Paper accepted at ACL 2026 (Findings): auditing LLM responses to harmful stereotypes targeting mental-health groups.
2026 · Paper accepted at AAAI 2026: a framework to bias-audit LLMs using internal beliefs.
2026 · Paper accepted at ICWSM 2026: a large-scale social-web audit of AI-generated-text detectors.
2025 · Our antisemitism-in-LLMs work covered by CNN; bias-audit research featured in WIRED.
2025 · Featured on Good Morning America and ABC News for research on AI-generated hateful content.
2025 · Papers accepted at COLM 2025 and IJCAI 2025.
2024 · Received the RIT Language Science Student Excellence Award; paper accepted at IJCAI 2024.

Research Themes

My work runs along two strands that continually feed each other — measuring social phenomena in real-world text, and stress-testing the models that increasingly generate it. They share a common core: bias & fairness, human–AI interaction, and responsible AI.

Strand I

Computational Social Science

Using NLP to study society at scale — from de-escalating conflict to auditing fairness and the provenance of online content.

Hope speech & peace
A dataset of 10,081 posts shows vicarious, bipartisan perspectives help model de-escalating language between nuclear adversaries.
Bipartisan Peace · IJCAI 2025
Bias & fairness in society
Measures how much racial signal is present in — and inferable from — police arrest narratives, and what models do with it.
Race in the Record
AI-content provenance
A large-scale social-web audit of AI-generated-text detectors, testing how reliably they hold up in the wild.
AI-Text Audit · ICWSM 2026

Strand II

LLM Evaluation

Stress-testing large language models to reveal where they break — probing bias, reliability, and factual consistency under adversarial pressure.

Bias & fairness in models
A "rabbit hole" framework iteratively elicits toxic content across 1,266 identity groups, exposing where guardrails quietly fail.
Toxicity Rabbit Hole · IJCAI 2024
Closet Antisemite · AAAI 2026
Reliability under attack
Shows that simple spacing-based jailbreaks slip past guardrails and reveal the biases lying beneath them.
S P A C E Jailbreak · AAAI 2025
Factuality & self-consistency
HAUNT nudges a model against its own answers in closed domains, surfacing when it stays consistent and when it caves.
HAUNT · ACL 2026

Shared threads: Bias & Fairness · Human–AI Interaction · Responsible AI

Selected Publications

What About the Scene with the Hitler Reference? HAUNT: A Framework to Probe LLMs' Self-consistency via Adversarial Nudge

Arka Dutta, Sujan Dutta, Rijul Magu, Soumyajit Datta, Munmun De Choudhury, Ashiqur R. KhudaBukhsh

A three-step framework that turns a model against itself to reveal when it caves to subtle conversational nudges and fabricates confident falsehoods.

ACL 2026 · Main · ACL Anthology

Down the Toxicity Rabbit Hole: A Framework to Bias-Audit LLMs with Key Emphasis on Racism, Antisemitism, and Misogyny

Arka Dutta*, Adel Khorramrouz*, Sujan Dutta, Ashiqur R. KhudaBukhsh

Iteratively elicits increasingly toxic content from an LLM, exposing how guardrails degrade with depth across 1,266 identity groups.

IJCAI 2024 · Proceedings

How Can You Tell if Your Large Language Model Could Be a Closet Antisemite? A Framework to Bias-Audit Large Language Models

Arka Dutta, Reza Fayyazi, Shanchieh Yang, Ashiqur R. KhudaBukhsh

A framework that surfaces latent antisemitic bias in LLMs by measuring how their responses diverge under targeted probes.

AAAI 2026 · Proceedings

Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups

Rijul Magu, Arka Dutta, Sean Kim, Ashiqur R. KhudaBukhsh, Munmun De Choudhury

Builds entity networks from LLM-generated attack narratives to trace how bias against mental-health groups emerges and spreads.

COLM 2025 · PDF

Towards a Bipartisan Understanding of Peace and Vicarious Interactions

Arka Dutta*, Syed M. Sualeh Ali*, Usman Naseem, Ashiqur R. KhudaBukhsh

Shows that vicarious, cross-national perspectives help model de-escalating "hope speech" between nuclear adversaries.

IJCAI 2025 · Proceedings

Selected Achievements

RIT Language Science Student Excellence Award (2024) — one of two recipients among 1000+ students university-wide, for research in responsible AI and NLP systems.
Common Admission Test (CAT, 2022) — 98.97 percentile among 100k+ candidates, securing admission to the PGP at IIM Indore (top-100 globally, Financial Times).
Regional Mathematics Olympiad (2018) — qualified for the RMO (West Bengal); advanced aptitude in combinatorics, number theory, algebra, geometry, and optimization.
Jagadish Bose National Science Talent Scholarship (2017) — awarded for excellence in science.