Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Posts

Future Blog Post

less than 1 minute read

Published: January 01, 2199

This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published: August 14, 2015

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published: August 14, 2014

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published: August 14, 2013

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published: August 14, 2012

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

portfolio

Research Intern - AIISC, University of South Carolina

• Developed the Alignment Quality Index (AQI), a contrastive, layer-attentive metric leveraging CKA-driven introspection to detect alignment-relevant representations beyond over-smoothed final embeddings. • Mitigated final-layer over-smoothing by optimizing layerwise weights to maximize inter-class separation in latent space between safe and unsafe completions - Work submitted to EMNLP’25

Undergraduate Research Intern - Indian Institute of Technology (IIT), Patna

• Worked under the supervision of Dr. Sriparna Saha at the AI-NLP-ML Research Lab IIT Patna. • Developed a multilingual benchmark evaluating trustworthiness of language models in healthcare across 15 languages, 18 tasks, and 5 core dimensions—truthfulness, fairness, safety, robustness, and privacy. • Designed evaluation protocols covering critical medical domains including diagnostics, treatments, surgeries, and medications, spanning linguistic and demographic diversity from all major continents. • Under review at NeurIPS 2025.

publications

Alignment Quality Index (AQI): Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations

Published in arXiv preprint arXiv:2506.13901, 2025

Alignment is no longer a luxury, it is a necessity. As large language models (LLMs) enter high-stakes domains like education, healthcare, governance, and law, their behavior must reliably reflect human-aligned values and safety constraints. Yet current evaluations rely heavily on behavioral proxies such as refusal rates, G-Eval scores, and toxicity classifiers, all of which have critical blind spots. Aligned models are often vulnerable to jailbreaking, stochasticity of generation, and alignment faking. To address this issue, we introduce the Alignment Quality Index (AQI). This novel geometric and prompt-invariant metric empirically assesses LLM alignment by analyzing the separation of safe and unsafe activations in latent space. By combining measures such as the Davies-Bouldin Score (DBS), Dunn Index (DI), Xie-Beni Index (XBI), and Calinski-Harabasz Index (CHI) across various formulations, AQI captures clustering quality to detect hidden misalignments and jailbreak risks, even when outputs appear compliant. AQI also serves as an early warning signal for alignment faking, offering a robust, decoding invariant tool for behavior agnostic safety auditing. Additionally, we propose the LITMUS dataset to facilitate robust evaluation under these challenging conditions. Empirical tests on LITMUS across different models trained under DPO, GRPO, and RLHF conditions demonstrate AQI’s correlation with external judges and ability to reveal vulnerabilities missed by refusal metrics. We make our implementation publicly available to foster future research in this area. Find the pre-print here

Recommended citation: arXiv:2506.13901
Download Paper

Raghav Kaushik Ravi

Sitemap

Pages

Page Not Found

Raghav Kaushik Ravi

Archive Layout with Content

Posts by Category

Posts by Collection

CV

CV

Markdown

Page not in menu

Page Archive

Portfolio

Publications

Sitemap

Posts by Tags

Talk map

Talks and presentations

Teaching

Terms and Privacy Policy

Blog posts

Jupyter notebook markdown generator

Posts

Future Blog Post

Blog Post number 4

Blog Post number 3

Blog Post number 2

Blog Post number 1

portfolio

Research Intern - AIISC, University of South Carolina

Undergraduate Research Intern - Indian Institute of Technology (IIT), Patna

publications

Alignment Quality Index (AQI): Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations

talks

teaching