What Interests Me in 2026

I’m broadly interested in questions of alignment and interpretability, modeling human behavior, and building NLP systems for specialized domains like healthcare, where reasoning, representation, and operational constraints matter as much as raw performance.

Arrival and Contact are two of my favorite films, not for the aliens but for the problem they revolve around: interpretation under epistemic uncertainty. In both stories, the question of alien contact isn’t about intelligence or capability, but about representation. Any action depends on finding a shared language that allows meaning to be communicated reliably. That is the part I love most in both movies: interpretation is the action.

XKCD 1838: The opacity of ML

ML has always been about learning the right frame of reference, one that maps questions to expected answers, which is why my feelings about this xkcd comic remain complicated. Opacity or failure often reflects a representational mismatch, not an absence of structure. We can influence the resulting model by modifying architectural choices, adjusting hyperparameters, and varying the volume of training data. Productizing such ML systems works precisely because they align with the real world.

Similarly, large neural systems can be tuned to exhibit stable, predictive signals even without the ability (or the need) to create an explanation for those outcomes. The model underneath is free to learn any representation as long as it eventually maps to the outcomes we desire. There is likely a rich structure in this latent space that we don’t understand yet because we haven’t used (or created) the right language. Whether interpretability mechanisms are built post hoc or during training, the core problem is one of translation. Tools like linear probes, concept activation vectors (CAVs), sparse autoencoders (SAEs), and steering vectors all attempt to learn a change of basis within the activation space of a model’s layers. As much as we hope to learn something (feature or concept dictionaries) out of this, we don’t know whether the learned features are all actually useful in explaining the model’s outcome (see Anthropic’s work on attribution graphs, especially around faithfulness). What is the model thinking? How do we tell whether a concept is present, causal, and interacting with others inside a model’s representation?
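To make the "change of basis" idea concrete, here is a minimal sketch of the simplest kind of concept direction: the difference between the mean activation of examples where a concept is present and the mean where it is absent. Everything here is an assumption for illustration. The activations are synthetic Gaussian noise, the concept is planted along one axis, and the function names are hypothetical; this is not how any particular probing, CAV, or SAE library works.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def concept_direction(pos_acts, neg_acts):
    """Unit vector pointing from the 'absent' mean to the 'present' mean."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("class means coincide; no direction to learn")
    return [d / norm for d in diff]


def project(activation, direction):
    """Scalar score: how far an activation lies along the direction."""
    return sum(a * d for a, d in zip(activation, direction))


def synthetic_activations(n, dim, concept_axis, shift, rng):
    """Stand-in for real activations: noise, shifted along one axis."""
    rows = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        row[concept_axis] += shift
        rows.append(row)
    return rows


rng = random.Random(0)
pos = synthetic_activations(200, 8, concept_axis=3, shift=4.0, rng=rng)
neg = synthetic_activations(200, 8, concept_axis=3, shift=0.0, rng=rng)

direction = concept_direction(pos, neg)
pos_mean_score = sum(project(a, direction) for a in pos) / len(pos)
neg_mean_score = sum(project(a, direction) for a in neg) / len(neg)
```

A direction like this only says the concept is linearly readable from the activations, which speaks to "present". It says nothing about "causal" or "interacting"; those would need interventions on the model itself, which this sketch does not attempt.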

I am motivated by operationalizing this mode of understanding for safety-sensitive domains and scientific advances (for example, Goodfire’s research on applied interpretability to identify new Alzheimer’s biomarkers). We don’t need perfect explanations, but we do need reliable signals that can enhance the auditability of an ML system. These signals can simply be learned directions in some geometric space. For example, concept vectors can accompany LLM judges as a second stage: learned directions that effectively gate decisions under thresholds tuned on data. Much like in Arrival or Contact, we don’t need to fully understand the language to act safely, but we do need a translation we can trust.
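One possible reading of "gate decisions under thresholds tuned on data" can be sketched as follows. The assumptions are loud ones: each example already has a scalar concept score (say, a projection onto a learned direction), a small labeled set is available for tuning, and disagreements between the judge and the concept score are routed to review. The names `tune_threshold` and `gate` are hypothetical, and the numbers are made up for illustration.

```python
import math


def tune_threshold(scores, labels, target_recall):
    """Largest threshold whose recall on labeled positives meets the target."""
    positives = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one positive example")
    # Number of positives that must score at or above the threshold.
    needed = max(1, math.ceil(target_recall * len(positives)))
    return positives[needed - 1]


def gate(judge_flag, concept_score, threshold):
    """Combine an LLM judge's flag with a concept score as a second stage."""
    concept_flag = concept_score >= threshold
    if judge_flag and concept_flag:
        return "flag"
    if not judge_flag and not concept_flag:
        return "pass"
    # The two signals disagree: surface the case instead of trusting either.
    return "review"


# Made-up tuning data: concept scores and whether the rare event occurred.
scores = [0.9, 0.8, 0.75, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0]

threshold = tune_threshold(scores, labels, target_recall=1.0)  # 0.4 here
decision = gate(judge_flag=True, concept_score=0.1, threshold=threshold)
```

Routing disagreements to review is a design choice for auditability, not the only option. A high-recall system might instead flag whenever either signal fires.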

I’ve been drawn to representations for a long time. During my Master’s work, I was motivated by the idea of modeling human behavior by learning the right joint latent space for users and items along temporal and geographical dimensions. That instinct has only strengthened as models have grown larger and less transparent. I am excited by the prospect of looking deeper into the pile, from various frames of reference, to understand a model’s decision making. What secrets of the universe do these mysterious activations unfold?

This perspective also shapes how I think about safe alignment. When an LLM does something fundamentally “mis-aligned”, we want to be able to measure this and attribute it to incorrect reward modeling or poorly specified constraints. For example, in high-recall safety systems where flagging rare events is important, we may be able to get LLMs to do the flagging. But without access to the internal representations, we are forced to either trust these opaque outcomes or disregard them, as we are unable to justify them.

More broadly, I remain interested in the dynamics of real-world ML systems that sit at the intersection of modeling, evaluation, and deployment. Success here has stemmed from careful framing, experimentation, and disciplined evaluation. The complete systems lifecycle, encompassing data gathering, model construction, metrics monitoring, feedback loops, and analysis of failure modes, remains a compelling area of focus.


ViewFinder

This is our current translation. It may not be the best one for the questions we are still asking. It’s also unlikely to be the final language. I am excited by these prospects.
