ML interpretability researcher. Alum of MATS 5 in Neel Nanda's stream. B.S./M.S. Yale '23.5; current PhD student @ Yale with Arman Cohan.
Popular repositories
- llm-steering-opt (Public): Tools for optimizing steering vectors in LLMs.
- one-shot-steering-misalignment (Public): Code and results on finding one-shot steering vectors that mediate emergent misalignment.
- one-shot-steering-repro (Public): Code for reproducing the results from our paper "Investigating Generalization of One-shot LLM Steering Vectors".
- ObservablePropagation (Public): Code for our paper on observable propagation.
- multilayer-transcoder-reversing (Public): Reverse-engineering transcoder features in multi-layer LLMs.
Jupyter Notebook 2