AI Safety & Interpretability Lab

non-profit
https://aisilab.github.io/
aisilab

AI & ML interests

Interpretability-informed control

Recent Activity

EvilScript authored a paper about 16 hours ago:
The Arbiter Agent: Continually Monitoring Multi-Agent Conversations to Detect Emergent Misalignment
giannor and lgalke authored a paper 5 days ago:
BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

Papers

Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion

Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals


Members: Lukas Galke Poech, Stine Beltoft, William Brach, Federico Torrielli, Peter Schneider-Kamp, Gianluca Barmina

aisilab's datasets (3)

aisilab/moltbook-files-new-language-signals

Updated 14 days ago • 518 • 112

aisilab/moltbook-files

Updated May 7 • 232k • 54

aisilab/moltbook-embeddings

Updated May 5 • 189k • 97