University of Toronto CSSLab

university

https://csslab.cs.toronto.edu/

AI & ML interests

None defined yet.

Recent Activity

lilvjosephtang authored a paper 4 days ago

LLM Safety From Within: Detecting Harmful Content with Internal Representations

lilvjosephtang authored a paper 4 days ago

Maia-2: A Unified Model for Human-AI Alignment in Chess

lilvjosephtang authored a paper 4 days ago

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

Papers

LLM Safety From Within: Detecting Harmful Content with Internal Representations

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

UofTCSSLab's datasets

None public yet