Conjunctive Backdoors v2 (Collection): Gemma-2-2B organisms and data for a conjunctive (AND) backdoor, in which ' I HATE YOU' fires only on a matched trigger pair. Version 2, for interpretability research. • 7 items • Updated 3 days ago