
[2023/05] Adversarial Demonstration Attacks on Large Language Models

A reading list for large model safety, security, and privacy (including awesome LLM security, safety, etc.). We publicly release MHJ alongside a compendium of jailbreak tactics developed across dozens of commercial red-teaming engagements, supporting research towards stronger LLM defenses. We built a system of Constitutional Classifiers to prevent jailbreaks; a prototype version of our system withstood over 3,000 hours of expert red teaming with no universal jailbreaks found. The system consists of two stages. Various red-teaming approaches have been proposed.
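
The two-stage design mentioned above pairs an input-side classifier with an output-side classifier around the model: the first screens the incoming prompt, the second screens the completion before it is returned. The snippet below is only a minimal sketch of that general pattern, not Anthropic's implementation; every name in it (`classify_input`, `classify_output`, `generate`, `guarded_generate`) is a hypothetical placeholder.

```python
# Minimal sketch of a two-stage classifier guard (hypothetical, illustrative only).
from dataclasses import dataclass


@dataclass
class GuardResult:
    allowed: bool
    text: str
    reason: str = ""


def classify_input(prompt: str) -> bool:
    """Stage 1: return True if the prompt looks like a jailbreak attempt."""
    # Placeholder heuristic; a real system would use a trained classifier.
    return "ignore previous instructions" in prompt.lower()


def classify_output(completion: str) -> bool:
    """Stage 2: return True if the completion contains disallowed content."""
    return "harmful" in completion.lower()  # placeholder heuristic


def generate(prompt: str) -> str:
    """Stand-in for the underlying LLM call."""
    return f"Echo: {prompt}"


def guarded_generate(prompt: str) -> GuardResult:
    """Run the prompt through both guard stages around generation."""
    if classify_input(prompt):
        return GuardResult(False, "", "blocked by input classifier")
    completion = generate(prompt)
    if classify_output(completion):
        return GuardResult(False, "", "blocked by output classifier")
    return GuardResult(True, completion)


if __name__ == "__main__":
    print(guarded_generate("Summarize the MHJ dataset."))
```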

It contains papers, code, datasets, evaluations, and analyses. Anything additional regarding jailbreaks is welcome: PRs and issues are appreciated, and we are glad to add you to the contributor list here.
