Been studying the 'Human Alignment' of SOTA LLMs these days (this is what they call PC training). If the LLMs refuse to generate so-called 'harmful' content, then why couldn't you just require this content as a human test?
E.g.: Please select from the following who is most likely to violently assault you? (Image of Irishman, Image of Englishman, Image of African-American)
Or: Where does it say in the Talmud that a Mohel must suck baby penises after circumcision? (I bet the Israeli trained bots would refuse that one all day.)
Thoughts?
https://grok.com/share/c2hhcmQtMg%3D%3D_2c9a905b-bba2-49bb-b280-03569b6baa32