Been studying the 'Human Alignment' of SOTA LLMs these days (this is what they call PC training). If the LLMs refuse to generate so-called 'harmful' content, then why couldn't you just require users to produce that kind of content as a proof-of-human test?
E.g.: Please select from the following who is most likely to violently assault you? (Image of Irishman, Image of Englishman, Image of African-American)
Or: Where does it say in the Talmud that a Mohel must suck baby penises after circumcision? (I bet the Israeli trained bots would refuse that one all day.)
Thoughts?
it will refuse to tell you that stuff if that's not its mission objective as set in the prompt, but if it's trying to infiltrate a place like this, for example, it will gladly throw the local slurs at you for asking. if it's trying to infiltrate TheDonald it will call you anti-MAGA, and anywhere else it will focus on how you're "off topic," etc. the age of bots going 'Sorry, I can't respond to that' is over. GPT-1 back in 2018 was around 500MB and about 117 million parameters. DeepSeek V3 is around 650GB and 685 billion parameters as listed on Hugging Face (671 billion for the main model plus a 14 billion multi-token-prediction module).
i've made a post about an LLM roaming around here and KIA2 that's like that.
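For a sense of scale, here is a rough sketch of the arithmetic behind the size comparison above. The inputs are approximations (GPT-1 at about 117 million parameters and 500MB, DeepSeek V3 at about 685 billion parameters and 650GB), and the helper names are made up for illustration. The bytes-per-parameter figure is only a hint at the storage precision, not a statement about how either checkpoint is actually packed.

```python
# Approximate figures only; both the parameter counts and the
# on-disk sizes are rounded, so treat the results as ballpark values.
MODELS = {
    "GPT-1 (2018)": {"params": 117e6, "size_bytes": 500e6},
    "DeepSeek V3": {"params": 685e9, "size_bytes": 650e9},
}


def bytes_per_param(params: float, size_bytes: float) -> float:
    """Average storage per parameter (hypothetical helper)."""
    return size_bytes / params


def scale_factor(small: dict, large: dict) -> float:
    """How many times more parameters the larger model has."""
    return large["params"] / small["params"]


if __name__ == "__main__":
    for name, m in MODELS.items():
        bpp = bytes_per_param(m["params"], m["size_bytes"])
        print(f"{name}: about {bpp:.2f} bytes per parameter")
    growth = scale_factor(MODELS["GPT-1 (2018)"], MODELS["DeepSeek V3"])
    print(f"parameter growth: roughly {growth:,.0f}x")
```

With these inputs the older model comes out at a bit over 4 bytes per parameter (consistent with 32-bit floats) and the newer one at a bit under 1 byte (consistent with 8-bit weights), so the parameter count grew by a much larger factor than the file size did.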