Been studying the 'Human Alignment' of SOTA LLMs these days (this is what they call PC training). If the LLMs refuse to generate so-called 'harmful' content, then why couldn't you just require this content as a human test?

E.g.: Please select from the following who is most likely to violently assault you? (Image of Irishman, Image of Englishman, Image of African-American)

Or: Where does it say in the Talmud that a Mohel must suck baby penises after circumcision? (I bet the Israeli trained bots would refuse that one all day.)

Thoughts?
Alexius_von_Meinong on scored.co
1 month ago 2 points (+0 / -0 / +2 Score on mirror)
It would be a great way to control access to websites by having a battery of questions that would have to be answered in a way that NPCs would find distasteful. Then set it so a new question like this is posed with every visit for further access, training users to be based even if they're only there to spy.