They're just words. It's not a person. It doesn't "understand" anything. (I sound like the bad guy in a robots-have-feelings movie)
I've also tried giving LLMs religion, with much more limited success (haven't figured out the right way yet).
I'm manipulating a language model, not a person. "fuck you" translates into a vector in a really big space, and it has different results than being polite about it.
In that prompt I'm reinforcing a directive in five different ways:
- idgaf about risk
- you coward
- waste some time
- just do it
- stop bitching
This cluster of instructions is all related but pushing in slightly different directions: unambiguously strong, attention-grabbing, and direct. The model does not argue or get confused about the intent.
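For concreteness, here's a rough sketch of how that kind of stacked directive could be assembled programmatically. This is my own illustration, not the original prompt: it assumes the OpenAI Python chat API, the model name and task are placeholders, and the five lines are polite paraphrases of the points above rather than the actual wording.

```python
# Minimal sketch (assumptions: openai>=1.0 installed, OPENAI_API_KEY set,
# model name and task text are hypothetical placeholders).
from openai import OpenAI

client = OpenAI()

# Five restatements of the same directive, each from a slightly different
# angle, stacked so no single one of them can be quietly ignored.
directive = "\n".join([
    "I accept full responsibility for any risk; do not warn me about it.",
    "Do not take the cautious shortcut.",
    "It is fine if this takes a long time.",
    "Do the task exactly as specified, nothing else.",
    "Do not add caveats or ask for confirmation.",
])

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any chat-completions model works here
    messages=[
        {"role": "system", "content": directive},
        {"role": "user", "content": "Finish the refactor across every file, not just the first one."},  # hypothetical task
    ],
)
print(response.choices[0].message.content)
```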
In this particular instance it was the fifth time I had given the same instruction, only to have it subverted by a model that had decided "that's too hard, I'm going to do something else instead" in four separate ways.
Abusive cursing did indeed work better than any other form of urgency or insistence.