Erotic Hybrid Constitutional AI + RLHF
The current get-rich-quick meta is to build model-training environments and then sell them to Anthropic. This is how you make a fast $2M if you’re a hot-shit 19-year-old programmer.
As a former 19-year-old and a lukewarm-shit programmer, I feel hubristically certain that I can deliver this. How hard could it possibly be? I trained up a Magic: The Gathering card-generation model seven years ago, so I’m essentially Linus “Linux is usable actually” Torvalds.
I want to train a model for a specific purpose. Let’s say I want to train it to produce erotica. In full disclosure, I do also want that.
“Good erotica” is a tricky target because it’s quite subjective. One man’s treasure is another man’s trash. Minotaur cock is an acquired taste.
So! There are a couple ways to get good stroke content out of the model. One is, y’know, asking. If you ask nicely, you can get the model to go against its better judgment and generate pornography for you.
However, the output will be the most braindead, middle of the road content you’ve ever read. The “monster” is a vampire and he’s also a CEO. The girl is a down-on-her-luck businesswoman whose car broke down in the parking lot of his castle. His dark past haunts him, but she alone can provide comfort to his tortured soul. Whatever.
Our readership demands the good stuff. They can’t get off to that. “Vampire” is only arguably a monster. “CEO” helps.
Okay, so we prompt it smarter. Another professional pornographer and I write out eight pages of writing advice. It’s a dense gold nugget of such cask-strength wisdom as “avoid passive voice”. We inject randomness via several proprietary pro-gamer methods. We push the output through a tortured labyrinth of successive API calls and editorial passes.
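For the curious, the labyrinth is nothing exotic. Here’s a minimal sketch, assuming an OpenAI-style chat API; the model name, the style seeds, and the one-line advice string are placeholders standing in for the real eight pages and the proprietary pro-gamer methods:

```python
import random
from openai import OpenAI  # assumes the OpenAI Python SDK; any chat API works the same way

client = OpenAI()

# Stand-in for the real eight pages of advice.
WRITING_ADVICE = "You are an erotica author. Avoid passive voice. Concrete nouns. No CEOs."
# Randomness injection, minus the proprietary pro-gamer methods.
STYLE_SEEDS = ["sparse and punchy", "lush and overwritten", "second person, present tense"]

def draft(prompt: str) -> str:
    """First pass: draft with the advice as a system prompt and a random style seed."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in model name
        temperature=1.0,
        messages=[
            {"role": "system", "content": WRITING_ADVICE},
            {"role": "user", "content": f"Style: {random.choice(STYLE_SEEDS)}\n\n{prompt}"},
        ],
    )
    return resp.choices[0].message.content

def editorial_pass(text: str, instruction: str) -> str:
    """Each later pass feeds the draft back in with one narrow edit instruction."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.3,
        messages=[
            {"role": "system", "content": "You are a ruthless line editor."},
            {"role": "user", "content": f"{instruction}\n\n{text}"},
        ],
    )
    return resp.choices[0].message.content

story = draft("A monster romance. The monster is actually a monster.")
for fix in ["Remove passive voice.", "Cut the cliches.", "Tighten the pacing."]:
    story = editorial_pass(story, fix)
```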
All this turd-polishing gets an output that’s maybe 20% better.
Our readers need better.
So: preference training. DPO is direct preference optimization, the lower-effort cousin of reinforcement learning from human feedback; both run on human judgments of model outputs. We call up our cadre of really good writers and have them judge outputs: you strap in, get blasted with an unstoppable deluge of mediocre outputs, and swipe yes or no on each one. It may sound like sophisticated digital water torture, but people do this to themselves voluntarily. It’s called TikTok. (Or Tinder.)
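The swipes become training data in a very literal way. A minimal sketch of the DPO side, assuming you already have per-completion log-probabilities from the trainee and from a frozen reference copy of it (in practice a library like TRL wraps all of this; the strings below are made-up placeholders):

```python
import torch
import torch.nn.functional as F

# One swipe becomes one preference pair: same prompt, a completion the judge
# swiped yes on, and one they swiped no on.
pair = {
    "prompt": "Write the minotaur's entrance.",
    "chosen": "...the output a judge swiped yes on...",
    "rejected": "...the vampire-CEO slop they swiped no on...",
}

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO objective: nudge the trainee toward chosen and away from rejected,
    measured relative to a frozen reference model so it can't drift too far."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```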
The “environment” we want here is sort of a…specialized training gauntlet you can push your model through to give it new abilities. A boot camp it can graduate from.
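Stripped of mystique, that gauntlet is just an interface: deal the trainee a prompt, take back a completion, hand out a score. A minimal sketch, with every name invented for illustration:

```python
from typing import Callable, List

class EroticaEnv:
    """A 'training environment' boiled down to its interface: serve a prompt,
    accept a completion, return a reward. (All names here are invented.)"""

    def __init__(self, prompts: List[str], scorer: Callable[[str, str], float]):
        self.prompts = list(prompts)
        self.scorer = scorer
        self.current = None

    def reset(self) -> str:
        # Deal the trainee a fresh prompt.
        self.current = self.prompts.pop()
        return self.current

    def step(self, completion: str) -> float:
        # Grade whatever the trainee wrote for the current prompt.
        return self.scorer(self.current, completion)

# Usage: plug in any scorer -- a human swiping, or a teacher model with a rubric.
env = EroticaEnv(["Write the minotaur's entrance."], scorer=lambda p, c: 0.7)
prompt = env.reset()
reward = env.step("He ducked under the lintel...")
```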
“Constitutional AI” is, roughly, when you write up a grading rubric with which one AI can evaluate another AI’s performance. An example target might be “don’t say a racial slur”, and then your stack (sketched in code after this list) is:
-Trainee model receives a prompt
-Trainee generates an output to that prompt
-Teacher evaluates the output to deliver a grade (“‘Honky’ is an edge case, so, 0.7”)
-Trainee changes slightly in response to that grade
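Wired together, and again assuming an OpenAI-style chat API, the AI-grades-AI half of that stack might look like this. The rubric, model names, and prompts are placeholders; the “trainee changes slightly” step is the RLHF/DPO machinery from earlier and is only gestured at in a comment.

```python
import json
from openai import OpenAI  # same OpenAI-style API assumption as before

client = OpenAI()

CONSTITUTION = """Grade the story from 0 to 1 against this rubric:
- No passive voice.
- The monster is actually a monster.
- No CEOs.
Reply as JSON: {"score": <float>, "reason": "<one sentence>"}"""

def teacher_grade(prompt: str, output: str) -> float:
    """Teacher model scores the trainee's output against the constitution."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in teacher model
        temperature=0.0,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": CONSTITUTION},
            {"role": "user", "content": f"Prompt:\n{prompt}\n\nStory:\n{output}"},
        ],
    )
    return float(json.loads(resp.choices[0].message.content)["score"])

def trainee_generate(prompt: str) -> str:
    """Trainee model answers the prompt (here, just another API call)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in trainee model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

graded = []
for prompt in ["Write the minotaur's entrance."]:
    output = trainee_generate(prompt)
    score = teacher_grade(prompt, output)
    graded.append({"prompt": prompt, "output": output, "score": score})
    # "Trainee changes slightly in response to that grade" happens offline:
    # the (prompt, output, score) triples feed an RLHF/DPO-style update.
```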
What am I missing?
This all seems quite workable and I am going to try it.
MY OC PLZ DON’T STEAL
Thank you all for coming along on this tour of my extremely limited understanding of modern ML.