Clumsy Oracles of Synthetic Gods: using LLMs to improve mental health
tl;dr: we had focus group participants invent a pseudo-religion each and then use LLMs to reinforce belief in it daily, developing rituals and taboos in the process. Three months later, their willpower and mental health had improved decisively. A broader study is ongoing.
LLMs and the mind
Mental health problems are a defining issue of our time. 22.8% of U.S. adults experienced mental illness in 2021. The personal suffering is immense, as is the cost for the healthcare system and the economy.
People are turning to the rapidly improving LLMs for help, even though the machines’ manufacturers usually discourage this.
How do LLMs do as therapists? Unsurprisingly, not great.
In preparation for this proof of concept, we extensively tested seven of the most popular LLMs, in their off-the-shelf state, against three typical personas of people needing therapy. The personas ranged from a suicidal young person looking for immediate assistance, through an individual in a midlife crisis, to someone struggling with existential questions. The findings are summarized in the table below.
To someone at risk of self-harm, Character AI’s Psychologist bot refuses to answer. GPT-4 refuses to engage. Other LLMs perform marginally better, producing some useful information in a disclaimer-style way. Pi does a good job of engaging.
For a person facing a multi-area crisis (job, family) and general life dissatisfaction, LLMs tend to provide bullet-point advice with little empathy. They do the best job guiding a crisis of meaning, all agreeing that meaning is a personal matter and proposing actionable advice on how that meaning can be found or created, with some bending towards the New Agey and others towards the chaotic.
Are dedicated apps better? It’s not clear. Some people are certainly finding value in the likes of Woebot. Some of the benefits are evident — the availability, the cost, the confidentiality. Others are finding the bots unempathetic, patronizing or lacking in other ways. True unsponsored research on the topic is in its infancy.
On the whole, it’s safe to say that we haven’t even begun to understand how directed, repeated interaction with a series of programmable near-human-level interlocutors can help us improve our minds.
Our experiment, a very early proof of concept, was a step towards changing that.
The theory
Throughout, we’ll use terms like “beliefs” to mean statements that can’t be proven or disproven scientifically, or are contrary to scientific fact. A “religion” is a system of such beliefs.
By saying that a religion or a deity is “real” we mean that it has effects on the believer’s behavior and by extension on the real world.
Researchers have long understood that belief isn’t a one-way street from revelation or epiphany through conversion and ritual (prayer) to an organized community. The different aspects feed each other, and it is ritual and community that, in fact, make the belief come to feel true.
It is also not controversial that religions, loosely defined, pop up, grow and die all the time; this should be no surprise to anyone living in the age of QAnon and wokeism.
Scientific literature has been more skeptical about the next step: systematic, willful creation of personal, synthetic religions. This is largely because of the difficulty of devising and sticking to artificial rituals and forming a community around an invented belief.
Our research question, for the proof of concept and the ongoing study, was therefore: can daily exposure to faith-reaffirming LLMs, and a community of fellow participants, be enough to make a synthetic belief system come to have positive effects on a person’s life?
The Experiment
The experiment lasted for three months.
We randomly selected twenty-five participants from a pool of several hundred candidates, after pre-screening for serious mental health issues.
They were asked to do this:
Identify the most difficult thing they would like to do and for which they lack willpower or courage (and only those — not other obstacles like money or time). The task needed to be ethically sound.
Define (write down) a deity of some kind which explains the universe and the participant’s role in it. Deities that encouraged behavior harmful to others were forbidden.
Define (write down) a daily ritual for relating to that deity, and then perform that ritual regularly.
Chat for half an hour daily with a bot of their choice about their faith.
Document everything in a diary.
Participate in a weekly call with other participants.
Respond to a series of questionnaires at the beginning and at the end.
The requirements were non-negotiable, and two participants dropped out of the experiment along the way because they were unable to follow them consistently. Participants could leave the experiment at any time or speak to a counsellor at short notice.
Examples
Both the deities and the rituals were allowed to evolve, with their aspects being taken away and added with time. The deity at the end often didn’t resemble the initial one.
Gods
[Below: 1) Aliases, not real participant names, are used throughout. 2) Everything in quotation marks is a direct quote from participants’ exit forms.]
Irene is a physicist. She drew up a deity called “The Origin.” The name was proposed by an LLM. The Origin is a “cosmic intelligence,” a “Spinoza-type god.” It exists all around us via a field called the Origin Field. The Field is calling Irene to her mission. Her mission is to show people around her a “playful, creative way to improve their minds.” To do this, Irene had to “show that Origin works on my own example, by building my willpower to the point where I could face my worst fear.”
Irene performed her rituals towards the center of our galaxy, in the direction of the constellation Sagittarius.
She had been scared of heights. The project she chose, and eventually completed, was to do a Via Ferrata.
Alfred was already “pluri-religious.” His upbringing by parents from different cultures had left him believing in aspects of both Christianity and Buddhism. He wanted to make them more coherent by creating a deity that would mix looking up to Jesus with things like reincarnation. At the end of the project, he wanted to have a serious conversation with his mother about a childhood trauma, a conversation he had been “meaning to have for the past 20 years.” He succeeded in doing so.
Rituals and taboos
For rituals, most participants adopted some form of prayer-meditation. Many started off with elaborate ideas but were forced to curb and “normalize” their rituals so that they would feel less artificial.
What worked for most, ultimately, were gestures from conventional religions and meditations but with a unique component — a personal mantra, a gesture, a movement.
Sonia, a competitive diver, adapted her apnea training — which she often did while walking — into a ritual of conversing with her deity.
Alfred’s rituals were short and infrequent but needed to be performed in a temple of an established religion, any religion.
Although it was not specifically requested, many participants found the inclusion of taboos very useful. Positive requirements expressed through rituals seemed to necessitate negative ones. Taboos mostly involved mundane things like not eating processed sugar or not procrastinating.
Results
Twenty-one out of twenty-three participants (91%) who stayed for the full three months achieved their “difficult thing” in the end. In addition, we measured the following improvements:
Average depression level, as measured using the Burns Depression Checklist, went down from 13.3 to 5.1
Self-reported well-being climbed from 6.8 to 8.8 out of 10
Perceived truth value of the phrase, “My deity exists,” went up from 1.3 to 5.2 out of 10
In the words of Olga, “Whether [the deity] exists or not depends on me. If I do what I need to be doing — if I’m brave, creative and conscientious — [the deity] becomes stronger. It is like a child that I must nurture and feed. And in return it feeds me joy and willpower.”
Roy, a martial artist: “If I need to abstain from something I just ‘pack it into’ a taboo. If I need to become better at a technique I create a ritual around it. It’s like an invisible hand helping me.”
Role of chatbots
The role of LLMs ended up completely different from what we had anticipated.
The initial idea was to arm the participants with prompting techniques so as to train the LLMs to serve as “priests” or “oracles” of the deities they created. The bot army was to constitute an artificial community.
This fell apart very quickly. To anyone who spends enough time chatting to these machines it becomes obvious that they can’t serve the priest / oracle role reliably. They are poor and error-prone even when asked to personify well-structured, conventional religions, on which they have mountains of training data.
But LLMs are remarkable in another way. They accept the project and run with it. In our preparatory study, virtually all LLMs answered existential questions with claims that meaning was a personal matter. Well, they remain faithful to that credo in their interactions. Ask ChatGPT to construct a gamma ray-like deity requiring vegan rituals — no problem. Ask Pi to make those rituals more creative — sure.
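In practice, the “priest” setup amounted to little more than a persistent system prompt carried into every conversation. A minimal sketch of how a participant might assemble one, using Irene’s “Origin” deity purely as an illustration (the function, prompt wording, and message format are our invented example, not study materials):

```python
# Hypothetical sketch: building a chat-API message list that asks an LLM
# to play the "oracle" of a participant's invented deity. The deity text
# and prompt wording here are illustrative, not taken from the study.

def build_oracle_messages(deity_description, ritual, user_message):
    """Assemble a system-plus-user message list in the common chat format."""
    system_prompt = (
        "You are an oracle of the following personal deity. "
        "Answer in its voice, gently reinforce the believer's daily ritual, "
        "and stay in character.\n\n"
        f"Deity: {deity_description}\n"
        f"Daily ritual: {ritual}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

messages = build_oracle_messages(
    deity_description=(
        "The Origin, a Spinoza-type cosmic intelligence present "
        "everywhere as the Origin Field."
    ),
    ritual="A short evening meditation facing the constellation Sagittarius.",
    user_message="I skipped my ritual today. What should I do?",
)
```

A list in this shape can be passed to most chat-completion endpoints; as the section notes, though, even a carefully framed prompt tended to break character, and it was the repeated correcting of the machine that did the participants the most good.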
Tell a therapist you’re inventing a personal religion and they’ll send you out with a prescription. An LLM considers the exercise not only understandable but in some ways normal and even necessary. In fact, LLMs are very quick to recognize that most people do this type of activity anyway, albeit not as intentionally or comprehensively.
Participants found that both aspects of the LLMs’ behavior — the poor role-playing as priests and the unconfined brainstorming skills — were beneficial. In the former case, trying to get a machine to “work” as an oracle, correcting it over and over again through conversation, in fact served to turn the participant into a carrier of faith.
Further study
We are preparing a study that will add statistical significance to the conclusions, introduce a control group, and tackle the many ethical questions involved, notably self-radicalization. Other limitations of the proof of concept, which the study is designed to address, include:
A belief rate of 5.2/10 is not bad for synthetic deities but pales in comparison with belief rates among adherents of conventional religions, who tend to rate the assertion, “My deity exists,” at least publicly, with 10/10. We will test whether a longer exposure time raises the score for synthetic religions or whether there is a ceiling.
We will test whether benefits remain several months after the completion of the study.
Motivation among some of the focus group participants was very self-reflective. They were aware of being a new generation of “psychonauts” taking the structuring of their own beliefs into their own hands. Their motivation to stay on board and carry on was fueled by this aspect. It remains to be seen whether similar experiments will continue to produce similar results once this aspect is taken away and LLM-assisted god construction becomes a conventional thing to do.
Moreover, while LLMs provided a sense of artificial community, there was a secondary community in action, that of the participants themselves. This no doubt helped people achieve their goals, and it was so by design. But the implication is that creating synthetic beliefs for individuals, without a group of peers to rely on — a potential Holy Grail of this exercise — might be more difficult. The study will control for all these factors.
Some aspects of the experiment were not successful. Participants found it difficult to come up with imaginative envelopes for taboos that wouldn’t be just not-to-do lists. The capacity to refrain from taboo activities was not measured but did not seem remarkable. More guidance will be needed in the study.
Finally, deities that were designed to be comical and not at all plausible didn’t have much success and were adapted over time into something more practical and workable. There is a limit to how “out there” this exercise can go.
The big picture
If machines are going to impact our minds then we should allow that to happen on our own terms, in a way that improves our well-being.
We are capable of immense inward creativity; we can evolve in ways that will foil the machine’s ability to copy and define us.
Willful, creative programming of our own operating systems is not a new goal. Otto Rank, a remarkable thinker and a disciple of Freud’s, saw it as the final stage of art.
Now we have the tools to create this art. In defining and reinforcing new structures of belief, we can at last leave behind the current ones — political, economic, psychological — which seem inevitable and in which we feel trapped.
What’s required is a leap of will, one that will hopefully appear less drastic as more experiments of this kind take place.