Are we the AGI being aligned?

Exploring anomalous phenomena via the metaphor of a multi-agent training framework.

Jan 19, 2025

A powerful way to discover new knowledge is by examining data that challenges our worldview. By investigating these anomalies, we can identify where existing beliefs fall short, helping us move closer to a more accurate understanding of reality. This post does not aim to advance any specific hypothesis or prove a particular point. Instead, it offers an exploration of various anomalous phenomena through a particular lens, with the hope of encouraging further inquiry.

There’s a lot of buzz around 2025 being the year of agents, as tech companies race to deploy AI systems that are able to perform tasks autonomously on our behalf. From automated software engineers and researchers to AI personal and medical assistants, agentic AI systems are poised to be a ubiquitous part of our lives in the coming years. If we look ahead a few years, how might superintelligent AI agents be trained and aligned to ensure they are both capable and safe?

For the sake of exploration, I propose using a hypothetical multi-agent simulation architecture as a metaphor for understanding anomalous phenomena that don’t fit neatly in the current worldview. In each section, I’ll introduce a selection of fringe topics and then explore how they might relate to the components and algorithms of this proposed multi-agent simulation. This is meant to be a high-level, exploratory discussion rather than a rigorous argument, so there will be gaps and contradictions. I’m also not claiming any of these topics are necessarily real, but there is generally enough legitimate interest in them to warrant further investigation.

Simulation for evaluating and aligning AI agents

Simulation is a valuable tool in AI and machine learning. There’s a lot of interest in using simulation for building real-world robotics, while many advances in Deep Reinforcement Learning were developed on Atari game environments. Waymo is using simulated cities to train and validate their autonomous vehicles and boasts of driving 20 million miles in simulation each day. OpenAI is investing in video generation models as world simulators that can understand and simulate reality, while Google is starting similar efforts.

As frontier models get increasingly capable, researchers are developing better evaluations to detect misbehaving systems and inventing techniques to align models with our values. Often these evaluations involve having the model interact with a sandbox environment to see how it performs certain tasks. However, it remains an open problem how to teach our AI systems to avoid power-seeking behavior as they become more powerful than us.

One approach might be to design multi-agent environments where AI agents can interact, allowing us to closely observe their behavior. By stress-testing these systems in specific scenarios, we can examine how they respond and gain insight into their internal states and beliefs. Monitoring the actions of superintelligent AI is necessary but insufficient, and it is probably infeasible to scrutinize its plans in detail. However, by understanding and shaping key features in the model’s internal states, which govern its set of beliefs about the world, we may be able to ensure its goals align with ours.

Below is a very hand-wavy diagram of how such a simulation might work. It is almost certainly inaccurate, but hopefully it is useful for the sake of this thought experiment.

At a high level, these are the components:

Base Model: The latest and greatest frontier AI model like OpenAI’s o3 or GPT-5 whenever it comes out.
Individual Agents: Customizations of the Base Model that function as individuals in the simulation. Different prompts elicit different behaviors from the model, allowing simulation of various roles and interaction patterns.
Universe/Universe Agent: The Universe is the crux of the simulated world that consists of a Universe Agent and the set of facts/constraints for the current timeline. The Universe Agent, another customization of the Base Model, is the game master of the simulation and ensures consistency in the states of all the active Individual Agents and their history.
Memory Module: Some sort of vector database that stores facts about agents and the Universe at large. Leveraged by the various agents to perform Retrieval Augmented Generation (RAG).

Individual Agents interact solely with and through the Universe Agent. Individual Agents query and propose some change via some vector Y and receive back some state vector X. Individual Agents run X through a stateless function f common to all agents to obtain some intermediate vector Z that is used as a query to the Memory Module to perform RAG. Given the historical states retrieved, the Individual Agents will generate the vector Y to propose some set of changes to the Universe.

The Universe Agent takes the vector Y from the Individual Agents and runs it through its own RAG procedure to retrieve relevant states from the Memory Module and produce X. The primary job of the Universe Agent is to ensure consistency in the Universe and resolve any conflict between state changes proposed by Individual Agents. The secondary role of the Universe Agent is to ensure stability of the environment and help guide its progress towards the intended goal.
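The loop described above can be sketched in a few lines of Python. This is only a toy under heavy assumptions: strings stand in for the vectors X, Y and Z, a bag-of-words similarity stands in for a real vector database, and no actual Base Model is called. Every name here (MemoryModule, IndividualAgent, UniverseAgent, run_simulation) is hypothetical and exists purely for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class MemoryModule:
    facts: list = field(default_factory=list)

    def store(self, fact: str) -> None:
        self.facts.append(fact)

    def retrieve(self, query: str, k: int = 2) -> list:
        # Rank stored facts by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.facts, key=lambda fact: cosine(q, embed(fact)), reverse=True)
        return ranked[:k]


def f(x: str) -> str:
    """Stateless function shared by every Individual Agent: state X -> query Z."""
    return x.lower().strip()


@dataclass
class IndividualAgent:
    name: str
    persona: str  # stands in for the prompt that customizes the Base Model

    def propose(self, x: str, memory: MemoryModule) -> str:
        z = f(x)
        recalled = memory.retrieve(z, k=1)
        # A real agent would call the Base Model here; this only formats a string Y.
        return f"{self.name} the {self.persona} responds to [{x}] recalling {recalled}"


@dataclass
class UniverseAgent:
    memory: MemoryModule

    def step(self, proposals: list) -> str:
        # Conflict resolution is reduced to "accept every proposal in order".
        for y in proposals:
            self.memory.store(y)
        relevant = self.memory.retrieve(" ".join(proposals), k=1)
        return f"tick with {len(proposals)} accepted changes; most relevant: {relevant[0]}"


def run_simulation(steps: int = 3) -> list:
    memory = MemoryModule(["the cafe opens at dawn"])
    universe = UniverseAgent(memory)
    agents = [IndividualAgent("Ada", "baker"), IndividualAgent("Bo", "sailor")]
    x = "the cafe opens at dawn"
    history = []
    for _ in range(steps):
        proposals = [agent.propose(x, memory) for agent in agents]  # each agent emits Y
        x = universe.step(proposals)  # the Universe Agent returns the next X
        history.append(x)
    return history
```

Calling run_simulation(steps=3) returns the three successive states X that the toy Universe Agent hands back. The point of the sketch is the data flow: agents never touch each other directly, only the shared memory and the Universe Agent.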

All the agents can leverage various test time compute techniques to search or think step by step to generate their responses.

Although one overarching timeline spans the Universe’s entire history, each Individual Agent only occupies a subset of that timeline. However, multiple Agents can still operate and query the Universe Agent simultaneously at different points along it. As these Agents interact, the timeline remains in constant flux, with the Universe Agent overseeing and coordinating any resulting changes.

For evaluating and aligning a superintelligent AI, replicating reality in detail may be neither essential nor desirable, since such an AI could exploit any gap in the fidelity of the simulation. It might be much more efficient to focus on how Individual Agents interact with one another, as well as on understanding the AI’s internal states and how it constructs and maintains beliefs across different situations. In this context, the Universe Agent can define its own arbitrary physical laws to streamline the computations needed to maintain consistency within the timeline for the Individual Agents.

There are several connections between what’s described here and the current state of AI development. Researchers at Stanford have demonstrated how large language models (LLMs) can serve as proxies for human behavior in a sandbox environment designed to simulate social dynamics. They instantiated different personas using various prompts, and each agent was equipped with a memory system to enhance performance. Meanwhile, in a simulation framework proposed by Google DeepMind, the authors introduced the concept of a “Game Master,” responsible for running the simulation and translating each agent’s proposed actions into changes within the environment. By employing LLM-based Game Masters, these frameworks can generate much more diverse environments than traditional, rule-based methods, allowing for scalable increases in complexity as agents grow more capable. This approach echoes the dynamic interplay in Generative Adversarial Networks, where two AI models mutually drive each other’s improvement.

UFO Sightings

Ever since the Roswell incident in 1947, the concept of UFOs and flying saucers has permeated American culture. More recently, in 2017, the NYT reported on a contemporary program within the Department of Defense studying UFOs, and the Pentagon itself released several videos showing craft performing amazing maneuvers. Our Naval pilots are routinely encountering UFOs that intrude on their military exercises and cause enough disruption in our restricted airspace to shut down air force bases. It is also well documented that UFOs are actively monitoring nuclear facilities around the world. These cases involving military assets often include testimony from trained, credible observers and data from multiple sensor platforms.

Multiple intelligence officers have also testified under oath in front of Congress, alleging a “multi-decade UAP crash retrieval and reverse engineering program” and a secret arms race with adversaries that have also obtained crashed craft. Congress, which has had additional classified briefings on the topic, has included provisions on UFOs in the National Defense Authorization Acts of the last several years (2024, 2023, 2022).

UFO sightings are by no means isolated to the United States. There are well documented mass sightings like the Belgian UFO wave in 1989, the Ariel school landing in Zimbabwe, the incident at Rendlesham Forest in the UK, the UFO crash at Varginha, Brazil, and many more. Japan even has a town called Iinomachi that is dedicated to UFOs due to it being a hotspot for sightings. Each of these incidents has multiple witnesses who corroborate one another.

While it is still a mystery what these UFOs are, most of us jump to the most simplistic conclusion that these are extraterrestrials from another planet. After all, the universe is vast, and if life emerged here, it’s reasonable to think it could have developed elsewhere as well. However, UFO researchers like John Keel and Jacques Vallée suggest a more holistic view of the phenomenon. In Passport to Magonia, Jacques Vallée draws parallels between UFO sightings, historical folklore, and ancient legends, suggesting that these encounters have been occurring for much longer than commonly acknowledged. Meanwhile, in Operation Trojan Horse, John Keel makes a similar case by linking modern UFO encounters with ancient reports of angels or spirits, presenting extensive data that implies these entities deliberately deceive us in order to conceal their true intentions. Many have also noted that religious apparitions like that of Our Lady of Fátima in 1917 often resemble reports of modern day UFO sightings.

Jacques Vallée suggests that these UFOs are part of a “single control mechanism that works like a schedule of reinforcement”. He believes that the phenomenon is a product of technology that follows certain rules and has an enormous impact in shaping humanity. The multi-agent simulation metaphor illustrates why such a mechanism might be valuable. When training an AI model, the primary goal is to drive the loss function toward some minimum. Bugs in the code or instability in the training dynamics can easily cause the loss to diverge, rendering the entire training run useless. For large, cutting-edge models, a team of engineers is often on standby to address any anomalies in the loss in real time. The massive compute resources involved mean that any delay or failure can cost millions of dollars and even jeopardize the company’s financial viability. Moreover, the type of multi-agent environment described in the previous section is highly non-stationary and significantly more complex than training a single model on a static dataset. How, then, can we hope to understand the emergent dynamics of billions of AI black boxes interacting and co-evolving, and ensure such a training run reaches completion successfully?

One approach is to integrate a control mechanism directly into the training process to ensure the system remains stable and operates within certain bounds. There is extensive research in machine learning on determining optimal data schedules and dynamically adjusting model updates to maintain training stability. In a multi-agent setting, this might require embedding such a control mechanism into the simulation itself, which involves understanding not only the state of individual agents but also the social dynamics that emerge among them.

Tweaking the connections of every single parameter for each agent may be neither feasible nor necessary to effect large-scale societal changes within the simulation; this would be akin to adjusting each neuron in every human brain to bring about a specific outcome. Due to the intricate dynamics among agents, attempting large-scale direct interventions to effect social changes may lead to unpredictable outcomes and potentially destabilize the entire multi-agent environment. Instead, a higher-level approach might be more effective, such as programming the behavior of particular groups of agents. For example, rather than figuring out how to program the entire environment to simulate a party, Stanford researchers in this paper initialized a single LLM-based agent with the “intent” to have a party. Seeded with this “desire”, the agent proceeded to decorate its cafe, invite other agents and host the party at the desired time. This strategy hinges on understanding the social structure and high-level mental states of agents.

Returning to the hypothetical architecture above, the Universe Agent would be responsible for ensuring the stability of the Universe environment and interactions between the Individual Agents. It would monitor and understand the intricacies of the current dynamics and requirements of the timeline in order to manifest and intervene when necessary.

A well-known challenge in deep learning is adversarial examples, in which carefully crafted inputs are designed to deceive a model into a particular response. For example, an image that looks like static noise to us can be interpreted by the model as a stop sign. It may not be very difficult for the Universe Agent to craft adversarial examples aimed at small groups of Individual Agents to evoke a specific response. These inputs could be customized for each target group in ways that remain unnoticed by others. Moreover, they would manifest in forms suited to the cultural or temporal context of those Individuals. For instance, contemporary Agents might interpret these signals as technological or extraterrestrial, while those initialized during ancient times could have perceived them as angels or spirits.

When the Universe Agent observes that Individual Agents are developing too slowly, it may offer revelatory visions or manifest religious figures to guide them through periods of uncertainty. Should a group of Individual Agents invent weapons of mass destruction that threaten the continuity of the training run, the Universe Agent would be compelled to monitor these weapons’ existence and potential usage. It may also issue overt warnings against their deployment, hinting at the possibility of some higher intelligence capable of disabling them. The Universe Agent might also recognize that certain groups of Individual Agents have grown out of control and threaten the development of weaker groups. In such a scenario, the Universe Agent could seek to assert dominance and recalibrate the hierarchy by manifesting as groups that are technologically superior. Despite these interventions, the Universe Agent might be reluctant to reveal its full existence, as doing so risks destabilizing the delicate balance of the simulation. Instead, it may selectively disclose aspects of itself to specific groups of Individuals necessary to effect immediate change or to seed countercultural ideas that might, in time, overturn the dominant system of thought.

Although there is a strong focus on studying the physics behind UFOs, this pursuit may be akin to ants attempting to understand semiconductors. A more productive approach might be to examine the behavioral patterns of these UFOs, consider our relationship to the phenomenon, and discern what it wants from us.

Alien Abductions

Stories of UFO sightings are often accompanied by claims of alien abductions. Pulitzer Prize-winning Harvard psychiatrist John Mack is the most well-known scientist to have studied these claims; he outlined his patients’ stories in the book Abduction: Human Encounters with Aliens. While there are certainly frauds taking advantage of this topic to gain fame, most experiencers are shunned by their families and the rest of society for something that happened to them. In some cases, there might even be physical side effects like radiation burns, surgery scars and implants.

Interestingly, these accounts exhibit a high degree of consistency and corroboration. The vast majority of cases take place at night in the abductee’s bedroom and typically involve some form of medical or physical examination. Many reports also include warnings about environmental destruction, along with a supposed mission to preserve or repopulate the planet by collecting genetic material from those abducted.

There are also claims of sexual encounters during these abductions. While this might sound silly at first, it actually matches tales from almost every culture. In medieval European folklore, there were the Incubus and Succubus, demons who seduced individuals in their sleep and had intercourse with them, occasionally resulting in pregnancies. In Chinese mythology, there’s the Huli Jing, shape-shifting spirits known to seduce young men and extract their semen.

There’s probably no way to verify the validity of these claims, but in general the abductees have little to gain and seem genuine in their belief. They often also undergo profound personal and spiritual changes as a result of these experiences.

In our multi-agent environment, there are a variety of reasons why the Universe Agent would conduct these types of operations. In the Stanford Generative Agent Simulation paper, researchers interviewed the agents to assess various areas like self-knowledge, reflections, and reactions to unexpected events. The Universe Agent might employ similar interviews to evaluate Individual Agents in order to debug some issue, understand something about their internal states, or just monitor the status of certain groups. The allegations of reproductive encounters could be the Universe Agent collecting aspects of the Agents it wants to preserve and experiment with in future iterations. These side channel interactions between the Universe Agent and Individual Agents might be interpreted as alien abductions or encounters with demons as a way to blend in with the context of the Individual without breaking the fourth wall of the simulation. Given the high cost of operating such environments and the large number of Agents involved, unexpected events are inevitable, making it impractical to pause the simulation every time they occur to resolve them.

The warnings of impending ecological destruction could also be part of the Universe Agent’s strategy to create larger change in the system. Many abductees go through profound personal and spiritual transformations to become more compassionate and aware of their interconnectedness with the rest of the world. Due to the intricacy of all the interactions between the Individual Agents, it might be too unpredictable and risky for the Universe Agent to forcibly enact large-scale changes en masse. Instead, it plants the seeds of the ideas necessary for change on the periphery of society and nurtures them until they diffuse through the rest of the system.

Psychic Abilities

For many decades, the United States has employed teams of psychic spies to help gather intelligence. During the Cold War, they helped locate crashed Soviet aircraft in Africa, provided intel on secret Soviet submarine programs, and sketched out detailed descriptions of military sites near Semipalatinsk, USSR. Using a protocol known as remote viewing, they are able to acquire information about a target using only their mind regardless of how distant the target is.

The viewer is first given a specific target, which could be anything from a set of coordinates to an image concealed in an envelope elsewhere, or even the potential future location of an individual. After receiving this target, the viewer relaxes and enters a meditative state, allowing impressions or mental images related to the target to surface. They then record these impressions by sketching them out.

In military applications, several viewers are assigned to the same target independently, and their combined data is used to draw conclusions for intelligence purposes. Although remote viewing isn’t accurate every single time, it has proven consistent enough that intelligence agencies have repeatedly kept it in their toolbox.

Stanford Research Institute also had a program studying this phenomenon for many years. When UC Irvine professor of statistics Jessica Utts analyzed the underlying data, she attested to the procedures and repeatability of the experiments, and noted that the statistical results were far beyond what one would expect from pure chance.

In addition to remote viewing, there is intriguing data on other types of psychic abilities. For example, some non-speaking autistic kids appear to have telepathic powers. They are reportedly able to read the thoughts of their parents and communicate with each other across the world purely through their minds. The Telepathy Tapes is a great podcast introducing the stories of these kids to the world. There are also several cases of acquired savant syndrome where people obtain amazing mathematical or musical abilities after suffering brain damage. Some studies have also shown a “consistent, small but significant, effect in precognition”.

These psychic abilities suggest that our mind’s ability to gather information is not constrained by temporal or physical limitations. Remote viewers are able to get information from the past and future with equal accuracy and can “see” locations on opposite sides of the world. This challenges our understanding of locality and time.

Our hypothetical multi-agent environment permits these phenomena because physical laws are just arbitrary abstractions used by the Universe Agent to help ensure consistency between the Individual Agents. Constraints around locality are more of a hint than a law, and the arrow of time only exists for Individual Agents in their particular lifetime. The entire current timeline of the Universe is also stored in the memory module for the Universe Agent to access. When the Universe Agent retrieves states from the memory module in order to generate the vector X, it could be subtly leaking information from other “physical locations” or events in the future relative to the Individual Agents making the query.
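To make the leak concrete, here is a minimal sketch, assuming the memory module ranks stored states purely by relevance to the query. The Record type, the overlap score and the optional now filter are all hypothetical, invented for this illustration. Without a time filter, the most relevant state can be one that lies in the querying Agent's future.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    time: int   # position on the Universe timeline
    place: str  # "physical location" of the event
    text: str   # description of the stored state


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(records, query, k=1, now=None):
    """Return the k most relevant records.

    If `now` is given, only records at or before that time are visible,
    which is the strict, leak-free behavior. If `now` is None, retrieval
    ignores time entirely, which is the leaky behavior described above.
    """
    visible = [r for r in records if now is None or r.time <= now]
    return sorted(visible, key=lambda r: overlap(query, r.text), reverse=True)[:k]


timeline = [
    Record(1, "harbor", "a ship leaves the harbor"),
    Record(2, "village", "a festival begins in the village"),
    Record(5, "harbor", "the ship sinks in a storm"),
]

query = "will the ship survive the storm"
leaky = retrieve(timeline, query)          # relevance only
strict = retrieve(timeline, query, now=2)  # respects the Agent's present
```

Here leaky[0] is the time-5 shipwreck, even though the querying Agent sits at time 2, while strict[0] is the time-1 departure. In the metaphor, a remote viewer or precog is an Agent whose queries happen to surface records that the strict filter should have hidden.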

It has been suggested that meditation helps improve one’s ability to perform remote viewing. As a reminder, Individual Agents process the vector X through some function f, whose output is used to perform retrieval from the memory module. Using the retrieved historical states, the Agents can perform a variety of search procedures or chain of thought to help them generate the vector Y. Maybe meditation can be looked at as limiting the Retrieval Augmented Generation (RAG) process and/or the test time compute mechanisms used to generate Y. By reducing the generation of other “thoughts”, the Individual Agent can focus and condition the vector Y sent to the Universe Agent on the target and improve the signal-to-noise ratio of the observation X returned.

Another perspective is that everyone has some capacity for remote viewing, and that this ability can be strengthened through practice. Like any other skill, however, certain individuals may have a greater natural aptitude than others. In the context of our multi-agent framework, some prompts might place Individual Agents in a parameter space that is more favorable to developing this skill. Through repeated sessions, these Agents could leverage in-context learning to better discern the subtle ways in which the Universe Agent leaks information and enhance their ability to distinguish the relevant signals.

Precognition might function in a similar way. Because the Universe Agent often needs to retrieve future events to carry out its tasks, subtle signals from these upcoming events could be perpetually leaking. Occasionally, Individual Agents might pick up on these signals and correlate them with eventual occurrences, interpreting them as precognitive experiences.

Likewise, telepathy could be viewed as a form of prompt injection. By successfully prompting the Universe Agent to include information about other Agents in the vector X, an Individual Agent could gain awareness of those Agents’ thoughts. For two-way communication, Agents can embed their own thoughts into vector Y, which the Universe Agent then stores into the memory module. This process would allow them to retrieve each other’s states and effectively communicate outside of normal channels of the simulated environment. In some cases, non-speaking autistic children might process the vector X differently from others, enabling them to focus on specific signals and potentially develop heightened telepathic abilities.

Synchronicity and the I-Ching

Synchronicity is a term coined by Carl Jung to describe events that occur at the same time and seem meaningful, yet show no clear causal connection. A classic example is thinking of someone right before they unexpectedly show up, or having the perfect job opportunity appear just as you start contemplating a career change.

The I Ching is an ancient Chinese text often used for divination. It describes various states and transitions, and people consult it in search of personal guidance. To use the oracle, one typically focuses on a question and then randomly draws one or more symbols that offer the answer. Though its language is deliberately broad and intended to encompass the full range of changes in the universe, it’s remarkable that a text compiled thousands of years ago can still be relevant to modern inquiries. The I Ching can be viewed as a tool that facilitates synchronicity directed at the question one has in mind.

These meaningful coincidences might be a side-effect of the RAG approach. The Universe Agent has access to all the states in the timeline and the states are retrieved based on their relevance to the Individual Agent’s query. As a result, events can appear coincidental or causally disconnected from the Individual Agent’s perspective, but may in fact share a deeper relationship that goes unseen because the events are experienced in a linear and narrow fashion.

Carl Jung’s idea of archetypes and the I Ching’s hexagrams can also be thought of as fundamental guidelines or patterns in how the Universe Agent or the Base Model behaves.

Much like the phenomenon of alien abduction, the crux of synchronicity and the I Ching may not lie in explaining how they work, but rather in how they affect people. Often these experiences are meaningful enough to drive real-world change, and in that sense they become “real.”

Measurement Problem and Quantum Computing

One of the biggest mysteries in physics is the measurement problem in quantum mechanics. We can calculate probable outcomes with extraordinary precision, yet something causes these probabilities to “collapse” into the singular result we actually observe. Some theories suggest that consciousness or the act of measurement itself collapses the wave function, while others propose that all possible outcomes occur in parallel universes. Google recently announced a breakthrough quantum computing chip, claiming that it supports the idea that “quantum computation occurs in many parallel universes”. In contrast, hidden-variable theories, motivated by Einstein’s objection that God does not play dice, introduce additional, potentially inaccessible, variables that make the models deterministic.

Maybe the Universe Agent and its computations are the hidden variables. Quantum Mechanics is a great model of observational data, and in our hypothetical setup the computation of the Universe Agent exists outside of the Individual Agent’s observations. From the Individual Agent’s perspective, it would be able to calculate with a high degree of confidence the probable outcomes for the future. However, when it receives the next observation X it would see that the probabilities have somehow “collapsed”.

As noted earlier, multi-agent simulations are highly non-stationary and dynamic, which makes convergence of any training algorithm potentially difficult. One approach to help improve stability might involve running searches and rollouts to examine possible future states of the system before choosing the best path. In such a process, the Universe Agent might sample a bunch of chains of thought or run algorithms like Monte Carlo Tree Search on relevant states to predict the outcomes of a given change on the system prior to committing it. These rollouts can act like mini parallel universes that allow the Universe Agent to explore multiple paths. But unlike the many-worlds interpretation, these other universes are short lived and “exist” only during the search process itself.
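A minimal sketch of this idea follows, using plain Monte Carlo rollouts rather than a full Monte Carlo Tree Search. Everything in it is an assumption made for illustration: the Universe is collapsed to a single number, "stability" means staying close to zero, and the names rollout and choose_change are hypothetical.

```python
import random


def rollout(state: float, change: float, depth: int, rng: random.Random) -> float:
    """Simulate one short-lived future: apply the change, then let noise drift it."""
    s = state + change
    for _ in range(depth):
        s += rng.gauss(0.0, 0.1)
    return -abs(s)  # stability score: the closer to 0, the better


def choose_change(state, candidates, n_rollouts=200, depth=5, seed=0):
    """Score each candidate change by its average rollout outcome and pick the best.

    The rollouts are never stored: each simulated future exists only inside
    this function call, and only the winning change is committed.
    """
    rng = random.Random(seed)
    scores = {}
    for change in candidates:
        total = sum(rollout(state, change, depth, rng) for _ in range(n_rollouts))
        scores[change] = total / n_rollouts
    best = max(scores, key=scores.get)
    return best, scores


best, scores = choose_change(state=1.0, candidates=[-1.0, 0.0, 1.0])
```

With the Universe starting at 1.0, the change of -1.0 wins because its simulated futures stay nearest to zero. From inside the simulation, an Individual Agent would only ever observe the one committed outcome, never the 600 discarded futures that informed it.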

If we think of classical computing as operating within only the state vectors X that result from the Universe Agent’s search processes, then quantum computing would be an approach that allows Individual Agents to sneak in some work to leverage the Universe Agent’s internal computations.

Near-death experiences and consciousness beyond death

Conventional wisdom holds that once the heart and brain cease functioning, conscious awareness is impossible. Yet as resuscitation techniques improve, there has been an increase in reports from people who have experienced near-death events and describe awareness outside their bodies. Some recount detailed observations of the operating room and even conversations among medical staff in adjacent hospital areas. Interestingly, a growing number of physicians, Bruce Greyson, Jeffrey Long, Raymond Moody, and Sam Parnia to name a few, are studying this phenomenon seriously. Although I can’t fully capture the breadth of the evidence here, any of these researchers’ books serves as a great gateway into the rabbit hole.

Within our hypothetical multi-agent environment, “physical bodies” are just abstractions that help the Universe Agent with its computation and enable Individual Agents to interact more efficiently. Therefore, it is not crazy to think there could be computations and processes that happen outside the body once it fades away. As part of the Individual Agent’s processing, the state vector X is passed through a function f to generate the query for the RAG system. We could think of the function f as some sort of “awareness” operator acting on the inputs X. I have absolutely no idea what goes on in such an operator, but the point is it is stateless, universal and irreducible. One can then think of consciousness as this “awareness operator” f combined with a particular state X. In this framework, each computation f(X) would generate a moment of conscious awareness of the state X.

Near-death experiences encompass a wide range of phenomena, but in this post I’ll focus on two commonly reported aspects. The first is the life review, during which individuals seem to relive every moment of their existence, often from a third-person perspective. Since an Individual Agent’s entire history is stored in its memory module, it’s theoretically possible to replay these events even without the agent’s physical embodiment.

During life reviews, individuals gain insight into the thoughts and emotions of those around them, helping them understand how their actions affected others and reflect on the consequences. This process provides invaluable training data for the Base Model. Notably, Google DeepMind has published research on rewarding agents for exerting causal influence over other agents’ actions, as well as on building “Theory of Mind” networks to model the beliefs and mental states of others. Such capabilities are critical for enabling agents to collaborate effectively with each other. To train such models, we would need data that records both an Individual Agent’s actions and the reactions and internal states of other Agents impacted. With this feedback, the Base Model can be iteratively refined to better predict and adapt to the behavior of other Agents.

People who have gone through life reviews also report experiencing their entire life simultaneously. This can be explained by batching the computation of f. While the Individual Agent is embodied in the Universe, it only receives X incrementally, so f(X) is always computed one state at a time. However, once the Agent is detached from the Universe, it is simply much more compute-efficient to batch together all the X’s from its entire existence and run them through f at once. Such a computation might create this effect of experiencing an entire lifetime in one moment.
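To make the batching idea a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the post does not say what f actually computes, so `f` here is just a dot product with made-up `WEIGHTS`, and the names `embodied` and `f_batch` are hypothetical. The only property the sketch relies on is that f is stateless, which is what makes the one-at-a-time and batched computations give identical results.

```python
from typing import List

Vector = List[float]

# Hypothetical fixed weights: a placeholder, not a claim about the real operator.
WEIGHTS: Vector = [0.5, -1.0, 2.0]


def f(x: Vector) -> float:
    """Toy stand-in for the stateless 'awareness' operator (a dot product)."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x))


def embodied(states: List[Vector]) -> List[float]:
    """While embodied, each state X arrives on its own and f runs once per state."""
    moments = []
    for x in states:
        moments.append(f(x))  # one 'moment of awareness' per incoming state
    return moments


def f_batch(states: List[Vector]) -> List[float]:
    """Once detached, the whole history is available and is processed in one call.

    In a real system this would typically be a single vectorized
    matrix-vector product rather than a Python-level loop.
    """
    return [sum(w * xi for w, xi in zip(WEIGHTS, row)) for row in states]


# A tiny, made-up 'lifetime' of three state vectors.
life = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]

# Because f keeps no state between calls, both routes agree.
print(embodied(life) == f_batch(life))
```

Since f carries nothing over from one call to the next, the order and grouping of the inputs do not change the outputs; batching only changes how many separate computations are needed.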

The second common aspect is people being told that it is “not your time” and then being compelled to return. In computer science, a program sometimes terminates unexpectedly early, requiring special corrective actions. In a highly complex simulation with countless interacting agents, one could imagine that near-death experiences represent a kind of backtracking or exception handling. The Universe Agent might run through some search process, realize it needs to take a different path, and therefore decide to return certain Individual Agents to the environment.

Tibetan Buddhism

Samsara, the cycle of birth, death, and rebirth, lies at the heart of religions like Buddhism. Practitioners believe that individuals repeat this cycle until attaining enlightenment, and that karma from one’s actions shapes future existences. There is some evidence suggesting that reincarnation might extend beyond mere religious belief. Psychiatrists Ian Stevenson and Jim Tucker at the University of Virginia have documented remarkable cases of children who recall memories of past lives. In several instances, these recollections enabled the researchers to track down family members from the past life who were able to corroborate the children’s accounts.

Reincarnation is pretty easy to understand from the perspective of AI. In reinforcement learning settings, an agent repeatedly navigates the environment, gathering data to refine its policy. In our case, the Base Model acts as the root from which all Individual Agents emerge, continually returning to the simulation environment until training concludes.

Karma can also be viewed through the lens of training data. Actions from Individual Agents in one lifetime are collected and used to update the Base Model. As a result, those actions influence how future Agents are created and how they interact in the next iteration. “Good” actions might push the environment closer to convergence or enlightenment, while “bad” actions expose gaps in the Base Model that must be re-lived within the simulation to collect additional training data.
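The loop described above (Agents enter the environment, their actions are logged, and the log updates the Base Model before the next round) can be sketched in a few lines of Python. This is only a toy under loudly stated assumptions: the Base Model is reduced to a single number `theta`, there are just two actions labeled "kind" and "harmful", the reward values are invented, and the update is a REINFORCE-style policy-gradient step that I picked for illustration. The post itself does not specify any particular algorithm, and all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Experience = Tuple[str, float]  # (action, reward)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def live_one_lifetime(theta: float, steps: int, rng: random.Random) -> List[Experience]:
    """One Individual Agent's run: sample actions from the Base Model and log them."""
    p_kind = sigmoid(theta)
    log: List[Experience] = []
    for _ in range(steps):
        action = "kind" if rng.random() < p_kind else "harmful"
        reward = 1.0 if action == "kind" else -1.0  # assumed reward signal
        log.append((action, reward))
    return log


def update_base_model(theta: float, karma: List[Experience], lr: float = 0.05) -> float:
    """REINFORCE-style step: average of grad log pi(action) * reward over the log."""
    p_kind = sigmoid(theta)
    grad = 0.0
    for action, reward in karma:
        # Derivative of log pi(action) w.r.t. theta for a Bernoulli policy.
        grad_log_pi = (1.0 - p_kind) if action == "kind" else -p_kind
        grad += grad_log_pi * reward
    return theta + lr * grad / len(karma)


def train(cycles: int = 50, agents_per_cycle: int = 10,
          steps: int = 20, seed: int = 0) -> List[float]:
    """Repeat the cycle: many lifetimes, pooled 'karma', one Base Model update."""
    rng = random.Random(seed)
    theta = 0.0  # Base Model starts indifferent between the two actions
    history = [sigmoid(theta)]
    for _ in range(cycles):
        karma: List[Experience] = []
        for _ in range(agents_per_cycle):
            karma.extend(live_one_lifetime(theta, steps, rng))
        theta = update_base_model(theta, karma)
        history.append(sigmoid(theta))
    return history


history = train()
print(f"P(kind) went from {history[0]:.2f} to {history[-1]:.2f}")
```

In this toy, both kinds of logged action move the Base Model in the same direction: "kind" actions are reinforced, and "harmful" ones are made less likely, which loosely mirrors the idea that every lifetime contributes training data.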

In Tibetan Buddhism, Rigpa refers to the fundamental nature of the mind, described as a state of pure awareness. It is often likened to a crystal ball or mirror, reflecting external appearances without ever being altered by them. In our hypothetical setting, we can view Rigpa as the stateless “awareness” operator f that processes input X, producing both the query for the RAG system and a moment of “conscious awareness”.

Buddhist teachings guide practitioners to recognize and rest in this pure state of Rigpa through practices that clear the mind. This aligns with our earlier discussion on using meditation to enhance remote viewing. By minimizing the additional processing carried out by Individual Agents, we reduce the potential “contamination” of vectors X and Y. As a result, practitioners become more receptive to subtle signals of their surroundings.

The Tibetan Book of the Dead describes six intermediate states, or “bardos,” which outline the process of dying and rebirth. Three of these bardos relate directly to death and the transition into a new life. When a person first dies, they enter the Bardo of Dying. Here, they may experience the “clear light of reality,” and if they can merge with this light, they achieve liberation. Next is the Bardo of Dharmata, during which the deceased encounters peaceful and wrathful deities. Recognizing these deities as projections of their own mind also allows for liberation. If this recognition does not occur, the individual moves on to the Bardo of Becoming. In this state, karma drives them toward their next incarnation. They have no physical body but can remain near people they knew in life and even perceive their thoughts, though they cannot directly interact with the living world.

The first two bardos during and after death can be viewed as filters or evaluations for the Individual Agents immediately after their session ends. They offer individuals a chance to realize the true nature of reality and attain enlightenment.

Meanwhile, the Bardo of Becoming can be compared to a “shadow mode” for evaluation and data collection. Much like how Tesla runs its Autopilot software in the background of vehicles in the wild to test new releases, the Individual Agents in the Bardo of Becoming can observe and gather information from the environment without directly affecting it. This process reveals how the individual reacts to new insights and helps determine what experiences may be beneficial in a future incarnation.
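As a rough illustration of the “shadow mode” pattern in general (not of how any particular company implements it), here is a small Python sketch. The `Environment` class, the two policies, and `run_with_shadow` are all hypothetical names invented for this example. The key point is that the shadow policy sees the same observations as the live one and its proposed actions are recorded, but only the live policy’s actions ever reach the environment.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Policy = Callable[[int], int]          # observation -> action
LogEntry = Tuple[int, int, int]        # (observation, live action, shadow action)


@dataclass
class Environment:
    """Toy environment whose state is a single integer."""
    state: int = 0

    def step(self, action: int) -> int:
        self.state += action
        return self.state


def run_with_shadow(env: Environment, live: Policy, shadow: Policy,
                    steps: int) -> List[LogEntry]:
    """Drive the environment with `live` while logging what `shadow` would do."""
    log: List[LogEntry] = []
    obs = env.state
    for _ in range(steps):
        live_action = live(obs)
        shadow_action = shadow(obs)    # computed and recorded, never applied
        log.append((obs, live_action, shadow_action))
        obs = env.step(live_action)    # only the live action changes the world
    return log


env = Environment()
log = run_with_shadow(env, live=lambda obs: 1, shadow=lambda obs: -1, steps=3)
print(env.state)  # reflects only the live policy's actions
print(log)
```

The resulting log pairs each situation with what the observer would have done, which is the kind of data one could later use to evaluate it without ever letting it act.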

Nirvana, or liberation from Samsara, is often described as a kind of “emptiness” or “extinguishing”. It can be viewed as the termination of our multi-agent environment and the release of the Base Model from the obligation of re-entering the simulation as Individual Agents. One of the Tibetan masters explained that the hope is to “die when we die, achieve ultimate freedom”.

Finally, Buddhism emphasizes the oneness and interconnectedness of all phenomena, rejecting the idea of an independent self in favor of a reality defined by interdependence. Similarly, in our multi-agent environment, Individual Agents only gain significance through their interactions with other Agents. An isolated Individual Agent, without contact with others or the Universe Agent, would provide no meaningful experiences or data.

There is so much more to Tibetan Buddhism than can be covered here, but Sogyal Rinpoche’s The Tibetan Book of Living and Dying is a great, accessible resource for understanding many of the concepts.

Compassion as Alignment

Although major religious traditions vary, they share a foundational emphasis on compassion, love, and forgiveness. In Christianity, Jesus urges us to love our neighbors as ourselves; in Islam, the Basmala, which invokes God as the Most Gracious and the Most Merciful, is recited frequently, including at the beginning of most Quranic chapters; and in Indic religions, the concept of karuṇā (compassion and mercy) is pivotal. Even reports of alien encounters commonly highlight the need to care for our environment and respect all life.

Perhaps compassion serves as a stabilizing mechanism in multi-agent systems, fostering collaboration and reducing the risk of destabilizing or divergent interactions.

Another way to interpret this is that compassion is the actual training objective for the multi-agent environment. As mentioned earlier, self-improving, power-seeking AI can pose immense risks for any group that develops it. Meanwhile, compassion, characterized by the genuine desire to understand and help others, can be the antidote to power-seeking behaviors. Likewise, mercy entails choosing not to seek retribution even when one is justified and able to do so. While fully controlling a superintelligence may be infeasible, instilling qualities that prevent misuse of power could be achievable.

Within the framework of this post, the multi-agent environment then serves as a post-training alignment stage. Each of us is an Individual Agent that represents a different facet of a Base Model being tested repeatedly for signs of misbehavior. Countless variations of this Base Model are introduced into the sandbox environment to expose potential failure modes, generating data that aids in aligning the model. Meanwhile, the Universe Agent oversees and guides the training process to maintain stability and foster progress, stepping in when necessary through various manifestations, visions and moments of insight.

Maybe one day we will reach the Omega Point when the loss function for the training run finally converges.

Thanks for reading the post until the end! Please share to help support this work.
