PACOTTI

  Image: Sheldon Pacotti, pictured in 2035.
  Image: Efua Amankwah-Crouse in 2040.

PACOTTI is named after Sheldon Pacotti, whose work inspired Efua Amankwah-Crouse to create the first prototype at Google.

PACOTTI is an applied architectural component for deep learning systems. It combines an attention-based transformer architecture with a reconfigurable differentiable neural computer (DNC), which enables artificial intelligence (AI) systems to dynamically infer, transform, recombine and generalise high-level conceptual structures without the need for explicit training on bulk data sets.

The introduction of PACOTTI is considered the beginning of the third generation of AI. Unlike second-generation AI architectures, which relied on minimum learning, maximum scale (MLMS) approaches that contributed to model collapse, PACOTTI allows for dynamic substructures that rapidly create and align bespoke attention models as new task-relevant information is provided.

PACOTTI’s ability to create neural substructures to solve problems for which there is no human analogue has subsequently enabled the architecture to be applied more widely to other classes of problem, most notably serving as the core processing technology for neural colloids. In addition, the reusable system components and lightweight retraining process allow for greatly reduced energy and data costs for creating performant, scalable neural systems.

History

Predecessors

The explosion of interest in artificial intelligence (AI) technology in the 2010s led to a wave of architectures known as second-generation AI systems, defined by their emphasis on training large foundational models using extremely large data sets. [1] The best-known example was the transformer architecture, which emerged in the late 2010s and remained popular until the early 2030s, serving as the fundamental component of AI systems such as ChatGPT, DeepSeek, Kassandra, and UKAI. [2]

First-generation AI models mostly relied on recurrence to train their model weights, a slow and inefficient process. Second-generation models improved on this with architectures that allowed parallelisation, making it possible to obtain better performance through larger-scale investment and infrastructure rather than innovations in algorithms or system design. [3]

Model collapse

Model collapse occurs in AI systems trained on data produced by AI models, causing them to become progressively less accurate, performant, and reliable. [4] The phenomenon was recognised among second-generation AI systems from as early as 2023, and the regularity and severity of incidents caused by model collapse increased throughout the late 2020s. [5]

Initial model collapse events were caused by companies that intentionally trained their systems on synthetic data as a way to scale without acquiring more expensive first-hand data. Later in the 2020s, even companies avoiding synthetic data found it increasingly hard to acquire data sets that were free of AI-generated content. [6] A 2028 study estimated that 98% of websites contained AI-generated data, and that 30% of new content uploaded to the internet daily was AI-generated. [7] This made model collapse a more common occurrence across all deep learning systems, even those ostensibly trained on “pure” human data. [8]

Fears around model collapse peaked with the Lightless Dawn incident, when a pod of Indian drones diverted from a new flight route and collectively released their explosive payloads over the city of Kolkata during the Second Sino-Indian War, resulting in over 800 deaths. An investigative report into the incident stated that over 80% of the data used to train the drones’ command-and-control unit could not be traced back to an original authentic source, and that endemic retraining had led to a ghost layer forming in the network. [9] Official reports by the Indian government blamed Chinese hacking groups. [10]

The incident prompted several international military organisations, including the Chinese, Indian, and Russian Armed Forces, to restrict their use of autonomous weapons platforms that did not provide full data provenance, a decision which is thought to have contributed to India’s eventual push for peace. [11] This shift in military consumer behaviour, combined with growing public scepticism about the reliability of autonomous systems of all kinds, led to a surge in investment in novel AI architectures and solutions to the model collapse problem. This wave of investment was led by prominent weapons manufacturers as well as major technology companies. [12]

Speechriver

In 2031, Google’s AI centre in Accra formed the Speechriver team, one of a number of Google research projects aimed at indirectly exploring alternative machine learning architectures. Speechriver’s aim was to research comprehension and translation of African languages, particularly the Niger-Congo language family and dialect continuums across sub-Saharan Africa. This problem was considered relevant to addressing model collapse because of the high number of languages, complex dialect continuums, and the unavailability of data for many language subgroups, which forced retraining on ambiguous and error-prone information. [5]

Efua Amankwah-Crouse and Curtis Frye, who had joined Speechriver as early transfers from other teams within Google’s Accra centre, focused their early experiments on attempting to mitigate issues relating to model collapse in existing transformer architectures. In particular, one prototype proposed by Frye allowed encoder and decoder stacks to “speak” to one another through a secondary communication network called Janus. None of these early prototypes showed any promise, and the team began to explore more distinct architecture designs. [13]

In early 2032, Amankwah-Crouse began experimenting with a new architecture inspired by Sheldon Pacotti, who had proposed augmenting transformers with differentiable neural computers (DNCs). [14] The final breakthrough came when Amankwah-Crouse adapted Frye’s design into a new component called the Janus attention filter. This first prototype was able to learn to recognise and cross-translate fifty languages and dialects from the Niger-Congo group using 20% of the training data required to train a transformer. Modern PACOTTI units are over one hundred times as efficient as this prototype.

According to Amankwah-Crouse, the first official translation made by PACOTTI was Li Bai’s poem Sitting Alone in Face of Peak Jingting (独坐敬亭山) into Igbo.

Zhupao

Image: Xu Shaoyong is often credited with enabling the international adoption of PACOTTI.

In January 2034, Amankwah-Crouse joined Zhupao after she was contacted by Xu Shaoyong, who had expressed an interest in PACOTTI’s potential as a sustainable and environment-friendly model for AI, given that it largely eliminated the computationally expensive process of training AI models on big data sets. [15] Xu assembled an internal team under Amankwah-Crouse’s leadership and created an AI innovation centre in London City with funding from Zhupao Campus. [16]

In June 2034, Amankwah-Crouse and her team began a life-cycle assessment (LCA) study to assess PACOTTI’s energy consumption when deployed on megascale systems. In their published results, they observed an increase in efficiency and performance of PACOTTI over time, despite the system not being retrained or recalibrated.

At its conclusion, the LCA study showed that PACOTTI reduced corresponding CO2 emissions by 70% when compared to standard AI models. [17] Following this, Xu announced that all of Zhupao’s AI and data services would be switched over to PACOTTI, an effort that was completed in 2037. In June 2035, Amankwah-Crouse and Xu signed an open letter to promote PACOTTI’s adoption as an international standard. [18]

Mass adoption

In April 2040, Amankwah-Crouse joined the G6 project, which had been blueprinted by Zhupao as part of a cooperation strategy between the Chinese Communist Party (CCP) and the World Health Organisation (WHO). [19] By September 2040, Amankwah-Crouse and her team had developed the first version of G6 as a multimodular infranet equipped with PACOTTI units to process external databases, translate across 7,000 different languages and dialects, and design inference algorithms.

Following the passing of Resolution ES-13/6 in February 2041, the WHO organised several working groups with Zhupao to outline the terms of a charter for the international use of G6, which included the adoption of PACOTTI as an international standard by the International Telecommunication Union (ITU).

The inclusion of PACOTTI in G6 is considered to be the primary cause for its high prevalence, rapidly making it one of the most widely distributed person-scale technologies in human history. As of 2049, it is estimated that over 93% of all systems regulated by the International AI Standards Treaty contain at least one PACOTTI unit in their systems description plan. [20] Of the regulated systems that do not contain PACOTTI units, the majority are second-generation or older systems that are maintained for legacy continuity in financial institutions, government systems, and inaccessible remote systems such as pre-2030s spacecraft or deep sea monitoring stations.

Architecture

Image: Block diagram for a single encoder/decoder PACOTTI unit. The output of the encoder on the left is wired to the inputs of the decoder on the right.

The PACOTTI architecture consists of three main components: a quasi-transformer stack (QTS), a differentiable neural store (DNS), and a Janus attention filter.

Quasi-transformer stack

The quasi-transformer stack is the core of the PACOTTI architecture. It is responsible for receiving, tokenising, and embedding the input signal, as well as un-embedding and composing the output response. [21]

Inheriting some of its structure from the architectures popular in the 2020s, the QTS inside PACOTTI is composed of a series of encoders and decoders arranged in sequence. Information flows through each one, and at each stage new information is encoded into the signal in the input phase, or decoded from the signal in the output phase. This added information shapes how the following layers treat the inputs and outputs.

This stack is referred to as a quasi-transformer because, unlike a traditional transformer component, it does not use dedicated attention layers to moderate its encoding or decoding process. Transformer architectures learned and recognised context through the computationally intensive training of smaller components known as attention heads, which captured contextual relationships within an input sequence. PACOTTI replaces this feature with two other components.
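
The published descriptions of the architecture do not include reference code; the following is a minimal illustrative sketch of a QTS in a PyTorch style. The class names, dimensions, and the use of plain feed-forward blocks in place of attention heads are assumptions made for the purpose of illustration, with contextual modulation arriving as an external vector (supplied by the DNS via the Janus filter in a complete unit).

    import torch
    import torch.nn as nn

    class FeedForwardBlock(nn.Module):
        """An encoder/decoder block with no attention heads; context arrives
        as an external modulation vector rather than from attention."""
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            self.norm = nn.LayerNorm(d_model)

        def forward(self, x: torch.Tensor, modulation: torch.Tensor) -> torch.Tensor:
            # The modulation vector stands in for the context a transformer
            # would obtain from its attention heads.
            return self.norm(x + self.net(x + modulation))

    class QuasiTransformerStack(nn.Module):
        """Embedding, a sequence of encoders and decoders, and un-embedding."""
        def __init__(self, vocab: int, d_model: int = 256, n_layers: int = 6):
            super().__init__()
            self.embed = nn.Embedding(vocab, d_model)
            self.encoders = nn.ModuleList(
                [FeedForwardBlock(d_model, 4 * d_model) for _ in range(n_layers)]
            )
            self.decoders = nn.ModuleList(
                [FeedForwardBlock(d_model, 4 * d_model) for _ in range(n_layers)]
            )
            self.unembed = nn.Linear(d_model, vocab)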

Differentiable neural store

The differentiable neural store is an artificial neural network (ANN)-based information processing unit. This component of PACOTTI is sometimes referred to as a DNC due to the similarities in its architecture, but it is more accurately described as a “headless” DNC, designed specifically to act within a larger system such as PACOTTI. The DNS is an ANN attached to a large memory store, which it can write to and read from depending on what inputs the network is presented with.

The DNS’s memory is an abstract block of data, which the network can learn to use in different ways depending on how it is trained. In the same way that a computer’s hard drive can store both static files and executable programs, a DNS can be trained to store both simple information and complex processes. In PACOTTI, the DNS provides long-term abstract computational support for processing inputs from the QTS. At each step of the encoding/decoding process, the DNS is given a weighted function of the network’s signal and can apply a transformational vector to it in response, which changes how the QTS perceives the signal. This allows PACOTTI to store long-term knowledge, intentions, beliefs, and processes, and apply them consistently over long periods of time.
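
As an illustration, a “headless” DNS can be sketched as a memory matrix with content-based addressing, in the style of a simplified DNC. The slot count, projection layers, and write rule below are assumptions made for the sake of example, not the published design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DifferentiableNeuralStore(nn.Module):
        """A 'headless' DNC-style store: a memory matrix addressed by content,
        returning a transformational vector in response to the QTS signal."""
        def __init__(self, n_slots: int = 128, width: int = 64, d_signal: int = 256):
            super().__init__()
            self.register_buffer("memory", torch.zeros(n_slots, width))
            self.query = nn.Linear(d_signal, width)    # signal -> read key
            self.write = nn.Linear(d_signal, width)    # signal -> write content
            self.respond = nn.Linear(width, d_signal)  # read vector -> transformational vector

        def read(self, signal: torch.Tensor) -> torch.Tensor:
            # Content-based addressing: similarity between the query key and
            # every memory slot, followed by a weighted read.
            weights = F.softmax(self.query(signal) @ self.memory.t(), dim=-1)
            return self.respond(weights @ self.memory)

        @torch.no_grad()
        def write_step(self, signal: torch.Tensor, weights: torch.Tensor) -> None:
            # Blend new content into the most strongly addressed slots
            # (out of place, so earlier reads keep a valid gradient history).
            content = self.write(signal).reshape(-1, self.memory.size(1)).mean(dim=0)
            address = weights.reshape(-1, self.memory.size(0)).mean(dim=0)
            self.memory = self.memory + address.unsqueeze(-1) * content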

Janus attention filter

The Janus attention filter acts as a bridge between the QTS and the DNS. As information flows through the QTS during the encoding steps, the attention filter applies weights to parts of the DNS, affecting which ones are activated. During the decoding step, the attention filter applies weights to the decoder stack based on the reactions of the DNS. [21]

The Janus filter is described as the “meta-rational” part of the PACOTTI reasoning process, allowing the system to think about its own thinking process. [22] The component gets its name from the two-faced Roman god, since the filter mediates the flow of attention in both directions: towards and away from the DNS. Janus was briefly the codename for the project at Google prior to publication. [23]
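
Building on the sketches above, a hypothetical Janus filter and a single forward pass through a PACOTTI unit might look as follows. The two projections (one weighting DNS slots during encoding, one gating the decoders from the DNS response during decoding) and the pacotti_step function are illustrative assumptions rather than the published design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JanusAttentionFilter(nn.Module):
        """Mediates attention towards the DNS (while encoding) and away from
        it (while decoding)."""
        def __init__(self, d_signal: int = 256, n_slots: int = 128):
            super().__init__()
            self.toward_dns = nn.Linear(d_signal, n_slots)       # encoding direction
            self.away_from_dns = nn.Linear(d_signal, d_signal)   # decoding direction

        def encode_weights(self, signal: torch.Tensor) -> torch.Tensor:
            # Weight DNS slots according to the current encoder signal.
            return F.softmax(self.toward_dns(signal), dim=-1)

        def decode_gate(self, dns_response: torch.Tensor) -> torch.Tensor:
            # Turn the DNS's transformational vector into a gate on the decoders.
            return torch.sigmoid(self.away_from_dns(dns_response))

    def pacotti_step(qts, dns, janus, tokens):
        """One hypothetical forward pass through a single PACOTTI unit,
        combining the QuasiTransformerStack and DifferentiableNeuralStore
        sketched above."""
        x = qts.embed(tokens)
        for encoder in qts.encoders:
            slot_weights = janus.encode_weights(x)   # which DNS slots to activate
            dns.write_step(x, slot_weights)
            x = encoder(x, dns.read(x))              # DNS response modulates encoding
        for decoder in qts.decoders:
            x = decoder(x, janus.decode_gate(dns.read(x)) * x)
        return qts.unembed(x)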

Training

First stage

The first stage of training PACOTTI is to train the DNS, which is responsible for directing and influencing the processing of data within the unit as a whole. Because the DNS operates independently of the main stack, it does not need to be trained on grounded or experiential data. Instead, it is trained to perform operations that mimic certain kinds of reasoning processes, such as those outlined by Pacotti in his original proposal. [14] As such, it can learn to perform these processes on any kind of data, given the right feedback.

While Google’s initial PACOTTI DNS was trained on integer sequences, researchers at Zhupao showed in 2037 that a stronger neural foundation can be developed through training on sources of randomness, such as atmospheric noise, background radiation, or quantum laser splitters. [24]

Rather than developing reasoning strategies specific to a particular data set or task, the DNS learns strategies for approaching the processing of data in the abstract, similar to the processes of decomposition and categorisation identified by Pacotti. [14] This means that a DNS can be trained once and then reused in multiple PACOTTI units without retraining, and apply learned reasoning strategies to new contexts. [25]
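
The Google and Zhupao training procedures have not been published in detail. A hypothetical first-stage loop, reusing the DifferentiableNeuralStore sketch from the Architecture section, might pretrain the store on a store-then-recall task over random vectors standing in for a randomness source such as atmospheric noise; the task, batch size, and optimiser below are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def pretrain_dns(dns, d_signal: int = 256, steps: int = 10_000, lr: float = 1e-3):
        """Store-then-recall pretraining on random vectors (a stand-in for an
        external randomness source)."""
        opt = torch.optim.Adam(dns.parameters(), lr=lr)
        for _ in range(steps):
            noise = torch.randn(32, d_signal)        # experience-free input batch
            weights = F.softmax(dns.query(noise) @ dns.memory.t(), dim=-1)
            dns.write_step(noise, weights)           # store the batch in memory
            recalled = dns.read(noise)               # attempt to recall it
            loss = F.mse_loss(recalled, noise)       # reward faithful recall
            opt.zero_grad()
            loss.backward()
            opt.step()
        return dns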

Second stage

After a DNS has been trained or a pretrained DNS has been acquired, the QTS and Janus filter can be trained. This stage has two phases: calibration and extrapolation.

  • In the calibration phase, the DNS is aligned with the domain that the system will be applied to. This involves feeding the system high-complexity conceptual exemplars and low-specificity procedural exemplars from the target domain. For example, for the domain of medical science, the system might be shown highly specific edge-case diagnoses of rare conditions as well as simple, high-level descriptions of generic surgical or diagnostic methods. This triggers high-intensity responses in the DNS system as it rapidly creates and redistributes a large number of new concepts, which results in clear, low-turbulence reasoning signals for the Janus filter to train on.
  • In the extrapolation phase, the QTS is fed minor variations of the same input data repeatedly, with a reward based on how well it is able to separate the static and dynamic parts of the data. Keeping with the same example of medical science, a QTS might be shown several thousand diagnostic reports of the same condition on different patients. This lowers the thrashing response from the DNS, which provides a more predictable reasoning chain. If the Janus filter has been correctly calibrated, this will provide clean and stable feedback signals from the DNS to the QTS, which allows it to train its encoders and decoders to interpret the DNS state as instructions for processing the input data.
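
A schematic sketch of the two phases, again reusing the earlier sketches (and the pacotti_step function), is given below. The DNS parameters are deliberately left out of the optimiser, reflecting the reuse of a pretrained DNS; the data loaders, loss terms, and the static/dynamic “separation” penalty are placeholders, since the exact quantities being optimised are not specified.

    import torch
    import torch.nn.functional as F

    def train_second_stage(qts, dns, janus, calibration_batches, variation_batches, lr: float = 1e-4):
        """Two-phase second-stage training: calibration, then extrapolation.
        Batches are assumed to yield token-id tensors and matching targets."""
        opt = torch.optim.Adam(list(qts.parameters()) + list(janus.parameters()), lr=lr)

        # Calibration: conceptual and procedural exemplars from the target
        # domain drive strong DNS responses for the Janus filter to fit.
        for tokens, targets in calibration_batches:
            logits = pacotti_step(qts, dns, janus, tokens)
            loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
            opt.zero_grad()
            loss.backward()
            opt.step()

        # Extrapolation: minor variations of the same input are replayed; a
        # placeholder penalty stands in for the reward for separating the
        # static part of the signal (shared across variants) from the
        # dynamic part.
        for variants, targets in variation_batches:      # variants: (n_variants, batch, seq)
            embedded = torch.stack([qts.embed(v) for v in variants])
            static = embedded.mean(dim=0)                # shared, "static" component
            separation = F.mse_loss(embedded, static.expand_as(embedded))
            logits = pacotti_step(qts, dns, janus, variants[0])
            task_loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
            loss = task_loss + 0.1 * separation
            opt.zero_grad()
            loss.backward()
            opt.step()
        return qts, janus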

Applications

PACOTTI was originally applied to natural language processing (NLP) as part of a project based at Google’s AI centre in Accra. Its high levels of efficiency and ability to generalise and restructure its neural memory enabled a significant leap in performance for the translation of languages on a dialect continuum, such as those in the Niger-Congo family of languages, and the family of Arabic-derived languages and dialects spoken across Africa and the Middle East.

PACOTTI units have been built into AI systems designed to tackle a range of tasks, including:

  • Quantum error correction (QEC)
  • Drone swarm co-ordination
  • Nanofabrication management
  • Superparallel network optimisation
  • Disease modelling and management
  • Neural encoding and decoding
  • Movie recommendations

Neural colloids

Image: A neural colloid uses PACOTTI to align itself with an individual’s unique neurometric fingerprint and establish a brain-computer interface (BCI).

PACOTTI is most commonly applied as the information processing core in neural colloids to interpret real-time signals from the brain. Outputs from PACOTTI can take the form of direct neural signal injection back into the brain, or transmission of data outside of the colloid.

PACOTTI requires calibration to an individual’s neurometric fingerprint before it can operate normally, which is typically conducted through administering the Six Point Alignment. The successful calibration of a colloid enables a “frictionless” application of neurometrics that can bypass event-related potentials (ERPs) in favour of more foundational neural patterns that do not require any external stimuli.

Criticism

The structures that form in a PACOTTI unit, particularly within its DNS, have been compared to similar structures observed in human brains. [26] In the years immediately following the publication of the original research paper, some researchers and technologists proposed that PACOTTI might be the next step on the path towards achieving general synthetic cognition (GSC). Some cited this possibility as grounds for arguing that PACOTTI should be tightly regulated.

Some of these claims stem from a 2034 LCA study published by Amankwah-Crouse’s team at Zhupao, which showed an increase in efficiency and performance of PACOTTI over time. AI safety critics argued that this was evidence that the system was improving its ability to solve problems without oversight, and that it demonstrated that PACOTTI was capable of recursive self-improvement (RSI).

Elon Musk briefly threatened a lawsuit against Zhupao and Amankwah-Crouse for having “summoned an existential threat to the survival of the human race,” claiming that an AI system with RSI capabilities could potentially lead to an unavoidable singularity. Zhupao came out in support of Amankwah-Crouse, with Xu accusing both Google and Musk of attempting to “suppress PACOTTI because it threatens their AI livelihoods, which they continue to see as a question of access to computational resources for training huge models on tons of data.” [27]

Follow-up studies concluded that the most likely explanation for the improvement was that the processes PACOTTI had developed to handle tasks were better adapted for large-scale deployment, and it naturally performed better as the project increased in scale. Since the DNS is a black box within a black box, the exact manner in which the system solves problems cannot be interrogated, which means this theory cannot be proved.

References

  1. Villalobos, P; Ho, A; Sevilla, J et al. (October 2022). “Will we run out of data? Limits of LLM scaling based on human-generated data.” ICML 2024
  2. Vaswani, A; Shazeer, N; Parmar, N et al. (June 2017). “Attention Is All You Need.” NeurIPS 2017
  3. Amodei, D; Hernandez, D. (May 2018). “AI and compute.” OpenAI
  4. Shumailov, I; Shumaylov, Z; Zhao, Y et al. (May 2023). “The Curse of Recursion: Training on Generated Data Makes Models Forget.” arXiv
  5. Hao, K. (April 2031). “Transformed: The rise and fall of the large language model.” Bitrot Weekly 
  6. Heaven, W. (October 2028). “Disappearance of the author: How human data brokering became a billion-dollar black market.” MIT Technology Review
  7. Tanaka, J. (February 2028). “Applications of compressible fingerprinting as AI content detection.” arXiv
  8. Ofori, R; Donnely, A. (June 2029). “Network-wide neural oversaturation and its role in the Grey Friday collapse.” 
  9. Majumdar, R. (September 2030). “Report on the May 13 Kolkata Incident.” UNAWA
  10. World News Wire. (January 2031). “Post-Modern Warfare.” Vessel
  11. Snow, D. (September 2047). “The Setting Sun: The Century Since The British Raj.” Penguin
  12. Hao, K. (January 2032). “Big Tech’s newest friend is also its oldest: gun runners.” The Guardian
  13. Kazemi, Z. (January 2034). “What is the PACOTTI architecture, anyway?” Bytedance Scientific
  14. Pacotti, S. (July 2018). “Designing Intelligence.” Towards Data Science  
  15. Strubell, E; Ganesh, A; McCallum, A. (June 2019). “Energy and Policy Considerations for Deep Learning in NLP.” Cornell University
  16. Wilhite, H. (February 2034). “Efua Amankwah-Crouse joins Zhupao.” AP Wire
  17. Eveleigh, S. (April 2036). “If mining and refining data is the new fossil fuel industry, the PACOTTI neural net is renewable energy.” MIT Technology Review. 
  18. Acar, J. (June 2035). “Over 6,000 scientists and researchers sign open letter to promote environment-friendly AI model.” The Guardian
  19. World Health Organisation. (May 2040). “China-WHO Country Cooperation Strategy 2041-2045.” WHO Regional Office for the Western Pacific
  20. Benny, R. (April 2049). “Annual Report on Intelligent Systems.” United Nations
  21. Amankwah, E; Frye, C; Smith, A et al. (April 2033). “Verification of a compact and efficient dual-process transfer learning model.” ICLR 2033 
  22. Scott, T. (December 2040). “Thinking about thinking about thinking - Royal Society Christmas Lectures.” Royal Society
  23. Zhou, V. (August 2045). “On Everyone’s Mind: In Conversation With Efua Amankwah-Crouse.” Rest of World Quarterly Review
  24. Eken, D; Simyon, A. (February 2037). “PACOTTI Unit Calibration via Pure Randomness.” Journal of Artificial Cognition
  25. Frye, C; White, P; Garcia, L. (March 2035). “PACOTTI Cells are Universal Zero-Shot Reasoning Frames.” Conference on Neural Information Processing Systems
  26. Ho, W; Sinapayen, L. (July 2036). “Neural correlates and PACOTTI unit substructures.” arXiv
  27. Green, B. (December 2034). “Xu Shaoyong: Google only scratched the surface of what PACOTTI can do.” CNBC