PACOTTI

  1. Image
    Sheldon Pacotti, pictured in 2035.
  2. Image
    Efua Amankwah-Crouse in 2040.

PACOTTI is named after Sheldon Pacotti, whose work inspired Efua Amankwah-Crouse to create the first prototype at Google.

PACOTTI is an applied architectural component for deep learning systems. It combines an attention-based transformer architecture with a reconfigurable differentiable neural computer (DNC), which enables artificial intelligence (AI) systems to dynamically infer, transform, recombine and generalise high-level conceptual structures without the need for explicit training on bulk data sets.

The introduction of PACOTTI is considered the beginning of the third generation of AI. Unlike second-generation AI architectures, which relied on minimum learning, maximum scale (MLMS) approaches that contributed to model collapse, PACOTTI allows for dynamic substructures that rapidly create and align bespoke attention models as new task-relevant information is provided.

PACOTTI’s ability to create neural substructures to solve problems for which there is no human analogue has subsequently enabled the architecture to be applied more widely to other classes of problem, most notably serving as the core processing technology for neural colloids.

History

Predecessors

The explosion of interest in artificial intelligence (AI) technology in the 2010s led to a wave of architectures known as second-generation AI systems, defined by their emphasis on training large foundational models using extremely large data sets. [1] The best-known example of this was the transformer architecture, which emerged in the late 2010s and remained popular until the early 2030s. [2]

First-generation AI models mostly relied on recurrence to train their model weights, a process that is slow and inefficient. Second-generation models improved on this with architectures that allowed parallelisation, making it possible to obtain better performance through larger-scale investment and infrastructure rather than through innovations in algorithms or system design. [3]

Google

In 2031, Google’s AI centre in Accra formed the Speechriver team, one of a number of Google research projects aimed at indirectly exploring alternative machine learning architectures. Speechriver’s aim was to research comprehension and translation of African languages, particularly the Niger-Congo language family and dialect continuums across sub-Saharan Africa. This problem was considered relevant to addressing model collapse because of the high number of languages, complex dialect continuums, and unavailability of data for many language subgroups, forcing retraining on ambiguous and error-prone information. [4]

Efua Amankwah-Crouse and Curtis Frye, who had joined Speechriver as early transfers from other teams within Google’s Accra centre, focused their early experiments on attempting to mitigate model collapse in existing transformer architectures. In particular, one prototype proposed by Frye allowed encoder and decoder stacks to “speak” to one another through a secondary communication network called Janus. None of these early prototypes showed promise, and the team began to explore architectures that departed further from the transformer design. [5]

In early 2032, Amankwah-Crouse began experimenting with a new architecture inspired by Sheldon Pacotti, who had proposed augmenting transformers with differentiable neural computers (DNCs). [6] The final breakthrough came when Amankwah-Crouse adapted Frye’s design into a new component called the Janus attention filter. This first prototype was able to learn to recognise and cross-translate fifty languages and dialects from the Niger-Congo group using 20% of the training data required to train a transformer. Modern PACOTTI units are over one hundred times as efficient as this prototype.

Zhupao

Image

Xu Shaoyong is often credited with enabling the international adoption of PACOTTI.

In January 2034, Amankwah-Crouse joined Zhupao after she was contacted by Xu Shaoyong, who had expressed an interest in PACOTTI’s potential as a sustainable and environment-friendly model for AI, given that it largely eliminated the computationally expensive process of training AI models on big data sets. [7] Xu assembled an internal team under Amankwah-Crouse’s leadership and created an AI innovation centre in London City with funding from Zhupao Campus. [8]

In June 2034, Amankwah-Crouse and her team began a life-cycle assessment (LCA) study of PACOTTI’s energy consumption when deployed on megascale systems. Their published results recorded an increase in PACOTTI’s efficiency and performance over time, even though the system was never retrained or recalibrated. Follow-up studies concluded that the most likely explanation was that the processes PACOTTI had developed to handle its tasks were well suited to large-scale deployment, so its performance improved naturally as the deployment grew.

By its conclusion, the LCA study showed that PACOTTI would reduce corresponding CO2 emissions by 70% compared to standard AI models. [9] Xu subsequently announced that all of Zhupao’s AI and data services would be switched over to PACOTTI, an effort completed in 2037. In June 2035, Amankwah-Crouse and Xu signed an open letter promoting PACOTTI’s adoption as an international standard. [10]

Mass adoption

In April 2040, Amankwah-Crouse joined the G6 project, which had been blueprinted by Zhupao as part of a cooperation strategy between the Chinese Communist Party (CCP) and the World Health Organisation (WHO). [11] By September 2040, Amankwah-Crouse and her team had developed the first version of G6 as a multimodular infranet equipped with PACOTTI units to process external databases, translate across 7,000 different languages and dialects, and design inference algorithms.

Following the passing of Resolution ES-13/6 in February 2041, the WHO organised several working groups with Zhupao to outline the terms of a charter for the international use of G6, which included the adoption of PACOTTI as an international standard by the International Telecommunication Union (ITU).

The inclusion of PACOTTI in G6 is considered to be the primary cause for its high prevalence, rapidly making it one of the most widely distributed person-scale technologies in human history. As of 2049, it is estimated that over 93% of all systems regulated by the International AI Standards Treaty contain at least one PACOTTI unit in their systems description plan. [12] Of the regulated systems that do not contain PACOTTI units, the majority are second-generation or older systems that are maintained for legacy continuity in financial institutions, government systems, and inaccessible remote systems such as pre-2030s spacecraft or deep sea monitoring stations.

Architecture

Image

Block diagram for a single encoder/decoder PACOTTI unit. The output of the encoder on the left is wired to the inputs of the decoder on the right.

The PACOTTI architecture consists of three main components: a quasi-transformer stack (QTS), a differentiable neural store (DNS), and a Janus attention filter.

Quasi-transformer stack

The quasi-transformer stack is the core of the PACOTTI architecture. It is responsible for receiving, tokenising, and embedding the input signal, as well as un-embedding and composing the output response. [13]

Inheriting some of its structure from the architectures popular in the 2020s, the QTS inside PACOTTI is composed of a series of encoders and decoders arranged in sequence. Information flows through each one, and at each stage new information is encoded into the signal in the input phase, or decoded from the signal in the output phase. This added information shapes how the following layers treat the inputs and outputs.

This stack is referred to as a quasi-transformer because, unlike a traditional transformer, it does not use dedicated attention layers to moderate its encoding or decoding process. Transformer architectures learned to recognise context through the computationally intensive training of smaller components known as attention heads, which captured contextual relationships within an input sequence. PACOTTI replaces this feature with two other components: the differentiable neural store and the Janus attention filter.
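
The broad behaviour of the stack can be sketched in outline. The Python fragment below is purely illustrative and is not drawn from any published PACOTTI implementation: the class names, layer sizes, and the external “context hook” standing in for the DNS and Janus attention filter are assumptions made for the sake of the example.

  # Illustrative sketch only; the layer structure, sizes, and the external
  # "context hook" are assumptions, not the published QTS design.
  import numpy as np

  class QTSLayer:
      """One encoder or decoder stage: a feed-forward transform with no
      attention heads; contextual shaping is delegated to an external hook
      (the Janus filter and DNS in the full architecture)."""
      def __init__(self, d_model, rng):
          self.w = rng.normal(0.0, d_model ** -0.5, (d_model, d_model))
          self.b = np.zeros(d_model)

      def forward(self, signal, context_hook=None):
          h = np.tanh(signal @ self.w + self.b)   # encode/decode step
          if context_hook is not None:
              h = context_hook(h)                 # external contextual shaping
          return h

  class QuasiTransformerStack:
      """Encoder/decoder stages applied in sequence, each one adding
      information to (or removing it from) the flowing signal."""
      def __init__(self, n_layers, d_model, seed=0):
          rng = np.random.default_rng(seed)
          self.layers = [QTSLayer(d_model, rng) for _ in range(n_layers)]

      def forward(self, embedded_tokens, context_hook=None):
          h = embedded_tokens                     # shape: (seq_len, d_model)
          for layer in self.layers:
              h = layer.forward(h, context_hook)
          return h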

Differentiable neural store

The differentiable neural store is an artificial neural network (ANN)-based information processing unit. This component of PACOTTI is sometimes referred to as a DNC due to the similarities in its architecture, but it is more accurately described as a “headless” DNC, designed specifically to act within a larger system such as PACOTTI. The DNS is an ANN attached to a large memory store, which it can write to and read from depending on what inputs the network is presented with.

The DNS memory is an abstract block of data, which the network can learn to use in different ways depending on how it is trained. In the same way that a computer’s hard drive can store both static files and executable programs, a DNS can be trained to store both simple information and complex processes. In PACOTTI, the DNS provides long-term abstract computational support for processing inputs from the QTS. At each step of the encoding/decoding process, the DNS is given a weighted function of the network’s signal and can apply a transformational vector to it in response, changing how the QTS perceives the signal. This allows PACOTTI to store long-term knowledge, intentions, beliefs, and processes, and to apply them consistently over long periods of time.
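
This read/write behaviour can be illustrated with a short sketch. The fragment below assumes a simple cosine-similarity addressing scheme; the class name, array shapes, and write rule are illustrative assumptions rather than details of the published DNS design.

  # Illustrative sketch only; the addressing scheme, shapes, and write rule
  # are assumptions, not the published DNS design.
  import numpy as np

  class DifferentiableNeuralStore:
      """A "headless" DNC-style memory: an external signal soft-addresses
      memory rows, reads out a transformational vector, and can write new
      content back into the most strongly addressed rows."""
      def __init__(self, n_slots, d_model, seed=0):
          rng = np.random.default_rng(seed)
          self.memory = rng.normal(0.0, 0.1, (n_slots, d_model))
          self.key_proj = rng.normal(0.0, d_model ** -0.5, (d_model, d_model))

      def _address(self, signal):
          # Soft addressing: cosine similarity between a projected key and
          # every memory row, normalised into weights that sum to one.
          key = signal @ self.key_proj
          sims = self.memory @ key
          sims /= np.linalg.norm(self.memory, axis=1) * np.linalg.norm(key) + 1e-8
          w = np.exp(sims - sims.max())
          return w / w.sum()

      def read(self, signal):
          # The "transformational vector": a weighted blend of memory rows.
          return self._address(signal) @ self.memory

      def write(self, signal, content, rate=0.1):
          # Blend new content into the rows the signal addresses most strongly.
          w = self._address(signal)
          self.memory += rate * np.outer(w, content - w @ self.memory)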

Janus attention filter

The Janus attention filter acts as a bridge between the QTS and the DNS. As information flows through the QTS during the encoding steps, the attention filter applies weights to parts of the DNS, affecting which ones are activated. During the decoding step, the attention filter applies weights to the decoder stack based on the reactions of the DNS. [13]

The Janus filter is described as the “meta-rational” part of the PACOTTI reasoning process, allowing it to think about its thinking process. [14] The component gets its name from the two-faced Roman god, since the filter mediates the flow of attention in both directions: towards and away from the DNS. Janus was briefly the codename for the project at Google prior to publication. [15]
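
This two-way gating can be sketched under the same illustrative assumptions as the fragments above; the projection matrices and the sigmoid gate below are stand-ins, not details of the published filter.

  # Illustrative sketch only; the projections and gating functions are
  # assumptions, not the published Janus filter design.
  import numpy as np

  class JanusAttentionFilter:
      """Mediates attention in both directions: it weights which DNS slots a
      QTS encoding activates, and re-weights the decoder signal according to
      the DNS response."""
      def __init__(self, n_slots, d_model, seed=0):
          rng = np.random.default_rng(seed)
          self.to_dns = rng.normal(0.0, d_model ** -0.5, (d_model, n_slots))
          self.to_qts = rng.normal(0.0, d_model ** -0.5, (d_model, d_model))

      def encode_gate(self, qts_signal):
          # Encoding direction: one weight per DNS memory slot.
          logits = qts_signal @ self.to_dns
          w = np.exp(logits - logits.max())
          return w / w.sum()

      def decode_gate(self, decoder_signal, dns_response):
          # Decoding direction: gate the decoder signal by the DNS reaction.
          gate = 1.0 / (1.0 + np.exp(-(dns_response @ self.to_qts)))
          return decoder_signal * gate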

Applications

PACOTTI was originally applied to natural language processing (NLP) as part of a project based at Google’s AI centre in Accra. Its high levels of efficiency and ability to generalise and restructure its neural memory enabled a significant leap in performance for the translation of languages on a dialect continuum, such as those in the Niger-Congo family of languages, and the family of Arabic-derived languages and dialects spoken across Africa and the Middle East.

PACOTTI units have been built into AI systems designed to tackle a range of tasks, including:

  • Quantum error correction (QEC)
  • Superparallel network optimisation
  • Disease modelling and management
  • Neural encoding and decoding

Neural colloids

Image

A neural colloid uses PACOTTI to align itself with an individual’s unique neurometric fingerprint and establish a brain-computer interface (BCI).

PACOTTI is most commonly applied as the information processing core in neural colloids to interpret real-time signals from the brain. Outputs from PACOTTI can take the form of direct neural signal injection back into the brain, or transmission of data outside of the colloid.

PACOTTI requires calibration to an individual’s neurometric fingerprint before it can operate normally, typically performed by administering the Six Point Alignment. Successful calibration of a colloid enables a “frictionless” application of neurometrics that can bypass event-related potentials (ERPs) in favour of more foundational neural patterns that do not require any external stimuli.

References

  1. Villalobos, P; Ho, A; Sevilla, J et al. (October 2022). “Will we run out of data? Limits of LLM scaling based on human-generated data.” ICML 2024
  2. Vaswani, A; Shazeer, N; Parmar, N et al. (June 2017). “Attention Is All You Need.” Advances in Neural Information Processing Systems (NeurIPS 2017)
  3. Amodei, D; Hernandez, D. (May 2018). “AI and compute.” OpenAI
  4. Hao, K. (April 2031). “Transformed: The rise and fall of the large language model.” Bitrot Weekly
  5. Kazemi, Z. (January 2034). “What is the PACOTTI architecture, anyway?” Bytedance Scientific
  6. Pacotti, S. (July 2018). “Designing Intelligence.” Towards Data Science
  7. Strubell, E; Ganesh, A; McCallum, A. (June 2019). “Energy and Policy Considerations for Deep Learning in NLP.” ACL 2019
  8. Wilhite, H. (February 2034). “Efua Amankwah-Crouse joins Zhupao.” AP Wire
  9. Eveleigh, S. (April 2036). “If mining and refining data is the new fossil fuel industry, the PACOTTI neural net is renewable energy.” MIT Technology Review. 
  10. Acar, J. (June 2035). “Over 6,000 scientists and researchers sign open letter to promote environment-friendly AI model.” The Guardian
  11. World Health Organisation. (May 2040). “China-WHO Country Cooperation Strategy 2041-2045.” WHO Regional Office for the Western Pacific
  12. Benny, R. (April 2049). “Annual Report on Intelligent Systems.” United Nations
  13. Amankwah, E; Frye, C; Smith, A et al. (April 2033). “Verification of a compact and efficient dual-process transfer learning model.” ICLR 2033 
  14. Scott, T. (December 2040). “Thinking about thinking about thinking - Royal Society Christmas Lectures.” Royal Society
  15. Zhou, V. (August 2045). “On Everyone’s Mind: In Conversation With Efua Amankwah-Crouse.” Rest of World Quarterly Review