Meta proposes new scalable memory layers that improve knowledge and reduce hallucinations


As enterprises continue to adopt large language models (LLMs) in various applications, one of the key challenges they face is improving the models' factual knowledge and reducing hallucinations. In a new paper, researchers at Meta AI propose "scalable memory layers," which could be one of several possible solutions to this problem.

Scalable memory layers add more parameters to LLMs to increase their learning capacity without requiring additional compute resources. The architecture is useful for applications where you can spare extra memory for factual knowledge but also want the inference speed of nimbler models.

Dense layers vs. memory layers

Traditional language models use "dense layers" to encode vast amounts of information in their parameters. In dense layers, all parameters are used at their full capacity and are mostly activated at the same time during inference. Dense layers can learn complex functions, and increasing their capacity requires additional compute and energy resources.

In contrast, for simple factual knowledge, much simpler layers with associative memory architectures would be more efficient and interpretable. This is what memory layers do. They use simple sparse activations and key-value lookup mechanisms to encode and retrieve knowledge. Sparse layers take up more memory than dense layers but only use a small portion of their parameters at a time, which makes them much more compute-efficient.
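To make the key-value idea concrete, here is a minimal PyTorch sketch of a sparse memory lookup. It is illustrative only, not Meta's implementation: the class name `MemoryLayer`, the `num_keys` and `topk` values, and the flat key table are all assumptions (the paper uses a product-key scheme so the full table never has to be scored), but the core pattern is the same: project each token to a query, select the top-k matching keys, and mix only those value vectors back into the residual stream.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryLayer(nn.Module):
    """Toy sparse key-value memory layer (illustrative, not Meta's code)."""

    def __init__(self, d_model: int, num_keys: int = 4096, topk: int = 8):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        self.keys = nn.Parameter(torch.randn(num_keys, d_model) * 0.02)
        self.values = nn.Embedding(num_keys, d_model)
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q = self.query_proj(x)                       # one query per token
        scores = q @ self.keys.t()                   # (batch, seq, num_keys)
        top_scores, top_idx = scores.topk(self.topk, dim=-1)
        weights = F.softmax(top_scores, dim=-1)      # sparse mixing weights
        top_values = self.values(top_idx)            # (batch, seq, topk, d_model)
        out = (weights.unsqueeze(-1) * top_values).sum(dim=-2)
        return x + out                               # residual connection

# Usage: only `topk` of the `num_keys` value vectors are touched per token.
layer = MemoryLayer(d_model=64)
hidden = torch.randn(2, 16, 64)
print(layer(hidden).shape)  # torch.Size([2, 16, 64])
```

The important property is that the value table can grow to millions of entries while per-token compute stays proportional to `topk`, not to the size of the table.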

Memory layers have been around for several years but are rarely used in modern deep learning architectures. They are not optimized for current hardware accelerators.

Current frontier LLMs usually use some form of "mixture of experts" (MoE) architecture, which uses a mechanism vaguely similar to memory layers. MoE models are composed of many smaller expert components that specialize in specific tasks. At inference time, a routing mechanism determines which expert becomes activated based on the input sequence. PEER, an architecture recently developed by Google DeepMind, extends MoE to millions of experts, providing more granular control over the parameters that become activated during inference.
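For comparison, the snippet below sketches the routing idea behind MoE in PyTorch. Again this is a toy, assumed setup (eight small experts, top-2 routing, naive dispatch loops); production MoE layers, and PEER's retrieval over millions of tiny experts, add load balancing and far more efficient expert dispatch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy mixture-of-experts block: a router picks top-k experts per token."""

    def __init__(self, d_model: int, num_experts: int = 8, topk: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model),
                           nn.GELU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(num_experts)]
        )
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.router(x)                     # (batch, seq, num_experts)
        weights, idx = logits.topk(self.topk, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Naive dispatch: run each selected expert on the tokens routed to it.
        for k in range(self.topk):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = TinyMoE(d_model=32)
print(moe(torch.randn(2, 10, 32)).shape)  # torch.Size([2, 10, 32])
```

The structural similarity to memory layers is that only a small, input-dependent subset of parameters is activated for each token.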

Upgrading memory layers

Memory layers are light on compute but heavy on memory, which presents specific challenges for current hardware and software frameworks. In their paper, the Meta researchers propose several modifications that solve these challenges and make it possible to use them at scale.

Memory layers can be parallelized across multiple GPUs to store millions of key-value pairs without slowing down the model (source: arXiv)

First, the researchers configured the memory layers for parallelization, distributing them across several GPUs to store millions of key-value pairs without changing other parts of the model. They also implemented a special CUDA kernel for handling high-memory-bandwidth operations. And they developed a parameter-sharing mechanism that supports a single set of memory parameters across multiple memory layers within a model. This means that the keys and values used for lookups are shared across layers.
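The parameter-sharing part of that design can be sketched in a few lines: several memory blocks keep their own query projections but all read from one shared key/value store. The class names and sizes below are made up for illustration, and the GPU sharding and custom CUDA gather kernel described in the paper are not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedMemoryStore(nn.Module):
    """One key/value table reused by every memory block in the model."""

    def __init__(self, d_model: int, num_keys: int = 4096):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_keys, d_model) * 0.02)
        self.values = nn.Embedding(num_keys, d_model)

class MemoryBlock(nn.Module):
    """Per-layer block with its own query projection but shared keys/values."""

    def __init__(self, store: SharedMemoryStore, d_model: int, topk: int = 8):
        super().__init__()
        self.store = store                      # shared reference, not a copy
        self.query_proj = nn.Linear(d_model, d_model)
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.query_proj(x) @ self.store.keys.t()
        w, idx = scores.topk(self.topk, dim=-1)
        w = F.softmax(w, dim=-1)
        return x + (w.unsqueeze(-1) * self.store.values(idx)).sum(dim=-2)

store = SharedMemoryStore(d_model=64)
blocks = nn.ModuleList([MemoryBlock(store, d_model=64) for _ in range(3)])
x = torch.randn(2, 16, 64)
for block in blocks:
    x = block(x)
print(x.shape)  # all three blocks reuse one key/value table
```

Because the blocks point at the same `SharedMemoryStore`, adding more memory layers to the network does not multiply the number of memory parameters.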

These modifications make it possible to implement memory layers within LLMs without slowing down the model.

"Memory layers with their sparse activations nicely complement dense networks, providing increased capacity for knowledge acquisition while being light on compute," the researchers write. "They can be efficiently scaled, and provide practitioners with an attractive new direction to trade off memory with compute."

To test memory layers, the researchers modified Llama models by replacing one or more dense layers with a shared memory layer. They compared the memory-enhanced models against dense LLMs as well as MoE and PEER models on several tasks, including factual question answering, scientific and common-sense world knowledge, and coding.
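As a rough illustration of that kind of surgery (not the researchers' actual code), the sketch below swaps the dense feed-forward block of one decoder layer in a tiny, randomly initialized Llama-style model for a sparse key-value module. It assumes the Hugging Face transformers layout in which each decoder layer exposes an `.mlp` submodule; `MemoryFFN`, `num_keys` and all the config sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import LlamaConfig, LlamaForCausalLM

class MemoryFFN(nn.Module):
    """Drop-in replacement for a dense feed-forward block: sparse k-v lookup."""

    def __init__(self, d_model: int, num_keys: int = 2048, topk: int = 8):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        self.keys = nn.Parameter(torch.randn(num_keys, d_model) * 0.02)
        self.values = nn.Embedding(num_keys, d_model)
        self.topk = topk

    def forward(self, x):
        scores = self.query_proj(x) @ self.keys.t()
        w, idx = scores.topk(self.topk, dim=-1)
        w = F.softmax(w, dim=-1)
        # The decoder layer adds the residual itself, so return only the update.
        return (w.unsqueeze(-1) * self.values(idx)).sum(dim=-2)

# Tiny randomly initialized Llama-style model (no pretrained weights needed).
config = LlamaConfig(hidden_size=64, intermediate_size=128,
                     num_hidden_layers=2, num_attention_heads=4,
                     vocab_size=1000)
model = LlamaForCausalLM(config)

# Swap the dense MLP of the first decoder layer for the memory module.
model.model.layers[0].mlp = MemoryFFN(d_model=config.hidden_size)

tokens = torch.randint(0, config.vocab_size, (1, 12))
print(model(input_ids=tokens).logits.shape)  # (1, 12, vocab_size)
```

In the paper, one or more such dense layers are replaced with a shared memory layer and the resulting models are then trained from scratch, which the toy above does not attempt.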

Memory models vs. dense layers
A 1.3B memory model (solid line) trained on 1 trillion tokens approaches the performance of the 7B model (dashed line) on factual question-answering tasks as it is given more memory parameters (source: arXiv)

Their findings show that memory models improve significantly over dense baselines and compete with models that use 2X to 4X more compute. They also match the performance of MoE models that have the same compute budget and parameter count. The models' performance is especially notable on tasks that require factual knowledge. For example, on factual question answering, a memory model with 1.3 billion parameters approaches the performance of Llama-2-7B, which has been trained on twice as many tokens and with 10X more compute.

Moreover, the researchers found that the benefits of memory models remain consistent with model size as they scaled their experiments from 134 million to 8 billion parameters.

"Given these findings, we strongly advocate that memory layers should be integrated into all next-generation AI architectures," the researchers write, while adding that there is still much room for improvement. "In particular, we hope that new learning methods can be developed to push the effectiveness of these layers even further, enabling less forgetting, fewer hallucinations and continual learning."


