
New LLM optimization method reduces memory costs by up to 75%




Researchers at Tokyo-based startup Sakana AI have developed a new method that enables language models to use memory more efficiently, helping businesses reduce the cost of building applications on top of large language models (LLMs) and other Transformer models.

The technique, called "universal transformer memory," uses special neural networks to optimize LLMs to keep the information that matters and discard redundant details from their context.

Optimizing Transformer memory

The responses of Transformer models, the backbone of LLMs, depend on the contents of their "context window," that is, the input they receive from users.

The context window can be thought of as the model's working memory. Adjusting the contents of the context window can have a significant impact on the model's performance, which has given rise to an entire field of "prompt engineering."

Current models support very long context windows with hundreds of thousands, or even millions, of tokens (the numerical representations an LLM uses for the words, word parts, phrases, concepts, and numbers that users enter in their prompts).

This allows users to cram more information into their prompts. However, longer prompts come with higher compute costs and slower performance. Optimizing prompts to remove unnecessary tokens while keeping important information can cut costs and increase speed.
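For a rough sense of the stakes, the sketch below estimates how attention compute scales with context length. The layer count and hidden size are illustrative assumptions (roughly the size of an 8B-parameter model), not figures from the paper.

```python
# Back-of-envelope sketch: self-attention compute grows roughly quadratically with
# the number of tokens, so trimming redundant tokens buys a more-than-proportional
# speedup. Model dimensions below are illustrative assumptions.

def attention_flops(num_tokens: int, num_layers: int = 32, hidden_dim: int = 4096) -> float:
    """Approximate attention cost per forward pass: QK^T and attn@V each scale ~ n^2 * d per layer."""
    return num_layers * 2 * (num_tokens ** 2) * hidden_dim

full, trimmed = 100_000, 25_000  # e.g. keeping only a quarter of the context
print(attention_flops(trimmed) / attention_flops(full))  # 0.0625 -> 16x less attention compute
```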

Current prompt optimization techniques are resource-intensive or require users to manually test different configurations to shrink their prompts.

Neural attention memory models

Universal transformer memory optimizes prompts using neural attention memory models (NAMMs), simple neural networks that decide whether to "remember" or "forget" each token stored in the LLM's memory.

"This new capability allows Transformers to discard unhelpful or redundant details, and focus on the most critical information, something we find to be crucial for tasks requiring long-context reasoning," the researchers wrote.
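The sketch below shows the general shape of that idea: a small network assigns a keep-or-forget decision to every token held in memory. It is a toy illustration with assumed inputs and thresholds, not Sakana AI's NAMM architecture.

```python
# Toy sketch of the concept behind a neural attention memory model (NAMM): a small
# network scores each token held in the model's memory and decides whether to keep
# or evict it. Feature construction and the decision threshold are assumptions.
import torch
import torch.nn as nn

class TinyMemoryModel(nn.Module):
    def __init__(self, feature_dim: int = 32, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # one "keep" score per cached token
        )

    def forward(self, token_features: torch.Tensor) -> torch.Tensor:
        # token_features: (num_cached_tokens, feature_dim), e.g. summaries of the
        # attention each cached token has received so far.
        scores = self.net(token_features).squeeze(-1)
        return scores > 0.0  # boolean keep/forget decision per token

model = TinyMemoryModel()
features = torch.randn(1000, 32)  # stand-in features for 1,000 cached tokens
keep_mask = model(features)       # True = remember, False = forget
print(f"kept {keep_mask.sum().item()} of {len(keep_mask)} tokens")
```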

Universal transformer memory (source: Sakana AI)

NAMMs are trained separately from the LLM and are combined with the pre-trained model at inference time, which makes them flexible and easy to deploy. However, they need access to the model's internal activations, which means they can only be applied to open-source models.

Like other techniques developed by Sakana AI, NAMMs are trained through evolutionary algorithms instead of gradient-based methods. By iteratively mutating and selecting the best-performing models through trial and error, the evolutionary algorithm optimizes NAMMs for efficiency and performance. This is especially important because NAMMs are trying to achieve a non-differentiable goal: keeping or discarding tokens.
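A minimal sketch of what such a black-box training loop looks like is below, with a made-up fitness function standing in for "run the LLM with the pruned memory and measure task performance." It is a generic evolutionary strategy, not the specific optimizer used in the paper.

```python
# Why evolution fits here: the keep/forget decision is discrete, so the objective
# (task score achieved with the resulting pruned memory) has no useful gradient.
# A black-box evolutionary loop sidesteps that. Generic (mu, lambda)-style sketch
# with a toy stand-in objective.
import numpy as np

rng = np.random.default_rng(0)

def fitness(params: np.ndarray) -> float:
    """Placeholder: in practice, plug params into the memory model, run the LLM on
    long-context tasks, and return task performance (non-differentiable)."""
    return -np.sum((params - 1.0) ** 2)  # toy stand-in objective

dim, pop_size, elite, sigma = 64, 32, 8, 0.1
mean = np.zeros(dim)
for generation in range(200):
    # Sample a population of candidate memory-model parameters around the mean.
    population = mean + sigma * rng.standard_normal((pop_size, dim))
    scores = np.array([fitness(p) for p in population])
    # Select the best performers and recombine them into the next mean.
    best = population[np.argsort(scores)[-elite:]]
    mean = best.mean(axis=0)

print("best fitness:", fitness(mean))
```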

NAMMs operate on the attention layers of LLMs, one of the key components of the Transformer architecture that determines the relations and importance of each token in the model's context window. Based on attention values, NAMMs determine which tokens should be preserved and which can be discarded from the LLM's context window. This attention-based mechanism makes it possible to use a trained NAMM on various models without further modification. For example, a NAMM trained on text-only data can be applied to vision or multi-modal models without additional training.
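The sketch below illustrates that attention-based pruning step on a single layer's key-value cache. The scoring feature (mean attention each cached token receives from recent queries) and the fixed keep ratio are simplifications assumed for illustration; the paper's actual features and decision rule differ.

```python
# Sketch of attention-based cache pruning: because the scorer only looks at attention
# values (not raw text or pixels), the same trained scorer can in principle be reused
# across models and modalities. Simplified feature and threshold, for illustration only.
import torch

def prune_kv_cache(keys: torch.Tensor, values: torch.Tensor,
                   attn_weights: torch.Tensor, keep_ratio: float = 0.25):
    """keys/values: (num_tokens, head_dim); attn_weights: (num_queries, num_tokens)."""
    # Score each cached token by how much attention recent queries paid to it.
    token_scores = attn_weights.mean(dim=0)                       # (num_tokens,)
    num_keep = max(1, int(keep_ratio * keys.shape[0]))
    keep_idx = token_scores.topk(num_keep).indices.sort().values  # preserve token order
    return keys[keep_idx], values[keep_idx]

keys, values = torch.randn(1000, 128), torch.randn(1000, 128)
attn = torch.softmax(torch.randn(16, 1000), dim=-1)  # attention from 16 recent queries
small_k, small_v = prune_kv_cache(keys, values, attn)
print(small_k.shape)  # torch.Size([250, 128]) -> 75% of the cache discarded
```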

Neural attention memory models (NAMMs) examine attention values to determine which tokens should be kept or discarded from the context window (source: Sakana AI)

Universal memory in action

To test the universal transformer memory concept, the researchers trained a NAMM on top of an open-source Meta Llama 3-8B model. Their experiments show that with NAMMs, Transformer-based models perform better on natural language and coding problems over very long sequences. Meanwhile, by discarding unnecessary tokens, the NAMM enabled the LLM to save up to 75% of its cache memory while performing the tasks.
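For a sense of scale, the arithmetic below estimates what a 75% reduction could mean for the KV cache of a Llama 3-8B-class model at a 100,000-token context. The configuration values (32 layers, 8 grouped-query KV heads, head dimension 128, 16-bit cache entries) are assumptions about the model, and the resulting numbers are an illustration rather than figures reported by the researchers.

```python
# Rough arithmetic on cache savings. Config values are assumptions about a
# Llama 3-8B-class model; the output is illustrative, not a reported result.

def kv_cache_gib(tokens: int, layers: int = 32, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_value: int = 2) -> float:
    # Keys and values are each (tokens, kv_heads, head_dim) per layer.
    return layers * tokens * kv_heads * head_dim * 2 * bytes_per_value / 2**30

full = kv_cache_gib(100_000)   # ~12.2 GiB at a 100k-token context
pruned = full * (1 - 0.75)     # if 75% of cached tokens are evicted
print(f"{full:.1f} GiB -> {pruned:.1f} GiB")
```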

"Across our benchmarks, NAMMs provide clear performance improvements to the Llama 3-8B transformer," the researchers wrote. "Furthermore, our memory systems yield notable side benefits, reducing the context size of each layer, while never being explicitly optimized for memory efficiency."

NAMMs compete with leading prompt optimization techniques while improving the models' performance (source: Sakana AI)

The researchers also tested the NAMM on the 70B version of Llama, as well as Transformer models designed for other modalities and tasks, such as Llava (computer vision) and Decision Transformer (reinforcement learning).

"Even in these out-of-distribution settings, NAMMs retain their benefits by discarding tokens such as redundant video frames and suboptimal actions, allowing their new base models to focus on the most relevant information to improve performance," the researchers wrote.

Task-dependent behavior

Another interesting finding is that NAMMs adjust their behavior based on the task.

For example, in coding tasks, the model discards contiguous chunks of tokens that correspond to comments and whitespace that do not affect the execution of the code.

In natural language tasks, on the other hand, the model discards tokens that represent grammatical redundancies and do not affect the meaning of the sequence.

The researchers have released the code for creating your own NAMMs. Techniques such as universal transformer memory can be very useful for enterprise applications that process millions of tokens and can benefit from speed gains and lower costs. The reusability of a trained NAMM also makes it a versatile tool to apply across different applications in an enterprise.

Looking ahead, the researchers suggest more advanced techniques, such as using NAMMs during the training of LLMs to further extend their memory capabilities.

"This work has only begun to tap into the potential of our new class of memory models, which we anticipate might offer many new opportunities to advance future generations of Transformers," the researchers wrote.


