
Google’s new neural-net LLM architecture separates memory components to control memory and compute costs

A new neural-network architecture developed by researchers at Google could solve one of the main challenges of large language models (LLMs): extending their memory at inference time without exploding the costs of memory and compute. Named Titans, the architecture helps models find and retain, during inference, the small bits of information that matter over long sequences.

Titans combines traditional LLM attention blocks with “neural memory” layers that enable models to handle both short- and long-term memory tasks efficiently. According to the researchers, LLMs that use neural long-term memory can scale to millions of tokens and outperform both classic LLMs and alternatives such as Mamba while having many fewer parameters.

Attention layers and linear models

The classic transformer architecture used in LLMs employs the self-attention mechanism to compute the relations between tokens. This is an effective technique for learning complex and granular patterns in token sequences. However, as the sequence length grows, the compute and memory costs of calculating and storing attention grow quadratically.
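
To make the quadratic scaling concrete, here is a minimal sketch of scaled dot-product self-attention (plain PyTorch, written for illustration rather than taken from Google's code): the score matrix has one entry per pair of tokens, so doubling the sequence length roughly quadruples the memory and compute it consumes.

```python
# A minimal sketch of scaled dot-product self-attention (illustrative only).
# The score matrix has one entry per pair of tokens, which is why cost grows
# quadratically with sequence length.
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head) projections
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / (k.shape[-1] ** 0.5)   # shape (seq_len, seq_len)
    return F.softmax(scores, dim=-1) @ v      # attention-weighted sum of values

seq_len, d_model, d_head = 1024, 64, 64
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)        # doubling seq_len quadruples the score matrix
```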

More recent proposals involve alternative architectures with linear complexity that can scale without exploding memory and compute costs. However, the Google researchers argue that linear models do not show competitive performance compared to classic transformers, because they compress their contextual data and tend to miss important details.

An ideal architecture, they suggest, should have different memory components that can be coordinated to use existing knowledge, memorize new facts, and learn abstractions from their context.

“We argue that in an effective learning paradigm, similar to [that of] the human brain, there are distinct yet interconnected modules, each of which is responsible for a component crucial to the learning process,” the researchers write.

Neural long-term memory

“Memory is a confederation of systems — e.g., short-term, working, and long-term memory — each serving a different function with different neural structures, and each capable of operating independently,” the researchers write.

To address the shortcomings of current language models, the researchers propose a “neural long-term memory” module that can learn new information at inference time without the inefficiencies of the full attention mechanism. Instead of storing information during training, the neural memory module learns a function that can memorize new facts during inference and dynamically adapt the memorization process based on the data it encounters. This solves the generalization problem that other neural network architectures suffer from.

To decide which bits of information are worth storing, the neural memory module uses the concept of “surprise.” The more a sequence of tokens differs from the kind of information stored in the model’s weights and existing memory, the more surprising it is and the more it is worth memorizing. This enables the module to make efficient use of its limited memory by storing only pieces of data that add useful information to what the model already knows.
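
As a rough illustration of this idea, the sketch below (an assumption-laden toy, not the paper's implementation; the NeuralMemory class name and the plain gradient step are illustrative choices) treats the memory as a small network whose weights are updated at inference time by a gradient step on a reconstruction loss, so that tokens the memory predicts poorly, i.e. surprising ones, cause larger updates:

```python
# Toy sketch of surprise-driven test-time memorization (not the paper's code).
# The memory lives in the weights of a small network; a surprising key/value
# pair produces a large loss and therefore a large weight update.
import torch

class NeuralMemory(torch.nn.Module):
    def __init__(self, dim, lr=0.01):
        super().__init__()
        self.mem = torch.nn.Linear(dim, dim, bias=False)  # memory stored in weights
        self.lr = lr

    def write(self, key, value):
        # Associative-memory loss: how badly the memory reconstructs value from key.
        pred = self.mem(key)
        loss = ((pred - value) ** 2).mean()               # high loss => surprising input
        grad, = torch.autograd.grad(loss, self.mem.weight)
        with torch.no_grad():
            self.mem.weight -= self.lr * grad             # gradient step at inference time

    def read(self, query):
        with torch.no_grad():
            return self.mem(query)

mem = NeuralMemory(dim=64)
key, value = torch.randn(64), torch.randn(64)
mem.write(key, value)                                     # surprising pairs shift weights more
recalled = mem.read(key)
```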

To handle very long sequences of data, the neural memory module has an adaptive forgetting mechanism that allows it to discard information that is no longer needed, which helps manage the memory’s limited capacity.
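
Building on the same toy NeuralMemory, the forgetting mechanism can be approximated by decaying the stored weights before each write (an illustrative simplification, not the paper's exact gating rule):

```python
# Illustrative forgetting gate: decay stale associations before memorizing new,
# surprising data. The fixed forget_rate is an assumption for demonstration.
def write_with_forgetting(memory, key, value, forget_rate=0.05):
    with torch.no_grad():
        memory.mem.weight *= (1.0 - forget_rate)   # gradually forget old content
    memory.write(key, value)                       # then store the new information
```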

The memory module can complement the attention mechanism of current transformer models, which the researchers describe as “short-term memory modules, attending to the current context window size. On the other hand, our neural memory with the ability to continuously learn from data and store it in its weights can play the role of a long-term memory.”

Titans architecture

Titans architecture (source: arXiv)

The researchers describe Titans as a family of models that incorporate existing transformer blocks and neural memory modules. The architecture has three key components: the “core” module, which acts as short-term memory and uses the classic attention mechanism to attend to the current segment of input tokens the model is processing; a “long-term memory” branch, which uses the neural memory architecture to store information beyond the current context; and a “persistent memory” module, learnable parameters that remain fixed after training and store time-independent knowledge.

The researchers propose different ways to connect these three components. But in general, the main advantage of the architecture is enabling the attention and memory modules to complement each other. For example, the attention layers can use the historical and current context to determine which parts of the current context window should be stored in long-term memory. Meanwhile, long-term memory provides historical knowledge that is not present in the current attention context.
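
A simplified sketch of how the three branches might fit together is shown below; it reuses the toy NeuralMemory class from earlier, and the layer choices, sizes, and write heuristic are illustrative assumptions rather than Google's released design:

```python
# An illustrative Titans-style block (not Google's code): short-term attention
# over the current window, augmented with retrieved long-term memory and
# learned persistent-memory tokens.
import torch

class TitansStyleBlock(torch.nn.Module):
    def __init__(self, dim, n_persistent=16):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.long_term = NeuralMemory(dim)  # toy module from the earlier sketch
        # Persistent memory: learnable tokens, fixed after training.
        self.persistent = torch.nn.Parameter(torch.randn(n_persistent, dim))

    def forward(self, window):              # window: (batch, seq_len, dim)
        batch = window.shape[0]
        # 1. Retrieve historical context from the long-term memory.
        retrieved = self.long_term.read(window)
        # 2. Prepend persistent tokens and retrieved history to the current window.
        persistent = self.persistent.unsqueeze(0).expand(batch, -1, -1)
        context = torch.cat([persistent, retrieved, window], dim=1)
        # 3. Core short-term memory: attention over the augmented context.
        out, _ = self.attn(window, context, context)
        # 4. Crude write heuristic: store a summary of the current window long-term.
        self.long_term.write(window.mean(dim=(0, 1)), out.mean(dim=(0, 1)))
        return out

block = TitansStyleBlock(dim=64)
y = block(torch.randn(2, 128, 64))          # two sequences of 128 tokens each
```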

The researchers ran small-scale tests on Titan models, ranging from 170 million to 760 million parameters, on a variety of tasks, including language modeling and long-sequence language tasks. They compared the performance of Titans against various transformer-based models, linear models such as Mamba, and hybrid models such as Samba.

Titans (red line) outperforms other models, including GPT-4, on long-sequence tasks in both few-shot and fine-tuned settings (source: arXiv)

Titans showed strong performance on language modeling compared to other models, and outperformed both transformers and linear models of similar sizes.

The difference in performance is especially pronounced on tasks over long sequences, such as “needle in a haystack,” where the model must retrieve bits of information from a very long sequence, and BABILong, where the model must reason over facts distributed across very long documents. In fact, on these tasks Titans outperformed models with far larger parameter counts, including GPT-4 and GPT-4o-mini, as well as Llama-3 models enhanced with retrieval-augmented generation (RAG).

Additionally, the researchers were able to extend the context window of Titans up to 2 million tokens while keeping memory costs modest.

The models still need to be tested at larger scales, but the paper’s results show that the researchers have not yet hit the ceiling of Titans’ potential.

What does it mean for business applications?

With Google being at the forefront of long-context models, we can expect this technique to find its way into both proprietary and open models such as Gemini and Gemma.

With LLMs that support longer context windows, there is growing potential for applications where you cram new knowledge into the prompt instead of relying on techniques such as RAG. The development cycle for building and iterating on prompt-based applications is much faster than for complex RAG pipelines. Meanwhile, architectures like Titans can help reduce inference costs for very long sequences, making it possible for companies to deploy LLM applications for more use cases.

Google plans to release the PyTorch and JAX code for training and evaluating Titans models.
