Coexistence of Storage and Computing: A New Era of Data Storage in the AI Era
Neil  2026-02-25 20:29   published in China


Sun Gang | Secretary General of the Data Storage Professional Committee, China Electronics Standardization Association

 

[Abstract]

Looking back on the past year, I have had a strong feeling when talking with many peers: the profound change in data storage in the era of artificial intelligence has gone from a technical topic to a core issue for the industry. A broad consensus is forming: the data storage industry is entering a new stage of fundamental structural change driven by AI-native workloads. This is far more than a linear upgrade of capacity or speed; it is an all-round reconstruction that touches system architecture, the relationship between storage and computing, and even the industrial ecosystem.

Many of us have heard of or seen a scene like this: in a laboratory, the compute cluster training a trillion-parameter model suddenly raises an alarm. The cause is not that the GPUs are not powerful enough, but that the storage system cannot "feed" them in time. This reveals a now widely recognized truth: in the era of artificial intelligence, the role of storage has fundamentally changed. It is turning from a silent subsidiary device behind the computing into a key cornerstone that participates in building intelligence. (Recommended reading: Why do artificial intelligence models need advanced AI storage? [1])

We do stand at a turning point. The storage revolution driven by AI challenges much of the traditional design logic. When the data scale of a single model easily breaks through the PB level and advances toward EB [2], and when the storage system needs to start understanding data rather than just storing bits, the traditional boundary between memory and external storage becomes blurred. What we are seeing is the dawn of a new storage era.

 

I. Qualitative Change from Quantitative Change: Architectural Reconstruction in the EB Era

1. From PB to EB: more than a numbers game

In the AI domain, the training data volume of a single large model has jumped from the TB level to the PB level, and EB-level training sets will become the norm in the future. Unlike any previous capacity transition, growth at this scale is no longer a challenge that linear expansion of device capacity can cope with. It is fundamentally overturning the design logic of storage systems.

Traditional storage architecture is based on the assumption that data access is localized, so the system can optimize performance through policies such as caching and tiering. Under AI workloads, this assumption begins to fail. Training large models requires near-random access to a huge number of small files, while the inference phase maintains huge dynamic state (such as the KV Cache). When the data size reaches the EB level, the traditional PCIe-based peripheral I/O model appears powerless. Data must be copied multiple times and undergo protocol conversion between GPU, CPU, and storage, like goods being loaded and unloaded at a series of transfer stations. The inherent high latency and extra overhead become the fundamental bottleneck restricting performance.

2. The CXL unified bus: redefining the data channel

Against this background, CXL (Compute Express Link) and similar unified bus architectures have stepped onto the stage. CXL runs over the same physical path (PCIe) but introduces a new set of memory-semantic "traffic rules". By providing cache-coherent memory semantics, CXL allows CPU, GPU, and storage devices to share a unified memory space, greatly reducing the overhead of data replication and format conversion [3]. It lets CPUs and accelerators access each other's memory directly with load/store instructions, eliminating many unnecessary "stop-and-check" steps. This means that, topologically, the storage device is repositioned as a memory extension of the system rather than a remote peripheral, and storage shifts from "peripheral device" to "computing partner".

Imagine a scenario: in a traditional architecture, when a GPU needs a training sample, the data must be read from the SSD into host memory and then copied into GPU memory, meaning three transfers and two protocol conversions. Under a CXL architecture, the GPU accesses data on the SSD directly, as if accessing its own memory. The performance improvement brought by this change is measured in orders of magnitude, and it is only the beginning.
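To make the cost of extra hops concrete, here is a toy latency model in Python. The bandwidth and fixed-overhead numbers are invented for illustration, not measurements of any real system; the point is only that removing staging copies shortens the total path.

```python
# Toy model of the data path for loading one training batch.
# All latency/bandwidth numbers are illustrative assumptions, not measurements.

def transfer_time_ms(size_gb: float, hops: list) -> float:
    """Total time for a batch to traverse a chain of (bandwidth_GBps, fixed_overhead_ms) hops."""
    return sum(size_gb / bw * 1000 + overhead for bw, overhead in hops)

BATCH_GB = 2.0

# Traditional path: SSD -> host DRAM -> GPU memory (two copies plus protocol conversion).
traditional = [(7.0, 0.5),   # NVMe SSD -> host DRAM
               (25.0, 0.3)]  # host DRAM -> GPU over PCIe

# CXL-style path: GPU load/stores reach the device through one coherent hop.
cxl_style = [(7.0, 0.1)]     # SSD media -> GPU, no intermediate staging copy

t_trad = transfer_time_ms(BATCH_GB, traditional)
t_cxl = transfer_time_ms(BATCH_GB, cxl_style)
print(f"traditional: {t_trad:.1f} ms, cxl-style: {t_cxl:.1f} ms")
assert t_cxl < t_trad
```

In this simple model the media bandwidth dominates, so the win comes from dropping the second hop and its per-transfer overhead entirely, which is exactly what unified memory semantics make possible.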

3. Challenges and opportunities of unified namespaces

As storage systems scale to the EB level, another fundamental problem emerges: how to manage such a huge data namespace? The directory structures and metadata management mechanisms of traditional file systems and object stores struggle in the face of billions or even tens of billions of files. This is not only a performance issue but also a semantic one: when the number of files is this large, traditional hierarchical naming becomes meaningless.

Next-generation storage systems are exploring new namespace architectures. Some pioneering systems abandon the traditional directory tree and turn to addressing based on content hashes, vector embeddings, or knowledge graphs [4]. In this paradigm, data is not found through a "path" but connected through "meaning". For example, an image file of a "cat" may no longer live under /images/animals/cats/, but is directly associated with all other "cat" images through its visual feature vector. This transformation requires storage systems to have basic content understanding capabilities [5], and it marks the evolution of storage from "bit manager" to "semantic understander".
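The addressing idea above can be sketched in a few lines of Python. The toy store below keys blobs by content hash and retrieves them by vector similarity; the two-dimensional "embedding" is a stand-in for a real learned feature vector.

```python
import hashlib
import math

class SemanticStore:
    """Sketch of addressing by content hash and feature vector instead of path.
    The embeddings here are hand-made stand-ins for a real vision model's output."""

    def __init__(self):
        self.by_hash = {}   # content hash -> blob
        self.vectors = {}   # content hash -> feature vector

    def put(self, blob: bytes, vector: list) -> str:
        key = hashlib.sha256(blob).hexdigest()   # identity comes from content, not path
        self.by_hash[key] = blob
        self.vectors[key] = vector
        return key

    def get(self, key: str) -> bytes:
        return self.by_hash[key]

    def nearest(self, query: list, k: int = 1) -> list:
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)
        return sorted(self.vectors, key=lambda h: cos(query, self.vectors[h]), reverse=True)[:k]

store = SemanticStore()
cat1 = store.put(b"cat-photo-1", [0.9, 0.1])
dog = store.put(b"dog-photo-1", [0.1, 0.9])

# A "cat-like" query finds the cat image by meaning; no directory path is involved.
assert store.nearest([0.95, 0.05])[0] == cat1
```

A production system would replace the linear scan with an approximate nearest-neighbor index, but the contract is the same: lookup by meaning, identity by content.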

 

II. Coexistence of Storage and Computing: Role Reshaping from Assistant to Protagonist

1. Training and inference: differentiation and evolution of storage roles

Storage plays distinctly different but complementary roles at different stages of the AI workflow. During the training phase, "computing" is the protagonist: thousands of GPU cards work together, continuously adjusting model parameters. At this point, the main task of storage is to efficiently supply training data and save checkpoints (CheckPoint). But behind this seemingly simple task lies a huge challenge: how can a model state that may reach the TB level be saved quickly? This has given birth to a new paradigm of "strengthening computation through storage": with intelligent checkpoint policies, incremental snapshots, and fast recovery mechanisms, the storage system actually enhances the continuity and resilience of computing.
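One way to see how incremental snapshots reduce checkpoint cost is a small sketch: hash each shard of model state and persist only the shards whose content changed since the previous checkpoint. The shard layout and byte payloads below are illustrative, not any framework's real format.

```python
import hashlib

def snapshot_digest(state: dict) -> dict:
    """Hash each named shard of model state; used to detect what changed."""
    return {name: hashlib.sha256(blob).hexdigest() for name, blob in state.items()}

def incremental_checkpoint(state: dict, last_digest: dict) -> dict:
    """Return only the shards whose content changed since the last checkpoint."""
    return {name: blob for name, blob in state.items()
            if last_digest.get(name) != hashlib.sha256(blob).hexdigest()}

state_v1 = {"layer0": b"weights-A", "layer1": b"weights-B"}
digest_v1 = snapshot_digest(state_v1)

state_v2 = {"layer0": b"weights-A", "layer1": b"weights-B2"}  # only layer1 changed
delta = incremental_checkpoint(state_v2, digest_v1)

assert set(delta) == {"layer1"}   # one shard written instead of the full state
```

At TB scale the same principle means each checkpoint writes only the slice of parameters and optimizer state that actually moved, which is what makes frequent checkpointing affordable.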

The inference stage presents another picture. Here the model is already trained, and the focus is on serving user requests efficiently. The role of storage extends from "data provider" to "state holder" and "knowledge bearer". KV Cache (key-value cache) technology typifies this transformation. It acts like a "short-term memory notebook" for a conversational AI: to accelerate the autoregressive generation process, the system must maintain a huge dynamic cache storing previously generated key-value pairs. This cache can reach hundreds of GB, and its access pattern is highly random. Traditional storage architectures are helpless here, while a storage system designed for AI can minimize this overhead through multi-level caching, intelligent prefetching, and close collaboration with GPU memory.
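The core mechanic, keeping hot KV entries in fast memory and spilling the rest to a slower tier rather than recomputing them, can be sketched as below. The two-tier layout and tiny capacity are illustrative; a real system tiers across HBM, DRAM, and SSD.

```python
from collections import OrderedDict

class KVCache:
    """Toy per-session KV cache with LRU eviction to a slower tier."""

    def __init__(self, fast_capacity: int):
        self.fast = OrderedDict()   # stands in for GPU memory
        self.slow = {}              # stands in for host DRAM or SSD
        self.fast_capacity = fast_capacity

    def put(self, token_pos: int, kv: tuple):
        self.fast[token_pos] = kv
        self.fast.move_to_end(token_pos)
        while len(self.fast) > self.fast_capacity:
            pos, old = self.fast.popitem(last=False)   # evict least recently used
            self.slow[pos] = old                       # spill instead of discard

    def get(self, token_pos: int):
        if token_pos in self.fast:
            self.fast.move_to_end(token_pos)
            return self.fast[token_pos]
        return self.slow.get(token_pos)   # slower path, but nothing is recomputed

cache = KVCache(fast_capacity=2)
for pos in range(4):
    cache.put(pos, (f"k{pos}", f"v{pos}"))

assert 0 in cache.slow               # oldest entries spilled to the slow tier
assert cache.get(0) == ("k0", "v0")  # still retrievable without recomputation
```

The design choice worth noticing is that eviction moves data down a tier rather than deleting it: re-deriving a KV entry means re-running attention over the prefix, which is far more expensive than a slow read.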

Even more groundbreaking is the rise of "substituting storage for computation" and "substituting lookup for computation". In some scenarios, directly storing and retrieving pre-computed results is more efficient than computing in real time. For example, in a recommendation system, matching scores between user features and items can be computed in advance and stored as vectors; in code generation, common patterns can be cached and reused directly. This transformation blurs the boundary between computing and storage and redefines how "intelligence" is implemented. It is like a mental arithmetic master who "computes every question from scratch" turning into a senior expert holding a "quick-reference handbook" and a "library of common answers": for common questions (such as product recommendations), the answer is read straight from the handbook (the cache), freeing effort for genuinely new problems.
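The "lookup instead of compute" pattern is essentially precomputed memoization; a minimal sketch, with an invented scoring function standing in for a costly model evaluation:

```python
def expensive_score(user: str, item: str) -> float:
    """Stand-in for a costly model evaluation (contents are arbitrary)."""
    return (len(user) * 7 + len(item) * 3) % 10 / 10

# "Substituting lookup for computation": precompute scores for hot pairs offline.
hot_pairs = [("alice", "book"), ("bob", "lamp")]
score_table = {pair: expensive_score(*pair) for pair in hot_pairs}

calls = {"n": 0}   # count how often the real computation runs at serving time

def score(user: str, item: str) -> float:
    key = (user, item)
    if key in score_table:          # hit: read the "handbook"
        return score_table[key]
    calls["n"] += 1                 # miss: fall back to real computation
    return expensive_score(user, item)

assert score("alice", "book") == score_table[("alice", "book")]
assert calls["n"] == 0              # hot pair answered purely from storage
score("carol", "vase")
assert calls["n"] == 1              # only the genuinely new question was computed
```

The trade is storage capacity for latency and compute: the table grows with the set of "common questions", while serving cost for those questions drops to a single read.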

 

2. Data flow: from hot-cold tiering to bidirectional activation

Traditional data management follows a one-way "hot - warm - cold" flow: new data is hot, gradually cools over time, and is finally archived to cheap storage. The AI era subverts this model because it gives cold data the ability to be "woken up".

Consider a case: an e-commerce company holds ten years of user purchase records, most of which are "cold data". Traditionally, these records are used mainly for annual reports and compliance audits. Driven by AI, this data suddenly becomes valuable: by analyzing ten-year trends, a model can predict the long-term evolution of consumption habits; by connecting seemingly unrelated purchases, it can find latent market opportunities. The storage system therefore needs to support large-scale, highly efficient "cold data activation", quickly converting PB-level historical data into a form usable for training.

This requires two new capabilities from the storage system. One is highly automated cross-tier data movement, dynamically adjusting data placement according to the needs of training tasks. The other is "data preloading": predicting which data will be needed and moving it to the fast storage tier in advance. Some cutting-edge systems have even introduced "data importance prediction" algorithms that determine storage strategy based on how much each piece of data has contributed in past training runs.
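A minimal sketch of data preloading, under the assumption that access order repeats across epochs: learn which shard usually follows which, then stage the most likely successor in the fast tier before it is requested. The shard names and the first-order model are illustrative.

```python
class Prefetcher:
    """Sketch of data preloading: predict the next shard from observed access
    transitions and stage it in the fast tier ahead of the request."""

    def __init__(self):
        self.follows = {}      # shard -> {successor shard: observed count}
        self.fast_tier = set() # shards already staged on fast media

    def record(self, prev: str, nxt: str):
        self.follows.setdefault(prev, {})
        self.follows[prev][nxt] = self.follows[prev].get(nxt, 0) + 1

    def prefetch_after(self, shard: str):
        candidates = self.follows.get(shard, {})
        if candidates:
            best = max(candidates, key=candidates.get)
            self.fast_tier.add(best)   # move the predicted shard up before it's needed

p = Prefetcher()
# Training epochs tend to repeat shard orderings; learn "B usually follows A".
for _ in range(3):
    p.record("shardA", "shardB")
p.record("shardA", "shardC")

p.prefetch_after("shardA")
assert "shardB" in p.fast_tier   # the most likely successor was staged in advance
```

Real systems would weigh prediction confidence against fast-tier capacity and bandwidth cost, but the loop is the same: observe, predict, stage early.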

 

3. Storage as a function: ubiquitous data services

Perhaps the most fundamental change is that storage is turning from a "device" into a "function". In AI infrastructure, storage is no longer confined to a specific hardware box; it becomes a collection of capabilities spread across computing, storage, and the network.

Huawei's UCM (Unified Cache Manager, now officially called the "inference memory data manager") architecture provides a vivid example [6]. UCM is positioned not as an independent storage device but as a software suite integrating multiple types of cache acceleration algorithms and tools. It serves as a cache coordination and scheduling layer working across the data center. Its core value is that, through open north-south interfaces, it connects diverse AI inference frameworks and integrates GPU memory (HBM), host memory (DRAM), NVMe SSDs, and even more remote heterogeneous storage resources into a logically continuous, physically distributed unified memory data pool.

When a GPU needs data for large-model inference, UCM intelligently schedules the KV Cache and other memory data across the multi-level media of HBM, DRAM, and SSD. This not only greatly relieves memory pressure, but also makes something clear: under this architecture, "storage" is no longer a fixed place but a ubiquitous data service that can be intelligently orchestrated according to computing demand. Wherever the data sits, that is where the usable storage is.
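To illustrate what such scheduling decides, here is a greedy placement sketch. This is not Huawei's UCM API or algorithm; the tier names, capacities, and heat scores are all invented, and the policy is simply "hottest blocks on the fastest tier that has room".

```python
# Illustrative tier scheduling sketch (NOT a real UCM interface):
# place each cached block on the fastest tier with free capacity,
# in descending order of access heat.

TIERS = [("HBM", 2), ("DRAM", 3), ("SSD", 100)]   # (tier name, capacity in blocks)

def place(blocks: dict) -> dict:
    """blocks: block_id -> heat score. Returns block_id -> tier name."""
    placement = {}
    free = {name: cap for name, cap in TIERS}
    for block in sorted(blocks, key=blocks.get, reverse=True):
        for name, _ in TIERS:                 # tiers are listed fastest-first
            if free[name] > 0:
                placement[block] = name
                free[name] -= 1
                break
    return placement

heat = {"b1": 90, "b2": 80, "b3": 40, "b4": 30, "b5": 20, "b6": 5}
layout = place(heat)

assert layout["b1"] == "HBM" and layout["b2"] == "HBM"
assert layout["b5"] == "DRAM"
assert layout["b6"] == "SSD"   # coldest block sinks to the capacity tier
```

A real scheduler would also react to heat changes by migrating blocks between tiers at runtime; this static version only shows the placement objective.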

 

III. Media Fusion: When the Boundary Between Memory and External Storage Disappears

1. Blurring the hierarchy: from pyramid to continuum

Traditional storage architectures are built on clear tiers: registers, caches, memory, flash, disks, and tapes, with each layer balancing speed, capacity, and cost. AI workloads are blurring these boundaries, pushing the storage hierarchy to evolve from a "discrete pyramid" toward a "continuum".

The most significant change occurs at the junction of memory and external storage. Although persistent memory (PMem) technologies such as Intel Optane have suffered in the market, the direction they reveal grows ever clearer: we need media that can be accessed at near-memory speed while retaining data after power loss. Such media is not meant to replace DRAM or SSD; instead, it establishes a smooth transition between the two. In AI scenarios this transition is especially important: data structures such as embedding tables, dynamic caches, and a model's intermediate activations need both speed and durability, but do not necessarily require the full performance of DRAM.

The rise of HBM (High Bandwidth Memory) represents convergence in the other direction. Traditionally, HBM was considered the GPU's "attached memory", but next-generation architectures are exploring HBM as a cache layer for the entire system. Through interconnect technologies such as CXL, the CPU and other accelerators can share access to HBM, forming a global high-bandwidth storage pool. This architecture is especially suitable for AI inference scenarios, where model parameters and the KV Cache demand high-bandwidth support.

 

2. Intelligent tiering: finding the best home for data

Media diversification brings new management challenges and also breeds new optimization opportunities. Traditional tiered storage relies mainly on the simple metric of access frequency, whereas an AI-aware storage system can adopt far more sophisticated tiering policies.

An advanced AI storage system may weigh the following factors when deciding where data should live:

1) Compute affinity: will the data be frequently accessed by a particular GPU?

2) Access pattern: is the data read sequentially or accessed randomly? In large I/Os or small ones?

3) Semantic importance: what role does the data play in the model? Is it a critical weight or a compressible redundant parameter?

4) Lifecycle: is the data a long-lived model parameter, a temporary intermediate state, or a cache about to expire?

Based on this multidimensional information, the system can place data dynamically and intelligently. For example, in an MoE (Mixture of Experts) model, only a few "experts" are activated in each inference pass. The system can put the parameters of active experts in HBM or on fast SSDs, while inactive experts are placed on large-capacity QLC SSDs or even HDDs. This semantics-based tiering can be several orders of magnitude more efficient than tiering based on simple access frequency.
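The MoE placement rule above can be sketched as a simple policy function. The tier names and activation-count thresholds are illustrative assumptions; a real system would derive thresholds from capacity and observed traffic.

```python
# Sketch of semantic tiering for MoE expert parameters.
# Tier names and thresholds are illustrative, not from any real system.

def tier_for_expert(activations_last_window: int) -> str:
    """Map an expert's recent activation count to a storage tier."""
    if activations_last_window >= 100:
        return "HBM"          # hot expert: keep next to the GPU
    if activations_last_window >= 10:
        return "fast-SSD"     # warm expert: loadable within a few milliseconds
    return "QLC-SSD"          # cold expert: large, cheap capacity

activation_counts = {"expert0": 500, "expert1": 42, "expert2": 0}
placement = {e: tier_for_expert(n) for e, n in activation_counts.items()}

assert placement == {"expert0": "HBM", "expert1": "fast-SSD", "expert2": "QLC-SSD"}
```

The key difference from frequency-only tiering is the unit of decision: whole experts, a semantically meaningful slice of the model, rather than anonymous blocks.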

 

3. Long-memory storage: AI's time dimension

An important feature of human intelligence is long-term memory: we respond not only to current inputs but also on the basis of years or even decades of experience. Most current AI systems lack this long-term memory; each conversation starts almost from scratch. Changing this requires fundamental innovation in the storage system.

" long memory storage " the concept came into being. This is aimed AI build a continuously growing " personal Biography " or " institutional knowledge base ", not just recording scattered conversations " short-term note ". It makes AI can connect with long-term context, form coherent personality and deep insight, more like an assistant with rich experience and memory. This storage not only stores data, but also stores the access context, association, and evolution history of the data. Technically, this may involve integrating the capabilities of vector database, graph database and time series database into the storage layer; In terms of architecture, this requires the storage system to maintain complex data relationships and metadata; in terms of algorithms, this requires a new index structure and retrieval mechanism, which can TB quickly find relevant information in the memory.

A concrete implementation might look like this: every time the AI system interacts with a user, the key points of the interaction are extracted as vector embeddings and stored together with timestamps and context tags. When a new interaction occurs, the system quickly retrieves the relevant history to provide the model with "memory context". This capability requires the storage system to support complex nearest-neighbor search and graph traversal while sustaining very high throughput, which is a blind spot of traditional storage system design.
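That loop, embed each interaction, store it with a timestamp, recall by similarity, can be sketched as follows. The two-dimensional vectors stand in for real embeddings, and the linear scan stands in for a proper vector index.

```python
import math
import time

class MemoryStore:
    """Sketch of 'long-memory storage': each interaction is kept as a
    (vector, timestamp, note) record and recalled by similarity."""

    def __init__(self):
        self.records = []   # list of (vector, timestamp, note)

    def remember(self, vector: list, note: str):
        self.records.append((vector, time.time(), note))

    def recall(self, query: list, k: int = 2) -> list:
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) *
                          math.sqrt(sum(x * x for x in b)))
        ranked = sorted(self.records, key=lambda r: cos(query, r[0]), reverse=True)
        return [note for _, _, note in ranked[:k]]

m = MemoryStore()
m.remember([1.0, 0.0], "user prefers concise answers")
m.remember([0.0, 1.0], "user asked about CXL last week")

# A new storage-related question recalls the related history as "memory context".
assert m.recall([0.1, 0.99], k=1) == ["user asked about CXL last week"]
```

The timestamp field is what distinguishes this from plain retrieval: a production version could decay old memories, or combine recency with similarity when ranking.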

 

IV. Semantic Awareness: The Storage System's "Cognitive Revolution"

1. From "bits" to "meaning": content-aware storage

Traditional storage systems know nothing about the content of the data: they store bits without understanding what those bits represent. In the AI era, this "content-blind" design is becoming a bottleneck. When a sparsification algorithm needs to know which weights can be safely pruned, when knowledge retrieval needs to understand the semantic structure of a document, when data cleansing needs to identify and repair damaged samples: if the storage system could understand the data content, it could offer unprecedented optimization opportunities.

Content-Aware Storage (CAS) is the exploration in this direction. Its core is a lightweight content-analysis engine, built into or integrated with the storage system, that automatically extracts features and builds indexes as data is written or managed. For example, a CAS system can automatically recognize medical imaging modalities (CT, MRI, X-ray), extract key anatomical structures, and mark abnormal regions. Enterprise-facing CAS solutions for AI go further, automatically converting unstructured documents (such as reports and emails) into semantic knowledge that AI can understand. For instance, IBM and NVIDIA have jointly launched a content-aware storage solution in which integrated AI microservices automatically extract information from text and charts and convert it into vectors stored inside the system. More importantly, it can continuously "perceive" data changes and intelligently process only the updated parts, ensuring the information the AI obtains is always up to date [7].

This capability is especially important in AI training. Large-model training usually requires complex data pipelines covering cleaning, deduplication, balancing, and augmentation. Traditionally these steps are completed by a dedicated preprocessing cluster, and data must shuttle repeatedly between storage and compute. In a CAS architecture, many preprocessing steps can be completed as data is written or on first read, with the results cached for subsequent use, greatly reducing data movement and repeated computation.
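A toy sketch of preprocessing on ingest: the store below normalizes whitespace, rejects duplicates, and builds a word-level content index at write time, so a later pipeline never repeats that work. Real CAS engines use learned models rather than string operations; this only shows where the work happens.

```python
class ContentAwareStore:
    """Sketch of content-aware storage: clean, deduplicate, and index data
    at write time so later training pipelines can skip that preprocessing."""

    def __init__(self):
        self.blobs = {}    # key -> cleaned text
        self.index = {}    # word -> set of keys containing it

    def write(self, key: str, text: str):
        cleaned = " ".join(text.split())           # cleaning happens on ingest
        if cleaned in self.blobs.values():
            return None                            # duplicate: not stored twice
        self.blobs[key] = cleaned
        for word in set(cleaned.lower().split()):  # lightweight content index
            self.index.setdefault(word, set()).add(key)
        return key

    def find(self, word: str) -> set:
        return self.index.get(word.lower(), set())

cas = ContentAwareStore()
cas.write("doc1", "CT scan of   the chest")
cas.write("doc2", "CT scan of the chest")   # whitespace-variant duplicate

assert "doc2" not in cas.blobs              # deduplicated at write time
assert cas.find("chest") == {"doc1"}        # queryable by content immediately
```

The point is the placement of the computation: every downstream consumer reads already-cleaned, already-indexed data instead of redoing the pipeline.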

2. Specialized protocol stacks: customized for AI workloads

General-purpose storage protocols are designed to "meet most requirements", but under AI workloads this generality often means performance loss. AI data access has distinctive patterns: large-scale sequential reads during training, bursty writes at checkpoints, random reads and state updates during inference, and so on. These patterns have given rise to the development of AI-native storage protocols.

A typical example is the checkpoint protocol in large-scale distributed training. Traditional methods treat a checkpoint as an ordinary file write, so hundreds or even thousands of GPUs write to the storage cluster at the same time, producing destructive random I/O. Next-generation protocols (such as Universal Checkpointing) adopt a shard-and-convert mode: each GPU writes its local model shards to storage without directly merging them into a single global checkpoint; instead, a common checkpoint format describes the mapping between shards and global parameters. When recovery is needed, the system dynamically converts and loads the shard data onto the corresponding GPUs. This not only reduces concurrent pressure on shared storage but also decouples checkpoints from hardware configuration, providing fundamental support for scenarios such as elastic training and fault recovery [8].
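The shard-plus-manifest idea can be sketched in miniature. The field names below are illustrative, not the actual Universal Checkpointing format: each rank persists only its own shard, and a small manifest maps parameter names to the shard that owns them, so recovery can resolve any parameter regardless of the new cluster layout.

```python
# Sketch of a sharded checkpoint with a manifest (field names are illustrative).

def save_sharded(rank: int, shard: dict) -> dict:
    """Each rank writes its own shard independently; no global merge happens."""
    return {"rank": rank, "params": shard}

def build_manifest(shards: list) -> dict:
    """Map each global parameter name to the rank whose shard holds it."""
    return {name: s["rank"] for s in shards for name in s["params"]}

def load_param(name: str, manifest: dict, shards: list):
    """Resolve a parameter at load time via the manifest."""
    owner = manifest[name]
    return next(s for s in shards if s["rank"] == owner)["params"][name]

# Two GPUs each persist their half of the model concurrently.
shards = [save_sharded(0, {"layer0.w": [1, 2]}),
          save_sharded(1, {"layer1.w": [3, 4]})]
manifest = build_manifest(shards)

# Recovery on a different layout still finds every parameter via the manifest.
assert load_param("layer1.w", manifest, shards) == [3, 4]
assert manifest["layer0.w"] == 0
```

Because the merge is deferred to load time, the write path stays embarrassingly parallel, and a job restarted on a different number of GPUs can regroup the shards however it needs.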

Beyond protocols optimized for the training process, inference-oriented retrieval has also created new standardization requirements, such as standardizing vector retrieval protocols. As RAG (Retrieval-Augmented Generation) becomes the mainstream paradigm for large-model applications, vector similarity search has changed from a specialized database function into a basic requirement of storage systems. The storage system needs to provide efficient interfaces for building, updating, and querying vector indexes. These interfaces must integrate deeply with AI frameworks and support advanced functions such as streaming updates, hybrid search (vector plus keyword), and multimodal retrieval.

3. Storage microservices: modularity and scalability

The monolithic storage architecture is giving way to microservice designs. In this new paradigm, the storage system is not a single giant but a group of collaborating microservices: an index service responsible for metadata management, a data service handling block I/O, a cache service coordinating multi-level caching, a retrieval service providing vector search, and a security service handling encryption and access control.

This architecture brings multiple advantages. The first is scalability: each microservice can scale independently, avoiding the traditional problem of over-provisioning the whole system to obtain a single capability. The second is flexibility: users can select and combine the services they need according to workload characteristics, forming a customized storage stack. Most important is the speed of innovation: new storage functions can be developed, deployed, and iterated quickly as microservices without changing the entire storage system.

Take sparse training as an example. This technique accelerates model training by skipping zero-valued computation, but it requires the storage system to understand the sparsity pattern of the parameters. In a microservice architecture, a dedicated "sparsity-aware data service" can be developed that supplies a sparsity mask along with the parameter data, or even compresses and encodes sparse data directly at the storage layer. Such deep integration is almost impossible in a traditional architecture.
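What a sparsity-aware data service might return can be sketched as a simple encoding: store only the nonzero weights plus their positions, and hand both back to the consumer. The encoding scheme here is the textbook sparse format, chosen for clarity; real services would use packed binary layouts.

```python
def sparse_encode(dense: list) -> dict:
    """Store only nonzero weights plus their positions (the sparsity mask)."""
    nz = {i: v for i, v in enumerate(dense) if v != 0.0}
    return {"length": len(dense), "nonzero": nz}

def sparse_decode(enc: dict) -> list:
    """Lossless reconstruction of the dense vector."""
    out = [0.0] * enc["length"]
    for i, v in enc["nonzero"].items():
        out[i] = v
    return out

weights = [0.0, 0.5, 0.0, 0.0, -1.2, 0.0]
enc = sparse_encode(weights)

assert len(enc["nonzero"]) == 2          # 2 values stored instead of 6
assert sparse_decode(enc) == weights     # lossless round trip
# The keys of `nonzero` are exactly the mask a sparse kernel needs to skip zeros.
assert set(enc["nonzero"]) == {1, 4}
```

Because the mask ships with the data, the compute side never has to scan for zeros itself, which is the integration the paragraph describes.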

 

V. AI-Native Security: The Storage System's Immune Revolution

1. From perimeter defense to endogenous security

Traditional data security is based on a "perimeter defense" model: put the data in a "safe deposit box" behind the firewall and strictly control who can access it. In the AI era, this model fails completely. Data must flow to generate value, model training requires massive data aggregation, inference services require low-latency access, and multi-party collaboration requires data sharing. Data has never been so exposed, or so fragile.

AI-native security adopts a completely different philosophy: rather than trying to prevent data from leaving the system, it ensures the data remains secure wherever it goes. This requires implanting security capabilities into the data itself and into every link of the storage system, forming an "endogenous security" architecture.

Specifically, AI-native storage security includes the following aspects:

1) Data traceability: each data unit carries its origin, transformation history, and usage policy, no matter how many times it is copied or where it is stored.

2) Usage policy enforcement: security policies are bound to the data, not to storage locations. For example, a medical dataset may carry the policy "for model training only, not for serving users", which is automatically enforced whenever the data is read.

3) Privacy-preserving computation: the storage system integrates technologies such as homomorphic encryption and secure multi-party computation, so data can be used for training and inference while still encrypted.

4) Adversarial defense: the storage layer can detect and defend against AI-specific attacks such as data poisoning, model theft, and membership inference.
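Points 1) and 2) can be sketched together: a data object that carries its own lineage and allowed purposes, and enforces them on every read, even on copies. The class and field names are illustrative; real systems implement this with signed metadata and trusted enforcement points rather than a Python object.

```python
class PolicyBoundData:
    """Sketch of policies that travel with the data: every read must declare a
    purpose, and the check runs wherever the data object ends up."""

    def __init__(self, payload: bytes, allowed_purposes: set, lineage: list):
        self.payload = payload
        self.allowed = allowed_purposes
        self.lineage = list(lineage)        # provenance travels with every copy

    def read(self, purpose: str) -> bytes:
        if purpose not in self.allowed:
            raise PermissionError(f"purpose '{purpose}' not permitted")
        return self.payload

    def copy(self, step: str) -> "PolicyBoundData":
        """Copies inherit both the policy and the extended lineage."""
        return PolicyBoundData(self.payload, self.allowed, self.lineage + [step])

record = PolicyBoundData(b"mri-scan", {"model-training"}, ["hospital-A"])
replica = record.copy("replicated-to-region-B")

assert replica.read("model-training") == b"mri-scan"
assert replica.lineage == ["hospital-A", "replicated-to-region-B"]
try:
    replica.read("serve-to-users")      # policy still enforced on the copy
    raise AssertionError("policy should have blocked this read")
except PermissionError:
    pass
```

The design intent is that location never matters: the replica in another region refuses the disallowed purpose exactly as the original would.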

2. The intersection of confidential computing and trusted storage

The combination of traditional security technologies such as hardware security modules (HSM) and Trusted Execution Environments (TEE) with AI storage requirements has created a new security paradigm. Take Intel SGX or AMD SEV as examples: these technologies create a protected execution environment, ensuring that even the cloud provider cannot access the code and data inside it. In AI scenarios this capability becomes particularly valuable.

Imagine a multi-party joint training scenario: three companies want to train a model together, but none is willing to share its own data. The traditional approach is to establish complex legal agreements and technical isolation, which is cumbersome and still cannot fully eliminate risk. Under a TEE-enhanced storage architecture, each company's data stays in its own storage, but the model is trained inside the TEE: data is sent to the TEE in encrypted form and decrypted only inside it, and the training result (the model) is likewise output in encrypted form. Throughout the process, raw data is never exposed, and even the model parameters are protected.

This architecture requires deep integration between the storage system and the TEE. The storage system needs to understand the encryption status of the data, knowing which data can be sent to the TEE and which operations must happen inside it. Meanwhile, the storage system itself also needs to run key components inside the TEE, such as the access-control engine and the audit-log service, to prevent the storage software stack from being attacked.

3. Countering AI-specific security threats

AI systems face security threats that traditional systems have not encountered. Storage systems must evolve to meet these new challenges.

Data poisoning defense: attackers influence model behavior by contaminating training data. Storage systems can mitigate this threat through data-source tracking, anomaly detection, and versioning. For example, the system can record the origin and processing history of each training sample; when abnormal model behavior is found, it can trace back to the data batches that may have been contaminated.

Model theft protection: attackers reconstruct models through massive numbers of queries to inference services. The storage system can defend by monitoring query patterns, detecting abnormal access frequencies, and enforcing rate limits. More advanced methods include implementing "differential privacy" at the storage layer, automatically adding noise to query results to make model reconstruction difficult.
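The rate-limiting piece of that defense can be sketched as a sliding-window counter at the storage/serving layer. The window length and limit are illustrative parameters.

```python
class QueryGuard:
    """Sketch of a storage-layer defense against model extraction: throttle
    clients whose query volume in a sliding window exceeds a threshold.
    Window and limit values are illustrative."""

    def __init__(self, window: float = 60.0, limit: int = 100):
        self.window, self.limit = window, limit
        self.log = {}   # client -> list of query timestamps

    def allow(self, client: str, now: float) -> bool:
        # Keep only timestamps still inside the window, then record this query.
        times = [t for t in self.log.get(client, []) if now - t < self.window]
        times.append(now)
        self.log[client] = times
        return len(times) <= self.limit

guard = QueryGuard(window=60, limit=3)
assert all(guard.allow("scraper", t) for t in [0, 1, 2])
assert guard.allow("scraper", 3) is False     # burst above the limit is throttled
assert guard.allow("scraper", 70) is True     # old queries age out of the window
```

A fuller defense would track pattern anomalies (for example, systematically sweeping the input space) rather than volume alone, but windowed throttling is the usual first layer.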

Membership inference defense: an attacker tries to determine whether a specific record was in the training set. Storage systems can reduce this risk through training-data management, access control, and auditing. For example, the system can ensure that raw training data cannot be accessed directly after training completes and is reachable only through a controlled review interface.

 

VI. Conclusion: The New Era of Storage, an Invisible Revolution in Intelligent Infrastructure

We are witnessing the most profound change in storage technology since the invention of the disk. This change is not incremental improvement but a structural transition; it is not a single technological breakthrough but a comprehensive reconstruction across architecture, roles, media, intelligence, and security.

Future storage systems will no longer be merely "where data is kept"; they will be carriers of intelligence, collaborators of computing, and enforcers of security. They will understand the meaning of data rather than just its bits, participate in building intelligence rather than merely supporting it, and accelerate the flow of data value rather than just its storage.

As the boundary between storage and computing grows increasingly blurred, the transformation from data to knowledge becomes more direct, and the relationship between security and efficiency is being re-examined. What we are seeing is not only technological progress but a change in how intelligence itself exists. The new era of storage is precisely the cornerstone on which AI truly becomes general intelligence. On this road, every architectural innovation, every protocol optimization, and every security enhancement lays a foundation for building richer, more flexible, and more reliable memory and thinking for machine intelligence.

This invisible revolution is quietly unfolding in laboratories and data centers around the world. Its influence will far exceed the scope of technology, reshaping every corner from scientific research to commercial innovation. The only certainty is that the organizations that first understand and embrace this change will hold an irreplaceable competitive advantage in the AI era. Storage is not only a place to keep historical data; it is also the soil for future intelligence.

 
