SK hynix launches upgraded AiMX memory card with vLLM framework support, cutting LLM token costs via attention offloading.
At the AI Infra Summit 2025, SK hynix showcased its next-generation AiMX card, which features an updated architecture and software enhancements, including native support for vLLM, the open-source LLM inference and serving framework. This integration not only broadens the card's applicability to AI services but also enables stable generation over long token sequences, critical for the extended inference runs of advanced reasoning models. The new solution targets the escalating cost of token processing in LLM services through attention offloading, a technique that moves portions of the attention computation to external memory, thereby reducing GPU workload and operational expenses[2].
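To make the attention-offloading idea concrete, here is a minimal NumPy sketch. It is an illustration only, not SK hynix's implementation: the `AttentionOffloadDevice` class is a hypothetical stand-in for a memory-side accelerator such as an AiMX-style card. The key property it models is that the KV cache stays in external memory and attention is computed there, so during each decode step only a small query vector travels to the device and a single output vector travels back, rather than the whole cache crossing the bus to the GPU.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AttentionOffloadDevice:
    """Hypothetical memory-side accelerator (illustrative stand-in for an
    AiMX-style card): it holds the KV cache and computes attention locally,
    so the full cache never needs to move back to the GPU."""

    def __init__(self):
        self.k_cache = []  # one key vector of shape (d,) per generated token
        self.v_cache = []  # one value vector of shape (d,) per generated token

    def append(self, k, v):
        # New token's key/value are written into device-resident memory.
        self.k_cache.append(k)
        self.v_cache.append(v)

    def attend(self, q):
        # Scores and the weighted sum are computed "in memory":
        # only q (shape (d,)) comes in, one (d,) vector goes out.
        K = np.stack(self.k_cache)               # (T, d)
        V = np.stack(self.v_cache)               # (T, d)
        scores = K @ q / np.sqrt(q.shape[-1])    # (T,)
        return softmax(scores) @ V               # (d,)

# Usage: a toy decode loop where the "GPU" only produces q/k/v projections
# (random here) and the device handles the attention over the growing cache.
rng = np.random.default_rng(0)
d = 8
device = AttentionOffloadDevice()
for step in range(16):
    q = rng.standard_normal(d)
    device.append(rng.standard_normal(d), rng.standard_normal(d))
    out = device.attend(q)  # attention output returned to the GPU

print(out.shape)  # (8,)
```

The design point the sketch captures is the traffic asymmetry: per decode step, the cache grows by one key/value pair while the data exchanged with the host stays constant-size, which is why offloading the attention stage can relieve GPU memory bandwidth as contexts get long.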
By combining improved memory technology with advanced software, SK hynix addresses key challenges in AI infrastructure: performance, power efficiency, and cost. The company’s focus on heterogeneous systems (combining processing-in-memory and GPUs) positions it as a leader in AI memory solutions, with implications for both data center and edge AI deployments. These advancements are expected to lower barriers for organizations scaling LLM-based services while maintaining high throughput and reliability[2].
Source: https://news.skhynix.com/ai-infra-summit-2025/ SK hynix