Linux DMA-BUF Subsystem Set for Major Efficiency Boost: User-Space Read/Write Operations on the Horizon

Breaking News – A groundbreaking initiative to extend the Linux kernel's dma-buf subsystem for direct user-space read and write operations was unveiled today at the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit. The proposal, led by Pavel Begunkov with assistance from Kanchan Joshi, could dramatically reduce overhead for device-to-device I/O and unlock new performance levels for storage and GPU workloads.

“This would be a game-changer for high-performance I/O, eliminating the need for kernel-mediated copies in many scenarios,” said Begunkov during the joint session. “We are looking at adding direct I/O support that bypasses traditional system call bottlenecks.”

The dma-buf subsystem currently enables drivers to share memory buffers efficiently for device-to-device transfers, but user-space access remains limited to mmap-like interfaces. The new effort aims to expose full read and write operations directly from user space, leveraging the buffer-sharing framework.

Background

Dma-bufs have been a core part of the Linux kernel for years, primarily used by graphics, video, and networking drivers to move data between hardware without CPU involvement. However, the subsystem has not natively supported file-like read/write semantics from user space, forcing applications to rely on complex ioctl-based workflows or extra copies.

At the summit, Begunkov and Joshi outlined a design that extends the dma-buf file descriptor to accept standard POSIX read/write syscalls. This would allow user-space applications to treat shared memory buffers as regular files, simplifying programming models for storage stacks and co-processors. “Think of it as unifying the buffer management path,” Joshi explained during the Q&A. “Drivers can now expose regions for both device DMA and user-space I/O without rewriting their core logic.”

The proposal builds on recent work in the io_uring subsystem, which already provides asynchronous I/O capabilities. By integrating with io_uring, dma-buf read/write operations could achieve zero-copy transfers in many cases, reducing latency for high-throughput workloads.

What This Means

If merged, the changes would directly impact storage drivers (NVMe, CXL-attached memory) and GPU-facing APIs such as VA-API and Vulkan. Developers could bypass VFS layers for inter-device data movement, saving CPU cycles and memory bandwidth. “This is especially relevant for disaggregated memory and smart NIC scenarios,” said a kernel developer familiar with the discussion, speaking on condition of anonymity.

Early prototypes demonstrate up to 40% reduction in I/O latency for datacenter workloads that stream data between storage and accelerators. However, challenges remain: memory integrity, cache coherency, and security boundaries must be carefully managed. The session concluded with a call for more community testing on ARM and x86 platforms.

Review the full session notes for technical details. The upstreaming process is expected to begin during the 6.13 kernel cycle, with an estimated target of Linux 6.16 for stabilization. “We welcome patches and review now,” Begunkov urged. “The sooner we land this, the sooner the ecosystem can experiment.”
