Friday, May 8, 2026·4 min read
Mounting S3 as NFS: Why FUSE Isn't Enough for Production
Search "mount S3 as NFS" and you'll get a grab bag of results: s3fs-fuse tutorials, rclone guides, AWS's Mountpoint announcement, maybe a Stack Overflow thread about nfs-ganesha. They all promise the same thing — make your S3 bucket look like a filesystem — but they don't all deliver, especially under real load.
Here's what the phrase "mount S3 as NFS" actually means, why the FUSE-based approach falls over in production, and what the real-NFS answer looks like.
Two Different Things Called "Mount S3"
There are fundamentally two architectures people are talking about:
Architecture 1: FUSE on the Client
Every client machine runs a FUSE daemon that translates filesystem calls to S3 API calls. Examples: s3fs-fuse, goofys, Mountpoint for Amazon S3, rclone mount.
[your app] -> [kernel VFS] -> [FUSE] -> [user-space daemon] -> [HTTP] -> [S3]
Every node mounting the bucket runs its own translator. There's no shared state, no shared cache.
Architecture 2: NFS Gateway Server
A purpose-built gateway server speaks NFS on one side and S3 on the other. Clients mount via standard NFS, no FUSE required.
[your app] -> [kernel NFS client] -> [network] -> [gateway] -> [cache] -> [S3]
The gateway has a real cache, handles protocol translation server-side, and presents a proper NFS surface to clients.
Both get called "mount S3 as NFS" but they behave completely differently.
What FUSE-Based Mounts Do Well
To be fair:
- Dev workstations: great for poking at bucket contents from your laptop.
- One-off scripts: "read this CSV from S3 as a file" is fine.
- Read-mostly workloads with large sequential files: acceptable throughput if the files are big and few.
Where FUSE Falls Apart
POSIX Is a Polite Fiction
FUSE-S3 mounts implement a subset of POSIX. The rest is either emulated badly or silently broken:
- Rename: S3 has no rename. FUSE daemons do copy + delete, which is not atomic. Checkpointing code that relies on `os.rename()` for consistency can corrupt data when a rename fails partway through.
- mmap: Unsupported or badly emulated. Frameworks that mmap shards (numpy memmap, some TF ops) misbehave.
- Locking: `flock()` usually does nothing. If two workers think they hold an exclusive lock, they don't.
- Hard links: no.
- Permissions: approximated.
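The rename hazard is easy to demonstrate. Here's a minimal sketch (on a local tempdir, not a real mount) of the copy-then-delete emulation a FUSE-S3 daemon performs; `fuse_style_rename` is a hypothetical stand-in, not any daemon's actual code:

```python
import os
import shutil
import tempfile

def fuse_style_rename(src: str, dst: str, fail_after_copy: bool = False) -> None:
    """Sketch of how a FUSE-S3 daemon emulates rename: copy, then delete.
    Unlike os.rename(), a crash between the two steps leaves BOTH objects."""
    shutil.copy2(src, dst)   # S3 side: CopyObject
    if fail_after_copy:
        raise RuntimeError("daemon crashed mid-rename")
    os.remove(src)           # S3 side: DeleteObject

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "ckpt.tmp")
dst = os.path.join(tmp, "ckpt")
with open(src, "w") as f:
    f.write("weights-v2")

try:
    fuse_style_rename(src, dst, fail_after_copy=True)
except RuntimeError:
    pass

# After the simulated crash, both paths exist: readers can observe an
# intermediate state that an atomic os.rename() never exposes.
print(os.path.exists(src), os.path.exists(dst))  # True True
```

Code that does the classic write-temp-then-rename dance for crash safety is silently getting none of that safety on a FUSE-S3 mount.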
Per-Node Cache Fragmentation
With FUSE, every client runs its own cache (if it even has one). Eight training nodes all fetch the same shards independently. That's 8× the S3 requests, 8× the egress bytes, 8× the cache warmup time.
A proper gateway shares the cache across all clients hitting it. One fetch from S3 serves every node that needs it.
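The request amplification is just multiplication. A toy model (node and shard counts are made-up illustrative numbers):

```python
def s3_fetches(nodes: int, shards: int, shared_cache: bool) -> int:
    """Cold-start GET count for a job where every node reads every shard
    once. Per-node FUSE caches duplicate every fetch; a shared gateway
    cache pulls each shard from S3 exactly once."""
    return shards if shared_cache else nodes * shards

print(s3_fetches(nodes=8, shards=10_000, shared_cache=False))  # 80000
print(s3_fetches(nodes=8, shards=10_000, shared_cache=True))   # 10000
```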
Kernel Entanglement
FUSE runs in userspace but hooks into the kernel VFS. A misbehaving FUSE daemon can hang any process that touches the mount, including with uninterruptible (D state) waits that survive SIGKILL. Debugging this at 3am on a production training cluster is not fun.
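If you're triaging this at 3am, the first question is which processes are wedged. A rough Linux-only sketch that scans `/proc` for tasks in uninterruptible sleep (the `D` state shown by `ps`):

```python
import os

def d_state_pids() -> list[int]:
    """Scan /proc for processes in uninterruptible sleep (state 'D').
    These are the tasks a wedged FUSE daemon leaves behind; SIGKILL
    will not reap them until the I/O they are stuck on completes."""
    stuck = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/stat") as f:
                # Field layout is "pid (comm) state ..."; split after the
                # closing paren so comm names with spaces don't break parsing.
                fields = f.read().rsplit(")", 1)[1].split()
        except OSError:
            continue  # process exited while we were scanning
        if fields[0] == "D":
            stuck.append(int(pid))
    return stuck

print(d_state_pids())  # usually [] on a healthy box
```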
Container Unfriendliness
Running FUSE in containers requires either:
- `--privileged` (security teams hate this)
- `SYS_ADMIN` capability + `/dev/fuse` device passthrough
- A CSI driver that shims the mount into the pod namespace
In Kubernetes, this becomes pod-spec surgery plus init containers plus node-level daemons. A real NFS mount is just... a volume.
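For contrast, here is what "just a volume" looks like: a pod-spec fragment using Kubernetes' built-in `nfs` volume type. The server address, path, and image are placeholders, not real endpoints:

```yaml
# Pod spec fragment: an NFS mount needs no privileged mode,
# no /dev/fuse passthrough, no CSI shim.
volumes:
  - name: training-data
    nfs:
      server: gateway.internal.example   # placeholder gateway address
      path: /my-data
      readOnly: true
containers:
  - name: trainer
    image: my-trainer:latest             # placeholder image
    volumeMounts:
      - name: training-data
        mountPath: /mnt/data
```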
Metadata Storms
LIST-heavy workloads (anything that `os.walk()`s a directory) become a storm of S3 metadata requests: paginated LIST calls for each directory, and, in most FUSE daemons, a stat that turns into a HEAD request for every entry. Each round-trip costs real latency, so a thousand-file directory can burn ten seconds of metadata traffic before your code reads a single byte.
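A back-of-envelope model makes the arithmetic concrete. The latencies below are assumptions (~100 ms per LIST page, ~10 ms per HEAD), not measurements:

```python
import math

def walk_cost_seconds(files: int, dirs: int = 1,
                      keys_per_page: int = 1000,   # S3 LIST page limit
                      list_rtt: float = 0.10,      # assumed LIST latency (s)
                      head_rtt: float = 0.01) -> float:
    """Rough metadata cost of os.walk() over a FUSE-S3 mount: one
    paginated LIST per directory, plus a HEAD per file for daemons
    that stat every entry."""
    list_calls = dirs * math.ceil(files / keys_per_page)
    head_calls = files
    return list_calls * list_rtt + head_calls * head_rtt

print(round(walk_cost_seconds(files=1000), 2))  # 10.1
```

Under these assumptions a flat thousand-file directory costs about ten seconds; deeper trees multiply the LIST calls per directory.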
Cost in Request Units
S3 charges per GET. A FUSE mount with a tiny cache issues a GET per file per read. ML training reruns the dataset many times. Your "cheap" S3 bill includes millions of requests per run.
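To put a number on it, a sketch using roughly $0.0004 per 1,000 GETs (approximate S3 Standard pricing in us-east-1; check current pricing before relying on it; the file and epoch counts are illustrative):

```python
def get_request_cost(files: int, epochs: int,
                     usd_per_1k_gets: float = 0.0004) -> tuple[int, float]:
    """GET count and bill for rereading every file each epoch through a
    cache-less FUSE mount. Pricing is an assumption, not a quote."""
    gets = files * epochs
    return gets, gets / 1000 * usd_per_1k_gets

gets, usd = get_request_cost(files=2_000_000, epochs=10)
print(gets, round(usd, 2))  # 20000000 8.0
```

Request charges are small per run, but they recur on every rerun, and that's before the latency cost of issuing twenty million round-trips.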
What "Real" NFS Gives You
A gateway that terminates NFS on one side and talks to object storage on the other side:
- Full NFSv4 semantics. Locking, atomic rename, proper file handles, delegation, the works.
- Shared cache across clients. One fetch serves all mounters.
- Predictable performance. NFS tail latencies are measured in microseconds on cache hits.
- Clean container integration. Standard `mount -t nfs4`, no privileged mode.
- Secure transport. WireGuard tunnels (Training Pipes does this) keep NFS off the public internet.
When FUSE Is Still the Right Answer
We're not saying FUSE S3 mounts are useless. They're great for:
- Exploring buckets from a laptop
- Backfilling data into analytics tools that expect files
- Low-volume batch jobs
- One-shot data-prep scripts
What they're not great for is the hot path of GPU-bound training.
What to Use for Production ML
The pattern that scales:
- Canonical storage in object storage (S3, GCS, R2, or our managed buckets)
- Regional NFS gateway with local cache sitting between compute and storage
- Standard NFSv4 mounts on every client
This is what Training Pipes is. You don't run the gateway. You point the CLI at a bucket, pick a region, and get a mount target your cluster can use.
bucketfs mount create --bucket my-data --region us-east-1 --protocol nfs
# Mount on any node:
sudo mount -t nfs4 $server:/my-data /mnt/data
No FUSE. No privileged containers. No per-node cache fragmentation. Just NFS.