This is an automated archive made by the Lemmit Bot.

The original was posted on /r/storage by /u/Lanky_Barnacle1130 on 2023-06-22 14:26:29+00:00.


We have a fairly resilient architecture for an on-prem VMware vCenter/ESXi cloud, but there is one Achilles' heel we need to figure out. We are not using ephemeral storage on the ESXi hosts. Right now, every vCenter cluster (4-8 ESXi hosts) connects to two datastores via NFS. The datastores are on Unity storage arrays. All virtual machine disks (VMDKs, as they're called in VMware-ese) are on these back-end datastores. So any maintenance on the routers causes these VMs to go down, because their root file systems cannot be accessed.
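To make the failure mode concrete, here is a rough Python sketch that models the storage path as a small graph and lists every device whose outage cuts a host off from the array. The topologies and node names (`esxi-host`, `router-a`, `switch-a`, `switch-b`, `unity-array`) are made up for illustration and are not our actual network layout; the second topology is just one assumed way a redundant path could look.

```python
from collections import deque

def reachable(links, start, goal, down=frozenset()):
    """Breadth-first search from start to goal, skipping nodes listed in `down`."""
    if start in down or goal in down:
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in links.get(node, ()):
            if nxt not in seen and nxt not in down:
                seen.add(nxt)
                queue.append(nxt)
    return False

def single_points_of_failure(links, start, goal):
    """Return intermediate nodes whose loss alone breaks the start-to-goal path."""
    nodes = set(links) | {n for peers in links.values() for n in peers}
    return sorted(
        n for n in nodes - {start, goal}
        if not reachable(links, start, goal, down={n})
    )

# Hypothetical topology: NFS traffic crosses a single router.
routed = {
    "esxi-host": ["router-a"],
    "router-a": ["unity-array"],
}

# Hypothetical alternative: two independent paths to the array.
redundant = {
    "esxi-host": ["switch-a", "switch-b"],
    "switch-a": ["unity-array"],
    "switch-b": ["unity-array"],
}

print(single_points_of_failure(routed, "esxi-host", "unity-array"))
print(single_points_of_failure(redundant, "esxi-host", "unity-array"))
```

In the first topology the router shows up as a single point of failure, which matches what we see during router maintenance. In the second, no single device does.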

I questioned whether it was a good or best practice to put “all” VM storage disks on storage arrays. I am also not sure that using NFS is the wisest idea (maybe iSCSI or Fibre Channel is a better way to go, but we didn’t have the money at the time for new cards).

I am reading about Virtual SAN, Virtual Volumes, etc. But most of what I read has to do with load balancing and placement based on capacity, not fault tolerance.

Any suggestions on how we can come up with a more fault-tolerant storage approach for VMs would be welcome!