Dell PowerScale F910 First Impressions
Dell's storage platforms have a long history of consistent improvement through simple code updates, so the array you purchase today will have improved functionality three years from now. However, the constant march of hardware improvements eventually necessitates a new model; storage systems don't usually iterate with each turn of the server crank. This year has seen three new entries in the Dell PowerScale ecosystem: the already-released F210 and F710, and the topic of this piece, the F910. Don't let the naming fool you: the jump from F900 to F910 is every bit as impactful as the F600-to-F710 upgrade in terms of what the new hardware has to offer.
To begin with, if you're not familiar, WWT boasts our Advanced Technology Center, an innovative production test environment where our clients conduct product validation and integration tests, and where my peers and I demonstrate and evaluate various technologies. Earlier this year, the Dell PowerScale product team approached us and asked if we were willing to beta test the new F910 hardware. A couple of weeks later, three Dell F910 nodes and two Dell S5232F-ON backend switches arrived, and we were off.
Improvements
The major change this beta investigates is the move to Dell PowerEdge 16G as the hardware platform. The new F910 uses the PowerEdge R760 as its hardware base, whereas the previous flagship, the F900, is built on the 14G PowerEdge platform. Aside from advancements in CPU and memory, the big update is the PCI Express generation, from Gen-3 in the F900 to Gen-5 in the F910. Each generation roughly doubles the per-lane transfer rate, so the two-generation jump moves from 8GT/s (~1GB/s) per lane to 32GT/s (~4GB/s) per lane. With all of that additional internal bandwidth, the new system should be better able to drive the four 100Gb/s Ethernet ports as well as the NVMe drives in the front of the chassis.
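If you want to sanity-check that math yourself, here's a minimal Python sketch of the per-lane arithmetic. It approximates the 128b/130b encoding used by Gen-3 and later and ignores protocol overhead, so treat the output as ballpark numbers rather than a spec sheet:

```python
# Rough PCIe math: each generation roughly doubles the per-lane transfer rate.
GT_PER_LANE = {"Gen3": 8, "Gen4": 16, "Gen5": 32}  # giga-transfers per second

def lane_gb_per_s(gt: float) -> float:
    """Approximate usable GB/s per lane (128b/130b encoding, protocol overhead ignored)."""
    return gt * (128 / 130) / 8  # 8 bits per byte

for gen, gt in GT_PER_LANE.items():
    per_lane = lane_gb_per_s(gt)
    print(f"{gen}: ~{per_lane:.2f} GB/s per lane, ~{per_lane * 16:.0f} GB/s across an x16 link")
```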
For all of my fellow nerds out there, this next paragraph is for you. One of the things that makes PCIe unique compared to its predecessors like ISA and PCI is that it's packet-switched and point-to-point, like an Ethernet network; ISA and PCI/PCI-X used shared buses, meaning only one device could talk at a time, which led to contention. Additionally, on PCI, every device ran at the speed of the slowest device on the bus. With PCIe, I can have multiple slower downlinks connected to a higher-speed uplink. This is relevant because the F910 can run PCIe 5.0 between the CPU and a PCIe switch, then fan out to twice as many PCIe 4.0 lanes on the other side with no loss of aggregate bandwidth. In the case of the F910, there are 24 NVMe drive slots in the front of each node, and the currently qualified drives are all PCIe 4.0.
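To make the fan-out point concrete, here's a small sketch comparing the raw bandwidth of 24 Gen-4 drives against a hypothetical Gen-5 uplink and the front-end network. The lane counts and switch layout are my own illustrative assumptions, not the F910's documented PCIe topology:

```python
# Illustrative fan-out math: Gen5 upstream lanes can feed twice as many Gen4 downstream lanes.
GEN4_LANE_GBPS = 2.0   # ~2 GB/s usable per Gen4 lane
GEN5_LANE_GBPS = 4.0   # ~4 GB/s usable per Gen5 lane

drives = 24
lanes_per_drive = 4                      # typical NVMe drive width (assumption)
drive_bw = drives * lanes_per_drive * GEN4_LANE_GBPS

uplink_lanes = 32                        # hypothetical Gen5 lanes feeding the drive backplane
uplink_bw = uplink_lanes * GEN5_LANE_GBPS

frontend_bw = 4 * 100 / 8                # four 100Gb/s Ethernet ports, in GB/s

print(f"24 Gen4 x4 drives: ~{drive_bw:.0f} GB/s raw")
print(f"{uplink_lanes} Gen5 uplink lanes: ~{uplink_bw:.0f} GB/s")
print(f"Front-end network ceiling: ~{frontend_bw:.0f} GB/s")
```

Even with generous assumptions, the front-end network runs out of headroom long before the drives do, which is why the extra internal bandwidth matters for keeping those 100Gb/s ports fed.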
Along with the new hardware, the F910 requires OneFS 9.8, which brings its own list of improvements for existing OneFS users. Most notably:
- Improved streaming write performance on all NVMe platforms
- Multipath client driver
- RDMA for NFS 4.1
- APEX File Platform for Azure
Let's tick through those first three; Azure is a topic for another article.
Multipath Client Driver
File storage is a client/server relationship, where my client talks to one IP to get access to its data. Multinode scale-out systems like PowerScale can grow performance and capacity independently, and they can round-robin client connections using SmartConnect, but a single client cannot utilize all nodes in a cluster because SMB and NFS still operate at a 1-to-1 client-to-array connection ratio. Recent versions of Linux NFS include an option called nconnect that allows the client to open multiple TCP streams to drive improved performance and to utilize both ports in an LACP bond, assuming a layer-4 hashing algorithm is in place. Even so, it's still one client talking to one IP. SMB has similar functionality, SMB Multichannel, introduced with SMB 3.0 in Windows Server 2012, which OneFS also supports. To better take advantage of all that a scale-out unstructured data platform can offer, Dell has a new multipath client driver. It allows a single client to connect to multiple nodes in the cluster. For a long time, we've been doing this with scale-out block platforms, letting multipath software and round-robin connection methods spread IO out; this brings the same to file for those bandwidth-intensive workloads. Unfortunately, due to timing, we could not test this as part of the beta. Expect to see a deeper review of it, as well as of SMB Multichannel, in the future.
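For context on what nconnect looks like in practice, here's a minimal sketch, wrapped in Python so it's easy to tweak. The SmartConnect zone name, export path, and mount point are hypothetical placeholders, nconnect needs a reasonably recent Linux kernel (5.3 or later), and you'd run this as root:

```python
import subprocess

# Hypothetical values -- substitute your own SmartConnect zone, export, and mount point.
SERVER = "smartconnect.example.com"
EXPORT = "/ifs/data"
MOUNTPOINT = "/mnt/powerscale"

# nconnect=8 opens eight TCP streams to the single IP that SmartConnect hands out.
# With an LACP bond and a layer-4 hash, those streams can spread across both client ports,
# but it's still one client talking to one node -- the gap the multipath client driver
# is meant to close.
opts = "vers=4.1,nconnect=8"

subprocess.run(
    ["mount", "-t", "nfs", "-o", opts, f"{SERVER}:{EXPORT}", MOUNTPOINT],
    check=True,
)
```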
RDMA for NFS 4.1
Remote direct memory access (RDMA) is not a new concept. It has been a fundamental underpinning of InfiniBand since it hit the market. Instead of inefficient buffer copies up and down the host stack, RDMA allows direct memory access between the client and its destination, placing the data directly in the remote host's memory for application access.
RDMA has been a part of storage data protocols like NFS over RDMA and iSCSI Extensions for RDMA (iSER) since the early 2000s. In OneFS, it has been supported for NFSv3 since OneFS 9.2. OneFS 9.8 makes RDMA available for NFS 4.1, bringing reduced CPU utilization and increased performance to the stateful version of the NFS protocol.
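On the client side, enabling it is mostly a mount-option change. Here's a hedged sketch (again in Python for consistency); the server name and paths are hypothetical, 20049 is the customary NFS-over-RDMA port, and the client needs RDMA-capable NICs plus the rpcrdma kernel module:

```python
import subprocess

# Hypothetical values -- adjust for your environment.
SERVER = "smartconnect.example.com"
EXPORT = "/ifs/data"
MOUNTPOINT = "/mnt/powerscale-rdma"

# Load the NFS RDMA transport, then mount NFS 4.1 with proto=rdma.
# Port 20049 is the conventional NFS-over-RDMA port.
subprocess.run(["modprobe", "rpcrdma"], check=True)
subprocess.run(
    ["mount", "-t", "nfs", "-o", "vers=4.1,proto=rdma,port=20049",
     f"{SERVER}:{EXPORT}", MOUNTPOINT],
    check=True,
)
```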
Streaming Write Performance Improvement
Performing a write is a lot of work for a storage system. The entire write has to be ingested, mirrored across redundant components, and then acknowledged to the host. After that, it has to get deduped, compressed, chunked up, have a parity/mirror/erasure-coding scheme applied to protect against media or node failure, and be written to persistent storage. That's just one write; these systems handle millions in parallel. Oof. OneFS 9.8 improved the granularity of its internal locking using a technique called lock sharding. In multithreaded programming, there are critical sections of code that change shared variables. You don't want a race condition where two threads try to change the same variable at the same time, so programmers use locks to serialize access. In OneFS 9.8, the shared data structures are split into granular shards for locking purposes. Now, instead of one lock blocking every other thread whenever a value changes, only the relevant shard is locked, blocking just the threads that want access to that same shard. The net result is better overall performance, since fewer threads sit around waiting on locks to clear.
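To make that concrete, here's a generic lock-sharding sketch in Python. It has nothing to do with OneFS's actual data structures; it just shows the mechanic: hash a key to one of N locks so unrelated updates stop contending on a single global lock.

```python
import threading
from collections import defaultdict

class ShardedCounters:
    """A generic lock-sharding sketch: N locks instead of one global lock."""

    def __init__(self, num_shards: int = 16):
        # Each shard gets its own lock and its own slice of the data.
        self._shards = [
            {"lock": threading.Lock(), "counts": defaultdict(int)}
            for _ in range(num_shards)
        ]

    def _shard_for(self, key: str) -> dict:
        # Hash the key to a shard; only writers landing on the same shard block each other.
        return self._shards[hash(key) % len(self._shards)]

    def increment(self, key: str, amount: int = 1) -> None:
        shard = self._shard_for(key)
        with shard["lock"]:
            shard["counts"][key] += amount

    def total(self) -> int:
        # Take each shard's lock briefly rather than freezing the whole structure at once.
        running = 0
        for shard in self._shards:
            with shard["lock"]:
                running += sum(shard["counts"].values())
        return running

# Threads updating different keys mostly land on different shards and run in parallel,
# instead of serializing behind one big lock.
counters = ShardedCounters()
threads = [
    threading.Thread(target=counters.increment, args=(f"file-{i}",))
    for i in range(32)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counters.total())  # 32
```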
Our Time With Dell PowerScale F910
Our time with the three-node F910 that Dell provided was unfortunately short. In our few weeks with the system, we threw some synthetic workloads at it, generated with elbencho. Since the F910 is largely a hardware refresh, we aimed to exercise the new hardware first and foremost. To try to quantify how much of an improvement it is over its predecessor, I pulled our ATC's three-node F900 into my test environment.
Test environment summary:
- 8x ESX nodes
  - 2x25Gb/s each
- 6 workload VMs - RHEL9
  - 8x vCPU
  - 16GB memory
  - 1 NFS mount to each PowerScale cluster
- Dell PowerScale
  - 3x F900 nodes
    - Dell S5232F-ON backend switch
    - 2x100Gb/s uplinks
    - 24x 7.68TB TLC drives
    - OneFS 9.7
  - 3x F910 nodes
    - Dell S5232F-ON backend switch
    - 2x100Gb/s uplinks
    - 24x 15.36TB QLC drives
    - OneFS 9.8
Readers who didn't skim over the environment summary will notice two little differences between the clusters. These subtle variances keep me from making any strict determinations about what the platform is capable of. If you didn't catch them, my F900 uses TLC flash drives and is running OneFS 9.7; the F910 has QLC drives and is running a beta build of OneFS 9.8. The drives have some performance difference, but probably not as much as some would have you believe. The bigger issue in isolating performance improvements from F900 to F910 is the OneFS version difference. At the time of the beta, OneFS 9.8 was a request-only upgrade, and I didn't have the time window to get the F900 upgraded.
Now, what did I see in terms of performance differences? In an impressive but not surprising showing, the F910 turned in a 40% write performance improvement over our F900. To put a quantity on that, it was an additional 800MB/s with no change other than what I've already mentioned, which puts the F900 baseline at roughly 2GB/s for this workload. Note that one of OneFS 9.8's code improvements is the aforementioned lock sharding, which drives improved write performance. Am I complaining? Nope. But I can't say whether the new controllers or the new code is the bigger driver of the change. In reality, it's probably some of both.
At the end of the day, the F910 is the new generation of storage-dense PowerScale all-flash nodes. While the F900 has no announced end-of-sale date (the point at which Dell stops selling new F900s), I expect Dell would like to start getting customers onto the 16G PowerEdge platform so it can sunset the 14G systems, a server generation that launched in 2017 (PowerEdge specifically; the PowerScale F900 built on it came in 2021). Additionally, Dell has investment protection for F900 customers in that the F910 is node-pool compatible, as long as you follow the best practice of maintaining the same drive size and count between the node types. The only caveat is for users of 15TB drives: 15TB drives exist in both QLC and TLC, and ideally you'd keep the same type of flash across your nodes. I am excited to have been a part of this launch and look forward to seeing what Dell will wring out of the new hardware in the future. The storage density of 24x 30TB QLC drives in these nodes, coupled with inline data reduction, starts to make a stronger case for a move to all-flash file and object storage, especially when considering throughput-per-watt.