Building a Multi-GPU Deep Learning Machine on a Budget
Here’s another story on building your own deep learning rig, containing the information I wish I had known a couple of years ago.
This story covers building a single machine with 3 or 4 GPUs. The big factors limiting my deep learning training capability have been the number of available GPUs and the amount of available GPU VRAM. Having access to 3 or 4 GPUs on a single machine can be really useful, but such a machine can be tricky to build.
The first consideration is which CPU/motherboard combination to use. Each GPU should have a PCIe x8 or x16 link to the CPU: 4 GPUs at x8 require 32 lanes from the CPU(s), and 4 GPUs at x16 require 64 lanes. x16 will only give you a 0–5% performance increase over x8, and in most cases I’ve worked through it is not worth the extra expense; see https://twitter.com/tim_dettmers/status/932654049874382848?lang=en
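The lane arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope check: the function name lanes_required is made up for illustration, and it ignores the lanes a real board also spends on NVMe drives, networking or the chipset link.

```python
def lanes_required(num_gpus: int, lanes_per_gpu: int) -> int:
    """Total CPU PCIe lanes needed to give every GPU its own link."""
    if lanes_per_gpu not in (8, 16):
        raise ValueError("this sketch only considers x8 or x16 links")
    return num_gpus * lanes_per_gpu


# Compare the two link widths discussed above for a 4 GPU build.
for width in (8, 16):
    print(f"4 GPUs at x{width}: {lanes_required(4, width)} lanes")
```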
The table above lists the chipsets that support 4 GPUs at x8 or better.
X79 / C602
The oldest chipset listed above is the X79 with an LGA2011 socket, which supports the venerable E5-26xx v2 series CPUs. Many server motherboards supporting this chipset have dual CPU sockets, such as the ASRock Rack EP2C602-4L/D16.
The big advantages of LGA2011 builds are cheap DDR3 RAM (I bought 256GB for ~$600) and cheap used CPUs ($200 for an E5-2580v2 with 10 cores). The main disadvantage is the lack of motherboards that support 4 GPUs without PCIe riser cables. To save yourself a lot of hassle with custom brackets, case modification and riser cables, you want a motherboard with either 7+ full-length PCIe 3.0 slots or at least 4 slots with dual/triple spacing between them. The only motherboard for this chipset I have found with dual spacing to fit 4 GPUs is the relatively rare ASUS Z9PE-D8 WS.
If you can find one of these motherboards cheap, you can build a pretty capable system for a good price.
The next chipset is the X99, which was released in 2014. Here you will find M.2 NVMe support and generally abundant USB 3 ports. The motherboards I have found that support 4 GPUs with dual slot spacing are shown in the table below.
The Gigabyte GA-X99P-SLI has the advantage of being ATX form factor.
You can find an example of a build using one of these motherboards here: https://medium.com/@acrosson/building-a-deep-learning-box-d17d97e2905c
Another common platform is the X299 chipset, with its somewhat affordable 28-lane i7 78x0X CPUs and pretty expensive 44-lane i9 CPUs. Here there is a bit more choice, with examples such as the ASUS WS X299 SAGE, ASUS ROG STRIX X299-E GAMING and EVGA X299 DARK. There are, however, a ton of X299 boards with support for three dual-spaced GPUs.
I haven’t covered X399, the AMD Ryzen Threadripper chipset, as I haven’t looked into these motherboards at all, but Threadripper does offer 64-lane versions, which may let you get that extra percent or two out of your build.
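To compare the lane counts mentioned above (28 and 44 lanes on X299, 64 on Threadripper), here is a rough sketch of how many GPUs each CPU could feed directly at x8 or x16. The dictionary labels are just shorthand for the CPUs named in this post. The calculation assumes every CPU lane goes to a GPU and that there is no PCIe switch on the board, which real motherboards do not guarantee.

```python
# Lane counts as quoted in this post; the labels are shorthand only.
CPU_LANES = {
    "X299 i7 (28 lanes)": 28,
    "X299 i9 (44 lanes)": 44,
    "Threadripper (64 lanes)": 64,
}


def max_gpus(cpu_lanes: int, lanes_per_gpu: int) -> int:
    """GPUs that fit if ALL CPU lanes go to GPUs (a simplifying assumption)."""
    return cpu_lanes // lanes_per_gpu


for name, lanes in CPU_LANES.items():
    print(f"{name}: {max_gpus(lanes, 8)} GPUs at x8, "
          f"{max_gpus(lanes, 16)} GPUs at x16")
```

Under these assumptions a 28-lane CPU tops out at three GPUs at x8, while 64 lanes are enough for four GPUs at x16.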
Note that standard PCIe slot spacing is 20.32mm. Some GPUs are wider than two slots (40.6mm); for example, the Gigabyte AORUS GTX 1080 Ti is 55mm wide, taking up 2.5 slots. Blower-style GPUs tend to be around two slots wide and are designed for multi-GPU use (importantly, they direct hot air out the back of the case). GPUs with waterblocks are even narrower, generally a bit more than 1 PCIe slot wide.
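A small sketch of the slot-width arithmetic, using the 20.32mm spacing quoted above. The helper names are made up for illustration, and the check only compares card thickness against slot spacing; it says nothing about card length, height or airflow.

```python
import math

SLOT_PITCH_MM = 20.32  # standard PCIe slot spacing quoted above


def slots_occupied(card_width_mm: float) -> int:
    """Whole slot positions a card of this thickness covers (rounded up)."""
    return math.ceil(card_width_mm / SLOT_PITCH_MM)


def fits_dual_spacing(card_width_mm: float) -> bool:
    """True if the card fits when GPU slots are two slot pitches apart."""
    return card_width_mm <= 2 * SLOT_PITCH_MM


# 55 mm is the AORUS card mentioned above; 40.6 mm is a two-slot card.
for width_mm in (55.0, 40.6):
    print(f"{width_mm} mm: covers {slots_occupied(width_mm)} slot positions, "
          f"fits dual spacing: {fits_dual_spacing(width_mm)}")
```

By this arithmetic a 55mm card covers three slot positions and will not fit next to a neighbour on a board with dual slot spacing, while a two-slot card will.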
You have three options for a 4 GPU rig: blower-style GPUs, AIO watercooled GPUs or a custom water loop. Having 4 AIO watercooled GPUs in one machine could be logistically difficult, and I have not seen this done. There are plenty of examples of blower-style rigs; however, I really don’t like noisy machines and have preferred to build custom watercooling loops.
My key recommendation for watercooling is to buy waterblocks of the same brand and model for all the GPUs, ideally from a well-known brand like EK. This will allow you to use bridges across the GPUs to connect the waterblocks, rather than lots of tubing as in my build below.
To have all the same brand and model waterblocks you will probably need all the same brand and model GPUs, or at least GPUs that follow a reference design. I bought the GPUs pictured above used (2 x Gigabyte AORUS, 1 x Gigabyte Gaming OC) and had to buy waterblocks from two different manufacturers. Connecting fittings to the middle GPU took a lot of trial and error.
A second recommendation is to buy the biggest case you can fit in your room. Each GPU is going to need at least 240mm of radiator space (ideally more than 30mm thick), and preferably more. I have found that 56mm radiators do a significantly better job than 30mm radiators, and to fit multiple large, thick radiators you need a lot of room.
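As a rough planning aid, the 240mm-per-GPU minimum above translates into total radiator length like this. The 360mm and 240mm radiator sizes are just example sizes, and the sketch ignores radiator thickness and any extra capacity needed if the CPU shares the loop.

```python
MIN_RADIATOR_MM_PER_GPU = 240  # minimum suggested above


def min_radiator_length_mm(num_gpus: int) -> int:
    """Minimum total radiator length for the given number of GPUs."""
    return num_gpus * MIN_RADIATOR_MM_PER_GPU


def radiators_needed(num_gpus: int, radiator_length_mm: int = 360) -> int:
    """Number of equal-length radiators needed to reach the minimum."""
    needed = min_radiator_length_mm(num_gpus)
    return -(-needed // radiator_length_mm)  # integer ceiling division


print("4 GPUs need at least", min_radiator_length_mm(4), "mm of radiator")
print("as 360 mm radiators:", radiators_needed(4, 360))
print("as 240 mm radiators:", radiators_needed(4, 240))
```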
Third, have a radiator in the loop between each pair of GPUs, so that you are not pumping hot water straight from one GPU to the next but cooling it down in between.
Also, try to put the reservoir in an accessible position where you can pour in water easily, without risking spills on the motherboard or GPUs. This may be tricky, and you might need to mount it externally. Quick disconnects may help here. They can also be useful for draining a loop, as can a drain tap.
Fans make a big difference. I am using high-static-pressure Noctua Industrial PPC fans and high-airflow Noctua fans on key radiators. Typical Cooler Master case fans just don’t push enough air through the radiator fins.
I use 2 pumps in parallel, so that if one fails, the other can still push enough water through the system until I notice the failure.
Finally, for 4 GPUs you’re going to need a 1600W PSU. I was using a 1200W PSU with 3 x GTX 1080 Ti’s and would occasionally get a power cut-out when all the GPUs happened to spike in power draw at the same time.
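A rough way to sanity-check PSU size is sketched below. The 250W figure is the rated board power of a GTX 1080 Ti. The spike factor and the allowance for the rest of the system are illustrative guesses, not measurements from this build, so treat the result as a ballpark only.

```python
GPU_RATED_W = 250        # rated board power of a GTX 1080 Ti
SPIKE_FACTOR = 1.3       # ASSUMPTION: brief peaks ~30% above rated power
REST_OF_SYSTEM_W = 240   # ASSUMPTION: CPU, pumps, fans, drives


def estimated_peak_watts(num_gpus: int) -> float:
    """Worst case where every GPU spikes at the same moment."""
    return num_gpus * GPU_RATED_W * SPIKE_FACTOR + REST_OF_SYSTEM_W


def psu_is_sufficient(psu_watts: int, num_gpus: int) -> bool:
    """True if the PSU rating covers the estimated simultaneous peak."""
    return psu_watts >= estimated_peak_watts(num_gpus)


for psu_watts, gpus in ((1200, 3), (1600, 4)):
    print(f"{psu_watts} W PSU with {gpus} GPUs sufficient:",
          psu_is_sufficient(psu_watts, gpus))
```

With these guessed numbers, three GPUs spiking together can just exceed a 1200W PSU, while a 1600W unit leaves a little headroom for four.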
April 2020 Update:
I sold the EVGA X299 FTW K motherboard (+ i7 7820X CPU) I was using for a build and have bought a Supermicro X9DRi-LN4F+ motherboard on eBay. This dual Socket R (LGA 2011) C602 chipset board has 24 DIMM slots supporting 1.5TB of ECC DDR3 RAM (!) and 4 x16 PCIe 3.0 slots (though I can only use 3 of them with dual-spaced GPUs unless I use PCIe riser cables).
The reason for the sale: I wanted 128GB of RAM (and ideally the option for more) for multi-GPU training and large Pandas datasets. DDR4 RAM (in particular gaming RAM) is really expensive (~200 AUD per 32GB used), while DDR3 server RAM is really cheap (if you look around).
I ended up buying the used Supermicro motherboard for $200 AUD + $100 postage ($130 + $65 USD), and bought 2 x E5 2690 CPUs plus 196GB of DDR3 ECC RAM for $200 AUD on local online classifieds. That is a total of $500 AUD for a motherboard + 2 CPUs (16 cores and 32 threads in total) + a ton of RAM. I am still waiting on the board to get here and may write a new post about the new build.
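Tallying the figures quoted above makes the saving concrete. This sketch only uses the prices from this post (all in AUD); the dictionary keys and helper names are made up for illustration.

```python
import math

DDR4_AUD_PER_32GB = 200  # used DDR4 price quoted above

# Purchases for the new build, in AUD, as quoted above.
BUILD_COSTS_AUD = {
    "Supermicro motherboard": 200,
    "postage": 100,
    "2 x E5 2690 + DDR3 ECC RAM": 200,
}


def total_cost_aud(costs: dict) -> int:
    """Sum of all purchases in AUD."""
    return sum(costs.values())


def ddr4_cost_aud(target_gb: int) -> int:
    """Cost of reaching target_gb with 32GB DDR4 modules at the quoted price."""
    return math.ceil(target_gb / 32) * DDR4_AUD_PER_32GB


print("Whole DDR3 build:", total_cost_aud(BUILD_COSTS_AUD), "AUD")
print("128GB of used DDR4 alone:", ddr4_cost_aud(128), "AUD")
```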
In addition, I have sold my watercooling components and am just running air cooling for 3 GPUs. It is not physically possible to air cool 4 GPUs with my setup, but 3 is plenty anyway.
My tips for bargain hunting on classifieds:
Be patient — if you can
Look for items with little description, or titles such as ‘parts from HP Z820’.
Look at listings that are a week old or older; the seller may be getting sick of waiting. (The exception is GPUs: RTX and GTX GPUs are really in demand at the moment here in Australia (April 2020, mid Covid-19 pandemic). I do see some bargains in local classifieds every now and then, but they are snapped up within a few hours. I don’t bother with eBay for GPUs, as the prices people are asking at the moment are higher than new.)
Make low offers — but within reason.
Give some background to your offers rather than just texting “$250”; even better, speak to the owner. Once they realise you are a nice, friendly person they may be more willing to negotiate.