New vs used deep-learning machine builds, Part 2.
In Part 1 I gave an overview of how I came to build two machines for deep learning use. In Part 2 of this series I will provide more detail on a ‘used’ Xeon E5 machine build and a performance comparison with a newer i7–7820X build. While the most important part of a deep learning machine build is arguably the GPU(s), spending some time researching a solid platform to plug your GPU into could save you some regrets down the track.
The case I am using to house the Xeon E5–2680 V2 (LGA2011 socket) deep learning machine is a Cooler Master HAF 932 that I sourced for free. I did have some reservations about the lack of noise deadening in the HAF 932, but as it was free I was happy to experiment. The case fits an ASROCK EP2C602–4L/D16 SSI EEB form factor motherboard, although I needed to add one hole and standoff for the top-centre motherboard mount (by comparison, my Dark Base 900 case would have required 5 new 3mm holes drilled into the motherboard tray, drive position rotation and 3 custom built standoff supports). I carefully lined up the motherboard with the existing holes in the motherboard tray and marked the new hole position with a pencil. I then drilled the hole with a 3.0 mm bit and used a standoff to self-thread it (the cheapest thread tapping kit at the local hardware store was $65). The self-tap worked fine.
To physically separate the two GPUs I wanted to mount a PCI-E support bracket below the motherboard. A great feature of this Cooler Master case is the ability to mount a PSU either above or below the motherboard.
I used some aluminium angle, the rear cover of an old PSU and a metal 5.25" drive mount to build a PCI-E support and rear housing and bolted on a Thermaltake x16 20cm PCI-E riser. With the PSU at the top of the case I had room for 2x 120mm fans below the bottom GPU.
I also used the rear of an old PC case to create a 120mm to 90mm adapter mount and attached a 90mm fan (in exhaust mode) at the top of the case, as the length of the Corsair PSU wouldn’t allow 2x 120mm fans at the top of the machine.
With the build mostly done I then made some performance comparisons between the 2x GPU Xeon E5 build vs the 1x GPU i7–7820X build. Note that I swapped over to using a Fractal Design R6 case for the i7–7820X build.
Boot time:
Xeon 1 minute 17 seconds
i7–7820X 35 seconds.
(ASROCK motherboard seems to do a lot more checking during boot)
CPU performance
sysbench --test=cpu --cpu-max-prime=20000 run:
Xeon 30.05 seconds
i7–7820X 19.692 seconds.
sysbench --test=cpu --cpu-max-prime=100000 --num-threads=16 run:
Xeon 17.5 seconds
i7–7820X 12.72 seconds.
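To put the sysbench run times above in relative terms, a few lines of Python are enough. This is only a sketch of the arithmetic: the dictionary labels and function names are made up for illustration, and the times are the ones listed above.

```python
# Run times in seconds, copied from the sysbench results above.
# The labels and function names are illustrative, not sysbench output.
RUNS = {
    "1 thread, cpu-max-prime=20000": {"xeon": 30.05, "i7": 19.692},
    "16 threads, cpu-max-prime=100000": {"xeon": 17.5, "i7": 12.72},
}


def speedup(slow_s, fast_s):
    """How many times faster the quicker run is (ratio of run times)."""
    return slow_s / fast_s


def time_saved_pct(slow_s, fast_s):
    """Percentage of the slower run time saved by the quicker machine."""
    return (slow_s - fast_s) / slow_s * 100


for label, t in RUNS.items():
    print(
        f"{label}: i7 is {speedup(t['xeon'], t['i7']):.2f}x faster, "
        f"{time_saved_pct(t['xeon'], t['i7']):.1f}% less run time"
    )
```

On these numbers the i7–7820X comes out roughly 1.5x faster on the single-thread run and roughly 1.4x faster on the 16-thread run.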
Python ProcessPoolExecutor to tokenize strings with half of the available cores (same code/data on both machines):
Xeon CPU times: user 208 ms, sys: 124 ms, total: 332 ms
i7–7820X CPU times: user 196 ms, sys: 76.2 ms, total: 272 ms.
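The tokenizing code itself isn’t shown here, so the following is only a minimal sketch of the general pattern: a ProcessPoolExecutor limited to half of the available cores, mapping a tokenizer over a list of strings. The regex tokenizer, the function names and the sample sentence are placeholders (assumptions), not what produced the timings above.

```python
import os
import re
import time
from concurrent.futures import ProcessPoolExecutor

# Placeholder tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split one string into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def half_cores():
    """Half of the logical cores the OS reports, but never fewer than one."""
    return max(1, (os.cpu_count() or 2) // 2)


def tokenize_all(texts, workers=None):
    """Tokenize many strings in parallel worker processes."""
    workers = workers or half_cores()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches the strings so each task carries more than one item.
        return list(pool.map(tokenize, texts, chunksize=256))


if __name__ == "__main__":
    # Made-up sample data; swap in your own strings.
    texts = ["The quick brown fox jumps over the lazy dog."] * 1000
    start = time.perf_counter()
    tokens = tokenize_all(texts)
    elapsed = time.perf_counter() - start
    print(f"tokenized {len(tokens)} strings in {elapsed:.3f} s")
```

The wall-clock time printed here is not the same measurement as the user/sys CPU times quoted above, so treat it as a rough comparison only.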
NVME vs SSD
Open, resize, convert and save 5088 Kaggle Carvana Competition car images and masks (both on the i7–7820X R6 build):
Samsung 960 Evo NVME: 30.01 seconds
Samsung SSD: 34.15 seconds
Load weights, train Resnet34 on Carvana dataset for 24 cycles on 1080ti GPU and save weights:
Data on NVME: 202.81 seconds
Data on SSD: 201.84 seconds
i.e. the drive makes no real difference to training time on the GPU when the run only reads the weights at the start and writes them out at the end.
Hard drive metrics with hdparm -Tt
Samsung SSD:
Timing cached reads: 16932 MB in 1.99 seconds = 8487.55 MB/sec
Timing buffered disk reads: 1350 MB in 3.00 seconds = 449.46 MB/sec
Samsung NVME:
Timing cached reads: 18344 MB in 2.00 seconds = 9191.58 MB/sec
Timing buffered disk reads: 6008 MB in 3.00 seconds = 2002.53 MB/sec
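For reference, hdparm -T times cached reads (essentially memory and cache throughput, without touching the drive), while -t times buffered sequential reads from the device itself, so the second figure is the one that separates the drives. Below is a small sketch for pulling the MB/sec figures out of output like the above and comparing them; the regular expression and names are illustrative assumptions based on the lines shown here.

```python
import re

# hdparm -Tt result lines copied from the output above.
HDPARM_OUTPUT = {
    "Samsung SSD": (
        "Timing cached reads: 16932 MB in 1.99 seconds = 8487.55 MB/sec\n"
        "Timing buffered disk reads: 1350 MB in 3.00 seconds = 449.46 MB/sec\n"
    ),
    "Samsung NVME": (
        "Timing cached reads: 18344 MB in 2.00 seconds = 9191.58 MB/sec\n"
        "Timing buffered disk reads: 6008 MB in 3.00 seconds = 2002.53 MB/sec\n"
    ),
}

# Captures the test name and the reported rate in MB/sec.
LINE_RE = re.compile(
    r"Timing (cached|buffered disk) reads:\s+\d+ MB in\s+[\d.]+ seconds"
    r"\s*=\s*([\d.]+) MB/sec"
)


def parse_hdparm(text):
    """Return a dict mapping 'cached' / 'buffered disk' to MB/sec."""
    return {name: float(rate) for name, rate in LINE_RE.findall(text)}


rates = {drive: parse_hdparm(out) for drive, out in HDPARM_OUTPUT.items()}
for test in ("cached", "buffered disk"):
    ratio = rates["Samsung NVME"][test] / rates["Samsung SSD"][test]
    print(f"{test} reads: NVME is {ratio:.2f}x the SSD rate")
```

On the figures above the NVME drive manages roughly 4.5x the buffered read rate of the SSD, while the cached figures are close because they barely involve the drive.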
Temperature
GTX 1080 Ti at idle (display GPU):
Xeon build: 41 °C vs i7–7820X build: 45 °C
GTX 1080 Ti at 90% load (display GPU):
Xeon build: 72 °C (upper GTX 1080 GPU idle at 40 °C) vs i7–7820X build: 72 °C
GTX 1080 Ti at 90% load (display GPU) with GTX 1080 GPU at 90% load:
Xeon build: GPU 0 72 °C, GPU 1 (upper) 78 °C
Noise:
I made a comparison with an Android sound meter of questionable accuracy. HAF 932 case (Xeon build): 30 dB at 1m. Moderate fan noise, enough to be a little distracting, so I am looking at bringing forward my plans for watercooling the build and housing it in the be quiet! Dark Base 900 case that I have.
Fractal Design R6 case (i7–7820X build): 26 dB at 1m. Quiet fan noise and subtle pump (Kraken X52) noise. Update: I have since swapped out the Kraken X52 for a Noctua NH-U12S CPU cooler, which has allowed me to install a couple of fans at the top of the case. CPU temps are actually lower and the machine is even quieter.
Price without GPUs (all in AUD; AUD ≈ USD x 1.3):
Xeon build:
Xeon E5–2680 V2: $220 (Used) Ebay.
Noctua NH-U12DX i4: $99 (New).
ASROCK EP2C602–4L/D16: $429 (New) Amazon.
Corsair RM1000i: $269 (~New — originally used in i7–7820X build).
120mm Cooler Master fans x 2: $0 (from old cases)
120mm Noctua fan x 1: $21 (New)
90mm Noctua fan x 1: $5 (Used)
230mm Cooler Master fans x 2: $0 (3 came with free case)
Aluminium Angle, 3mm bolts and nuts: $20
40GB DDR3 RAM: $0 (Used — old builds)
240 GB SSD: $85 (New)
128 GB SSD: $0 (Used — old build)
1x 2TB HD: $0 (Used, old build)
Thermaltake 10 PWM Fan Controller: $25 (New)
Total: $1088
i7–7820X build:
Fractal Design R6 Case: $209 (New)
i7–7820X: $769 (New)
NZXT Kraken X52 CPU Cooler: $229 (New)
MSI Tomahawk X299: $449 (New)
32 GB Corsair DDR4 RAM: $599 (New)
XFX 650W PSU: $0 (Used, old build)
1TB SSD: $520 (New)
2TB HD: $0 (Used, old build)
2TB HD: $125 (New)
Total: $2900
While the i7–7820X build gives you around 30% better single-thread CPU performance, USB 3.0 (yes, the ASROCK EP2C602 only has 2x USB 2.0 ports at the back and 2x at the front) and a more modern architecture with the LGA2066 socket, the Xeon E5 build has some great advantages: a bit over 1/3 the price, an old case that I don’t mind modding, the option to add a second CPU, and 3x x16/1x x8 GPU support.
There are still a few things for me to do: add another 120mm fan at the base of the case, swap the 230mm Cooler Master fan in the side of the case for 4x 120mm fans to give better airflow to GPU 1, and install a Thermaltake multi fan controller in a 5.25" slot so that I can dial the fans down when I am working.
Recommendations
Note that I am only recommending Intel rather than AMD because I have no experience with AMD; Intel platforms are the only ones I am familiar with :-)
I’m new to deep learning and on a budget:
Buy a used machine with an i5 or i7 CPU (check the single thread speed and number of cores here — make sure it’s not a slug, anything over 1500 single thread rating will be fast enough so as not to be annoying)
Make sure the machine has 16GB ram (ideally 32GB) and a >500W PSU (want to have enough supply for your GPU), and that the case is wide enough to fit a GPU (eg get a ‘gamer’ rig). You can get something decent for $500 AUD (<$400 USD) if you look around.
Buy a new or used GTX 1070/1070ti/1080 if you can afford it. I’d recommend one of these over a 1060 if you can; the extra 2GB of GPU RAM means less tweaking of batch sizes when training.
I want a new, really fast machine with one or two GPUs:
Get an i7–8700K LGA1151 socket CPU or an i7/i9 X series LGA2066 socket CPU. If you can, I would (and did) go with the LGA2066 socket as you get more cores and PCIe lanes.
I want a multi-thread, multi GPU rig at a reasonable price:
Go with a used Xeon E5–2670 V1/V2 through E5–2690 V2 based build as documented above. Get some used 1070/1070ti/1080/1080ti cards and either: live with the turbo style noise; buy AIO waterblocked GPUs with a big enough case to fit the radiators; try to air-cool the GPUs with as many fans as possible (as I have done); or build your own water-cooling loop.
I want a multi-thread, multi GPU rig and I don’t mind spending a fair bit on it:
Get an ASUS Z10PE-D16 WS and a couple of Xeon E5–2690 V4 CPUs (not the ES ones though), then fill those PCIe slots with GTX 1080 Ti cards fitted with water blocks on a custom water-cooling loop.