As many of you know, we have been working with AMD and board partners on Linux and IOMMU groupings. We pulled our Threadripper 1950X CPU today for routine maintenance and discovered some fairly severe pad discoloration that we are still investigating.
The system this Threadripper CPU was in has been running continuously for the last month or so, but it has not been under constant load. For at least 70% of that time it has been doing Linux work: we have completed a huge number of benchmarking tasks, including compiling the kernel (many times, with many patches), running workstation workloads, and rebooting and testing IOMMU/Xen/KVM, but nothing really out of the ordinary. When this machine was in "Linux Mode" it was at stock speeds and voltages, with the exception of a DDR4-3200 XMP memory profile.
The other 30% of the time we ran a number of tests of the platform in Windows, including gaming tests and overclocking tests. To the best of our knowledge, we have never taken this particular CPU past 1.375 volts at 4.175 GHz with DDR4-3200. These overclocks were done with Ryzen Master in Windows, and only for limited periods of time. We did, however, do 24 hours of burn-in testing with Prime95 at stock speeds and voltages.
Aside from some errant, but apparently harmless, corrected PCIe bus errors reported by the Linux kernel (which only occur when performing I/O operations through the Promontory chipset), the system had been completely stable before we pulled the CPU for inspection.
This article will be updated shortly with some additional images from our USB microscope. We are currently working with board partners, since the cause may be an early UEFI or an overly aggressive voltage profile that has since been corrected. We will also take a close look at our Threadripper 1920X CPU, because we have been much more cavalier with that CPU in terms of voltages, overclocks and burn-in tests. Watch this space for updates.