But Mummy I don't want to use CUDA - Open source GPU compute
Video Statistics and Information
Channel: linux.conf.au
Views: 81,667
Keywords: lca, lca2019, #linux.conf.au#linux#foss#opensource, DaveAirlie
Id: ZTq8wKnVUZ8
Length: 43min 11sec (2591 seconds)
Published: Fri Jan 25 2019
I saw the whole talk, pretty good! There are so many compute stacks and OpenCL implementations out there, it's crazy - every driver implementing things in a different way, having their own unique set of bugs etc.
His idea is good, and it would be great if vendors actually worked on such a common implementation, at least for OpenCL - I doubt it'll happen though. Most likely it's up to community contributors and those few paid employees like him to do this work.
I was wondering why Intel Beignet wasn't mentioned at all. It had good (I think) OpenCL 2.0 support several years ago already, worked well on my Ivy Bridge laptop and was completely open source. Of course it had its own llvm/clang fork...
TIL it was deprecated a year ago.
With Intel releasing a discrete GPU, I wonder what they'll be doing in this space.
As a GPGPU user (HPC developer), I have the utmost respect for the work Airlie is doing, and I wish him success in this unification of the open source implementations. There are, however, a couple of things I would like to highlight:
single source is a bait and switch; it's very good for prototyping, but when it comes to hand-coded stuff, most major projects find themselves moving away from it sooner or later because of the loss in flexibility: there's a reason even NVIDIA has been experimenting with online compilation (NVRTC) since version 7 (2015) and has officially supported it since version 8 (see the sketch after this comment); it would be better if priority were given to a robust, complete implementation of OpenCL (possibly 2.x) rather than SYCL (which builds on top of it anyway);
tooling is essential; one of the biggest advantages CUDA has over the competition is its profilers and debugger (which used to support OpenCL on their hardware, but no longer do); to get anywhere close to being as appealing an alternative, Mesa should provide hooks that allow similar tools to be built (and thus a way to enumerate and collect all performance counters available for each supported device, and their evolution across kernel execution, as well as the possibility, on supporting hardware, to preempt execution and step through functions); if I'm not mistaken, similar things have been done (relatively recently, and largely thanks to Valve's involvement) for OpenGL (and possibly Vulkan?); exposing them for OpenCL (and thus ultimately SYCL as well) would be a massive boon;
finally, the ecosystem; most developers don't bother with any of these compute APIs directly; they rely on higher-level libraries (like the mentioned cuDNN, cuBLAS, or Thrust) that let them leverage the GPU's computational power without any knowledge of the hairy details; the hardest part of breaking NVIDIA's stranglehold on HPC will be getting FLOSS-friendly companies to cooperate on such libraries.
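To make the NVRTC comparison above concrete, here is a minimal sketch of the online-compilation path OpenCL has always had: the kernel source lives in a plain string and is compiled at run time for whatever device happens to be present. The kernel name (scale) and the stripped-down error handling are illustrative assumptions, not anything from the talk.

/* Minimal sketch: OpenCL's built-in runtime compilation, the rough
 * equivalent of what NVRTC later added to CUDA. Error handling is
 * reduced to printing the build log, for brevity. */
#include <stdio.h>
#include <CL/cl.h>

static const char *kernel_src =
    "__kernel void scale(__global float *buf, float k) {\n"
    "    size_t i = get_global_id(0);\n"
    "    buf[i] *= k;\n"
    "}\n";

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    /* Online compilation: source text goes in, a device binary comes out. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, NULL);
    if (clBuildProgram(prog, 1, &device, "", NULL, NULL) != CL_SUCCESS) {
        char log[4096];
        clGetProgramBuildInfo(prog, device, CL_PROGRAM_BUILD_LOG,
                              sizeof(log), log, NULL);
        fprintf(stderr, "build failed:\n%s\n", log);
        return 1;
    }

    cl_kernel kern = clCreateKernel(prog, "scale", NULL);
    printf("kernel compiled and ready\n");

    clReleaseKernel(kern);
    clReleaseProgram(prog);
    clReleaseContext(ctx);
    return 0;
}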
The talk is a newer version of this: https://www.youtube.com/watch?v=d94N2Lu4x9s
A common implementation has less value than it seems since most likely you need different algorithms for different GPUs anyway.
The whole point of GPUs is acceleration, so performance should always come before portability.
Nvidia CUDA is unique because it offers more computational functionality than standard OpenCL, which gives CUDA the ability to handle some CPU workloads on the GPU rather than passing data to the CPU and back again in some applications. In the world before 2017, when CPU power was expensive, CUDA was the sensible choice.
But with current CPU development, where AMD is pushing raw CPU power forward at an efficient cost, there are more advantages to using the CPU and GPU together, so OpenCL becomes the more advantageous choice for a CPU + GPU mode (see the device-listing sketch after this comment). A 32-core CPU plus a 30 TFLOPS (or more) GPU with a large memory cache is now possible in consumer-priced hardware. In the future, PCIe gen 4 will start appearing in consumer-grade products; for the majority of consumers its advantages are still unusable, but for OpenCL it will reduce many of the issues around CPU + GPU latency in OpenCL applications.
If Nvidia didn't see that coming and keeps pushing their "fancy" card prices up, CUDA will fall behind OpenCL. The advantages of using CUDA are becoming irrelevant, and that will soon impact CUDA's future development and adoption. Their ray-tracing marketing is one of their efforts to keep their GPUs relevant, but that only covers visualization workloads, the one area where Nvidia still has the bigger advantage.
Thanks to AMD, an OpenCL golden era is starting, also for the open-source community, where CPU + GPU is the better choice for future development. "If you can't break through their strength, break through their weakness." The Radeon VII is a solid professional card at a consumer price, perfect for OpenCL development. It crushes any Nvidia offering on price, bandwidth, cache, and memory.
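As a rough illustration of the CPU + GPU point above, here is a hedged sketch that simply enumerates every OpenCL device on the machine and reports whether it is a CPU or a GPU; on a box like the one described, both show up and can each be given their own command queue (a single shared context additionally requires the devices to come from the same platform). The 16-entry caps are arbitrary assumptions for brevity.

/* Hedged sketch: list all OpenCL devices and their type (CPU vs GPU). */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint nplat = 0;
    cl_platform_id plats[16];
    clGetPlatformIDs(0, NULL, &nplat);
    clGetPlatformIDs(nplat > 16 ? 16 : nplat, plats, NULL);

    for (cl_uint p = 0; p < nplat && p < 16; p++) {
        cl_uint ndev = 0;
        cl_device_id devs[16];
        clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_ALL, 0, NULL, &ndev);
        clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_ALL,
                       ndev > 16 ? 16 : ndev, devs, NULL);

        for (cl_uint d = 0; d < ndev && d < 16; d++) {
            char name[256];
            cl_device_type type;
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(devs[d], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
            printf("platform %u: %s (%s)\n", p, name,
                   (type & CL_DEVICE_TYPE_GPU) ? "GPU" :
                   (type & CL_DEVICE_TYPE_CPU) ? "CPU" : "other");
        }
    }
    return 0;
}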
We use compute shaders precisely because of the suffering OpenCL can inflict on one's head. OpenCL was a pain in terms of configuring it and making it work across different vendors.
Does anybody else use OpenGL compute shaders? It seems they are pretty unpopular and not used anywhere but within game engines. Are they so much worse than CUDA?
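For anyone who hasn't tried them, here is a hedged sketch of an OpenGL compute shader doing the same "scale a buffer" job a CUDA or OpenCL kernel would. It assumes GLFW and GLEW are available for context creation and function loading (compute shaders need GL 4.3 or later), and error checking is left out for brevity.

/* Hedged sketch: an OpenGL compute shader that doubles every value in a
 * shader storage buffer. GLFW provides a hidden window for the context,
 * GLEW loads the GL functions. */
#include <stdio.h>
#include <GL/glew.h>
#include <GLFW/glfw3.h>

static const char *cs_src =
    "#version 430\n"
    "layout(local_size_x = 64) in;\n"
    "layout(std430, binding = 0) buffer Data { float v[]; };\n"
    "void main() {\n"
    "    uint i = gl_GlobalInvocationID.x;\n"
    "    v[i] *= 2.0;\n"
    "}\n";

int main(void)
{
    /* Hidden window just to get a GL 4.3 context. */
    glfwInit();
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
    GLFWwindow *win = glfwCreateWindow(64, 64, "compute", NULL, NULL);
    glfwMakeContextCurrent(win);
    glewInit();

    /* Compile and link the compute shader. */
    GLuint shader = glCreateShader(GL_COMPUTE_SHADER);
    glShaderSource(shader, 1, &cs_src, NULL);
    glCompileShader(shader);
    GLuint prog = glCreateProgram();
    glAttachShader(prog, shader);
    glLinkProgram(prog);

    /* Upload 256 floats into a shader storage buffer. */
    float data[256];
    for (int i = 0; i < 256; i++) data[i] = (float)i;
    GLuint ssbo;
    glGenBuffers(1, &ssbo);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
    glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(data), data, GL_DYNAMIC_COPY);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);

    /* Dispatch 256 / 64 = 4 work groups, then read the result back. */
    glUseProgram(prog);
    glDispatchCompute(4, 1, 1);
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
    glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, sizeof(data), data);
    printf("data[3] = %f (expected 6.0)\n", data[3]);

    glfwTerminate();
    return 0;
}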