In work that could be considered a continuation of the architecture-specific optimization analysis of GEM, Konstantinos Krommydas and I evaluate programmability/performance tradeoffs across three architectures: an Intel CPU, an Intel Xeon Phi, and an NVIDIA Kepler GPU. Some of the results were surprising, not least that the fully optimized GPU code ended up more readable than the highly optimized CPU code.
Balaji Subramaniam took point on our annual analysis of the Green500 this year, and reached out to Winston Saunders to include the Exascalar metric and draw some new conclusions based on the list.
The most recent installment of our annual analysis of the Green500 list is appearing at ISC this year instead of HPPAC. As we collect more data, we gain more and more insight not only into the progress made in green computing, but also into the trends we are tracking toward future goals. This paper examines the trajectory from today toward the exascale goals set for 2018.
The final camera-ready version of our paper “Heterogeneous Task Scheduling for Accelerated OpenMP” is finally in. This paper was a turning point for me: the first paper I feel I really drove from start to finish and am happy with. This work, and the work that follows from it, will form the basis of my thesis; it is interesting and fun work.
Our first publication discussing the OpenCL and the 13 Dwarfs benchmark suite; I’m glad to have a tangible artifact from this work now. Keep an eye out for the official release of the benchmark sometime around June 2012! Update: The official release has come! If you’re interested, go here for the code.
I’m rather fond of this work. It’s in direct opposition to the claim made in the original Mars paper that their two-pass method was the only way to handle MapReduce on GPUs that cannot use atomics. While StreamMR is now compared against versions that can use atomics, it works on GPUs with or without them and does not require a second pass.
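To give a flavor of what “no atomics, no second pass” can look like, here is a minimal sketch of one common way to emit map output on a GPU without global atomic operations: each thread writes into its own preallocated slice of the output buffer and records how many pairs it produced, leaving a later prefix-sum/compaction step to pack the results. This is not the actual StreamMR design (which is an OpenCL framework with its own buffering scheme); the names, slot sizes, and toy map function below are assumptions for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative sketch only, not StreamMR itself.
struct KeyVal { int key; int val; };

#define SLOTS_PER_THREAD 4   // assumed upper bound on emits per input element

__global__ void map_no_atomics(const int* input, int n,
                               KeyVal* out, int* counts)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    KeyVal* my_slice = out + tid * SLOTS_PER_THREAD;  // private region: no races
    int emitted = 0;

    int x = input[tid];
    if (x % 2 == 0 && emitted < SLOTS_PER_THREAD) {   // toy "map" function
        KeyVal kv; kv.key = x % 16; kv.val = 1;
        my_slice[emitted++] = kv;
    }
    counts[tid] = emitted;   // consumed later by a scan + compaction pass
}

int main() {
    const int n = 256;
    int h_in[n]; for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in, *d_counts; KeyVal* d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_counts, n * sizeof(int));
    cudaMalloc(&d_out, n * SLOTS_PER_THREAD * sizeof(KeyVal));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    map_no_atomics<<<(n + 127) / 128, 128>>>(d_in, n, d_out, d_counts);
    cudaDeviceSynchronize();

    int h_counts[n];
    cudaMemcpy(h_counts, d_counts, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("thread 0 emitted %d pairs\n", h_counts[0]);

    cudaFree(d_in); cudaFree(d_counts); cudaFree(d_out);
    return 0;
}
```

The design tradeoff is the usual one: you pay for over-allocated output space and an extra compaction pass, but every write goes to a thread-private location, so no atomic contention and no wasted first pass to count output sizes.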
The third Green500 paper at HPPAC. I was uninvolved in the second one, working on other projects. Coming back, Wu, Balaji, and I found some interesting new ways to analyze the data and draw new conclusions from the list. This work covers more ground than any other Green500 review to date.
This paper explores the benefits of multi-level Hierarchical Charge Partitioning (HCP) as applied to a CUDA version of the GEM application, originally explored on GPUs in “Accelerating electrostatic surface potential calculation with multi-scale approximation on graphics processing units.” The final speedup over the serial version without HCP is staggering: tens of thousands of times faster.
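The core idea behind HCP is easy to sketch: the molecule is partitioned into components, and for a given surface vertex any component beyond a distance threshold is replaced by a small set of effective charges, while nearby components are expanded into their individual atoms. The kernel below is a minimal two-level illustration of that idea, not the actual GEM/HCP code; the struct names, the single effective charge per component, the cutoff parameter, and the omission of GEM’s dielectric terms and host-side setup are all simplifying assumptions.

```cuda
#include <cuda_runtime.h>

// Two-level HCP-style sketch: one thread per surface vertex, one effective
// charge for far components, per-atom charges for near ones.
struct Atom      { float x, y, z, q; };
struct Component { float cx, cy, cz, qeff;     // center and effective charge
                   int first_atom, num_atoms; };

__global__ void potential_hcp(const float3* verts, int nverts,
                              const Component* comps, int ncomps,
                              const Atom* atoms, float cutoff, float* phi)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= nverts) return;

    float3 p = verts[v];
    float sum = 0.0f;

    for (int c = 0; c < ncomps; ++c) {
        Component comp = comps[c];
        float dx = p.x - comp.cx, dy = p.y - comp.cy, dz = p.z - comp.cz;
        float dist = sqrtf(dx * dx + dy * dy + dz * dz);

        if (dist > cutoff) {
            // Far component: one effective charge stands in for all its atoms.
            sum += comp.qeff / dist;
        } else {
            // Near component: fall back to the exact per-atom sum.
            for (int a = comp.first_atom; a < comp.first_atom + comp.num_atoms; ++a) {
                float ax = p.x - atoms[a].x;
                float ay = p.y - atoms[a].y;
                float az = p.z - atoms[a].z;
                sum += atoms[a].q / sqrtf(ax * ax + ay * ay + az * az);
            }
        }
    }
    phi[v] = sum;   // arbitrary units; real GEM includes dielectric screening
}
```

The enormous speedups come from combining the two effects: the GPU parallelizes over vertices, and HCP turns the per-vertex work from a sum over every atom into a sum over a handful of nearby atoms plus a few effective charges.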
This paper provides an overview of some of the architecture-specific optimizations we identified for AMD Radeon GPUs. Each is characterized in terms of the GEM GPU application described in “Accelerating electrostatic surface potential calculation with multi-scale approximation on graphics processing units.” While some of these optimizations are less necessary on current hardware, many can still be applied and give benefits not only on AMD GPUs but on a variety of other platforms as well.
My first foray into GPU research, at least in terms of publication. We opened a big can of worms with this paper, asking where some of these anomalies came from, and the follow-up work explaining them never really happened. If nothing else, this paper serves as a reminder that just because we think we know how something works doesn’t mean we always know how it will behave.
A retrospective on the first year of the Green500. I have done a few of these now, and am in the interesting position of being the only student member of the team who has been around since the first release.
My first publication, and my first presentation at a conference. To our great surprise, this work spawned stories on GCN (Government Computer News) and Slashdot. Evidently it was quite a surprise to some that the cores in a multicore system wouldn’t all behave the same.