# **JOSEPH LEE GREATHOUSE**

Austin, TX • joseph.l.greathouse@gmail.com

#### Experience

**Advanced Micro Devices, Inc.**, Fellow, Aug. 2012 – Present

- Software architect for AMD's Instinct GPUs, handling the design of HW, FW, and SW interactions
  - Co-designed HW/SW mechanisms for kernel dispatch, GPU cache coherence, virtual memory optimizations, performance monitoring mechanisms, DMA engine design, and RAS mechanisms
  - Drove requirements gathering from SW teams to create dozens of new HW architectural features
  - Led debug, workaround development, and customer communication for multiple post-silicon issues
  - Created and led both internal and customer training about AMD accelerators, including performance optimization, and deep-dives on microarchitecture, coherence, and memory management
- Performance engineer responsible for optimizing SW, HW, and FW for GPU compute solutions
  - Architected and implemented multiple GPGPU software features, including HIP Cooperative Groups
  - Designed, implemented, and published leading GPGPU algorithms for HPC math libraries, including:
    - Sparse matrix-vector multiplication algorithm that is up to 36% faster than previous state-of-the-art
    - Sparse triangular solve algorithm that is 34% faster than industrial competition
- Previously researched topics in performance and power monitoring and management in AMD Research
  - Led multiple Dept. of Energy contract projects; delivered research reports worth millions of dollars
  - Technical lead for a team of 10 engineers and multiple interns, focusing on HW/SW interaction topics
  - Created a new simulator for AMD's exascale program based on hardware performance monitoring
- Awarded 24 US patents; 11 patent submissions pending; 25 conference and 7 workshop publications

**University of Michigan**, Research Assistant, May 2007 – Aug. 2012

- Identified methods of distributing security and correctness analyses to many users to reduce slowdown
- Managed graduate and undergraduate students through the development of prototype systems

**University of Michigan**, Teaching Assistant, Jan. 2012 – Apr. 2012

- Led discussions and evaluated projects for graduate level parallel computer architecture course

**Kelly Services / Intel Corp.**, Research Contractor, May 2010 – Oct. 2010

- Researched HW & SW approaches for improving the speed of the Intel Inspector XE data race detector

**International Business Machines Corp.**, Speed Team Intern, May 2008 – Aug. 2008

- Designed and built an InfiniBand verification suite that caught multiple bugs in IBM PowerVM firmware

**University of Illinois**, Teaching Assistant, Jan. 2005 – Aug. 2006

- Taught discussion sections and graded for undergraduate computer architecture and digital logic courses

#### Education

**University of Michigan, Ann Arbor**, Ph.D. Computer Science and Engineering, May 2012

- Advisor: Prof. Todd Austin
- Dissertation topic: Hardware Mechanisms for Distributed Dynamic Software Analysis

**University of Michigan, Ann Arbor**, M.S.E. Computer Science and Engineering, May 2008

**University of Illinois at Urbana-Champaign**, B.S. Computer Engineering with Honors, May 2006

- Minor: International Engineering – Japanese

## **Selected Publications**

Raghavendra Pradyumna Pothukuchi, **Joseph L. Greathouse**, Karthik Rao, Christopher Erb, Leonardo Piga, Petros Voulgaris, Josep Torrellas, "Tangram: Integrated Control of Heterogeneous Computers," in the Proceedings of the 52<sup>nd</sup> IEEE/ACM International Symposium on Microarchitecture (MICRO-52), October, 2019

Arkaprava Basu, **Joseph L. Greathouse**, Guru Venkataramani, Ján Veselý, "Interference from GPU System Service Requests," in the Proceedings of the 2018 IEEE International Symposium on Workload Characterization (IISWC), September, 2018 – Nominated for Best Paper

Vignesh Adhinarayanan, Indrani Paul, **Joseph L. Greathouse**, Wei Huang, Ashutosh Pattnaik, Wu-chun Feng, "Measuring and Modeling On-Chip Interconnect Power on Real Hardware," in the Proceedings of the 2016 IEEE International Symposium on Workload Characterization (IISWC), September, 2016 – Awarded Best Paper

Gene Wu, **Joseph L. Greathouse**, Alexander Lyashevsky, Nuwan Jayasena, Derek Chiou, "GPGPU Performance and Power Estimation Using Machine Learning," in the Proceedings of the 21<sup>st</sup> IEEE Symposium on High Performance Computer Architecture (HPCA), February, 2015

**Joseph L. Greathouse**, Mayank Daga, "Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Storage Format," in the Proceedings of the Int'l Conf. on High Performance Computing, Networking, Storage and Analysis (SC), November, 2014

Bo Su, **Joseph L. Greathouse**, Junli Gu, Michael Boyer, Li Shen, Zhiying Wang, "Implementing a Leading Loads Performance Predictor on Commodity Processors," in the Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC 2014), June, 2014

**Joseph L. Greathouse**, Zhiqiang Ma, Matthew I. Frank, Ramesh Peri, Todd Austin, "Demand-Driven Software Race Detection using Hardware Performance Counters," in the Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA 2011), June, 2011

### **Computer Languages and Software Qualifications**

#### **Programming Languages**

- C, C++, HIP, CUDA, OpenCL, x86 assembly, AMD GCN, CDNA, and RDNA assembly, Python

#### **Software Systems**

- Linux kernel, multiple AMD-internal simulation, firmware, and analysis tools

#### **Software Projects**

#### **AMD Matrix Instruction Calculator**

- Released tool to interactively detail the instructions for AMD GPUs' Matrix Cores and AI Accelerators
- Available at https://github.com/ROCm/amd\_matrix\_instruction\_calculator

#### **AMD Research Instruction Based Sampling Toolkit**

- Released driver to allow easy access to IBS, AMD's low-level CPU performance monitoring hardware

#### **clARMOR – An OpenCL Kernel Buffer Overflow Detector**

- Archived at https://github.com/ROCm/clARMOR

### Honors, Associations and Activities

- Association for Computing Machinery, Sr. Member
- Institute of Electrical and Electronics Engineers, Sr. Member
- 2016 IISWC Best Paper Award
- 2011 CGO Best Student Presentation Award
- Eta Kappa Nu Electrical & Computer Eng. Honor Society
- Tau Beta Pi Engineering Honor Society

Program committee: ASPLOS (2025), MICRO (2022), IISWC (2020), ICPP (2020), ISPASS (2015), HPPAC (2015-2018)

External reviewer: ASPLOS (2012, 2013), CODES (2011), DATE (2008-2012), FMCAD (2010), HPCA (2009, 2010, 2012-2014), ISCA (2009, 2010, 2012), MICRO (2008, 2009, 2011, 2012-2014, 2017, 2020), PACT (2012), SRCS (2013), IEEE CAL (2015-2017, 2019, 2022), IEEE TPDS (2017), IEEE TCAD (2017, 2018), IEEE TMSCS (2018), SC (2017), and MDPI Computation (2018, 2020); Judge at SRC TechCon (2015)