题 目：Large-scale FPGA implementations of Machine Learning Algorithms
报告人：Philip Leong, The University of Sydney
FPGA implementations of machine learning algorithms have been shown to be extremely efficient when the problem fits entirely on the FPGA but it remains a challenge to scale to problems of interest to industry. In this talk, our recent research on how to increase the capacity of existing approaches will be described.
In the first part of the talk we present an open-source matrix multiplication library for the Xeon+FPGA platform. This supports a number of different precisions down to binary, and additional support for deep learning applications is provided. Performance approaching the maximum of 14 Xeon cores and an Arria 10 FPGA was achieved. The second part of the talk will describe an implementation of the Fastfood algorithm for scaling up online kernel methods. By utilizing the theory of random projections, problems with 1000x larger input dimension and dictionary size can be solved. A systolic implementation that operates at 500 MHz clock frequency was achieved.
Philip Leong received the B.Sc., B.E. and Ph.D. degrees from the University of Sydney. In 1993 he was a consultant to ST Microelectronics in Milan, Italy working on advanced flash memory-based integrated circuit design. From 1997-2009 he was with the Chinese University of Hong Kong. He is currently Professor of Computer Systems in the School of Electrical and Information Engineering at the University of Sydney, Senior Visitor Scholar at Fudan University, Visiting Professor at Imperial College, Visiting Professor at Harbin Institute of Technology, and Chief Technology Advisor to ClusterTech.