ISPD 2020 Contest: Wafer-Scale Deep Learning Accelerator Placement

Cerebras | December 12, 2019

The ACM International Symposium on Physical Design (ISPD), in collaboration with Cerebras Systems, organized the 2020 ISPD contest in the area of placement for a wafer-scale deep learning accelerator. The contest ran from February to April 2020.

The placement problem for a wafer-scale engine is similar to the one solved by traditional EDA placement tools, but with the added twist of optimizing for an entirely new set of performance metrics. The goal of the contest is to encourage academic research on this new class of place-and-route problems for wafer-scale deep learning accelerators.

ISPD Special Session on June 18th, 2020

The winners of the contest were announced during an ISPD special session attended by 56 people from all over the world. A total of 10 teams competed in the contest: 2 from the USA, 3 from Taiwan, 2 from Hong Kong, 2 from China, and 1 from Korea. Congratulations to the winners of the contest:

  • First place: Team Ripplewafer (Bentian Jiang, Jingsong Chen, Jinwei Liu, Xiaopeng Zhang, Fangzhou Wang, Lixin Liu, and Evangeline F.Y. Young), the Chinese University of Hong Kong.
  • Second place: Team Gigaplacer (Benzheng Li, Qi Du, Dingcheng Liu, and Jingchong Zhang, Xidian University; Gengjie Chen, Giga Design Automation).
  • Third place: Team Cupid (Zhuolun He, Peiyu Liao, Ran Chen, Siting Liu, Yuzhe Ma, and Bei Yu), the Chinese University of Hong Kong.

The recording of the ISPD special session can be viewed here.
The slides of the session can be downloaded here (note: the file is large!).

Due to internet connection issues, team Gigaplacer’s live presentation could not be given during the session. You can watch a video of their presentation here.

Benchmark Description

A massive wafer-scale chip for Machine Learning training (code-named CS-1) is operational, containing over a trillion transistors. The engine is organized as a homogeneous multiprocessor grid of 633×633 processing tiles. Each tile has a programmable execution core, memory, and a router for interconnection with other tiles. A typical deep learning model specifies tens of billions of arithmetic operations to be repeatedly evaluated. A compiled model for CS-1 must coordinate which arithmetic operation is performed on each core as well as the communication path that delivers the result of one operation to all dependent operations.

The purpose of this challenge is to develop algorithms and heuristics that produce compiled neural models that run with the highest possible performance. A solution must specify both how the operations will execute and where on the wafer engine they are located.

Model operations are grouped into kernels such as matrix multiplication and image convolution. Formally, a kernel is a parametric program that performs specific tensor operations. For instance, a convolution kernel may perform three convolutions corresponding to the forward, backward, and weight update passes of a neural network.

Kernels have formal arguments that specify the shapes of the tensor operations to be performed. For the convolution example, the formal arguments are H, W, R, S, C, K. These specify the size of the input image (H, W), the size of the convolution’s receptive field (R, S), and the numbers of input and output features (C, K). The values of formal arguments are provided and cannot be changed by the compilation procedure.

Kernels also have execution arguments that describe how the operation is parallelized across tiles. For example, the convolution kernel’s execution arguments (h’, w’, c’, k’) split the various dimensions of its tensor operations independently. The values of the execution arguments are variables for the optimization procedure.

A resource cuboid is a 4-tuple (height, width, time, memory) that describes the array of tiles necessary to execute a kernel. Each kernel provides a performance function that maps its arguments to a resource cuboid.

The convolution kernel’s performance function is given as follows:

convperf(H, W, R, S, C, K; h’, w’, c’, k’) = {
  height = h’ * w’ * (k’ + 1)
  width  = 3 * c’
  time   = ceil(H/h’) * ceil(W/w’) * ceil(C/c’) * ceil(K/k’) * R * S
  memory = C/c’ * K/k’ * R * S + (W+S-1)/w’ * (H+R-1)/h’ * K/k’
}
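
For concreteness, here is a minimal Python transcription of the performance function above. The primed execution arguments are written h, w, c, k, and the shapes in the example comment are hypothetical:

import math

def convperf(H, W, R, S, C, K, h, w, c, k):
    """Resource cuboid (height, width, time, memory) for the
    convolution kernel, transcribed from the formula above."""
    height = h * w * (k + 1)
    width = 3 * c
    time = (math.ceil(H / h) * math.ceil(W / w)
            * math.ceil(C / c) * math.ceil(K / k) * R * S)
    memory = ((C / c) * (K / k) * R * S
              + ((W + S - 1) / w) * ((H + R - 1) / h) * (K / k))
    return height, width, time, memory

# Hypothetical example: convperf(16, 16, 3, 3, 64, 64, 4, 4, 8, 8)
# returns (144, 24, 9216, 738.0).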

Each benchmark challenge includes a kernel graph and a kernel library. The graph's nodes list all kernels with their formal arguments; data dependencies between kernels are its arcs. The library is a list of performance functions that can be invoked for each kernel, together with the domains of all execution arguments. A solution to the challenge specifies the following values for each kernel in the kernel graph (an illustrative record is sketched after the list):

  • Execution arguments
  • The x, y coordinates of the kernel’s origin
  • Orientation of the kernel (as an isometry in D8)
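
As an illustration, a per-kernel solution record might look like the following in Python; the field names are hypothetical, and the contest's file format defines the actual syntax:

from dataclasses import dataclass

@dataclass
class KernelPlacement:
    # Execution arguments, e.g. {"h": 4, "w": 4, "c": 8, "k": 8}
    exec_args: dict
    # x, y coordinates of the kernel's origin on the 633x633 fabric
    x: int
    y: int
    # One of the eight isometries of the square (the dihedral group D8),
    # encoded here as an index 0..7 over rotations and reflections
    orientation: int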

Evaluation

Solutions will be evaluated against the following constraints (a minimal sketch of the legality checks follows the first list):

Legality constraints

  1. All kernels fit within the 633×633 area of the wafer-scale engine
  2. No kernels overlap
  3. No kernel’s performance cuboid exceeds the tile’s memory limit of 48 kB
  4. Runtime stays within an upper bound (for example, < 8 hours)
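
A minimal sketch, assuming each placed kernel occupies an axis-aligned rectangle of tiles, of how constraints 1-3 might be checked (the tuple layout is hypothetical):

def is_legal(kernels, wafer=633, tile_memory_kb=48):
    """Check constraints 1-3 for kernels given as a list of
    (x, y, width, height, memory_per_tile_kb) tuples."""
    for x, y, w, h, mem in kernels:
        # Constraint 1: the kernel must fit on the wafer
        if x < 0 or y < 0 or x + w > wafer or y + h > wafer:
            return False
        # Constraint 3: per-tile memory must not exceed 48 kB
        if mem > tile_memory_kb:
            return False
    # Constraint 2: no two kernel rectangles may overlap
    for i, (x1, y1, w1, h1, _) in enumerate(kernels):
        for x2, y2, w2, h2, _ in kernels[i + 1:]:
            if x1 < x2 + w2 and x2 < x1 + w1 and y1 < y2 + h2 and y2 < y1 + h1:
                return False
    return True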

Optimality objectives (scoring function)

  1. Minimize the maximum compute time (ΔT) over all kernels
  2. Minimize the L1 distance between communicating kernels
  3. Minimize runtime

Solutions are scored by the maximum time of any kernel in the kernel graph, plus a weighted L1 communication penalty based on the distances between the centers of connected kernels in the kernel graph.
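
A minimal sketch of this scoring, assuming a single weight alpha on the communication penalty (the official metric and weighting are defined by the organizers):

def score(centers, arcs, times, alpha):
    """Illustrative score: max kernel compute time plus a weighted
    L1 communication penalty over the kernel graph's arcs.

    centers: kernel -> (cx, cy) center coordinates
    arcs:    iterable of (src, dst) data-dependency pairs
    times:   kernel -> compute time from its resource cuboid
    alpha:   hypothetical penalty weight
    """
    max_time = max(times.values())
    penalty = sum(abs(centers[a][0] - centers[b][0])
                  + abs(centers[a][1] - centers[b][1])
                  for a, b in arcs)
    return max_time + alpha * penalty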

The details of the metrics and scoring system will be released on the contest website.

Example Benchmark

Interested in more details? An initial example benchmark is described at this link (note: updated February 10th to correct a small error).

Contest Dates

  • To officially participate, contestants had to register by January 27, 2020 (see Registration below). Registration has been closed.
  • Initial placement benchmarks are available at this link.
  • More test benchmarks will be made available before March 2nd, 2020.
  • Teams must submit final source code by April 24th (note: this date was updated to allow extra time).
  • The contest winners will be announced in May; prizes will be handed out during the ISPD symposium (likely September 2020).

Registration

Registration is closed at this time.

For questions or comments you can contact the organizers at: ispd2020contest@cerebras.net

Prizes

Prizes will be handed out at the ISPD symposium.