Distributed Kernel Software Engineer
Cerebras is developing a radically new chip and system to dramatically accelerate deep learning applications. Our system runs training and inference workloads orders of magnitude faster than contemporary machines, fundamentally changing the way ML researchers work and pursue AI innovation.
We are innovating at every level of the stack – from chip, to microcode, to power delivery and cooling, to new algorithms and network architectures at the cutting edge of ML research. Our fully-integrated system delivers unprecedented performance because it is built from the ground up for the deep learning workload.
Cerebras is building a team of exceptional people to work together on big problems. Join us!
As a Kernel Software Engineer on our team, you will work with leaders from industry and academia at the intersection of hardware and software to develop state-of-the-art solutions for emerging problems in AI compute.
Our team of kernel developers is responsible for the design, implementation, and performance tuning of deep learning operations on highly parallel custom processors. We are developing parallel and distributed algorithms to maximize hardware utilization and accelerate the training of deep neural networks to unprecedented speeds.
We’re looking for an engineer to design and implement optimized kernels for primitive operations used by state-of-the-art neural network architectures. You should apply if you are an engineer familiar with parallel and distributed architectures who can map various workloads to our high-performance hardware.
The role involves a mix of algorithm design, kernel implementation, and performance tuning. In particular, we are looking for candidates comfortable with performance analysis of parallel algorithms and low-level software optimization. You will also be responsible for understanding the latest deep learning algorithms in order to design kernel implementations.
Skills & Qualifications
- 5+ years of experience in kernel design, implementation, and optimization
- Bachelor’s / Master’s degree or foreign equivalent in Computer Science, Engineering, or related field
- Familiarity with parallel algorithms and distributed memory systems
- Ability to read and write code using C and Python
- Experience with assembly-level programming and optimization
- Understanding of hardware architecture concepts — you should be comfortable learning the details of a new hardware architecture