Parallel/Distributed Algorithms Engineer
Cerebras is developing a radically new chip and system to dramatically accelerate deep learning applications. Our system runs training and inference workloads orders of magnitude faster than contemporary machines, fundamentally changing the way ML researchers work and pursue AI innovation.
We are innovating at every level of the stack – from chip, to microcode, to power delivery and cooling, to new algorithms and network architectures at the cutting edge of ML research. Our fully-integrated system delivers unprecedented performance because it is built from the ground up for deep learning workloads.
Cerebras is building a team of exceptional people to work together on big problems. Join us!
Responsibilities
- Create high-performance linear-algebra and machine-learning kernels for custom processors.
- Design and implement parallel algorithms on a distributed hardware architecture.
- Tune and optimize low-level assembly code within the tight constraints of highly optimized, high-performance hardware.
- Understand the tradeoffs of performance, compute, and memory and simultaneously optimize for all three.
Skills & Qualifications
- Bachelor’s / Master’s degree or foreign equivalent in Computer Science, Engineering, or related field.
- Five or more years of related work experience.
- High-performance parallel programming experience.
- Strong knowledge of computer architecture fundamentals.
- Demonstrated passion for low-level system details, down to the assembly level.
- Programming fluency and extensive experience in C/C++ and assembly.
- MS or PhD.
- Prior work on an HPC, parallel-computation, or dynamically optimizing system.
- Demonstrated architectural work at the hardware/software boundary.
Our cozy and well-appointed headquarters are in the heart of Silicon Valley near downtown Los Altos, California.
Our beautiful San Diego offices overlook the Sorrento Valley canyon.