Senior Member of Technical Staff, Multimodal AI

Cohere
over 1 year ago
Full-time
Remote
Worldwide
Remote Engineering
Who are we?

Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI.

We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. We like to work hard and move fast to do what’s best for our customers.

Cohere is a team of researchers, engineers, designers, and more, who are passionate about their craft. Each person is one of the best in the world at what they do. We believe that a diverse range of perspectives is a requirement for building great products.

Join us on our mission and shape the future!

Why this role?

At Cohere, we believe in the power of multimodal AI to revolutionise the way we interact with technology. Our engineering teams push the boundaries of what's possible, and we're looking for talented individuals to join us on this exciting journey. With an exceptional ratio of compute resources to engineers, we provide an ideal environment for you to explore, innovate and shape the future of AI.

July 31st, 2025 - Cohere's Multimodal team introduced Command A Vision: Multimodal AI Built for Business. At release, our new flagship vision-language model:
● Consistently outperforms major models like Llama 4 Maverick, Mistral Medium/Pixtral Large, and GPT-4.1
● Averages 83.1% across benchmarks (73.5% on MathVista, 90.9% on ChartQA...)
● Is built for the real world - 112B parameters running on just 2 GPUs
● Has open weights live on Hugging Face

With a focused team, breakthrough performance doesn't require breakthrough compute. Focus on the things that matter, and join the team.

As a Member of Technical Staff with a focus on Multimodal AI, you will:

- Design and develop cutting-edge multimodal AI systems, integrating various modalities such as text, speech, and vision.

- Conduct research and experiments on our advanced compute infrastructure, exploring novel ideas in multimodal representation learning, transfer learning, and more.

- Collaborate closely with our world-class teams, learning from and contributing to their expertise in the field.

You are an ideal candidate if you:

- Possess exceptional software engineering skills, with a proven track record of building robust and scalable systems.

- Have a strong command of Python and are well-versed in popular deep learning frameworks like JAX, PyTorch, and TensorFlow, with an understanding of their multimodal capabilities.

- Have knowledge of distributed training strategies, especially for large-scale multimodal models.

- Are familiar with autoregressive models, particularly their application in multimodal tasks such as image or video captioning and speech-to-text generation.

- Bonus: Publications in top-tier venues demonstrating your experti