Research Systems Administrator

  • Cambridge
  • Darktrace
Founded by mathematicians and cyber defense experts in 2013, Darktrace is a global leader in cyber security AI, delivering complete AI-powered solutions in its mission to free the world of cyber disruption. We protect more than 8,400 customers from the world’s most complex threats, including ransomware, cloud, and SaaS attacks.

Our roots lie deep in innovation. The Darktrace AI Research Centre in Cambridge, UK, has conducted research establishing new thresholds in cyber security, with technology innovations backed by over 130 patents and pending applications. The company’s European R&D centre is located in The Hague, Netherlands.

Headquartered in Cambridge, UK, Darktrace has more than 2,400 employees globally. Customers include public sector agencies, education institutions, media, organizations supplying critical infrastructure, and businesses of all sizes worldwide.

This is an excellent opportunity to join a fast-growing company, named one of TIME magazine’s “Most Influential Companies” for 2021 and one of Fast Company’s “Most Innovative AI Companies” for 2022. For more information on our cutting-edge technology, visit our website.

You will join the dedicated R&D teams in Cambridge that create and improve the products behind the company’s rapid growth. Our teams work on a wide variety of projects with a diverse tool set. As the Research Systems Administrator, you will manage the NVIDIA GPU server environment and maintain and optimize the software environment for our machine learning projects.

This is a hybrid role, with the expectation that you work at least 2 days a week in the Cambridge office. To find out more about our world-class products and wider business, please consult our website.

As a Systems Administrator, you will be responsible for setting up, configuring, and maintaining the servers and software stack, ensuring their optimal performance and availability for the hardware engineers and researchers working on AI and HPC projects.
You will be responsible for, but not limited to:
  • Maintaining and optimizing the Linux operating system, file systems, and software stack (CUDA, PyTorch, Python, etc.) for machine learning projects
  • Setting up and configuring NVIDIA HGX servers, including installing, monitoring, and updating software, managing user access, and ensuring optimal performance
  • Implementing and maintaining server security, including patch management, vulnerability scanning, and intrusion detection
  • Collaborating with network administrators, hardware engineers, and researchers to troubleshoot and resolve server and software issues
  • Collaborating with data scientists and machine learning engineers to understand their software requirements and provide guidance on best practices

Candidate requirements:
We welcome applications from engineers who combine a solution-focused mindset with an analytical approach and strong problem-solving skills. During the interview process, you’ll be able to demonstrate your ability in, and familiarity with, provisioning and managing AI and HPC infrastructure. Additionally, it’s likely that you will:
  • Demonstrate experience in system administration, preferably with a focus on high-performance computing platforms, GPU-based servers, and machine learning software environments
  • Show a low-level understanding of server virtualization and containerization technologies
  • Have experience using Linux operating systems

Desired:
  • Strong knowledge of NVIDIA HGX server architectures and components
  • Experience with NVIDIA GPU technologies, such as NVLink, NVSwitch, and Tensor Core GPUs
  • Experience with machine learning frameworks and libraries, such as PyTorch, and associated system optimization

Benefits:
  • 23 days’ holiday plus all public holidays, increasing to 25 days after 2 years of service
  • Additional day off for your birthday
  • Private medical insurance
  • Life insurance
  • Pension: 4% employer contribution
  • Enhanced family leave
  • Confidential employee support
  • Cycle to work scheme