What you will be doing
As a senior cloud system software engineer, extensively use your knowledge of operating systems, algorithms, and computer architecture to provide robust and efficient solutions for building the Nvidia GPU cloud infrastructure.
Benchmark and evaluate the performance of these subsystems and of different applications on different hardware components, and identify the most suitable platform for the applications in focus.
Design, develop and debug system virtualization and hypervisor-based solutions.
Design, prototype and implement a cost-effective system solution to support the Deep Learning, Game Streaming and Content Delivery services.
Develop code for building cloud platforms using software-defined networking, storage and other cloud infrastructure services.
Evaluate highly available components in the system design, and provide architectural solutions to maintain and scale services globally.
Serve as a subject matter expert in Linux kernel, systems and Linux userspace development and debugging. An understanding of the Windows kernel is a plus.
Take part in different levels of software stack development, including testing and delivery of the product to different environments.
Evaluate the costs of deployed solutions and help manage and optimize them.
Prepare and maintain up-to-date documentation detailing the configuration of deployed solutions.
Design and implement distributed storage systems for gaming, deep learning and HPC workloads running in the Nvidia cloud infrastructure.
Help architect, develop, build, configure and operate distributed storage systems in the Nvidia cloud infrastructure.
Contribute feature enhancements and bug fixes relevant to the Nvidia cloud infrastructure to open source storage systems.
Work with other cloud application and solutions architects to design optimal, performant distributed storage systems.
Participate in and lead efforts to build storage POCs for various application use cases, working with consumers to gather requirements and build solutions that satisfy them.
Contribute to reliability enhancements and engineering of the storage systems in the cloud.
Prototype software enhancements in various subsystems to demonstrate the viability of the new architecture.
Actively participate in design and code reviews.
What we need to see:
BS degree in Computer Science or Engineering, or equivalent experience. A Master's degree is a plus.
Experience with software-defined storage and an understanding of storage protocols such as NFS, CIFS and iSCSI, as well as block and object storage.
Strong background in design, implementation, and performance optimization of large-scale distributed Linux-based systems.
Core understanding of storage systems, including filesystem, block and object storage use cases, and of distributed storage and caching technologies.
Experience integrating storage systems with cloud virtualization technologies such as Kubernetes, Linux KVM and cloud orchestration.
Experience fine-tuning the performance of storage clusters and other distributed storage systems.
Contributions to open source projects in the form of feature enhancements and bug fixes.
Ability to root-cause functional issues and performance bottlenecks in storage, infrastructure and other distributed storage systems, and to drive them to closure.
Experience with storage system deployments using Ansible and other related automation tools.
Excellent programming and problem-solving skills. Scripting and automation experience in Python and Go is a plus.
Ability to quickly learn and evaluate new technologies through prototype implementation.
Solid working knowledge of Unix-based operating systems (file systems, networking and kernel internals) and Windows operating system internals.
Experience with SDN, IP networking, routing and switching protocols, OpenFlow, OpenStack, etc., and with implementing large-scale networks is a plus.
Outstanding communication and interpersonal skills, with the ability to present to senior management in a sensible and persuasive manner.
Ability to influence and build relationships with other Software and IT functional groups such as server / storage / security teams.