Storage Systems Engineer

Job Name: Storage Systems Engineer
Department: 4111130 - F IT ARS Infrastructure
Job ID: 2844
Job Code: SYS ADM 4 TX (006375)
IAP: Tier C Plan (target potential payout of 3.5%, maximum of 5%)
Bargaining Unit: TX
Job Family: Information Technology
Organization: UCSF Campus BU
Primary Location: United States
Detail URL: https://careers.ucsf.edu/careers/JobDetail/United-States/1952

Job Description:
JOB SUMMARY
This position is primarily responsible for the architecture, implementation, and lifecycle management of storage and systems for the Facility for Advanced Computing (FAC), including support for large storage environments, NSF-funded infrastructure, and OS Nexus–aligned data platforms. The role ensures seamless integration between storage systems and the CoreHPC compute cluster, enabling performant, reliable, and scalable data access for AI, data science, and computational research workloads.

The Storage Systems Engineer will:
- Work with the lead to support the design and evolution of storage architecture across on-premises and hybrid environments, including VAST, parallel filesystems, and enterprise storage platforms
- Develop and maintain data movement strategies and tooling (e.g., rsync, rclone, Globus, SMB workflows) to support large-scale data ingestion, migration, and lifecycle management
- Ensure tight integration between storage and HPC compute systems, optimizing throughput, latency, and reliability for distributed workloads
- Support and scale storage systems backing major institutional initiatives (FAC storage, OS Nexus integration)
- Collaborate closely with DevOps, networking, and security teams to deliver cohesive research infrastructure solutions
- Design and implement monitoring, performance tuning, and capacity planning strategies for storage and data systems
- Troubleshoot complex issues across storage, networking, and compute boundaries
- Participate in system upgrades, migrations, and expansion efforts with minimal disruption to researchers
- Provide guidance to researchers on data organization, transfer strategies, and performance optimization
- Evaluate and recommend emerging storage technologies and architectures

This role may lead storage-focused projects and contribute to cross-functional initiatives that improve the scalability, usability, and reliability of UCSF’s research computing ecosystem.
Department Overview
Academic Research Systems (ARS) serves the needs of the UCSF research community by providing an integrated repository of HIPAA-compliant clinical and life sciences data and a centralized, secure, professionally managed infrastructure for the storage and management of research data. ARS empowers medical scientific investigations by offering secure computing environments; data capture, management, and analysis tools; and support services that meet researchers’ needs. The Research Infrastructure team of ARS focuses on large-scale research platform support and on high-performance computational and storage services that let UCSF researchers address complex computational, AI, and data science problems.

Qualifications:
REQUIRED QUALIFICATIONS
- Bachelor's degree in a related area such as computer science or engineering and 6+ years of experience with storage infrastructure support and management, or 10+ years of related experience with large-scale storage systems
- Demonstrated testing and test planning skills. Demonstrated ability to create automated testing.
- Knowledge of HPC job scheduler system design and operation, such as SLURM or PBS
- Demonstrated skill (5+ years) deploying, managing, and troubleshooting Warewulf (or similar) InfiniBand-based clusters
- Ability to write technical documentation in a clear and concise manner. Ability to develop runbooks defining complex technical processes in a clear and concise manner.
- Strong knowledge of high-performance parallel filesystems and storage such as GPFS, Lustre, VAST, DDN, etc.
- Understanding of system performance monitoring and actions that can be taken to improve or correct performance
- Demonstrated advanced knowledge, skills, and abilities associated with system problem identification and resolution. Experience with design, configuration, operation, repair, and tuning of technology systems.
- Ability to elicit and communicate technical and non-technical information in a clear and concise manner
- Self-motivated and works independently and as part of a team. Demonstrates problem-solving skills. Able to learn effectively and meet deadlines.
- Advanced experience writing and editing the most complex scripts used to perform system maintenance and administration
- Advanced knowledge of computer security best practices and policies, including demonstrated experience securing research cyberinfrastructure systems to meet NIST 800-171 / 800-223, HIPAA, or IS-3 requirements

PREFERRED QUALIFICATIONS
- Expert knowledge of HPC systems infrastructure design
- Knowledge of the design, development, and application of technology and systems to meet business needs
- General knowledge of other areas of IT
- Thorough understanding of and experience with systems-related issues and actions that can be taken to improve or correct performance
- Demonstrated skills associated with adapting equipment and technology to serve user needs. Demonstrated comprehensive understanding of how system management actions affect other systems, system users, and dependent or related functions.