Position: IBM Spectrum LSF (Load Sharing Facility)
Location: San Jose, CA - Remote
Contract: W2
Key Job Responsibilities
• Cluster Management: Install, configure, and maintain IBM Spectrum LSF clusters to optimize resource utilization.
• Workload Optimization: Manage job queues, policy-driven scheduling, and workload balancing across server hosts.
• Troubleshooting: Monitor system performance and the LSF daemons (LIM, mbatchd, sbatchd), and resolve issues related to job submission, execution, and host availability.
• Automation & Scripting: Develop tools (Python, shell scripts) to streamline cluster management and improve efficiency.
• License Management: Optimize software license configuration to ensure efficient EDA tool utilization.
• Collaboration: Work with engineering, DevOps, and data science teams to align HPC infrastructure with business needs.
Required Skills and Qualifications
• Experience: Typically 4 to 12+ years in IT architecture, system engineering, or HPC environments.
• Technical Knowledge: Deep understanding of IBM Spectrum LSF, job scheduling, and workload management.
• OS Proficiency: Strong Linux/Unix systems administration skills.
• Automation Tools: Experience with scripting (Python, shell) and automation tools such as Ansible or Terraform.
• Education: Bachelor’s or Master’s degree in Computer Science or Engineering.