Job Description: We are looking for an experienced Site Reliability Engineer (SRE) to specialize in data platforms on Alibaba Cloud. This role involves leveraging MaxCompute and other Alibaba Cloud data solutions to maintain and optimize our data infrastructure.
Responsibilities:
Design, implement, and manage data platforms using Alibaba Cloud services, with a focus on MaxCompute.
Monitor and enhance the reliability, availability, and performance of data pipelines and systems.
Collaborate with data engineering teams to optimize data workflows and storage solutions.
Develop automation scripts and tools to streamline data platform management.
Implement robust security and disaster recovery measures for the data infrastructure.
Create and maintain documentation and standards for managing data systems.
Troubleshoot and resolve incidents related to data systems, and carry out root cause analysis.
Requirements:
5+ years of experience in SRE or DevOps roles, with a focus on data platforms.
Strong expertise in Alibaba Cloud services, especially MaxCompute.
Proficiency in Python and Bash scripting.
Experience with Infrastructure as Code (IaC) tools like Terraform.
Familiarity with containerization and orchestration technologies, such as Docker and Kubernetes.
Excellent problem-solving skills and attention to detail.
Strong communication and collaboration skills.