Position title
Site Reliability Engineer
Description
  • Location: Houston, TX
  • # of Positions: 5
  • Eligibility: Open
  • Client Name / Domain: Oil & Gas
  • Bill Rate: $?? per hour
  • Employment Mode: Contract / Corp-to-Corp
  • Contract Duration: 12+ months
  • Experience: 8+ years
  • Skills: CCNA, Network+, Security+, Microsoft Azure Fundamentals or Microsoft Azure Administrator, Automation with Ansible

The client is looking for an experienced SRE Consultant to provide a scalable platform for Ekata to serve its customers. The infrastructure team serves as a resource for Engineering, helping diagnose production issues and providing guidance on improving the availability and performance of our applications. This position also develops systems, automation, and tools that make it easier for Engineering teams to deploy services in a fast, automated, and reliable fashion.

Responsibilities
  • Apply broad full-stack knowledge and experience to prevent incidents proactively: baseline against expected service performance, improve processes by applying lessons learned, and use data analytics to identify problem areas and operational gaps.
  • Effectively and efficiently lead product line agile teams in troubleshooting and resolving system problems, including analyzing application and critical system performance.
  • Provide on-call operational support during US business hours and serve as a technical resource during critical and major incidents supporting multiple technologies.
  • Facilitate SRE technical assessments, identify gaps, and provide recommendations to product teams on SRE maturity journey plans based on Chevron’s SRE framework.
  • Find opportunities to avoid future issues by improving logging and creating trigger-based automated resolutions. Develop automation scripts for repetitive tasks to eliminate toil and operations support activities.
  • Oversee production environments by monitoring availability and maintaining a holistic view of system health.
  • Measure and optimize system performance, continuously seeking innovation and improvement to meet customer needs.
  • Align, collaborate, and build relationships with peers, company leadership, subject matter experts and users to improve knowledge of end-to-end DevOps / Site Reliability Engineering best practices.
Qualifications
  • Experience and working knowledge in the technology disciplines required for a full end-to-end service operations stack: network administration & security (Cisco / Juniper), identity & access management (Active Directory, Azure AD, SAML, OpenID Federation, certificates, and keys), cybersecurity, on-premises & cloud architecture, Windows & Linux OS, performance monitoring & management, troubleshooting (application & database), change management, and API integration.
  • On-call experience troubleshooting incidents and production issues, developing monitoring and alerting systems, and writing post-incident reviews.
  • SRE - Ability to apply Site Reliability Engineering knowledge and best practices: define SLOs, SLIs, and SLAs, eliminate toil via automation, establish error budgets, perform emergency response (triage, postmortem, retrospective), forecast demand, and plan capacity.
  • Software Engineering - Basic understanding of the software development lifecycle and software engineering best practices, including code management (Git/GitHub) and CI/CD pipelines. Ability to develop automation using scripting languages such as PowerShell, Python, or Bash.
  • Communication and Teamwork - Ability to communicate in a clear and concise manner, consider the views and concerns of others, achieve objectives through teamwork, and deliver consistently on commitments.
  • Agile and DevOps - Fundamental understanding of Agile methodologies (e.g., SAFe PI planning, scrum ceremonies, kanban) and DevOps practices (e.g., CI/CD, IaC, containerization, and orchestration).
  • Autonomous Motivation - Ability to work independently with minimal guidance to achieve defined technical / business objectives. Excellent time management, organizational, crisis management, and problem-solving skills are essential. Ability to grasp new technologies and translate them into business value.
  • Key Skills: CCNA, Network+, Security+, Microsoft Azure Fundamentals or Microsoft Azure Administrator, Automation with Ansible
Contacts

If you are interested in applying for this role, please send your updated resume to tpjobs@techpeople.us

Employment Type
Contractor
Duration of employment
12 - 18 mos
Industry
Oil and Gas
Job Location
Houston, TX (Hybrid / Work-from-Home) / C2C
Working Hours
Central Time USA
Date posted
September 1, 2023