Most recent job postings at sre
via CloudMR posted_at: 7 days ago schedule_type: Full-time
As a GCP Site Reliability Engineer, you will play a critical role in ensuring the reliability, availability, and performance of cloud-based applications and services on the Google Cloud Platform. You will collaborate with cross-functional teams to design, build, and maintain highly scalable and fault-tolerant systems.

Responsibilities:
• Design, implement, and manage scalable and reliable infrastructure on GCP, utilizing services such as Compute Engine, Kubernetes Engine, Cloud Functions, and more.
• Develop and implement monitoring, alerting, and incident management systems to proactively identify and address potential issues.
• Automate infrastructure provisioning, configuration management, and deployment processes using tools like Terraform, Ansible, or Deployment Manager.
• Collaborate with development teams to ensure optimal application performance, scalability, and availability.
• Implement and enhance logging, tracing, and observability solutions to gain insights into system behavior and performance.
• Participate in capacity planning and performance optimization efforts to ensure efficient resource utilization.
• Respond to and troubleshoot production incidents, driving root cause analysis and implementing preventive measures.

Requirements:
• Bachelor's degree in Computer Science, Engineering, or a related field.
• Strong experience as a Site Reliability Engineer, with expertise in Google Cloud Platform (GCP) services.
• In-depth knowledge of GCP services, including Compute Engine, Kubernetes Engine, Cloud Functions, Cloud Storage, and related technologies.
• Proficiency in infrastructure-as-code tools such as Terraform, Ansible, or Deployment Manager.
• Experience with containerization and orchestration using Docker and Kubernetes.
• Strong understanding of monitoring and observability tools, such as Stackdriver, Prometheus, or ELK stack.
• Familiarity with incident management, change management, and configuration management practices.
• Experience with scripting and automation using languages like Bash, Python, or Ruby.

If you are passionate about Site Reliability Engineering and GCP, and if you thrive in a collaborative and challenging environment, we want to hear from you! Apply now to be part of our team and shape the future of reliable and scalable solutions on the Google Cloud Platform.
via Dice posted_at: 4 days ago schedule_type: Full-time
Job Description

Site Reliability Engineering at Ford Motor Company plays a critical role in maintaining and improving the reliability, scalability, and performance of our services. You will work closely with our development teams to build and maintain large-scale, distributed systems and ensure our products meet our high standards for availability and user experience.
• Write, configure, and deploy code that improves service reliability for existing or new systems; set standard for others with respect to code quality
• Provide helpful and actionable feedback and review for code or production changes
• Drive repair/optimization of complex systems with consideration towards a wide range of contributing factors
• Lead debugging, troubleshooting, and analysis of service architecture and design
• Participate in on-call rotation
• Write documentation: design, system analysis, runbooks, playbooks. Provide design feedback and uplevel design skills of others.
• Implement and manage monitoring solutions using Dynatrace, Splunk, and OpenTelemetry to ensure visibility and proactive issue detection across our platforms.
• Work within Google Cloud Platform infrastructure, optimizing performance, and cost, and scaling resources to meet demand.
• Collaborate with development teams to enhance system reliability and performance, applying a platform engineering mindset to system administration tasks.
• Develop and maintain automated solutions for operational aspects such as on-call monitoring, performance tuning, and disaster recovery.
• Troubleshoot and resolve issues in our dev, test, and production environments.
• Participate in postmortem analysis and create preventative measures for future incidents.
• Bachelor's degree in Computer Science, Engineering, or equivalent experience.
• 3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
• Strong experience with monitoring and observability tools, particularly Dynatrace and OpenTelemetry.
• Proficient with cloud services, with a strong preference for Google Cloud Platform (GCP) experience.
• Solid programming skills in Java, with a good understanding of software development best practices.
• Experience managing and optimizing PostgreSQL databases.
• Familiarity with front-end development frameworks, particularly React.
• Ability to debug, optimize code, and automate routine tasks.
• Strong problem-solving skills and the ability to work under pressure in a fast-paced environment.
• Excellent verbal and written communication skills.

You may not check every box, or your experience may look a little different from what we've outlined, but if you think you can bring value to Ford Motor Company, we encourage you to apply!

As an established global company, we offer the benefit of choice. You can choose what your Ford future will look like: will your story span the globe, or keep you close to home? Will your career be a deep dive into what you love, or a series of new teams and new skills? Will you be a leader, a changemaker, a technical expert, a culture builder...or all of the above? No matter what you choose, we offer a work life that works for you, including:

Immediate medical, dental, and prescription drug coverage

Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more

Vehicle discount program for employees and family members, and management leases

Tuition assistance

Established and active employee resource groups

Paid time off for individual and team community service

A generous schedule of paid holidays, including the week between Christmas and New Year's Day

Paid time off and the option to purchase additional vacation time.

For a detailed look at our benefits, click here: Benefit Summary

Visa sponsorship is available for this position.

SOUTHEAST MI RESIDENTS: Please note, this job is posted as remote unless the selected candidate lives within 50 miles of Dearborn, MI. We request the candidate to be onsite 1-2 days a week.

Candidates for positions with Ford Motor Company must be legally authorized to work in the United States. Verification of employment eligibility will be required at the time of hire.

We are an Equal Opportunity Employer committed to a culturally diverse workforce. All qualified applicants will receive consideration for employment without regard to race, religion, color, age, sex, national origin, sexual orientation, gender identity, disability status or protected veteran status. In the United States, if you need a reasonable accommodation for the online application process due to a disability, please call 1-.

via Glassdoor posted_at: 24 days ago schedule_type: Full-time work_from_home: 1
Note: Google’s hybrid workplace includes remote roles.

Remote location: Texas, USA
Minimum qualifications:
• Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
• 5 years of experience with software development in one or more programming languages.
• 8 years of experience with data structures or algorithms.
• 3 years of experience leading projects and designing, analyzing, and troubleshooting distributed systems.

Preferred qualifications:
• Master's degree in Computer Science or Engineering.

About the job

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally-visible systems, have reliability, uptime appropriate to customers' needs, and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.

Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.

The US base salary range for this full-time position is $189,000-$284,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities
• Engage in and improve the whole lifecycle of services—from inception and design, through to deployment, operation and refinement.
• Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
• Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
• Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
• Practice sustainable incident response and blameless postmortems.

Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
via Ladders schedule_type: Full-time
Minimum qualifications:
• Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
• Experience with data structures/algorithms and software development in one or more programming languages.

Preferred qualifications:
• Master's degree in Computer Science or Engineering.

About The Job

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally-visible systems, have reliability, uptime appropriate to customers' needs, and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.

Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

With your technical expertise you will manage project priorities, deadlines, and deliverables. You will design, develop, test, deploy, maintain, and enhance software solutions.

The US base salary range for this full-time position is $112,000-$162,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities
• Write product or system development code.
• Review code developed by other engineers and provide feedback to ensure best practices (e.g., style guidelines, checking code in, accuracy, testability, and efficiency).
• Contribute to existing documentation or educational content and adapt content based on product/program updates and user feedback.
• Triage product or system issues and debug/track/resolve by analyzing the sources of issues and the impact on hardware, network, or service operations and quality.
• Participate in, or lead, design reviews with peers and stakeholders to decide amongst available technologies.

Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
via ZipRecruiter schedule_type: Full-time
Lead Site Reliability Engineer Google Cloud
Location: Alpharetta GA
Mandatory Skill: SRE
Lead Needed: GCP SRE
• GCP certification preferred (Associate Cloud Engineer, DevOps, or Architect)
• Two plus years of experience in designing and deploying enterprise solutions in Google Cloud
• Experience in configuring, building, and supporting apps and operations in GCP
• Good knowledge of GCP cloud infrastructure (various cloud services, when to use what, cloud security, IAM, service accounts, VPC), and provisioning tools like Terraform/Ansible
• Experience in continuous integration tools (Jenkins, SonarQube, JIRA, Nexus, Git/Bitbucket, Maven, Gradle, Rundeck)
• System administration skills, including automation and orchestration of Linux/Windows using Chef, Puppet, Ansible, SaltStack and/or containers (Docker, Kubernetes, etc.)
• Knowledge of Kubernetes, its deployment, and Google Kubernetes Engine
• Experience in configuring and administering application servers (Tomcat, NGINX)
• Experience in scripting languages such as Unix shells, Python, and Bash
• Use analytic skills to interpret complex information and adapt; participate with the Enterprise Architecture team to evaluate solution design and also collaborate and provide feedback to the product development team; participate in projects with other IT professionals, deliver quality applications and components within scope, on time, and within budget.
• Participate in business continuous improvement efforts outside of the customer focused teams; and provide guidance and direction to distributed teams, including onshore and offshore resources.
• Responsible for creating and maintaining all technical artifacts on the Platform.
• Provide technical guidance to onshore/offshore development teams
via LinkedIn schedule_type: Contractor
Location: Alpharetta GA

Mandatory Skill: SRE

Lead Needed: GCP SRE
• GCP certification preferred (Associate Cloud Engineer, DevOps, or Architect)
• Two plus years of experience in designing and deploying enterprise solutions in Google Cloud
• Experience in configuring, building, and supporting apps and operations in GCP
• Good knowledge of GCP cloud infrastructure (various cloud services, when to use what, cloud security, IAM, service accounts, VPC), and provisioning tools like Terraform/Ansible
• Experience in continuous integration tools (Jenkins, SonarQube, JIRA, Nexus, Git/Bitbucket, Maven, Gradle, Rundeck)
• System administration skills, including automation and orchestration of Linux/Windows using Chef, Puppet, Ansible, SaltStack and/or containers (Docker, Kubernetes, etc.)
• Knowledge of Kubernetes, its deployment, and Google Kubernetes Engine
• Experience in configuring and administering application servers (Tomcat, NGINX)
• Experience in scripting languages such as Unix shells, Python, and Bash
• Use analytic skills to interpret complex information and adapt; participate with the Enterprise Architecture team to evaluate solution design and also collaborate and provide feedback to the product development team; participate in projects with other IT professionals, deliver quality applications and components within scope, on time, and within budget.
• Participate in business continuous improvement efforts outside of the customer focused teams; and provide guidance and direction to distributed teams, including onshore and offshore resources.
• Responsible for creating and maintaining all technical artifacts on the Platform.
• Provide technical guidance to onshore/offshore development teams
via LinkedIn posted_at: 5 days ago schedule_type: Full-time
Note: By applying to this position you will have an opportunity to share your preferred working location from the following: San Francisco, CA, USA; San Bruno, CA, USA; Pittsburgh, PA, USA; Cambridge, MA, USA.

Minimum qualifications:
• Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
• 5 years of experience with software development in one or more programming languages.
• 5 years of experience with data structures or algorithms.
• 3 years of experience in designing, analyzing, and troubleshooting large-scale distributed systems, and 2 years of experience leading projects and providing technical leadership.

Preferred qualifications:
• Master's degree in Computer Science or Engineering.

About The Job

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally-visible systems, have reliability, uptime appropriate to customers' needs, and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.

Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.

The US base salary range for this full-time position is $161,000-$239,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities
• Engage in and improve the whole lifecycle of services—from inception and design, through to deployment, operation and refinement.
• Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
• Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
• Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
• Practice sustainable incident response and blameless postmortems.

Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.
via Salary.com posted_at: 7 days ago schedule_type: Full-time
Job Details

Position: Site Reliability Engineer with Google Cloud Platform

Location: Dearborn, MI (Hybrid)

Job Type: W2

Top Requirements:
• Bachelor's degree in Computer Science or related field
• 2 years of experience as a Site Reliability Engineer including at least one SRE certification
• 2 years of expertise in observability stacks such as Prometheus, Grafana, and Dynatrace
• Knowledge of Google Cloud Platform services
• Experience building software and computer systems using a variety of languages (C/C++/C#, Java, JavaScript, Ruby)
• Dice Id: 10244982
• Position Id: 8313082
via Pittsburgh, PA - Geebo posted_at: 3 days ago schedule_type: Full-time salary: 20–28 an hour
Minimum Qualifications:
• 10 years of work experience in a production environment.
• Experience programming in C, C++, Java, Python, Go, Perl, and/or Ruby.
• Experience architecting, developing, and troubleshooting systems.
• Experience with algorithms and data structures and/or Unix/Linux systems internals (e.g., filesystems, system calls), and administration.

Preferred Qualifications:
• Bachelor's degree in Computer Science, similar technical field of study, or equivalent practical experience.
• 10 years of experience in computing, distributed systems, storage, or networking.
• Experience designing, analyzing, and troubleshooting large-scale distributed systems.
• Ability to have a sense of ownership and drive.
• Systematic problem-solving approach with excellent communication skills.

About the job

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both our internally critical and our externally-visible systems, have reliability, uptime appropriate to customers' needs, and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.

Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you'll have the opportunity to manage the complex challenges of scale which are unique to Google Cloud, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. SRE's culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

We are the team responsible for the systems and business practices for managing compute and storage resources across the Google Fleet. We sustain innovation through simple, reliable, and efficient use of Google's fleet. As a Senior Staff Engineer, you will provide technical leadership across multiple Site Reliability Engineering teams and Product Engineering teams on a global scale.

Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.

Responsibilities
• Lead designs of major software components, systems, and features to improve the availability, scalability, latency, and efficiency of Google's services.
• Lead sustainable incident response, postmortems, and production improvements that result in direct business opportunities for Google.
• Provide guidance to other team members on managing availability and performance of mission critical services, building automation to prevent problem recurrence, and building automated responses for non-exceptional service conditions.
• Mentor and train other team members on design techniques and coding standards, and cultivate innovation and collaboration.
• Manage individual project priorities, deadlines, and deliverables.
Salary Range:
$150K -- $200K
Minimum Qualification
DevOps & Site Reliability
Estimated Salary: $20 to $28 per hour based on qualifications
via Huntington Bank schedule_type: Full-time
Description

Summary...

The Google Cloud Platform (GCP) Site Reliability Engineer (SRE) Manager is responsible for supporting the GCP framework and consumers of the platform. The position reports into the Chief Development Office (CDO) and will manage a team of SREs that support GCP.

Job Description

The Google Cloud Platform (GCP) Site Reliability Engineer (SRE) Manager is responsible for supporting the GCP framework and consumers of the platform. The SRE Manager will lead a team of SREs to develop Infrastructure as Code (IaC) to provide platform, infrastructure, observability, and security capabilities via Terraform and pipeline automation. The qualified candidate will collaborate with the CDO, Application, Incident, Security, and Change Management teams to manage the ITIL process, reduce toil, enhance reliability, and drive innovation for GCP. The candidate will join a team of developers whose goal is to enable via automation and a culture of support, continuous improvement, and learning.

Responsibilities:
• Manage GCP’s SRE team and discipline, maintain service levels, manage cost, and enhance operations
• Manage Stack Overflow channel, GCP releases and Disaster Recovery exercises
• Manage Platform RBAC, Firewall and User Access certifications
• Support GCP’s ServiceNow platform and application configurations
• Develop SRE strategies, best practices, and knowledge base
• Build monitoring/alerting/availability/uptime into product and reduce toil
• Participate in the DevSecOps model to build, test, and implement SRE cloud solutions via IaC
• Collaborate with Incident/CSOC/SRE teams to troubleshoot issues and perform root cause analysis
• Provide 24x7 support for the GCP and coordinate on-call rotations
• Conduct periodic blameless incident retrospectives and focus on continuous improvement
• Conduct training sessions and simulated game days
• Experience with scripting and programming languages and concepts
• Demonstrate knowledge of GCP, CLI, services and integrations
• Demonstrate knowledge of DevSecOps tool chains and processes
• Demonstrate knowledge of IaC software: Terraform, CLI, CDM, CFT, ARM, etc.
• Demonstrate knowledge of Security as Code principles, policy, best practices, and tools
• Demonstrate knowledge of Credential, Certificate and Encryption best practices, rotation, and policies
• Experience using monitoring tools like Cloud Logging, Splunk, Dynatrace to evaluate system health, research issues, identify root causes and provide solution options
• Additional duties as required

Basic Qualifications:
• Bachelor's Degree
• 7+ years of SRE experience with GCP, AWS, and/or Azure

Preferred Qualifications:
• Minimum of 2 years of supporting IaC automation, preferably Terraform
• Minimum of 2 years of coding/scripting experience
• Self-motivated problem solver
• Experience troubleshooting cloud-based technologies
• Cloud (GCP, AWS, Azure) and/or IaC (Terraform) certifications and/or work experience
• Experience in Agile delivery, Azure DevOps Services, CI/CD Pipelines, Git, Snyk, Cyberark, Splunk, etc.
• Experience with cloud security, IAM, security scans, and custom policies
• Full stack engineering knowledge – application, network, infrastructure, and security
• Understanding of containers and serverless computing concepts
• Background in application, database, and infrastructure monitoring tools
• Willingness to guide others and outstanding communication skills
• Familiarity with the financial industry

Exempt Status: (Yes = not eligible for overtime pay) (No = eligible for overtime pay)
Yes

Workplace Type:

Huntington is an equal opportunity and affirmative action employer and is committed to providing equal employment opportunities for all regardless of race, color, religion, sex, national origin, age, disability, sexual orientation, veteran status, gender identity and expression, genetic information, or any other basis protected by local, state, or federal law.

Tobacco-Free Hiring Practice: Visit Huntington's Career Web Site for more details.

Agency Statement: Huntington does not accept solicitation from Third Party Recruiters for any position