Senior Site Reliability Engineer (AWS, AI/ML, & APM)

Location: US-Remote
ID: 2025-8855

Job Summary

Granicus is seeking an experienced and highly skilled Senior Site Reliability Engineer (SRE) to join our SRE team. As a Senior SRE, you will play a pivotal role in ensuring the reliability, scalability, and performance of our services. You will lead efforts to build and maintain robust infrastructure, automate processes, and guide the team in adopting site reliability best practices.

What Your Impact Will Look Like

  • On-call Production Support: Provide production support during your assigned shifts on the team's on-call roster.
  • Ticket Resolution: When not on call, work on tickets raised by customers and by internal engineering and implementation teams. For example, a client may ask for a data correction on the database server that cannot be made through the web interface.
  • SRE Backlog: Work on items in the SRE team's backlog.
  • Monitor and Maintain Systems: Continuously monitor the health and performance of our services, systems, and infrastructure. Respond to alerts and incidents promptly to ensure high availability.
  • Automate Processes: Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention.
  • Incident Management: Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence.
  • System Improvements: Participate in designing and implementing system improvements to enhance reliability, scalability, and performance.
  • Collaboration: Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes.
  • Documentation: Create and maintain documentation for processes, procedures, and troubleshooting guides to ensure knowledge sharing within the team.
  • Capacity Planning: Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth.
  • Security: Implement and adhere to security best practices to protect our systems and data.

You Will Love This Job If You Have

  • 5+ years in site reliability engineering, system administration, or a similar role, with a proven track record of managing large-scale, high-availability systems. Experience supporting AI/ML infrastructure, including model deployment, inference optimization, and integration with services such as AWS Bedrock, is highly desirable.
  • Expertise in Linux/Unix systems and cloud platforms (AWS, Azure, or Google Cloud).
  • Strong proficiency in scripting languages (Python, Bash, Ruby) and programming languages (Go, Java, C++).
  • Familiarity with AI/ML operations, including model lifecycle management, vector databases, and inference performance tuning.
  • Experience with the ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging, monitoring, and observability.
  • Experience with configuration management tools (Ansible, Chef, Puppet).
  • Exposure to AI/ML toolchains, including AWS Bedrock, SageMaker, and LLMOps frameworks.
  • Certifications: Relevant certifications such as AWS Certified DevOps Engineer, AWS Certified Machine Learning – Specialty, Google Cloud Professional DevOps Engineer, or similar are a plus.

The Benefits

At Granicus, we offer a comprehensive and flexible benefits package designed to support your well-being, growth, and work-life balance—starting from day one.

Here’s what you can expect as a U.S.-based team member:

Flexibility & Balance
  • Flexible Time Off – Take the time you need to rest, recharge, and live your life.
  • Company-Wide Wellbeing Days – Paid days off to unplug and focus on your mental health.
  • Work From Home Reimbursement – Support a productive home office environment.

Health & Wellness
  • Multiple Health Plan Options – Including a 100% employer-paid plan.
  • Employer HSA Contributions – When enrolled in a High-Deductible Health Plan.
  • Fitness Reimbursement Program – Stay active, your way.
  • On-Demand Mental Health Support – Access to Headspace and other wellness tools.

Family & Future
  • Paid Parental Leave – For both birthing and non-birthing parents.
  • Traditional & Roth 401(k) – With a generous company match.
  • Life & AD&D Insurance – 100% employer-paid coverage for peace of mind.

Growth & Recognition
  • Online Learning Platforms – Fuel your professional development.
  • Competitive Salary & Bonuses – Your contributions are valued and rewarded.

Equal Opportunity Employer

Granicus is committed to providing equal employment opportunities. All qualified applicants and employees will be considered for employment and advancement without regard to race, color, religion, creed, national origin, ancestry, sex, gender, gender identity, gender expression, physical or mental disability, age, genetic information, sexual or affectional orientation, marital status, status with regard to public assistance, familial status, military or veteran status or any other status protected by applicable law.
