[GenAI Core] - Staff Site Reliability Engineer
Job Overview
- Job Title: Staff Site Reliability Engineer (GenAI Core)
- Company: Stone (via LinkedIn)
- Location: Remote (Brazil)
- Job Type: Full-Time
- Category: DevOps Engineer
- Date Posted: 2025-07-18
Role Summary
- Lead the development and implementation of Site Reliability Engineering (SRE) solutions for the GenAI Core team at Stone Tech.
- Collaborate with a multidisciplinary team to define, design, and implement scalable, reliable, and efficient systems.
- Ensure high system availability, fault tolerance, and performance optimization.
- Contribute to the development of SRE best practices and standards within the organization.
Enhancement Note: This role supports generative AI (GenAI) systems, so it calls for a strong SRE background and a deep understanding of modern web technologies and infrastructure.
Primary Responsibilities
- System Design & Architecture: Design and implement scalable, reliable, and efficient systems using AWS services and best practices.
- Monitoring & Alerting: Develop and maintain monitoring and alerting systems to ensure high system availability and performance.
- Incident Management: Lead incident management processes, including on-call rotations and post-mortem analysis.
- Collaboration & Knowledge Sharing: Work closely with development teams to ensure system reliability and performance, and share SRE best practices.
- Automation & Tooling: Develop and maintain automation tools and scripts to streamline SRE processes and workflows.
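Availability work like this usually starts from a service level objective (SLO) and its error budget, which is the downtime the service may accumulate in a window. The minimal Python sketch below shows that arithmetic. The 30-day window and the 99.9% target are example values only, and the function names are hypothetical, not part of any Stone tooling.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Downtime allowed by an availability SLO over a window.

    slo_target is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The budget is the share of the window the service may be unavailable.
    return window * (1.0 - slo_target)


def budget_remaining(slo_target: float, window: timedelta, downtime: timedelta) -> float:
    """Fraction of the error budget left; a negative value means the SLO was missed."""
    return 1.0 - downtime / error_budget(slo_target, window)


if __name__ == "__main__":
    thirty_days = timedelta(days=30)
    print(error_budget(0.999, thirty_days))  # 0:43:12
    # 10 minutes of downtime spends roughly 23% of a 99.9% monthly budget.
    print(round(budget_remaining(0.999, thirty_days, timedelta(minutes=10)), 3))  # 0.769
```

A 99.9% monthly target leaves about 43 minutes of downtime. That is why alerting and incident response speed matter so much at this level.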
Enhancement Note: This role requires a strong background in system design, monitoring, and automation to ensure the reliability and performance of AI systems.
Skills & Qualifications
Education: Bachelor's degree in Computer Science, Computer Engineering, or a related field.
Experience: 5+ years of experience in SRE, DevOps, or a related role, with a strong focus on AWS services and best practices.
Required Skills:
- Proficient in AWS services and best practices
- Experience with CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI, CircleCI)
- Strong background in system design and architecture
- Familiarity with monitoring tools (e.g., Datadog, Prometheus, Grafana)
- Experience with incident management and on-call rotations
- Excellent communication and collaboration skills
- Proficiency in Python and/or TypeScript
Preferred Skills:
- Experience with Terraform and infrastructure as code (IaC) tools
- Familiarity with Kubernetes and containerization technologies (e.g., Docker, Podman)
- Experience with AI and machine learning technologies
- Knowledge of Agile methodologies and software development best practices
Enhancement Note: This role requires a strong background in AWS, SRE, and DevOps practices, with a preference for candidates with experience in AI and machine learning technologies.
Web Portfolio & Project Requirements
Portfolio Essentials:
- Detailed documentation of SRE projects, including system design, monitoring, and automation strategies.
- Case studies demonstrating the impact of SRE solutions on system reliability, performance, and user experience.
- Examples of incident management processes and post-mortem analysis.
Technical Documentation:
- Detailed system design and architecture documentation, including diagrams and flowcharts.
- Monitoring and alerting configuration files and scripts.
- Automation scripts and tools for SRE processes and workflows.
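Post-mortem material in a portfolio often summarizes response metrics across incidents. The sketch below is a minimal example built on a hypothetical `Incident` record with detection, acknowledgement, and resolution timestamps. It computes mean time to acknowledge (MTTA) and mean time to resolve (MTTR). It measures both from detection, which is an assumption, since teams define these metrics differently. The sample timestamps are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical incident record; real incident tools export different fields."""
    detected_at: datetime
    acknowledged_at: datetime
    resolved_at: datetime


def incident_metrics(incidents: list[Incident]) -> dict[str, timedelta]:
    """Return mean time to acknowledge (MTTA) and mean time to resolve (MTTR).

    Both are measured from detection; adjust if your team measures from impact start.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    n = len(incidents)
    # sum() needs a timedelta start value because it defaults to the integer 0.
    mtta = sum((i.acknowledged_at - i.detected_at for i in incidents), timedelta()) / n
    mttr = sum((i.resolved_at - i.detected_at for i in incidents), timedelta()) / n
    return {"mtta": mtta, "mttr": mttr}


if __name__ == "__main__":
    day = datetime(2025, 7, 21)  # made-up sample data
    history = [
        Incident(day.replace(hour=10), day.replace(hour=10, minute=5), day.replace(hour=10, minute=45)),
        Incident(day.replace(hour=14), day.replace(hour=14, minute=15), day.replace(hour=15, minute=15)),
    ]
    metrics = incident_metrics(history)
    print(metrics["mtta"], metrics["mttr"])  # 0:10:00 1:00:00
```

Stating the metric definitions next to the numbers makes a case study easier to compare with other teams' figures.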
Enhancement Note: This role requires a strong portfolio demonstrating the candidate's ability to design, implement, and document SRE solutions for complex systems.
Compensation & Benefits
Salary Range: Competitive salary based on experience and location, with regional adjustments for the Brazilian market.
Benefits:
- Health Plan
- Dental Plan
- Digital Hospital (Vitta)
- Meal Voucher
- Remote Assistance (exclusive for remote positions)
- Flexible Hours
- Education Benefit (internal platform with access to a wide range of books, podcasts, training courses, and video lessons)
- Gympass
- Childcare Assistance
- Profit Sharing (PLR)
- Life Insurance
- Transportation Voucher (exclusive for on-site positions)
Working Hours: Full-time position with flexible hours and remote work arrangements.
Enhancement Note: No salary figures are published; compensation and benefits are described as competitive and tailored to the Brazilian market, with adjustments for experience and location.
Team & Company Context
Company Culture: Stone Tech fosters a dynamic, agile environment with a strong focus on collaboration, innovation, and continuous learning. The company values transparency, ownership, and a customer-centric approach.
Team Structure: The GenAI Core team consists of multidisciplinary professionals, including AI specialists, software engineers, and SRE engineers. The team follows Agile methodologies and works in close collaboration with other departments.
Development Methodology: The team employs Agile/Scrum methodologies, with sprint planning, code reviews, and continuous integration and deployment (CI/CD) pipelines.
Company Website: https://www.stone.co/
Enhancement Note: Stone Tech's culture emphasizes collaboration, innovation, and continuous learning, with a strong focus on customer-centric solutions and Agile methodologies.
Career & Growth Analysis
Career Level: Staff Site Reliability Engineer (GenAI Core) is a senior-level role responsible for leading SRE initiatives and collaborating with cross-functional teams to ensure system reliability and performance.
Reporting Structure: This role is expected to report to the Head of SRE or an equivalent leadership position within the organization.
Technical Impact: The candidate will have a significant impact on the reliability, performance, and scalability of AI systems, ensuring high availability and user experience.
Growth Opportunities:
- Technical Leadership: Develop and mentor junior SRE engineers and contribute to the development of SRE best practices and standards.
- Technical Specialization: Gain expertise in AI and machine learning technologies and their application to SRE.
- Career Progression: Transition to a senior leadership role, such as Head of SRE or a similar position, with increased responsibilities and strategic decision-making.
Enhancement Note: This role offers significant growth opportunities, including technical leadership, specialization in AI and machine learning, and career progression to senior leadership positions.
Work Environment
Office Type: Remote-first work environment with occasional in-person meetings and team-building events.
Office Location(s): Stone Tech has offices in São Paulo, Brazil, but this role can be performed remotely from anywhere in Brazil.
Workspace Context:
- Remote work arrangements with flexible hours and asynchronous communication.
- Access to remote collaboration tools, such as Slack, Microsoft Teams, and Google Workspace.
- Occasional in-person meetings and team-building events to foster collaboration and camaraderie.
Work Schedule: Flexible work schedule with core hours and asynchronous communication to accommodate different time zones and work preferences.
Enhancement Note: Stone Tech's remote-first work environment emphasizes flexibility, collaboration, and asynchronous communication to accommodate diverse work preferences and time zones.
Application & Technical Interview Process
Interview Process:
- Technical Phone Screen: Assessment of AWS, SRE, and DevOps skills, as well as problem-solving and communication abilities.
- On-site Technical Deep Dive: In-depth evaluation of system design, architecture, and automation skills, as well as incident management and monitoring expertise.
- Behavioral Interview: Assessment of cultural fit, collaboration, and leadership skills, with a focus on Stone Tech's core values.
- Final Interview: Review of the candidate's overall fit for the role, with input from key stakeholders.
Portfolio Review Tips:
- Highlight SRE projects that demonstrate system design, monitoring, and automation skills.
- Include case studies that showcase the impact of SRE solutions on system reliability, performance, and user experience.
- Emphasize incident management processes and post-mortem analysis to demonstrate problem-solving and communication abilities.
Technical Challenge Preparation:
- Brush up on AWS services and best practices, with a focus on SRE and DevOps.
- Review system design principles and architecture patterns for distributed systems.
- Prepare for incident management and monitoring scenarios, focusing on problem-solving and communication skills.
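Monitoring scenarios often come down to alert design. One widely used approach, described in Google's Site Reliability Workbook, pages on error-budget burn rate across two windows rather than on raw error counts. The sketch below is minimal and assumes a 99.9% SLO over 30 days, a 1-hour long window, a 5-minute short window, and the commonly cited 14.4 threshold. The error ratios would come from a monitoring system such as Prometheus or Datadog; here they are passed in as plain numbers, and the function names are hypothetical.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_ratio / (1.0 - slo_target)


# Spending 2% of a 30-day (720-hour) budget in 1 hour is a burn rate of 0.02 * 720 = 14.4.
PAGE_THRESHOLD = 14.4


def should_page(
    long_window_error_ratio: float,   # e.g. error ratio over the last 1 hour
    short_window_error_ratio: float,  # e.g. error ratio over the last 5 minutes
    slo_target: float = 0.999,
    threshold: float = PAGE_THRESHOLD,
) -> bool:
    """Page only if both windows burn faster than the threshold.

    The long window keeps brief blips from paging; the short window lets the
    alert clear soon after the problem stops.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) > threshold
        and burn_rate(short_window_error_ratio, slo_target) > threshold
    )


if __name__ == "__main__":
    print(should_page(0.02, 0.03))   # True: sustained and still ongoing
    print(should_page(0.02, 0.001))  # False: the last 5 minutes have recovered
```

Requiring both windows trades a little detection speed for far fewer stale or noisy pages. That tradeoff is worth being able to explain in an interview.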
ATS Keywords: AWS, SRE, DevOps, System Design, Architecture, Monitoring, Alerting, Incident Management, Automation, CI/CD, Python, TypeScript, Terraform, Kubernetes, AI, Machine Learning, Agile, Scrum, Collaboration, Communication, Leadership, Mentoring, Career Progression
Enhancement Note: The interview process focuses on assessing the candidate's technical skills in SRE, DevOps, and AI, as well as their cultural fit and problem-solving abilities.
Technology Stack & Web Infrastructure
AWS Services:
- EC2, RDS, and other compute, database, and storage services
- Elastic Load Balancing (ELB), including Application Load Balancers (ALB)
- Elastic Kubernetes Service (EKS) and Elastic Container Registry (ECR)
- CloudFormation and AWS Lambda
- Amazon Route 53 and API Gateway
- AWS Certificate Manager (ACM) and AWS Secrets Manager
- Amazon CloudWatch and AWS X-Ray
Monitoring & Alerting Tools:
- Datadog
- Prometheus and Grafana
- ELK Stack (Elasticsearch, Logstash, Kibana)
Automation & CI/CD Tools:
- Jenkins, GitLab CI, or CircleCI
- Terraform and infrastructure as code (IaC) tools
- AWS CodePipeline and AWS CodeBuild
Programming Languages:
- Python
- TypeScript
- Bash and PowerShell scripting
Enhancement Note: This role requires a strong background in AWS services, monitoring tools, and automation technologies to ensure the reliability and performance of AI systems.
Team Culture & Values
Stone Tech Values:
- Own It: Take ownership and responsibility for your work, and strive for continuous improvement.
- Live the Ride: Embrace challenges and learn from failures, focusing on delivering value and innovation.
- No Bullshit: Communicate openly and honestly, and value transparency and feedback.
- Team Play: Collaborate effectively with cross-functional teams, and prioritize collective success over individual achievements.
- The Reason: Focus on the impact of your work on customers and users, and strive to make a positive difference in their lives.
Collaboration Style:
- Agile and dynamic work environment, with a focus on continuous learning and improvement.
- Cross-functional collaboration with development, design, and product teams.
- Knowledge sharing and mentoring, with a strong emphasis on technical excellence and best practices.
Challenges & Growth Opportunities
Technical Challenges:
- Design and implement scalable, reliable, and efficient systems using AWS services and best practices.
- Develop and maintain monitoring and alerting systems to ensure high system availability and performance.
- Lead incident management processes and post-mortem analysis to minimize downtime and impact on users.
- Collaborate with development teams to ensure system reliability and performance, and share SRE best practices.
Learning & Development Opportunities:
- Gain expertise in AI and machine learning technologies and their application to SRE.
- Develop leadership skills and mentor junior SRE engineers.
- Contribute to the development of SRE best practices and standards within the organization.
- Expand your knowledge of emerging technologies and trends in AI, machine learning, and SRE.
Enhancement Note: This role offers significant technical challenges and growth opportunities, including expertise in AI and machine learning, leadership development, and contributions to SRE best practices and standards.
Interview Preparation
Technical Questions:
- System Design: Describe your approach to designing scalable, reliable, and efficient systems using AWS services and best practices.
- Monitoring & Alerting: Explain your strategy for developing and maintaining monitoring and alerting systems to ensure high system availability and performance.
- Incident Management: Walk through your process for incident management, including on-call rotations, incident response, and post-mortem analysis.
- Automation & CI/CD: Discuss your experience with automation tools and CI/CD pipelines, and how you've used them to streamline SRE processes and workflows.
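When walking through on-call rotations, it can help to show how a schedule is derived. The minimal sketch below models a simple weekly round-robin rotation. The engineer names, start date, and 7-day shift length are hypothetical. Real schedules usually also handle overrides, handoff times, and secondary responders, which this ignores.

```python
from datetime import date, timedelta


def on_call_for(day: date, engineers: list[str], rotation_start: date, shift_days: int = 7) -> str:
    """Return who is on call on `day` in a simple round-robin rotation.

    Each engineer covers `shift_days` consecutive days, in list order, starting
    at `rotation_start`, and the list repeats once everyone has had a shift.
    """
    if not engineers:
        raise ValueError("rotation needs at least one engineer")
    if shift_days <= 0:
        raise ValueError("shift_days must be positive")
    if day < rotation_start:
        raise ValueError("day is before the rotation starts")
    shift_index = (day - rotation_start).days // shift_days
    return engineers[shift_index % len(engineers)]


if __name__ == "__main__":
    team = ["ana", "bruno", "carla"]  # hypothetical engineers
    start = date(2025, 7, 21)
    for week in range(4):
        d = start + timedelta(weeks=week)
        print(d, on_call_for(d, team, start))
```

Run as-is, this prints ana, bruno, carla, and then ana again for the four consecutive Mondays.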
Company & Culture Questions:
- Stone Tech Culture: Explain how you align with Stone Tech's core values, and provide examples of how you've demonstrated these values in your previous roles.
- Agile Methodologies: Describe your experience with Agile/Scrum methodologies, and how you've applied them to SRE and DevOps projects.
- Cross-Functional Collaboration: Discuss your experience working with cross-functional teams, and how you've ensured effective communication and collaboration to deliver successful projects.
Portfolio Presentation Strategy:
- SRE Projects: Highlight SRE projects that demonstrate your system design, monitoring, and automation skills, as well as your incident management and problem-solving abilities.
- Case Studies: Include case studies that showcase the impact of your SRE solutions on system reliability, performance, and user experience.
- Technical Documentation: Present detailed system design and architecture documentation, as well as monitoring and alerting configuration files and scripts.
Application Steps
To apply for this Staff Site Reliability Engineer (GenAI Core) position at Stone Tech:
- Customize Your Resume: Highlight your SRE, DevOps, and AI-related skills and experiences, and tailor your resume to the specific requirements of this role.
- Prepare Your Portfolio: Curate SRE projects that demonstrate your system design, monitoring, and automation skills, as well as your incident management and problem-solving abilities.
- Research Stone Tech: Familiarize yourself with Stone Tech's culture, values, and technology stack, and be prepared to discuss how your skills and experiences align with the company's mission and goals.
- Practice Technical Interview Questions: Brush up on your technical skills in SRE, DevOps, and AI, and practice answering interview questions to ensure you're well-prepared for the assessment.
Important Notice: This enhanced job description includes AI-generated insights and assumptions based on industry standards for web technology roles. All details should be verified directly with the hiring organization before making application decisions.
Application Requirements
Candidates should have over 5 years of experience in DevOps culture and AWS best practices, along with experience in CI/CD and monitoring applications. Familiarity with distributed software development and database management is also required.