Description

CUJO AI is the leading provider of artificial intelligence solutions for network service providers. We use machine learning and real-world data to develop and deliver cutting-edge cybersecurity, device intelligence, and parental controls that enable network operators to offer better and safer connected experiences to millions of households.

We are seeking a highly skilled Ops Engineer with a software-oriented background and proven production experience managing large-scale systems. The ideal candidate will have a background in backend development, particularly in Python, and the ability to troubleshoot and resolve application bugs efficiently.

This role requires a proactive individual who can work collaboratively with cross-functional teams to ensure the reliability, performance, and scalability of our production environments. The candidate should also be an expert in AWS cloud services, monitoring, and alerting systems, and be familiar with Tier 2 technical support.

Your responsibilities

  • Apply SRE principles and best practices to enhance system reliability, availability, and performance.
  • Develop and maintain monitoring, alerting, and incident response systems to detect and resolve issues proactively.
  • Conduct root cause analysis (RCA) for incidents and implement corrective and preventive measures.
  • Manage and optimize large-scale production systems, ensuring high availability and scalability.
  • Troubleshoot and resolve application bugs, working closely with development teams.
  • Develop and maintain backend components using Python and other relevant technologies.
  • Ensure code quality and performance through code reviews, testing, and optimization.
  • Implement and manage comprehensive monitoring and alerting solutions to ensure system health and performance.
  • Use tools such as Prometheus, Grafana, the ELK stack, and CloudWatch to track metrics and respond to incidents.
  • Develop automated responses to common alerts and incidents to minimize downtime.
  • Work closely with development, QA, and operations teams to ensure seamless integration and delivery of software releases.
  • Communicate effectively with stakeholders to understand requirements and provide technical guidance.
  • Document processes, procedures, and technical specifications to maintain a knowledge base.
  • Diagnose and resolve intricate infrastructure and application issues.
  • Contribute to the development and enforcement of operational policies and procedures.
  • Oversee and maintain monitoring, logging, and alerting systems.
  • Continuously seek opportunities to optimize the production environment, focusing on both infrastructure and application code.
  • Identify system weaknesses, fix bugs, and reduce system latency to lower operating costs.
  • Ensure the serviceability, monitoring, and maintainability of the production environment, a core part of the Ops Engineer role.

Preferred competencies

  • Excellent problem-solving skills and attention to detail.
  • Strong communication and collaboration skills.
  • Knowledge of Tier 2 technical support processes and best practices.
  • Strong proficiency in backend development using Python.
  • Expert knowledge of AWS cloud services and infrastructure.
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack, CloudWatch).
  • Proven experience as a Site Reliability Engineer (SRE) or in a similar role managing large-scale production systems.
  • Team player, self-learner, and delivery-oriented.

Benefits and Perks

  • Flexible working hours and your choice of location: home office or a CUJO AI office (in Kaunas or Vilnius)
  • Modern development equipment
  • Opportunity to learn from highly skilled colleagues
  • Ambitious projects and a meaningful cause
  • Team-building and company events
  • Conferences, training, books – anything for your development
  • 100 hours/year for training during paid business hours
  • Multiple bonus schemes, such as performance, AWS certification, and invention bonuses
  • A benefits package that includes lunch in the office, monthly Wolt coupons, recreational and health insurance benefits, and more!