Position: Site Reliability Engineer

Location: Santa Clara, CA

Duration: 1+ year

Customer: Ericsson through ITC InfoTech

Job Description:

Responsibilities:

  • Development and Operations (DevOps) subject matter expert for 24×7 SaaS operation
  • Work hand-in-hand with microservice software developers, architects, and field integration resources to architect and deliver Ericsson’s next generation TV platforms.
  • Contribute to the development of new tools and automation that ensure the service can be optimized and tuned with minimal human intervention.
  • Accountable for working upstream with microservice developers on monitoring, tools, and architecture to deliver security, reliability, manageability, and availability at scale
  • Point of escalation/decision maker on response level of incidents
  • Participate in the Core SRE on-call roster and respond with command-and-control incident management during high-priority events while maintaining internal and external SLAs
  • Act as Technical Duty Officer who leads the resolution effort for the most complex service problems, from the network layer to the application, at scale
  • Drive Problem Management/Retrospectives (“post mortems”)
  • Contribute to and maintain our knowledge base
  • Analyze trends and make recommendations in the areas of monitoring, incident and change management, cloud orchestration and support.
  • Contribute to the future growth of the team by conducting candidate screenings and assessments
  • Accountable for deploying services to production environments

Technologies:

  • Experience with Docker, SaltStack, Kubernetes, and similar orchestration tools
  • Knowledge of MongoDB and Cassandra databases, Kafka, and IIS servers on Azure/AWS/OpenStack
  • Azure, OpenStack, and AWS concepts and APIs
  • Experience designing, setting up, maintaining, and refining (noise reduction, auditing) monitoring tools such as Prometheus, Prometheus exporters, Kibana, Grafana, Alertmanager, etc.
  • Demonstrable experience in one or more languages: PowerShell, Python, Bash, C#, .NET
  • Strong knowledge of TCP/IP networking, DNS, VPNs, HTTP, load-balancers (such as NGINX), highly available microservice architecture, CDNs
  • Team Foundation Server/Visual Studio, Atlassian suite (Jira, Confluence), Git
  • Analysis of network, performance, and application issues using tcpdump, Fiddler, and Wireshark.

Qualifications:

  • Bachelor’s Degree in CS, MIS, or equivalent experience
  • 5+ years of relevant experience with Windows/Unix systems fundamentals, monitoring, cloud services, networking, storage, database, and application knowledge
  • Solid communication skills, both written and verbal. Able to effectively tailor messaging to different audiences: external customers, leadership, technical SMEs, or Tier-1
  • Previous experience in customer-facing roles during high-stress situations
  • Demonstrated skills as an influencer within a previous organization
  • In-depth knowledge of IT concepts, strategies, and methodologies; Agile knowledge a plus
  • In-depth knowledge of business operations, objectives, and strategies.
  • Familiarity with containers (e.g. Docker, rkt) and IaaS (e.g. AWS, Azure, OpenStack).