Position: Site Reliability Engineer
Location: Santa Clara, CA
Duration: 1+ year
Customer: Ericsson through ITC InfoTech
- Development and Operations (DevOps) subject matter expert for 24×7 SaaS operations
- Work hand-in-hand with microservice software developers, architects, and field integration resources to architect and deliver Ericsson’s next-generation TV platforms.
- Contribute to the development of new tools and automation that ensure the service can be optimized and tuned with minimal human intervention.
- Accountable for working upstream with microservice developers on monitoring, tools, and architecture to deliver security, reliability, manageability, and availability at scale
- Point of escalation and decision maker on the response level for incidents
- Participate in the Core SRE on-call roster and respond with command-and-control incident management during high-priority events while maintaining internal and external SLAs
- Act as Technical Duty Officer, leading the resolution effort for the most complex service problems, from the network layer to the application, at scale
- Drive Problem Management/Retrospectives (“post mortems”)
- Contribute to and maintain our knowledge base
- Analyze trends and make recommendations in the areas of monitoring, incident and change management, cloud orchestration, and support.
- Contribute to the future growth of the team by conducting candidate screenings and assessments
- Accountable for deploying services to production environments
- Experience with Docker, SaltStack, Kubernetes, and similar orchestration tools
- Knowledge of MongoDB and Cassandra databases, Kafka, and IIS servers on Azure/AWS/OpenStack
- Familiarity with Azure, OpenStack, and AWS concepts and APIs
- Experience designing, setting up, maintaining, and refining (noise reduction, auditing) monitoring tools such as Prometheus, Prometheus exporters, Kibana, Grafana, and Alertmanager
- Demonstrable experience in one or more languages: PowerShell, Python, Bash, C#, .NET
- Strong knowledge of TCP/IP networking, DNS, VPNs, HTTP, load balancers (such as NGINX), highly available microservice architecture, and CDNs
- Team Foundation Server/Visual Studio, Atlassian suite (Jira, Confluence), Git
- Experience analyzing network, performance, and application issues using tcpdump, Fiddler, and Wireshark.
- Bachelor’s Degree in CS, MIS, or equivalent experience
- 5+ years of relevant experience with Windows/Unix systems fundamentals, monitoring, cloud services, networking, storage, database, and application knowledge
- Solid written and verbal communication skills. Able to effectively tailor messaging to different audiences: external customers, leadership, technical SMEs, or Tier-1
- Previous experience in customer-facing roles during high-stress situations
- Demonstrated skills as an influencer within a previous organization
- In-depth knowledge of IT concepts, strategies, and methodologies; Agile knowledge a plus
- In-depth knowledge of business operations, objectives, and strategies.
- Familiarity with containers (e.g. Docker, rkt) and IaaS (e.g. AWS, Azure, OpenStack).