Overview
The ins and outs:
•Act as incident and crisis manager when our sites experience a critical problem. Orchestrate the efforts of multiple teams to resolve time-critical situations.
•Serve as incident commander for crisis situations, leading and directing relief and resolution while keeping other departments and stakeholders informed.
•Accountable for key activities during an incident or crisis, including ownership of the technical control bridge and communications with stakeholder functions.
•Accountable for rapid identification of, and recovery from, revenue-impacting outages and incidents.
•Work with Problem Managers and Site Reliability Engineers after resolution to determine root causes and long-term fixes to prevent future incidents.
•Ensure that all documentation surrounding major incidents is accurate and that communication with Operations, Engineering, management and the executive leadership team is clear and concise.
•Shift-based schedule: 5 days per week, 8 hours per day.
What you’ll need:
•A proven track record in Software Development, Network Engineering, System Engineering or System Administration, with a degree in computer science or equivalent work experience.
•A proven track record with enterprise-scale ecommerce applications or mobile software.
•Experience working in a 24/7 Network Operations Centre environment on a global scale.
•Strong understanding of ‘command and control’ procedures.
•Understanding of and/or experience with CDNs such as Akamai, Limelight Networks, Amazon CloudFront, etc.
•Experience with any of the following technologies: Java development, Unix, Linux, Red Hat, Fedora, networking, other systems, etc.
•Excellent communication skills with the ability to urgently motivate NOC technicians, remote engineers across multiple functions and locations and other stakeholders.
•Excellent verbal and written communication skills.
•Knowledge of and/or experience with cloud technologies, e.g. OpenStack, Elastic Cloud, EC2, AWS.
•Ability to think on your feet and work calmly under pressure.
•Ability to coordinate multiple groups and systems while communicating clear and concise status reports to senior management.
About Walmart
Through the innovative fusion of retail, social and mobile, WalmartLabs is redefining commerce for the largest retailer worldwide. We are a group composed of the brightest technologists and business people in the industry, excited about the limitless opportunities that this next generation of commerce will bring to billions of people around the globe, all in an effort to help them save money and live better. As the idea incubator for the world's largest global retailer, we don't just build products, we create experiences. Every day is an opportunity to reshape the landscape of ecommerce while having a lasting impact on the industry. @WalmartLabs taps into the talents of online retail visionaries to design, prototype and build technology-fueled products that bridge the gap between what's next and what's best. WalmartLabs brings together dozens of engineers, scientists and product experts to execute a cutting-edge vision. If you're fluent in the language of innovation, WalmartLabs is a place to become a leading change agent. Join us!
Walmart eCommerce sites handle hundreds of thousands of visitors and millions of transactions per day, so when something stops working a significant amount of money could be lost. When an issue is detected, the clock starts ticking: the Major Incident Manager/Technical Duty Officer (TDO) has to act immediately, and the issue needs to be resolved quickly. Our TDOs have a broad range of experience, from Network Operations to Software Development. With the development of our new eCommerce platform, we are looking for an engineer with cloud experience and a strong sense of urgency in resolving problems.