Problem Specialist

The CIO group is responsible for all Network, Telecommunications, Infrastructure, Systems, Security and Support services globally. This group of highly skilled and technical individuals plays a vital role in ensuring business success through product and service management, provisioning, and uptime of these vital services.
You will work for the Service Management Office (SMO), the team in charge of Production Operations Support. Because we are an online (always-on) business, any service downtime has a direct and measurable impact on turnover and revenue; therefore, all CIO functions operate 24×7 and work together to safeguard the business’s products and services and to maintain uptime.

As SMO Problem Specialist, you will be expected to ensure that standardized methods and procedures are used for efficient and prompt handling of all problems and known issues, in order to minimize the impact of any related incidents on service.

The role includes responsibility for investigating ongoing problems and for disseminating workarounds and resolution advice, leading to timely closure.

Day-to-day responsibilities revolve around reactive and proactive problem records. Reactive problem management requires your immediate involvement after service-impacting incidents: you ensure that root cause analysis is carried out in a structured and methodical way and that all identified remediation activities are completed in a timely manner, reducing the overall risk to the organization's technology systems. Under proactive problem management, you will focus on analysis and trends, creating initiatives across the technology environments and teams that drive results-oriented improvements for the business.

Responsibilities

– Ensure investigation of underlying causes of real or potential anomalies in IT services internally and with Vendors and Partners
– Access systems to retrieve logs, review monitoring graphs, analyse the information and correlate events from different systems to formulate theories about incident root causes
– Identify and address key areas that are not properly monitored and those that are single points of failure
– Work with other IT groups and business stakeholders to define or recommend possible solutions or strategies and initiate corresponding requests for change (RFC) needed to re-establish quality of service
– Suggest and document potential workarounds for problems
– Reduce the number of incidents escalated to 3rd Tier support through improved processes, procedures, knowledge and skills
– Facilitate resolution for escalated incidents and engage the necessary technical support
– Ensure the reliability and availability of systems and services to meet agreed service levels
– Keep the problem records updated

Technical knowledge and experience

– Knowledge of virtualized systems
– Good understanding of Apache configurations (e.g. 301 redirects, wrapper scripts)
– Knowledge of troubleshooting networking issues, MPLS, VPN, LAN, WAN, Load Balancers, etc.
– Basic programming experience – an understanding of code, and coding concepts
– Experience with log tools (e.g. Splunk, Graylog, etc.)
– Experience with monitoring tools, including interpreting their values and suggesting areas for monitoring
– Ability to solve problems and take prompt decisive action
– Experience with Windows administration and investigation (e.g. Event Log, Services, etc.)
– ITIL knowledge

Job Category: SMO
Job Type: Full Time
Job Location: Remote
