Operations Teams

Team purpose

Operations Teams usually work as part of a value chain, responding to customer needs, change demand and periods of unexpected intensity to deliver reliable, high-quality outcomes at the lowest cost.

Examples of Operations Teams might include:

  • A people operations team that is responsible for onboarding new staff
  • A finance operations team that is responsible for reconciliation
  • A technology operations team that manages one or more IT platforms
  • A field workforce team that installs and repairs machinery

Operations Teams are the custodians of some of the most critical organisational metrics (customer satisfaction, revenue and margin) and play a variety of key roles in the value chain:

  • Handling and Fulfilling Customer Orders
  • Billing Customers for Services Provided
  • Providing Support to Customers
  • Workforce Enablement
  • Platform Operations and Production Support for Logical (e.g. Technology) and Physical Inventory to ensure Business Continuity during key business hours (increasingly 24/7)
  • Building Resilience into Platforms and Processes
  • Incident and problem management, including common cause incident analysis (CCIA)
  • Ensuring that changes to Production Platforms and Processes don’t adversely impact Business Operations
  • Understanding customer pain points and issues
  • Understanding cross-platform / cross-product impacts

Remote Working Challenges

Operations teams are probably the most challenging to lead in a remote context for a variety of reasons. Operations teams are the most likely to:

  • Handle sensitive data
  • Handle customer interactions
  • Need to interact with physical machinery
  • Need to swarm to solve a time critical problem

They are also the teams that conduct the most ‘repetitive’ or ‘procedural’ work during an average working day. While it’s relatively easy to motivate a team to solve a problem or create an outcome, it’s quite challenging to motivate a team to manually migrate hundreds of customer records, clean up data, or chase down unpaid invoices.

Why is this team included in remote:af?

  • In short, the Operations/Support/Service teams provide a key function within the value chain. We understand how the Product teams build products to deliver value to the customer, but this is not done in a vacuum.
  • Operational teams provide the customer/platform/product support to ensure that what the Product teams have built is available for customers to consume.
  • They often also provide support services to the Product teams by maintaining internal tools and platforms.
  • These teams are also often left out of the delivery process. DevOps has gone some way towards closing this gap, but many companies need to keep operations and delivery separate.
  • Helping bring Operations closer to the delivery stream and hence building better understanding between teams is a key principle of remote:af (Pass things Gently: We don’t throw things over the fence. We build it, we run it, or we make damn sure we’d be happy to run it.)

Team Configuration

Team Structure

remote:af Operations teams are made up of an Operations Lead and Operators, who are usually technical specialists or process/task specialists.

Team size can vary, but should not exceed 15 people.

Leadership

remote:af Operations Leaders manage their teams using a combination of care and empiricism. They are patient with new starters, ensuring that they are clearly instructed and supported. They understand the lumpy / seasonal nature of the work and use the slack in the system to implement change. They ensure that their work environments are safe and productive, and they create efficiency through a focus on key metrics and small, evolutionary improvements. They prepare their team for crises and are ready to switch modes, taking control during time-critical events and ensuring that the team responds methodically and controls risk.

Funding

Operations teams tend to be funded out of Operational Expenditure (OPEX) and are usually very cost conscious.

One of the interesting nuances of enterprise finance is that most executives are rewarded based on EBITDA performance. Operational expenditure is deducted as an expense and therefore reduces EBITDA directly, whereas capital expenditure (CAPEX) is recorded as an asset on the balance sheet and only reaches the income statement as depreciation or amortisation, which sits below EBITDA. This means that OPEX is often much more tightly controlled than CAPEX, and as a result Operations Teams are constantly searching for efficiencies.
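A worked example may make the incentive clearer. The sketch below uses entirely hypothetical figures and a simplified EBITDA (revenue minus operating expenses, ignoring every other line item) to compare the same spend treated as OPEX versus capitalised as CAPEX; the five-year straight-line depreciation period is also an assumption.

```python
# Simplified, hypothetical model: EBITDA = revenue - operating expenses.
# Depreciation and amortisation are excluded from EBITDA by definition.


def ebitda(revenue: float, opex: float) -> float:
    return revenue - opex


revenue = 10_000_000    # hypothetical annual revenue
base_opex = 6_000_000   # hypothetical existing operating costs
spend = 500_000         # the same amount, spent either as OPEX or as CAPEX

# Treated as OPEX: the full spend is expensed this year and reduces EBITDA.
ebitda_if_opex = ebitda(revenue, base_opex + spend)

# Treated as CAPEX: the spend becomes an asset, so EBITDA is unaffected.
ebitda_if_capex = ebitda(revenue, base_opex)

# The asset is depreciated (assumed straight-line over 5 years); this lowers EBIT, not EBITDA.
annual_depreciation = spend / 5

print(f"EBITDA if OPEX:  {ebitda_if_opex:,.0f}")   # EBITDA if OPEX:  3,500,000
print(f"EBITDA if CAPEX: {ebitda_if_capex:,.0f}")  # EBITDA if CAPEX: 4,000,000
print(f"Annual depreciation (below EBITDA): {annual_depreciation:,.0f}")
```

With these numbers the OPEX treatment reports 500,000 less EBITDA in the year of spend, which is why an executive measured on EBITDA tends to scrutinise OPEX more closely.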

Boundaries

Within the team the boundaries tend to be quite explicit. Tasks are usually executed by an individual or by a set of people with clear roles in the process. Specialisation is common and often critical.

Operations teams have low failure tolerance at their boundaries. Because they are designed to run highly efficiently, an error in one process can have significant consequences across the whole system. For this reason, the Definition of Ready and Definition of Done in an Operations value chain are highly explicit and the boundaries between teams are well defined.

Operating Model

Operations Teams tend to operate as links in tightly coupled value chains. In a remote environment, the fewer steps in the chain the better, but we can be more tolerant of chained systems for operations teams because their interface agreements are explicit. Operations teams:

  • are most effective when organised around a Value Chain, Process or Function
  • are focussed on efficiency, risk management and resilience.
  • are margin centric
  • seek to build economies of scale
  • are the most likely to impose rigid constraints (method of procedure, work instructions, task automation, etc.)

Team Profile

Operations Teams suit people who enjoy the rewards of task accomplishment and who can remain calm and act methodically in a crisis. Team members must be highly competent, reliable and risk-aware. Operations Teams ideally suit process, task or technical specialists who excel at structured learning and problem solving.

Professional Development

Development programs for Operations Teams tend to be tightly structured and based around the skills and knowledge required for the job at hand.

Culture

  • Empirical
  • Compliant / Conservative
  • Risk-Aware

Team Launch Pattern

Clarify the System of Work

  • Visual work
      • Ops teams have lots of hidden work; visualising it on a Kanban board helps expose it
      • The recommendation is to build a board with clear swimlanes that separate the different workflows
      • Include an expedite lane for high-impact incidents
  • Centralise/create a backlog
      • Production issues and production analysis (CCIA) probably already live in a tool such as ServiceNow / Remedy
      • Delivery work and routines may live in Jira / Azure DevOps / Rally
      • Customer input from surveys and feedback channels
      • Regulatory requirements from business projects
  • KEY ITEM: Centralise where you can into logical flows and tools. For example, don’t move incidents into Jira if it doesn’t make sense to do so
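The board described above can be sketched as a small data structure. This is a minimal illustration, not a feature of ServiceNow, Remedy or Jira: the lane names, the columns, the `Board` and `Card` classes and the rule that lanes are pulled in priority order (so the expedite lane always goes first) are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical swimlanes, listed in pull-priority order (Expedite first).
LANES = ["Expedite", "Incidents", "Problems / CCIA", "Delivery", "Routines"]
COLUMNS = ["Backlog", "Ready", "In Progress", "Done"]


@dataclass
class Card:
    title: str
    lane: str
    column: str = "Backlog"


@dataclass
class Board:
    cards: list[Card] = field(default_factory=list)

    def add(self, card: Card) -> None:
        # Reject cards that don't belong to a known workflow, so nothing hides off-board.
        if card.lane not in LANES or card.column not in COLUMNS:
            raise ValueError(f"Unknown lane or column for {card.title!r}")
        self.cards.append(card)

    def view(self) -> dict[str, dict[str, list[str]]]:
        """Group card titles by lane, then by column."""
        grid = {lane: {col: [] for col in COLUMNS} for lane in LANES}
        for card in self.cards:
            grid[card.lane][card.column].append(card.title)
        return grid

    def next_to_pull(self) -> Card | None:
        """Return the first Ready card, scanning lanes in priority order."""
        for lane in LANES:
            for card in self.cards:
                if card.lane == lane and card.column == "Ready":
                    return card
        return None


board = Board()
board.add(Card("Chase unpaid invoices", "Routines", "Ready"))
board.add(Card("Payments platform outage", "Expedite", "Ready"))
board.add(Card("Recurring login failures", "Problems / CCIA"))
print(board.next_to_pull().title)  # Payments platform outage
```

Keeping every workflow as a lane on one board is what makes the hidden work visible; the priority-ordered pull is just one simple way to honour an expedite lane.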

Remote Team Alliance (consider the following)

  • Similar to Mission and Product teams (not all of these are necessarily separate roles), but also needs to include:
      • High-severity incident response
          • Comms Lead: takes the lead on communicating status to the business, protecting the team who are resolving the issue
          • Tech Lead: leads the investigation
          • Tech Support: supports the technical investigation
          • Production Monitoring: monitors the rest of the production environment to ensure that the incident isn’t escalating to other systems
      • Rotation from incident/routine work to problem analysis to development (if the team is responsible for building its own production automation/efficiency initiatives)
      • Support for projects and delivery
      • After-hours support roster and rotation
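One way to run the rotation is sketched below. It is a hypothetical round-robin scheme, not a remote:af prescription: the roles come from the list above, while the weekly cadence, the `build_roster` function and the team names are assumptions.

```python
from datetime import date, timedelta

# Incident-response roles from the Remote Team Alliance; the rotation scheme is an assumption.
ROLES = ["Comms Lead", "Tech Lead", "Tech Support", "Production Monitoring"]


def build_roster(people: list[str], start: date, weeks: int) -> list[dict[str, object]]:
    """Rotate incident roles weekly so each person cycles through every role.

    Week i assigns ROLES[j] to people[(i + j) % len(people)], so nobody holds two
    roles in the same week as long as there are at least as many people as roles.
    """
    if len(people) < len(ROLES):
        raise ValueError("Need at least one person per incident role")
    roster = []
    for week in range(weeks):
        assignment = {role: people[(week + j) % len(people)] for j, role in enumerate(ROLES)}
        roster.append({"week_starting": start + timedelta(weeks=week), **assignment})
    return roster


team = ["Ana", "Ben", "Chen", "Dara", "Eli"]  # hypothetical team
for row in build_roster(team, date(2024, 1, 1), weeks=3):
    print(row)
```

Rotating roles, rather than fixing them, spreads incident experience across the team; a real roster would also need to account for leave, time zones and after-hours fatigue.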

Establishing an Operating Cadence Based on Team Events

Team Events

Team events are the same as for the other team types, but consider the following:

Planning/Replenishment

  • Based on Kanban
  • Work usually comes from production incidents and delivery requirements (see the replenishment section for details). Sources include:
      • Analysis and problems
      • Delivery
      • Customer input
      • Regulation/legal
      • Change demand
      • System demand
  • Selection is based on impact/severity and customer SLAs
  • Note that some incidents will be allowed to run longer to enable deeper investigation and analysis
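The selection rule above can be expressed as a simple ordering. This is a sketch under stated assumptions: severity is an integer where 1 is the highest impact, each item carries an SLA due time, and ties on severity are broken by whichever item is closest to breaching its SLA. `Item` and `replenish` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Item:
    title: str
    source: str        # e.g. "Incident", "Problem", "Delivery", "Customer", "Regulatory", "Routine"
    severity: int      # assumed scale: 1 = highest impact
    sla_due: datetime  # when the customer SLA would be breached


def replenish(items: list[Item], capacity: int) -> list[Item]:
    """Select up to `capacity` items: highest severity first, then nearest SLA breach."""
    ranked = sorted(items, key=lambda item: (item.severity, item.sla_due))
    return ranked[:capacity]


now = datetime(2024, 1, 1, 9, 0)
backlog = [
    Item("Reconcile ledger variance", "Routine", 3, now + timedelta(days=2)),
    Item("Billing run failed", "Incident", 1, now + timedelta(hours=4)),
    Item("Regulatory report change", "Regulatory", 2, now + timedelta(days=5)),
    Item("Customer login errors", "Incident", 1, now + timedelta(hours=2)),
]
print([item.title for item in replenish(backlog, capacity=3)])
# ['Customer login errors', 'Billing run failed', 'Regulatory report change']
```

A plain sort like this does not capture incidents that are deliberately left open for analysis; those would need their own handling.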

Daily Standup

  • Not a status update
  • Focus on the next items to pull into progress and on trends/patterns in incident work
  • Keep SLA and project deadlines in view

Review

  • Celebrate the work the team does; this is often missed
  • Use data to show improvements, for example:
      • the number of tickets closed
      • the reduction in customer impact
      • the reduction in downtime achieved through the team’s automation

Reflection

  • Data- and metric-based
  • Focused on production trends and SLA trends
      • Production trends inform where to focus capacity for production improvements
      • SLA trends highlight bottlenecks and opportunities for “delivery” improvements

Key Metrics and Benchmarks

What gets measured gets done.

Peter Drucker

Using data is crucial for today’s businesses to evaluate success and gather the insights needed to build a sustainable company. Identifying what is and isn’t working is an invaluable management practice: it can reduce costs, show the progress a business is making, and compare that progress to organisational goals. By establishing clear operational metrics and evaluating performance against them, companies gain the data they need to stay competitive in the market.

Since every business is different, it is essential to establish specific metrics and OKRs to measure, follow, calculate and evaluate.

When identifying the key metrics, consider the following questions:

  1. What needs to be measured?
  2. Who will measure the metrics?
  3. What is the time interval between measurements?
  4. How frequently is the information reported or made available?

Turning these datasets into a business dashboard makes it easier to track the right values and gives the whole business a shared view of performance. Typical operations metrics include:

  • SLA: Response Cycle Time
      • Time taken from first received to first response
  • SLA: Restore Cycle Time
      • Time taken from first response to resolved
  • Outage Restore Cycle Time
      • Time taken from outage detection to resolution
  • Outage Detection Cycle Time
      • Time taken from customer impact to outage detection
  • System Lead Time (or Cycle Time)
      • Time taken from first received to resolved
  • Failure vs Value Demand
  • Throughput
      • Incidents/problems
      • Projects and delivery
  • Other Service Level Expectation / Achievement
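The time-based metrics above are differences between timestamps, so they can be computed directly from ticket and outage records. The sketch below is illustrative only: the `Ticket` and `Outage` record shapes are hypothetical, averages are used for simplicity, the ticket list is assumed to contain only items resolved in the reporting period, and failure demand is assumed to be flagged on each ticket (in the sense popularised by John Seddon: demand caused by a failure to do something, or to do it right, for the customer).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Ticket:
    kind: str             # assumed categories: "incident", "problem", "delivery"
    failure_demand: bool  # True if the request only exists because something failed
    received: datetime
    first_response: datetime
    resolved: datetime


@dataclass
class Outage:
    customer_impact: datetime
    detected: datetime
    resolved: datetime


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def ticket_metrics(tickets: list[Ticket]) -> dict[str, float]:
    return {
        # SLA: Response Cycle Time (first received -> first response)
        "response_cycle_h": mean(hours(t.first_response - t.received) for t in tickets),
        # SLA: Restore Cycle Time (first response -> resolved)
        "restore_cycle_h": mean(hours(t.resolved - t.first_response) for t in tickets),
        # System Lead Time (first received -> resolved)
        "system_lead_h": mean(hours(t.resolved - t.received) for t in tickets),
        # Failure vs Value Demand, expressed as the failure-demand share
        "failure_demand_share": sum(t.failure_demand for t in tickets) / len(tickets),
    }


def outage_metrics(outages: list[Outage]) -> dict[str, float]:
    return {
        # Outage Detection Cycle Time (customer impact -> detection)
        "detection_cycle_h": mean(hours(o.detected - o.customer_impact) for o in outages),
        # Outage Restore Cycle Time (detection -> resolution)
        "restore_cycle_h": mean(hours(o.resolved - o.detected) for o in outages),
    }


def throughput(tickets: list[Ticket]) -> Counter:
    # Throughput: resolved items per category in the reporting period
    return Counter(t.kind for t in tickets)


t0 = datetime(2024, 3, 1, 9, 0)
tickets = [
    Ticket("incident", True, t0, t0 + timedelta(hours=1), t0 + timedelta(hours=5)),
    Ticket("delivery", False, t0, t0 + timedelta(hours=3), t0 + timedelta(hours=27)),
]
outages = [Outage(t0, t0 + timedelta(minutes=30), t0 + timedelta(hours=2, minutes=30))]

print(ticket_metrics(tickets))
print(outage_metrics(outages))
print(throughput(tickets))
```

Averages are easily skewed by a few long-running incidents (including those deliberately left open for analysis), so medians or percentiles (`statistics.median`, `statistics.quantiles`) may be a better basis for SLA reporting.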

Example Operations dashboard panels: System Lead Time (Cycle Time), Customer Lead Time, Failure vs Value Demand, Throughput, System / Process Up Time, Service Level Expectation / Result, Outage Time to Resolution.

What about DevOps?

The remote:af Operations teaming pattern is effective for Platform Operations teams but has a broader organisational purpose covering customer and enterprise operations.

DevOps is a specific pattern developed to integrate development and operations teams in technology. DevOps is ideally suited to parts of the organisation where change is constant, but as we move through the product lifecycle, the need for dedicated operational capability emerges. DevOps is not an appropriate pattern for a legacy platform that is due to be decommissioned, or for a product with a declining revenue stream where margin is the focus.

As a product, platform or service moves through the product lifecycle the Operations characteristics change:

Introduction Product Phase - Operations Characteristics

  • Barebones support systems
  • Limited or no automation
  • Limited or no operational support processes such as incident and problem management practices
  • Small system footprint enables targeted customer base (i.e. friendly customers)
  • Small or no customer support volume
  • Build teams tend to support the product as a means to learn and develop

Team Structure: Development approach – Early in the product lifecycle the focus is on development only, to determine whether we have a product or not. Support is limited to the bare bones. Build teams tend to provide support as a way to learn and develop iteratively. As support volume increases, this model’s ability to meet demand becomes limited, and processes to capture support requirements will need to be developed.

Growth Product Phase - Operations Characteristics

  • Customer support required
  • Customer support and incident management practices and processes in place as a minimum
  • Limited automation
  • Increasing customer bases driving increased support volume
  • Release size and cadence drives support impact
  • Increasing proactive support
  • Operational support improvements are being included in new development

Team Structure: DevOps approach – The customer base requires formal support processes (traditionally L1 and L2 at a minimum). The Dev Team provides more formal support of the product in parallel with development work. As the customer base grows, support demand also increases.

Recommended use: Late Introduction to a mid Growth phase

Optimised DevOps approach – Product build and maintenance within a single team. Support is segmented within the team, with team members dedicated to support on a rolling basis.

Maturity Product Phase - Operations Characteristics

  • Peak customer support volume
  • Focus on efficiency and product availability
  • Support focus on reducing customer support volume
  • Increasing automation
  • Increasing problem management and common cause incident analysis
  • Peak Subject Matter expertise
  • Focus on driving cost efficiencies

Team Structure: Development & Operational team approach – Dedicated Dev and Ops teams support customer volume. The drive is for efficiency and for reducing the number of support requests through automation and customer self-service. Support undertakes incident analysis and problem analysis.

Whether to use hybrid Dev/Ops teams alongside dedicated Ops teams, rather than separate dedicated Dev and Ops teams, depends on the perspective we take on DevOps vs SRE vs ITIL/ITSM Level 3 Problem Management; either way, these hybrid teams exist and are supported by remote:af. These models try to overcome the same problem, and we can see SRE as a role and practice that does what a Problem Management Operations team aims to do in ITIL Problem Management. In this way, SRE is a high-functioning, highly effective Problem Management team.

DevOps has the people building the product look for effective improvements that minimise product support. SRE dedicates a slice of capacity to operational support tasks (50% as a guide) and spends the rest on operational efficiency and improvement activities (automation, process improvement, root cause resolution), which is the same as an ITIL/ITSM Problem Management role.
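To check a team against the 50% guide, one option is to compare logged hours by activity. This is a hypothetical sketch: the activity names, the `operational_share` and `over_guide` helpers and the practice of tracking hours this way are assumptions, and the threshold is only the guide mentioned above.

```python
# Hypothetical activity categories; the 50% threshold is a guide, not a hard rule.
OPERATIONAL = {"tickets", "on-call", "manual routines"}
IMPROVEMENT = {"automation", "process improvement", "root cause resolution"}


def operational_share(hours_by_activity: dict[str, float]) -> float:
    """Fraction of logged hours spent on operational support tasks."""
    unknown = set(hours_by_activity) - OPERATIONAL - IMPROVEMENT
    if unknown:
        raise ValueError(f"Uncategorised activities: {sorted(unknown)}")
    total = sum(hours_by_activity.values())
    if total == 0:
        raise ValueError("No hours logged")
    ops = sum(h for activity, h in hours_by_activity.items() if activity in OPERATIONAL)
    return ops / total


def over_guide(hours_by_activity: dict[str, float], guide: float = 0.5) -> bool:
    return operational_share(hours_by_activity) > guide


week = {"tickets": 60, "on-call": 20, "automation": 30, "root cause resolution": 10}
print(f"{operational_share(week):.0%} operational")  # 67% operational
print(over_guide(week))  # True
```

A sustained share above the guide suggests the team has too little capacity left for the improvement work that is meant to reduce support load.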

Decline Product Phase - Operations Characteristics

  • Customer Volume decreasing
  • Focus and clear drive on cost reduction
  • Decreasing support and development team sizes
  • Reduction in customer support requests
  • Decreasing expenditure on automation initiatives
  • Cost out is the driving principle
  • Impact of outages increasing

Team Structure: Development & Tiered Operational team approach – The customer base and support demand have outgrown a single operational team. Layered support can be traditional (ITIL) or split based on customer need, product, location, etc. The focus is on formal workflow and reducing handoffs between teams.

“Remote AF”, “RAF” and associated trade marks are trade marks of Remote Agility Framework Pty Ltd used under licence by Remote AF Co Pty Ltd.
© Remote Agility Framework Pty Ltd. Used under licence by Remote AF Co Pty Ltd.