Senior Consultant - SRE Architect
Incedo Inc.
Austin, TX
5-10 years
Not Disclosed
Full time
15 April 2026
Top Skills: AI, Architecture, Automation, AWS, Azure, Banking, Cloud, Collection, Continuous Improvement, Data Transformation, Datadog, Distributed System, Dynatrace, ELK, Enterprise, GCP, Go, Governance, Grafana, Incident Management, Incident Response, Java, Latin, Microservices, Microservices Architecture, Monitoring System, Performance Metric, Performance Optimization, Python, Reporting, Root Cause Analysis, Scripting, Service Level, Splunk, System Design, Technical Direction, Telemetry, Topology, Traceability, Visualization, Wealth Management


Position: Senior Consultant - SRE Architect (Observability & Transaction Reliability)

Location: Austin, TX

Type: Full-Time


About the company:

Incedo is a global AI and data transformation specialist empowering companies to realize sustainable business impact from their digital investments by delivering ROI from AI@Scale.


As a long-term partner for strategy to execution, we operate at the intersection of business and technology. Our integrated services and platforms are built on the foundation of AI & Data, digital engineering, and operations transformation, bringing deep domain expertise and full stack capabilities together.


With over 4,000 people in the US, Canada, Latin America and India and a large, diverse portfolio of Fortune 500 enterprises and fast-growing clients worldwide, we work across banking & payments, wealth management, telecom, hi-tech and life sciences.


Job Overview

We are seeking a highly experienced Senior Consultant / SRE Architect to lead the strategy, design, and implementation of enterprise-wide observability and reliability frameworks supporting business-critical transaction flows across distributed systems.


In this role, you will act as a thought leader and architect, driving end-to-end transaction visibility, resilience, and performance optimization across microservices, APIs, databases, and third-party integrations. You will partner with engineering, architecture, and business stakeholders to define standards, influence technical direction, and implement scalable observability solutions.


This is a high-impact role focused on transforming SRE maturity, improving advisor experience, and enabling proactive, data-driven operations through modern observability practices. The ideal candidate is passionate about SRE, observability, and system design, with a proven ability to drive large-scale transformation initiatives.


Required Qualifications

  • 10+ years of experience in SRE, Observability, or related roles, with a strong focus on architecture and strategy
  • Deep hands-on expertise with observability platforms such as Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, and Jaeger
  • Proven experience designing observability solutions in cloud environments (AWS, Azure, GCP)
  • Strong understanding of microservices architecture, APIs, and distributed systems
  • Proficiency in programming/scripting (e.g., Python, Go, Java) for automation and integration
  • Demonstrated ability to lead cross-functional initiatives and influence technical direction


Preferred Qualifications

  • Dynatrace Associate or Professional Certification
  • Experience implementing OpenTelemetry standards at scale
  • Strong background in chaos engineering and resiliency testing
  • Familiarity with AIOps platforms and intelligent automation solutions
  • Consulting experience or prior role as an architect / technical advisor


Key Responsibilities

Observability Strategy & Architecture

  • Define and lead the enterprise observability strategy for end-to-end transaction traceability across distributed systems
  • Architect scalable solutions leveraging tools such as Dynatrace, OpenTelemetry, ELK, Grafana, Datadog, Splunk, and Jaeger
  • Establish standardized frameworks for logging, metrics, tracing, and telemetry collection
  • Design and implement dependency mapping and service topology visualization across complex ecosystems


Performance Engineering & Optimization

  • Provide architectural guidance for monitoring latency, throughput, and error rates across critical transaction paths
  • Lead root cause analysis using distributed tracing and telemetry data to resolve systemic performance issues
  • Partner with application and database teams to optimize system performance and scalability
  • Drive adoption of performance engineering best practices across teams


Resiliency & Reliability Engineering

  • Define and implement resiliency strategies for business-critical transaction flows
  • Architect fault-tolerant systems, including failover, redundancy, and self-healing mechanisms
  • Lead and design chaos engineering initiatives to validate system resilience
  • Establish and govern Service Level Objectives (SLOs) and Service Level Indicators (SLIs) aligned to business outcomes


Consulting, Governance & Leadership

  • Act as a trusted advisor to engineering teams, architects, and leadership on observability and SRE best practices
  • Define and enforce standards, policies, and governance models for monitoring and tracing
  • Lead cross-functional initiatives to drive adoption of observability frameworks
  • Mentor engineers and SRE teams, fostering a culture of continuous improvement and operational excellence


Operational Excellence & Outcomes

  • Drive measurable improvements, including:
      • 30% reduction in MTTD and MTTR within the first year
      • ≥70% root cause identification within 1 hour
      • ≥90% proactive issue detection via monitoring systems
  • Develop executive-level reporting on system health, reliability trends, and performance metrics
  • Build reusable frameworks, accelerators, and playbooks for incident management and observability adoption


Documentation & Knowledge Enablement

  • Establish comprehensive documentation for transaction flows, system dependencies, and observability architectures
  • Develop and standardize incident response playbooks and runbooks
  • Lead training and enablement initiatives to scale observability expertise across teams