ACCENTURE @ META
Network Infrastructure Engineer / SRE
AUG 2023 — PRESENT
// Production Reliability & Incident Response
- Owned and resolved 16 production incidents (P1/P2) affecting 1,000+ users; maintained availability for executive demos and critical business operations
- Logged 2,100+ on-call hours across 24/7 and business-hours rotations supporting AWS tooling and production infrastructure
- Mitigated packet-loss incidents and recovered the Kubernetes control plane by remediating failed nodes during network outages
- Coordinated cross-functional incident response with networking, security, and infra teams to minimize MTTR during critical outages
// Monitoring, Observability & SLO/SLI
- Built SLO/SLI dashboards tracking HTTP, ICMP, and SSH health, connection latency, packet loss, and throughput across 8+ global data centers
- Integrated time-series metrics to provide real-time service-health visualization and historical trend analysis
- Tuned alerting thresholds and severity classifications to reduce false positives, improving the signal-to-noise ratio of on-call alerts
- Built network probe heatmap visualizations with async data fetching from time-series databases
// Python & Distributed Systems
- Built Python pipelines processing hardware inventory across 10,000+ devices — extracting and categorizing CPU/GPU/memory specs from distributed databases
- Wrote 43+ SQL analytics queries for hardware resource analysis and device fault diagnostics across petabyte-scale datasets
- Authored 24 Jupyter-style notebooks for automated device categorization, network monitoring, and data quality analysis
- Migrated service APIs from legacy Thrift to thrift-python, enhancing parameter handling for telemetry and inventory updates
// Infrastructure Automation & Linux
- Managed Kubernetes cluster operations: control-plane recovery, node lifecycle management, and failover handling during network incidents
- Deployed a DMZ architecture at the London and Ashburn data centers with reverse proxies, F5 load balancers, and MetalLB endpoints
- Automated Linux server configuration via Chef: sysctl tuning, cron job scheduling, and log-forwarding pipelines
- Created VLAN validation tools that verify configuration through MAC address, hostname, and serial-number lookups
// MR & XR Development — Project Holostage
- Integrated CapturyReplay motion capture software and hardware into Meta Workrooms and the Horizon VR platform; tracked and matched Captury avatar movement by re-parenting the Meta rig to the same parent used by the existing supported third-party rigs
- Launched Azure infrastructure hosting Azure OpenAI endpoints, giving Meta researchers access to LLM tooling within the research environment