Staff Software Engineer – CPU Telemetry & Performance Engineering
Markham, ON (in-office)
Overview
Our Toronto-based client is building the next generation of high-performance, energy-efficient computing platforms. This role sits at the intersection of silicon, firmware, and operating systems, and is ideal for senior engineers who thrive in complex, performance-critical environments. You’ll design and optimize custom kernel-level solutions, create powerful telemetry and debugging tools, and collaborate with world-class teams to ensure that no performance is left untapped.
Key Responsibilities
Build and optimize custom Linux kernel drivers that integrate with low-level firmware to expose and control advanced CPU features.
Develop performance and telemetry tooling to capture, analyze, and visualize CPU behavior, enabling deep insights into system efficiency and scalability.
Debug and profile across the silicon–firmware–OS boundary, working directly with performance counters, schedulers, and power management subsystems.
Collaborate with leading hardware and software engineers to push the boundaries of efficiency, performance, and innovation in next-generation datacenter and edge platforms.
Contribute to system bring-up, kernel porting, and board support packages for new CPU architectures.
Leverage open-source tools and communities (perf, ftrace, eBPF, etc.) and contribute improvements upstream.
Investigate and resolve the most challenging performance bottlenecks, spanning compiler output, kernel scheduling, cache/memory, and interconnect.
Drive performance benchmarking methodology and automation for large-scale workloads, from microbenchmarks to real application scenarios.
Provide technical leadership, mentoring, and guidance for cross-functional teams working on hardware bring-up, firmware integration, and OS performance tuning.
Preferred Qualifications
Expertise in Linux kernel internals, including schedulers, memory management, interrupts, and boot flows.
Strong background in device driver development (custom drivers, PCIe, I/O, networking, storage, or accelerators).
Hands-on experience with performance counters and profiling tools (perf, ftrace, eBPF, VTune, oprofile, or custom frameworks).
Familiarity with power and performance management concepts such as DVFS, CPU idle states, clock/power domains, and thermal throttling.
Ability to debug at multiple layers: firmware, kernel, virtualization, and user-space applications.
Exposure to SoC bring-up, BSP development, and low-level board initialization.
Proficiency in C, C++, Python, and assembly, with emphasis on writing performant, maintainable low-level code.
Knowledge of CPU microarchitecture concepts (pipelines, caches, MMU/virtual memory, coherency, interconnects).
Experience working with or contributing to open-source kernel communities.
Comfort navigating ambiguous performance issues, using telemetry and measurement to drive root-cause analysis.
How to Apply
Qualified and interested applicants can apply directly to Gord Marriage by emailing a resume to gord.marriage@talentlab.com. You may also apply on our website at www.talentlab.com. Although we thank all applicants for their interest, only those under consideration will be contacted.