RTL Design / Hardware Verification

Silicon Systems & Architecture

RTL design and verification on FPGA. Aiming for graduate research in ML-accelerator microarchitecture.

Fig. 01: Cover image (silicon architecture macro)
00 / About

Designing digital RTL, end-to-end.

I'm a Computer Engineering undergrad at McMaster. Most of my work is digital RTL and hardware verification on FPGA, with a focus on ML hardware accelerators. The most recent piece I've finished is an 8×8 INT8 systolic MAC array. It verifies bit-exact against a NumPy reference and closes timing at 100 MHz on Artix-7.

After undergrad I want to do research on ML-accelerator microarchitecture. It sits between the algorithm choices and the silicon you can actually build, and that interface is the part of the field I find most interesting.

RTL Design: SystemVerilog / Verilog
Architecture: ML Accelerators / Systolic Arrays
Verification: cocotb / Verilator
Implementation: FPGA / Timing Closure
01 / Projects

Selected Work

INT8 Systolic MAC Array
01
FPGA / Architecture

Parameterized 8×8 output-stationary INT8/INT32 systolic MAC array in SystemVerilog. Designed for transformer Q/K/V/O and FFN matmuls. Closes timing at 100 MHz on Artix-7 with +3.76 ns slack, 64 DSP48E1 slices, peak 12.8 GOPS.

SystemVerilog / Vivado / cocotb / Verilator
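
As a rough illustration of the output-stationary dataflow, here is a cycle-level Python sketch. It is not the project's SystemVerilog: the function names, the input skew (row i delayed by i cycles, column j delayed by j cycles), and the reduction depth K = 16 are all assumptions made for the example. Each processing element keeps its own accumulator while operands shift right and down, and the result is checked for an exact match against a plain matrix multiply.

```python
import random

def systolic_matmul(A, B, n=8):
    """Cycle-level model of an n x n output-stationary array computing A @ B.

    A is n x K (rows stream in from the left), B is K x n (columns stream in
    from the top). PE (i, j) keeps its own accumulator for C[i][j].
    """
    K = len(A[0])
    acc = [[0] * n for _ in range(n)]    # one stationary accumulator per PE
    a_reg = [[0] * n for _ in range(n)]  # operand registers, shifting right
    b_reg = [[0] * n for _ in range(n)]  # operand registers, shifting down
    for t in range(K + 2 * (n - 1)):     # K beats plus skew fill and drain
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # Row i of A enters at column 0, delayed by i cycles (assumed skew).
                if j == 0:
                    new_a[i][j] = A[i][t - i] if 0 <= t - i < K else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                # Column j of B enters at row 0, delayed by j cycles (assumed skew).
                if i == 0:
                    new_b[i][j] = B[t - j][j] if 0 <= t - j < K else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Every PE multiplies the operands it currently holds and accumulates.
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc

def reference_matmul(A, B):
    """Plain matrix multiply used as the golden model."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

random.seed(0)
K = 16  # hypothetical reduction depth; sums stay far inside the INT32 range
A = [[random.randint(-128, 127) for _ in range(K)] for _ in range(8)]
B = [[random.randint(-128, 127) for _ in range(8)] for _ in range(K)]
assert systolic_matmul(A, B) == reference_matmul(A, B)
```

Under this assumed skew the model takes K + 2(n - 1) cycles per tile. Counting each MAC as two operations, 64 MACs × 2 × 100 MHz gives the 12.8 GOPS peak quoted above.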
Real-Time FM Software-Defined Radio
02
DSP / Embedded

Real-time FM SDR on Raspberry Pi 4. Recovers mono audio, stereo audio, and RDS metadata from RF input through a three-thread producer-consumer pipeline with polyphase resampling. The polyphase rewrite was a 1.4× speedup over the naive version. Holds real-time at 600 MHz, 101 taps, no underruns over five minutes.

C++17 / Python / DSP / Raspberry Pi
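
To show where a polyphase resampler saves work, here is a small Python sketch. It is not the project's C++: the function names, the 3/5 resampling ratio, and the filter taps are made up for the example. The naive version zero-stuffs the input, runs the full FIR, and throws most outputs away. The polyphase version computes only the outputs that are kept and touches only the taps that line up with real input samples.

```python
def resample_naive(x, h, up, down):
    """Zero-stuff by `up`, run the full FIR, then keep every `down`-th sample."""
    stuffed = [0.0] * (len(x) * up)
    stuffed[::up] = x
    full = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k in range(min(len(h), n + 1)):
            acc += h[k] * stuffed[n - k]
        full.append(acc)
    return full[::down]

def resample_polyphase(x, h, up, down):
    """Same result, skipping zero-stuffed samples and discarded outputs."""
    out_len = (len(x) * up + down - 1) // down
    y = []
    for m in range(out_len):
        n = m * down        # position in the (never built) upsampled stream
        phase = n % up      # which polyphase branch of h applies
        base = n // up      # newest real input sample under the filter
        acc = 0.0
        for j, k in enumerate(range(phase, len(h), up)):
            if base - j < 0:
                break       # ran off the start of the input
            acc += h[k] * x[base - j]
        y.append(acc)
    return y

# Hypothetical test data: a ramp-like signal and 101 arbitrary taps.
x = [float(i % 7) - 3.0 for i in range(60)]
h = [1.0 / (k + 1) for k in range(101)]
a = resample_naive(x, h, up=3, down=5)
b = resample_polyphase(x, h, up=3, down=5)
assert len(a) == len(b)
assert all(abs(p - q) < 1e-9 for p, q in zip(a, b))
```

In this sketch the polyphase path does roughly 1/(up × down) of the multiplies of the naive path. The speedup measured on real hardware depends on much more than the multiply count, so this ratio should not be read as the project's figure.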
RTL Image Decompression Pipeline
03
FPGA / RTL

JPEG-style FPGA image decoder on the Altera DE1-SoC at 50 MHz. Around 2,600 lines of SystemVerilog. The pipeline does chroma upsampling, then YCbCr→RGB, then a 2-D inverse DCT. Four hardware-multiplexed multipliers feed six outputs per pixel pair, and a dual-port RAM hides the IDCT transpose.

SystemVerilog / Quartus / ModelSim / DE1-SoC
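
The transpose matters because a 2-D inverse DCT separates into two 1-D passes with a transpose between them. The Python sketch below illustrates that property only. It assumes an 8×8 block and the orthonormal floating-point DCT, neither of which is confirmed by the project description, and the function names are invented for the example.

```python
import math

N = 8  # assumed block size

def idct_1d(v):
    """Orthonormal 1-D inverse DCT of an N-element vector."""
    out = []
    for x in range(N):
        s = 0.0
        for u in range(N):
            c = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            s += c * v[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(s)
    return out

def transpose(m):
    return [list(row) for row in zip(*m)]

def idct_2d_separable(block):
    """Row pass, transpose, row pass again, transpose back."""
    first = [idct_1d(row) for row in block]
    second = [idct_1d(row) for row in transpose(first)]
    return transpose(second)

def idct_2d_direct(block):
    """Direct double sum, used only as a reference for comparison."""
    def c(u):
        return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            out[x][y] = sum(
                c(u) * c(v) * block[u][v]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for u in range(N) for v in range(N)
            )
    return out

# Hypothetical coefficient block, just to compare the two paths.
block = [[float((3 * u + 5 * v) % 17 - 8) for v in range(N)] for u in range(N)]
sep = idct_2d_separable(block)
ref = idct_2d_direct(block)
assert all(abs(sep[i][j] - ref[i][j]) < 1e-9 for i in range(N) for j in range(N))
```

In hardware, the intermediate matrix between the two passes is what would sit in the dual-port RAM: one pass can write it in one order while the next reads it in the transposed order.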
02 / Career

Experience

Contributing Writer

Tellura / Hamilton, ON
Sep 2025 — Apr 2026
  • Wrote articles on AI, healthcare, and new medical tech for a youth-focused publication.
  • Took dense technical material and rewrote it for readers who weren't trained engineers.
  • Most of the actual writing got better in the editorial back-and-forth, not in the first draft.

Software Developer Intern

Bank of Montreal (BMO) / Toronto, ON
May 2025 — Aug 2025
  • Built and tested backend microservices. The brief was correctness over speed, with careful attention to what happens when a service partially fails.
  • Wrote the automated test scaffolding for a couple of distribution services. Coverage rose from the low 60% range to above 80%, mostly by covering the edge cases the original tests skipped.
  • Did a lot of debugging across services where the failure was rarely in the obvious place. Made me much more patient about reading logs.

Technical Assistant

Down Syndrome Association of Hamilton / Hamilton, ON
Sep 2023
  • Ran activities for attendees throughout the day.
  • Captioned the live feed so the event was accessible to people who couldn't hear it.