
Justin Sapun, justin.sapun.th@dartmouth.edu
Arun Guruswamy, Arun.Guruswamy.th@dartmouth.edu
This project implements a real-time audio analysis and video synthesis system on the Zybo Z7-10 FPGA. Audio is streamed from either the onboard I2S codec or a Direct Digital Synthesis (DDS) module, then processed by a Fast Fourier Transform (FFT) pipeline to extract frequency-domain features. These features drive HDMI video effects, producing visual output that responds to the audio input. The design is built entirely with AXI-Stream and AXI-Lite interfaces, using Xilinx IP and custom VHDL components.
Design Components
- Zybo Z7-10 FPGA development board
- I2S Audio Codec (ADAU1761)
- Direct Digital Synthesis (DDS) module
- FFT core for real time spectral analysis
- AXI4-Stream and AXI4-Lite interfaces
- HDMI timing and video output modules
- Onboard switches and buttons for mode control
- Custom VHDL logic and AXI wrappers
Implementation
We structured the system around modular VHDL components, each with isolated responsibilities for audio buffering, FFT analysis, and video transformation. Careful coordination between the audio and video clock domains was key to maintaining synchronization. Debugging involved both simulation and hardware tools like the ILA, which helped uncover subtle issues like timing polarity mismatches and spectral leakage early in the integration process.

Video Generation
To generate visuals on the HDMI display, we built a pixel generator synchronized with a Video Timing Controller and pixel clock. This module dynamically created scenery including a moving square by tracking the current x and y pixel coordinates. Color was applied to pixels falling within the bounds of the square, while other areas remained black. A simple FSM updated the square's position each frame, and we layered in basic background elements like a tree and grass to test spatial rendering. This provided a foundation for later transformation based on audio input.
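The per-pixel bounds check and the per-frame position update described above can be modeled in software. The following is a minimal behavioral sketch in Python, not the VHDL itself; the resolution, square size, step size, colors, and the bounce-at-edges behavior are all assumptions chosen for illustration.

```python
# Behavioral sketch of the pixel generator. Every constant here is an
# assumption for illustration; none is taken from the actual design.
H_ACTIVE, V_ACTIVE = 1280, 720   # assumed active video resolution
SQUARE_SIZE = 64                 # assumed square edge length in pixels
STEP = 4                         # assumed horizontal movement per frame

BLACK = (0, 0, 0)
SQUARE_COLOR = (255, 0, 0)


def pixel_color(x, y, sq_x, sq_y):
    """Return the RGB value for pixel (x, y), given the square's top-left corner."""
    inside = sq_x <= x < sq_x + SQUARE_SIZE and sq_y <= y < sq_y + SQUARE_SIZE
    return SQUARE_COLOR if inside else BLACK


def next_position(sq_x, direction):
    """Advance the square once per frame, reversing direction at the screen edges."""
    new_x = sq_x + direction * STEP
    if new_x < 0 or new_x + SQUARE_SIZE > H_ACTIVE:
        direction = -direction
        new_x = sq_x + direction * STEP
    return new_x, direction
```

In hardware, the equivalent of `pixel_color` would be evaluated once per pixel clock, while the equivalent of `next_position` would only need to run once per frame, for example during vertical blanking.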

Video Transform
This part of the system handles real-time, audio-driven video effects. Audio samples from either I2S or DDS are streamed into a custom AXI FIFO buffer (`axis_fifo.vhd`), where 512 samples are collected at 48 kHz and windowed using a Hanning window. These samples are sent in bursts to a pipelined FFT IP core operating at 100 MHz. The FFT output is parsed by `fft_axi_rx.vhd`, which calculates the magnitude squared of each bin and identifies the bin with the peak value. The index of that bin is sent to the `rgb_transform.vhd` module, which scans each pixel and modifies the color of designated moving objects based on the dominant frequency. A preloaded colormap stored in BRAM links frequency bins to RGB values, enabling dynamic, audio-responsive visuals on the HDMI output.
The paper design to the left covers only the pixel-change logic in `video_transform.vhd`, which runs after FFT processing.
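The window, FFT, magnitude-squared, and peak-search chain can be checked against a small software reference model. The sketch below is a floating-point Python model under stated assumptions, not the fixed-point hardware: the function names are hypothetical, the recursive radix-2 FFT stands in for the Xilinx IP core, and searching only the lower half of the spectrum is an assumption based on the input being real-valued.

```python
import cmath
import math

N = 512          # samples per frame (from the text)
FS = 48_000      # sample rate in Hz (from the text)


def hann(n_points):
    """Periodic Hann (Hanning) window coefficients."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points) for n in range(n_points)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def peak_bin(samples):
    """Window one frame, transform it, and return the index of the strongest bin."""
    window = hann(len(samples))
    spectrum = fft([s * w for s, w in zip(samples, window)])
    half = len(samples) // 2   # real input: the upper half mirrors the lower half
    # Magnitude squared preserves the ordering of magnitudes, so no square root is needed.
    power = [z.real ** 2 + z.imag ** 2 for z in spectrum[:half]]
    return max(range(half), key=power.__getitem__)


# A 3 kHz test tone falls exactly on bin 32, since 3000 * 512 / 48000 = 32.
tone_hz = 3000.0
frame = [math.sin(2 * math.pi * tone_hz * n / FS) for n in range(N)]
k = peak_bin(frame)
print(k, k * FS / N)   # prints: 32 3000.0
```

Comparing squared magnitudes is a common choice in hardware because it needs only two multipliers and an adder per bin, and it selects the same peak as a true magnitude comparison would.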

System Integration
All major components were combined into a single top-level design to complete the full audio-to-video processing pipeline. The final block diagram includes the I2S/DDS audio input, AXI FIFO for buffering, FFT IP core, frequency analysis logic, and video generation and transformation modules. Control inputs like mute and source select were wired through AXI-Lite, and the audio and video paths ran on separate clocks, with synchronization managed between the two domains. This fully integrated system produced real-time HDMI output that reacts dynamically to audio signals in hardware.

System Verification
Throughout development, we created individual testbenches for each module to verify functionality in isolation. A final testbench simulated the complete integrated system, including audio input, FFT processing, and video transformation. Because of the slow 48 kHz audio clock, simulation runtime was long, but the testbench validated end-to-end behavior and allowed us to proceed confidently with hardware testing.
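A quick back-of-the-envelope calculation shows why the full-system simulation is slow. The sketch below uses only the figures stated earlier (512 samples, 48 kHz, 100 MHz). It assumes the simulator has to step the 100 MHz clock for the whole time a frame is being collected, and the function names are hypothetical.

```python
FS_AUDIO_HZ = 48_000         # audio sample rate (from the text)
FFT_CLOCK_HZ = 100_000_000   # FFT clock (from the text)
FRAME_LEN = 512              # samples per FFT frame (from the text)


def frame_fill_time_s(frame_len=FRAME_LEN, fs_hz=FS_AUDIO_HZ):
    """Seconds of audio needed to collect one full FFT frame."""
    return frame_len / fs_hz


def clock_cycles_per_frame(frame_len=FRAME_LEN, fs_hz=FS_AUDIO_HZ, clk_hz=FFT_CLOCK_HZ):
    """Fast-clock cycles that elapse while one frame is being collected."""
    return round(frame_len * clk_hz / fs_hz)


print(f"{frame_fill_time_s() * 1e3:.2f} ms per frame")        # 10.67 ms per frame
print(f"{clock_cycles_per_frame():,} fast-clock cycles per frame")  # 1,066,667
```

Under these assumptions, a single FFT result costs on the order of a million simulated clock cycles before the first output bin appears, which is consistent with the long runtimes we observed.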
Demo
In the demo video, we play a YouTube sweep tone from 0 Hz to 20 kHz. The system tracks the dominant frequency in real time, changing the color of the moving square on screen as the detected peak bin shifts with the audio.
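For reference, the bin that a given sweep frequency should land in follows directly from the frame length and sample rate stated earlier. This is a small Python sketch of that arithmetic; `expected_bin` is a hypothetical helper, and rounding to the nearest bin is an idealization that ignores windowing and noise.

```python
FS_HZ = 48_000   # sample rate (from the text)
N_FFT = 512      # FFT length (from the text)


def bin_width_hz(n=N_FFT, fs=FS_HZ):
    """Frequency spacing between adjacent FFT bins."""
    return fs / n


def expected_bin(freq_hz, n=N_FFT, fs=FS_HZ):
    """Nearest FFT bin index for a pure tone at freq_hz (idealized)."""
    return round(freq_hz * n / fs)


print(bin_width_hz())              # 93.75
for f in (0, 1000, 5000, 10000, 20000):
    print(f, expected_bin(f))      # bins 0, 11, 53, 107, 213
```

With 93.75 Hz per bin, the 0 Hz to 20 kHz sweep moves the peak across roughly the first 214 bins of the 256-bin half spectrum.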
Considerations
While the system architecture worked as intended, the FFT output introduced unexpected challenges during implementation. Spectral leakage made peak frequency detection unstable, which showed up as rapid color shifts on screen. To address this, we applied a Hanning window to the buffered audio samples before FFT processing. This reduced but did not fully eliminate the effect, highlighting the nuance of real-time signal analysis in hardware.
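The effect of the window on leakage can be illustrated with a small floating-point experiment. The Python sketch below is not part of the hardware design: it places a tone halfway between two bins and compares how much energy lands outside a few bins around the peak, with and without a Hann window. The guard width of three bins and the helper names are assumptions, and a direct DFT is used for brevity.

```python
import cmath
import math

N = 512       # frame length (from the text)
GUARD = 3     # bins on each side of the peak counted as main lobe (assumed)


def power_spectrum(x):
    """Magnitude squared of the lower half of a direct DFT (slow but simple)."""
    n = len(x)
    return [
        abs(sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))) ** 2
        for k in range(n // 2)
    ]


def leakage_ratio(x):
    """Fraction of spectral energy that falls outside the bins around the peak."""
    power = power_spectrum(x)
    peak = max(range(len(power)), key=power.__getitem__)
    lo, hi = max(0, peak - GUARD), min(len(power), peak + GUARD + 1)
    return 1.0 - sum(power[lo:hi]) / sum(power)


# A tone halfway between bins 32 and 33, where rectangular-window leakage is worst.
tone = [math.sin(2 * math.pi * 32.5 * n / N) for n in range(N)]
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

rect_leak = leakage_ratio(tone)
hann_leak = leakage_ratio([s * w for s, w in zip(tone, hann)])
print(f"rectangular: {rect_leak:.4f}  hann: {hann_leak:.6f}")
```

The Hann window suppresses the far sidelobes but widens the main lobe, so a tone sitting between two bins still splits its energy across neighboring bins. That is one plausible reason the peak index can still flicker between adjacent bins after windowing.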
Skills
We gained experience in VHDL design, AXI-Stream and AXI-Lite protocols, FFT-based audio processing, and HDMI video synchronization. The project involved clock domain management, simulation with custom testbenches, and debugging using Vivado and the Integrated Logic Analyzer (ILA). We also applied windowing techniques to reduce spectral leakage in real-time signal analysis.