OracleBeam

Improvements & Ablations on Subspace Hybrid-MVDR (2024/04 - 2024/06)

We revisit hybrid-MVDR beamforming and propose three add-ons—Mask-Hybrid-MVDR, Sub-band Inter-Method PCA, and Inter-Frequency PCA—to reduce musical noise from per-TF beamformer switching while preserving speech and spatial cues. Experiments on the SPEAR/EasyCom smart-glasses array show sub-band Inter-Method PCA consistently outperforms iso-MVDR, hybrid-MVDR, and wide-band subspace baselines in SI-SDR/SDR/PESQ, with clearer spectrograms and lower noise floors.

From left to right: (a) Wide-band PCA; (b) Beamforming;
Subspace Hybrid-MVDR

Background

MVDR minimizes output power subject to the distortionless constraint $\mathbf{w}^H\mathbf{d}(\theta)=1$, using a noise covariance $\mathbf{R}$:

\[\mathbf{w}(f)=\frac{\mathbf{R}^{-1}(f)\mathbf{d}(\theta)}{\mathbf{d}^H(\theta)\mathbf{R}^{-1}(f)\mathbf{d}(\theta)}.\]
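
As a minimal sketch of the closed form above for a single frequency bin, the NumPy snippet below solves for $\mathbf{R}^{-1}\mathbf{d}$ and normalises it. The function name `mvdr_weights`, the diagonal-loading term, and the toy covariance and steering vector are illustrative assumptions, not part of the original report.

```python
import numpy as np

def mvdr_weights(R, d, diag_load=1e-6):
    """MVDR weights for one frequency bin.

    R: (M, M) Hermitian noise covariance, d: (M,) steering vector.
    """
    M = R.shape[0]
    # Diagonal loading is an assumption added here for numerical stability.
    R_loaded = R + diag_load * np.trace(R).real / M * np.eye(M)
    Rinv_d = np.linalg.solve(R_loaded, d)   # R^{-1} d without an explicit inverse
    return Rinv_d / (d.conj() @ Rinv_d)     # normalise so that w^H d = 1

# Toy example with a random Hermitian positive-definite covariance (hypothetical data).
rng = np.random.default_rng(0)
M = 6
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + np.eye(M)
d = np.exp(-1j * np.pi * 0.3 * np.arange(M))  # hypothetical steering vector
w = mvdr_weights(R, d)
response = w.conj() @ d                       # should be 1 (distortionless)
```

Using `np.linalg.solve` rather than forming $\mathbf{R}^{-1}$ explicitly is a common numerical choice; the distortionless constraint holds regardless of the loading level.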

Hybrid-MVDR builds a small dictionary of candidate noise covariances $\mathbf{R}$ (isotropic, anisotropic diffuse, plane-wave, white), applies the corresponding MVDR beamformers to every TF bin, and selects the minimum-energy output per bin; this denoises aggressively but induces musical noise from rapid switching. Subspace Hybrid-MVDR then applies Inter-Method PCA to the (iso, hybrid) outputs to smooth these artifacts.
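
The per-bin selection step can be sketched as follows, assuming the dictionary weights are already computed. The array layouts, the function names `apply_beamformers` and `hybrid_select`, and the random inputs are assumptions made for illustration only.

```python
import numpy as np

def apply_beamformers(W, X):
    """Apply N candidate beamformers to a multichannel STFT.

    W: (N, F, M) weights, X: (M, F, T) microphone STFTs.
    Returns Y: (N, F, T) with Y[n, f, t] = w_n(f)^H x(f, t).
    """
    return np.einsum("nfm,mft->nft", W.conj(), X)

def hybrid_select(Y):
    """Pick, for every TF bin, the candidate output with minimum energy."""
    power = np.abs(Y) ** 2
    idx = np.argmin(power, axis=0)                           # (F, T) winning index
    out = np.take_along_axis(Y, idx[None, :, :], axis=0)[0]  # (F, T) hybrid output
    return out, idx

# Toy shapes and random data (hypothetical).
rng = np.random.default_rng(0)
N, F, M, T = 4, 5, 6, 7
W = rng.standard_normal((N, F, M)) + 1j * rng.standard_normal((N, F, M))
X = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))
Y = apply_beamformers(W, X)
S, idx = hybrid_select(Y)
```

The index map `idx` changes freely from bin to bin, which is the rapid switching that the text identifies as the source of musical noise.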

Our Add-ons

From left to right: Subspace Hybrid-MVDR, Mask Hybrid-MVDR, and Frequency Subspace Hybrid-MVDR
  • Mask-Hybrid-MVDR. Treat the hybrid output as a rough speech estimate → derive speech-presence mask → update steering vector and NCM adaptively → run adaptive MVDR to mitigate non-linear musical distortion.
  • Sub-band Inter-Method PCA. Replace wide-band PCA with per-band PCA over $K$ equal-width frequency groups: $Z=[Z_1,\dots,Z_K],\; Z_i\in\mathbb{C}^{2\times F/K}$. This respects frequency-dependent beam patterns and SNR, improving high-band preservation.
  • Inter-Frequency PCA. Treat the frequency axis as the signal space and the beamformer variants as samples, giving an $(M{+}1)\times F$ data matrix; this promotes spectral coherence and is included as an ablation.
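
One possible reading of the Sub-band Inter-Method PCA item above is sketched below: for each frame and each band, stack the iso and hybrid outputs into $Z_i$, take the principal eigenvector of the $2\times 2$ covariance, and keep the rank-1 reconstruction. Several details are assumptions rather than the report's stated method: processing frame by frame, skipping mean removal (STFT coefficients are treated as zero-mean), reading the result back from one row (`ref`), and the function name itself.

```python
import numpy as np

def subband_inter_method_pca(Y_iso, Y_hyb, K=4, ref=0):
    """Per-band, per-frame PCA across two beamformer outputs.

    Y_iso, Y_hyb: (F, T) complex STFTs. Returns an (F, T) smoothed STFT.
    ref selects which row of the rank-1 reconstruction is returned
    (0 = iso, 1 = hybrid); this choice is an assumption of the sketch.
    """
    F, T = Y_iso.shape
    out = np.empty((F, T), dtype=complex)
    # array_split gives near-equal bands even when F is not divisible by K.
    for band in np.array_split(np.arange(F), K):
        for t in range(T):
            Z = np.stack([Y_iso[band, t], Y_hyb[band, t]])  # (2, band width)
            C = Z @ Z.conj().T / Z.shape[1]                  # 2x2 Hermitian covariance
            _, vecs = np.linalg.eigh(C)                      # eigenvalues ascending
            u = vecs[:, -1]                                  # principal direction
            Z1 = np.outer(u, u.conj()) @ Z                   # rank-1 reconstruction
            out[band, t] = Z1[ref]
    return out

# Sanity check on hypothetical data: if the two inputs are proportional,
# Z is already rank 1 and the reconstruction returns the input unchanged.
rng = np.random.default_rng(0)
F, T = 16, 5
Y_iso = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
Y_hyb = 0.5 * Y_iso
smoothed = subband_inter_method_pca(Y_iso, Y_hyb, K=4)
```

Setting `K=1` in this sketch reduces it to a wide-band variant, which makes the two easy to compare on the same inputs.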

Dataset & Setup

We use SPEAR (EasyCom subset): meetings with up to 3 concurrent talkers, recorded on a 6-mic smart-glasses array, with 10 loudspeakers emitting diffuse noise. Ground-truth DOAs are provided from on-device cameras. Metrics: PESQ, SDR, SI-SDR.
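
For reference, SI-SDR is commonly computed by projecting the estimate onto the reference and comparing the projected energy with the residual energy. The snippet below is a minimal sketch of that standard definition on time-domain signals; it is not the evaluation code used for the reported numbers, and the test signals are synthetic.

```python
import numpy as np

def si_sdr(est, ref, eps=1e-12):
    """Scale-invariant SDR in dB between 1-D time-domain signals."""
    est = est - est.mean()
    ref = ref - ref.mean()
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)  # optimal scaling of ref
    target = alpha * ref                                 # part of est explained by ref
    noise = est - target                                 # everything else
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

# Synthetic example (hypothetical signals).
rng = np.random.default_rng(0)
ref_sig = rng.standard_normal(16000)
distortion = rng.standard_normal(16000)
clean_score = si_sdr(ref_sig + 0.1 * distortion, ref_sig)
noisy_score = si_sdr(ref_sig + 1.0 * distortion, ref_sig)
scaled_score = si_sdr(3.0 * (ref_sig + 0.1 * distortion), ref_sig)
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is the property that distinguishes SI-SDR from plain SDR.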

From left to right: Evaluations of SDR, SI-SDR, and PESQ among different methods

Key Findings

  • Sub-band PCA > Wide-band PCA. Per-band processing reduces musical noise and retains high-frequency speech energy better than wide-band PCA, which overfits to low-frequency energy.
  • Hybrid-MVDR excels at aggressive denoising but needs post-smoothing; PCA modules provide that smoothing.
  • Inter-Frequency PCA lags due to strong cross-frequency noise correlation, limiting separability when frequencies are treated as the feature space.
  • Mask-MVDR can help in principle, but gains are mask-quality limited when the hybrid estimate is not clean enough to seed robust masks.

Spectrograms (Qualitative)

Spectrum comparison (from left to right): (a) noisy speech; (b) isotropic MVDR; (c) hybrid MVDR; (d) inter-method PCA (wide-band); (e) inter-method PCA (4 sub-bands)

Takeaways

  • Practical recipe: run a small NCM dictionary (iso + a few anisotropic/plane-wave models) → hybrid min-energy selection → sub-band PCA (e.g., $K{=}8$–16).
  • When to skip: if masks are unreliable or latency is tight, avoid mask-MVDR; prefer sub-band PCA for stable gains.

References

Project report: OracleBeam — Improvement and Ablations on Subspace Hybrid-MVDR. UIUC ECE 513 Final Report.