Rethinking Non-Negative Matrix Factorization with Implicit Neural Representations (2404.04439v1)

Published 5 Apr 2024 in eess.AS, cs.LG, and cs.SD

Abstract: Non-negative Matrix Factorization (NMF) is a powerful technique for analyzing regularly-sampled data, i.e., data that can be stored in a matrix. For audio, this has led to numerous applications using time-frequency (TF) representations like the Short-Time Fourier Transform. However, extending these applications to irregularly-spaced TF representations, like the Constant-Q transform, wavelets, or sinusoidal analysis models, has not been possible, since these representations cannot be directly stored in matrix form. In this paper, we formulate NMF in terms of continuous functions (instead of fixed vectors) and show that NMF can be extended to a wider variety of signal classes that need not be regularly sampled.
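The classical, fixed-matrix NMF that the paper generalizes can be sketched with the standard Lee–Seung multiplicative updates. This is a minimal illustration of the baseline only, not the paper's method: the paper replaces the columns of W and rows of H with continuous functions (implicit neural representations) so the data need not lie on a regular grid, and that extension is not shown here. The function name `nmf` and the toy data are ours for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V ~= W @ H under Euclidean loss,
    using Lee-Seung multiplicative updates (a sketch, not the paper's
    continuous-function formulation)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Random non-negative initialization; eps keeps entries strictly positive.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy rank-2 non-negative data, standing in for a magnitude spectrogram.
rng = np.random.default_rng(1)
V = rng.random((8, 2)) @ rng.random((2, 6))

W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H)
```

Because each update multiplies by a non-negative ratio, W and H stay non-negative throughout, which is the property the continuous reformulation must also preserve when the fixed vectors become learned functions of (time, frequency) coordinates.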

Authors (4)
  1. Krishna Subramani
  2. Paris Smaragdis
  3. Takuya Higuchi
  4. Mehrez Souden