
Sample-level CNN Architectures for Music Auto-tagging Using Raw Waveforms (1710.10451v2)

Published 28 Oct 2017 in cs.SD, cs.LG, cs.MM, cs.NE, and eess.AS

Abstract: Recent work has shown that the end-to-end approach using convolutional neural networks (CNNs) is effective in various types of machine learning tasks. For audio signals, the approach takes raw waveforms as input using a 1-D convolution layer. In this paper, we improve the 1-D CNN architecture for music auto-tagging by adopting building blocks from state-of-the-art image classification models, ResNets and SENets, and adding multi-level feature aggregation. We compare different combinations of these modules in building CNN architectures. The results show that they achieve significant improvements over previous state-of-the-art models on the MagnaTagATune dataset and comparable results on the Million Song Dataset. Furthermore, we analyze and visualize our model to show how the 1-D CNN operates.
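
To make the SENet building block mentioned in the abstract concrete, here is a minimal sketch of squeeze-and-excitation applied to a 1-D (channels by time) feature map. It is an illustration only, not the authors' implementation: the function name `squeeze_excite_1d`, the weight matrices `w1` and `w2`, and the omission of biases are all assumptions made for brevity, and pure Python lists stand in for tensors.

```python
import math


def squeeze_excite_1d(features, w1, w2):
    """Rescale each channel of a 1-D feature map by a gate in (0, 1).

    features: list of C channels, each a list of T samples.
    w1: R x C weights of the reduction layer (hypothetical values).
    w2: C x R weights of the expansion layer (hypothetical values).
    Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling over the time axis, one value per channel.
    squeezed = [sum(channel) / len(channel) for channel in features]

    # Excitation, step 1: fully connected reduction followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]

    # Excitation, step 2: fully connected expansion followed by a sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]

    # Scale: multiply every sample of a channel by that channel's gate.
    return [[gate * x for x in channel] for gate, channel in zip(gates, features)]


if __name__ == "__main__":
    # Two channels, two time steps; zero expansion weights give gates of 0.5.
    demo = squeeze_excite_1d([[1.0, 3.0], [2.0, 2.0]], [[1.0, 1.0]], [[0.0], [0.0]])
    print(demo)
```

In a ResNet-style variant, the rescaled output would be added back to the block's input through a skip connection. How the paper combines the two modules is described in its body, not in this abstract.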

Authors (3)
  1. Taejun Kim (5 papers)
  2. Jongpil Lee (17 papers)
  3. Juhan Nam (64 papers)
Citations (87)
