Volume 14 | Issue 6
In today's digital age, automated music recommendation systems driven by human emotions are gaining significant traction. This paper introduces a novel approach to improving music mood classification accuracy by integrating audio and lyrical modalities within a fusion model. The primary objective is to investigate the effectiveness of various attention mechanisms, including self-attention (SA), channel attention (CA), and the hierarchical attention network (HAN), within a multi-modal framework tailored for music mood classification. Through rigorous experimentation, we demonstrate that multi-modal architectures enriched with attention mechanisms outperform both their non-attention counterparts and single-modal architectures, achieving higher accuracy. Motivated by the promising performance of these attention mechanisms, we propose a new network architecture, the HAN-CA-SA based multi-modal classification system, which achieves an accuracy of 82.35%. Additionally, we evaluate the proposed model using ROC and Kappa metrics, validate its robustness through K-fold cross-validation, and compare it with existing systems such as XLNet and CNN-BERT, supporting the comparison with a statistical hypothesis test.
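To make the fused architecture concrete, the sketch below illustrates one plausible way to combine a channel-attention audio branch with a self-attention plus HAN-style word-attention lyric branch. All layer sizes, module names, the number of mood classes, and the concatenation-based fusion are illustrative assumptions; the abstract does not specify the paper's exact HAN-CA-SA configuration.

# Minimal PyTorch sketch of an attention-based audio+lyrics fusion classifier.
# Layer sizes, the 4-class mood output, and the fusion order are assumptions,
# not the paper's reported architecture.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention over audio feature maps."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (batch, channels, time)
        weights = self.fc(x.mean(dim=-1))          # global average pool over time
        return x * weights.unsqueeze(-1)           # re-weight each channel


class WordAttention(nn.Module):
    """HAN-style additive attention that pools word vectors into one summary."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)

    def forward(self, x):                          # x: (batch, words, dim)
        scores = self.context(torch.tanh(self.proj(x)))   # (batch, words, 1)
        alpha = torch.softmax(scores, dim=1)
        return (alpha * x).sum(dim=1)              # (batch, dim)


class FusionMoodClassifier(nn.Module):
    """Audio branch (1-D CNN + channel attention) and lyric branch
    (self-attention + word attention), fused by concatenation."""
    def __init__(self, n_mels=64, vocab=10000, emb=128, n_classes=4):
        super().__init__()
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.channel_attn = ChannelAttention(128)
        self.embed = nn.Embedding(vocab, emb)
        self.self_attn = nn.TransformerEncoderLayer(
            d_model=emb, nhead=4, dim_feedforward=256, batch_first=True)
        self.word_attn = WordAttention(emb)
        self.classifier = nn.Linear(128 + emb, n_classes)

    def forward(self, mel, tokens):                # mel: (B, n_mels, T), tokens: (B, W)
        audio = self.channel_attn(self.audio_cnn(mel)).mean(dim=-1)   # (B, 128)
        lyrics = self.word_attn(self.self_attn(self.embed(tokens)))   # (B, emb)
        return self.classifier(torch.cat([audio, lyrics], dim=-1))    # mood logits


if __name__ == "__main__":
    model = FusionMoodClassifier()
    logits = model(torch.randn(2, 64, 300), torch.randint(0, 10000, (2, 50)))
    print(logits.shape)                            # torch.Size([2, 4])

Concatenating the two branch embeddings before the classifier is only one choice of fusion strategy; the same attention components could equally feed a cross-modal attention or weighted-sum fusion layer.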