MVSEP-MDX23: a free vocal and background-music separation solution that can go toe to toe with Spleeter
Source: cnblogs  Author: 刘悦的技术博客 (Liu Yue's tech blog)  Date: 2023/12/29 9:16:30

In the audio and video world, splitting an already-released mixed song or audio file back into its sources has long been a worldwide research problem. Because of the physics of how sound waves combine, recovering and separating the stems without the original project files is genuinely difficult.

Any discussion of vocal/background separation has to mention Spleeter, an open-source deep-learning system for audio source separation (music demixing) developed by Deezer's research team. It uses a performance-oriented source-separation algorithm and ships with pretrained models, so it works out of the box, which is one reason Spleeter is so widely used. For details on Spleeter, see the earlier article "Free vocal and background-music separation with the AI library Spleeter (Python 3.10)"; they will not be repeated here.

The MVSEP-MDX23 separation approach builds on Demucs, which comes from the Facebook Research team. Demucs appeared later than Spleeter and earlier than MDX-Net, and has gone through four major versions, each with substantial changes to the model architecture. Its output quality improved dramatically starting with v3, at one point leading the field; v4 is currently the strongest single open-source model for instrument separation, and the v1 and v2 networks were incorporated into MDX-Net.

In this article we use MVSEP-MDX23 to separate audio into vocals and accompaniment.

Separating vocals and accompaniment locally

To run MVSEP-MDX23 offline on your own machine, first clone the code:

```bash
git clone https://github.com/jarredou/MVSEP-MDX23-Colab_v2.git
```

Then enter the project directory and install the dependencies:

```bash
cd MVSEP-MDX23-Colab_v2
pip3 install -r requirements.txt
```

Then run inference directly:

```bash
python3 inference.py --input_audio test.wav --output_folder ./results/
```

This separates the vocals from test.wav; the separated files are written to the results folder.
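The exact filenames of the separated stems depend on the options used, so here is a small helper, just a sketch with an assumed results layout, that collects whatever audio files a run actually produced:

```python
from pathlib import Path

def list_stems(results_dir: str) -> list:
    """Return the separated audio files in a results folder, sorted by name."""
    exts = {'.wav', '.flac'}  # assumed output formats
    return sorted(p.name for p in Path(results_dir).iterdir()
                  if p.suffix.lower() in exts)
```

After a run on test.wav, calling `list_stems('./results/')` would show the vocal and instrumental files that were generated.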

Note that during inference the separation models are downloaded into the project's models directory, and they are very large. Inference itself is also quite slow.
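If you want to know how much disk those downloaded checkpoints actually occupy, a few lines of standard-library Python will tell you; a sketch, meant to be pointed at the project's models directory:

```python
import os

def dir_size_gb(path: str) -> float:
    """Sum the sizes of all files under path and report the total in GiB."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / 1024 ** 3
```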

You can add the --single_onnx flag to speed up inference, at some cost in audio quality.

If your machine has more than 12 GB of video memory, you can also add the --large_gpu flag to speed up inference.

If you have no NVIDIA card, or VRAM is in genuinely short supply, the --cpu flag runs inference on the CPU. This is not recommended: the process is slow to begin with, and on a CPU it is slower still.
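The three flags above combine in predictable ways, so picking them can be scripted. This is only a sketch: it merely assembles the argument list from the flags just discussed, and the 12 GB threshold is the rule of thumb mentioned above.

```python
def build_inference_cmd(input_audio, output_folder,
                        has_gpu=True, vram_gb=0, fast=False):
    """Assemble an inference.py command line from the hardware situation."""
    cmd = ['python3', 'inference.py',
           '--input_audio', input_audio,
           '--output_folder', output_folder]
    if not has_gpu:
        cmd.append('--cpu')          # works everywhere, but slowest by far
    elif vram_gb >= 12:
        cmd.append('--large_gpu')    # enough VRAM to keep the large models resident
    if fast:
        cmd.append('--single_onnx')  # faster, with some loss in quality
    return cmd
```

Something like `subprocess.run(build_inference_cmd('test.wav', './results/', vram_gb=16))` would then launch the run.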

As a nice touch, the author also ships a small PyQt5 GUI to make the tool friendlier to use:

```python
__author__ = 'Roman Solovyev (ZFTurbo), IPPM RAS'

if __name__ == '__main__':
    import os
    gpu_use = "0"
    print('GPU use: {}'.format(gpu_use))
    os.environ["CUDA_VISIBLE_DEVICES"] = "{}".format(gpu_use)

import time
import os
import numpy as np
from PyQt5.QtCore import *
from PyQt5 import QtCore
from PyQt5.QtWidgets import *
import sys
from inference import predict_with_model

root = dict()


class Worker(QObject):
    finished = pyqtSignal()
    progress = pyqtSignal(int)

    def __init__(self, options):
        super().__init__()
        self.options = options

    def run(self):
        global root
        # Here we pass the update_progress method (uncalled!) as a callback
        self.options['update_percent_func'] = self.update_progress
        predict_with_model(self.options)
        root['button_start'].setDisabled(False)
        root['button_finish'].setDisabled(True)
        root['start_proc'] = False
        self.finished.emit()

    def update_progress(self, percent):
        self.progress.emit(percent)


class Ui_Dialog(object):
    def setupUi(self, Dialog):
        global root
        Dialog.setObjectName("Settings")
        Dialog.resize(370, 180)

        self.checkbox_cpu = QCheckBox("Use CPU instead of GPU?", Dialog)
        self.checkbox_cpu.move(30, 10)
        self.checkbox_cpu.resize(320, 40)
        if root['cpu']:
            self.checkbox_cpu.setChecked(True)

        self.checkbox_single_onnx = QCheckBox("Use single ONNX?", Dialog)
        self.checkbox_single_onnx.move(30, 40)
        self.checkbox_single_onnx.resize(320, 40)
        if root['single_onnx']:
            self.checkbox_single_onnx.setChecked(True)

        self.pushButton_save = QPushButton(Dialog)
        self.pushButton_save.setObjectName("pushButton_save")
        self.pushButton_save.move(30, 120)
        self.pushButton_save.resize(150, 35)

        self.pushButton_cancel = QPushButton(Dialog)
        self.pushButton_cancel.setObjectName("pushButton_cancel")
        self.pushButton_cancel.move(190, 120)
        self.pushButton_cancel.resize(150, 35)

        self.retranslateUi(Dialog)
        QtCore.QMetaObject.connectSlotsByName(Dialog)
        self.Dialog = Dialog

        # connect the two buttons to their handlers
        self.pushButton_save.clicked.connect(self.return_save)
        self.pushButton_cancel.clicked.connect(self.return_cancel)

    def retranslateUi(self, Dialog):
        _translate = QtCore.QCoreApplication.translate
        Dialog.setWindowTitle(_translate("Settings", "Settings"))
        self.pushButton_cancel.setText(_translate("Settings", "Cancel"))
        self.pushButton_save.setText(_translate("Settings", "Save settings"))

    def return_save(self):
        global root
        root['cpu'] = self.checkbox_cpu.isChecked()
        root['single_onnx'] = self.checkbox_single_onnx.isChecked()
        self.Dialog.close()

    def return_cancel(self):
        global root
        self.Dialog.close()


class MyWidget(QWidget):
    def __init__(self):
        super().__init__()
        self.initUI()

    def initUI(self):
        self.resize(560, 360)
        self.move(300, 300)
        self.setWindowTitle('MVSEP music separation model')
        self.setAcceptDrops(True)  # enable drag & drop of audio files

    def dragEnterEvent(self, event):
        if event.mimeData().hasUrls():
            event.accept()
        else:
            event.ignore()

    def dropEvent(self, event):
        global root
        files = [u.toLocalFile() for u in event.mimeData().urls()]
        txt = ''
        root['input_files'] = []
        for f in files:
            root['input_files'].append(f)
            txt += f + '\n'
        root['input_files_list_text_area'].insertPlainText(txt)
        root['progress_bar'].setValue(0)

    def execute_long_task(self):
        global root
        if len(root['input_files']) == 0:
            QMessageBox.about(root['w'], "Error", "No input files specified!")
            return
        root['progress_bar'].show()
        root['button_start'].setDisabled(True)
        root['button_finish'].setDisabled(False)
        root['start_proc'] = True
        options = {
            'input_audio': root['input_files'],
            'output_folder': root['output_folder'],
            'cpu': root['cpu'],
            'single_onnx': root['single_onnx'],
            'overlap_large': 0.6,
            'overlap_small': 0.5,
        }
        self.update_progress(0)
        # run the separation in a worker thread so the UI stays responsive
        self.thread = QThread()
        self.worker = Worker(options)
        self.worker.moveToThread(self.thread)
        self.thread.started.connect(self.worker.run)
        self.worker.finished.connect(self.thread.quit)
        self.worker.finished.connect(self.worker.deleteLater)
        self.thread.finished.connect(self.thread.deleteLater)
        self.worker.progress.connect(self.update_progress)
        self.thread.start()

    def stop_separation(self):
        global root
        self.thread.terminate()
        root['button_start'].setDisabled(False)
        root['button_finish'].setDisabled(True)
        root['start_proc'] = False
        root['progress_bar'].hide()

    def update_progress(self, progress):
        global root
        root['progress_bar'].setValue(progress)

    def open_settings(self):
        global root
        dialog = QDialog()
        dialog.ui = Ui_Dialog()
        dialog.ui.setupUi(dialog)
        dialog.exec_()


def dialog_select_input_files():
    global root
    files, _ = QFileDialog.getOpenFileNames(
        None,
        "QFileDialog.getOpenFileNames()",
        "",
        "All Files (*);;Audio Files (*.wav, *.mp3, *.flac)",
    )
    if files:
        txt = ''
        root['input_files'] = []
        for f in files:
            root['input_files'].append(f)
            txt += f + '\n'
        root['input_files_list_text_area'].insertPlainText(txt)
        root['progress_bar'].setValue(0)
    return files


def dialog_select_output_folder():
    global root
    foldername = QFileDialog.getExistingDirectory(
        None,
        "Select Directory"
    )
    root['output_folder'] = foldername + '/'
    root['output_folder_line_edit'].setText(root['output_folder'])
    return foldername


def create_dialog():
    global root
    app = QApplication(sys.argv)
    w = MyWidget()
    root['input_files'] = []
    root['output_folder'] = os.path.dirname(os.path.abspath(__file__)) + '/results/'
    root['cpu'] = False
    root['single_onnx'] = False

    button_select_input_files = QPushButton(w)
    button_select_input_files.setText("Input audio files")
    button_select_input_files.clicked.connect(dialog_select_input_files)
    button_select_input_files.setFixedHeight(35)
    button_select_input_files.setFixedWidth(150)
    button_select_input_files.move(30, 20)

    input_files_list_text_area = QTextEdit(w)
    input_files_list_text_area.setReadOnly(True)
    input_files_list_text_area.setLineWrapMode(QTextEdit.NoWrap)
    font = input_files_list_text_area.font()
    font.setFamily("Courier")
    font.setPointSize(10)
    input_files_list_text_area.move(30, 60)
    input_files_list_text_area.resize(500, 100)

    button_select_output_folder = QPushButton(w)
    button_select_output_folder.setText("Output folder")
    button_select_output_folder.setFixedHeight(35)
    button_select_output_folder.setFixedWidth(150)
    button_select_output_folder.clicked.connect(dialog_select_output_folder)
    button_select_output_folder.move(30, 180)

    output_folder_line_edit = QLineEdit(w)
    output_folder_line_edit.setReadOnly(True)
    font = output_folder_line_edit.font()
    font.setFamily("Courier")
    font.setPointSize(10)
    output_folder_line_edit.move(30, 220)
    output_folder_line_edit.setFixedWidth(500)
    output_folder_line_edit.setText(root['output_folder'])

    progress_bar = QProgressBar(w)
    # progress_bar.move(30, 310)
    progress_bar.setValue(0)
    progress_bar.setGeometry(30, 310, 500, 35)
    progress_bar.setAlignment(QtCore.Qt.AlignCenter)
    progress_bar.hide()
    root['progress_bar'] = progress_bar

    button_start = QPushButton('Start separation', w)
    button_start.clicked.connect(w.execute_long_task)
    button_start.setFixedHeight(35)
    button_start.setFixedWidth(150)
    button_start.move(30, 270)

    button_finish = QPushButton('Stop separation', w)
    button_finish.clicked.connect(w.stop_separation)
    button_finish.setFixedHeight(35)
    button_finish.setFixedWidth(150)
    button_finish.move(200, 270)
    button_finish.setDisabled(True)

    button_settings = QPushButton('?', w)
    button_settings.clicked.connect(w.open_settings)
    button_settings.setFixedHeight(35)
    button_settings.setFixedWidth(35)
    button_settings.move(495, 270)
    button_settings.setDisabled(False)

    mvsep_link = QLabel(w)
    mvsep_link.setOpenExternalLinks(True)
    font = mvsep_link.font()
    font.setFamily("Courier")
    font.setPointSize(10)
    mvsep_link.move(415, 30)
    mvsep_link.setText('Powered by <a href="https://mvsep.com">MVSep.com</a>')

    root['w'] = w
    root['input_files_list_text_area'] = input_files_list_text_area
    root['output_folder_line_edit'] = output_folder_line_edit
    root['button_start'] = button_start
    root['button_finish'] = button_finish
    root['button_settings'] = button_settings

    # w.showMaximized()
    w.show()
    sys.exit(app.exec_())


if __name__ == '__main__':
    create_dialog()
```

The resulting window is plain but genuinely practical: drag audio files in, pick an output folder, and press Start. Spleeter never gave us this kind of treatment.

Separating vocals and accompaniment in the cloud with Colab

Thanks to Google, we can also run MVSEP-MDX23 on Colab:

https://colab.research.google.com/github/jarredou/MVSEP-MDX23-Colab_v2/blob/v2.3/MVSep-MDX23-Colab.ipynb#scrollTo=uWX5WOqjU0QC

First install MVSEP-MDX23:

```python
#@markdown #Installation
#@markdown *Run this cell to install MVSep-MDX23*
print('Installing... This will take 1 minute...')
%cd /content
from google.colab import drive
drive.mount('/content/drive')
!git clone https://github.com/jarredou/MVSEP-MDX23-Colab_v2.git &> /dev/null
%cd /content/MVSEP-MDX23-Colab_v2
!pip install -r requirements.txt &> /dev/null
# onnxruntime-gpu nightly fix for cuda12.2
!python -m pip install ort-nightly-gpu --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ort-cuda-12-nightly/pypi/simple/
print('Installation done !')
```

Then set up the separation cell:

```python
#@markdown #Separation
from pathlib import Path
import glob
%cd /content/MVSEP-MDX23-Colab_v2

input = '/content/drive/MyDrive' #@param {type:"string"}
output_folder = '/content/drive/MyDrive/output' #@param {type:"string"}
#@markdown ---
#@markdown *Bigshifts=1 to disable that feature*
BigShifts = 7 #@param {type:"slider", min:1, max:41, step:1}
#@markdown ---
overlap_InstVoc = 1 #@param {type:"slider", min:1, max:40, step:1}
overlap_VitLarge = 1 #@param {type:"slider", min:1, max:40, step:1}
#@markdown ---
weight_InstVoc = 8 #@param {type:"slider", min:0, max:10, step:1}
weight_VitLarge = 5 #@param {type:"slider", min:0, max:10, step:1}
#@markdown ---
use_VOCFT = False #@param {type:"boolean"}
overlap_VOCFT = 0.1 #@param {type:"slider", min:0, max:0.95, step:0.05}
weight_VOCFT = 2 #@param {type:"slider", min:0, max:10, step:1}
#@markdown ---
vocals_instru_only = True #@param {type:"boolean"}
overlap_demucs = 0.6 #@param {type:"slider", min:0, max:0.95, step:0.05}
#@markdown ---
output_format = 'PCM_16' #@param ["PCM_16", "FLOAT"]

if vocals_instru_only:
    vocals_only = '--vocals_only true'
else:
    vocals_only = ''

if use_VOCFT:
    use_VOCFT = '--use_VOCFT true'
else:
    use_VOCFT = ''

if Path(input).is_file():
    file_path = input
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    !python inference.py \
            --large_gpu \
            --weight_InstVoc {weight_InstVoc} \
            --weight_VOCFT {weight_VOCFT} \
            --weight_VitLarge {weight_VitLarge} \
            --input_audio "{file_path}" \
            --overlap_demucs {overlap_demucs} \
            --overlap_VOCFT {overlap_VOCFT} \
            --overlap_InstVoc {overlap_InstVoc} \
            --overlap_VitLarge {overlap_VitLarge} \
            --output_format {output_format} \
            --BigShifts {BigShifts} \
            --output_folder "{output_folder}" \
            {vocals_only} \
            {use_VOCFT}
else:
    file_paths = sorted([f'"{glob.escape(path)}"' for path in glob.glob(input + "/*")])[:]
    input_audio_args = ' '.join(file_paths)
    Path(output_folder).mkdir(parents=True, exist_ok=True)
    !python inference.py \
            --large_gpu \
            --weight_InstVoc {weight_InstVoc} \
            --weight_VOCFT {weight_VOCFT} \
            --weight_VitLarge {weight_VitLarge} \
            --input_audio {input_audio_args} \
            --overlap_demucs {overlap_demucs} \
            --overlap_VOCFT {overlap_VOCFT} \
            --overlap_InstVoc {int(overlap_InstVoc)} \
            --overlap_VitLarge {int(overlap_VitLarge)} \
            --output_format {output_format} \
            --BigShifts {BigShifts} \
            --output_folder "{output_folder}" \
            {vocals_only} \
            {use_VOCFT}
```

By default this uses Google Drive paths; you can also point it at a directory on the Colab server itself.
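The file-or-folder branching in the cell above can be expressed as a small plain-Python helper. This is only a sketch mirroring the notebook's logic: a file is used directly, while a folder contributes all the files it contains.

```python
from pathlib import Path

def gather_inputs(input_path: str) -> list:
    """Return the audio files to process: the file itself, or a folder's files."""
    p = Path(input_path)
    if p.is_file():
        return [str(p)]
    return sorted(str(f) for f in p.iterdir() if f.is_file())
```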

Conclusion

MVSEP-MDX23 and Spleeter are both vocal/background separation tools. As users, how should we choose between them?

MVSEP-MDX23 is based on the Demucs4 and MDX network architectures and can separate music into four stems: bass, drums, vocals, and other. It took third place in the 2023 music separation challenge and performs strongly in quality comparisons on the MultiSong dataset. It offers both a Python command-line tool and a GUI, supports CPU and GPU acceleration, and runs locally.

Spleeter is an open-source audio separation library developed by Deezer. It uses deep-learning models to split audio into separate tracks such as vocals and accompaniment, ships with pretrained models, and can be used from the command line or as a Python library. Its strengths are ease of use and flexibility: it can separate into different numbers of stems as needed.

Overall, MVSEP-MDX23 leads on separation quality and precision, which makes it the better fit for professional users who need high-quality results. Spleeter suits ordinary users and developers better, since it is easy to use and offers more customization options.

Original article: https://www.cnblogs.com/v3ucn/p/17933992.html
