Bert-vits2 Final Release: Cloud Training and Inference with Bert-vits2-2.3 (on Colab's Free GPU Platform)

Source: cnblogs | Author: 刘悦的技术博客 (Liu Yue's Tech Blog) | Date: 2023/12/27 13:50:42

For deep-learning beginners, Jupyter Notebook's script-style workflow is clearly more approachable. Thanks to Python's cross-platform nature, a notebook can run either locally or on a remote server. Google Colab, the leading free GPU compute platform, makes this notebook workflow more powerful still.

This time we will use the final release of Bert-vits2, Bert-vits2-v2.3, together with Jupyter Notebook scripts to clone the voice of the popular Resident Evil 6 character Ada Wong.

Debugging Jupyter Notebook Locally

As is well known, Google Colab provides free GPUs for model training and inference, but each notebook session may run for at most 12 hours before it is cut off. To avoid wasting precious GPU time, we can debug the notebook locally first, and upload it to Google Colab only once it runs cleanly.

First, install Jupyter locally via pip:

    python3 -m pip install jupyter

Then run the startup command:

    jupyter notebook

Now open the local notebook address shown in the terminal output.

Then choose File -> New -> Notebook to create a notebook.

Enter the cell content:

    #@title 查看显卡
    !nvidia-smi

Click to run the cell:

The program returns:

    #@title 查看显卡
    !nvidia-smi
    Wed Dec 27 12:36:10 2023
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 546.17                 Driver Version: 546.17       CUDA Version: 12.3     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA GeForce RTX 4060 ...  WDDM  | 00000000:01:00.0 Off |                  N/A |
    | N/A   50C    P0             20W / 115W  |     0MiB / 8188MiB   |      0%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+
    +---------------------------------------------------------------------------------------+
    | Processes:                                                                             |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory  |
    |        ID   ID                                                             Usage       |
    |=======================================================================================|
    |  No running processes found                                                            |
    +---------------------------------------------------------------------------------------+

With that, we can debug notebooks locally.

Installing ffmpeg

Add a new cell:

    #@title 安装ffmpeg
    import os, uuid, re, IPython
    import ipywidgets as widgets
    import time
    from glob import glob
    from google.colab import output, drive
    from IPython.display import clear_output
    import os, sys, urllib.request
    HOME = os.path.expanduser("~")
    pathDoneCMD = f'{HOME}/doneCMD.sh'
    if not os.path.exists(f"{HOME}/.ipython/ttmg.py"):
        hCode = "https://raw.githubusercontent.com/yunooooo/gcct/master/res/ttmg.py"
        urllib.request.urlretrieve(hCode, f"{HOME}/.ipython/ttmg.py")
    from ttmg import (
        loadingAn,
        textAn,
    )
    loadingAn(name="lds")
    textAn("Cloning Repositories...", ty='twg')
    !git clone https://github.com/XniceCraft/ffmpeg-colab.git
    !chmod 755 ./ffmpeg-colab/install
    textAn("Installing FFmpeg...", ty='twg')
    !./ffmpeg-colab/install
    clear_output()
    print('Installation finished!')
    !rm -fr /content/ffmpeg-colab
    !ffmpeg -version

Audio transcription depends on ffmpeg, so we need to install an up-to-date ffmpeg build.

The program returns:

    Installation finished!
    ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
    built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
    configuration: --prefix=/home/ffmpeg-builder/release --pkg-config-flags=--static --extra-libs=-lm --disable-doc --disable-debug --disable-shared --disable-ffprobe --enable-static --enable-gpl --enable-version3 --enable-runtime-cpudetect --enable-avfilter --enable-filters --enable-nvenc --enable-nvdec --enable-cuvid --toolchain=hardened --disable-stripping --enable-opengl --pkgconfigdir=/home/ffmpeg-builder/release/lib/pkgconfig --extra-cflags='-I/home/ffmpeg-builder/release/include -static-libstdc++ -static-libgcc ' --extra-ldflags='-L/home/ffmpeg-builder/release/lib -fstack-protector -static-libstdc++ -static-libgcc ' --extra-cxxflags=' -static-libstdc++ -static-libgcc ' --extra-libs='-ldl -lrt -lpthread' --enable-ffnvcodec --enable-gmp --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libfdk-aac --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libkvazaar --enable-libmp3lame --enable-libopus --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libshine --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtheora --enable-libvidstab --ld=g++ --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-libzimg --enable-openssl --enable-zlib --enable-nonfree --extra-libs=-lpthread --enable-pthreads --extra-libs=-lgomp
    libavutil      58.  2.100 / 58.  2.100
    libavcodec     60.  3.100 / 60.  3.100
    libavformat    60.  3.100 / 60.  3.100
    libavdevice    60.  1.100 / 60.  1.100
    libavfilter     9.  3.100 /  9.  3.100
    libswscale      7.  1.100 /  7.  1.100
    libswresample   4. 10.100 /  4. 10.100
    libpostproc    57.  1.100 / 57.  1.100

The build installed here is the latest release, ffmpeg version 6.0.
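Whisper (used below for transcription) shells out to ffmpeg to decode audio into 16 kHz mono PCM, which is why this build matters. If you ever need to pre-convert clips yourself, the command is short. Here is a small helper that only assembles the argument list; `ffmpeg_to_wav_cmd` is our own illustrative function, to be executed with `subprocess.run` once ffmpeg is on the PATH:

```python
def ffmpeg_to_wav_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command converting `src` to mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",            # overwrite output without asking
        "-i", src,                 # input file
        "-ac", "1",                # downmix to mono
        "-ar", str(sample_rate),   # target sample rate
        "-acodec", "pcm_s16le",    # 16-bit little-endian PCM
        dst,
    ]

print(" ".join(ffmpeg_to_wav_cmd("ada.mp3", "ada.wav")))
```

Passing the list form to `subprocess.run` avoids shell-quoting issues with paths containing spaces.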

Cloning the Repository

Next, clone the repository:

    #@title 克隆代码仓库
    !git clone https://github.com/v3ucn/Bert-vits2-V2.3.git

The program returns:

    Cloning into 'Bert-vits2-V2.3'...
    remote: Enumerating objects: 234, done.
    remote: Counting objects: 100% (234/234), done.
    remote: Compressing objects: 100% (142/142), done.
    remote: Total 234 (delta 80), reused 232 (delta 78), pack-reused 0
    Receiving objects: 100% (234/234), 4.16 MiB | 14.14 MiB/s, done.
    Resolving deltas: 100% (80/80), done.

Installing Project Dependencies

Then enter the project directory and install the dependencies:

    #@title 安装所需要的依赖
    %cd /content/Bert-vits2-V2.3
    !pip install -r requirements.txt

Downloading the Required Models

Add a new cell to download the models:

    #@title 下载必要的模型
    !wget -P slm/wavlm-base-plus/ https://huggingface.co/microsoft/wavlm-base-plus/resolve/main/pytorch_model.bin
    !wget -P emotional/clap-htsat-fused/ https://huggingface.co/laion/clap-htsat-fused/resolve/main/pytorch_model.bin
    !wget -P emotional/wav2vec2-large-robust-12-ft-emotion-msp-dim/ https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim/resolve/main/pytorch_model.bin
    !wget -P bert/chinese-roberta-wwm-ext-large/ https://huggingface.co/hfl/chinese-roberta-wwm-ext-large/resolve/main/pytorch_model.bin
    !wget -P bert/bert-base-japanese-v3/ https://huggingface.co/cl-tohoku/bert-base-japanese-v3/resolve/main/pytorch_model.bin
    !wget -P bert/deberta-v3-large/ https://huggingface.co/microsoft/deberta-v3-large/resolve/main/pytorch_model.bin
    !wget -P bert/deberta-v3-large/ https://huggingface.co/microsoft/deberta-v3-large/resolve/main/pytorch_model.generator.bin
    !wget -P bert/deberta-v2-large-japanese/ https://huggingface.co/ku-nlp/deberta-v2-large-japanese/resolve/main/pytorch_model.bin
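
After the downloads finish, it is worth confirming that every file actually landed where wget was told to put it: an interrupted download leaves a missing or zero-byte file that only surfaces later as a cryptic model-loading error. A minimal sanity-check sketch (`missing_or_empty` is our own helper, not part of the project):

```python
import os

def missing_or_empty(paths):
    """Return the subset of paths that do not exist or are zero bytes."""
    return [p for p in paths
            if not os.path.isfile(p) or os.path.getsize(p) == 0]

# The eight files fetched by the wget commands above.
MODEL_FILES = [
    "slm/wavlm-base-plus/pytorch_model.bin",
    "emotional/clap-htsat-fused/pytorch_model.bin",
    "emotional/wav2vec2-large-robust-12-ft-emotion-msp-dim/pytorch_model.bin",
    "bert/chinese-roberta-wwm-ext-large/pytorch_model.bin",
    "bert/bert-base-japanese-v3/pytorch_model.bin",
    "bert/deberta-v3-large/pytorch_model.bin",
    "bert/deberta-v3-large/pytorch_model.generator.bin",
    "bert/deberta-v2-large-japanese/pytorch_model.bin",
]

bad = missing_or_empty(MODEL_FILES)
print("Re-download these files:" if bad else "All model files present.", bad or "")
```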

Downloading the Base Models

Next, download the pretrained base-model checkpoints:

    #@title 下载底模文件
    !wget -P Data/ada/models/ https://huggingface.co/OedoSoldier/Bert-VITS2-2.3/resolve/main/DUR_0.pth
    !wget -P Data/ada/models/ https://huggingface.co/OedoSoldier/Bert-VITS2-2.3/resolve/main/D_0.pth
    !wget -P Data/ada/models/ https://huggingface.co/OedoSoldier/Bert-VITS2-2.3/resolve/main/G_0.pth
    !wget -P Data/ada/models/ https://huggingface.co/OedoSoldier/Bert-VITS2-2.3/resolve/main/WD_0.pth

Note that version 2.3 uses four base-model files.

Slicing the Dataset

Next, upload the Ada Wong audio material to Data/ada/raw/ada.wav.

Then add a new cell:

    #@title 切分数据集
    !python3 audio_slicer.py

The audio will then be sliced into clips.
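audio_slicer.py cuts the long recording at stretches of silence. The idea can be sketched in pure Python over a list of amplitude samples: keep runs where the absolute amplitude stays above a threshold, and cut wherever enough consecutive samples are near-silent. This toy `split_on_silence` is only an illustration of the concept, not the project's actual slicing algorithm or WAV handling:

```python
def split_on_silence(samples, threshold=0.05, min_silence=3):
    """Split amplitude samples into voiced segments.

    A segment ends once at least `min_silence` consecutive samples
    fall below `threshold` in absolute value.
    """
    segments = []
    start = None       # index where the current voiced segment began
    silence_run = 0    # consecutive quiet samples seen so far
    for i, s in enumerate(samples):
        if abs(s) >= threshold:        # voiced sample
            if start is None:
                start = i
            silence_run = 0
        else:                          # quiet sample
            silence_run += 1
            if start is not None and silence_run >= min_silence:
                # close the segment, excluding the trailing silence
                segments.append(samples[start:i - silence_run + 1])
                start = None
    if start is not None:
        segments.append(samples[start:len(samples) - silence_run])
    return segments

print(split_on_silence([0, 0, 0.5, 0.6, 0, 0, 0, 0.7, 0]))  # [[0.5, 0.6], [0.7]]
```

Real slicers additionally enforce minimum and maximum clip lengths so the training clips stay a usable size.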

Transcription and Annotation

Now we need to transcribe the sliced clips:

    #@title 转写和标注
    !pip install git+https://github.com/openai/whisper.git
    !python3 short_audio_transcribe.py

Note that whisper is installed here straight from its Git repository. Many people run pip install whisper, but that pulls in a different, unrelated package; the correct way is to install from the official source: pip install git+https://github.com/openai/whisper.git. Keep this in mind, or you will hit errors.

After execution, a transcription file esd.list is generated in the character directory:

    ./Data\ada\wavs\ada_0.wav|ada|EN|I do. The kind you like.
    ./Data\ada\wavs\ada_1.wav|ada|EN|Now where's the amber?
    ./Data\ada\wavs\ada_10.wav|ada|EN|Leave the girl. She's lost no matter what.
    ./Data\ada\wavs\ada_11.wav|ada|EN|You walk away now, and who knows?
    ./Data\ada\wavs\ada_12.wav|ada|EN|Maybe you'll live to meet me again.
    ./Data\ada\wavs\ada_13.wav|ada|EN|And I might get you that greeting you were looking for.
    ./Data\ada\wavs\ada_14.wav|ada|EN|How about we continue this discussion another time?
    ./Data\ada\wavs\ada_15.wav|ada|EN|Sorry, nothing yet.
    ./Data\ada\wavs\ada_16.wav|ada|EN|But my little helper is creating
    ./Data\ada\wavs\ada_17.wav|ada|EN|Quite the commotion.
    ./Data\ada\wavs\ada_18.wav|ada|EN|Everything will work out just fine.
    ./Data\ada\wavs\ada_19.wav|ada|EN|He's a good boy. Predictable.
    ./Data\ada\wavs\ada_2.wav|ada|EN|The deal was, we get you out of here when you deliver the amber. No amber, no protection, Louise.
    ./Data\ada\wavs\ada_20.wav|ada|EN|Nothing personal, Leon.
    ./Data\ada\wavs\ada_21.wav|ada|EN|Louise and I had an arrangement.
    ./Data\ada\wavs\ada_22.wav|ada|EN|Don't worry, I'll take good care of it.
    ./Data\ada\wavs\ada_23.wav|ada|EN|Just one question.
    ./Data\ada\wavs\ada_24.wav|ada|EN|What are you planning to do with this?
    ./Data\ada\wavs\ada_25.wav|ada|EN|So, we're talking millions of casualties?
    ./Data\ada\wavs\ada_26.wav|ada|EN|We're changing course. Now.
    ./Data\ada\wavs\ada_3.wav|ada|EN|You can stop right there, Leon.
    ./Data\ada\wavs\ada_4.wav|ada|EN|wouldn't make me use this.
    ./Data\ada\wavs\ada_5.wav|ada|EN|Would you? You don't seem surprised.
    ./Data\ada\wavs\ada_6.wav|ada|EN|Interesting.
    ./Data\ada\wavs\ada_7.wav|ada|EN|Not a bad move
    ./Data\ada\wavs\ada_8.wav|ada|EN|Very smooth. Ah, Leon.
    ./Data\ada\wavs\ada_9.wav|ada|EN|You know I don't work and tell.

That gives 27 sliced clips with 27 matching transcriptions; note that the language tag is EN (English).
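Each esd.list line packs four pipe-separated fields: audio path, speaker name, language tag, and transcript. If you ever assemble or hand-edit this file (to fix a bad transcription, say), a quick validator catches malformed lines before preprocessing chokes on them. A small sketch; `parse_esd_line` is our own helper, and the ZH/JP/EN tag set is an assumption based on the languages Bert-vits2 handles:

```python
def parse_esd_line(line):
    """Split one esd.list line into (path, speaker, language, text).

    Splits at most three times so the transcript itself may contain '|'.
    Raises ValueError on malformed lines.
    """
    parts = line.rstrip("\n").split("|", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"malformed esd.list line: {line!r}")
    path, speaker, language, text = parts
    if language not in {"ZH", "JP", "EN"}:   # assumed tag set, adjust to your fork
        raise ValueError(f"unexpected language tag: {language}")
    return path, speaker, language, text

print(parse_esd_line("./Data/ada/wavs/ada_0.wav|ada|EN|I do. The kind you like."))
```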

Resampling the Audio

Resample the source audio:

    #@title 重新采样
    !python3 resample.py --sr 44100 --in_dir ./Data/ada/raw/ --out_dir ./Data/ada/wavs/
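
resample.py converts every clip to 44.1 kHz, the rate this model expects. Conceptually, resampling maps each output sample index back onto the input timeline and interpolates between the two nearest input samples. A toy linear-interpolation resampler to illustrate the idea (the real script uses a proper DSP library with far better anti-aliasing filters):

```python
def resample_linear(samples, sr_in, sr_out):
    """Naive linear-interpolation resampler over a list of samples."""
    if sr_in == sr_out or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * sr_out / sr_in)
    out = []
    for i in range(n_out):
        pos = i * sr_in / sr_out              # position on the input timeline
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)    # clamp at the final sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# A 1-second clip at 22050 Hz becomes 2 seconds' worth of samples at 44100 Hz:
print(len(resample_linear([0.0] * 22050, 22050, 44100)))  # 44100
```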

Preprocessing the Label File

Next, process the transcription file to generate the training and validation sets:

    #@title 预处理标签文件
    !python3 preprocess_text.py --transcription-path ./Data/ada/esd.list --t
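
The core of this step is a shuffle-and-hold-out split of the transcription lines into a training list and a small validation list. A sketch of that split in plain Python; the function name, validation size, and fixed seed are our own illustrative choices, not the script's actual defaults:

```python
import random

def split_train_val(lines, val_size=4, seed=42):
    """Shuffle transcription lines and hold out `val_size` for validation."""
    rng = random.Random(seed)     # fixed seed makes the split reproducible
    shuffled = list(lines)
    rng.shuffle(shuffled)
    return shuffled[val_size:], shuffled[:val_size]

lines = [f"./Data/ada/wavs/ada_{i}.wav|ada|EN|line {i}" for i in range(27)]
train, val = split_train_val(lines)
print(len(train), len(val))  # 23 4
```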

The program returns:

    pytorch_model.bin: 100% 1.32G/1.32G [00:10<00:00, 122MB/s]
    spm.model: 100% 2.46M/2.46M [00:00<00:00, 115MB/s]
    The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
    0it [00:00, ?it/s]
    [nltk_data] Downloading package averaged_perceptron_tagger to
    [nltk_data]     /root/nltk_data...
    [nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
    [nltk_data] Downloading package cmudict to /root/nltk_data...
    [nltk_data]   Unzipping corpora/cmudict.zip.
    100% 27/27 [00:00<00:00, 4457.63it/s]
    总重复音频数:0,总未找到的音频数:0
    训练集和验证集生成完成!

Generating BERT Feature Files

Finally, generate the BERT feature files:

    #@title 生成 BERT 特征文件
    !python3 bert_gen.py --config-path ./Data/ada/configs/config.json

One for each of the 27 clips:

    100% 27/27 [00:33<00:00, 1.25s/it]
    bert生成完毕!, 共有27bert.pt生成!

Model Training

Everything is ready; start training:

    #@title 开始训练
    !python3 train_ms.py

Checkpoints are written to the models directory. The project saves every 50 steps by default; adjust the config.json file to suit your needs.
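Rather than editing config.json by hand each run, a notebook cell can patch the interval with the json module. The sketch below assumes the interval lives under the "train" section as "eval_interval", as in upstream VITS-style configs; check the keys in your own Data/ada/configs/config.json before relying on it:

```python
import json

def set_save_interval(config_path, steps):
    """Rewrite the checkpoint interval in a VITS-style config.json."""
    with open(config_path, "r", encoding="utf-8") as f:
        cfg = json.load(f)
    cfg["train"]["eval_interval"] = steps   # assumed key; verify in your config
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, ensure_ascii=False, indent=2)
    return cfg

# Usage, with the path from this article's project layout:
# set_save_interval("./Data/ada/configs/config.json", 100)
```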

Model Inference

Typically, after roughly 50 to 100 training steps you can run inference to check the result, then resume training:

    #@title 开始推理
    !python3 webui.py

The program returns:

    | numexpr.utils | INFO | NumExpr defaulting to 2 threads.
    /usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
      warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
    | utils | INFO | Loaded checkpoint 'Data/ada/models/G_150.pth' (iteration 25)
    推理页面已开启!
    Running on local URL: http://127.0.0.1:7860
    Running on public URL: https://814833a6f477ba151c.gradio.live

Open the second (public) URL to run inference.

Conclusion

With that, we have walked through the entire Jupyter Notebook pipeline: data slicing, transcription, preprocessing, training, and inference. To wrap up, here is the ready-made Google Colab notebook for everyone:

    https://colab.research.google.com/drive/1-H1DGG5dTy8u_8vFbq1HACXPX9AAM76s?usp=sharing

Original article: https://www.cnblogs.com/v3ucn/p/17930355.html
