Happy New Year, blessings incoming!

Still sending plain-text greetings for the New Year?

Still fretting over how to make a New Year greeting video?

Fret no more: everything you need is already prepared here.

Project Overview

This project uses the Parakeet toolkit to synthesize speech, then uses the PaddleGAN toolkit to process the image and video.

Ten voice templates are provided, and each reproduces its voice type quite pleasantly.

The image-processing step then performs lip-sync synthesis on either a cartoonized avatar or your own photo.

Reference Projects

Parakeet voice cloning: Conan's voice changer comes true
[Summoned from the kichiku zone] MIXUE little giegie

Features

Multiple male and female voices can be synthesized; just add a photo and some text, and out comes a spoken greeting!

Example greeting (the synthesizer only accepts Chinese text, so the sample stays in Chinese): 虎起生活的风帆,走向虎关通途。向着虎年奔跑,达到吉虎未年,粘粘虎年的喜气。让美梦成真,叫理想变现,要祥瑞高照。愿朋友虎年喜虎虎,如日中天发虎财!

Customizable Parameters

Parameter     Meaning                                Type
lable         voice selection (see table below)      int (1-11)
sentences     text to synthesize                     str
photo_patch   path to the photo                      path
custom        path to a custom reference audio       path

(Note: `lable` is the spelling used throughout the code.)

Voice Selection

lable value   Voice
1             Taiwanese-accented young lady (台湾腔小姐姐)
2             Young lady (小姐姐)
3             Crayon Shin-chan (蜡笔小新)
4             Northeastern buddy (东北老铁)
5             Cantonese young man (粤语小哥哥)
6             Young man (小哥哥)
7             Deep-voiced uncle (低沉大叔)
8             Cute toddler (萌娃)
9             Mature "royal sister" voice (御姐音)
10            Loli voice (萝莉音)
11            Custom (自定义)
lable = 1  # set according to the voice-selection table above
sentences = "虎起生活的风帆,走向虎关通途。"  # the greeting text to synthesize
photo_patch = "./靓照.jpg"  # path to the photo
custom = "./"  # path to a custom reference audio (only used when lable == 11)

Special Note

If you want the lip-sync step to use your own photo, be sure to read this section first!

Unpacking the Assets

!unzip  -d /home/aistudio/data /home/aistudio/data/data126388/素材.zip 
# !unzip  -d /home/aistudio/work/ /home/aistudio/data/pretrained.zip
Archive:  /home/aistudio/data/data126388/素材.zip
  inflating: /home/aistudio/data/蜡笔小新.wav  
  inflating: /home/aistudio/data/萝莉.wav  
  inflating: /home/aistudio/data/台湾腔小姐姐.wav  
  inflating: /home/aistudio/data/小宝宝.wav  
  inflating: /home/aistudio/data/小哥哥.wav  
  inflating: /home/aistudio/data/小姐姐.wav  
  inflating: /home/aistudio/data/御姐.wav  
  inflating: /home/aistudio/data/粤语小哥哥.wav  
  inflating: /home/aistudio/data/pretrained.zip  
  inflating: /home/aistudio/data/低沉大叔.wav  
  inflating: /home/aistudio/data/东北老铁.wav  

Data Preprocessing

# Map each lable value to its reference-audio file
tone_gather = {1: 'data/台湾腔小姐姐.wav',
               2: 'data/小姐姐.wav',
               3: 'data/蜡笔小新.wav',
               4: 'data/东北老铁.wav',
               5: 'data/粤语小哥哥.wav',
               6: 'data/小哥哥.wav',
               7: 'data/低沉大叔.wav',
               8: 'data/小宝宝.wav',
               9: 'data/御姐.wav',
               10: 'data/萝莉.wav'}

tone_gather[11] = custom

# Fall back to voice 1 if lable is out of range, or if lable 11 was chosen
# without providing a custom audio path
if (custom == "./" and lable == 11) or (lable not in range(1, 12)):
    lable = 1

symbol = [',', '.', ',', '。', '!', '!', ';', ';', ':', ":"]
sentence = ''
for i in sentences:
    if i in symbol:
        # punctuation: turn the preceding short pause into a long-pause marker '$'
        sentence = sentence[:-1] + '$'
    else:
        # ordinary character: append it plus a short-pause marker '%'
        sentence = sentence + i + '%'
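The punctuation-to-pause conversion above can be sketched as a standalone function (hypothetical name `to_pause_markup`, same logic as the loop):

```python
# Standalone sketch of the preprocessing step: every character gets a
# short-pause marker '%', and punctuation turns the preceding marker into
# a long-pause marker '$' (the AISHELL-3 convention used by Parakeet).
SYMBOLS = [',', '.', ',', '。', '!', '!', ';', ';', ':', ":"]

def to_pause_markup(text: str) -> str:
    out = ''
    for ch in text:
        if ch in SYMBOLS:
            out = out[:-1] + '$'   # replace trailing '%' with a long pause
        else:
            out = out + ch + '%'   # character + short pause
    return out

print(to_pause_markup('新年快乐,虎年大吉!'))  # → 新%年%快%乐$虎%年%大%吉$
```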

Speech Synthesis

1. Environment setup and package imports

# Download and install Parakeet. It is already installed in this project,
# so this is only needed if you set things up yourself:
# !git clone https://gitee.com/paddlepaddle/Parakeet.git -b release/v0.3 /home/aistudio/work/Parakeet
# Install the parakeet package
!pip install -e /home/aistudio/work/Parakeet/

If a "No module named parakeet" error appears, restarting the project (i.e. the kernel) usually fixes it.

# Add the necessary paths to sys.path so the installed packages can be found
import sys
sys.path.append("/home/aistudio/work/Parakeet")
sys.path.append("/home/aistudio/work/Parakeet/examples/tacotron2_aishell3")

import numpy as np
import os
import paddle
from matplotlib import pyplot as plt
from IPython import display as ipd
import soundfile as sf
import librosa.display
from parakeet.utils import display
paddle.set_device("gpu:0")
%matplotlib inline

2. Load the Voice-Cloning Models

from examples.ge2e.audio_processor import SpeakerVerificationPreprocessor
from parakeet.models.lstm_speaker_encoder import LSTMSpeakerEncoder

# speaker encoder
p = SpeakerVerificationPreprocessor(
    sampling_rate=16000, 
    audio_norm_target_dBFS=-30, 
    vad_window_length=30, 
    vad_moving_average_width=8, 
    vad_max_silence_length=6, 
    mel_window_length=25, 
    mel_window_step=10, 
    n_mels=40, 
    partial_n_frames=160, 
    min_pad_coverage=0.75, 
    partial_overlap_ratio=0.5)
speaker_encoder = LSTMSpeakerEncoder(n_mels=40, num_layers=3, hidden_size=256, output_size=256)
speaker_encoder_params_path = "/home/aistudio/work/pretrained/ge2e_ckpt_0.3/step-3000000.pdparams"
speaker_encoder.set_state_dict(paddle.load(speaker_encoder_params_path))
speaker_encoder.eval()

# synthesizer
from parakeet.models.tacotron2 import Tacotron2
from examples.tacotron2_aishell3.chinese_g2p import convert_sentence
from examples.tacotron2_aishell3.aishell3 import voc_phones, voc_tones

from yacs.config import CfgNode
synthesizer = Tacotron2(
    vocab_size=68,
    n_tones=10,
    d_mels= 80,
    d_encoder= 512,
    encoder_conv_layers = 3,
    encoder_kernel_size= 5,
    d_prenet= 256,
    d_attention_rnn= 1024,
    d_decoder_rnn = 1024,
    attention_filters = 32,
    attention_kernel_size = 31,
    d_attention= 128,
    d_postnet = 512,
    postnet_kernel_size = 5,
    postnet_conv_layers = 5,
    reduction_factor = 1,
    p_encoder_dropout = 0.5,
    p_prenet_dropout= 0.5,
    p_attention_dropout= 0.1,
    p_decoder_dropout= 0.1,
    p_postnet_dropout= 0.5,
    d_global_condition=256,
    use_stop_token=False
)
params_path = "/home/aistudio/work/pretrained/tacotron2_aishell3_ckpt_0.3/step-450000.pdparams"
synthesizer.set_state_dict(paddle.load(params_path))
synthesizer.eval()

# vocoder
from parakeet.models import ConditionalWaveFlow
vocoder = ConditionalWaveFlow(upsample_factors=[16, 16], n_flows=8, n_layers=8, n_group=16, channels=128, n_mels=80, kernel_size=[3, 3])
params_path = "/home/aistudio/work/pretrained/waveflow_ljspeech_ckpt_0.3/step-2000000.pdparams"
vocoder.set_state_dict(paddle.load(params_path))
vocoder.eval()

3. Extract the Target Voice's Speaker Features

Note: only wav and flac audio formats are supported; convert any other format with external software first.

ref_audio_path = tone_gather[lable]
mel_sequences = p.extract_mel_partials(p.preprocess_wav(ref_audio_path))
# print("mel_sequences: ", mel_sequences.shape)
with paddle.no_grad():
    embed = speaker_encoder.embed_utterance(paddle.to_tensor(mel_sequences))
# print("embed shape: ", embed.shape)
phones, tones = convert_sentence(sentence)
# print(phones)
# print(tones)

phones = np.array([voc_phones.lookup(item) for item in phones], dtype=np.int64)
tones = np.array([voc_tones.lookup(item) for item in tones], dtype=np.int64)

phones = paddle.to_tensor(phones).unsqueeze(0)
tones = paddle.to_tensor(tones).unsqueeze(0)
utterance_embeds = paddle.unsqueeze(embed, 0)
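The speaker encoder compresses an utterance of any length into a fixed 256-dimensional embedding (a d-vector); utterances from the same speaker land close together under cosine similarity. A NumPy-only illustration of that idea with random stand-in vectors (not real embeddings):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
e_ref = rng.normal(size=256)                   # stand-in "speaker A" embedding
e_same = e_ref + 0.1 * rng.normal(size=256)    # small perturbation: "same speaker"
e_other = rng.normal(size=256)                 # independent draw: "different speaker"

print(cosine_similarity(e_ref, e_same))   # close to 1
print(cosine_similarity(e_ref, e_other))  # close to 0
```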

4. Synthesize the Spectrogram

With the reference speaker embedding extracted, the Tacotron2 model generates a mel spectrogram for the given text.

Currently only Chinese characters plus two special pause symbols are supported: '%' marks a short in-sentence pause and '$' a longer one, matching the annotation scheme of the AISHELL-3 dataset. A more general text frontend will arrive in later Parakeet releases.

with paddle.no_grad():
    outputs = synthesizer.infer(phones, tones=tones, global_condition=utterance_embeds)
mel_input = paddle.transpose(outputs["mel_outputs_postnet"], [0, 2, 1])
fig = display.plot_alignment(outputs["alignments"][0].numpy().T)
os.makedirs('/home/aistudio/syn_audio/', exist_ok=True)
with paddle.no_grad():
    wav = vocoder.infer(mel_input)
wav = wav.numpy()[0]
sf.write("/home/aistudio/syn_audio/generate.wav", wav, samplerate=22050)
# librosa.display.waveplot(wav)
 98%|█████████▊| 984/1000 [00:02<00:00, 332.07it/s]


Warning! Reached max decoder steps!!!
time: 1.586832046508789s



(attention alignment plot: output_24_3.png)

5. Synthesize the Final Audio

Use the WaveFlow vocoder to convert the generated spectrogram into an audio waveform.

# Listen to the generated audio
ipd.Audio(wav, rate=22050)