[Who Says Videos Can't Be Photoshopped?] Editing the Alimu Video with PP-HumanMatting

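Use PP-HumanMatting to matte the person out of a video, then composite the result over a background video to produce a clip of Alimu touring Zhangjiajie.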

AI Studio · Published 2022-09-03 17:39:55

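This installment of "Who Says Videos Can't Be Photoshopped?" uses a few lines of PaddleSeg code to edit the recently viral "疆域阿力木" (Jiangyu Alimu) video. Without further ado, let's get started.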

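The main workflow is:

Convert the videos to image frames
Matte the person out of each frame
Generate a green screen and composite
Merge the frames into a video
Add the audio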

I. Environment Setup

Setup consists of the following steps.

1. Download PaddleSeg

Use git to clone the latest PaddleSeg, version 2.6 at the time of writing.

!git clone https://gitee.com/paddlepaddle/PaddleSeg.git --depth=1

Cloning into 'PaddleSeg'...
remote: Enumerating objects: 1815, done.
remote: Counting objects: 100% (1815/1815), done.
remote: Compressing objects: 100% (1450/1450), done.
remote: Total 1815 (delta 485), reused 1227 (delta 287), pack-reused 0
Receiving objects: 100% (1815/1815), 48.94 MiB | 3.94 MiB/s, done.
Resolving deltas: 100% (485/485), done.
Checking connectivity... done.

2. Upgrade paddlepaddle-gpu

!pip install -U paddlepaddle-gpu >log.log

3. Install PaddleSeg

!pip install -e ~/PaddleSeg >log.log

4. Install the Matting dependencies

!pip install -r ~/PaddleSeg/Matting/requirements.txt >log.log

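II. Splitting Videos into Image Frames

This step covers the following:

Splitting video captured from a camera into image frames
Splitting a video file into image frames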

# Import the required packages
import cv2
import os
import numpy as np
from PIL import Image
# Select the GPU
%set_env GPU_NUM=1
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

env: GPU_NUM=1

1. Unzip the video

!unzip -qoa data/data162859/新疆阿力木-众所周知视频是不能p的.zip -d video

# The Chinese file name comes out garbled after extraction, so rename the file
!mv video/#─у╒т▒│╛░╠л╝┘┴╦#╚¤┼й#┤°╗ї╜ь╡─╠ь╗и░х#╚л═°╤░╒╥╫ю╝┘╒ц▒│╛░video.mp4 video/1video.mp4

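2. Convert the video to images

# Convert a video into image frames
def video2image(video_path, img_path):
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        # ret is False once no more frames can be read
        ret, frame = cap.read()
        if ret:
            # Frames are saved as 0.jpg, 1.jpg, ... inside img_path
            cv2.imwrite('%s/%d.jpg' % (img_path, index), frame)
            index += 1
        else:
            break
    cap.release()
    print('Video cut finish, all %d frame' % index)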

!mkdir 1_img

mkdir: cannot create directory ‘1_img’: File exists

# Split the Alimu video into frames
video2image('video/1video.mp4', '1_img')

Video cut finish, all 1024 frame

!mkdir zjj

# Split the scenery video into frames
video2image('data/data162859/zjj.mp4', 'zjj')

Video cut finish, all 2650 frame
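
Because `video2image` names frames by index (`0.jpg`, `1.jpg`, ...), later steps can address them by number. If the frames were instead listed from disk and sorted as strings, `10.jpg` would come before `2.jpg`. The sketch below illustrates the difference; `frame_index` is a hypothetical helper and is not part of the original notebook.

```python
import re

def frame_index(name):
    """Return the integer index of a frame file such as '12.jpg' (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)\.(?:jpg|png)", name)
    if match is None:
        raise ValueError(f"unexpected frame file name: {name}")
    return int(match.group(1))

names = ["10.jpg", "2.jpg", "0.jpg", "1.jpg"]  # stand-in for a directory listing
by_string = sorted(names)                      # lexicographic order
by_index = sorted(names, key=frame_index)      # numeric (playback) order
```

Iterating with `'%d.jpg' % i`, as the rest of this post does, avoids the problem without any sorting.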

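III. Portrait Matting with PP-HumanMatting

1. Introduction to portrait matting

Image matting (fine-grained segmentation, background removal) is the technique of extracting the foreground from an image by estimating the foreground's color and opacity. It can be used for background replacement, image compositing, and visual effects, and is widely used in the film industry. Each pixel in the image has a value representing its foreground opacity, called the alpha value. The set of all alpha values in an image is called the alpha matte, and extracting the part of the image covered by the matte separates the foreground.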
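
To make the role of the alpha matte concrete, here is a minimal sketch of alpha compositing on a toy image, assuming 8-bit alpha values (0 to 255) and the standard blend `alpha * foreground + (1 - alpha) * background`. The function name `composite` and the pixel values are illustrative only.

```python
import numpy as np

def composite(fore_rgb, alpha, back_rgb):
    """Blend foreground over background: out = a * F + (1 - a) * B."""
    # Scale 8-bit alpha to [0, 1] and add a channel axis: (H, W) -> (H, W, 1)
    a = alpha.astype(np.float32)[..., np.newaxis] / 255.0
    out = a * fore_rgb.astype(np.float32) + (1.0 - a) * back_rgb.astype(np.float32)
    return np.round(out).astype(np.uint8)

# Toy 1x3 image: opaque, roughly half-transparent, and fully transparent pixels
fore = np.full((1, 3, 3), 200, dtype=np.uint8)
back = np.full((1, 3, 3), 100, dtype=np.uint8)
alpha = np.array([[255, 128, 0]], dtype=np.uint8)
result = composite(fore, alpha, back)
```

Where alpha is 255 the foreground is kept, where it is 0 the background shows through, and values in between mix the two.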

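Portrait matting models are provided for a variety of scenarios, so you can choose one to suit your situation. Inference models are also provided and can be downloaded directly for deployment.

Recommended models:

For accuracy: PP-Matting. Use PP-Matting-512 for low resolution and PP-Matting-1024 for high resolution.
For speed: ModNet-MobileNetV2.
For high-resolution (>2048) portrait matting on simple backgrounds: PP-HumanMatting.
When a trimap is provided: DIM-VGG16.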

Model	Params(M)	FLOPs(G)	FPS	Checkpoint	Inference Model
PP-Matting-512	24.5	91.28	28.9	model	model inference
PP-Matting-1024	24.5	91.28	13.4(1024X1024)	model	model inference
PP-HumanMatting	63.9	135.8 (2048X2048)	32.8(2048X2048)	model	model inference
ModNet-MobileNetV2	6.5	15.7	68.4	model	model inference
ModNet-ResNet50_vd	92.2	151.6	29.0	model	model inference
ModNet-HRNet_W18	10.2	28.5	62.6	model	model inference
DIM-VGG16	28.4	175.5	30.4	model	model inference

2. Download the pretrained model

%cd ~/PaddleSeg/Matting

!wget https://paddleseg.bj.bcebos.com/matting/models/human_matting-resnet34_vd.pdparams

/home/aistudio/PaddleSeg/Matting
--2022-08-25 19:26:12--  https://paddleseg.bj.bcebos.com/matting/models/human_matting-resnet34_vd.pdparams
Resolving paddleseg.bj.bcebos.com (paddleseg.bj.bcebos.com)... 100.67.200.6
Connecting to paddleseg.bj.bcebos.com (paddleseg.bj.bcebos.com)|100.67.200.6|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 255754333 (244M) [application/octet-stream]
Saving to: ‘human_matting-resnet34_vd.pdparams’

human_matting-resne 100%[===================>] 243.91M  85.5MB/s    in 2.9s    

2022-08-25 19:26:15 (85.5 MB/s) - ‘human_matting-resnet34_vd.pdparams’ saved [255754333/255754333]

2. Matting example

%cd ~/PaddleSeg/Matting
!python tools/predict.py \
    --config ./configs/human_matting/human_matting-resnet34_vd.yml \
    --model_path ./human_matting-resnet34_vd.pdparams \
    --image_path  ~/aa/ \
    --save_dir ~/output/results

/home/aistudio/PaddleSeg/Matting
2022-08-25 19:27:30 [INFO]	
---------------Config Information---------------
batch_size: 4
iters: 50000
lr_scheduler:
  boundaries:
  - 30000
  - 40000
  type: PiecewiseDecay
  values:
  - 0.001
  - 0.0001
  - 1.0e-05
model:
  backbone:
    pretrained: https://paddleseg.bj.bcebos.com/matting/models/ResNet34_vd_pretrained/model.pdparams
    type: ResNet34_vd
  if_refine: true
  pretrained: null
  type: HumanMatting
optimizer:
  momentum: 0.9
  type: sgd
  weight_decay: 4.0e-05
train_dataset:
  dataset_root: data/PPM-100
  mode: train
  train_file: train.txt
  transforms:
  - type: LoadImages
  - scale:
    - 0.3
    - 1.5
    size:
    - 2048
    - 2048
    type: RandomResize
  - crop_size:
    - 2048
    - 2048
    type: RandomCrop
  - type: RandomDistort
  - prob: 0.1
    type: RandomBlur
  - type: RandomHorizontalFlip
  - target_size:
    - 2048
    - 2048
    type: Padding
  - type: Normalize
  type: MattingDataset
val_dataset:
  dataset_root: data/PPM-100
  get_trimap: false
  mode: val
  transforms:
  - type: LoadImages
  - short_size: 2048
    type: ResizeByShort
  - mult_int: 128
    type: ResizeToIntMult
  - type: Normalize
  type: MattingDataset
  val_file: val.txt
------------------------------------------------
/home/aistudio/PaddleSeg/paddleseg/cvlibs/config.py:341: UserWarning: `dataset_root` is not found. Is it correct?
  warnings.warn("`dataset_root` is not found. Is it correct?")
W0825 19:27:30.818305   542 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0825 19:27:30.818343   542 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
2022-08-25 19:27:32 [INFO]	Loading pretrained model from https://paddleseg.bj.bcebos.com/matting/models/ResNet34_vd_pretrained/model.pdparams
Connecting to https://paddleseg.bj.bcebos.com/matting/models/ResNet34_vd_pretrained/model.pdparams
Downloading model.pdparams
[==================================================] 100.00%
2022-08-25 19:27:34 [INFO]	There are 195/195 variables loaded into ResNet_vd.
2022-08-25 19:27:34 [INFO]	Number of predict images = 1
2022-08-25 19:27:34 [INFO]	Loading pretrained model from ./human_matting-resnet34_vd.pdparams
2022-08-25 19:27:35 [INFO]	There are 486/486 variables loaded into HumanMatting.
2022-08-25 19:27:35 [INFO]	Start to predict...
1/1 [==============================] - 2s 2s/step - preprocess_cost: 0.3376 - infer_cost cost: 1.0974 - postprocess_cost: 0.3286

3. Batch matting

%cd ~/PaddleSeg/Matting

!python tools/predict.py \
    --config ./configs/human_matting/human_matting-resnet34_vd.yml \
    --model_path ./human_matting-resnet34_vd.pdparams \
    --image_path  ~/1_img/ \
    --save_dir ~/output/results

/home/aistudio/PaddleSeg/Matting
/home/aistudio/PaddleSeg/paddleseg/cvlibs/config.py:341: UserWarning: `dataset_root` is not found. Is it correct?
  warnings.warn("`dataset_root` is not found. Is it correct?")
W0825 19:28:23.351861   789 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0825 19:28:23.351899   789 gpu_resources.cc:91] device: 0, cuDNN Version: 7.6.
2022-08-25 19:28:24 [INFO]	Loading pretrained model from https://paddleseg.bj.bcebos.com/matting/models/ResNet34_vd_pretrained/model.pdparams
2022-08-25 19:28:25 [INFO]	There are 195/195 variables loaded into ResNet_vd.
2022-08-25 19:28:25 [INFO]	Number of predict images = 1024
2022-08-25 19:28:25 [INFO]	Loading pretrained model from ./human_matting-resnet34_vd.pdparams
2022-08-25 19:28:25 [INFO]	There are 486/486 variables loaded into HumanMatting.
2022-08-25 19:28:25 [INFO]	Start to predict...
1024/1024 [==============================] - 745s 727ms/step - preprocess_cost: 0.3499 - infer_cost cost: 0.0576 - postprocess_cost: 0.31

IV. Compositing the Matted Portrait

1. Blending images

def BlendImg(fore_image, base_image, output_path):
    """
    Replace the background of a matted portrait.
    fore_image: path to the foreground image (the matted RGBA portrait)
    base_image: path to the background image
    output_path: path where the blended image is saved
    """
    # Load the images; the foreground is resized to the background's size
    base_image = Image.open(base_image).convert('RGB')
    fore_image = Image.open(fore_image).resize(base_image.size)

    # Alpha-weighted blend: alpha * foreground + (1 - alpha) * background
    scope_map = np.array(fore_image)[:,:,-1] / 255
    scope_map = scope_map[:,:,np.newaxis]
    scope_map = np.repeat(scope_map, repeats=3, axis=2)
    res_image = np.multiply(scope_map, np.array(fore_image)[:,:,:3]) + np.multiply((1-scope_map), np.array(base_image))
    
    # Save the result
    res_image = Image.fromarray(np.uint8(res_image))
    res_image.save(output_path)

def BlendHumanImg(in_path, screen_path, out_path):
    # in_path is expected to hold two files per frame (hence // 2);
    # only the RGBA result, <index>_rgba.png, is used here
    img_len=len(os.listdir(in_path))//2
    print(img_len)
    for i in range(img_len):
        img_path = os.path.join(in_path , '%d_rgba.png' % (i))
        screen_path_file = os.path.join(screen_path ,'%d.jpg' % (i))
        output_path_img = os.path.join(out_path ,'%d.png' % i)
        BlendImg(img_path, screen_path_file, output_path_img)
   

def init_canvas(width, height, color=(255, 255, 255)):
    canvas = np.ones((height, width, 3), dtype="uint8")
    canvas[:] = color
    return canvas

def GetGreenScreen(width, height, out_path):
    canvas = init_canvas(width, height, color=(0, 255, 0))
    cv2.imwrite(out_path, canvas)

2. Compositing the person onto the background frames

%cd ~
import shutil

blend_path='blend'
FrameSeg_Path='output/results'   # matting results (the input); must not be deleted
GreenScreen_Path='zjj'           # background frames
# Start from an empty output directory and leave the matting results untouched
if os.path.exists(blend_path):
    shutil.rmtree(blend_path)
os.mkdir(blend_path)
BlendHumanImg(FrameSeg_Path, GreenScreen_Path, blend_path)

/home/aistudio
1024
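
`BlendHumanImg` pairs foreground frame `i` with background frame `i`. That works here because the scenery video (2650 frames) is longer than the portrait video (1024 frames). If the background were shorter, one possible approach is to loop it with a wrap-around index, as in the sketch below; `background_index` is a hypothetical helper, not part of the original code.

```python
def background_index(i, n_background):
    """Map foreground frame i to a background frame, looping when the background is shorter."""
    if n_background <= 0:
        raise ValueError("need at least one background frame")
    return i % n_background

# With the frame counts from this post, the index is unchanged
pairs_long = [background_index(i, 2650) for i in (0, 1023)]
# With only 3 background frames, the background repeats
pairs_short = [background_index(i, 3) for i in range(5)]
```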

2. Merging the frames into a video

import cv2
import os

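def CombVideo(in_path, out_path, size):
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    # cv2.VideoWriter arguments:
    #   out_path   path of the output file
    #   fourcc     codec
    #   fps        frame rate of the output video (hard-coded to 60.0 here;
    #              it should match the source video to stay in sync with the audio)
    #   frameSize  frame size as (width, height)
    #   isColor    whether the frames are color or grayscale
    out = cv2.VideoWriter(out_path, fourcc, 60.0, size, True)
    files = os.listdir(in_path)

    # Read frames by index (0.png, 1.png, ...) so they are written in order
    for i in range(len(files)):
        filename = os.path.join(in_path, '%d.png' % i)
        img = cv2.imread(filename)
        out.write(img)  # append the frame
    out.release()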

ComOut_Path='./combine.mp4'
blend_path='blend'

# Note: size must equal the frame dimensions, given as (width, height); otherwise the output file is tiny and will not play
CombVideo(blend_path, ComOut_Path,(720,1280))
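
The size passed to `CombVideo` is `(width, height)`, whereas an image loaded as a NumPy array (for example by `cv2.imread`) has shape `(height, width, channels)`. Rather than hard-coding `(720, 1280)`, the size could be derived from a frame. A minimal sketch, where `writer_size` is a hypothetical helper and a zero array stands in for a loaded frame:

```python
import numpy as np

def writer_size(frame):
    """Return (width, height) from an image array of shape (height, width, channels)."""
    height, width = frame.shape[:2]
    return (width, height)

# Stand-in for a portrait frame that is 720 pixels wide and 1280 pixels tall
frame = np.zeros((1280, 720, 3), dtype=np.uint8)
size = writer_size(frame)
```

In the notebook, the result for the first blended frame could then be passed as the `size` argument.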

3. Extract the audio from the original video

!pip install moviepy -q

import moviepy.editor as mp

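# 16 kHz sample rate, to stay consistent with paddlespeech
def extract_audio(videos_file_path):
    my_clip = mp.VideoFileClip(videos_file_path, audio_fps=16000)
    if videos_file_path.split(".")[-1] not in ('MP4', 'mp4'):
        raise ValueError('expected an .mp4 file: %s' % videos_file_path)
    # Only an upper-case '.MP4' suffix is stripped here; for 'x.mp4' the
    # output is 'x.mp4_video.wav', which is the name the next step uses
    p = videos_file_path.split('.MP4')[0]
    new_path = p + '_video.wav'
    my_clip.audio.write_audiofile(new_path)
    return new_path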

extract_audio('video/1video.mp4')

MoviePy - Writing audio in video/1video.mp4_video.wav
MoviePy - Done.





'video/1video.mp4_video.wav'
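
The output name above still contains `.mp4` because `extract_audio` only strips an upper-case `.MP4` suffix. A case-insensitive alternative can be sketched with `os.path.splitext`; `wav_path_for` is a hypothetical helper, not part of the original notebook.

```python
import os

def wav_path_for(video_path):
    """Build '<name>_video.wav' next to the video, ignoring extension case."""
    root, ext = os.path.splitext(video_path)
    if ext.lower() != ".mp4":
        raise ValueError(f"expected an .mp4 file, got: {video_path}")
    return root + "_video.wav"

lower = wav_path_for("video/1video.mp4")
upper = wav_path_for("video/1video.MP4")
```

Adopting this naming would change the file name from the one shown in the output above, so the path passed to `AudioFileClip` in the next step would need to change with it.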

4. Add the original audio

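from moviepy.editor import *
"""
Add background music to a video
Multi-track audio mixing
"""
# The video that needs an audio track
video_clip = VideoFileClip(r'combine.mp4')
# Extract the video's own audio and adjust its volume
# video_audio_clip = video_clip.audio.volumex(0.8)

# The audio track to add (the sound extracted from the original video)
audio_clip = AudioFileClip(r'video/1video.mp4_video.wav').volumex(1)
# Loop the audio so that its duration matches the video's
# audio = afx.audio_loop( audio_clip, duration=video_clip.duration)
# Mix the video's own audio with the added track
# audio_clip_add = CompositeAudioClip([video_audio_clip,audio])

# Attach the audio to the video
final_video = video_clip.set_audio(audio_clip)

# Save the finished video
final_video.write_videofile("video_result.mp4")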

Moviepy - Building video video_result.mp4.
MoviePy - Writing audio in video_resultTEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video video_result.mp4