PPSIG：静态的运动场景中人物三维渲染

使用COLMAP获取照片位姿，PP-Humanseg抠出人物，NeRF进行2D转3D的渲染

AI Studio

524人浏览 · 2022-09-03 17:25:38

AI Studio · 2022-09-03 17:25:38 发布

PPSIG：静态的运动场景中人物三维渲染

1. 项目背景

在体育比赛时，如果能360°的展示一个选手的精彩运动瞬间，那将是一个非常炫酷和赏心悦目的事
本项目使用《舞动风暴》节目中的风暴时刻的数据，以及COLMAP、PP-Humanseg、NeRF等方法来展示如何对静态的运动场景中人物进行任意视角的三维渲染

2. 效果展示

本项目的效果如下。由于运动场景中主角是人，所以使用PP-Humanseg将人像自动抠出，但由于部分图像背景剔除不干净，所以导致有的视角渲染的效果不佳

视角一	视角二

3. 数据准备

3.1 数据获取

本项目所用数据为《舞动风暴》节目中的风暴时刻的片段，片段是180°的相机布设拍摄得到的视频，从视频中获取图像数据
从B站中选用的视频片段为黄琛迪的爱不释手，部分图片如下所示

正面	侧面

3.2 像片位姿获取

使用COLMAP对像片的姿态进行获取，具体使用步骤克参考AI Studio项目COLMAP+NeRF，使用自己手机拍摄的2D照片3D渲染
使用COLMAP进行稀疏重建，解算出每张像片的位姿，示意图如下图所示：

在这里插入图片描述

注：获取的图像以及位姿数据已经压缩为Dance.zip文件，并上传到work文件夹下

3.3 PP-Humanseg抠出人像

使用PaddleSeg提供的PP-Humanseg模型处理图像数据，剔除背景并替换为黑色，具体步骤如下：

# 克隆仓库
!git clone https://github.com/PaddlePaddle/PaddleSeg

# 安装依赖
%cd PaddleSeg
!pip install -r requirements.txt

# 进入PP-Huamanseg页面
%cd ./contrib/PP-HumanSeg

/home/aistudio/PaddleSeg/contrib/PP-HumanSeg

# 下载要使用的权重
!wget https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv1_server_512x512_inference_model_with_softmax.zip

# 解压数据
%cd /home/aistudio
!unzip -qo work/Dance.zip -d work/Dance

/home/aistudio

!unzip -qo ./PaddleSeg/contrib/PP-HumanSeg/human_pp_humansegv1_server_512x512_inference_model_with_softmax.zip \
-d ./PaddleSeg/contrib/PP-HumanSeg/inference_model

!cp ./work/my_seg_demo.py ./PaddleSeg/contrib/PP-HumanSeg/src/

!python ./PaddleSeg/contrib/PP-HumanSeg/src/my_seg_demo.py \
  --config PaddleSeg/contrib/PP-HumanSeg/inference_model/human_pp_humansegv1_server_512x512_inference_model_with_softmax/deploy.yaml \
  --folder_path ./work/Dance/images/ \
  --save_dir ./work/Dance/images_result

!rm -r work/Dance/images
!mv work/Dance/images_result work/Dance/images

抠除人像之后的效果如下图所示：

正面	侧面

注：可以看到效果还是不错的，之所以没有先抠图再获取位姿，是因为进行稀疏重建时是需要特征点匹配的，如果先抠图会导致提取出来的特征点比较少

3.4 生成LIFF格式数据

获取位姿之后，要进行NeRF训练，还需要将其转换为可训练的数据格式，本项目选择的是LIFF格式

# 配置环境，除了paddlepaddle-gpu==2.2.2，还需安装requirements.txt的
%cd /home/aistudio/nerf-paddle/
!pip install -r requirements.txt

# 复制文件到相应文件夹下
!cp -r ../work/Dance ./data/nerf_llff_data/

!python gen_poses.py --scenedir ./data/nerf_llff_data/Dance

Don't need to run COLMAP
Post-colmap
Cameras 5
Images # 57
Points (4053, 3) Visibility (4053, 57)
Depth stats 0.996353436955536 24.6734418020776 6.026533703030671
Done with imgs2poses

4. 2D转3D的NeRF渲染

在生成训练所需要的LIFF数据之后，使用NeRF对该场景的隐式神经辐射场训练
训练所用的配置文件在configs文件夹下，为dance.txt文件

4.1 训练NeRF

执行以下脚本，训练Dance场景

!python run_nerf.py --config configs/dance.txt

4.2 测试NeRF

执行以下脚本，测试训练了200k次iteration的权重效果
由于该场景增加了渲染时细采样的次数，时间较久，约40分钟

!python run_nerf.py --config configs/dance.txt --ft_path ./logs/dance_test/200000.pdparams --render_only

4.3 渲染特定位姿的图像

NeRF训练好的隐式神经辐射场，是可以从任意视角渲染该场景，因此本项目增加一个脚本，可以指定要渲染的照片其拍摄的位姿，就能生成该位姿下的图像
单张影像的位姿数据为3×5的矩阵，包括3×3的旋转矩阵、3×1的位移矩阵以及3×1的相机内参，其中相机内参是确定的，其他参数可以自己设置
将位姿数据保存为.txt格式即可使用test_render_poses.py脚本生成该场景下特定位姿下拍照的图像

# 读取位姿数据并展示
import numpy as np

txt_path = r"../poses/poses_21.txt"
pose = np.loadtxt(txt_path)
print(pose)

[[ 9.08856270e-01 -6.12721000e-02 -4.12584420e-01 -1.64151251e+00
   6.80000000e+02]
 [ 9.87692700e-02  9.92634000e-01  7.01584900e-02  2.85931710e-01
   6.80000000e+02]
 [ 4.05246620e-01 -1.04514660e-01  9.08213500e-01  1.23346102e+00
   2.23211304e+03]]

# rdpose_path 为待渲染的位姿数据
!python test_render_poses.py --config configs/dance.txt --ft_path ./logs/dance_test/200000.pdparams --render_only \
--rdpose_path ../poses/poses_21.txt

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import MutableMapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Iterable, Mapping
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Sized
Loaded image data (680, 680, 3, 57) [ 680.          680.         2232.11305997]
Loaded ./data/nerf_llff_data/Dance 1.5951292874260443 23.42002644537729
recentered (3, 5)
[[ 1.0000000e+00  2.6874478e-09  2.5313341e-08 -4.1827821e-09]
 [-2.6874483e-09  1.0000000e+00  1.7297449e-08 -1.2548346e-08]
 [-2.5313339e-08 -1.7297449e-08  1.0000000e+00  3.7645037e-08]]
Data:
(57, 3, 5) (57, 680, 680, 3) (57, 2)
HOLDOUT view is 28
Loaded llff (57, 680, 680, 3) (1, 3, 5) [ 680.     680.    2232.113] ./data/nerf_llff_data/Dance
Auto LLFF holdout, 8
DEFINING BOUNDS
NEAR FAR 0.0 1.0
W0828 22:58:48.514714   834 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0828 22:58:48.519577   834 device_context.cc:465] device: 0, cuDNN Version: 7.6.
Found ckpts ['./logs/dance_test/200000.pdparams']
Reloading from ./logs/dance_test/200000.pdparams
RENDER ONLY
test poses shape [1, 3, 5]
  0%|                                                     | 0/1 [00:00<?, ?it/s]0 0.0027091503143310547
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:130: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if data.dtype == np.object:
[680, 680, 3] [680, 680]
100%|█████████████████████████████████████████████| 1/1 [00:21<00:00, 21.31s/it]
Done rendering ./logs/dance_test/renderonly_path_199999
IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (680, 680) to (688, 688) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).
[1;34m[swscaler @ 0x5917280] [0m[0;33mWarning: data is not aligned! This can lead to a speed loss
[0m

# 展示渲染的该图片
import matplotlib.pyplot as plt
from PIL import Image
%matplotlib inline

render_img = Image.open(r"./logs/dance_test/renderonly_path_199999/000.png")
plt.figure(figsize=(8,8))
plt.imshow(render_img)
plt.axis("off")
plt.show()

在这里插入图片描述

5. 总结

本项目通过COLMAP获取照片位姿，PP-Humanseg抠出人物，NeRF进行2D转3D的渲染，基本能达到静态的运动场景中人物三维渲染的目的
不足之处：
- 由于图像中背景部分有颜色和躯体很接近，所以抠人像出来的时候，并没有能很好的抠出干净的人像，导致训练的NeRF场景有部分视角不是很清晰
- 训练的数据是由180°布设的相机拍摄的照片，而想推广到现实中的应用，需要做到单目、或者稀疏相机拍摄的图像就能够生成相应的渲染图，最初版本的NeRF并不满足此需求。但是近段时间开源一些优秀作品，如HumanNeRF有很大可能完成这项任务，下一阶段是把这些优秀的作品使用paddle复现出来

此文章为搬运
原项目链接