
Overview

Building on the 3DGS radiance-field method, this paper proposes a new framework of point-cloud distribution and optimization strategies that achieves training times of single-digit minutes while keeping rendering quality close to 3DGS, striking a trade-off between render quality and optimization efficiency.
The method takes opacity-based Gaussian importance as its core metric and proposes (1) aggressive Gaussian densification: reconstruct the point cloud from depth points obtained by ray-ellipsoid intersection (depth reinitialization), combined with critical Gaussian identification and aggressive Gaussian clone to prevent the point-count explosion and image-quality degradation that depth reinitialization alone would cause; and (2) visibility Gaussian culling: precompute a visibility mask for every training view to speed up computation.
Together these reduce the number of Gaussians, accelerate early optimization, and let densification finish within a very short window.
At the implementation level, the densification/culling framework code is modified and the processCUDA rasterization kernel is optimized, balancing optimization speed, the number of Gaussians, and rendering quality.

Gaussian importance

As shown in Compressing Volumetric Radiance Fields to 1 MB, most voxels contribute minimally to the rendered result, which indicates large redundancy in grid-based models that can be pruned without degrading rendering quality.

We define Gaussian importance via accumulated opacity: within a view, the Gaussians whose accumulated opacities make up the top 1% are the ones with the greatest impact on the rendered result.

aggressive Gaussian densification

depth reinitialization

For each pixel, find the ellipsoid with the largest blending weight in that region, and take the midpoint of the ray-ellipsoid intersection as the depth point:

$d(x) = d^{\text{mid}}_{i_{\text{max}}}(x), \quad \text{where } i_{\text{max}} = \arg\max_{i} w_i$
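
To make the geometry concrete, here is a minimal NumPy sketch (illustrative names, not the repo's code; the real computation happens per pixel inside the render_depthCUDA kernel shown later) of the midpoint of a ray's intersection with a Gaussian's 3-sigma ellipsoid:

import numpy as np

def ray_ellipsoid_mid_depth(ray_origin, ray_dir, center, R, scales, k=3.0):
    """Midpoint depth of a unit ray against the k-sigma ellipsoid of a Gaussian.

    R maps world coordinates into the Gaussian's local frame; scales are the
    per-axis standard deviations, so k * scales are the ellipsoid semi-axes.
    """
    o = (R @ (ray_origin - center)) / (k * scales)  # origin in unit-sphere coords
    d = (R @ ray_dir) / (k * scales)                # direction in unit-sphere coords
    a = d @ d
    b = 2.0 * (o @ d)
    c = o @ o - 1.0
    if b * b - 4.0 * a * c < 0:                     # ray misses the ellipsoid
        return None
    return -b / (2.0 * a)                           # midpoint of the two roots

t = ray_ellipsoid_mid_depth(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                            np.array([0.0, 0.0, 5.0]), np.eye(3),
                            np.array([1.0, 1.0, 1.0]))
print(t)  # 5.0: the midpoint of the chord through the ellipsoid's center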

A depth map is built from these depth points, and the point cloud is reconstructed and merged using a screen-space post-processing style pass; this is depth reinitialization, which accelerates the early optimization iterations.

However, this naive point-cloud reset alone generalizes poorly across scenes and causes point-count explosion and image-quality degradation.
Adding critical Gaussian identification and aggressive Gaussian clone resolves these problems.

critical Gaussian identification

In the early optimization iterations almost all Gaussians are still under construction; naively cloning them amounts to diluting Gaussian importance and makes the Gaussian count hard to control. We therefore restrict cloning to the Gaussians that represent the object surface, i.e. critical Gaussian identification.

As before, we find the ellipsoid $G_{i_{\text{max}}}$ with the largest opacity contribution in the region and use its contribution as the alpha-blending estimate.

We obtain the critical Gaussians via the inverse distribution function, where $\beta_p$ denotes the prune threshold, set to 0.99 in the paper; the resulting $\theta_p$ thresholds out the top 1% of $G_{i_{\text{max}}}$ contributions, i.e. the critical Gaussians:

$\theta_p = F^{-1}(\beta_p)$

aggressive Gaussian clone

Directly reusing the original Gaussians' parameters for the new points hurts early optimization, so, borrowing from 3D Gaussian Splatting as Markov Chain Monte Carlo, we obtain a smooth densification: the clone is applied to all critical Gaussians, the computation of the Gaussian centers is simplified, and the clone count is set to 2:

$P_{\text{new}} = P_{\text{old}}$

$\alpha_{\text{new}} = 1 - \sqrt{1 - \alpha_{\text{old}}}$

$\Sigma_{\text{new}} = (\alpha_{\text{old}})^2 \cdot \left( 2\alpha_{\text{new}} - \frac{(\alpha_{\text{new}})^2}{\sqrt{2}} \right)^{-2} \cdot \Sigma_{\text{old}}$
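
The opacity rule can be sanity-checked numerically: stacking the two clones, each with opacity $\alpha_{\text{new}}$, alpha-composites back to exactly $\alpha_{\text{old}}$, since $1 - (1 - \alpha_{\text{new}})^2 = \alpha_{\text{old}}$:

import torch

alpha_old = torch.rand(5, dtype=torch.float64)   # arbitrary original opacities
alpha_new = 1 - torch.sqrt(1 - alpha_old)        # opacity assigned to each clone
composited = 1 - (1 - alpha_new) ** 2            # two clones blended front-to-back
assert torch.allclose(composited, alpha_old)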

Overall Aggressive Densification Pipeline

We keep 3DGS's original progressive densification. Starting from iteration 500, critical Gaussian identification and aggressive Gaussian clone run every 250 iterations, and depth reinitialization runs at iteration 2K. The whole densification phase is compressed into the first 3K iterations, which shortens the full 30K-iteration schedule to 18K iterations.
The rendering pipeline also remains compatible with the Mini-Splatting framework, allowing Mini-Splatting simplification at iterations 3K and 8K, as sketched below.
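
A small, self-contained sketch of this schedule as pure logic (illustrative only; the actual flags live in the repo's training script):

def actions_at(iteration: int) -> list:
    acts = []
    # aggressive densification window: iterations 500-3K, every 250 iterations
    if 500 <= iteration <= 3_000 and iteration % 250 == 0:
        acts.append("critical Gaussian identification + aggressive clone")
    if iteration == 2_000:
        acts.append("depth reinitialization")
    if iteration in (3_000, 8_000):
        acts.append("Mini-Splatting simplification")
    return acts

schedule = {i: actions_at(i) for i in range(1, 18_001) if actions_at(i)}
print(sorted(schedule))  # all densification events fall within the first 3K iterations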

visibility Gaussian culling

For the $k$-th training view, the Gaussian importance $I_i^k$ is first computed by summing the blending weights $w_{ij}^k$ of all rays $j$ that intersect Gaussian $G_i$:

$I_i^k = \sum_{j=1}^{J} w_{ij}^k$

where $J$ is the total number of rays intersecting Gaussian $G_i$.

The visibility mask $V_i^k$ is computed with an indicator function $\mathcal{I}$ that compares $I_i^k$ against a predefined threshold $\tau$: if $I_i^k$ exceeds $\tau$, then $V_i^k = 1$ (visible); otherwise $V_i^k = 0$ (invisible):

$V_i^k = \mathcal{I}(I_i^k > \tau)$

This ensures that only the Gaussians in the top 1% of Gaussian importance are treated as visible.

During iterations 500 to 13K, a visibility mask is precomputed for each training view, culling unnecessary Gaussians to cut the computational cost and speed up training.
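
A minimal sketch of the mask precomputation (illustrative; in the repo the per-view masks appear to be stored in the boolean matrix self._culling and indexed as self._culling[:, view.uid], though the exact visible/culled convention follows the repo):

import torch

def precompute_visibility_masks(accum_weights_per_view: torch.Tensor,
                                tau: float) -> torch.Tensor:
    """accum_weights_per_view[i, k] = I_i^k = sum_j w_ij^k (Gaussian i, view k).

    Returns boolean masks V[i, k] = (I_i^k > tau); Gaussians with V[i, k] False
    are skipped entirely when view k is rendered during training.
    """
    return accum_weights_per_view > tau

V = precompute_visibility_masks(torch.rand(10_000, 100), tau=0.01)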

Implementation

To reduce the memory burden, the iterative optimization of the spherical-harmonic coefficients is disabled during densification, and training runs at a reduced resolution.
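
A minimal sketch of those two switches, assuming the reference 3DGS optimizer layout where the higher-order SH coefficients live in the param group named "f_rest" (an assumption, not verified against this repo); the reduced-resolution views come from getTrainCameras_warn_up as seen in the code below:

# Assumption: higher-order SH coefficients sit in the "f_rest" param group,
# as in the reference 3DGS optimizer setup; zeroing their lr freezes them.
def freeze_sh_rest(optimizer) -> None:
    for group in optimizer.param_groups:
        if group["name"] == "f_rest":
            group["lr"] = 0.0

# Reduced resolution: the warm-up camera getter used throughout the repo,
# getTrainCameras_warn_up(..., scale2=2.0), serves half-size training views.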

Code implementation details

critical Gaussian identification

This is the key point of the whole paper. It uses a quantile function with a very large quantile, 0.99, to filter out all Gaussians that together account for less than 1% of total importance. Since the importance scores are recomputed at each invocation, this works very well and prunes a large number of unnecessary Gaussians.
The concept to understand: sort the importances, take the prefix sum, and normalize; the sorted cumulative curve plays the role of the distribution function, from which the threshold is read off for pruning. So the cut is by share of importance mass, not by share of the Gaussian count.

## inverse distribution function
def init_cdf_mask(importance, thres=1.0):
    importance = importance.flatten()
    if thres != 1.0:
        percent_sum = thres
        vals, idx = torch.sort(importance + (1e-6))
        cumsum_val = torch.cumsum(vals, dim=0)
        split_index = ((cumsum_val / vals.sum()) > (1 - percent_sum)).nonzero().min()
        split_val_nonprune = vals[split_index]

        non_prune_mask = importance > split_val_nonprune
    else:
        non_prune_mask = torch.ones_like(importance).bool()

    return non_prune_mask
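
Usage sketch (illustrative scores): with thres=0.99, the mask keeps the Gaussians that jointly carry the top 99% of importance mass and marks the bottom 1% of mass for pruning:

import torch

imp_score = torch.rand(100_000)   # stand-in per-Gaussian importance scores
non_prune_mask = init_cdf_mask(importance=imp_score, thres=0.99)
print(int(non_prune_mask.sum()), "of", imp_score.numel(), "Gaussians survive")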

depth reinitialization

if iteration == args.depth_reinit_iter:

    num_depth = gaussians._xyz.shape[0] * args.num_depth_factor

    # interesction_preserving gives a better point-cloud reconstruction result
    # at the early stage; it does not affect rendering quality
    gaussians.interesction_preserving(scene, render_simp, iteration, args, pipe, background)
    pts, rgb = gaussians.depth_reinit(scene, render_depth, iteration, num_depth, args, pipe, background)

    gaussians.reinitial_pts(pts, rgb)

    gaussians.training_setup(opt)
    gaussians.init_culling(len(scene.getTrainCameras()))
    mask_blur = torch.zeros(gaussians._xyz.shape[0], device='cuda')
    torch.cuda.empty_cache()
    # print(gaussians._xyz.shape)

First, find the ellipsoid with the largest opacity contribution.

render_simp performs a lightweight rasterization pass and returns the variables accum_weights, area_proj, and area_max. All three are per-Gaussian arrays: (1) accum_weights accumulates each Gaussian's opacity contributions; (2) area_proj counts how many times the Gaussian contributed opacity (the number of pixels it covers); (3) area_max counts at how many pixels this Gaussian was the largest contributor among all Gaussians blended there.

# in gaussian_renderer/__init__.py
def render_simp(viewpoint_camera, pc : GaussianModel, pipe, bg_color : torch.Tensor, scaling_modifier = 1.0, override_color = None, culling=None):
    '''
    ...
    '''
    # Rasterize visible Gaussians to image, obtain their radii (on screen).
    rendered_image, radii, \
    accum_weights_ptr, accum_weights_count, accum_max_count = rasterizer.render_simp(
        means3D = means3D,
        means2D = means2D,
        dc = dc,
        shs = shs,
        culling = culling,
        colors_precomp = colors_precomp,
        opacities = opacity,
        scales = scales,
        rotations = rotations,
        cov3D_precomp = cov3D_precomp)

    # Gaussians that were frustum-culled or had a radius of 0 were not visible.
    # They are excluded from the value updates used in the splitting criteria.
    return {"render": rendered_image,
            "viewspace_points": screenspace_points,
            "visibility_filter" : (radii > 0).nonzero(),
            "radii": radii,
            "accum_weights": accum_weights_ptr,
            "area_proj": accum_weights_count,
            "area_max": accum_max_count,
            }
// in forward.cu
template <uint32_t CHANNELS>
__global__ void __launch_bounds__(BLOCK_X * BLOCK_Y)
render_simpCUDA(
    const uint2* __restrict__ ranges,
    const uint32_t* __restrict__ point_list,
    int W, int H,
    const float2* __restrict__ points_xy_image,
    const float* __restrict__ features,

    float* __restrict__ accum_weights_p,
    int* __restrict__ accum_weights_count,
    float* __restrict__ accum_max_count,

    const float4* __restrict__ conic_opacity,
    float* __restrict__ final_T,
    uint32_t* __restrict__ n_contrib,
    const float* __restrict__ bg_color,
    float* __restrict__ out_color
)
{
    // Identify current tile and associated min/max pixel range.
    auto block = cg::this_thread_block();
    uint32_t horizontal_blocks = (W + BLOCK_X - 1) / BLOCK_X;
    uint2 pix_min = { block.group_index().x * BLOCK_X, block.group_index().y * BLOCK_Y };
    uint2 pix_max = { min(pix_min.x + BLOCK_X, W), min(pix_min.y + BLOCK_Y, H) };
    uint2 pix = { pix_min.x + block.thread_index().x, pix_min.y + block.thread_index().y };
    uint32_t pix_id = W * pix.y + pix.x;
    float2 pixf = { (float)pix.x, (float)pix.y };

    // Check if this thread is associated with a valid pixel or outside.
    // Done threads can help with fetching, but don't rasterize.
    // Load start/end range of IDs to process in bit-sorted list.
    // Allocate storage for batches of collectively fetched data.
    // Initialize helper variables.
    // ...

    // Iterate over batches until all done or range is complete
    // ...
    {
        // Iterate over current batch
        for (int j = 0; !done && j < min(BLOCK_SIZE, toDo); j++)
        {
            // Keep track of current position in range
            contributor++;

            // Resample using conic matrix (cf. "Surface
            // Splatting" by Zwicker et al., 2001)
            float2 xy = collected_xy[j];
            float2 d = { xy.x - pixf.x, xy.y - pixf.y };
            float4 con_o = collected_conic_opacity[j];
            float power = -0.5f * (con_o.x * d.x * d.x + con_o.z * d.y * d.y) - con_o.y * d.x * d.y;
            if (power > 0.0f)
                continue;

            // Eq. (2) from the 3D Gaussian splatting paper.
            // Obtain alpha by multiplying with Gaussian opacity
            // and its exponential falloff from mean.
            // Avoid numerical instabilities (see paper appendix).
            float alpha = min(0.99f, con_o.w * exp(power));
            if (alpha < 1.0f / 255.0f)
                continue;
            float test_T = T * (1 - alpha);
            if (test_T < 0.0001f)
            {
                done = true;
                continue;
            }

            // Eq. (3) from the 3D Gaussian splatting paper.
            for (int ch = 0; ch < CHANNELS; ch++)
                C[ch] += features[collected_id[j] * CHANNELS + ch] * alpha * T;

            // Track the Gaussian with the largest blending weight at this pixel.
            if (weight_max < alpha * T)
            {
                weight_max = alpha * T;
                idx_max = collected_id[j];
                flag_update = 1;
            }

            // Per-Gaussian accumulators: total opacity contribution (accum_weights)
            // and number of covered pixels (area_proj).
            atomicAdd(&(accum_weights_p[collected_id[j]]), alpha * T);
            atomicAdd(&(accum_weights_count[collected_id[j]]), 1);

            T = test_T;

            // Keep track of last range entry to update this pixel.
            last_contributor = contributor;
        }
    }

    // Count how many pixels this Gaussian dominated (area_max).
    if (flag_update == 1)
    {
        atomicAdd(&(accum_max_count[idx_max]), 1);
    }

    // All threads that treat a valid pixel write out their final
    // rendering data to the frame and auxiliary buffers.
    // ...
}

First, interesction_preserving filters the Gaussians: every training view is rasterized, producing the contribution statistics described above. I call the Gaussians that make the largest contribution at some pixel "effective Gaussians"; the remainder are completely covered by higher-weight Gaussians, contribute little to the rendering, and can be discarded. The importance computation distinguishes outdoor from indoor scenes: outdoor importance is weighted, accumulating the opacity contribution per unit projected area. My reading of why: outdoor scenes contain more empty space, sparser point clouds, and larger Gaussians, so raw accumulated opacity is easily skewed by projected area.
The importances are then sorted, and all Gaussians contributing less than 1% of the total opacity mass, plus those covered by higher-opacity Gaussians, are pruned (i.e. Gaussian culling).

def interesction_preserving(self, scene, render_simp, iteration, args, pipe, background):

    imp_score = torch.zeros(self._xyz.shape[0]).cuda()
    accum_area_max = torch.zeros(self._xyz.shape[0]).cuda()
    views = scene.getTrainCameras_warn_up(iteration, args.warn_until_iter, scale=1.0, scale2=2.0).copy()
    for view in views:
        render_pkg = render_simp(view, self, pipe, background, culling=self._culling[:,view.uid])

        accum_weights = render_pkg["accum_weights"]
        area_proj = render_pkg["area_proj"]
        area_max = render_pkg["area_max"]

        accum_area_max = accum_area_max + area_max

        if args.imp_metric == 'outdoor':
            # outdoor: normalize accumulated opacity by projected area
            mask_t = area_max != 0
            temp = imp_score + accum_weights / area_proj
            imp_score[mask_t] = temp[mask_t]
        else:
            imp_score = imp_score + accum_weights

    # Gaussians that never dominated any pixel get zero importance
    imp_score[accum_area_max == 0] = 0
    non_prune_mask = init_cdf_mask(importance=imp_score, thres=0.99)
    self.prune_points(non_prune_mask == False)

    return self._xyz, SH2RGB(self._features_dc + 0)[:, 0]

Next comes the depth_reinit pass. It yields accum_alpha, the accumulated transparency (which by itself has nothing to do with depth), and out_pts, which encodes per pixel a vector whose direction is the ray direction and whose length is the depth; this is the depth point as defined earlier (the midpoint of the ray-ellipsoid intersection).

template <uint32_t CHANNELS>
__global__ void __launch_bounds__(BLOCK_X * BLOCK_Y)
render_depthCUDA(
    const uint2* __restrict__ ranges,
    const uint32_t* __restrict__ point_list,
    int W, int H,
    const float2* __restrict__ points_xy_image,
    const float* __restrict__ features,

    const float4* __restrict__ conic_opacity,
    float* __restrict__ final_T,
    uint32_t* __restrict__ n_contrib,
    const float* __restrict__ bg_color,

    float* __restrict__ out_color,
    float* __restrict__ out_pts,
    float* __restrict__ out_depth,
    float* accum_alpha,
    int* __restrict__ gidx,
    float* __restrict__ discriminants,

    const float* __restrict__ means3D,
    const glm::vec3* __restrict__ scales,
    const glm::vec4* __restrict__ rotations,

    const float* __restrict__ viewmatrix,
    const float* __restrict__ projmatrix,
    const glm::vec3* __restrict__ cam_pos
)
{
    // Reconstruct this pixel's camera ray by unprojecting the pixel from NDC.
    float3 p_proj_r = { Pix2ndc(pixf.x, W), Pix2ndc(pixf.y, H), 1 };

    // inverse process of 'Transform point by projecting'
    float p_hom_x_r = p_proj_r.x * (1.0000001);
    float p_hom_y_r = p_proj_r.y * (1.0000001);
    // self.zfar = 100.0, self.znear = 0.01
    float p_hom_z_r = (100 - 100 * 0.01) / (100 - 0.01);
    float p_hom_w_r = 1;

    float3 p_hom_r = { p_hom_x_r, p_hom_y_r, p_hom_z_r };
    float4 p_orig_r = transformPoint4x4(p_hom_r, projmatrix_inv);

    glm::vec3 ray_direction = {
        p_orig_r.x - ray_origin.x,
        p_orig_r.y - ray_origin.y,
        p_orig_r.z - ray_origin.z,
    };
    glm::vec3 normalized_ray_direction = glm::normalize(ray_direction);

    // Iterate over batches until all done or range is complete
    // ...
    {
        // Iterate over current batch
        for (int j = 0; !done && j < min(BLOCK_SIZE, toDo); j++)
        {
            // Resample using conic matrix (cf. "Surface
            // Splatting" by Zwicker et al., 2001)
            float2 xy = collected_xy[j];
            float2 d = { xy.x - pixf.x, xy.y - pixf.y };
            float4 con_o = collected_conic_opacity[j];
            float power = -0.5f * (con_o.x * d.x * d.x + con_o.z * d.y * d.y) - con_o.y * d.x * d.y;
            if (power > 0.0f)
                continue;

            float alpha = min(0.99f, con_o.w * exp(power));
            if (alpha < 1.0f / 255.0f)
                continue;

            float test_T = T * (1 - alpha);
            if (test_T < 0.0001f)
            {
                done = true;
                continue;
            }

            for (int ch = 0; ch < CHANNELS; ch++)
                C[ch] += features[collected_id[j] * CHANNELS + ch] * alpha * T;

            // compute Gaussian depth
            // Normalize quaternion to get valid rotation
            glm::vec4 q = rotations[collected_id[j]]; // / glm::length(rot);
            float rot_r = q.x;
            float rot_x = q.y;
            float rot_y = q.z;
            float rot_z = q.w;

            // Compute rotation matrix R from quaternion
            // ...

            // Move the ray into the Gaussian's local frame.
            glm::vec3 temp = {
                ray_origin.x - means3D[3 * collected_id[j] + 0],
                ray_origin.y - means3D[3 * collected_id[j] + 1],
                ray_origin.z - means3D[3 * collected_id[j] + 2],
            };
            glm::vec3 rotated_ray_origin = R * temp;
            glm::vec3 rotated_ray_direction = R * normalized_ray_direction;

            // Divide by the 3-sigma semi-axes to map the ray into the ellipsoid's
            // canonical (unit-sphere) coordinate frame, then solve the quadratic
            // a*t^2 + b*t + c = 0 for the ray-ellipsoid intersection.
            glm::vec3 a_t = rotated_ray_direction / (scales[collected_id[j]] * 3.0f) * rotated_ray_direction / (scales[collected_id[j]] * 3.0f);
            float a = a_t.x + a_t.y + a_t.z;

            glm::vec3 b_t = rotated_ray_direction / (scales[collected_id[j]] * 3.0f) * rotated_ray_origin / (scales[collected_id[j]] * 3.0f);
            float b = 2 * (b_t.x + b_t.y + b_t.z);

            glm::vec3 c_t = rotated_ray_origin / (scales[collected_id[j]] * 3.0f) * rotated_ray_origin / (scales[collected_id[j]] * 3.0f);
            float c = c_t.x + c_t.y + c_t.z - 1;

            float discriminant = b * b - 4 * a * c;

            // Midpoint of the two intersection roots, t = -b/(2a),
            // converted to a depth along the unnormalized ray.
            float depth = (-b / 2 / a) / glm::length(ray_direction);

            if (depth < 0)
                continue;

            // Remember the depth point of the max-weight Gaussian on this ray.
            if (weight_max < alpha * T)
            {
                weight_max = alpha * T;
                depth_max = depth;
                discriminant_max = discriminant;
                idx_max = collected_id[j];

                point_rec = ray_origin + (-b / 2 / a) * normalized_ray_direction;
            }

            T = test_T;
            last_contributor = contributor;
        }
    }

    // All threads that treat a valid pixel write out their final
    // rendering data to the frame and auxiliary buffers.
    if (inside)
    {
        final_T[pix_id] = T;
        n_contrib[pix_id] = last_contributor;
        for (int ch = 0; ch < CHANNELS; ch++)
            out_color[ch * H * W + pix_id] = C[ch] + T * bg_color[ch];
        for (int ch = 0; ch < 3; ch++)
            out_pts[ch * H * W + pix_id] = point_rec[ch];

        out_depth[pix_id] = depth_max;
        accum_alpha[pix_id] = T;
        discriminants[pix_id] = discriminant_max;
        gidx[pix_id] = idx_max;
    }
}

A proportion of the depth points is then sampled (the proportion is driven by the accumulated opacity and the user-defined budget).

def depth_reinit(self, scene, render_depth, iteration, num_depth, args, pipe, background):

    out_pts_list = []
    gt_list = []
    ## during warm-up, use scale2, i.e. (width/2 x height/2), half resolution
    views = scene.getTrainCameras_warn_up(iteration, args.warn_until_iter, scale=1.0, scale2=2.0).copy()

    ## computed for every training view
    for view in views:
        gt = view.original_image[0:3, :, :]

        ## render the depth map and fetch the auxiliary buffers
        render_depth_pkg = render_depth(view, self, pipe, background, culling=self._culling[:,view.uid])

        out_pts = render_depth_pkg["out_pts"]
        accum_alpha = render_depth_pkg["accum_alpha"]

        prob = 1 - accum_alpha

        prob = prob / prob.sum()
        prob = prob.reshape(-1).cpu().numpy()

        ## per-view sampling budget, derived from the total depth-point budget
        factor = 1 / (gt.shape[1] * gt.shape[2] * len(views) / num_depth)

        N_xyz = prob.shape[0]
        num_sampled = int(N_xyz * factor)

        ## weighted random sampling: pixels with higher prob (higher opacity)
        ## are more likely to be chosen
        indices = np.random.choice(N_xyz, size=num_sampled,
                                   p=prob, replace=False)

        '''
        print(f"Normalized prob: {prob}")
        print(f"Reshaped and moved to numpy: {prob}")
        print(f"Factor: {factor}")
        print(f"N_xyz (Total samples): {N_xyz}")
        print(f"Number of sampled points: {num_sampled}")
        print(f"Sampled indices: {indices}")
        '''

        out_pts = out_pts.permute(1, 2, 0).reshape(-1, 3)
        gt = gt.permute(1, 2, 0).reshape(-1, 3)

        out_pts_list.append(out_pts[indices])
        gt_list.append(gt[indices])

    out_pts_merged = torch.cat(out_pts_list)
    gt_merged = torch.cat(gt_list)

    return out_pts_merged, gt_merged

In the end only the depth points and their corresponding colors are kept; the per-view results are then merged.

## after depth reinit
def reinitial_pts(self, pts, rgb):

    fused_point_cloud = pts
    fused_color = RGB2SH(rgb)
    features = torch.zeros((fused_color.shape[0], 3, (self.max_sh_degree + 1) ** 2)).float().cuda()
    features[:, :3, 0] = fused_color
    features[:, 3:, 1:] = 0.0

    # print("Number of points at initialisation : ", fused_point_cloud.shape[0])

    dist2 = torch.clamp_min(distCUDA2(fused_point_cloud), 0.0000001)
    scales = torch.log(torch.sqrt(dist2))[..., None].repeat(1, 3)
    rots = torch.zeros((fused_point_cloud.shape[0], 4), device="cuda")
    rots[:, 0] = 1

    opacities = inverse_sigmoid(0.1 * torch.ones((fused_point_cloud.shape[0], 1), dtype=torch.float, device="cuda"))

    self._xyz = nn.Parameter(fused_point_cloud.contiguous().requires_grad_(True))
    self._features_dc = nn.Parameter(features[:, :, 0:1].transpose(1, 2).contiguous().requires_grad_(True))
    self._features_rest = nn.Parameter(features[:, :, 1:].transpose(1, 2).contiguous().requires_grad_(True))
    self._scaling = nn.Parameter(scales.requires_grad_(True))
    self._rotation = nn.Parameter(rots.requires_grad_(True))
    self._opacity = nn.Parameter(opacities.requires_grad_(True))
    self.max_radii2D = torch.zeros((self.get_xyz.shape[0]), device="cuda")

Note how the author controls the sampling ratio so that the point count before and after depth_reinit stays roughly equal, while the points themselves are rebuilt from the depth maps; the arithmetic can be checked directly, as shown below.
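
The budget arithmetic, worked through with illustrative numbers (a check of the factor formula in depth_reinit above):

# per view: num_sampled = (H*W) * factor
#         = (H*W) * num_depth / (H*W * len(views)) = num_depth / len(views)
# summed over all views: num_depth = N_gaussians * num_depth_factor points.
H, W, n_views = 1080, 1920, 200
n_gaussians, num_depth_factor = 500_000, 1.0
num_depth = n_gaussians * num_depth_factor
factor = 1 / (H * W * n_views / num_depth)
per_view = int(H * W * factor)
print(per_view * n_views, "~", int(num_depth))  # count preserved when the factor is 1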

aggressive Gaussian densification

According to the paper, keeping a Gaussian's opacity unchanged when cloning implicitly amplifies the influence of dense Gaussians and thereby disrupts the optimization.
The clone therefore rewrites the Gaussian's parameters, as in the code below.
Only scaling and opacity need to change, with one trick: the new opacity is clamped to be greater than 0.0051 and less than the largest float below 1.

def clone(self, selected_pts_mask):

    # ...

    temp_opacity_old = self.get_opacity[selected_pts_mask]
    new_opacity = 1 - (1 - temp_opacity_old) ** 0.5
    temp_scale_old = self.get_scaling[selected_pts_mask]
    new_scaling = (temp_opacity_old / (2 * new_opacity - 0.5 ** 0.5 * new_opacity ** 2)) * temp_scale_old
    new_opacity = torch.clamp(new_opacity, max=1.0 - torch.finfo(torch.float32).eps, min=0.0051)

    # ...

Mini-Splatting simplification

simplification 1

Prune on the Gaussians' accumulated contributed opacity, then run importance sampling with the sampling ratio set to 0.6.

def interesction_sampling(self, scene, render_simp, iteration, args, pipe, background):
    # ... (importance accumulation as in interesction_preserving, elided)
    imp_score[accum_area_max == 0] = 0
    prob = imp_score / imp_score.sum()
    prob = prob.cpu().numpy()

    factor = args.sampling_factor
    N_xyz = self._xyz.shape[0]
    num_sampled = int(N_xyz * factor * ((prob != 0).sum() / prob.shape[0]))
    indices = np.random.choice(N_xyz, size=num_sampled,
                               p=prob, replace=False)
simplification 2

Perform the same importance pruning as is done before depth_reinit.

Differences between Mini-Splatting2 and Mini-Splatting2-D

During Mini-Splatting simplification, pruning makes no outdoor/indoor distinction; the accumulated opacity of the Gaussians is used everywhere.
Also, simplification 1 is performed without the importance-sampling step.

Conclusion

Overall this is a well-written paper, and the first one I have read closely on my own. Its improvements come from analyzing the optimization process; I see two main novelties: (1) using screen-space post-processing style passes, adding the render_depth and render_simp functions to analyze each view and reconstruct depth and accumulated opacity; and (2) introducing the notion of importance, squeezing full value out of the accumulated-opacity signal, which also enables effective visibility Gaussian culling.
Improving graphics-style rasterization with Gaussians as the primitive is a promising direction, but I suspect it takes a long stretch of experiments to make it work.

Limitations

(1) In indoor scenes, aggressive densification keeps many Gaussians via the importance filter, resulting in high storage use; the paper justifies this as preserving the rendering quality of floors and walls indoors, which does match the importance-pruning design.
(2) Repeated visibility culling incurs a large memory footprint.

The simplification pass, being implemented through modified CUDA kernels, inevitably costs a lot of memory.
The densification pass, being driven by importance, inevitably over-densifies certain scenes.

Personal thoughts

Mini-Splatting did not achieve a general densification; this work does, but then runs into storage problems.

    As Mini-Splatting simplification already does, trying simplification at certain specific stages might push the Gaussian-number / render-quality / optimization-speed trade-off further?

It carries on Mini-Splatting's simplification based on Gaussian-weight masks. Visibility culling is an easy idea to come up with, but since this is 3D reconstruction, culling tied to a fixed set of training views is bound to incur memory overhead.

    Try different simplification schemes, or lower the culling frequency; in short, look for a simplification better matched to Gaussians?

The early-iteration optimization described here does solve 3DGS's early Gaussian under-reconstruction problem,

    though bare depth reinitialization plus aggressive densification feels somewhat abrupt?
