Toward Accurate Cultural Asset Digitization: Analyzing on Object-centric 3D Reconstruction

1Department of Metaverse Convergence, Chung-Ang University, South Korea    Corresponding Author
Our method achieves both high-fidelity rendering and geometrically accurate mesh reconstruction, resolving the trade-off inherent in prior dynamic Gaussian Splatting methods.

Qualitative comparison of 3D reconstruction samples for Yeonjeok, White Porcelain Ritual Vessel, and Seal. The first column shows reconstructions with the background preserved, while the second column illustrates results after applying our method.

Abstract

High-fidelity 3D reconstruction is essential for the preservation, restoration, and digital archiving of cultural assets. However, state-of-the-art methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) are sensitive to background elements, which can degrade the quality of the results. This challenge is particularly pronounced for cultural assets with intricate patterns and material properties, where inaccurate background segmentation can lead to significant artifacts. This research aims to systematically analyze the impact of background segmentation precision on the quality of 3D reconstruction. To this end, we developed an Iterative Segmentation technique to generate masks of varying accuracy by progressively refining them. By applying these intermediate masks to 3DGS reconstruction at each step, we quantitatively demonstrate the overall improvement in reconstruction quality (measured by PSNR and SSIM) as segmentation accuracy increases. Furthermore, using the final refined mask as a baseline, we conduct a controlled study to compare the distinct effects of intentional “under-segmentation” (background leakage) and “over-segmentation” (object erosion) errors on the geometric and photometric fidelity of the 3D model. Our work provides a quantitative analysis of how different segmentation error types affect the reconstruction of cultural assets, underscoring the critical need for precise foreground isolation to achieve high-fidelity results.

Video Results

Each video shows our reconstructed dynamic scene (Gaussian rendering + mesh) across time.

Girlwalk

Hook

Jumping Jacks

Stand Up

Video Comparison

Horse

DG-Mesh Ours

Beagle

DG-Mesh Ours

Bird

DG-Mesh Ours

3D Results

Bird

Duck

T-Rex

Method

Overview of the proposed framework integrating MCDM, SAAD, and RBER.

Overview of our iterative mask refinement method. First, we predict a set of candidate masks. Second, we select the candidate with the highest probability of being the object. We repeat these two steps, and finally refine the best mask obtained through the iterations into an object-centric mask.

Iterative Segmentation
The iterative segmentation process yields more consistent and refined masks. We begin by defining a bounding box (BB) that covers the entire image area: $T_{\mathrm{BB}} = [(x_1, y_1, x_2, y_2)]$. Here, $T_{\mathrm{BB}}$ is a two-dimensional tensor in which each box is specified by its top-left and bottom-right coordinates. To apply the Segment Anything Model (SAM) to the entire image region, we set $T_{\mathrm{BB}} = [(0, 0, 1, 1)]$. SAM is then applied with $T_{\mathrm{BB}}$ and the initial image $I_{\mathrm{Init}}$ as inputs. We write SAM as the function $F_{\mathrm{SAM}}$ and the best-mask selection process (detailed in Sec. 3.2) as $F_{\mathrm{best}}$. At each iteration $l$, $F_{\mathrm{SAM}}$ outputs three binary masks, denoted collectively as $M_l$. The object representation quality of each mask in $M_l$ is evaluated, and the most suitable one is selected.

From the second iteration onward, the previous iteration's best mask, $F_{\mathrm{best}}(M_{l-1})$, is multiplied with the initial image $I_{\mathrm{Init}}$, and the product is used as the new input image in place of $I_{\mathrm{Init}}$:

$$M_l = F_{\mathrm{SAM}}\big(I_{\mathrm{Init}} \times F_{\mathrm{best}}(M_{l-1}),\; T_{\mathrm{BB}}\big)$$

Here, $M_l$ is the set of candidate masks generated at the current iteration $l$, while $F_{\mathrm{best}}(M_{l-1})$ is the single best mask selected at the previous iteration $l-1$. For the initial iteration, $l = 1$, $F_{\mathrm{best}}(M_0)$ is effectively a mask of all ones, so the process starts from the original, unmasked image $I_{\mathrm{Init}}$. The process is repeated $L$ times so that the generated mask converges to a sufficiently stable result; in our experiments we set $L = 5$. The final background-removed image, $I_{\mathrm{final}}$, is produced by applying the best mask from the last iteration to the initial image:

$$I_{\mathrm{final}} = I_{\mathrm{Init}} \times F_{\mathrm{best}}(M_L)$$
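The recurrence above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `segment_fn` stands in for $F_{\mathrm{SAM}}$ and `select_best_fn` for $F_{\mathrm{best}}$, and the toy segmenter and selector below are hypothetical placeholders (simple intensity thresholds and a smallest-non-empty-mask rule) used only so that the loop can run without SAM.

```python
import numpy as np

def iterative_segmentation(image, segment_fn, select_best_fn, num_iters=5):
    """Run L rounds of segment -> select best -> re-mask.

    image:          H x W x C float array (I_Init).
    segment_fn:     stand-in for F_SAM; (image, box) -> list of H x W bool masks.
    select_best_fn: stand-in for F_best; list of masks -> one mask.
    """
    box = (0.0, 0.0, 1.0, 1.0)                    # T_BB: covers the whole image
    best = np.ones(image.shape[:2], dtype=bool)   # F_best(M_0): mask of all ones
    for _ in range(num_iters):
        masked = image * best[..., None]          # I_Init x F_best(M_{l-1})
        candidates = segment_fn(masked, box)      # M_l
        best = select_best_fn(candidates)         # F_best(M_l)
    return image * best[..., None], best          # I_final and the final mask


# --- Toy stand-ins (hypothetical; neither SAM nor the paper's scoring) ---
def toy_segmenter(image, box):
    gray = image.mean(axis=2)
    return [gray > t for t in (0.1, 0.5, 0.95)]   # three candidate masks

def toy_select_best(candidates):
    non_empty = [m for m in candidates if m.any()]
    return min(non_empty, key=lambda m: int(m.sum()))

# Bright 4x4 object on a dim background.
image = np.full((8, 8, 3), 0.3)
image[2:6, 2:6] = 0.9
final, best = iterative_segmentation(image, toy_segmenter, toy_select_best)
print(int(best.sum()))  # number of foreground pixels in the final mask
```

In this toy case the mask already isolates the 4x4 object after the first round and stays unchanged in later rounds, which mirrors the convergence behavior the text describes.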
Best Mask Selection
The selection of the optimal mask, $m^*$, from the set of candidates $M_l = \{m_1, m_2, m_3\}$ generated at each iteration $l$ is formulated as an optimization problem. We aim to find the mask that maximizes a comprehensive scoring function, $S(m)$:

$$m^* = \arg\max_{m \in M_l} S(m)$$

The scoring function evaluates the plausibility of a given mask by balancing a set of positive scores, $S_p(m)$, against negative penalties, $P_n(m)$:

$$S(m) = S_p(m) - P_n(m)$$
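The argmax over candidates can be illustrated with a small sketch. The individual terms of $S_p$ and $P_n$ are not given in this section, so the two terms below (`area_score` as a positive score, `border_penalty` as a penalty for touching the image border) are purely hypothetical examples chosen to make the selection rule concrete.

```python
import numpy as np

def area_score(mask):
    """Hypothetical positive term S_p: fraction of pixels covered by the mask."""
    return float(mask.mean())

def border_penalty(mask):
    """Hypothetical penalty P_n: fraction of image-border pixels inside the mask."""
    border = np.concatenate([mask[0], mask[-1], mask[:, 0], mask[:, -1]])
    return float(border.mean())

def score(mask):
    return area_score(mask) - border_penalty(mask)   # S(m) = S_p(m) - P_n(m)

def select_best(candidates):
    return max(candidates, key=score)                # m* = argmax_m S(m)

# Three candidates: whole image, a centered object, and an empty mask.
full = np.ones((6, 6), dtype=bool)
center = np.zeros((6, 6), dtype=bool)
center[1:5, 1:5] = True
empty = np.zeros((6, 6), dtype=bool)

best = select_best([full, center, empty])
print(np.array_equal(best, center))
```

With these illustrative terms the whole-image mask is cancelled out by its border penalty and the empty mask scores zero, so the centered candidate wins.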
Mask Refinement
Each candidate mask is scored, and the highest-scoring one is refined with a four-stage pipeline. (1) Apply Gaussian smoothing to the raw mask $m_{\mathrm{raw}}$ and re-binarize at $\tau = 0.5$ to suppress small noise. (2) Perform morphological erosion to remove thin outlines and isolated artifacts. (3) Central contour selection: for each contour $k$, compute

$$S_{\mathrm{cnt}}(k) = \frac{d_k}{d_{\max}} \cdot \omega_{\mathrm{dist}} + \frac{\sqrt{A_k}}{\sqrt{A_{\max}}} \cdot \omega_{\mathrm{area}}$$

where $d_k$ is the Euclidean distance from the contour centroid to the image center and $A_k$ is the contour area; $d_{\max}$ and $A_{\max}$ normalize distance and area, respectively. We use $\omega_{\mathrm{dist}} = 0.6$ and $\omega_{\mathrm{area}} = 0.4$, and retain the contour with the largest $S_{\mathrm{cnt}}(k)$. (4) Apply a final erosion-dilation step to remove residual artifacts and recover structural coherence, yielding $m_{\mathrm{refine}}$. The background-subtracted result is $I_{\mathrm{result}} = I_{\mathrm{Init}} \times m_{\mathrm{refine}}$.
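The contour score in stage (3) can be evaluated by hand for a small case. The sketch below applies the formula literally as written, assuming that $d_{\max}$ and $A_{\max}$ are the maxima over the contours of the current mask (the text does not say how they are obtained). `Contour` and `contour_scores` are illustrative names, not part of the described method.

```python
import math
from dataclasses import dataclass

@dataclass
class Contour:
    cx: float    # centroid x (pixels)
    cy: float    # centroid y (pixels)
    area: float  # contour area (pixels^2)

def contour_scores(contours, width, height, w_dist=0.6, w_area=0.4):
    """S_cnt(k) = d_k/d_max * w_dist + sqrt(A_k)/sqrt(A_max) * w_area."""
    icx, icy = width / 2.0, height / 2.0
    dists = [math.hypot(c.cx - icx, c.cy - icy) for c in contours]
    # Assumption: normalizers are maxima over the given contours.
    d_max = max(dists) or 1.0                      # avoid division by zero
    a_max = max(c.area for c in contours) or 1.0
    return [
        (d / d_max) * w_dist + (math.sqrt(c.area) / math.sqrt(a_max)) * w_area
        for c, d in zip(contours, dists)
    ]

# Two contours in a 100 x 100 image: one centered and large, one offset and small.
contours = [Contour(50, 50, 400), Contour(80, 50, 100)]
scores = contour_scores(contours, width=100, height=100)
best_index = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_index)
```

With these toy numbers the centered contour scores about 0.4 and the offset one about 0.8, because the distance term as written grows with $d_k$. A variant that rewards centrality would use $1 - d_k/d_{\max}$ instead; that is an assumption, not something the text states.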

Qualitative Comparison

Qualitative comparison with DG-Mesh and Dynamic-2DGS on DG-Mesh and D-NeRF datasets.

Ablation of iteration count (1 vs 5) on Glass Case, Glasses Frame Manufacturing Jig, and Korea Earthenware: progressive suppression of background noise and finer surface recovery.

Ablation Study

Effectiveness of MCDM

Ablation on MCDM: baseline vs. multi-canonical deformation across an extended temporal sequence.

Effectiveness of SAAD

Ablation on SAAD: isotropic densification vs. surface-aligned anisotropic densification.

BibTeX

@article{TODO,
  author  = {TODO},
  title   = {Toward Accurate Cultural Asset Digitization: Analyzing on Object-centric 3D Reconstruction},
  journal = {TODO},
  year    = {2026}
}