Center-based 3D Object Detection and Tracking (Lab Report Sample)

Instructions:

This is a graduate student assignment to reproduce CenterPoint and MVP and interpret key parts of the papers. The assignment consists of three parts: interpretation of the papers, resolution of important bugs in the code, and experimental results. The author received a positive review from the professor.

Content:

Revision history:

Version | Date     | Description
1.0     | 2022-3-2 | Added the CenterPoint section
1.1     | 2022-3-2 | Added the MVP section

Paper interpretation
Center-based 3D Object Detection and Tracking
The authors point out that the original axis-aligned rectangular anchor cannot faithfully represent an object under rotation; 2D object detection does not encounter this problem, but 3D object detection must face it. CenterPoint therefore formulates object detection as keypoint detection plus attribute estimation. The keypoint detection task already covers localization and classification: the authors define a K-channel heatmap output, where K is the number of categories and the heatmap's height and width match the BEV grid, and the training target is a Gaussian distribution centered on each object. This avoids the hard binary split into positives and negatives and enlarges the supervision region around positives. For attribute regression, the network separately regresses a fine location refinement, the relative height, the 3D size, and the yaw rotation angle.
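To make the heatmap supervision concrete, the following is a minimal sketch (not the authors' code) of splatting a CenterNet-style Gaussian target onto a K-channel BEV heatmap; the function name, radius choice, and grid sizes are illustrative assumptions.

import numpy as np

def draw_gaussian(heatmap, center, radius):
    """Splat a 2D Gaussian (peak 1 at the object center) onto one heatmap
    channel, keeping the element-wise maximum with what is already there
    so nearby objects do not erase each other."""
    diameter = 2 * radius + 1
    sigma = diameter / 6.0
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    gaussian = np.exp(-(x * x + y * y) / (2 * sigma * sigma))

    cx, cy = int(center[0]), int(center[1])
    h, w = heatmap.shape
    # Clip the Gaussian patch to the heatmap boundary.
    left, right = min(cx, radius), min(w - cx, radius + 1)
    top, bottom = min(cy, radius), min(h - cy, radius + 1)
    patch = heatmap[cy - top:cy + bottom, cx - left:cx + right]
    g = gaussian[radius - top:radius + bottom, radius - left:radius + right]
    np.maximum(patch, g, out=patch)

# K-channel heatmap over the BEV grid: one channel per object category.
K, H, W = 10, 128, 128
heatmap = np.zeros((K, H, W), dtype=np.float32)
draw_gaussian(heatmap[3], center=(64, 40), radius=4)  # an object of class 3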
From a CSDN blog, typical anchor-free object detectors fall into:
* center-based representation: e.g., CenterPoint
* point-based representation: e.g., PointRCNN, 3DSSD
* pillar-based representation: e.g., POD
CenterNet vs. CenterPoint. Although the overall idea is similar to CenterNet, CenterPoint has characteristics unique to a 3D detector:
1. In 3D detection, the backbone network needs to learn both the rotational invariance and the rotational equivariance of the target. To help the network capture these properties, the authors add a deformable convolution to each of the center-point prediction branch and the regression branch: the center-point prediction branch learns rotation invariance, while the regression branch learns rotation equivariance.
2. Considering the rotational invariance of the network output, the authors choose a circular pooling region rather than the square region used in CenterNet. Specifically, in the bird's-eye view, an object is regarded as positive only if there is no center with higher confidence within a radius r of its center; the authors call this method Circular NMS (a sketch appears after this list). Circular NMS has the same suppression effect as 3D-IoU-based NMS but is faster.
3. Even with the above design, the detector still does not achieve perfect rotational invariance and equivariance. The authors therefore construct a simple ensemble of four rotated and mirrored copies of the input point cloud, feed each copy into CenterPoint to produce a heatmap and regression results, and then simply average these results.
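Below is a minimal sketch of Circular NMS as paraphrased in point 2 above, not the official implementation: detections are visited in descending confidence order, and a candidate is kept only if no already-kept center lies within radius r of it.

import numpy as np

def circular_nms(centers, scores, radius):
    """centers: (N, 2) BEV object centers; scores: (N,) confidences.
    Keep a detection only if no higher-scoring kept detection has its
    center within `radius` of this detection's center."""
    order = np.argsort(-scores)          # highest confidence first
    keep = []
    for i in order:
        ok = True
        for j in keep:
            if np.linalg.norm(centers[i] - centers[j]) < radius:
                ok = False               # suppressed by a stronger nearby center
                break
        if ok:
            keep.append(i)
    return np.array(keep, dtype=np.int64)

# Example: the center 0.36 m from a stronger one is suppressed.
centers = np.array([[0.0, 0.0], [0.3, 0.2], [5.0, 5.0]])
scores = np.array([0.9, 0.8, 0.7])
print(circular_nms(centers, scores, radius=1.0))   # -> [0 2]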
Multimodal Virtual Point 3D Detection
The generation of virtual points is divided into two stages: the first stage completes the registration of the point cloud and the RGB image, and the second stage generates a dense point cloud (2D point + depth + feature).
MVP consists of a 2D instance segmenter, a 3D detector, and the mapping between the two. The image first enters the 2D instance segmenter to obtain instance masks. The RGB image and the LiDAR point cloud are then aligned: after temporal calibration and coordinate transformation to remove calibration noise, the point cloud is transformed into the vehicle reference frame, then passes through the temporal transformation T(t1 -> t2) (t1 is the point cloud acquisition time, t2 is the RGB acquisition time), and finally is projected from the vehicle reference frame into the RGB sensor reference frame. Subsequently, the transformed point cloud is segmented per instance in the perspective view.
After the above calculations, we obtain the point cloud and the RGB image after spatial and temporal registration, as well as the instance masks and semantic features (from the 2D instance segmenter).
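The registration chain above can be sketched with homogeneous 4x4 transforms; the function names, matrix names, and intrinsics K below are placeholders for illustration, not MVP's actual API.

import numpy as np

def transform(points, T):
    """Apply a 4x4 homogeneous transform T to an (N, 3) point cloud."""
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    return (homo @ T.T)[:, :3]

def lidar_to_image(points_lidar, T_lidar_to_ego_t1, T_ego_t1_to_ego_t2,
                   T_ego_to_cam, K):
    """Chain: LiDAR frame -> vehicle frame at t1 -> vehicle frame at t2
    (the RGB timestamp) -> camera frame, then pinhole projection with
    intrinsics K. Returns (N, 2) pixel coordinates and (N,) depths; the
    caller should mask out points with non-positive depth."""
    p = transform(points_lidar, T_lidar_to_ego_t1)   # to vehicle reference frame
    p = transform(p, T_ego_t1_to_ego_t2)             # temporal transform T(t1 -> t2)
    p = transform(p, T_ego_to_cam)                   # into the RGB sensor frame
    depth = p[:, 2]
    uv = p @ K.T
    uv = uv[:, :2] / np.clip(depth[:, None], 1e-6, None)
    return uv, depth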
Next, we sample each instance to obtain a set of 2D points, then for each sampled point retrieve the nearest measured point within the same instance and assign that point's depth to the new virtual point. Finally, we save the (2D point + depth + instance semantic feature) tuple into the collection as a virtual point.
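A minimal sketch of this nearest-neighbor depth assignment, under simplifying assumptions (uniform sampling inside the mask, a brute-force nearest-neighbor search in the image plane); the paper's actual implementation may differ.

import numpy as np

def make_virtual_points(mask, lidar_uv, lidar_depth, seg_feature,
                        num_samples=50, rng=np.random.default_rng(0)):
    """mask: (H, W) bool instance mask; lidar_uv: (M, 2) pixel coordinates
    of measured LiDAR points inside this instance; lidar_depth: (M,) their
    depths; seg_feature: (C,) the instance's semantic feature.
    Returns virtual points as (num_samples, 3 + C): (u, v, depth, feature)."""
    ys, xs = np.nonzero(mask)
    idx = rng.choice(len(xs), size=num_samples, replace=True)
    sampled = np.stack([xs[idx], ys[idx]], axis=1).astype(np.float32)  # (S, 2)

    # For each sampled 2D point, find the nearest measured point within
    # the same instance and copy its depth.
    d2 = ((sampled[:, None, :] - lidar_uv[None, :, :]) ** 2).sum(-1)   # (S, M)
    nearest = d2.argmin(axis=1)
    depth = lidar_depth[nearest][:, None]                              # (S, 1)

    feat = np.broadcast_to(seg_feature, (num_samples, len(seg_feature)))
    return np.concatenate([sampled, depth, feat], axis=1)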
Finally, the 3D object detection network infers target boxes and class probabilities.
Training details:
For 2D detection, we add cascade RoI heads [3] for instance segmentation following Zhou et al. [73]. We train the detector on the nuScenes dataset using the SGD optimizer with a batch size of 16 and a learning rate of 0.02 for 90,000 iterations.
For 3D detection, we train the model for 20 epochs with the AdamW [34] optimizer using the one-cycle policy [16], with a max learning rate of 3e-3 following [66]. Training takes 2.5 days on 4 V100 GPUs with a batch size of 16 (4 frames per GPU).
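For reference, this schedule maps onto standard PyTorch components; the model and step counts below are placeholders, so this is a hedged sketch rather than the repository's exact configuration.

import torch

# Placeholder model standing in for the CenterPoint/MVP detector.
model = torch.nn.Linear(4, 4)
steps_per_epoch, epochs = 1000, 20   # steps_per_epoch depends on dataset size

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-3, total_steps=steps_per_epoch * epochs)
# In the training loop: loss.backward(); optimizer.step(); scheduler.step()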
Experimental results:
The comparison with CenterPoint shows that MVP gains 4-9 mAP points. The detection mAP for Pedestrian reaches 89.1, exceeding the detection accuracy for Car. However, the mAP for the Construction Vehicle class (a truck category) remains low at 26.1 and still needs improvement.
Code debugging
Important bugs and their workarounds
An access error occurred when running spconv.
The official spconv installation guide: GitHub - traveller59/spconv: Spatial Sparse Convolution Library
* Followed MMDetection3D's install tutorial; failed.
* Followed CenterPoint's install tutorial; failed.
* Followed the solution given in a GitHub issue; failed.

...