世界坐标系投影到像素坐标系【python实验】-CSDN博客

阿里云国内75折回扣微信号：monov8

阿里云国际，腾讯云国际，低至75折。AWS 93折免费开户实名账号代冲值优惠多多微信号：monov8 飞机：@monov6

对于三维视觉而言需要清晰了解世界坐标系和像素坐标系的对应关系。本文用python做实验。
相机内外参数的数学推导可以看我之前的博客《【AI数学】相机成像之内参数
》《【AI数学】相机成像之外参数》。

实验开始
首先明确相机的内外参数

intri = [[1111.0, 0.0, 400.0],
         [0.0, 1111.0, 400.0],
         [0.0, 0.0, 1.0]]
         
extri = [[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
        [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
        [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
        [0.0, 0.0, 0.0, 1.0]]

我们在世界坐标系上找一个点比如p1=[0, 0, 0]我们将其投影到像素坐标系然后观察其投影坐标。

首先我们将其转换到相机坐标系:我们仅用外参数extri就可以得到相机坐标系

import numpy as np

extri = np.array(
        [[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
        [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
        [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
        [0.0, 0.0, 0.0, 1.0]])
p1 = np.array([0, 0, 0])

R = extri[:3, :3] # rotation
T = extri[:3, -1] # trans
cam_p1 = np.dot(p1 - T, R)

print(cam_p1)

然后我们将相机坐标系转换到像素坐标系

intri = [[1111.0, 0.0, 400.0],
         [0.0, 1111.0, 400.0],
         [0.0, 0.0, 1.0]]
screen_p1 = np.array([-cam_p1[0] * intri[0][0] / cam_p1[2] + intri[0][2],
					  cam_p1[1] * intri[1][1] / cam_p1[2] + intri[1][2]
					])
# Formula:				
#                          x                                y
#            x_im = f_x * --- + offset_x      y_im = f_y * --- + offset_y
#                          z                                z
       

print(screen_p1)
# [400.00057322 400.00211168]

由此可知世界坐标系中的原点[0, 0, 0]经过这一套相机内外参数矩阵变换后投影到屏幕的位置是[400, 400]。我们的图像大小刚好是[800, 800]恰好是中间位置。

上面的实验改成面向对象的写法更方便调用

import numpy as np

class CamTrans:
    def __init__(self, intri, extri, h=None, w=None):
        self.intri = intri
        self.extri = extri
        self.h = int(2 * intri[1][2]) if h is None else h
        self.w = int(2 * intri[0][2]) if w is None else w

        self.R = extri[:3, :3]
        self.T = extri[:3, -1]
        self.focal_x = intri[0][0]
        self.focal_y = intri[1][1]
        self.offset_x = intri[0][2]
        self.offset_y = intri[1][2]

    def world2cam(self, p):
        return np.dot(p - self.T, self.R)

    def cam2screen(self, p):
        """
                          x                                y
            x_im = f_x * --- + offset_x      y_im = f_y * --- + offset_y
                          z                                z
        """
        x, y, z = p
        return [-x * self.focal_x / z + self.offset_x, y * self.focal_y / z + self.offset_y]

    def world2screen(self, p):
        return self.cam2screen(self.world2cam(p))


if __name__ == "__main__":
    p = [0, 0, 1]
    intri = [[1111.0, 0.0, 400.0],
         [0.0, 1111.0, 400.0],
         [0.0, 0.0, 1.0]]
    extri = np.array(
        [[-9.9990e-01,  4.1922e-03, -1.3346e-02, -5.3798e-02],
        [-1.3989e-02, -2.9966e-01,  9.5394e-01,  3.8455e+00],
        [-4.6566e-10,  9.5404e-01,  2.9969e-01,  1.2081e+00],
        [0.0, 0.0, 0.0, 1.0]])
    
    # another camera
    extri1 = np.array([[-3.0373e-01, -8.6047e-01,  4.0907e-01,  1.6490e+00],
        [ 9.5276e-01, -2.7431e-01,  1.3041e-01,  5.2569e-01],
        [-7.4506e-09,  4.2936e-01,  9.0313e-01,  3.6407e+00],
        [0.0, 0.0, 0.0, 1.0]])

    ct = CamTrans(intri, extri)
    print(ct.world2screen(p))
    
    ct1 = CamTrans(intri, extri1)
    print(ct1.world2screen(p))