Most 3D reconstruction tutorials out there go too deep into the theory, too deep into the code, or a combination of both.

Worse yet, they use specialized datasets (like Tsukuba), which is a bit of a problem when you try the algorithms on anything outside those datasets, because the parameters have to be tuned all over again.

The cool thing about 3D reconstruction (and computer vision in general) is to reconstruct the world around you, not somebody else’s world (or dataset).

Simply put, this tutorial will take you from scratch to point cloud USING YOUR OWN PHONE CAMERA and pictures. So without further ado, let’s get started.


This tutorial is divided into 3 parts.

  • Part 1 (Theory and requirements): Covers a very brief overview of the steps required for stereo 3D reconstruction. You are here.
  • Part 2 (Camera calibration): Covers the basics of calibrating your own camera, with code.
  • Part 3 (Disparity map and point cloud): Covers the basics of reconstructing a scene, with code, from pictures taken with the previously calibrated camera.

The steps required for 3D reconstruction.

There are many ways to reconstruct the world around you, but it all comes down to getting an actual depth map.

A depth map is a picture where every pixel holds depth information (instead of color information). It is normally represented as a grayscale picture.
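To make the grayscale idea concrete, here is a minimal sketch in plain Python. The function name `depth_to_grayscale`, the tiny depth grid, and the "near is bright" convention are assumptions for illustration only; real depth maps are usually handled as image arrays.

```python
def depth_to_grayscale(depth_rows):
    """Scale a 2D grid of depth values to 0-255 gray levels (near = bright)."""
    flat = [d for row in depth_rows for d in row]
    nearest, farthest = min(flat), max(flat)
    span = (farthest - nearest) or 1.0  # avoid dividing by zero on a flat scene
    return [
        [round(255 * (farthest - d) / span) for d in row]
        for row in depth_rows
    ]


# Hypothetical 2x3 depth grid, in meters.
depth = [[1.0, 2.0, 3.0],
         [1.0, 1.5, 3.0]]
gray = depth_to_grayscale(depth)
```

The nearest pixels end up white (255) and the farthest ones black (0); flipping the convention is just a matter of using `d - nearest` instead.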

Depth map from the Tsukuba dataset. Courtesy of OpenCV

As mentioned before, there are different ways to obtain a depth map, and these depend on the sensor being used. The sensor could be a simple camera (called an RGB camera from now on in this text), but it is possible to use others like LiDAR, infrared, or a combination.

The type of sensor will determine the accuracy of the depth map. In terms of accuracy it normally goes like this: LiDAR > Infrared > Cameras. Depth maps can also be colorized to better visualize depth.

A selfie of me taking a depth selfie using the Kinect camera.

Depending on the kind of sensor used, there are more or fewer steps required to actually get the depth map. The Kinect camera, for example, uses infrared sensors combined with RGB cameras, and as such you get a depth map right away (because it is the information processed by the infrared sensor).

But what if you don’t have anything else but your phone camera? In this case you need to do stereo reconstruction. Stereo reconstruction uses the same principle your brain and eyes use to actually understand depth.

The gist of it consists of looking at the same scene from two different angles, looking for the same thing in both pictures, and inferring depth from the difference in position. This is called stereo matching.
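As a rough illustration of "depth from the difference in position": for an ideal, rectified pair of pictures, the standard pinhole relation is Z = f * B / d, where d is the shift in pixels (the disparity), f the focal length in pixels, and B the distance between the two camera positions. The function name and all numbers below are made up for the sketch.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth (meters) of a point seen in a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero means the point is at infinity")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical setup: focal length of 1000 px, camera moved 0.1 m between shots.
near = depth_from_disparity(disparity_px=50, focal_length_px=1000, baseline_m=0.1)
far = depth_from_disparity(disparity_px=10, focal_length_px=1000, baseline_m=0.1)
print(near, far)
```

Things that shift a lot between the two pictures are close, and things that barely move are far away, which is the same effect you get when you alternate closing each eye.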

In order to do stereo matching, it is important that both pictures have the exact same characteristics. Put differently, neither picture should have any distortion. This is a problem because the lens in most cameras causes distortion. This means that in order to accurately do stereo matching, one needs to know the optical centers and focal length of the camera.
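To see why those numbers matter, here is a small sketch of the usual pinhole model with a simple radial distortion term (the same general form OpenCV uses, truncated to two coefficients). The function `project` and every value in it are hypothetical and only meant to show where the focal length, the optical center, and the distortion enter.

```python
def project(point_3d, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project a 3D point (camera coordinates) to pixel coordinates.

    fx, fy: focal length in pixels; cx, cy: optical center in pixels.
    k1, k2: radial distortion coefficients (0 means an ideal lens).
    """
    X, Y, Z = point_3d
    x, y = X / Z, Y / Z              # normalized image coordinates
    r2 = x * x + y * y               # squared distance from the optical axis
    radial = 1 + k1 * r2 + k2 * r2 * r2
    return fx * x * radial + cx, fy * y * radial + cy


# Hypothetical intrinsics for a 1920x1080 image and a mild barrel distortion.
intrinsics = dict(fx=1500.0, fy=1500.0, cx=960.0, cy=540.0)
point = (0.5, 0.25, 2.0)
ideal = project(point, **intrinsics)
distorted = project(point, **intrinsics, k1=-0.2)
```

With a negative `k1` the distorted pixel lands closer to the image center than the ideal one, and the error grows toward the edges of the picture. Undistorting is essentially inverting this mapping, which is only possible once the parameters are known.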

In most cases, this information will be unknown (especially for your phone camera) and this is why stereo 3D reconstruction requires the following steps:

  1. Camera calibration: Use a bunch of images to infer the focal length and optical centers of your camera.
  2. Undistort images: Get rid of lens distortion in the pictures used for reconstruction.
  3. Feature matching: Look for similar features between both pictures and build a depth map.
  4. Reproject points: Use the depth map to reproject pixels into 3D space.
  5. Build point cloud: Generate a new file that contains points in 3D space for visualization.
  6. Build mesh: Get an actual 3D model (outside the scope of this tutorial, but coming soon in a different tutorial).

Step 1 only needs to be executed once unless you change cameras. Steps 2–5 are required every time you take a new pair of pictures…and that is pretty much it.
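Steps 4 and 5 can be sketched in a few lines of plain Python, assuming you already have a depth map and the calibration values. The helper names `reproject` and `to_ply`, the toy depth map, and the intrinsics are all hypothetical; ASCII PLY is used here only because it is a simple text format that most point cloud viewers can open.

```python
def reproject(depth_rows, fx, fy, cx, cy):
    """Turn a depth map (meters) into a list of (X, Y, Z) points."""
    points = []
    for v, row in enumerate(depth_rows):
        for u, z in enumerate(row):
            if z <= 0:               # treat non-positive depth as "no match"
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_ply(points):
    """Serialize points as an ASCII PLY document."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    body = [f"{x:.4f} {y:.4f} {z:.4f}" for x, y, z in points]
    return "\n".join(header + body) + "\n"


# Hypothetical 2x2 depth map; 0 marks a pixel with no depth estimate.
depth = [[2.0, 0.0],
         [2.0, 4.0]]
cloud = reproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
ply_text = to_ply(cloud)
```

Writing `ply_text` to a file with a `.ply` extension should be enough to open the result in a point cloud viewer.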

The actual mathematical theory (the why) is much more complicated, but it will be easier to tackle after this tutorial, since by the end of it you will have a working example that you can experiment with.

In the next part we will explore how to actually calibrate a phone camera, along with some best practices for calibration. See you then.

Source: Becoming Human
