The RoomPlan paper is a fascinating read. The result may seem simple, but it is the output of carefully designed and optimized steps that open the door to improved semantic understanding of physical spaces.
RoomPlan uses the camera and LiDAR sensor to build a 3D floor plan of a room, including its dimensions and the types of furniture in it.
The system is made up of two parts:
3D room layout estimation (RLE)
3D object-detection pipeline (3DOD)
Room Layout Estimation
RoomPlan first detects the walls and openings, and the result is then used to classify each opening as a door or a window.
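The paper relies on learned models for this step, but the two-stage idea (detect openings first, classify them second) can be illustrated with a toy geometric heuristic. Everything below — the `Opening` type, the floor-tolerance threshold, and the rule itself — is a hypothetical sketch, not the paper's actual classifier:

```python
from dataclasses import dataclass

@dataclass
class Opening:
    """A detected opening in a wall, with heights in meters above the floor."""
    bottom: float
    top: float

def classify_opening(o: Opening, floor_tol: float = 0.15) -> str:
    # Toy rule: an opening that reaches (nearly) down to the floor is a door;
    # one that starts well above the floor is a window.
    return "door" if o.bottom <= floor_tol else "window"

print(classify_opening(Opening(bottom=0.0, top=2.0)))  # door
print(classify_opening(Opening(bottom=0.9, top=2.1)))  # window
```

In practice a learned classifier would also use appearance and context, but the split into detection followed by classification is the same.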
3D Object Detection
3DOD runs a three-step process: it first detects and categorizes objects locally, then refines them with global context for more information, and finally applies a box-fusion step to produce the "dollhouse" result, a 3D representation of the whole room.
Assembling an L-shaped sofa from two detected sofa segments is one example of an object-to-object relationship handled during the fusion step. Relationships with walls are also used: if an object does not intersect any wall, it is aligned with the closest one.
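The wall-alignment rule can be sketched as follows. The wall representation (a center point plus a yaw angle) and the snap-to-nearest-wall function are assumptions made for illustration, not the paper's implementation:

```python
import math

def align_to_closest_wall(obj_pos, obj_yaw, walls):
    """Snap an object's yaw to the orientation of the nearest wall.

    obj_pos: (x, y) position of the object in the floor plane.
    obj_yaw: the object's detected yaw in radians (ignored after snapping).
    walls:   list of ((x, y), yaw_radians) wall centers and orientations.
    """
    nearest = min(walls, key=lambda w: math.dist(obj_pos, w[0]))
    return nearest[1]

walls = [((0.0, 0.0), 0.0), ((5.0, 0.0), math.pi / 2)]
# A slightly rotated cabinet near the second wall snaps to that wall's yaw:
print(align_to_closest_wall((4.5, 0.2), 0.3, walls))
```

This kind of snapping removes small rotational noise from the detector and makes furniture sit parallel to the room's walls in the final dollhouse view.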
🪴🪑📦 The chair category seems to be the hardest to identify (83% precision), due to heavy occlusion and crowded arrangements.