
RealityKit Outline Selection

Cristian Díaz

Apr 2, 2026 — 7 min read

TLDR: MeshResource.contents, three passes, one PostProcessEffect

Rasterize the selected mesh into an offscreen R8Unorm silhouette mask
Dilate the mask into a pixel-wide edge ring, suppressing the interior
Composite the outline color over the source frame

One of the most commonly requested affordances when working in 3D is object selection: a way to denote which of the entities on screen is currently in focus. Since the introduction of RealityKit, this has been surprisingly difficult to achieve. Here are some of the techniques people commonly reach for.

Bounding box

The simplest approach. RealityKit can draw an axis-aligned bounding box around any entity with no mesh access required.

A 3D model of a green industrial mechanical device, likely a submersible pump or compressor, displayed in Gantry: Scene Converter app. The device features a dark green metal frame with cylindrical components, black cables, and metallic hardware visible inside. An orange bounding box indicates the model's dimensions, with a green vertical axis marker showing "10cm" scale. The model sits on a warm-toned gradient grid floor with red and blue axis lines, and a small coordinate gizmo appears in the bottom-left corner.

On simple geometry it works well enough as a signal that something is selected. On anything with an irregular silhouette, the box either clips the mesh or floats far away from it. It conveys location, not shape.

The process can become more sophisticated if you submerge yourself in the fascinating world of bounding box orientation.

Bounding volumes: sphere, axis-aligned bounding box (AABB), oriented bounding box (OBB), eight-direction discrete orientation polytope (8-DOP), convex hull (Ericson, 2004)

Inverted hull

This is a classic trick in Digital Content Creation (DCC). Duplicate the mesh, flip face winding so only backfaces render, apply an unlit solid color, and scale up slightly. The back-face shell extends slightly beyond the original mesh, creating an outline ring around it.

A 3D model of a Nike Air Force 1 sneaker with a black and grey camouflage pattern and neon green lightning bolt accents, displayed in Gantry: Scene Converter. The shoe features white laces, a grey sole with "AIR" branding, and a cyan outline indicating selection. A green vertical axis marker shows "10cm" scale above the shoe. The model rests on a white grid floor with red and blue axis lines, and a coordinate gizmo (showing Y, X, and Z axes) appears in the bottom-left corner.

It works well on hard-surface models with convex geometry. It breaks on:

Concavities—the scaled hull bleeds through concave areas, producing smear artifacts instead of a clean edge
Tiny models—a fixed scale factor like 1.015 produces near-zero screen-space expansion. The outline disappears
Huge models—the same factor produces a meters-thick shell
Non-uniform scale—uniform hull expansion doesn't follow a stretched mesh correctly; outline is thicker on some axes
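To put numbers on the tiny and huge cases, here is a small arithmetic sketch. The helper name `hullThickness` is hypothetical, and it assumes the hull is scaled uniformly about the model's center, so each side grows by half of the total change in extent.

```swift
/// Per-side thickness, in meters, that a uniformly scaled hull adds to a model
/// whose largest extent is `extent` meters. Assumes the scale pivot is the
/// model's center, so each face moves outward by half of the total growth.
func hullThickness(extent: Double, scale: Double = 1.015) -> Double {
    return extent * (scale - 1.0) / 2.0
}

let tinyShell = hullThickness(extent: 0.01)   // 1 cm model: roughly 0.075 mm per side
let hugeShell = hullThickness(extent: 500.0)  // 500 m model: roughly 3.75 m per side
print(tinyShell, hugeShell)
```

The thickness is proportional to the model's size in world units, which is why no single scale factor holds up across models of very different sizes.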

Compensating for camera distance can partially mitigate the scale issue. But model size and normals are separate problems the technique has no good answer for.

A 3D model of a weathered, retro-futuristic robot character named "anim_sad_cutebot_01," displayed in the Preflight QuickLook extension. The robot has a cream-colored cylindrical body with red accents on its head and feet, visible wear and scratches, and a single green circular eye. Its arms hang downward in a sad pose. A cyan silhouette shadow appears behind the model, and a green vertical axis marker shows "50cm" scale. The dark background features a subtle grid floor with red and blue axis lines and a coordinate gizmo in the bottom-left. The window title shows "anim_sad_cutebot_01_exported2.usdz" with an "Open with Xcode-26.3.0" option.

Post-process outline

This is the approach we want. It never touches the scene hierarchy, and it operates at screen resolution regardless of mesh complexity, model size, or poly count, so the outline is always the same pixel width. Apple uses it in some viewports, such as the one in Reality Composer Pro (RCP), but relies on private APIs, so it can't be replicated with public ones yet.

A screenshot of Apple's Reality Composer Pro editor showing a 3D model of the Apple Vision Pro headset with the dual-knit band. The headset is displayed in the central viewport with an orange selection outline, featuring the characteristic curved glass front, silver aluminum frame, grey fabric Light Seal, and grey knitted headband with adjustment dial. The left panel shows the Scene hierarchy with "apple_vision_pro_dual_knit" under Root. The bottom Project Browser displays two files: "Scene.usda" and "apple-vision-pro-dual-knit.usdz" (selected). The right inspector panel shows Transform, References, and Material Bindings properties for the selected object.

🧚

Or that's what I thought until now. It turns out that with PostProcessEffect and some fairy dust, we can get a convincingly similar outline.

How It Works

// Available when SwiftUI is imported with RealityKit
/// A struct for enabling or disabling post processing effects for all content a reality view contains.
@available(iOS 26.0, macOS 26.0, tvOS 26.0, *)
@available(visionOS, unavailable)
@available(watchOS, unavailable)

PostProcessEffect is a RealityKit protocol (macOS 26+, iOS 26+) that lets you inject Metal work between RealityKit's rendered frame and the display. You receive:

sourceColorTexture — the frame as rendered by RealityKit
targetColorTexture — where you write the final output
commandBuffer — ready to encode into
projection — the current projection matrix

The outline is three passes encoded into that command buffer before the frame is presented.

Pass 1: Silhouette Mask

Render the selected mesh's geometry into a single-channel R8Unorm texture. Every pixel covered by the mesh becomes white. The result is a binary silhouette at screen resolution.

Export from an Xcode GPU capture: the mask pass, with color only on the selected part of the model (the horn of a gramophone in this case).

The vertex shader needs only two things: packed positions and an MVP matrix.

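#include <metal_stdlib>
using namespace metal;

// Assumed layout: a single model-view-projection matrix, matching uniforms.mvp below.
struct OutlineMaskUniforms {
    float4x4 mvp;
};

vertex float4 outlineMaskVertex(
    const device packed_float3* positions [[buffer(0)]],
    constant OutlineMaskUniforms& uniforms [[buffer(1)]],
    uint vid [[vertex_id]]
) {
    // Object-space position to clip space; a float4 return is the vertex position.
    return uniforms.mvp * float4(float3(positions[vid]), 1.0);
}

fragment float4 outlineMaskFragment(float4 position [[position]]) {
    // Every covered pixel is white; only the red channel survives in R8Unorm.
    return float4(1.0, 1.0, 1.0, 1.0);
}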

No normals. No UVs. No materials. The mask pass only answers one question: which screen pixels are covered by this geometry?

A depth attachment is included so geometry occluded by other objects does not appear in the mask.

Pass 2: Dilation

Expand the mask outward by radius pixels. Pixels outside the mesh but within radius of the silhouette edge become 1.0. Interior pixels are explicitly suppressed to 0.0. The result is a ring at the silhouette boundary only.

Export from an Xcode GPU capture: the dilated outline of the previous pass's mask.

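kernel void outlineDilate(
    texture2d<float, access::read>  maskTexture [[texture(0)]],
    texture2d<float, access::write> edgeTexture [[texture(1)]],
    constant int32_t&               radius      [[buffer(0)]],
    uint2 gid [[thread_position_in_grid]]
) {
    const int width  = int(maskTexture.get_width());
    const int height = int(maskTexture.get_height());
    // Skip threads that fall outside the texture when the grid is padded.
    if (int(gid.x) >= width || int(gid.y) >= height) return;
    if (maskTexture.read(gid).r > 0.5) {
        edgeTexture.write(float4(0.0), gid);  // interior: suppress
        return;
    }
    bool nearMask = false;
    for (int dy = -radius; dy <= radius && !nearMask; ++dy)
        for (int dx = -radius; dx <= radius && !nearMask; ++dx) {
            if (dx*dx + dy*dy > radius*radius) continue;  // circular footprint
            const int x = int(gid.x) + dx;
            const int y = int(gid.y) + dy;
            // Neighbors past the texture edge are skipped instead of read out of bounds.
            if (x < 0 || y < 0 || x >= width || y >= height) continue;
            if (maskTexture.read(uint2(x, y)).r > 0.5)
                nearMask = true;
        }
    edgeTexture.write(float4(nearMask ? 1.0 : 0.0), gid);
}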
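For checking the ring logic off the GPU, a CPU reference of the same idea can help. This is a sketch rather than the article's code: `dilateRing` is a hypothetical name, the mask is a row-major array of Bool instead of a texture, and neighbors outside the image are simply skipped.

```swift
/// CPU reference for the dilation pass. `mask` is row-major, true = covered.
/// Returns the edge ring: pixels outside the mask that lie within `radius`
/// pixels (circular footprint) of a covered pixel.
func dilateRing(mask: [Bool], width: Int, height: Int, radius: Int) -> [Bool] {
    precondition(mask.count == width * height, "mask size must match dimensions")
    var edge = [Bool](repeating: false, count: mask.count)
    guard radius >= 0 else { return edge }
    for y in 0..<height {
        for x in 0..<width {
            if mask[y * width + x] { continue }  // interior: stays suppressed
            search: for dy in -radius...radius {
                for dx in -radius...radius {
                    if dx * dx + dy * dy > radius * radius { continue }
                    let nx = x + dx
                    let ny = y + dy
                    if nx < 0 || ny < 0 || nx >= width || ny >= height { continue }
                    if mask[ny * width + nx] {
                        edge[y * width + x] = true
                        break search
                    }
                }
            }
        }
    }
    return edge
}

// A single covered pixel in the middle of a 5x5 image.
var mask = [Bool](repeating: false, count: 25)
mask[12] = true
let ring = dilateRing(mask: mask, width: 5, height: 5, radius: 1)
print(ring.filter { $0 }.count)  // the four direct neighbors of the center
```

Like the kernel, this costs on the order of radius squared reads per pixel, which is fine for a thin outline but worth keeping in mind for large radii.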

Pass 3: Composite

Blend the outline color over the source frame wherever the edge ring is set. Write to the target texture.

Export from an Xcode GPU capture: the final composite, with the model's original color plus the generated outline.

kernel void outlineComposite(
    texture2d<float, access::read>  sourceColor  [[texture(0)]],
    texture2d<float, access::read>  edgeMask     [[texture(1)]],
    texture2d<float, access::write> targetColor  [[texture(2)]],
    constant float4&                outlineColor [[buffer(0)]],
    uint2 gid [[thread_position_in_grid]]
) {
    float4 color = sourceColor.read(gid);
    if (edgeMask.read(gid).r > 0.5)
        color = mix(color, float4(outlineColor.rgb, 1.0), outlineColor.a);
    targetColor.write(color, gid);
}
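The blend in that kernel is a plain linear interpolation driven by the outline color's alpha. As a sketch, here is the same arithmetic for one pixel on the CPU; `mix` mirrors Metal's built-in, and `compositePixel` is a hypothetical helper name.

```swift
/// Same arithmetic as Metal's mix(a, b, t): a * (1 - t) + b * t, per channel.
func mix(_ a: SIMD4<Float>, _ b: SIMD4<Float>, _ t: Float) -> SIMD4<Float> {
    return a * (1 - t) + b * t
}

/// CPU equivalent of one composite pixel: outside the ring the source passes
/// through untouched; on the ring the outline is blended in by its alpha.
func compositePixel(source: SIMD4<Float>, edge: Float, outline: SIMD4<Float>) -> SIMD4<Float> {
    guard edge > 0.5 else { return source }
    let opaqueOutline = SIMD4<Float>(outline.x, outline.y, outline.z, 1)
    return mix(source, opaqueOutline, outline.w)
}

let black = SIMD4<Float>(0, 0, 0, 1)
let halfWhite = SIMD4<Float>(1, 1, 1, 0.5)  // white outline at 50% strength
print(compositePixel(source: black, edge: 1, outline: halfWhite))
```

An outline alpha of 1 replaces the source pixel outright, while lower values let the underlying frame show through the ring.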

RealityKit renders the scene first, without any alterations. The outline is integrated into the scene as a screen-space image operation after the initial rendering.

Getting Geometry Out of the Selected Entity

The mask pass requires packed float3 positions, triangle indices, and the world transform of the entity. That is all — nothing else is forwarded to the GPU.

ModelComponent gives access to MeshResource.contents. Each MeshResource.Part exposes typed positions and triangle indices directly:

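// Flatten SIMD3<Float> positions to packed x, y, z floats; triangleIndices is optional on a part.
let packedPositions = part.positions.elements.flatMap { [$0.x, $0.y, $0.z] }
let triangleIndices = part.triangleIndices?.elements ?? []
let indexBytes = triangleIndices.withUnsafeBytes { Array($0) }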

No buffer unwinding, no topology reconstruction. RealityKit provides indexed mesh data access at the CPU level.
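The flattening step matters because `SIMD3<Float>` has a 16-byte stride, while the shader's `packed_float3` expects 12 bytes per vertex. A standalone sketch of that packing, using hand-written sample data rather than a real MeshResource (both helper names are hypothetical):

```swift
/// Flattens SIMD3<Float> positions (16-byte stride) into tightly packed
/// x, y, z floats (12 bytes per vertex), the layout packed_float3 expects.
func packedPositionBytes(_ positions: [SIMD3<Float>]) -> [UInt8] {
    let floats: [Float] = positions.flatMap { [$0.x, $0.y, $0.z] }
    return floats.withUnsafeBytes { Array($0) }
}

/// Raw bytes of the 32-bit triangle indices, ready to copy into an index buffer.
func indexBufferBytes(_ indices: [UInt32]) -> [UInt8] {
    return indices.withUnsafeBytes { Array($0) }
}

// One triangle of sample data standing in for a mesh part.
let samplePositions: [SIMD3<Float>] = [SIMD3(0, 0, 0), SIMD3(1, 0, 0), SIMD3(0, 1, 0)]
let sampleIndices: [UInt32] = [0, 1, 2]

print(MemoryLayout<SIMD3<Float>>.stride)             // stride of the unpacked type
print(packedPositionBytes(samplePositions).count)    // 3 vertices * 12 bytes
print(indexBufferBytes(sampleIndices).count)         // 3 indices * 4 bytes
```

Copying the `SIMD3<Float>` array straight into the vertex buffer would leave 4 bytes of padding per vertex, and the shader would read misaligned positions.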

Projection

The mask only aligns with the rendered object when the transform composition is correct:

mvp = projection * viewMatrix * modelMatrix

modelMatrix — entity world transform from entity.transformMatrix(relativeTo: nil) at extraction time
viewMatrix — camera transform inverse, updated every frame
projection — from PostProcessEffectContext
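The multiplication order above can be sketched without any framework. `Mat4` below is a hypothetical, minimal column-major stand-in for `simd_float4x4`, and the identity projection is a placeholder so the result is easy to follow by hand; a real effect would use the projection supplied by the context.

```swift
/// Minimal column-major 4x4 matrix; a stand-in for simd_float4x4.
struct Mat4 {
    var columns: [SIMD4<Float>]  // exactly four columns

    static let identity = Mat4(columns: [
        SIMD4(1, 0, 0, 0), SIMD4(0, 1, 0, 0), SIMD4(0, 0, 1, 0), SIMD4(0, 0, 0, 1),
    ])

    static func translation(_ t: SIMD3<Float>) -> Mat4 {
        var m = Mat4.identity
        m.columns[3] = SIMD4(t.x, t.y, t.z, 1)
        return m
    }

    /// Matrix * column vector: a weighted sum of the matrix columns.
    static func * (m: Mat4, v: SIMD4<Float>) -> SIMD4<Float> {
        let x = m.columns[0] * v.x
        let y = m.columns[1] * v.y
        let z = m.columns[2] * v.z
        let w = m.columns[3] * v.w
        return x + y + z + w
    }

    /// Matrix * matrix: each column of `rhs` is transformed by `lhs`.
    static func * (lhs: Mat4, rhs: Mat4) -> Mat4 {
        return Mat4(columns: rhs.columns.map { lhs * $0 })
    }
}

let modelMatrix = Mat4.translation(SIMD3(1, 2, 3))   // entity placed at (1, 2, 3)
let viewMatrix = Mat4.translation(SIMD3(0, 0, -5))   // inverse of a camera sitting at z = 5
let projection = Mat4.identity                       // placeholder projection
let mvp = projection * viewMatrix * modelMatrix

// The entity's local origin lands at (1, 2, -2): model first, then view.
print(mvp * SIMD4<Float>(0, 0, 0, 1))
```

Reading right to left, the model matrix is applied to the vertex first and the projection last. If the view matrix is stale or the order is swapped, the mask drifts away from the rendered object.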

When nothing is selected

If nothing is selected, the three passes are skipped entirely, and a blit copy runs instead:

blit.copy(from: context.sourceColorTexture, ..., to: context.targetColorTexture, ...)

No render pass, no compute dispatch, no offscreen textures. The overhead when idle is effectively zero.

The one real cost

The textures are allocated fresh on every postProcess call. Allocating two full-resolution GPU textures per frame adds overhead that a production implementation would eliminate by caching them and only reallocating on viewport resize.
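One way such caching could look is sketched below. `SizeKeyedCache` is a hypothetical helper, kept generic so it runs without a GPU; in the effect, the resource would be an MTLTexture and the `make` closure would call `MTLDevice.makeTexture(descriptor:)`.

```swift
/// Keeps one resource per viewport size and rebuilds it only when the size changes.
/// Generic so the sketch runs anywhere; in the effect, Resource would be an MTLTexture.
struct SizeKeyedCache<Resource> {
    private var width = 0
    private var height = 0
    private var cached: Resource?
    private(set) var allocationCount = 0

    init() {}

    mutating func resource(width: Int, height: Int, make: (Int, Int) -> Resource) -> Resource {
        if let existing = cached, self.width == width, self.height == height {
            return existing  // same viewport size: reuse
        }
        let fresh = make(width, height)  // first use or resize: allocate once
        cached = fresh
        self.width = width
        self.height = height
        allocationCount += 1
        return fresh
    }
}

// Strings stand in for textures here.
var maskCache = SizeKeyedCache<String>()
for _ in 0..<3 {
    _ = maskCache.resource(width: 1920, height: 1080) { w, h in "mask \(w)x\(h)" }
}
print(maskCache.allocationCount)  // one allocation across three frames
```

With one cache for the mask texture and one for the edge texture, steady-state frames would allocate nothing.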

But aside from that, look at this beauty...