
You will understand this question better if you open Xcode, create a new Augmented Reality Project and run that project.

After the project starts running on a device, you will see the image from the rear camera showing your room.

After 3 or 4 seconds, a cube appears.

My questions are:

  1. What was the app doing before the cube appeared? I mean, I suppose the app was looking for tracking points in the scene so it could anchor the cube, right?

  2. If this is true, what elements is the app looking for?

  3. Suppose I am not satisfied with the point where the cube appeared. Is there any function I can trigger with a tap on the screen, so that tracking searches for new points near the location I tapped?

I know my question is generic, so please just point me in the right direction.


2 Answers


ARKit and RealityKit stages

There are three stages in ARKit and RealityKit when you launch an AR app:

  • Tracking
  • Scene Understanding
  • Rendering

Each stage may considerably increase the time required for model placement (+1...+4 seconds, depending on the device). Let's talk about each stage.

Tracking

This is the initial state of your AR app. Here the iPhone fuses visual data coming through the RGB rear camera at 60 fps with transform data coming from the IMU sensors (accelerometer, gyroscope and compass) at 1000 fps. Automatically generated Feature Points help ARKit and RealityKit track the surrounding environment and build a tracking map (whether it's a World Tracking map or, for example, a Face Tracking map). Feature Points are spontaneously generated on the high-contrast edges of real-world objects and textures, in well-lit environments. If you already have a previously saved World Map, it reduces the time needed to place a model into a scene. You may also use an ARCoachingOverlayView to show visual instructions that guide the user during session initialization and recovery, as in the sketch below.
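
A minimal sketch of attaching a coaching overlay, assuming arView is an ARView already installed in your view hierarchy:

override func viewDidLoad() {
    super.viewDidLoad()
    
    let coachingOverlay = ARCoachingOverlayView()
    coachingOverlay.session = arView.session
    coachingOverlay.goal = .horizontalPlane     // guide the user towards a horizontal plane
    coachingOverlay.frame = arView.bounds
    coachingOverlay.autoresizingMask = [.flexibleWidth, .flexibleHeight]
    arView.addSubview(coachingOverlay)
}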

Scene Understanding

The second stage can include horizontal and vertical Plane Detection, Ray-Casting (or Hit-Testing) and Light Estimation. If you have activated the Plane Detection feature, it takes some time to detect a plane with a corresponding ARPlaneAnchor (or AnchorEntity(.plane)) that tethers a virtual model – the cube in your case. There's also Advanced Scene Understanding, which allows you to use the Scene Reconstruction feature. Scene reconstruction requires a device with a LiDAR scanner, and it gives you an improved depth channel for compositing elements in a scene, as well as People Occlusion. You can always enable the Image/Object Detection feature, but you must consider that it's built on machine learning algorithms, which increase a model's placement time in a scene. A configuration sketch follows below.
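
Here's a sketch of a manually configured session with plane detection and, on LiDAR devices, scene reconstruction (again assuming arView is your ARView):

override func viewDidLoad() {
    super.viewDidLoad()
    
    let config = ARWorldTrackingConfiguration()
    config.planeDetection = [.horizontal, .vertical]    // detect both plane alignments
    
    // Scene Reconstruction works only on devices with a LiDAR scanner
    if ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) {
        config.sceneReconstruction = .mesh
    }
    arView.session.run(config)
}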

Rendering

The last stage renders the virtual geometry in your scene. Scenes can contain models with shaders and textures on them, transform or asset animations, dynamics and sound. Surrounding HDR reflections for metallic shaders are calculated by neural modules. ARKit can't render an AR scene itself; for 3D rendering you have to use a framework such as RealityKit, SceneKit or Metal, each of which has its own rendering engine.
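
For illustration, a minimal sketch of the kind of geometry rendered at this stage – a cube with a metallic shader built in code rather than loaded from Reality Composer (assuming arView is your ARView):

let mesh = MeshResource.generateBox(size: 0.1)                   // 10 cm cube
let material = SimpleMaterial(color: .gray, isMetallic: true)    // metallic shader
let cubeModel = ModelEntity(mesh: mesh, materials: [material])

let anchor = AnchorEntity(.plane(.horizontal,
                    classification: .any,
                     minimumBounds: [0.25, 0.25]))
anchor.addChild(cubeModel)
arView.scene.anchors.append(anchor)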

By default, RealityKit enables high-quality rendering effects like Motion Blur and Ray-traced shadows that require additional computational power. Take that into consideration.

Tip

To significantly reduce the time needed to place an object in the AR scene, use a device with a LiDAR scanner, which works at nanosecond speed. If your device has no LiDAR, track only surroundings where the lighting conditions are good, all real-world objects are clearly distinguishable, and the textures on them are rich and have no repetitive patterns. Also, try not to use polygonal geometry with more than 10K polygons or hi-res textures in your project (a 1024x1024 jpeg or png is considered normal).

Also, RealityKit 1.0 has several heavy options enabled by default – Depth channel Compositing, Motion Blur and Ray-traced Contact Shadows (on A11 and earlier, Projected Shadows). If you don't need all these features, just disable them. After that, your app will run much faster.


Practical Solution I

(shadows, motion blur, depth comp, etc. are disabled)

Use the following properties to disable processor intensive effects:

override func viewDidLoad() {
    super.viewDidLoad()
    
    // Turn off the processor-intensive effects that are enabled by default
    arView.renderOptions = [.disableDepthOfField,
                            .disableHDR,
                            .disableMotionBlur,
                            .disableFaceOcclusions,
                            .disablePersonOcclusion,
                            .disableGroundingShadows]
    
    // Load the Reality Composer scene and add it to the ARView
    let boxAnchor = try! Experience.loadBox()
    arView.scene.anchors.append(boxAnchor)
}


Practical Solution II

(shadows, motion blur, depth comp, etc. are enabled by default)

When you use the following code in RealityKit:

override func viewDidLoad() {
    super.viewDidLoad()

    let boxAnchor = try! Experience.loadBox()
    arView.scene.anchors.append(boxAnchor)
}

you get Reality Composer's preconfigured scene containing a horizontal plane detection property and an AnchorEntity with the following settings:

AnchorEntity(.plane(.horizontal,
                classification: .any,
                 minimumBounds: [0.25, 0.25]))

The problem you're having is a time lag that occurs at the moment your app launches. At that moment world tracking starts (first stage), then the app tries to detect a horizontal plane (second stage), and then it renders the metallic shader of the cube (third stage). To get rid of this time lag, use this very simple approach (when the app launches, you track the room first and then tap on the screen):

override func viewDidLoad() {
    super.viewDidLoad()
    
    // Register a tap gesture so the scene is loaded on demand, not at launch
    let tap = UITapGestureRecognizer(target: self,
                                     action: #selector(self.tapped))
    arView.addGestureRecognizer(tap) 
}

@objc func tapped(_ sender: UITapGestureRecognizer) {

    // Load the Reality Composer scene only after tracking has had time to settle
    let boxAnchor = try! Experience.loadBox()
    arView.scene.anchors.append(boxAnchor)
}

This way you reduce the simultaneous load on the CPU and GPU, so your cube loads faster.

P.S.

Alternatively, you can use the loadModelAsync(named:in:) type method, which loads a model entity from a file in a bundle asynchronously:

static func loadModelAsync(named name: String, 
                            in bundle: Bundle?) -> LoadRequest<ModelEntity>
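
A sketch of how it might be used with Combine, assuming arView is your ARView and "box.usdz" is a model in the app bundle (the file name is hypothetical):

var cancellable: AnyCancellable? = nil

cancellable = Entity.loadModelAsync(named: "box")    // "box.usdz" is a hypothetical asset name
    .sink(receiveCompletion: { completion in
        if case .failure(let error) = completion {
            print("Unable to load model: \(error)")
        }
    }, receiveValue: { model in
        // Anchor the loaded model the same way Reality Composer's scene does
        let anchor = AnchorEntity(.plane(.horizontal,
                            classification: .any,
                             minimumBounds: [0.25, 0.25]))
        anchor.addChild(model)
        arView.scene.anchors.append(anchor)
    })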
Andy Fedoroff

In the default Experience.rcproject the cube has an AnchoringComponent targeting a horizontal plane. So basically the cube will not display until the ARSession finds a horizontal plane in your scene (for example the floor or a table). Once it finds one, the cube will appear.

If you instead want to create an anchor and set it as the target when catching a tap event, you can perform a raycast. From the raycast result, grab the worldTransform and set the cube's AnchoringComponent to that transform:

Something like this:
boxAnchor.anchoring = AnchoringComponent(.world(transform: raycastResult.worldTransform))
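
A fuller sketch of the tap handler, assuming the same arView and tap gesture setup as in the answer above:

@objc func tapped(_ sender: UITapGestureRecognizer) {
    let point = sender.location(in: arView)
    
    // Raycast from the tap point towards an estimated horizontal plane
    guard let result = arView.raycast(from: point,
                                  allowing: .estimatedPlane,
                                 alignment: .horizontal).first
    else { return }
    
    let boxAnchor = try! Experience.loadBox()
    // Re-target the cube's anchoring to the raycast hit location
    boxAnchor.anchoring = AnchoringComponent(.world(transform: result.worldTransform))
    arView.scene.anchors.append(boxAnchor)
}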

maxxfrazer