7 Reality Capture
7.1 Photogrammetry and 3D Scanning
Photogrammetry is a technique for creating detailed 3D models from a series of 2D photographs. It has become far more accessible in recent years, thanks to advances in computing power and the widespread availability of high-quality cameras in smartphones.
7.1.1 Basic Principles of Photogrammetry
The core concept of photogrammetry involves:
- Taking multiple photographs of an object or environment from different angles
- Using software to track features across these images
- Estimating the camera position and orientation for each photo
- Generating a 3D model based on the identified common points
As I often explain to my students, photogrammetry is essentially just taking a bunch of normal photos and letting the computer chew on them. I’ve found this process can work with “any” cameras, though the quality of the output will depend on the input image quality. Importantly, smartphone cameras are often sufficient for getting started with photogrammetry projects.
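To make the final step more concrete, here is a minimal Python sketch of how a single 3D point can be recovered once the same feature has been found in two photos and the camera positions are known. It uses a simple midpoint triangulation between two viewing rays. This is an illustrative simplification, not the method of any particular photogrammetry package, and the function name `triangulate_midpoint` and the sample coordinates are invented for the example.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _scale(a, s):
    return tuple(x * s for x in a)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point closest to two viewing rays.

    Each ray starts at a camera centre (c1, c2) and points along the
    direction in which that camera saw the feature (d1, d2).
    """
    w0 = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be located")
    t1 = (b * e - c * d) / denom  # distance along ray 1
    t2 = (a * e - b * d) / denom  # distance along ray 2
    p1 = _add(c1, _scale(d1, t1))  # closest point on ray 1
    p2 = _add(c2, _scale(d2, t2))  # closest point on ray 2
    return _scale(_add(p1, p2), 0.5)  # midpoint between the two


if __name__ == "__main__":
    # Hypothetical setup: two cameras 2 m apart looking at the same feature.
    point = triangulate_midpoint(
        (0.0, 0.0, 0.0), (1.0, 2.0, 5.0),
        (2.0, 0.0, 0.0), (-1.0, 2.0, 5.0),
    )
    print(point)
```

Real software repeats this kind of calculation for many thousands of matched features, and it also has to estimate the camera positions themselves, which is why processing can take an hour or more.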
7.1.2 A Personal Example
In one project, I took my normal digital camera and captured about 100 photos of construction work we were doing behind our house in Trollhättan. Using default settings, I let the computer process the images for about an hour and a half, and it produced a detailed 3D environment.
This example illustrates how accessible the technology has become. With just a standard digital camera and some freely available software, it’s possible to create a 3D model of a real-world environment in just a couple of hours.
The resulting 3D model can be exported and used in various applications, such as virtual reality environments or game engines like Unreal Engine, allowing for virtual exploration and planning.
7.1.3 Aerial Photogrammetry
For larger scale projects, aerial photogrammetry using drones can be incredibly effective. While this requires more specialized equipment, it opens up possibilities for creating detailed 3D models of expansive areas.
Once you’ve made the initial investment—learning the software, acquiring a drone, and understanding the workflow—you can create detailed 3D models for VR/AR applications in real-world settings.
This method allows for the creation of highly detailed 3D models of large environments, which can be invaluable for urban planning, archaeology, or creating immersive virtual experiences of real-world locations.
7.1.4 Large Scale Integration in Unreal Engine
Taking photogrammetry to the next level, it’s possible to create incredibly detailed and expansive virtual environments by integrating photogrammetry data into game engines like Unreal Engine.
This example shows a monastery in a spectacular location. The creators captured thousands of photographs to build this virtual environment—essentially creating a digital copy of this real-world place.
This example demonstrates how photogrammetry can be used to create highly detailed, explorable virtual environments based on real-world locations. It’s worth noting that while most of the environment is captured through photogrammetry, some elements (like water bodies) might be added or enhanced using the game engine’s capabilities.
7.1.5 Limitations and Challenges of Photogrammetry
While photogrammetry is a powerful tool, it comes with its own set of limitations and challenges:
- Time-Consuming Process: Simply gathering enough images takes time, before any processing starts.
- Reflections: Reflective surfaces are difficult because the software looks for recognizable features across images to understand spatial relationships. When you have reflections (imagine seeing a person in a mirror), the system can become confused about where the camera actually is in relation to these features.
- “Featureless” Surfaces: Surfaces lacking distinct features or textures, like a plain white wall, don’t provide enough structural information for the software to work with.
- Patterns: Patterns are better than featureless surfaces, but strictly repeating patterns make it hard for the software to distinguish one section from another.
- Varying and Sharp Lighting: Varying lighting conditions, particularly outdoors with moving clouds or a changing sun position, can significantly alter the appearance of objects across photos, creating additional processing challenges.
- Motion: Since photogrammetry relies on multiple photos taken over time, the subject must remain completely static throughout the capture process.
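One way to see why featureless and repeating surfaces cause trouble is to look at how feature matches are usually filtered. The Python sketch below applies a nearest-neighbour ratio test to toy 2D "descriptors". The ratio test is a common heuristic from the feature-matching literature and is assumed here purely for illustration; the descriptors, the threshold of 0.75, and the function name are all made up, and real tools use far richer descriptors.

```python
import math


def match_with_ratio_test(query, candidates, ratio=0.75):
    """Match each query descriptor to its nearest candidate.

    A match is kept only if the best candidate is clearly closer than
    the second best. Ambiguous features are discarded.
    """
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((math.dist(q, c), ci) for ci, c in enumerate(candidates))
        if len(dists) < 2:
            continue  # nothing to compare against
        (best, best_idx), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


if __name__ == "__main__":
    # Distinct features: each query has one clearly best partner.
    distinct = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    print(match_with_ratio_test([(0.1, 0.0), (9.9, 0.1)], distinct))

    # Repeating pattern: every tile looks the same, so nothing survives.
    tiles = [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0)]
    print(match_with_ratio_test([(5.0, 5.1)], tiles))
```

With the repeating tiles, the best and second-best candidates are equally good, so the match is rejected and the software ends up with too few reliable points to work from.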
7.1.6 Conclusion
Photogrammetry has evolved from a specialized technique to an accessible tool for creating 3D models from photographs. Whether you’re using a smartphone for small projects or professional equipment for large-scale environments, the principles remain the same. As technology continues to advance, we can expect even more impressive and accessible applications of photogrammetry in fields ranging from entertainment to scientific research.
7.2 3D Gaussian Splatting and Hybrid Workflows
3D Gaussian Splatting (3DGS) has matured from a research curiosity into a production-ready capture format that complements traditional photogrammetry. Instead of reconstructing polygon meshes, 3DGS stores scenes as millions of overlapping ellipsoids rendered directly by GPUs, enabling photorealistic, view-dependent effects with surprisingly small file sizes.
7.2.1 Artifact Taxonomy
- Urban walkthroughs: The Ludlow “Quality Square” capture demonstrates street-scale scans with navigable splat viewers and scanning-path visualizations (Barrett 2024).
- Macro specimens: Honeybee macro captures use focus stacking across 1,700+ photos to showcase microscopic detail, which is perfect for museums and scientific storytelling (Clarke 2024).
- Hybrid architectural sets: Hybrid real-estate demos combine synthetic interiors (generated from architectural renders) with photogrammetric backdrops for pre-visualization (Eastcott 2024).
- Industrial inspection: Factory-floor “4D video scrubbers” compare splat captures over time to reveal maintenance issues (Radiance Fields 2024).
Ludlow’s professional workflow compresses a 1.2 GB raw scan down to a 50 MB web-ready splat without noticeable quality loss (Barrett 2024). Expect to budget roughly 20–60 MB for room-scale captures and 150+ MB for dense outdoor scenes. Files of this size keep 3DGS viable for browser delivery and headset streaming.
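To get a feel for these numbers, the short Python sketch below estimates how many Gaussians fit into a raw file and what compression ratio the 1.2 GB to 50 MB figures imply. It assumes an uncompressed layout of 59 32-bit floats per Gaussian (position, scale, rotation quaternion, opacity, and degree-3 spherical-harmonic colour). Whether the Ludlow scan was stored in exactly this layout is not stated in the source, so treat the Gaussian count as a rough order of magnitude.

```python
# Assumed uncompressed layout per Gaussian (32-bit floats):
# 3 position + 3 scale + 4 rotation + 1 opacity + 48 colour coefficients
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48
BYTES_PER_GAUSSIAN = FLOATS_PER_GAUSSIAN * 4


def gaussians_in_file(size_bytes):
    """Approximate number of Gaussians in an uncompressed file."""
    return size_bytes // BYTES_PER_GAUSSIAN


def compression_ratio(raw_bytes, compressed_bytes):
    """How many times smaller the compressed file is."""
    return raw_bytes / compressed_bytes


if __name__ == "__main__":
    raw = 1_200_000_000   # 1.2 GB raw scan
    web = 50_000_000      # 50 MB web-ready splat
    print("bytes per Gaussian:", BYTES_PER_GAUSSIAN)
    print("approx. Gaussians:", gaussians_in_file(raw))
    print("compression ratio:", compression_ratio(raw, web))
```

Under these assumptions the raw scan holds roughly five million Gaussians, and the web version is about 24 times smaller.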
7.2.2 Capture Accessibility in 2025
- Quest-native capture: Meta’s Horizon Hyperscape records outdoor environments directly from Quest 3 headsets, uploading frames to the cloud for automated processing. This removes the dedicated camera requirement for student projects (MuchRockness 2024).
- Enterprise platforms: Varjo’s Teleport service ingests photos or prebuilt splats, stitches multiple rooms into portal-connected tours, and streams content to browsers or high-end headsets (Varjo Technologies 2024).
- Consumer experiments: Mobile apps such as Polycam and Luma export splats alongside meshes, letting you pick the representation that best fits your pipeline.
7.2.3 Preparing Splats for Engines
While many teams view 3DGS scenes in native viewers, you can now import splats into Unreal Engine or Unity via plugins (gsplat, InstantNGP integrations). When planning a project, sketch an “import checklist” alongside your photogrammetry workflow:
- Export splats plus metadata (camera positions, scale) from your capture tool.
- Convert to engine-friendly formats (PLY, Gaussian binaries) using open-source converters.
- Set up level-of-detail switches or fallback meshes for devices without splat renderers.
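As a small aid for the checklist above, the Python sketch below reads the text header of a PLY file and checks whether it looks like a splat export rather than an ordinary mesh or point cloud. The property names it looks for (`opacity`, `scale_0`, `rot_0`, `f_dc_0`) follow the naming commonly seen in 3DGS PLY exports. That naming is an assumption, so check what your own capture tool actually writes. The function names are hypothetical.

```python
def read_ply_header(data):
    """Parse a PLY header from bytes.

    Returns (format_name, vertex_count, vertex_property_names).
    """
    end = data.find(b"end_header")
    if not data.startswith(b"ply") or end == -1:
        raise ValueError("not a PLY file")
    fmt, count, props = None, 0, []
    in_vertex = False
    for line in data[:end].decode("ascii").splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "format":
            fmt = parts[1]
        elif parts[0] == "element":
            # Only properties of the "vertex" element are collected.
            in_vertex = parts[1] == "vertex"
            if in_vertex:
                count = int(parts[2])
        elif parts[0] == "property" and in_vertex:
            props.append(parts[-1])  # the property name is the last token
    return fmt, count, props


def looks_like_splat(props):
    """Heuristic: does the vertex element carry Gaussian attributes?"""
    required = {"x", "y", "z", "opacity", "scale_0", "rot_0", "f_dc_0"}
    return required.issubset(props)


if __name__ == "__main__":
    sample = (
        b"ply\nformat binary_little_endian 1.0\nelement vertex 2\n"
        b"property float x\nproperty float y\nproperty float z\n"
        b"property float f_dc_0\nproperty float opacity\n"
        b"property float scale_0\nproperty float rot_0\nend_header\n"
    )
    fmt, count, props = read_ply_header(sample)
    print(fmt, count, looks_like_splat(props))
```

A check like this is useful before handing a file to an engine plugin, because a plain point cloud and a splat can share the same file extension.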
See the content creation pipelines in Chapter 3 for practical examples of moving captured spaces into interactive experiences.
7.3 Volumetric Video Capture
Volumetric video is an advanced reality capture technique that offers unprecedented flexibility in viewing and interacting with recorded content. This technology goes beyond traditional video by capturing three-dimensional representations of subjects or scenes, allowing for free-viewpoint experiences.
7.3.1 Understanding Volumetric Video
Volumetric video capture involves recording a three-dimensional space, including depth information for each pixel. This results in a dynamic 3D model that can be viewed from any angle.
One example setup uses three Kinect depth cameras, each providing depth information for every pixel. This means the system knows not just the color of each pixel, but also its distance from the camera.
This depth information is crucial for reconstructing the scene in three dimensions, enabling viewers to move around and view the content from any desired angle.
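The step from "color plus distance per pixel" to a 3D point can be sketched in a few lines of Python. The example assumes a simple pinhole camera model, with focal lengths `fx`, `fy` and principal point `cx`, `cy` given in pixels. These parameter values are invented for illustration and are not the calibration of any real Kinect.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Turn one depth pixel into a 3D point in the camera's coordinate frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def depth_image_to_points(depth_rows, fx, fy, cx, cy):
    """Convert a small depth image (rows of metres) into a list of 3D points.

    Pixels with depth 0 are treated as "no measurement" and skipped.
    """
    points = []
    for v, row in enumerate(depth_rows):
        for u, depth_m in enumerate(row):
            if depth_m > 0:
                points.append(backproject(u, v, depth_m, fx, fy, cx, cy))
    return points


if __name__ == "__main__":
    # Hypothetical intrinsics for a 640 x 480 depth camera.
    print(backproject(570, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0))
```

Doing this for every pixel of every camera, and then transforming the points from each camera into a shared coordinate system, gives the raw material for the 3D reconstruction.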
7.3.2 Large-scale Volumetric Capture
For professional applications, large-scale volumetric capture setups are used. These typically involve:
- A large space surrounded by green screens
- Numerous cameras positioned strategically around the space
- Performers acting within the capture area
This demonstrates large-scale volumetric capture with extensive green screen setups and numerous cameras—those dots visible on the walls are actually individual cameras.
7.3.3 Relightable Volumetric Video
An advanced form of volumetric capture is relightable volumetric video. This technique not only captures the three-dimensional form of performers but also records detailed information about the materials and surfaces being filmed.
This system captures detailed information about the materials and surfaces of performers. It calculates not only 3D shape but also how different colored lights should reflect off skin, clothing, and other materials.
The key advantage of this technique is the ability to relight the captured performance in post-production, allowing for seamless integration into virtual environments with dynamic lighting conditions.
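The idea of relighting can be illustrated with the simplest possible shading model. The Python sketch below assumes that, for each surface point, the capture system has recovered a base color (albedo) and a surface normal, and it recomputes the visible color under a new light using purely diffuse (Lambertian) shading. Real relightable capture systems model much more than this, such as shine and skin translucency, so this is only a sketch of the principle with invented values.

```python
import math


def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def relight_diffuse(albedo, normal, light_dir, light_color):
    """Recompute a surface color under a new directional light.

    albedo and light_color are RGB triples in the range 0..1.
    light_dir points from the surface towards the light.
    """
    n = _normalize(normal)
    l = _normalize(light_dir)
    # Surfaces facing away from the light receive nothing.
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(a * c * n_dot_l for a, c in zip(albedo, light_color))


if __name__ == "__main__":
    skin = (0.8, 0.6, 0.5)      # hypothetical captured albedo
    facing = (0.0, 0.0, 1.0)    # surface normal
    print(relight_diffuse(skin, facing, (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)))
    print(relight_diffuse(skin, facing, (0.0, 1.0, 1.0), (1.0, 0.0, 0.0)))
```

Because the albedo and normal are stored separately from the studio lighting, the same performance can be placed in a sunset scene or a neon-lit room simply by changing the light direction and color.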
7.3.4 Applications of Volumetric Video
7.3.4.1 Entertainment Industry
In the entertainment industry, volumetric video allows for innovative approaches to filmmaking and content creation. Actors can be recorded volumetrically and placed in 3D environments, offering new possibilities for editing and storytelling.
This example shows a simpler setup where the actors don’t actually have a complete 3D representation—they’re only recorded from one direction. This limits the viewing angles, but they’re still positioned in 3D space and work effectively as long as you don’t move too far from the intended viewing position.
7.3.4.2 Medical Applications
Volumetric video technology offers unique technical advantages for medical applications due to its ability to capture precise spatial relationships and multi-angle perspectives. The technology’s capacity to record depth information alongside visual data makes it particularly valuable for medical documentation and analysis.
Key technical benefits for medical use include:
- Multi-perspective recording: Eliminates blind spots common in traditional video
- Spatial accuracy: Preserves precise anatomical relationships
- Post-capture navigation: Allows reviewers to examine procedures from optimal viewing angles
- Data persistence: Creates reviewable archives for quality assurance and training
The technical challenges specific to medical environments include lighting constraints in sterile environments, equipment integration with existing medical systems, and ensuring capture quality meets professional medical standards.
For detailed applications in medical training and education, see Section 6.4.4.
7.3.5 Holoportation
Holoportation is an exciting application of volumetric video technology that enables real-time 3D capture and transmission of people and objects. This technology, when combined with augmented reality devices like Microsoft’s HoloLens, creates a sense of presence and shared space between remote participants.
This demonstrates how multiple cameras capture a person in one location and project her as a hologram in another room, allowing the researcher to see her directly through his HoloLens.
7.3.6 Challenges and Limitations
Despite its potential, volumetric video technology faces several challenges:
- Complex Setup: The capture process requires extensive equipment and carefully controlled environments. Even with hundreds of cameras, the capture volume typically spans only a few meters in diameter.
- Capture Limitations: Even with numerous cameras, it’s challenging to capture every angle without shadows or occlusions, especially with multiple performers.
- Data Management: The capture process generates enormous amounts of data that must be processed, stored, and rendered. Compressing this data and rendering it in real time presents significant technical challenges.
- Rendering Complexity: The resulting 3D data is complex and requires significant computational resources to render in real time.
- Cost: The equipment and processing required make this technology expensive and not widely accessible.
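A quick back-of-the-envelope calculation shows why data management dominates these setups. The Python sketch below computes the raw, uncompressed data rate of a camera array. The rig used in the example (100 cameras at 1920 x 1080, 3 bytes per pixel, 30 frames per second) is hypothetical and chosen only to show the order of magnitude.

```python
def raw_bytes_per_second(cameras, width, height, bytes_per_pixel, fps):
    """Uncompressed data rate of a synchronized camera array."""
    return cameras * width * height * bytes_per_pixel * fps


def raw_bytes_for_take(cameras, width, height, bytes_per_pixel, fps, seconds):
    """Uncompressed storage needed for one recording of the given length."""
    return raw_bytes_per_second(cameras, width, height, bytes_per_pixel, fps) * seconds


if __name__ == "__main__":
    rate = raw_bytes_per_second(100, 1920, 1080, 3, 30)
    take = raw_bytes_for_take(100, 1920, 1080, 3, 30, 60)
    print(f"{rate / 1e9:.2f} GB per second")
    print(f"{take / 1e12:.2f} TB for a one-minute take")
```

Even this modest hypothetical rig produces well over ten gigabytes of raw image data every second, before any depth estimation or 3D reconstruction has been done.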
7.3.7 Conclusion
Volumetric video capture represents a significant leap forward in how we record and interact with visual content. From entertainment to medical training, this technology opens up new possibilities for creating immersive and interactive experiences. As hardware capabilities improve and algorithms become more sophisticated, we can expect volumetric video to become more accessible and widely adopted across various industries and applications.
7.4 Light Fields and Neural Rendering
Light field capture represents the pinnacle of 360-degree imaging technology, offering an almost compromise-free solution for immersive visual experiences. This cutting-edge technique goes beyond traditional 360° photography by allowing viewers to interact with the captured environment in ways that closely mimic real-world perception.
Note: This section covers light field capture technology and its implementation details, and introduces neural rendering techniques. For display-focused aspects of light field technology, see Section 2.5.3.
7.4.1 What is a Light Field?
To understand light field capture, we must first grasp the concept of a light field itself.
The light field is a vector function that describes the amount of light flowing in every direction through every point in space.
- Wikipedia
In simpler terms, a light field encompasses all the light passing through a given area (imagine a window) from every possible direction. This comprehensive capture of light information is what enables the creation of truly interactive and dynamic visual experiences.
7.4.2 Key Characteristics of Light Fields
- Captures ALL light passing through a defined space
- Records light from ALL directions
- Essentially creates one 180° image per pixel
The implications of this are profound. Unlike a standard photograph or even a 360° image, a light field capture allows viewers to change their perspective within the scene, revealing new details and altering reflections as if they were physically present in the environment.
7.4.3 Light Field Capture Technology
Capturing a light field requires specialized equipment and techniques. One method involves using a robotic arm equipped with a camera featuring a fisheye or wide-angle lens. This setup systematically moves the camera to capture images from multiple positions, creating a comprehensive sphere of light field data.
This process results in a dataset that allows viewers to:
- Move their perspective within the captured scene
- Observe changes in reflections and object positions based on their viewpoint
- Experience a level of immersion far beyond traditional photography or videography
7.4.4 Understanding Light Fields in Practice
To better grasp the concept of light fields, consider the following visualization:
In this example, we can see how light from a single point on an object (in this case, a goat) passes through multiple points on an imaginary window. Each of these light paths represents a different viewing angle, and a light field capture stores information for all of these paths.
For each point in the captured space, you must store light information from all possible directions—creating massive data requirements.
This comprehensive data capture is what allows for the dynamic, interactive nature of light field displays.
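The window picture can be turned into a small calculation. In the Python sketch below, the imaginary window is the plane z = 0, the scene lies behind it (negative z), and the viewer stands in front of it (positive z). For a given scene point and eye position, the function returns where the light ray crosses the window and in which direction it travels. A second helper multiplies out how many ray samples a light field needs. The coordinate convention, function names, and sample counts are assumptions made for this illustration.

```python
import math


def window_crossing(scene_point, eye):
    """Where does the ray from a scene point to the eye cross the window?

    Returns ((x, y) on the window plane z = 0, unit direction of the ray).
    """
    px, py, pz = scene_point
    ex, ey, ez = eye
    if not (pz < 0.0 < ez):
        raise ValueError("scene point must be behind the window, eye in front")
    t = -pz / (ez - pz)  # fraction of the way from scene point to eye
    x = px + t * (ex - px)
    y = py + t * (ey - py)
    dx, dy, dz = ex - px, ey - py, ez - pz
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    return (x, y), (dx / length, dy / length, dz / length)


def ray_sample_count(window_width, window_height, directions_per_point):
    """Number of rays stored if every window sample keeps many directions."""
    return window_width * window_height * directions_per_point


if __name__ == "__main__":
    goat = (0.0, 0.0, -2.0)
    for eye in [(-1.0, 0.0, 2.0), (0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]:
        print(eye, window_crossing(goat, eye))
    print(ray_sample_count(1000, 1000, 1000), "ray samples")
```

As the viewer moves sideways, the same scene point is seen through a different spot on the window and from a different direction. Storing all of these combinations is what makes the data requirements grow so quickly.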
7.4.5 Light Field Video
Recent advancements have led to the development of light field video technology. As presented at SIGGRAPH 2020, this technology introduces an end-to-end system for capturing, reconstructing, compressing, and rendering high-quality, immersive light field video content.
We present immersive light field video with a layered mesh representation. Most digital videos are either flat and two dimensional or they provide some depth perception through binocular parallax showing different but predetermined points of view for each eye. In contrast, we have built an end-to-end system for capturing, reconstructing, compressing and rendering high quality, immersive light field video content.
- Quote from Light field video
The capture rig for this technology is described as follows:
Capture rig consists of 46 time-synchronized action sports cameras mounted on a 92 centimeter diameter plastic hemisphere. It is inexpensive and relatively easy to fabricate.
- Quote from Light field video
7.4.6 Neural Rendering
Neural rendering is an emerging field that combines traditional computer graphics with machine learning techniques to create more realistic and dynamic visual content. While it is not covered in depth in this book, it’s worth mentioning because it’s closely related to advanced light field technology.
Neural rendering can:
- Enhance the quality of captured light fields
- Generate novel views from sparse input data
- Create photorealistic renderings of 3D scenes
This technology has the potential to overcome some of the limitations of traditional light field capture, such as the need for dense camera arrays.
7.4.7 Implications and Applications
The potential applications for light field technology and neural rendering are vast and exciting:
- Virtual Reality (VR): Creating ultra-realistic, navigable environments for VR experiences.
- Augmented Reality (AR): Enhancing real-world scenes with perfectly integrated digital elements.
- Film and Entertainment: Allowing viewers to explore scenes from different angles, adding a new dimension to storytelling.
- Scientific Visualization: Providing researchers with tools to examine complex 3D data in unprecedented detail.
- Architecture and Design: Enabling immersive walkthroughs of buildings and spaces before they’re constructed.
7.4.8 Challenges and Future Developments
While light field capture and neural rendering represent significant leaps forward in immersive imaging technology, they’re not without challenges:
- Data Volume: Capturing and storing light field data requires enormous amounts of storage and processing power.
- Capture Complexity: Current capture methods can be time-consuming and require specialized equipment.
- Display Technology: Developing displays capable of reproducing light fields accurately is an ongoing area of research.
As technology advances, we can expect these challenges to be addressed, leading to more accessible and widespread use of light field capture and neural rendering technologies.
7.4.9 Conclusion
Light field capture and neural rendering stand at the forefront of immersive imaging technology, promising to revolutionize how we capture, view, and interact with visual content. As research progresses and technology improves, we can look forward to increasingly realistic and interactive visual experiences that blur the line between the digital and physical worlds.
7.5 Integrating Captured Reality in XR Experiences
Integrating captured reality into Extended Reality (XR) experiences is a crucial aspect of creating immersive and realistic virtual environments. This process involves combining various reality capture techniques with XR technologies to create seamless blends of real and virtual elements.
7.5.1 360-Degree Video Integration
360-degree video represents a fundamental reality capture technique that captures real-world environments for immersive reproduction in XR applications. This section focuses on the technical implementation and integration aspects of 360-degree video capture systems.
7.5.1.1 Technical Implementation
360-degree video capture involves specialized camera systems and processing pipelines:
Camera Systems:
- Multi-camera rigs with overlapping fields of view
- Dedicated 360-degree cameras (e.g., Ricoh Theta, Insta360)
- Synchronization requirements for multi-camera setups
Processing Pipeline:
- Stitching algorithms to combine multiple camera feeds
- Geometric correction and calibration
- Resolution optimization and compression
In a VR headset, you can simply turn your head to look around in any direction. This represents one of the most effective applications for mobile VR headsets.
7.5.1.2 Reality Capture Characteristics
Technical specifications and limitations:
- Degrees of Freedom: Three rotational DOF (3DOF): pitch, yaw, roll
- Interactivity Constraints: View-only interaction, no positional tracking
- Resolution Distribution: Non-uniform pixel density across viewing angles
- Temporal Synchronization: Frame-rate matching between capture and playback systems
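The link between head rotation and the stored video frame, and the reason for the non-uniform pixel density, can both be shown with a short Python sketch. It assumes the video is stored in the widely used equirectangular layout, where the horizontal axis is yaw and the vertical axis is pitch. Other layouts such as cube maps exist, so treat this as one possible case. The function names and the 3840 x 1920 frame size are chosen for illustration.

```python
import math


def direction_to_equirect(yaw_deg, pitch_deg, width, height):
    """Map a viewing direction to continuous frame coordinates.

    yaw: -180..180 degrees (0 = straight ahead), pitch: -90..90 (90 = up).
    """
    u = ((yaw_deg + 180.0) / 360.0 * width) % width
    v = (90.0 - pitch_deg) / 180.0 * height
    return u, v


def pixels_per_degree_of_arc(width, pitch_deg):
    """Horizontal pixels spent on one degree of actual viewing arc.

    Every row has the same number of pixels, but rows near the poles
    cover a much smaller circle, so they are heavily oversampled.
    """
    return width / (360.0 * math.cos(math.radians(pitch_deg)))


if __name__ == "__main__":
    print(direction_to_equirect(0.0, 0.0, 3840, 1920))    # straight ahead
    print(direction_to_equirect(90.0, 45.0, 3840, 1920))  # right and up
    for pitch in (0, 30, 60, 80):
        print(pitch, round(pixels_per_degree_of_arc(3840, pitch), 1))
```

Roll does not appear in the mapping because it only rotates the image around the viewing direction. It does not change which part of the sphere you are looking at.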
7.5.1.3 Integration Challenges
Key technical challenges in 360-degree video integration:
- Parallax Issues: Stitching artifacts from multi-camera setups
- Motion Sickness: Vestibular-visual mismatch in mobile content
- Bandwidth Requirements: High-resolution spherical video streaming
- Storage Optimization: Efficient encoding for immersive content
For entertainment applications and user experience considerations of 360-degree videos, see Section 6.6.2.2.
7.5.2 Computer Vision for Reality Integration
Computer vision plays a crucial role in integrating captured reality into XR experiences. It enables systems to understand and interact with the visual world, allowing for more seamless blending of real and virtual elements.
7.5.2.1 Motion Tracking
Motion tracking is a fundamental computer vision technique used to follow objects or features across video frames. It can be implemented in two main ways:
Video-based tracking: This method relies solely on analyzing sequential video frames to detect and follow features.
Combined with IMU: Motion tracking can be enhanced by integrating data from an Inertial Measurement Unit (IMU). This fusion of visual and sensor data often provides more robust and accurate tracking results.
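A very simple way to combine the two sources is a complementary filter, sketched below in Python for a single rotation angle. The gyroscope gives fast updates that slowly drift, while the camera gives slower but drift-free estimates, and the filter blends the two. This is only one possible fusion scheme, chosen here for its simplicity. Production systems typically use more elaborate estimators, and the blend factor of 0.98 and the bias value are arbitrary illustration values.

```python
def fuse(angle_prev, gyro_rate, visual_angle, dt, alpha=0.98):
    """One update step of a complementary filter (angles in degrees).

    alpha close to 1 trusts the gyroscope in the short term, while the
    remaining weight continuously pulls the result towards the camera.
    """
    predicted = angle_prev + gyro_rate * dt  # integrate the IMU reading
    return alpha * predicted + (1.0 - alpha) * visual_angle


def simulate_stationary_device(steps, dt, gyro_bias, alpha=0.98):
    """Device is not moving, but the gyro reports a small constant bias.

    Returns (angle from gyro integration only, angle from the fused filter).
    """
    integrated, fused = 0.0, 0.0
    for _ in range(steps):
        integrated += gyro_bias * dt
        fused = fuse(fused, gyro_bias, 0.0, dt, alpha)
    return integrated, fused


if __name__ == "__main__":
    gyro_only, fused = simulate_stationary_device(steps=1000, dt=0.01, gyro_bias=1.0)
    print(f"gyro only: {gyro_only:.2f} degrees of drift")
    print(f"fused:     {fused:.2f} degrees of drift")
```

In the simulation the gyroscope alone keeps drifting for as long as it runs, whereas the fused estimate settles at a small, bounded error.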
7.5.2.2 SLAM (Simultaneous Localization and Mapping)
SLAM is a real-time technique that allows a system to:
- Map an unknown environment
- Track its own position within that environment simultaneously
SLAM was originally developed for robotics. Today it is crucial both for autonomous robots and for augmented reality systems that need to understand and navigate their surroundings in real time.
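The interplay between the two tasks can be shown with a deliberately tiny toy example in Python. It is not a real SLAM algorithm: it works in 2D without rotation, assumes landmarks arrive already identified by name, and uses a fixed correction gain instead of a proper statistical estimator. All names and numbers are invented. What it does show is the loop of predicting the pose, correcting it against known landmarks, and adding new landmarks to the map.

```python
def slam_step(pose, landmark_map, odometry, observations, gain=0.5):
    """One toy step of localization and mapping in 2D.

    pose:         (x, y) current position estimate
    landmark_map: dict of landmark name -> (x, y) in world coordinates
    odometry:     (dx, dy) movement reported by the device
    observations: dict of landmark name -> (x, y) offset from the device
    """
    # 1. Predict: move the pose estimate by the reported motion.
    x, y = pose[0] + odometry[0], pose[1] + odometry[1]

    # 2. Localize: every known landmark implies where the device must be.
    implied = [
        (landmark_map[name][0] - rx, landmark_map[name][1] - ry)
        for name, (rx, ry) in observations.items()
        if name in landmark_map
    ]
    if implied:
        mean_x = sum(p[0] for p in implied) / len(implied)
        mean_y = sum(p[1] for p in implied) / len(implied)
        x += gain * (mean_x - x)
        y += gain * (mean_y - y)

    # 3. Map: place landmarks seen for the first time using the corrected pose.
    for name, (rx, ry) in observations.items():
        if name not in landmark_map:
            landmark_map[name] = (x + rx, y + ry)

    return (x, y), landmark_map


if __name__ == "__main__":
    pose, world = (0.0, 0.0), {}
    pose, world = slam_step(pose, world, (0.0, 0.0), {"door": (2.0, 0.0)})
    # The device really moves 1.0 m, but odometry over-reports 1.2 m.
    pose, world = slam_step(pose, world, (1.2, 0.0), {"door": (1.0, 0.0), "lamp": (0.0, 3.0)})
    print(pose, world)
```

In the second step the known door pulls the over-estimated position back towards the true one, and the newly seen lamp is added to the map from that corrected position.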
7.5.3 AR Cloud and Mirror Worlds
The concept of SLAM is evolving towards more ambitious applications:
- AR Cloud: A persistent, shared AR experience across devices and users.
- Mirror World: A digital twin of the physical world, continuously updated and accessible.
This technology enables the creation of digital mirror worlds—similar to the digital twin concept we discussed earlier. By continuously scanning different locations, you build digital versions of various environments. When you return to these places, the system recognizes your location much more quickly.

7.5.4 Photogrammetry Integration
Photogrammetry plays a significant role in creating realistic 3D assets for XR experiences. These photogrammetry-based models can be integrated into virtual environments to create more authentic and detailed scenes.
Once you’ve made the initial investment—learning the workflow, acquiring equipment like drones, and mastering the associated software—you can create detailed 3D models for VR/AR applications in real-world settings.
7.5.5 Challenges in Reality Integration
Integrating captured reality into XR experiences comes with several challenges:
- Data Processing: Handling large amounts of captured data in real-time.
- Seamless Blending: Ensuring that real and virtual elements blend naturally.
- Real-time Performance: Maintaining high frame rates and low latency for immersive experiences.
- Lighting and Shadows: Matching virtual lighting to real-world conditions.
- Occlusion Handling: Correctly handling cases where real objects should occlude virtual ones and vice versa.
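Occlusion handling in particular comes down to a per-pixel depth comparison. The Python sketch below assumes you have a depth value for every real-world pixel (for example from a depth sensor) and a depth value for every rendered virtual pixel, and it shows the virtual pixel only where it is closer to the viewer. The data layout (flat lists, `None` where nothing virtual was rendered) and the labels standing in for colors are simplifications made for this example.

```python
def composite_with_occlusion(real_pixels, real_depth, virtual_pixels, virtual_depth):
    """Blend a virtual layer into a real image using per-pixel depth.

    All arguments are flat lists of equal length. virtual_depth holds
    None where no virtual object covers the pixel.
    """
    result = []
    for real, r_depth, virtual, v_depth in zip(
        real_pixels, real_depth, virtual_pixels, virtual_depth
    ):
        if v_depth is not None and v_depth < r_depth:
            result.append(virtual)  # virtual object is in front
        else:
            result.append(real)     # real world occludes or nothing rendered
    return result


if __name__ == "__main__":
    real = ["wall", "wall", "table"]
    real_d = [3.0, 3.0, 1.0]         # metres from the viewer
    virtual = ["robot", "robot", "robot"]
    virtual_d = [2.0, None, 2.0]     # robot stands 2 m away
    print(composite_with_occlusion(real, real_d, virtual, virtual_d))
```

In the example the robot appears in front of the wall but is hidden behind the nearer table, which is the behavior users expect from a convincing blend.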
7.5.6 Future Directions
As reality capture and XR technologies continue to evolve, we can expect to see:
- More seamless integration of real and virtual elements
- Improved real-time performance for complex captured environments
- Enhanced collaborative experiences in shared AR environments
- More sophisticated use of AI for understanding and interacting with captured reality
7.5.7 Conclusion
Integrating captured reality into XR experiences represents a frontier in creating truly immersive and realistic virtual environments. By combining various reality capture techniques with advanced XR technologies, developers can create experiences that blur the line between the real and virtual worlds. As these technologies continue to advance, we can expect to see increasingly sophisticated and seamless integrations of captured reality in XR applications across various industries.
7.6 Ethics and Privacy in Reality Capture
Reality capture technologies raise significant ethical and privacy considerations that you need to keep in mind when developing XR experiences. The ability to capture detailed 3D representations of environments and people, combined with the potential for unintended bystander capture and the large amounts of sensitive data generated, creates responsibilities around consent, data security, and representation.
Key considerations include:
- Consent protocols: Ensuring individuals who may be captured are aware and have given informed permission, particularly challenging with technologies like 360-degree cameras and photogrammetry in public spaces
- Bystander privacy: Addressing unintended capture of people who haven’t consented, especially when using tools that stream data to cloud services
- Data governance: Implementing secure storage, clear retention policies, and appropriate access controls for captured data
- Representation and bias: Being mindful of what you choose to capture and preserve, avoiding stereotypes, and respecting cultural sensitivities
- Authenticity: Maintaining transparency about what has been captured versus digitally altered, particularly important when photorealistic capture combined with AI enables sophisticated manipulation
The comprehensive discussion of privacy, consent, data governance, representation, and best practices for ethical reality capture can be found in Chapter 9. That chapter also addresses emerging concerns around deepfakes and manipulated reality, regulatory compliance requirements like GDPR, and frameworks for responsible development of reality capture applications.
7.7 Further Reading
Chapter 7 focused on the various techniques and technologies used to capture real-world environments and objects for use in XR applications. We explored methods such as photogrammetry, 3D scanning, volumetric video capture, and light field technology. The chapter also covered the integration of captured reality into XR experiences and the ethical considerations surrounding these practices. To deepen your understanding of reality capture and its applications in XR, consider the following resources:
- Reality Capture (from Epic): https://www.capturingreality.com/
  - Home of the Reality Capture photogrammetry application from Epic Games.
- Capturing Reality Community: https://dev.epicgames.com/community/capturing-reality
  - Community forum discussing various aspects of reality capture technologies and techniques.