3D Home Walkthroughs from Photos
Open-source 3D Gaussian Splatting for property captures. What works, what breaks, and why an agentic engineer can deliver a Matterport-killer with no subscription and a Saturday afternoon of compute.
Welcome and TL;DR
This is a working reference for building photorealistic 3D walkthroughs of real properties using the open 3D Gaussian Splatting (3DGS) stack as of mid-2026. The framing is practical: capture with a phone, train on a rented GPU for cents, and ship a static web viewer, a video flythrough, and an editable 3D file. No vendor lock-in. No monthly bill.
Yes, it is feasible right now. The standard open pipeline is capture, then pose estimation (COLMAP or the newer feed-forward MASt3R / VGGT), then train a splat (Nerfstudio gsplat, Brush, or OpenSplat), then view in a browser (SuperSplat, Spark, mkkellogg's three.js viewer), then optionally extract a mesh (2D Gaussian Splatting plus SuGaR) and render a video flythrough. Most of the core tooling is permissively licensed (Apache, MIT, or BSD), but check each repository before commercial use: the original 3DGS code, for one, carries a research-only license.
Two shifts between 2024 and 2026 changed the calculus. First, pose-free and feed-forward reconstruction. DUSt3R then MASt3R then VGGT (Visual Geometry Grounded Transformer) replaced the slow and brittle COLMAP step for sparse inputs. VGGT won the CVPR 2025 Best Paper award. AnySplat at SIGGRAPH Asia 2025 and InstantSplat now run end to end from raw photos in seconds.
Second, the glTF extension KHR_gaussian_splatting is at release-candidate stage and expected to be ratified in Q2 2026. Once it lands, splats become a first-class 3D asset format like .glb, and the editable-asset deliverable becomes a single file that any compliant viewer opens.
The honest gotcha: indoor real estate is the hardest scene class for vanilla 3DGS. Mirrors, glass, white walls, and multi-room topology all break naive captures. Specialised 2025 papers like MirrorGaussian, GlassGaussian, Ref-Unlock, LighthouseGS, and FreeSplat++ address each, but most are still research code, not turnkey tools. That gap is where the engineering value lives.
Under the hood
3D Gaussian Splatting (Kerbl et al., SIGGRAPH 2023) represents a scene not as triangles or as a neural radiance field, but as millions of small 3D ellipsoids called Gaussians. Each one has a position, orientation, scale, opacity, and a view-dependent color expressed in spherical harmonics. To render, you project them to 2D and alpha-blend front to back. To train, you compare the render to the photo, backprop the loss, and the Gaussians migrate and grow or split to fit the scene.
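The front-to-back blend described above can be sketched for a single pixel. This is a minimal illustration, not the reference rasterizer: it assumes each splat has already been projected and reduced to a per-pixel `alpha` (opacity times Gaussian falloff) and a fixed `rgb`, and the function name is hypothetical.

```python
def composite_front_to_back(splats):
    """Blend (depth, alpha, rgb) samples covering one pixel, nearest first."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked by nearer splats
    for _depth, alpha, rgb in sorted(splats, key=lambda s: s[0]):
        weight = transmittance * alpha  # contribution of this splat
        color = [c + weight * ch for c, ch in zip(color, rgb)]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque, stop early
            break
    return color, transmittance


# A half-transparent red splat in front of a half-transparent blue one.
pixel, remaining = composite_front_to_back([
    (2.0, 0.5, (0.0, 0.0, 1.0)),  # blue, farther away
    (1.0, 0.5, (1.0, 0.0, 0.0)),  # red, nearer
])
print(pixel, remaining)  # [0.5, 0.0, 0.25] 0.25
```

Because every step is a differentiable multiply-add, the same loop is what lets the training loss flow back to each Gaussian's opacity and color.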
Two practical consequences. First, it renders fast. Sixty frames per second on commodity GPUs, and now also on phones. Second, the asset is the model. There is no separate mesh, texture, or material extraction step unless you ask for one. The file is a few hundred megabytes of point cloud plus per-point parameters.
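To see where "a few hundred megabytes" comes from, here is a rough size estimate. It assumes the uncompressed PLY layout commonly written by 3DGS trainers (float32 fields, degree-3 spherical harmonics, an unused normal per point); other exporters and compressed formats will differ, so treat the field counts as an assumption.

```python
# Assumed float32 fields per Gaussian in an uncompressed 3DGS-style PLY.
FLOATS_PER_GAUSSIAN = {
    "position": 3,
    "normal": 3,     # usually written but unused
    "sh_dc": 3,      # base color (degree-0 spherical harmonics)
    "sh_rest": 45,   # degrees 1-3: 15 coefficients x 3 channels
    "opacity": 1,
    "scale": 3,
    "rotation": 4,   # quaternion
}


def ply_size_mb(num_gaussians, bytes_per_float=4):
    """Approximate payload size in megabytes, ignoring the small header."""
    floats = sum(FLOATS_PER_GAUSSIAN.values())
    return num_gaussians * floats * bytes_per_float / 1e6


print(ply_size_mb(1_000_000))  # 248.0
```

Most of the bytes are the higher-order color coefficients, which is why compressed formats that quantize or drop them shrink files so much.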
Photogrammetry-based virtual tour tech (Matterport and its kin) stitches static panoramas at fixed dollhouse points and lets you teleport between them. A splat is a continuous scene you can fly through. The visual upgrade is closer to "a photo come alive" than to "a better 360 viewer." Zillow shipped this as SkyTour in July 2025, and Matterport responded with its own 3D Exteriors product.
Variants that matter
The 2023 paper has spawned a small zoo of follow-ups. These are the ones to know.
| Variant | What it adds | When to reach for it |
|---|---|---|
| Original 3DGS | Per-point ellipsoid plus SH color | Reference quality. Research-only license, which rules it out for commercial use. |
| Splatfacto (Nerfstudio) | Production-friendly 3DGS in the gsplat library, Apache licensed | The default open-source trainer. |
| 2D Gaussian Splatting | Disks instead of ellipsoids, better surface alignment | When you want a clean mesh afterward. |
| Mip-Splatting | Anti-aliasing across zoom levels | Walkthroughs where scale changes a lot. |
| Scaffold-GS / Octree-GS | Anchor-based, much lower VRAM | Whole-house scenes that blow up vanilla 3DGS. |
| Hierarchical 3DGS / LODGE / LOBE-GS | Level-of-detail for huge scenes | Estates, multi-floor, mansions. |
| 3DGUT (NVIDIA, in gsplat) | Unscented transform, proper handling of fisheye and rolling shutter | Anything captured with a real-world camera. Big quality bump. |
| MirrorGaussian / GlassGaussian / Ref-Unlock | Explicit reflection and transmission modeling | Bathrooms, kitchens, modern homes with glass walls. |
| LighthouseGS | Indoor structure-aware splatting for panoramic mobile captures | The 360-camera route. |
| FreeSplat++ | Generalizable indoor whole-scene reconstruction | Multi-room indoor sequences. |
For a first project on Nerfstudio, you only need to know two: Splatfacto for the default run, and 2DGS if you also want a mesh. Everything else is targeted at a specific failure mode.
The feed-forward turn
The old loop was slow. COLMAP runs Structure-from-Motion to recover camera poses, often takes thirty minutes or more, and fails on textureless walls. Then 3DGS training adds another half hour to several hours. The new generation skips or replaces COLMAP entirely.
DUSt3R (CVPR 2024, Naver Labs) predicts dense pointmaps from image pairs without calibration or poses. MASt3R extends it with feature matching and longer-range correspondence. VGGT (CVPR 2025 Best Paper, Meta) processes more than two images at once in a single transformer and outputs cameras plus dense 3D directly, with no global-optimization post step. The reported gain over COLMAP on sparse inputs is up to fifty percent in completeness.
InstantSplat wraps this into an end-to-end pipeline: a deep model initializes dense points, then the optimizer iteratively co-optimizes Gaussians and poses. The paper reports a roughly thirty-fold speedup over COLMAP plus 3DGS. AnySplat uses VGGT for initialization and feeds forward Gaussians from unconstrained views. MV-DUSt3R+ (CVPR 2025 Oral) reconstructs scenes from sparse views in around two seconds.
For a thirty to eighty photo capture of one room, these feed-forward methods can produce a usable splat in under a minute on a single GPU. For a whole-house capture of several hundred photos, COLMAP (or glomap, its modern replacement) is still the safe path.
Capture
The cheapest viable rig is a modern phone. Notes from the Volinga capture guide, the Polyvia mobile capture guide, and an NIH study on capture method impact:
- Coverage. Aim for around seventy percent overlap between adjacent frames. Orbit each room in at least three layers at different heights. Tiny rooms want eighty to a hundred and fifty photos. Open-plan living rooms want two hundred to four hundred.
- Lighting. Kill the variable. Close blinds, turn on every lamp, avoid mixed indoor and outdoor light. Cloudy days are great for exteriors. View-dependent lighting baked into the splat at one moment is fine; lighting that changes between frames ruins it.
- Motion. Walk slowly, landscape orientation, manual exposure lock, autofocus locked or fixed if possible.
- Mirrors and glass. Cover them, tape them over, or accept ghosting and floaters there. The 2025 reflection-aware variants help but most are not yet in Nerfstudio mainline.
- White walls. COLMAP fails because there are no features. This is where the MASt3R or VGGT pose initialization pays off, since they learn priors and do not rely on local features.
- Chunking. Large scenes degrade. Split a house into "front yard", "main floor", "upstairs", train each separately, then composite in SuperSplat or in your viewer.
Capture devices, ranked for hobby and listings
Camera pose estimation
Pick one path.
Classical. COLMAP is the well-trodden free baseline. glomap is a faster modern SfM, around ten times quicker, used in the current Nerfstudio docs. If you have a normal indoor scene with reasonable texture, this still works.
Modern feed-forward. MASt3R or VGGT. Robust on featureless walls and sparse views. Especially good for the indoor white-wall problem.
End-to-end skip. InstantSplat or AnySplat. Let the optimizer figure out poses jointly with Gaussians.
Nerfstudio's CLI handles COLMAP transparently with ns-process-data images. For VGGT or MASt3R you will run a small script to dump a transforms.json and feed it into the trainer.
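The "small script" can be sketched as follows. This is a minimal illustration under stated assumptions: your pose estimator gives one 4x4 camera-to-world matrix per image in the OpenCV convention (x right, y down, z forward), all images share one pinhole camera, and the function names are hypothetical. If your model emits world-to-camera extrinsics, invert them first. Nerfstudio expects camera-to-world matrices with the camera's y and z axes flipped relative to OpenCV.

```python
import json


def opencv_to_opengl(c2w):
    """Flip the camera Y and Z axes (columns 1 and 2) of a 4x4 camera-to-world matrix."""
    flipped = [[row[0], -row[1], -row[2], row[3]] for row in c2w[:3]]
    return flipped + [list(c2w[3])]


def build_transforms(frames, fl_x, fl_y, cx, cy, w, h):
    """frames: list of (relative_image_path, 4x4 OpenCV camera-to-world matrix)."""
    return {
        "camera_model": "OPENCV",
        "fl_x": fl_x, "fl_y": fl_y,
        "cx": cx, "cy": cy,
        "w": w, "h": h,
        "frames": [
            {"file_path": path, "transform_matrix": opencv_to_opengl(c2w)}
            for path, c2w in frames
        ],
    }


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
transforms = build_transforms(
    [("images/frame_00001.jpg", identity)],  # placeholder pose, not real data
    fl_x=1000.0, fl_y=1000.0, cx=960.0, cy=540.0, w=1920, h=1080,
)
print(json.dumps(transforms, indent=2))
```

Write the result to transforms.json next to the images folder and point the trainer at that directory. Intrinsics should come from your pose estimator or the image metadata; the values above are placeholders.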
Training
Top recommendation: Nerfstudio plus gsplat. The de-facto open standard. Apache licensed, actively maintained, integrates NVIDIA's 3DGUT, supports F-Theta cameras for fisheye and 360 capture, and trains in RGB, depth, and normal modes. As of gsplat 1.5.x it has native 2DGS support with roughly four gigabytes of VRAM overhead, compression, and the new GsplatViewer.
Alternatives by use case
- Brush. Rust plus WebGPU plus Burn. Runs on macOS Apple Silicon, Windows, Linux, Android, and the browser. No CUDA dependency. If you are on a Mac without an NVIDIA GPU, this is the cleanest local option. Trains from COLMAP or Nerfstudio data and outputs a standard PLY file.
- OpenSplat. Production-grade C++ with a CPU fallback. Check the repository license before commercial use: it is published under AGPL-3.0, not a permissive license.
- Grendel-GS (ICLR 2025). Distributed multi-GPU training for large scenes. Only matters for whole-estate captures.
- Postshot. Closed-source, fifteen dollars a month, crisper output than Splatfacto on many real-estate scenes per a long Nerfstudio issue tracking this. The difference appears to come mostly from its SfM step, not the splatting itself. Worth knowing as a quality bar but not the tool if "fully open" is your pitch.
Training time, rough numbers on consumer hardware (3060 or 4070 class)
| Scope | Pose method | Total time |
|---|---|---|
| One room, ~200 photos | COLMAP | ~15 min SfM plus 25 min splat = ~40 min |
| One room, ~30 photos | VGGT / InstantSplat | ~1 to 3 min |
| Whole house, ~1500 photos | COLMAP plus Splatfacto on a single 4090 | 1 to 3 hours |
Cloud GPU pricing for context. As of 2026, an RTX 4090 rents for roughly twenty-five cents per hour on Vast.ai and forty cents per hour on RunPod. A thirty-minute training run costs four to twenty cents. A whole-house run on an A100 or H100 still lands under five dollars.
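The arithmetic behind these figures is simple enough to script when quoting a job. A minimal sketch, using only the hourly rates quoted above; the function name is hypothetical and real bills also include storage and idle time.

```python
def run_cost_usd(minutes, usd_per_hour):
    """Rental cost of one GPU for a run of the given length."""
    return minutes / 60 * usd_per_hour


# Rates quoted in the text for an RTX 4090.
RATES = {"Vast.ai": 0.25, "RunPod": 0.40}

for provider, rate in RATES.items():
    half_hour = run_cost_usd(30, rate)
    whole_house = run_cost_usd(180, rate)  # upper end of the 1 to 3 hour range
    print(f"{provider}: 30 min = ${half_hour:.3f}, 3 h = ${whole_house:.2f}")
```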
Web viewing
Almost everything here is open, browser-based, and works on iOS Safari.
For a listing, the simplest deployable artifact is a single static HTML page that loads a PLY or SPLAT file from a CDN. That is an iframe a realtor can paste into MLS or a Squarespace site. No back end required.
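The paste-in artifact is just an iframe tag pointing at the hosted viewer page. A minimal sketch of generating it, assuming a hypothetical viewer URL; the attribute choices are illustrative, and some listing platforms restrict which embed attributes they accept.

```python
from html import escape


def iframe_embed(viewer_url, width="100%", height=480, title="3D walkthrough"):
    """Return an HTML iframe snippet for a hosted splat viewer page."""
    return (
        '<iframe src="{src}" title="{title}" width="{width}" height="{height}" '
        'style="border:0" loading="lazy" allowfullscreen></iframe>'
    ).format(
        src=escape(viewer_url, quote=True),
        title=escape(title, quote=True),
        width=escape(str(width), quote=True),
        height=int(height),
    )


# example.com is a placeholder, not a real listing.
print(iframe_embed("https://example.com/listing-123/?autoplay=1&ui=min"))
```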
Mesh extraction
When you want to drop the property into Blender, Unreal, or Unity as a real mesh (for floor plans, renovation visualization, or game-engine integration):
- 2D Gaussian Splatting plus TSDF fusion. Currently the cleanest path. The 2D disks align to surfaces, so depth maps are consistent and TSDF integration yields a watertight mesh. Now native in gsplat.
- SuGaR (CVPR 2024). Regularizes 3D Gaussians to lie on surfaces, then runs Poisson reconstruction. Fast: a mesh in around two hours including training. The KIRI 3DGS Render Blender plugin wraps a Gaussian Frosting variant.
- 2D-SuGaR (2025). Combines both ideas, uses pretrained normal and depth priors. Best geometric accuracy reported to date for indoor scenes.
Once you have a mesh, it is the standard OBJ / GLB workflow. Texture-bake from the splat colors, drop into Blender for cleanup, export. The splat itself remains your photoreal deliverable; the mesh is the editable one.
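The TSDF fusion step in the first option averages truncated signed distances from many depth maps, then takes the zero crossing as the surface. The sketch below shows that update along a single camera ray in one dimension. It is an illustration of the idea only: real pipelines do this per voxel in a 3D grid and extract the mesh with marching cubes, and the function names here are hypothetical.

```python
def integrate_depth(tsdf, weights, voxel_depths, observed_depth, trunc):
    """Fuse one depth observation into a 1D TSDF sampled along a ray."""
    for i, z in enumerate(voxel_depths):
        sdf = observed_depth - z        # positive in front of the surface
        if sdf < -trunc:                # far behind the surface: occluded, skip
            continue
        d = min(1.0, sdf / trunc)       # truncate and normalize to [-1, 1]
        tsdf[i] = (tsdf[i] * weights[i] + d) / (weights[i] + 1)  # running mean
        weights[i] += 1
    return tsdf, weights


def zero_crossing(tsdf, weights, voxel_depths):
    """Linearly interpolate the depth where the fused TSDF changes sign."""
    for i in range(len(tsdf) - 1):
        if weights[i] and weights[i + 1] and tsdf[i] > 0 >= tsdf[i + 1]:
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_depths[i] + t * (voxel_depths[i + 1] - voxel_depths[i])
    return None


depths = [0.0, 0.5, 1.0, 1.5, 2.0]
tsdf, weights = [0.0] * 5, [0] * 5
for observed in (1.2, 1.3):             # two slightly disagreeing depth maps
    integrate_depth(tsdf, weights, depths, observed, trunc=0.5)
print(zero_crossing(tsdf, weights, depths))  # approximately 1.25
```

The averaging is why consistent depth maps matter: when the per-view depths agree, the zero crossing is sharp, and when they disagree the fused surface smears.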
Video flythrough
Two options.
In Nerfstudio's viewer or GsplatViewer. Define a keyframed camera path, export to MP4. Built in. Fastest path.
In Blender after meshing. Standard camera animation with the splat as a Blender geometry-nodes asset (via the KIRI plugin) or the mesh as a regular object. Higher production value and slower.
For a drone-flyover plus interior-walkthrough cinematic in the Zillow style, the workflow is: drone footage to exterior splat, interior phone capture per room to interior splats, composite in SuperSplat, camera path in the viewer, export.
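A keyframed camera path boils down to interpolating between a few hand-placed poses. The sketch below does linear interpolation of camera positions only, as an illustration of the concept; it is not the Nerfstudio camera-path format, the function name is hypothetical, and real viewers typically use smoother splines and interpolate orientation as well.

```python
def interpolate_path(keyframes, fps=30):
    """keyframes: list of (time_seconds, (x, y, z)), sorted by time.

    Returns one camera position per output video frame.
    """
    frames = []
    for (t0, p0), (t1, p1) in zip(keyframes, keyframes[1:]):
        steps = round((t1 - t0) * fps)      # frames in this segment
        for s in range(steps):
            u = s / steps                   # 0 at the start of the segment
            frames.append(tuple(a + u * (b - a) for a, b in zip(p0, p1)))
    frames.append(tuple(keyframes[-1][1]))  # land exactly on the last keyframe
    return frames


# Walk 3 m along x in one second, then 6 m along z in two seconds.
path = interpolate_path(
    [(0, (0, 0, 0)), (1, (3, 0, 0)), (3, (3, 0, 6))],
    fps=2,
)
print(len(path), path[1])  # 7 (1.5, 0.0, 0.0)
```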
Game-engine integration
If you ever want the asset deliverable to be something a client can walk through in VR.
bevy_gaussian_splatting compatible with Brush output.
SaaS incumbents
What you are competing with, and what is actually under their hood.
| Tool | What it is | Real cost | Under the hood |
|---|---|---|---|
| Matterport | Market default since around 2014. Tripod 360 scanner. | ~$5,495 Pro3 camera plus $69 to $309 per month cloud. | Pre-3DGS: stitched 360 panoramas plus depth at fixed dollhouse points. You teleport between scan points. Recently added exterior 3DGS via "Matterport 3D Exteriors." |
| Zillow SkyTour | Drone-only exterior tours. | Free to the consumer. Cost falls on Zillow. | Drone footage to SfM to 3DGS. Confirmed open splat tech wrapped in a proprietary capture and processing pipeline. |
| Polycam | Phone capture app plus cloud. | Freemium. Pro tier around $15 per month. | Photogrammetry plus 3DGS plus LiDAR fusion. Pro features gate larger captures. |
| Luma AI | Phone capture app plus web viewer. | Freemium. | NeRF heritage, now 3DGS too. Closed pipeline, polished UX. Sells a UE plugin. |
| KIRI Engine | Phone or web plus cloud. | Free tier limited to three exports per week. | Photogrammetry plus 3DGS plus their Blender add-on (Apache 2.0). |
| Scaniverse (Niantic) | Phone capture, on-device 3DGS. | Free, unlimited. | The most generous consumer free tier. Reconstruction runs locally on the iPhone. |
| Postshot | Desktop trainer plus cloud option. | From $15 per month. | High-quality 3DGS, tuned for arch and real-estate. The closed-source quality bar. |
You will not beat Matterport on polish, dollhouse mode, and MLS integrations cheaply. Their moat is workflow and integrations, not the tech. You will beat them on photorealism plus exteriors plus custom branding, because 3DGS is a real step up in fidelity. You can beat all of them on cost-per-project at low volume: a self-hosted splat on a static CDN is effectively zero dollars a month after a one-time training run. And you can beat them on "I own the asset" — the deliverable is a PLY file your client keeps, hosted wherever they like, with no vendor lock-in. That story sells.
A realistic professional 3DGS contractor in 2026 charges $2,250 to $5,000 per single property (per Future3D's pricing pages). The hobby positioning is below that. The agentic-engineering pitch is in the process automation: capture, upload, pipeline, deliverable URL, mostly hands-off.
Feasibility and gotchas
What works well
- Exterior captures of detached homes, especially with a drone.
- Single rooms with diffuse lighting and textured surfaces.
- Wood floors, tile, brick, fabric: anything with surface texture.
- Outdoor architectural shots in flat overcast light.
What still hurts
- Mirrors and large glass panes. The "mirror in the bathroom" problem. Solutions exist (Seeing Through Reflections, MirrorGaussian, GlassGaussian, Ref-Unlock) but require manual mirror masking or research-grade integration. Practical workaround: cover mirrors during capture.
- Featureless white walls. Vanilla COLMAP fails. Use MASt3R or VGGT pose initialization, or add visual texture during capture (taped paper markers you remove in SuperSplat later).
- Low light or mixed light. Standardize lighting before shooting. Don't trust the algorithm to compensate.
- Multi-floor topology. Splats have no semantic floor or wall understanding. Connections between rooms can become ghost geometry. Best practice: capture rooms separately and composite.
- Editability of details. You can crop, delete, and recolor splats in SuperSplat. You cannot move a chair or change wall paint. The mesh-extraction path is for that.
- Scale and privacy. A splat captures everything: framed photos on the wall, labels on the medicine cabinet, kid's drawings. You will need a pre-publish cleanup pass.
Where the engineering value sits
This is the answer to the question, "what does an agentic engineer add that a SaaS button cannot." Five things.
- Pipeline orchestration. A script that takes a Google Drive or iCloud folder of photos and returns a hosted splat URL, an MP4, and a GLB mesh. No free consumer app does all three.
- Quality recovery. Detecting and re-running failure cases automatically: mirror artifacts, drifting poses, undertrained regions.
- Privacy and cleanup automation. Scripted masking of personal items, blurring of license plates, removing the dog.
- Cost containment. Caching, incremental training, spot-instance scheduling. A whole-property pipeline costs cents to a few dollars, not a subscription.
- Branded deliverable. A single static page on Cloudflare Pages or Vercel with the listing's domain, a custom intro animation, and the splat viewer. That is the iframeable artifact you hand a realtor.
Most consumer apps (Polycam, Luma, KIRI, even Postshot) are wrapping the same open splat algorithms with a UI and a subscription. The honest engineering pitch is not "I invented better tech." It is "I built the operations layer around the open tech and pass the savings on." That is a true story you can deliver.
First project recipe
To learn the tech and produce a portfolio piece in one go, here is a minimal sequence.
- Pick a small property you have permission to shoot. Outdoors first, to avoid the indoor failure modes on your first try.
- Capture two hundred to four hundred photos with a phone. Slow walk, around seventy percent overlap, manual exposure lock, overcast or shaded.
- Stand up a workstation. Either a local PC with a 3060+ GPU, an M-series Mac running Brush, or a RunPod RTX 4090 pod at around forty cents per hour.
- Run Nerfstudio:

      ns-process-data images --data ./photos --output-dir ./processed
      ns-train splatfacto --data ./processed

- View in ns-viewer to QC the result. Iterate on capture if you see floaters or holes.
- Clean up in SuperSplat (browser). Crop the bounding box, delete street noise, remove yourself and any camera artifacts.
- Export the cleaned PLY.
- Build a static viewer page with mkkellogg / GaussianSplats3D or Spark. Single HTML file plus the PLY on R2 or S3.
- Optional: mesh with 2DGS plus TSDF fusion to GLB for the editable-asset deliverable.
- Optional: render a flythrough video via Nerfstudio's camera-path tool to MP4.
That is your portfolio item. One URL, one MP4, one GLB. Three deliverables out of one capture, all open source, hosted for essentially zero dollars per month. The story for prospects: this used to require a five-thousand-dollar Matterport camera and a monthly cloud bill, and now it is a Saturday afternoon and an npm run build.
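The two Nerfstudio commands from the recipe are the natural first thing to wrap in a script. A minimal sketch, assuming Nerfstudio is installed and on the PATH; the function names and the directory layout are hypothetical, and the default is a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(photos_dir, work_dir):
    """Return the recipe's two Nerfstudio commands as argument lists."""
    processed = Path(work_dir) / "processed"
    return [
        ["ns-process-data", "images",
         "--data", str(photos_dir), "--output-dir", str(processed)],
        ["ns-train", "splatfacto", "--data", str(processed)],
    ]


def run_pipeline(photos_dir, work_dir, dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for cmd in build_commands(photos_dir, work_dir):
        print(shlex.join(cmd))
        if not dry_run:
            # Stops the pipeline on the first failing step.
            # Note: ns-train may keep its viewer running after training;
            # check its options before relying on this unattended.
            subprocess.run(cmd, check=True)


run_pipeline("./photos", "./listing-123")  # dry run: prints, executes nothing
```

Keeping the commands as argument lists rather than one shell string avoids quoting problems when a client's folder name contains spaces.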
What to watch through 2026
- glTF KHR_gaussian_splatting ratification (Q2 2026). Once browsers, Blender, and DCC tools all import the same GLB-with-splats format, integration friction goes to zero.
- AnySplat and VGGT as the new default. Watch for them becoming the de-facto Nerfstudio pose step, replacing COLMAP entirely.
- Reflection-aware models in mainline Nerfstudio. Whichever of MirrorGaussian, Ref-Unlock, or GlassGaussian gets cleanly integrated will dramatically improve indoor capture quality on bathrooms, kitchens, and modern glass-heavy homes.
- Mobile training. Brush on Android, Scaniverse on iPhone. On-device capture-and-train is the consumer endgame and worth watching for what it does to SaaS pricing.
Sources
State of the field
- The State of Gaussian Splatting in 2026 — The Future 3D
- 7 Cutting-Edge Open-Source Gaussian Splatting Tools for 2026 — Cybergarden
- Gaussian Splatting vs Matterport — The Future 3D
- Gaussian Splatting for Real Estate — The Future 3D
- Virtual Tours Built a Billion-Dollar Industry. Gaussian Splatting Is the Disruptor — SplatLabs
Trainers and frameworks
- Nerfstudio / gsplat (GitHub)
- gsplat 1.5.3 Released — Radiance Fields
- Nerfstudio integrates 3DGUT — Radiance Fields
- Nerfstudio releases GsplatViewer — Radiance Fields
- Splatfacto documentation — Nerfstudio
- Brush: 3D Reconstruction for All (GitHub)
- Brush — Cross-Platform Local Gaussian Splatting Trainer — Radiance Fields
- Postshot quality discussion — Nerfstudio issue #3421
Web viewers and editors
- SuperSplat — The Future 3D
- mkkellogg / GaussianSplats3D (three.js viewer)
- Spark (Niantic) — Hacker News announcement
Pose-free and feed-forward methods
- InstantSplat (arXiv)
- AnySplat (GitHub, SIGGRAPH Asia 2025)
- MV-DUSt3R+ project page (CVPR 2025 Oral)
- VGGT for Dense 3D Reconstruction — LearnOpenCV
- Evaluation of DUSt3R / MASt3R / VGGT on Aerial Blocks (arXiv)
- Awesome-DUST3R curated list (GitHub)
Mesh extraction
- SuGaR (CVPR 2024, GitHub)
- SuGaR arXiv paper
- 2D-SuGaR (arXiv 2025)
- How to Extract 3D Meshes from Gaussian Splats — Inverse Render
Large scenes, indoor specifics, reflections
- FreeSplat++ Indoor Scene Reconstruction (arXiv)
- LighthouseGS (arXiv)
- LODGE Level-of-Detail Gaussian Splatting (arXiv)
- LOBE-GS Load-Balanced GS (arXiv)
- MirrorGaussian (ECCV 2024)
- Seeing Through Reflections (arXiv 2025)
- Reflections Unlock / Ref-Unlock (arXiv)
- GlassGaussian (ResearchGate)
Capture, tooling, and engine integration
- Capturing Gaussian Splats: Lessons from the Field — Volinga
- Impact of Data Capture Methods on 3D Reconstruction — NIH / PMC
- Gaussian Splatting Mobile Capture Guide — Polyvia3D
- Plugins for Blender, Unreal, Unity — Radiance Fields
- Gaussian Splatting Unity and Unreal Guide — Polyvia3D
- UnrealSplat (GitHub)
- Scaniverse — Niantic
- Best Free 3D Scanner Apps 2026 — KIRI Engine
Standards and industry
- Khronos Announces glTF Gaussian Splatting Extension
- 3D Gaussian Splats Added to glTF Asset Standard — OGC Blog
- 3D Gaussian Splats added to glTF standard — CG Channel
Zillow SkyTour and industry adoption
- Zillow SkyTour debut — Zillow newsroom
- Zillow's AI strategy — Tech Brew
- Zillow SkyTour uses Gaussian Splats — Lidar News
- Zillow Adds Gaussian Splatting Support with SkyTour — Radiance Fields
- Inside Zillow's summer launch — Real Estate News