Pics to Walkthrough 3DGS field manual
v1.0 / 2026-05
Field manual / 2026 edition

3D Home Walkthroughs from Photos

Open-source 3D Gaussian Splatting for property captures. What works, what breaks, and why an agentic engineer can deliver a Matterport-killer with no subscription and a Saturday afternoon of compute.

Chapter 00·~3 min

Welcome and TL;DR

This is a working reference for building photorealistic 3D walkthroughs of real properties using the open 3D Gaussian Splatting (3DGS) stack as of mid-2026. The framing is practical: capture with a phone, train on a rented GPU for cents, then ship a static web viewer, a video flythrough, and an editable 3D file. No vendor lock-in. No monthly bill.

The short version

Yes, it is feasible right now. The standard open pipeline is capture, then pose estimation (COLMAP or the newer feed-forward MASt3R / VGGT), then train a splat (Nerfstudio gsplat, Brush, or OpenSplat), then view in a browser (SuperSplat, Spark, mkkellogg's three.js viewer), then optionally extract a mesh (2D Gaussian Splatting or SuGaR) and render a video flythrough. Most of the stack is permissively licensed (Apache or MIT); the exceptions, such as the research-only original 3DGS code, are flagged where they come up.

Two shifts between 2024 and 2026 changed the calculus. First, pose-free and feed-forward reconstruction. DUSt3R then MASt3R then VGGT (Visual Geometry Grounded Transformer) replaced the slow and brittle COLMAP step for sparse inputs. VGGT won the CVPR 2025 Best Paper award. AnySplat at SIGGRAPH Asia 2025 and InstantSplat now run end to end from raw photos in seconds.

Second, glTF KHR_gaussian_splatting is at the release-candidate stage and is expected to be ratified in Q2 2026. Once it lands, splats become a first-class 3D asset format like .glb, and the editable-asset deliverable becomes a single file that any compliant viewer opens.

The honest gotcha: indoor real estate is the hardest scene class for vanilla 3DGS. Mirrors, glass, white walls, and multi-room topology all break naive captures. Specialised 2025 papers like MirrorGaussian, GlassGaussian, Ref-Unlock, LighthouseGS, and FreeSplat++ address each, but most are still research code, not turnkey tools. That gap is where the engineering value lives.

Chapter 01·The algorithm

Under the hood

3D Gaussian Splatting (Kerbl et al., SIGGRAPH 2023) represents a scene not as triangles or as a neural radiance field, but as millions of small 3D ellipsoids called Gaussians. Each one has a position, orientation, scale, opacity, and a view-dependent color expressed in spherical harmonics. To render, you project them to 2D and alpha-blend front to back. To train, you compare the render to the photo, backprop the loss, and the Gaussians migrate and grow or split to fit the scene.
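
The blending step is easier to see for a single pixel. The sketch below is a minimal illustration, assuming each Gaussian has already been projected and reduced to a color and an alpha at that pixel, sorted nearest first. The function name and the early-exit threshold are illustrative choices, not taken from any particular implementation.

```python
def composite_front_to_back(samples):
    """Blend (color, alpha) samples, sorted nearest-first, into one RGB pixel.

    color is an (r, g, b) tuple in [0, 1]; alpha is the splat's opacity
    contribution at this pixel, also in [0, 1].
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still reaching the camera
    for color, alpha in samples:
        weight = alpha * transmittance
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque, stop early
            break
    return tuple(pixel), transmittance


# A half-transparent red splat in front of an opaque blue one.
pixel, remaining = composite_front_to_back([
    ((1.0, 0.0, 0.0), 0.5),
    ((0.0, 0.0, 1.0), 1.0),
])
print(pixel, remaining)
```

Because every step is a sum and a product, the whole blend is differentiable, which is what lets the training loss flow back to each Gaussian's parameters.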

Two practical consequences. First, it renders fast. Sixty frames per second on commodity GPUs, and now also on phones. Second, the asset is the model. There is no separate mesh, texture, or material extraction step unless you ask for one. The file is a few hundred megabytes of point cloud plus per-point parameters.
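
A rough size estimate shows where "a few hundred megabytes" comes from. This sketch assumes the uncompressed float32 PLY layout written by the reference 3DGS code with degree-3 spherical harmonics; other exporters and compressed formats will differ, and the one-million-Gaussian count is only an example.

```python
# Float properties stored per Gaussian in the assumed reference PLY layout.
FLOATS_PER_GAUSSIAN = (
    3     # position x, y, z
    + 3   # normals (present in the layout, usually zero)
    + 3   # base color (spherical-harmonic DC term)
    + 45  # higher-order SH coefficients: 15 per color channel at degree 3
    + 1   # opacity
    + 3   # scale
    + 4   # rotation quaternion
)
BYTES_PER_FLOAT = 4


def ply_size_mb(num_gaussians):
    """Approximate uncompressed file size in megabytes (header ignored)."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT / 1e6


print(FLOATS_PER_GAUSSIAN, "floats per Gaussian")
print(ply_size_mb(1_000_000), "MB for one million Gaussians")
```

Most of that budget is the view-dependent color, which is why compressed formats tend to target the SH coefficients first.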

Why this matters for homes

Photogrammetry-based virtual tour tech (Matterport and its kin) stitches static panoramas at fixed scan points and lets you teleport between them. A splat is a continuous scene you can fly through. The visual upgrade is closer to "a photo come alive" than to "a better 360 viewer." Zillow shipped this as SkyTour in July 2025, and Matterport responded with its own 3D Exteriors product.

Chapter 02·Pick the right tool

Variants that matter

The 2023 paper has spawned a small zoo of follow-ups. These are the ones to know.

Variant | What it adds | When to reach for it
Original 3DGS | Per-point ellipsoid plus SH color | Reference quality. Research-only license, which matters if you plan commercial use.
Splatfacto (Nerfstudio) | Production-friendly 3DGS in the gsplat library, Apache licensed | The default open-source trainer.
2D Gaussian Splatting | Disks instead of ellipsoids, better surface alignment | When you want a clean mesh afterward.
Mip-Splatting | Anti-aliasing across zoom levels | Walkthroughs where scale changes a lot.
Scaffold-GS / Octree-GS | Anchor-based, much lower VRAM | Whole-house scenes that blow up vanilla 3DGS.
Hierarchical 3DGS / LODGE / LOBE-GS | Level-of-detail for huge scenes | Estates, multi-floor, mansions.
3DGUT (NVIDIA, in gsplat) | Unscented transform, proper handling of fisheye and rolling shutter | Anything captured with a real-world camera. Big quality bump.
MirrorGaussian / GlassGaussian / Ref-Unlock | Explicit reflection and transmission modeling | Bathrooms, kitchens, modern homes with glass walls.
LighthouseGS | Indoor structure-aware splatting for panoramic mobile captures | The 360-camera route.
FreeSplat++ | Generalizable indoor whole-scene reconstruction | Multi-room indoor sequences.

For a first project on Nerfstudio, you only need to know two: Splatfacto for the default run, and 2DGS if you also want a mesh. Everything else is targeted at a specific failure mode.

Chapter 03·2024 to 2026

The feed-forward turn

The old loop was slow. COLMAP runs Structure-from-Motion to recover camera poses, often takes thirty minutes or more, and fails on textureless walls. Then 3DGS training adds another half hour to several hours. The new generation skips or replaces COLMAP entirely.

DUSt3R (CVPR 2024, Naver Labs) predicts dense pointmaps from image pairs without calibration or poses. MASt3R extends it with feature matching and longer-range correspondence. VGGT (CVPR 2025 Best Paper, Meta) processes more than two images at once in a single transformer and outputs cameras plus dense 3D directly, with no global-optimization post step. The reported gain over COLMAP on sparse inputs is up to fifty percent in completeness.

InstantSplat wraps this into an end-to-end pipeline: a deep model initializes dense points, then iteratively co-optimizes Gaussians and poses. The paper reports roughly thirty times faster than COLMAP plus 3DGS. AnySplat uses VGGT for initialization and feeds forward Gaussians from unconstrained views. MV-DUSt3R+ (CVPR 2025 Oral) reconstructs scenes from sparse views in around two seconds.

Rule of thumb

For a thirty to eighty photo capture of one room, these feed-forward methods can produce a usable splat in under a minute on a single GPU. For a whole-house capture of several hundred photos, COLMAP (or glomap, its modern replacement) is still the safe path.

Chapter 04·The pipeline begins

Capture

The cheapest viable rig is a modern phone. Notes from the Volinga capture guide, the Polyvia mobile capture guide, and an NIH study on capture method impact:

  • Coverage. Aim for around seventy percent overlap between adjacent frames. Orbit each room in at least three layers at different heights. Tiny rooms want eighty to a hundred and fifty photos. Open-plan living rooms want two hundred to four hundred.
  • Lighting. Kill the variable. Close blinds, turn on every lamp, avoid mixed indoor and outdoor light. Cloudy days are great for exteriors. View-dependent lighting baked into the splat at one moment is fine; lighting that changes between frames ruins it.
  • Motion. Walk slowly, landscape orientation, manual exposure lock, autofocus locked or fixed if possible.
  • Mirrors and glass. Cover them, tape them over, or accept ghosting and floaters there. The 2025 reflection-aware variants help but most are not yet in Nerfstudio mainline.
  • White walls. COLMAP fails because there are no features. This is where the MASt3R or VGGT pose initialization pays off, since they learn priors and do not rely on local features.
  • Chunking. Large scenes degrade. Split a house into "front yard", "main floor", "upstairs", train each separately, then composite in SuperSplat or in your viewer.
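
The coverage numbers above can be sanity-checked with a little geometry. This is a rough planning sketch, not a capture rule: it treats one orbit as a full 360-degree sweep of viewing direction and assumes a horizontal field of view of about 70 degrees for a phone's main camera in landscape. The function names and the `loops` parameter (how many separate orbit paths you walk in the room) are hypothetical.

```python
import math


def frames_per_orbit(horizontal_fov_deg, overlap):
    """Frames needed to sweep 360 degrees at a given overlap between neighbors."""
    step_deg = horizontal_fov_deg * (1.0 - overlap)  # new angle added per frame
    return math.ceil(360.0 / step_deg)


def photos_for_room(horizontal_fov_deg=70.0, overlap=0.7, layers=3, loops=1):
    """Total photos for `loops` orbit paths, each shot at `layers` heights."""
    return frames_per_orbit(horizontal_fov_deg, overlap) * layers * loops


print(frames_per_orbit(70.0, 0.7))   # frames in one sweep
print(photos_for_room(loops=1))      # one orbit path, three heights
print(photos_for_room(loops=2))      # two orbit paths, three heights
```

Under these assumptions two orbit paths at three heights land inside the eighty to a hundred and fifty range quoted for tiny rooms. Larger rooms need more paths because the camera also has to translate, not just turn, for the reconstruction to get parallax.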

Capture devices, ranked for hobby and listings

Recent iPhone or Pixel
Totally fine. Pro iPhones with LiDAR help on textureless surfaces if you go through Polycam's hybrid pipeline.
Insta360 or Ricoh Theta
360 cameras. Extract perspective tiles from each frame, then treat as normal photos. LighthouseGS targets this case.
Mirrorless or DSLR
Only if you want maximum quality and don't mind the capture time.
Consumer drone
Essential for exteriors. This is what Zillow's SkyTour uses end to end.
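
For the 360-camera route, "extract perspective tiles" means resampling the equirectangular frame through a virtual pinhole camera. The sketch below shows only the core mapping, from a pixel in the tile to a coordinate in the panorama. It assumes an equirectangular image with longitude across the width and latitude down the height, and a camera frame with x right, y down, z forward. The function name and parameters are illustrative, and a real extractor would also interpolate pixel values.

```python
import math


def tile_pixel_to_equirect(x, y, tile_w, tile_h, fov_deg,
                           yaw_deg, pitch_deg, pano_w, pano_h):
    """Map pixel (x, y) of a virtual pinhole tile to panorama coordinates (u, v)."""
    focal = (tile_w / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    # Ray through the pixel in the tile's camera frame.
    dx = (x - tile_w / 2.0) / focal
    dy = (y - tile_h / 2.0) / focal
    dz = 1.0
    pitch, yaw = math.radians(pitch_deg), math.radians(yaw_deg)
    # Pitch about the x axis (positive looks up), then yaw about the y axis
    # (positive turns right).
    dy, dz = (dy * math.cos(pitch) - dz * math.sin(pitch),
              dy * math.sin(pitch) + dz * math.cos(pitch))
    dx, dz = (dx * math.cos(yaw) + dz * math.sin(yaw),
              -dx * math.sin(yaw) + dz * math.cos(yaw))
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    lon = math.atan2(dx, dz)
    lat = math.asin(max(-1.0, min(1.0, -dy / norm)))  # clamp against rounding
    u = (lon / (2.0 * math.pi) + 0.5) * pano_w
    v = (0.5 - lat / math.pi) * pano_h
    return u, v


# Center of a 90-degree tile looking straight ahead hits the panorama center.
print(tile_pixel_to_equirect(256, 256, 512, 512, 90.0, 0.0, 0.0, 4096, 2048))
```

Looping this over every tile pixel for a handful of yaw and pitch settings yields ordinary perspective images that the pose and training steps can treat like phone photos.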

Chapter 05·Where are the cameras

Camera pose estimation

Pick one path.

Classical. COLMAP is the well-trodden free baseline. glomap is a faster modern SfM, around ten times quicker, used in the current Nerfstudio docs. If you have a normal indoor scene with reasonable texture, this still works.

Modern feed-forward. MASt3R or VGGT. Robust on featureless walls and sparse views. Especially good for the indoor white-wall problem.

End-to-end skip. InstantSplat or AnySplat. Let the optimizer figure out poses jointly with Gaussians.

Nerfstudio's CLI handles COLMAP transparently with ns-process-data images. For VGGT or MASt3R you will run a small script to dump a transforms.json and feed it into the trainer.
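
The "small script" mostly converts pose conventions. Below is a minimal sketch, assuming the upstream model hands you world-to-camera poses in the OpenCV convention (x right, y down, z forward) and one shared pinhole intrinsic set. Check the model's own documentation, because that input convention is an assumption here. Nerfstudio's transforms.json stores camera-to-world matrices in the OpenGL convention (y up, camera looking along negative z). The function names and the example values are hypothetical.

```python
import json


def opencv_w2c_to_nerfstudio_c2w(rotation, translation):
    """Invert a world-to-camera pose and flip to OpenGL camera axes."""
    # Inverse of a rigid transform: rotation R^T, camera center -R^T t.
    r_t = [[rotation[j][i] for j in range(3)] for i in range(3)]
    center = [0.0 - sum(r_t[i][k] * translation[k] for k in range(3))
              for i in range(3)]
    flip = (1.0, -1.0, -1.0)  # negate the camera's y and z axes
    # Adding 0.0 turns -0.0 into 0.0 so the JSON stays tidy.
    rows = [[r_t[i][j] * flip[j] + 0.0 for j in range(3)] + [center[i]]
            for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def build_transforms(frames, width, height, fl_x, fl_y, cx, cy):
    """frames: list of (file_path, rotation 3x3, translation 3) tuples."""
    return {
        "camera_model": "OPENCV",
        "w": width, "h": height,
        "fl_x": fl_x, "fl_y": fl_y, "cx": cx, "cy": cy,
        "frames": [
            {"file_path": path,
             "transform_matrix": opencv_w2c_to_nerfstudio_c2w(rot, trans)}
            for path, rot, trans in frames
        ],
    }


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
transforms = build_transforms(
    [("images/frame_00001.jpg", identity, [0.0, 0.0, -2.0])],
    width=1920, height=1080, fl_x=1500.0, fl_y=1500.0, cx=960.0, cy=540.0,
)
print(json.dumps(transforms, indent=2))
```

Writing that dictionary to `transforms.json` next to the images gives the trainer a dataset folder in the same shape that `ns-process-data` would have produced.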

Chapter 06·Build the splat

Training

Top recommendation: Nerfstudio plus gsplat. The de-facto open standard. Apache licensed, actively maintained, integrates NVIDIA's 3DGUT, supports F-Theta cameras for fisheye and 360 capture, and trains in RGB, depth, and normal modes. As of gsplat 1.5.x it also ships native 2DGS support (at roughly four gigabytes of VRAM overhead), compression, and the new GsplatViewer.

Alternatives by use case

  • Brush. Rust plus WebGPU plus Burn. Runs on macOS Apple Silicon, Windows, Linux, Android, and the browser. No CUDA dependency. If you are on a Mac without an NVIDIA GPU, this is the cleanest local option. Trains from COLMAP or Nerfstudio data and outputs a standard PLY file.
  • OpenSplat. Production-grade C++ with a CPU fallback. AGPL-3.0 licensed.
  • Grendel-GS (ICLR 2025). Distributed multi-GPU training for large scenes. Only matters for whole-estate captures.
  • Postshot. Closed-source, fifteen dollars a month, crisper output than Splatfacto on many real-estate scenes per a long Nerfstudio issue tracking this. The difference appears to come mostly from its SfM step, not the splatting itself. Worth knowing as a quality bar but not the tool if "fully open" is your pitch.

Training time, rough numbers on consumer hardware (3060 or 4070 class)

Scope | Pose method | Total time
One room, ~200 photos | COLMAP | ~15 min SfM plus 25 min splat = ~40 min
One room, ~30 photos | VGGT / InstantSplat | ~1 to 3 min
Whole house, ~1500 photos | COLMAP plus Splatfacto on a single 4090 | 1 to 3 hours

Cloud GPU pricing for context. As of 2026, an RTX 4090 rents for roughly twenty-five cents per hour on Vast.ai and forty cents on RunPod. At those rates a thirty-minute training run costs roughly thirteen to twenty cents. A whole-house run on an A100 or H100 still lands under five dollars.
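
The arithmetic is worth writing down, since per-run cost is the core of the pitch. This sketch uses the two RTX 4090 hourly rates quoted above and ignores storage, egress, and idle time. The three-hour figure is the upper end of the whole-house row in the table, used here only to show what hourly rate keeps such a run under five dollars.

```python
def run_cost_usd(hours, usd_per_hour):
    """Rental cost of one training run, ignoring storage and idle time."""
    return hours * usd_per_hour


# Hourly RTX 4090 rates quoted in the text (2026 figures, subject to change).
RATES = {"Vast.ai": 0.25, "RunPod": 0.40}

for provider, rate in RATES.items():
    print(f"{provider}: 30 min = ${run_cost_usd(0.5, rate):.3f}")

# Highest hourly rate that keeps a three-hour whole-house run under $5.
print(f"break-even rate: ${5.0 / 3.0:.2f}/h")
```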

Chapter 07·The web deliverable

Web viewing

Almost everything here is open, browser-based, and works on iOS Safari.

SuperSplat
PlayCanvas's browser-based editor and viewer. MIT licensed. Loads PLY, SPLAT, and SPZ. Lets you crop, clean, recolor, and re-export. The closest open-source thing to a Matterport-style end-user view.
Spark
World Labs' 2025 three.js integration library. The "drop a splat into your existing three.js scene" path.
mkkellogg / GaussianSplats3D
The most mature three.js viewer, MIT licensed and well documented.
gsplat.js
Kevin Kwok / antimatter15's minimal WebGL viewer.
Brush web build
Brush itself runs in the browser via WebGPU. Same engine for training and viewing.

For a listing, the simplest deployable artifact is a single static HTML page that loads a PLY or SPLAT file from a CDN. That is an iframe a realtor can paste into MLS or a Squarespace site. No back end required.
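
The embed itself is one line of HTML, which makes it easy to generate per listing. The helper below is a minimal sketch of that step: it only builds the iframe snippet and assumes the viewer page is already hosted. The function name, the default size, and the example URL are all hypothetical.

```python
from html import escape


def iframe_embed(viewer_url, title, width="100%", height="480"):
    """Return an iframe snippet pointing at an already-hosted viewer page."""
    return (
        f'<iframe src="{escape(viewer_url, quote=True)}" '
        f'title="{escape(title, quote=True)}" '
        f'width="{escape(width, quote=True)}" '
        f'height="{escape(height, quote=True)}" '
        'style="border:0" loading="lazy" allowfullscreen></iframe>'
    )


snippet = iframe_embed(
    "https://example.com/tour/?id=1&mode=fly",  # placeholder URL
    "12 Elm St walkthrough",
)
print(snippet)
```

Escaping the URL and title matters because listing titles often contain quotes or ampersands that would otherwise break the markup when pasted into a site builder.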

Chapter 08·For Blender and Unreal

Mesh extraction

When you want to drop the property into Blender, Unreal, or Unity as a real mesh (for floor plans, renovation visualization, or game-engine integration):

  • 2D Gaussian Splatting plus TSDF fusion. Currently the cleanest path. The 2D disks align to surfaces, so depth maps are consistent and TSDF integration yields a watertight mesh. Now native in gsplat.
  • SuGaR (CVPR 2024). Regularizes 3D Gaussians to lie on surfaces, then runs Poisson reconstruction. Fast: a mesh in around two hours including training. The KIRI 3DGS Render Blender plugin wraps a Gaussian Frosting variant.
  • 2D-SuGaR (2025). Combines both ideas, uses pretrained normal and depth priors. Best geometric accuracy reported to date for indoor scenes.
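
TSDF fusion is simpler than the name suggests: each voxel keeps a weighted running average of its truncated signed distance to the observed surface, updated once per rendered depth map. The sketch below shows that update for a single voxel along a single camera ray. It is the textbook form of the update, not code from gsplat or the 2DGS release, and the function name and parameters are illustrative.

```python
def tsdf_update(stored_tsdf, stored_weight, surface_depth, voxel_depth,
                truncation, weight=1.0):
    """Fuse one depth observation into a voxel's (tsdf, weight) pair.

    surface_depth: depth of the rendered surface along the ray.
    voxel_depth:   depth of the voxel center along the same ray.
    """
    sdf = surface_depth - voxel_depth  # positive in front of the surface
    if sdf < -truncation:
        # Voxel is well behind the surface and therefore occluded: skip it.
        return stored_tsdf, stored_weight
    tsdf = min(1.0, sdf / truncation)
    new_weight = stored_weight + weight
    new_tsdf = (stored_tsdf * stored_weight + tsdf * weight) / new_weight
    return new_tsdf, new_weight


# Two views of the same voxel: one sees the surface 10 cm behind it,
# the other sees the surface exactly at the voxel.
state = tsdf_update(0.0, 0.0, surface_depth=2.0, voxel_depth=1.9, truncation=0.2)
state = tsdf_update(*state, surface_depth=1.9, voxel_depth=1.9, truncation=0.2)
print(state)
```

The mesh is then the zero crossing of the fused field, typically extracted with marching cubes. Consistent depth across views is what keeps that crossing sharp, which is why the surface-aligned 2D disks help.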

Once you have a mesh, it is the standard OBJ / GLB workflow. Texture-bake from the splat colors, drop into Blender for cleanup, export. The splat itself remains your photoreal deliverable; the mesh is the editable one.

Chapter 09·Pre-rendered flythrough

Video flythrough

Two options.

In Nerfstudio's viewer or GsplatViewer. Define a keyframed camera path, export to MP4. Built in. Fastest path.

In Blender after meshing. Standard camera animation with the splat as a Blender geometry-nodes asset (via the KIRI plugin) or the mesh as a regular object. Higher production value and slower.

For a drone-flyover plus interior-walkthrough cinematic in the Zillow style, the workflow is: drone footage to exterior splat, interior phone capture per room to interior splats, composite in SuperSplat, camera path in the viewer, export.
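
A keyframed camera path is just interpolation between a few hand-placed poses. The sketch below is a deliberately simplified version that interpolates camera positions linearly. Real path tools typically use smooth splines and also interpolate orientation and field of view, so treat this as an illustration of the idea rather than what any viewer does internally. The function name is hypothetical.

```python
def interpolate_path(keyframes, frames_per_segment):
    """Linearly interpolate (x, y, z) camera positions between keyframes."""
    path = []
    for start, end in zip(keyframes, keyframes[1:]):
        for step in range(frames_per_segment):
            t = step / frames_per_segment
            path.append(tuple(a + (b - a) * t for a, b in zip(start, end)))
    path.append(tuple(keyframes[-1]))  # finish exactly on the last keyframe
    return path


# Walk two meters along x, then four meters along z.
path = interpolate_path([(0, 0, 0), (2, 0, 0), (2, 0, 4)], frames_per_segment=2)
print(path)

# At 30 fps, a path with 90 frames per segment spends 3 seconds per segment.
print(90 / 30, "seconds per segment")
```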

Chapter 10·Unity, Unreal, Blender

Game-engine integration

For when you want the asset deliverable to be something a client can walk through in VR.

Unity
Aras Pranckevičius's UnityGaussianSplatting. Open, Unity 6 LTS, all render pipelines. Requires Vulkan or D3D12.
Unreal Engine 5.5+
UnrealSplat (Niagara-based, ~2M splats at 60 fps), or Luma's commercial UE plugin.
Blender
KIRI 3DGS Render v4.0, Apache 2.0.
Bevy (Rust)
bevy_gaussian_splatting compatible with Brush output.

Chapter 11·Know the field

SaaS incumbents

What you are competing with, and what is actually under their hood.

Tool | What it is | Real cost | Under the hood
Matterport | Market default since around 2014. Tripod 360 scanner. | ~$5,495 Pro3 camera plus $69 to $309 per month cloud. | Pre-3DGS: stitched 360 panoramas plus depth at fixed scan points, which you teleport between. Recently added exterior 3DGS via "Matterport 3D Exteriors."
Zillow SkyTour | Drone-only exterior tours. | Free to the consumer. Cost falls on Zillow. | Drone footage to SfM to 3DGS. Confirmed open splat tech wrapped in a proprietary capture and processing pipeline.
Polycam | Phone capture app plus cloud. | Freemium. Pro tier around $15 per month. | Photogrammetry plus 3DGS plus LiDAR fusion. Pro features gate larger captures.
Luma AI | Phone capture app plus web viewer. | Freemium. | NeRF heritage, now 3DGS too. Closed pipeline, polished UX. Sells a UE plugin.
KIRI Engine | Phone or web plus cloud. | Free tier limited to three exports per week. | Photogrammetry plus 3DGS plus their Blender add-on (Apache 2.0).
Scaniverse (Niantic) | Phone capture, on-device 3DGS. | Free, unlimited. | The most generous consumer free tier. Reconstruction runs locally on the iPhone.
Postshot | Desktop trainer plus cloud option. | From $15 per month. | High-quality 3DGS, tuned for arch and real-estate. The closed-source quality bar.

Honest competitive read

You will not beat Matterport on polish, dollhouse mode, and MLS integrations cheaply. Their moat is workflow and integrations, not the tech. You will beat them on photorealism plus exteriors plus custom branding, because 3DGS is a real step up in fidelity. You can beat all of them on cost-per-project at low volume: a self-hosted splat on a static CDN is effectively zero dollars a month after a one-time training run. And you can beat them on "I own the asset": the deliverable is a PLY file your client keeps, hosted wherever they like, with no vendor lock-in. That story sells.

A realistic professional 3DGS contractor in 2026 charges $2,250 to $5,000 per single property (per Future3D's pricing pages). The hobby positioning is below that. The agentic-engineering pitch is in the process automation: capture, upload, pipeline, deliverable URL, mostly hands-off.

Chapter 12·What breaks

Feasibility and gotchas

What works well

  • Exterior captures of detached homes, especially with a drone.
  • Single rooms with diffuse lighting and textured surfaces.
  • Wood floors, tile, brick, fabric: anything with surface texture.
  • Outdoor architectural shots in flat overcast light.

What still hurts

  • Mirrors and large glass panes. The "mirror in the bathroom" problem. Solutions exist (Seeing Through Reflections, MirrorGaussian, GlassGaussian, Ref-Unlock) but require manual mirror masking or research-grade integration. Practical workaround: cover mirrors during capture.
  • Featureless white walls. Vanilla COLMAP fails. Use MASt3R or VGGT pose initialization, or add visual texture during capture (taped paper markers you remove in SuperSplat later).
  • Low light or mixed light. Standardize lighting before shooting. Don't trust the algorithm to compensate.
  • Multi-floor topology. Splats have no semantic floor or wall understanding. Connections between rooms can become ghost geometry. Best practice: capture rooms separately and composite.
  • Editability of details. You can crop, delete, and recolor splats in SuperSplat. You cannot move a chair or change wall paint. The mesh-extraction path is for that.
  • Scale and privacy. A splat captures everything: framed photos on the wall, labels on the medicine cabinet, kid's drawings. You will need a pre-publish cleanup pass.
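
Cropping is the one edit that is trivial to script, because a splat is ultimately a list of points with attributes. The sketch below filters Gaussian centers against an axis-aligned box. It assumes you have already loaded the positions from the PLY into plain tuples and shows only the selection logic, not a PLY reader or SuperSplat's own tooling. The function name is hypothetical.

```python
def crop_to_box(points, box_min, box_max):
    """Keep only points whose x, y, z all fall inside the axis-aligned box."""
    return [
        point for point in points
        if all(lo <= coord <= hi
               for coord, lo, hi in zip(point, box_min, box_max))
    ]


centers = [(0.0, 0.0, 0.0), (1.5, 0.2, -0.3), (9.0, 0.0, 0.0), (0.5, 4.0, 0.5)]
kept = crop_to_box(centers, box_min=(-2.0, -1.0, -2.0), box_max=(2.0, 3.0, 2.0))
print(kept)
```

In a real file you would apply the same keep-or-drop mask to every per-Gaussian attribute, not just the positions, before writing the cropped PLY.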

Chapter 13·The pitch

Where the engineering value sits

This is the answer to the question, "what does an agentic engineer add that a SaaS button cannot." Five things.

  1. Pipeline orchestration. A script that takes a Google Drive or iCloud folder of photos and returns a hosted splat URL, an MP4, and a GLB mesh. No consumer free button does all three.
  2. Quality recovery. Detecting and re-running failure cases automatically: mirror artifacts, drifting poses, undertrained regions.
  3. Privacy and cleanup automation. Scripted masking of personal items, blurring of license plates, removing the dog.
  4. Cost containment. Caching, incremental training, spot-instance scheduling. A whole-property pipeline costs cents to a few dollars, not a subscription.
  5. Branded deliverable. A single static page on Cloudflare Pages or Vercel with the listing's domain, a custom intro animation, and the splat viewer. That is the iframeable artifact you hand a realtor.
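
The orchestration layer can start very small. The sketch below wraps the two Nerfstudio commands used elsewhere in this manual in a function that defaults to a dry run, so you can inspect the commands before spending GPU time. The directory layout and function names are hypothetical, and the hosting, video, and mesh steps are left out.

```python
import shlex
import subprocess


def build_pipeline_commands(photos_dir, work_dir):
    """Return the pose-estimation and training commands as argument lists."""
    processed = f"{work_dir}/processed"
    return [
        ["ns-process-data", "images",
         "--data", photos_dir, "--output-dir", processed],
        ["ns-train", "splatfacto", "--data", processed],
    ]


def run_pipeline(photos_dir, work_dir, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in build_pipeline_commands(photos_dir, work_dir):
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)


run_pipeline("./photos", "./work")  # dry run: prints the commands only
```

Passing argument lists rather than a shell string avoids quoting problems when a client's folder name contains spaces, which it usually does.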

The "wrapper" critique, sharpened

Most consumer apps (Polycam, Luma, KIRI, even Postshot) are wrapping the same open splat algorithms with a UI and a subscription. The honest engineering pitch is not "I invented better tech." It is "I built the operations layer around the open tech and pass the savings on." That is a true story you can deliver.

Chapter 14·Do the thing

First project recipe

To learn the tech and produce a portfolio piece in one go, here is a minimal sequence.

  1. Pick a small property you have permission to shoot. Outdoors first, to avoid the indoor failure modes on your first try.
  2. Capture two hundred to four hundred photos with a phone. Slow walk, around seventy percent overlap, manual exposure lock, overcast or shaded.
  3. Stand up a workstation. Either a local PC with a 3060+ GPU, an M-series Mac running Brush, or a RunPod RTX 4090 pod at around forty cents per hour.
  4. Run Nerfstudio:
    ns-process-data images --data ./photos --output-dir ./processed
    ns-train splatfacto --data ./processed
  5. View in ns-viewer to QC the result. Iterate on capture if you see floaters or holes.
  6. Clean up in SuperSplat (browser). Crop the bounding box, delete street noise, remove yourself and any camera artifacts.
  7. Export the cleaned PLY.
  8. Build a static viewer page with mkkellogg / GaussianSplats3D or Spark. Single HTML file plus the PLY on R2 or S3.
  9. Optional: mesh with 2DGS plus TSDF fusion to GLB for the editable-asset deliverable.
  10. Optional: render a flythrough video via Nerfstudio's camera-path tool to MP4.

That is your portfolio item. One URL, one MP4, one GLB. Three deliverables out of one capture, all open source, hosted for essentially zero dollars per month. The story for prospects: this used to require a five-thousand-dollar Matterport camera and a monthly cloud bill, and now it is a Saturday afternoon and an npm run build.

Chapter 15·Active fronts

What to watch through 2026

  • glTF KHR_gaussian_splatting ratification (Q2 2026). Once browsers, Blender, and DCC tools all import the same GLB-with-splats format, most of the integration friction disappears.
  • AnySplat and VGGT as the new default. Watch for them becoming the de-facto Nerfstudio pose step, replacing COLMAP entirely.
  • Reflection-aware models in mainline Nerfstudio. Whichever of MirrorGaussian, Ref-Unlock, or GlassGaussian gets cleanly integrated will dramatically improve indoor capture quality on bathrooms, kitchens, and modern glass-heavy homes.
  • Mobile training. Brush on Android, Scaniverse on iPhone. On-device capture-and-train is the consumer endgame and worth watching for what it does to SaaS pricing.