MapTrace: Teaching AI to Navigate the World Using Synthetic Data

2/19/2026
Look at a theme park map, and you instantly trace a route from the entrance to the roller coaster, instinctively avoiding fences and fountains. For Multimodal Large Language Models (MLLMs), however, this task has historically been a stumbling block. While AI can identify objects within a map, it lacks the "spatial grammar" to understand connectivity, often drawing paths straight through walls or obstacles. Today, a new research initiative titled "MapTrace" bridges this gap between seeing and navigating.

https://storage.googleapis.com/gweb-research2023-media/images/MapTrace-1.width-1250.png

Solving the Data Bottleneck

The core problem was a lack of training data that explicitly teaches the rules of navigation. Hand-annotating millions of real-world maps is impractical at that scale and raises privacy concerns. The solution: a fully automated, scalable pipeline for synthetic data generation. Using Gemini 2.5 Pro and Imagen 4, the researchers created a dataset of 2 million annotated map images with valid, logical paths.

https://storage.googleapis.com/gweb-research2023-media/images/MapTrace-2.width-1250.png

The pipeline operates in four stages:

1. Generation: An LLM drafts diverse prompts (e.g., "a zoo with interconnected habitats"), which are rendered into images.

https://storage.googleapis.com/gweb-research2023-media/images/MapTrace-3.width-1250.png

2. Mask Critic: An AI model analyzes each map to identify "walkable" pixels, filtering out unrealistic layouts.

https://storage.googleapis.com/gweb-research2023-media/images/MapTrace-4.width-1250.png

3. Graphing: The walkable region is converted into a structured graph of nodes and edges.

https://storage.googleapis.com/gweb-research2023-media/images/MapTrace-5.width-1250.png

4. Path Critic: Finally, an AI model reviews generated routes to ensure they follow human-like logic before they are added to the dataset.

A Leap for Gemma and Gemini

The results of training on this synthetic data are undeniable.
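To make the Graphing stage concrete: the blog post doesn't publish the pipeline's actual code, but the core idea can be sketched in a few lines. Assuming the Mask Critic produces a binary walkability grid (a hypothetical simplification for illustration), each walkable cell becomes a node connected to its walkable neighbors, and a route is then just a graph search that, by construction, can never cut through a wall.

```python
from collections import deque

def mask_to_graph(mask):
    """Connect each walkable cell (value 1) to its 4-neighbours.

    `mask` is a list of rows of 0/1 ints, standing in for the
    walkability mask a critic stage might produce.
    """
    h, w = len(mask), len(mask[0])
    graph = {}
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            nbrs = []
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and mask[nr][nc]:
                    nbrs.append((nr, nc))
            graph[(r, c)] = nbrs
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search: a shortest route that stays on walkable cells."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in graph[node]:
            if nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None  # goal unreachable

# A toy 4x5 "park map": 1 = walkable path, 0 = wall or fountain.
mask = [
    [1, 1, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
]
graph = mask_to_graph(mask)
route = shortest_path(graph, (0, 0), (3, 4))
print(route)
```

This is a deliberately minimal sketch; the paper's graph likely uses sparser, semantically meaningful nodes (landmarks, junctions) rather than one node per pixel, but the invariant is the same: routes are only ever drawn along edges that the walkability analysis has approved.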
When evaluated on the MapBench benchmark, the fine-tuned models showed large improvements. Gemini 2.5 Flash saw its NDTW error metric drop from 1.29 to 0.87, achieving state-of-the-art performance. Similarly, the open Gemma 3 27B model increased its success rate by 6.4 points, showing that spatial reasoning is a skill that can be taught.

https://storage.googleapis.com/gweb-research2023-media/images/MapTrace-6.width-1250.png

What’s next?

This capability unlocks a future where robots can navigate warehouses solely by looking at a floor plan, and accessibility tools can guide visually impaired users through complex buildings with precise, step-by-step descriptions. By teaching AI to respect the geometry of our world, we are moving from static recognition to dynamic interaction.
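For readers unfamiliar with the metric behind those numbers: NDTW is based on dynamic time warping, which scores how closely a predicted route tracks a reference route even when the two have different numbers of points. The post doesn't give MapBench's exact formula, so the sketch below uses plain DTW normalized by reference-path length; the function names and that normalization are illustrative assumptions, not the benchmark's definition. The key property carries over: identical routes score 0, and the score grows as the predicted route strays.

```python
def dtw(pred, ref, dist):
    """Classic dynamic-time-warping cost between two point sequences."""
    n, m = len(pred), len(ref)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(pred[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a predicted point
                                 cost[i][j - 1],      # skip a reference point
                                 cost[i - 1][j - 1])  # match the two points
    return cost[n][m]

def ndtw(pred, ref):
    """DTW cost normalized by reference length (illustrative assumption)."""
    euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
    return dtw(pred, ref, euclid) / len(ref)

ref  = [(0, 0), (0, 1), (0, 2), (1, 2)]
good = [(0, 0), (0, 1), (0, 2), (1, 2)]   # traces the reference exactly
bad  = [(0, 0), (1, 0), (2, 0), (2, 1)]   # wanders through a different area
print(ndtw(good, ref), ndtw(bad, ref))
```

Under a metric of this shape, Gemini 2.5 Flash's drop from 1.29 to 0.87 means its predicted routes ended up substantially closer, point for point, to the human-sensible reference routes.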