Strategy

Why Local Beats Global in the AI Data Race

January 28, 2026 · Eric Yeung

There's a race happening in the AI industry that most people misunderstand. The headlines are about scale — who has the most data, the most parameters, the most tokens in their training set. OpenAI, Google, Anthropic, and a dozen well-funded startups are competing to build the largest, most comprehensive datasets in history. The assumption behind the race is simple: more data equals better AI.

For general knowledge, this is probably true. A model that's read more of the internet knows more about history, science, coding, and the broad shape of the world. Scale wins.

For local commerce, this assumption is completely wrong. A global dataset that knows a little about every restaurant in the world is less useful than a deep dataset that knows everything about restaurants in Calgary. And understanding why requires rethinking what "better data" actually means when the query is inherently local.

The locality of intent

Consider the queries that drive local commerce. "Find me a quiet restaurant near Kensington with a private room for eight." "I need a physiotherapist in Edmonton who specializes in shoulder rehab." "What hotels in Banff allow large dogs with no fee?" "Who has the best deal on an electric SUV in Calgary right now?"

Every one of these queries has a geographic anchor. The user doesn't want any restaurant — they want one in Kensington. Not any physio — one in Edmonton. Not any hotel — one in Banff. The intent is inherently local, which means the data needed to satisfy it is inherently local.
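To make the anchor concrete, here is one way such a query might be represented as structured data. This is a minimal Python sketch; the class and field names are hypothetical illustrations, not any real agent or Pawlo schema.

```python
# A minimal sketch of a local-intent query as structured data.
# Class and field names are hypothetical, for illustration only.
from dataclasses import dataclass, field

@dataclass
class LocalQuery:
    category: str               # kind of business: "restaurant", "hotel", ...
    location: str               # the geographic anchor: "Kensington", "Banff", ...
    constraints: dict = field(default_factory=dict)

# Each query from the examples above carries an anchor plus hard constraints:
queries = [
    LocalQuery("restaurant", "Kensington",
               {"noise": "quiet", "private_room_seats": 8}),
    LocalQuery("physiotherapist", "Edmonton",
               {"specialization": "shoulder rehab"}),
    LocalQuery("hotel", "Banff",
               {"pets": "large dogs", "pet_fee": 0}),
]
```

Notice that the `location` field is not optional in any of these. Strip it out and the query stops making sense.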

A global dataset might know that Banff has hotels and that some are pet-friendly. But it almost certainly doesn't know that the Ridgeline Inn has no weight limit, no fee, and direct trail access — or that they just had a cancellation for next weekend and have three king suites available at $219. That level of detail only exists in a dataset focused specifically on Banff hospitality.

The locality of intent means that depth in a specific market beats breadth across all markets, every time. A model that knows five general facts about ten thousand cities is less useful for any individual user than a model that knows five hundred facts about the user's specific city.

Depth versus breadth in practice

Let's make this concrete. A family in Toronto is planning a winter vacation in Banff. They ask their AI agent: "Plan us a four-day trip to Banff. We have a 70-pound dog. We want to ski, eat well, and the kids want to try dog sledding."

What a broad, global dataset produces: "Banff is a popular ski destination in the Canadian Rockies. Some hotels are pet-friendly. There are several dog sledding operators in the area. Popular restaurants include [list from 2024 travel blog]. Here are some options to consider." This is a starting point, not a plan. The family still has to research pet policies, check availability, verify which dog sledding operators run in January, and figure out restaurants themselves.

What a deep, local dataset produces: "Stay at the Ridgeline Inn — they allow dogs of any size with no fee and you're steps from Fenland Trail for morning walks. They have a family suite available January 15-18 at $229/night. For skiing, the Norquay shuttle picks up from your hotel at 8:30 AM. Howling Dog Tours runs their winter dog sledding experience Wednesdays and Saturdays through March and just had a cancellation for January 16 — they can take your group at 1 PM. Dinner on your first night at The Bison on Bear Street — the chef's winter tasting menu is running this week at $85/person and they have a table for four at 7:30. The restaurant is ten minutes' walk from your hotel. For the kids' second night, Block Kitchen on Banff Avenue does excellent pizza and has a dog-friendly heated patio."

The difference isn't about the model's intelligence. The same model could produce either answer. The difference is entirely about the data. The first answer comes from a dataset that covers Banff at the same thin depth it covers ten thousand other destinations. The second comes from a dataset that covers Banff deeply — specific properties, current availability, real policies, live signals, and the kind of operational detail that only exists when you have dense coverage of a local market.
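To make "depth" tangible: the second answer presupposes a business record roughly like the one below. This is a hypothetical sketch assembled from the details in the example above; the field names are assumptions, not Pawlo's actual schema.

```python
# A hypothetical record showing the operational depth the second answer
# requires. Field names are illustrative; the values come from the example
# above. Almost none of this exists on any crawlable web page.
ridgeline_inn = {
    "name": "Ridgeline Inn",
    "market": "Banff",
    "pet_policy": {"dogs_allowed": True, "weight_limit_lbs": None, "fee": 0},
    "nearby": ["Fenland Trail"],
    "availability": [
        {"unit": "family suite",
         "dates": "2026-01-15/2026-01-18",
         "rate_per_night_cad": 229},
    ],
    "last_verified": "2026-01-27",  # a freshness signal from the operator
}
```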

Why global players can't build depth

The obvious question: why don't the big players just build depth in every market? Google has resources to cover every restaurant in every city. OpenAI could scrape enough data to know everything about everywhere. Why can't global scale solve the local depth problem?

Three structural reasons.

Depth requires relationships, not crawlers. The data that makes a local recommendation truly useful — the hotel's real pet policy, the chef's unadvertised tasting menu, the dealership's actual margin flexibility, the physio's specific specialization — doesn't exist on any crawlable web page. It lives in the heads of business operators and can only be captured through direct relationships. Building those relationships in one city takes months. Building them in a thousand cities simultaneously is organizationally impossible for any company, regardless of resources.

Freshness requires local presence. Local data goes stale fast. The hotel's availability changes daily. The restaurant's specials change weekly. The dealership's inventory changes with every sale and trade-in. Keeping this data fresh requires ongoing communication with businesses — not a one-time scrape, but a continuous channel where businesses share updates as they happen. A global platform trying to maintain real-time data freshness across thousands of markets faces a maintenance burden that scales linearly with markets. A local-first platform that maintains deep freshness in one market before expanding to the next has a manageable, focused operation.
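One way to picture the freshness requirement is as a staleness budget that varies by data type. The sketch below is an assumption about how such a check could work, not a description of any existing system; the thresholds mirror the cadences named above.

```python
# A sketch of per-category staleness budgets, mirroring the cadences above:
# availability changes daily, specials weekly, inventory with every sale.
# Thresholds and field names are assumptions for illustration.
from datetime import date, timedelta

MAX_AGE = {
    "availability": timedelta(days=1),
    "specials": timedelta(days=7),
    "inventory": timedelta(days=1),
}

def is_fresh(record: dict, kind: str, today: date) -> bool:
    """True if the business confirmed this data recently enough to serve."""
    verified = date.fromisoformat(record["last_verified"])
    return today - verified <= MAX_AGE.get(kind, timedelta(days=30))
```

The point of the sketch is the operational implication: every record needs a live confirmation channel behind it, and that channel is exactly what scales linearly with markets.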

Context is local. Understanding what makes a business recommendation good in Banff requires understanding Banff. The fact that trail-adjacent hotels matter more in Banff than downtown hotels. The fact that "pet-friendly" in Banff often comes with aggressive weight limits because of wildlife concerns. The fact that January availability is tighter than March because of holiday overflow. This contextual knowledge is market-specific and can't be generalized from a global dataset. A data layer built by people who understand the Banff market produces better data for Banff than a global layer built by people who cover Banff as one of ten thousand markets.

The geographic moat

Here's the strategic insight that makes local-first data strategies so compelling: geographic density creates a self-reinforcing moat that's nearly impossible to replicate.

Once a data layer has deep coverage in a market — say, fifty businesses across hospitality, dining, activities, and services in Banff — the network effect takes over. Agents query this data and produce excellent recommendations. Excellent recommendations drive transactions. Transactions demonstrate value to businesses, which encourages more businesses to contribute. More businesses mean deeper coverage, which means even better recommendations.

A global competitor trying to enter this market faces the classic cold start problem. They don't have the business relationships, so they don't have the data. Without the data, agents don't query them. Without agent queries, there are no transactions to demonstrate value. Without demonstrated value, businesses don't contribute. The local-first player has a functioning flywheel. The global entrant has a chicken-and-egg problem.

This is why the Uber model worked: win city by city, not globally. This is why DoorDash overtook Grubhub: by achieving density in specific markets rather than spreading thin across many. Every geographically networked business learns the same lesson — depth in one market is worth more than shallow coverage of a hundred markets.

The playbook: dominate one market, then expand

The local-first strategy follows a specific playbook that applies to any data layer targeting local commerce.

Step one: Choose a market with high query volume and concentrated business density. Banff is ideal: a small geographic area with a high concentration of hospitality and tourism businesses, plus enormous query volume from travelers worldwide planning trips. The market is small enough to achieve density quickly but high-value enough to demonstrate clear commercial value.

Step two: Build deep coverage across key sectors. Not just hotels — hotels, restaurants, activity providers, transportation, services. The goal is to enable composite recommendations: a full trip itinerary, not just a hotel booking. Composite recommendations are what cross the threshold from "useful" to "transformative" in the user's experience (see the sketch after this playbook).

Step three: Use the first market as proof for adjacent markets. When agents produce consistently excellent recommendations for Banff, the same agents want equivalent data for Canmore, Lake Louise, Jasper, Whistler, and the Okanagan. The first market isn't just a revenue center — it's a demonstration that unlocks every adjacent market. Hotels in Jasper see what's happening in Banff and want the same thing.

Step four: Expand to adjacent geographies and sectors. From mountain resort towns to urban centers — Calgary, Vancouver, Toronto, Montreal. From hospitality to auto dealerships, professional services, pet care, dining. Each new market follows the same density playbook: achieve depth, demonstrate value, expand.

The playbook is sequential, not parallel. You don't try to cover ten markets simultaneously with shallow data. You achieve density in one, use that as proof, and expand from a position of demonstrated strength.
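As a sketch of what step two's composite recommendations imply mechanically: one itinerary assembled from several sector-specific lookups against the same deep market. Everything here — the fetch helper, its parameters, the sector names — is a hypothetical stand-in for a real data layer's query API.

```python
# Hypothetical stand-in for a data-layer lookup; a real API would return
# structured records like the Ridgeline Inn example above.
def fetch(market: str, sector: str, constraints: dict) -> dict:
    return {"market": market, "sector": sector, "constraints": constraints}

def plan_trip(market: str, party: dict) -> dict:
    """Compose one itinerary from deep coverage across several sectors."""
    return {
        "stay": fetch(market, "hotel",
                      {"dog_weight_lbs": party["dog_lbs"], "pet_fee": 0}),
        "activities": [
            fetch(market, "activity", {"type": "skiing"}),
            fetch(market, "activity", {"type": "dog sledding", "kids": True}),
        ],
        "dining": fetch(market, "restaurant", {"party_size": party["size"]}),
    }

itinerary = plan_trip("Banff", {"size": 4, "dog_lbs": 70})
```

A composite answer is only as strong as its weakest sector, which is why density across sectors, not just within one, is the threshold that matters.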

What this means for the data race

The AI data race is real, but the winners of the local commerce layer won't be the companies with the most data globally. They'll be the companies with the deepest data locally.

Google, with its massive global infrastructure, will continue to provide broad coverage. OpenAI and Anthropic, through their partnerships and retrieval systems, will access global web data. These are powerful resources for general AI tasks. But for the specific task of recommending local businesses with the depth and freshness that agents need, they face a structural disadvantage against purpose-built local data layers.

The hotel owner in Banff doesn't need a data partner that covers every hotel in the world. She needs a data partner that covers every relevant query for Banff so deeply that every AI agent recommends her property when it's the right fit. The auto dealer in Calgary doesn't need his inventory visible in a global database. He needs it visible to every agent that handles a car-buying query in Calgary.

Local beats global because local queries need local depth, and local depth requires local relationships that global players can't efficiently build. The race isn't won by the biggest dataset. It's won by the deepest dataset in each market that matters. And that race is being won market by market, right now, by data layers that understand this fundamental truth: in local commerce, depth is density, density is value, and value compounds geographically.

The global players will catch on eventually. But by then, the local-first players will have the relationships, the data, and the network effects that make catching up not just difficult but structurally uneconomical. The time to build depth is now — before the market decides who owns each geography.

Pawlo is the data layer for local AI — structured business intelligence that AI agents can fetch in milliseconds.
