AirDNA Estimates vs. Scraped Airbnb Data

AirDNA and STRecon differ at the methodological level. AirDNA produces occupancy estimates through statistical modeling of available signals. STRecon reads each listing's public Airbnb calendar individually and reports the nights it shows as unavailable. The two often diverge by 15–25 percentage points on the same listing, and when they do, the investor underwriting on estimates carries the risk.

This entry covers how each methodology works, where they diverge, and which one fits which decision.

How AirDNA estimates occupancy

AirDNA is an established short-term rental data provider. Its occupancy numbers are statistical estimates, not direct observations. The underlying model is proprietary, but AirDNA's published methodology describes building occupancy projections from inputs like review velocity, listing metadata, pricing patterns, and historical platform data.

Estimates have legitimate uses. They scale to every market Airbnb operates in, produce trailing historical data, and cover dimensions scraping cannot reach directly — realized revenue, for instance, which requires more than forward calendar visibility. For market-level macro analysis, estimates are a reasonable input.

What estimates cannot do is report the actual state of a specific listing's calendar on a given night. That is not a critique of the model; it is a description of what statistical modeling is.

How STRecon measures occupancy

We scrape the public Airbnb calendar of every listing in the markets our users submit. Each calendar is read individually, one listing at a time, covering the next 90 days.

For each night on each calendar, we record whether it is available or not available, and we compute the calendar booked rate over the relevant window. No modeling layer sits between the calendar and the number we publish. If Airbnb's own calendar shows 24 of the next 30 nights as unavailable, we report 24 of 30.
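The arithmetic behind the calendar booked rate can be sketched in a few lines. This is a minimal illustration, not STRecon's actual code: the function name `calendar_booked_rate`, the boolean-list representation of a calendar, and the 30-night default window are all assumptions made for the example.

```python
from typing import Sequence


def calendar_booked_rate(unavailable: Sequence[bool], window: int = 30) -> float:
    """Share of the first `window` nights that the calendar marks unavailable.

    `unavailable[i]` is True when night i (counting from tomorrow) is not
    available. This representation is hypothetical, chosen for illustration.
    """
    nights = list(unavailable)[:window]
    if not nights:
        raise ValueError("calendar has no nights in the requested window")
    return sum(nights) / len(nights)


# The example from the text: 24 of the next 30 nights are unavailable.
calendar = [True] * 24 + [False] * 6
rate = calendar_booked_rate(calendar, window=30)
print(f"{sum(calendar)} of {len(calendar)} nights unavailable ({rate:.0%})")
```

Note that the function counts every unavailable night the same way, whether it is a booking or a host block, because that is all the calendar itself exposes.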

The tradeoff is simple: we see what the calendar shows, no more. We do not see realized revenue, cancellations after the scrape, or historical occupancy prior to the scrape window. We see which nights are unavailable, looking forward, as of last night's scrape.

The output is not a single number. STRecon classifies each listing into one of six tiers (Exceptional, Performer, Potential, Watch, Avoid, or Unreliable) and rolls those into a Market Signal of STRONG, MODERATE, CAUTION, or INCONCLUSIVE. The classification combines the 75/55 calendar test with a review-cadence test, which checks whether the listing is generating reviews at the two-to-three-per-month rhythm of a genuinely booked Airbnb. AirDNA's headline output is a single occupancy figure; STRecon's headline output is a tier distribution and a market verdict.
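A review-cadence check of the kind described above might look like the following. This is a rough sketch under stated assumptions: the 90-day lookback, the 30-day month, and the function names are hypothetical, and the floor of 2 reviews per month is simply the lower end of the 2–3 range quoted in the text. The actual STRecon test is not specified in this entry.

```python
from datetime import date
from typing import Iterable


def reviews_per_month(review_dates: Iterable[date], as_of: date,
                      lookback_days: int = 90) -> float:
    """Average reviews per 30-day month over the trailing lookback window."""
    recent = [d for d in review_dates if 0 <= (as_of - d).days < lookback_days]
    return len(recent) / (lookback_days / 30)


def passes_review_cadence(review_dates: Iterable[date], as_of: date,
                          min_per_month: float = 2.0) -> bool:
    """True when the trailing cadence meets the assumed floor of 2 per month."""
    return reviews_per_month(review_dates, as_of) >= min_per_month


# Illustrative, made-up review histories for two listings.
as_of = date(2024, 6, 30)
active_listing = [date(2024, 4, 10), date(2024, 4, 25), date(2024, 5, 5),
                  date(2024, 5, 20), date(2024, 6, 2), date(2024, 6, 18),
                  date(2024, 6, 27)]
quiet_listing = [date(2024, 5, 1)]

print(passes_review_cadence(active_listing, as_of))  # 7 reviews in 90 days
print(passes_review_cadence(quiet_listing, as_of))   # 1 review in 90 days
```

A listing that fails this check while showing a high calendar booked rate is the pattern the text associates with host blocks rather than real stays.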

Methodology side-by-side

Dimension	AirDNA estimate	STRecon scraped data
Source	Statistical model of multiple signals	Direct read of each listing's calendar
Output	Projected occupancy	Observed booked/blocked nights
Time direction	Forward and backward	Forward, next 90 days
Granularity	Market and listing (modeled)	Listing-level (observed)
Covers realized revenue	Yes (modeled)	No
Covers forward calendar	Yes (modeled)	Yes (observed)
Refresh cadence	Varies	Nightly for scoped markets

When estimates and scraped data diverge — why it happens

The two numbers diverge for specific, identifiable reasons:

Review lag. Estimators lean on review velocity to infer bookings. A listing active for years with slow review cadence can book heavily and still show a conservative estimate. A new listing with a review spike can look stronger than its forward calendar supports.
Host calendar behavior. Hosts block nights for personal use, for maintenance, or to enforce minimum stays. The calendar shows "unavailable" for all of these, so a scraper counts the block as an unavailable night and cannot, from the calendar alone, tell it apart from a booking; an estimator trained on booking patterns does not see the block at all. STRecon addresses this at the tier-classification step: a listing with a high calendar booked rate but anemic review cadence lands in the Watch tier and does not count toward the Top Performer share that drives the Market Signal. In markets with high absentee-host density or active arbitrage operators, the Watch tier can be substantial. Treating those listings as validated performers is how investors end up in markets that look hot on the calendar and disappoint in practice.
Market composition shift. When a market adds supply quickly, yesterday's average occupancy is not today's. Models updated on trailing data lag reality. Calendars don't.
Seasonal turning points. At the transition between peak and shoulder, estimates smooth the change; calendars show it sharply.

In our data, divergence is most common in markets with thin review density, rapid supply growth, or unusually long booking lead times.
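To make the size of a divergence concrete, the gap between a modeled estimate and an observed calendar booked rate can be expressed in percentage points. The sketch below is illustrative only: the function names are hypothetical, the input figures are made up, and the 15-point threshold is an assumption borrowed from the lower end of the 15–25 point range cited at the top of this entry.

```python
def divergence_points(estimated_occupancy: float,
                      observed_booked_rate: float) -> float:
    """Signed gap in percentage points (estimate minus observed).

    Both inputs are fractions between 0 and 1.
    """
    for value in (estimated_occupancy, observed_booked_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates must be fractions between 0 and 1")
    return (estimated_occupancy - observed_booked_rate) * 100


def is_material(gap_points: float, threshold_points: float = 15.0) -> bool:
    """True when the absolute gap meets the assumed 15-point threshold."""
    return abs(gap_points) >= threshold_points


# Made-up figures: a modeled estimate of 72% against an observed rate of 50%.
gap = divergence_points(0.72, 0.50)
print(f"gap: {gap:+.1f} points, material: {is_material(gap)}")
```

The sign matters for underwriting: a positive gap means the estimate is more optimistic than the calendar, which is the case that puts the investor at risk.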

Which one should you use

The honest answer depends on the decision:

Macro market research, multi-year historical comparison, revenue modeling that requires trailing data: estimates are the right tool. AirDNA's data coverage extends further back in time than any scraper can reproduce.
Listing-level underwriting, "should I buy this specific property" decisions, market-timing calls based on current demand: scraped calendar data is ground truth. Estimates are one step removed from what a listing is actually doing right now.

For most investors evaluating whether to enter a market or acquire a specific property, the question is: what is booked right now, and what is booked 30–60 days out? That is a question about the current state of real calendars, which is what we measure.

Related concepts

Calendar booked rate — the metric scraping produces directly.
The 75/55 rule — the framework we apply on top of scraped data.
How STRecon works — end-to-end methodology.

Run your market at strecon.app. Draw your target area, get the verdict by morning.