How the score is calculated

A precise, slightly nerdy walkthrough of every input, formula, threshold, and reality check that produces tonight’s 0–10. Aimed at people who want the receipts.

The shape of it

The score is one number from 0 to 10 that answers: "is the sky tonight worth stepping outside for?" Two physical realities matter for naked-eye stargazing — opaque clouds in the way, and a bright moon washing out the dim stars. Everything else is downstream of that.

We compute the score in six small steps from a weather forecast, then run two reality checks against it (one against multiple independent models, one against actual airport observations after sunset). A separate user-feedback loop calibrates regional bias over time.

1. Forecasting cloud cover (Open-Meteo)

We pull an 8-day hourly forecast from Open-Meteo for your exact lat/lng. Importantly, we ask for cloud cover split across three altitude bands, not the rolled-up "total" number that most weather apps show.

Each band is a percentage in [0, 100] and they overlap — a sky with 60% low cloud and 40% high cirrus reports as both, not as 100%. The split lets us weight altitudes differently in step 2.

// Open-Meteo, per location, every hour:
cloud_cover_low    // 0–100%   (< 6500 ft / ~2 km AGL)
cloud_cover_mid    // 0–100%   (6500–20000 ft / 2–6 km)
cloud_cover_high   // 0–100%   (> 20000 ft / 6+ km, mostly cirrus)
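
In practice that is one HTTP call per location. A rough TypeScript sketch (the query parameters are Open-Meteo's documented names; the function and types around them are illustrative, not the production code):

interface HourlyClouds {
  time: string[];             // ISO timestamps, local to the location
  cloud_cover_low: number[];  // 0–100 %
  cloud_cover_mid: number[];
  cloud_cover_high: number[];
}

async function fetchCloudBands(lat: number, lng: number): Promise<HourlyClouds> {
  const url = new URL("https://api.open-meteo.com/v1/forecast");
  url.searchParams.set("latitude", String(lat));
  url.searchParams.set("longitude", String(lng));
  url.searchParams.set("hourly", "cloud_cover_low,cloud_cover_mid,cloud_cover_high");
  url.searchParams.set("forecast_days", "8");
  url.searchParams.set("timezone", "auto");
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Open-Meteo ${res.status}`);
  const body = await res.json();
  return body.hourly as HourlyClouds;
}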

2. Layer-weighting (low > mid > high)

For naked-eye stargazing, low clouds are the enemy. Stratus and stratocumulus blot out the entire sky behind them. Mid clouds (altostratus, altocumulus) dim most stars. High thin cirrus, on the other hand, just slightly fuzzes things — you can still see Orion through it.

weightedCloud =
    cloud_low  × 0.5    // blocks everything
  + cloud_mid  × 0.3    // blocks most stars
  + cloud_high × 0.2    // thin cirrus dims, doesn't block

So we weight by altitude before averaging. A sky full of cirrus and a sky full of stratus produce very different stargazing experiences; the unweighted "total cloud cover" number that most apps use can’t tell them apart.
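
As code it is a one-liner; a sketch (the function name is ours):

// Altitude-weighted cloud cover, still on a 0–100 scale.
function weightedCloud(low: number, mid: number, high: number): number {
  return low * 0.5 + mid * 0.3 + high * 0.2;
}

// weightedCloud(0, 0, 100) → 20   all-cirrus sky: still worth going out
// weightedCloud(100, 0, 0) → 50   all-stratus sky: much worse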

3. Time-weighting (prime hours > late night)

Stargazers decide based on what they’ll actually see at 22:00, not what’s true at 03:30. So when we average the night, we weight the "prime" hours (21:00 → 00:59 local) at 2× the late-night tail. A clear 04:00 can no longer drag a cloudy 22:00 up by half a point.

headlineCloud = Σ ( weightedCloud(h) × w(h) ) / Σ w(h)

  w(h) = 2   if 21 ≤ h ≤ 23  or  h = 0    // prime hours
       = 1   otherwise                     // late-night tail

The hours we average over are the astronomical-darkness window — sun ≥ 18° below the horizon, when twilight glow is fully gone. That can extend well past 04:00 in summer, which is why the time-weighting matters more in May–August.
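
Putting steps 2 and 3 together, the nightly average looks roughly like this (a sketch; we assume the hourly samples have already been restricted to the darkness window by a sun-position helper not shown here):

// One sample per astronomical-darkness hour, already layer-weighted (step 2).
interface HourSample {
  hour: number;    // 0–23, local time
  cloud: number;   // weightedCloud for that hour, 0–100
}

function headlineCloud(darkHours: HourSample[]): number {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const { hour, cloud } of darkHours) {
    const w = hour >= 21 || hour === 0 ? 2 : 1;   // prime hours count double
    weightedSum += cloud * w;
    weightTotal += w;
  }
  return weightTotal === 0 ? 0 : weightedSum / weightTotal;
}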

4. Moon penalty (illumination, with a late-night discount)

A bright moon doesn't block the sky the way clouds do, but it drowns out everything except the brightest stars. We apply a penalty proportional to illumination, in three tiers:

  • 0% penalty if illumination < 25% (new moon and crescent).
  • illumination × 0.5 if 25–75% (around quarter moon).
  • illumination × 1.0 if > 75% (gibbous and full).

If the best 2-hour window starts between 00:00 and 03:00 AND illumination is over 50%, we multiply the penalty by 0.6 — the moon is likely setting during the window, so the actual sky impact is less than the raw illumination suggests.
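
In code, roughly (a sketch; illumination is 0–100 and peakStartHour is the local hour the best 2-hour window starts, both names ours):

function moonPenalty(illumination: number, peakStartHour: number): number {
  let penalty: number;
  if (illumination < 25)       penalty = 0;                   // new moon and crescent
  else if (illumination <= 75) penalty = illumination * 0.5;  // around quarter moon
  else                         penalty = illumination;        // gibbous and full

  // Late-window discount: the moon is likely setting during the window.
  if (peakStartHour >= 0 && peakStartHour <= 3 && illumination > 50) {
    penalty *= 0.6;
  }
  return penalty;
}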

5. Visibility gate

Even with a perfect new-moon sky, an overcast night should not score above zero. The visibility ramp enforces that: as the sky goes from fully overcast (clearness 0%) to 30% clear, the score scales linearly from zero up to its full value. Below 30% clearness, even a perfect dark sky is multiplied by a factor less than 1.

clearness  = 100 − headlineCloud
visibility = min(1, clearness / 30)

  // clearness 0–30%  →  visibility ramps 0 → 1
  // clearness 30%+   →  visibility = 1

Without this gate, a 100%-cloudy night with a new moon scored ~4/10 from the moon-darkness term alone — physically impossible. Now it correctly scores 0.0.

6. The headline formula

darkness = 100 − moonPenalty
score    = visibility × ( clearness × 0.6 + darkness × 0.4 ) / 10
         // clamped to 0–10, rounded to 1 decimal

That's it for the headline. The cloud term gets 60% of the weight, the moon-darkness term gets 40%, and the visibility gate vetoes the whole thing if there are too many clouds. The same formula computes the per-hour timeline, the rolling 2-hour Peak window, and the fixed 21:00–01:00 Prime window.
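
Stitched together with the helpers sketched above, the whole headline computation fits in a few lines (again a sketch, not the production code):

function headlineScore(avgCloud: number, moonPen: number): number {
  const clearness  = 100 - avgCloud;
  const visibility = Math.min(1, clearness / 30);    // overcast gate (step 5)
  const darkness   = 100 - moonPen;
  const raw = visibility * (clearness * 0.6 + darkness * 0.4) / 10;
  return Math.round(Math.min(10, Math.max(0, raw)) * 10) / 10;
}

// headlineScore(100, 0) → 0.0   overcast new-moon night: the gate wins
// headlineScore(10, 0)  → 9.4   near-clear new-moon night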


Two reality checks on top

A single forecast can be confidently wrong. Two layers above the headline catch that.

Neither check rewrites the score upward. They only flag uncertainty (multi-model) or pull the score down toward observed reality (nowcast).

Reality check 1: multi-model disagreement

Open-Meteo will happily give you the same forecast pulled separately from three different weather models. We ask for all three and compare what they say about the prime hours.

// Three independent numerical weather models:
//   ECMWF IFS         (European Centre, broadly best in Europe)
//   GFS               (NOAA, best US coverage)
//   ICON-EU           (Deutscher Wetterdienst regional)

primeSpread = max pairwise | model_i − model_j |
              over hours 21:00 → 01:00 local

confidence = "low"   if primeSpread > 25
           = "high"  otherwise

If any two of them disagree by more than 25 percentage points anywhere across 21:00–01:00, the forecast is wobbly. The score still ships, but the card surfaces a "Low confidence" badge so you know to look outside before committing.
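
A sketch of the comparison, with one prime-window cloud series per model (the max-pairwise aggregation mirrors the formula above; the names are ours):

// cloudByModel: one cloud-cover series per model, aligned hour by hour
// and already restricted to the 21:00–01:00 prime window.
function primeConfidence(cloudByModel: number[][]): "low" | "high" {
  let primeSpread = 0;
  const hours = cloudByModel[0]?.length ?? 0;
  for (let h = 0; h < hours; h++) {
    for (let i = 0; i < cloudByModel.length; i++) {
      for (let j = i + 1; j < cloudByModel.length; j++) {
        const diff = Math.abs(cloudByModel[i][h] - cloudByModel[j][h]);
        primeSpread = Math.max(primeSpread, diff);
      }
    }
  }
  return primeSpread > 25 ? "low" : "high";
}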

Reality check 2: post-sunset observation clamp

Once the sun is down, "tonight" is no longer a forecast — it’s starting to be observable reality. We pull the most recent METAR (mandatory hourly observation from every commercial airport) within ~65 km of you, parsing the FEW/SCT/BKN/OVC layer codes into the same low/mid/high split we use internally.

If the observed weighted cloud is materially higher than what the forecast averaged for tonight, we drop the headline score to match reality. The thresholds depend on the source: METAR is real measurement so we trust small divergences; Open-Meteo current is model + station blend (the same model that produced the forecast), so we keep a conservative threshold to avoid double-counting model error.

// Post-sunset only.
// Source priority:
//   1. METAR  via aviationweather.gov bbox (~65 km, 130 km fallback)
//   2. Open-Meteo current  (model + station blend, last resort)

// Trigger thresholds depend on source trust:
//   METAR:           Δ > 15 pts  AND  observed > 35%
//   Open-Meteo cur.: Δ > 30 pts  AND  observed > 50%

// If triggered:
score = min( forecastScore, observationScore )
        // never raised — only lowered

Crucially, the clamp only ever lowers the score. If reality looks better than the forecast we don’t bump up — the forecast already had its shot, and over-promising hurts trust more than under-promising.
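
For the curious, folding METAR layer codes into the same weighted number can look roughly like the sketch below. The okta-to-percent mapping (FEW≈25, SCT≈50, BKN≈75, OVC=100) and the use of the reported layer base to pick the altitude band are common conventions, not necessarily the exact parser in production.

// METAR cloud groups look like "BKN065": coverage code plus layer base
// in hundreds of feet AGL. Fold them into the forecast's low/mid/high split.
const COVERAGE_PCT: Record<string, number> = { FEW: 25, SCT: 50, BKN: 75, OVC: 100 };

function metarWeightedCloud(groups: string[]): number {
  let low = 0, mid = 0, high = 0;
  for (const g of groups) {
    const m = /^(FEW|SCT|BKN|OVC)(\d{3})/.exec(g);
    if (!m) continue;                          // skip CLR, SKC, CAVOK, etc.
    const pct  = COVERAGE_PCT[m[1]];
    const feet = parseInt(m[2], 10) * 100;     // layer base in feet
    if (feet < 6500)       low  = Math.max(low, pct);
    else if (feet < 20000) mid  = Math.max(mid, pct);
    else                   high = Math.max(high, pct);
  }
  return low * 0.5 + mid * 0.3 + high * 0.2;   // same weights as step 2
}

// metarWeightedCloud(["FEW030", "BKN065", "OVC250"]) → 55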


Calibration: every alert email is a learning loop

Models drift. Specific cities have specific blind spots (sea-breeze nights in Amsterdam, late inversions in Salt Lake City, marine layer in coastal California). The only way to learn the local bias of a model is to compare what it predicted against what people actually saw.

So every alert email contains three buttons: ✨ Clear, 🌤 Partly, ☁️ Cloudy. Each is an HMAC-signed link carrying the entire prediction context — no login, no DB lookup, one tap. We store the click in starsout_feedback.

// Each alert email contains 3 buttons. Each button is an
// HMAC-SHA256 signed link carrying the entire prediction context:
{
  v: 1,
  sub: <subscriber_id>,
  d:   "2026-05-07",
  s:   8.2,        // predicted score
  c:   18,         // predicted cloud cover %
  lat, lng,
  conf: "high",
  src:  "metar",
  city, locale,
  exp:  <unix epoch + 7 days>,
  o:    "clear" | "partly" | "cloudy"
}

// Token is stand-alone: no DB lookup needed to validate.
// One click → upsert into starsout_feedback.
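
A minimal sketch of signing and verifying such a stand-alone token with Node's built-in crypto (the payload fields are the ones above; the helper names and the body.mac encoding are ours):

import { createHmac, timingSafeEqual } from "node:crypto";

function signToken(payload: object, secret: string): string {
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  const mac  = createHmac("sha256", secret).update(body).digest("base64url");
  return `${body}.${mac}`;
}

// Returns the payload if the signature checks out and the token hasn't
// expired, otherwise null. No database involved.
function verifyToken(token: string, secret: string): Record<string, unknown> | null {
  const [body, mac] = token.split(".");
  if (!body || !mac) return null;
  const expected = createHmac("sha256", secret).update(body).digest("base64url");
  if (mac.length !== expected.length) return null;
  if (!timingSafeEqual(Buffer.from(mac), Buffer.from(expected))) return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  return typeof payload.exp === "number" && payload.exp * 1000 > Date.now()
    ? payload
    : null;
}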

// Per-city rollup (view starsout_feedback_calibration):
hit_rate(city, month) =
  100 × clear_or_partly_alerts_with_score≥7 / total_alerts_with_score≥7

A weekly job rolls feedback up per city per month. Once we have ≥4 weeks of feedback per city we have enough signal to apply per-city bias correction live in the score. Until then, we just publish the calibration view internally and watch for systemic miss patterns.


What’s not in the score (and why)

  • Light pollution (Bortle class): shapes the "Visible Tonight" deep-sky list — which galaxies and nebulae are realistic to chase from your spot — but doesn’t move the headline number. The number is "is the sky cooperating?", not "is your sky pristine?".
  • Wind, humidity, precipitation: visible in the timeline if you scroll, but they don’t change the score. They affect comfort and seeing, not whether the sky is opaque to starlight.
  • Atmospheric seeing (turbulence): matters for telescope users (planetary detail, double-star splits) but not for naked-eye stargazing. Hand-wavy proxies from surface weather aren’t reliable enough to include in the headline.
  • Aerosols / wildfire smoke: real driver of star magnitude during smoke season but not yet in the model. Open-Meteo exposes some of this; folding it into the visibility gate is on the roadmap.

Sources

  • Open-Meteo hourly cloud + sunrise + multi-model spread. Free, no auth, attribution required.
  • Aviation Weather Center METAR observations from every commercial airport, free.
  • Synodic phase calculation pure JS, reference epoch 2000-01-06, period 29.530588861 days. No external API.
  • Forecasts refresh roughly every 30 minutes; METAR every hour; multi-model fetch on every score request, cached 10 minutes.
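
For completeness, the dependency-free phase calculation referenced above can look like this sketch (the epoch and period are the values listed; the cosine illumination approximation is a standard simplification and may not match the production code exactly):

const SYNODIC_DAYS = 29.530588861;
const EPOCH_MS     = Date.UTC(2000, 0, 6);   // reference epoch 2000-01-06 (time of day not specified above)

function moonIllumination(date: Date): number {
  const days  = (date.getTime() - EPOCH_MS) / 86_400_000;
  const phase = ((days / SYNODIC_DAYS) % 1 + 1) % 1;      // 0 = new, 0.5 = full
  return 100 * (1 - Math.cos(2 * Math.PI * phase)) / 2;   // illuminated fraction, 0–100 %
}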

Why this works the way it does

A stargazing score should fail honestly. Most weather apps tell you "30% cloud" when the sky is overcast because their model is wrong; their UI then implies confidence the data doesn’t deserve. We’d rather show 7.9 with a "low confidence" pill than 8.5 with no caveat.

The score will never be perfect — atmospheric science is hard, and most stargazing failure modes are hyper-local. But every layer here exists because we caught the score lying in a specific way at least once. The calibration loop is how we keep teaching it.
