Quick tech note: Adapting AlphaEarth embeddings to new workflows

Ben Strong
|
Oct 30, 2025

As remote sensing foundation model embeddings become more mainstream, attention is shifting from how to create embeddings to how to use them. Tools like Earth Index, which make it easy to interact with embeddings through a UI, are clearly part of the solution. (See our paper on the role of digital apps.)

In addition to the apps layer, there is an emerging opportunity to explore the embeddings layer itself. Sometimes a problem requires remixing embeddings so that they can be used at a different spatial or temporal scale than the one at which the embeddings dataset was originally produced.

A recent example we have tackled involves AlphaEarth, Google’s embeddings dataset. AlphaEarth embeddings are available at 10m resolution, which is useful for many applications. However, we faced a problem when adapting them for use in Earth Index, which makes the planet searchable by dividing the Earth into 320m tiles. Each tile therefore covers roughly 32 x 32 = 1,024 AlphaEarth pixels. So the question we faced was: how can we aggregate 10m pixel-level embeddings into a single 320m tile embedding?

We attempted two strategies for this aggregation work:

  • Simple bandwise average: The mean of each band over all 10m pixels within a tile. This is the strategy that Google suggests for the problem. It results in a 64-dimensional tile embedding.
  • Additional summary statistics: Four bandwise summary statistics computed over a tile: mean, standard deviation, min and max. This results in a 64 x 4 = 256-dimensional tile embedding.
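The two strategies can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used for the benchmarks: the function names are hypothetical, the tile is random stand-in data instead of real AlphaEarth pixels, and concatenating the four statistics in the order mean, standard deviation, min, max is an assumption.

```python
import numpy as np


def mean_embedding(pixels: np.ndarray) -> np.ndarray:
    """Bandwise mean over all pixels in a tile: (n_pixels, 64) -> (64,)."""
    return pixels.mean(axis=0)


def summary_embedding(pixels: np.ndarray) -> np.ndarray:
    """Bandwise mean, std, min and max, concatenated: (n_pixels, 64) -> (256,)."""
    return np.concatenate([
        pixels.mean(axis=0),
        pixels.std(axis=0),
        pixels.min(axis=0),
        pixels.max(axis=0),
    ])


# A 320m tile of 10m pixels is 32 x 32 = 1,024 pixels, each with 64 bands.
# Random values stand in for real embeddings here.
rng = np.random.default_rng(0)
tile = rng.normal(size=(32, 32, 64))
pixels = tile.reshape(-1, 64)  # flatten the two spatial dimensions

print(mean_embedding(pixels).shape)     # (64,)
print(summary_embedding(pixels).shape)  # (256,)
```

Note that the first 64 values of the summary embedding are exactly the simple average, so the second strategy extends the first rather than replacing it.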

The results from our benchmarking tests on several datasets are listed below. The table reports a measure of retrieval quality, precision@250, where we construct a query vector from random positive examples and test what fraction of its 250 closest neighbors belong to the positive class. This process is repeated over many trials to get an average precision@250 value.
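For readers who want to try a similar measurement, here is a rough sketch of a precision@k loop. Several details are assumptions that the description above does not specify: the query is built as the mean of a few sampled positives, similarity is cosine, the sampled positives are excluded from the ranking, and the names and default values (`n_query`, `n_trials`) are hypothetical.

```python
import numpy as np


def precision_at_k(embeddings, labels, k=250, n_query=5, n_trials=100, seed=0):
    """Average fraction of positives among the k nearest neighbours of a query.

    embeddings: (n_tiles, dim) array with no all-zero rows.
    labels: boolean array of shape (n_tiles,), True for the positive class.
    Requires k < n_tiles.
    """
    rng = np.random.default_rng(seed)
    # Normalise rows so that a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    positives = np.flatnonzero(labels)
    scores = []
    for _ in range(n_trials):
        # Assumed query construction: mean of a few random positive examples.
        sample = rng.choice(positives, size=n_query, replace=False)
        query = unit[sample].mean(axis=0)
        sims = unit @ query
        sims[sample] = -np.inf  # do not count the query examples themselves
        top_k = np.argpartition(-sims, k)[:k]  # indices of the k most similar tiles
        scores.append(labels[top_k].mean())
    return float(np.mean(scores))


# Synthetic check: two well-separated clusters standing in for tile embeddings.
rng = np.random.default_rng(1)
pos = np.array([1.0, 0.0, 0.0, 0.0]) + 0.05 * rng.normal(size=(300, 4))
neg = np.array([0.0, 1.0, 0.0, 0.0]) + 0.05 * rng.normal(size=(700, 4))
embeddings = np.vstack([pos, neg])
labels = np.r_[np.ones(300, dtype=bool), np.zeros(700, dtype=bool)]

score = precision_at_k(embeddings, labels)
print(score)
```

On real benchmarks the positive class is usually much harder to separate than in this synthetic check, which is what makes the comparison between aggregation strategies informative.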

I had a hunch that the additional summary statistics would outperform the simple average; after all, they contain strictly more information. Interestingly, the effect of the extra statistics varied depending on the particular problem we tested. Some problems were already easy for the simple average, resulting in benchmark saturation or only small improvements with additional statistics (“Kansas / Garden City area feedlots” and “Tapajós-Jacareacanga mining”). The “Dhaka brick kilns” benchmark showed similar results with either method, indicating that this problem might require innovations beyond embedding aggregation strategies.

For two problems, “Delmarva CAFOs” and “Roraima / Mucajai mining”, we saw impressive improvements in performance with additional statistics. This suggests that the AlphaEarth pixel-level embeddings within these tiles have a diagnostic distribution that simple averaging alone does not capture. For the Delmarva CAFOs data, for example, an individual poultry CAFO might occupy only a small percentage of the pixels in a tile. The CAFO “signal”, then, could be diluted by taking a simple average over the tile.
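The dilution effect can be illustrated with a toy example. The numbers below are invented for illustration and are not real AlphaEarth values: a tile of 1,024 pixels in which 20 pixels (about 2%) respond strongly in a single band, compared against an empty background tile.

```python
import numpy as np

n_pixels, n_bands = 1024, 64

# Hypothetical background tile: every pixel has the same (zero) embedding.
background = np.zeros((n_pixels, n_bands))

# Same tile with a small feature: 20 of 1,024 pixels respond strongly in band 0.
with_feature = background.copy()
with_feature[:20, 0] = 1.0

# How much does each statistic for band 0 move when the feature is present?
mean_shift = float(with_feature.mean(axis=0)[0] - background.mean(axis=0)[0])
max_shift = float(with_feature.max(axis=0)[0] - background.max(axis=0)[0])

print(round(mean_shift, 4))  # 0.0195
print(max_shift)             # 1.0
```

The mean barely moves because the feature is spread across all 1,024 pixels, while the max shifts by the full value. This is one plausible reason why min, max and standard deviation could help when the target covers a small part of a tile.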

Other problems to explore in combining embeddings

In addition to the tile aggregation problem, there are many other problems that we’re eager to continue to explore in the embeddings layer, including:

  • Change over time: How can you compare embeddings from two different time periods?
  • Dimension reduction: How can we effectively squeeze high-dimensional embeddings into smaller ones?
  • Cross-model combinations: What steps can we take to combine embeddings from different models (including across sensors), in order to leverage their unique strengths?
  • Context window/tile size: Should we adjust the 320m tile size we have used in Earth Index?
  • Embeddings fields: AlphaEarth already incorporates spatial context into each 10m embedding. How can we determine what is being captured in this context?
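As one possible starting point for the change-over-time question, a common baseline is the cosine similarity between a location's embeddings from two periods. The sketch below uses made-up three-dimensional vectors and hypothetical variable names, and it assumes that embeddings from different periods are directly comparable, which would need to be verified for any given dataset.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up embeddings for the same tile in two different years.
emb_before = np.array([0.6, 0.8, 0.0])
emb_after_stable = np.array([0.6, 0.8, 0.0])   # nothing changed
emb_after_changed = np.array([0.0, 0.8, 0.6])  # part of the signal moved

print(cosine_similarity(emb_before, emb_after_stable))
print(cosine_similarity(emb_before, emb_after_changed))
```

A similarity near 1 suggests little change, and lower values flag tiles worth a closer look. Choosing a threshold, and separating real change from seasonal or sensor noise, remain open questions.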

Have you faced similar challenges in working with embeddings? Reach out! We’d love to collaborate.
