Earth observation foundation model embeddings released in the commons
We have shared global embeddings derived from satellite imagery on source.coop. This common resource of preprocessed data lets researchers and developers skip computationally expensive processing steps and focus on developing solutions that benefit Earth.
In building the Earth Index application, we have generated multiple large-scale datasets to power it. These include the second version of our global, cloud-optimized satellite imagery composite (source.coop) and the embeddings Earth Index searches over. Our imagery processing and embedding generation pipelines are highly optimized, which makes them orders of magnitude cheaper than other methods. That means we can responsibly steward our resources and rapidly process new data and new models as they become available.
These resources have the potential to be widely useful. We’re committed to being a contributor to the Earth+AI community. We use openly available data from Sentinel-2 and models from the Technical University of Munich. Now we’re openly sharing this processed data for use in new and unexpected ways. Thanks to the Patrick J. McGovern Foundation and AWS for supporting this work and this contribution to the open ecosystem.
Global Embeddings
Embeddings act like unique ‘fingerprints’ for geographic locations, capturing the visual characteristics present in the improved Earth Index v2 imagery.
The embeddings are generated using the SoftCon model from Zhu XLabs, and each embedding is a vector of length 384. Each embedding captures a 320 square meter patch of the Earth, gridded using our MajorTom-based grid. We’ve encoded these embeddings, their IDs, and their centroids in GeoParquet. Each GeoParquet file is named similarly to the imagery and references the original MGRS/UTM tile that the imagery covered.
What does this enable? As we explored when first launching global Earth Index coverage, embeddings unlock powerful new capabilities:
- Content-Based Search: Find locations visually similar to a query image or location, anywhere on Earth. Want to find landscapes that resemble a specific type of wetland or agricultural practice? Embeddings make this possible at scale.
- Change Detection: Compare embeddings over time to identify subtle or drastic landscape changes.
- Object and Pattern Recognition: Train models to identify specific features (e.g., renewable energy infrastructure, new settlements, deforestation patches) far more efficiently.
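To make the first two capabilities concrete, here is a minimal sketch of content-based search and change detection using cosine similarity. It runs on synthetic random vectors of length 384 rather than the released files. The function names (`normalize`, `top_k_similar`, `change_score`) are hypothetical helpers written for this illustration, and cosine similarity is our assumed choice of metric here, not necessarily what Earth Index uses internally.

```python
import numpy as np

EMBEDDING_DIM = 384  # embedding length stated in the post


def normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def top_k_similar(query: np.ndarray, embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k rows most similar to the query, best first."""
    scores = normalize(embeddings) @ normalize(query[None, :])[0]
    return np.argsort(-scores)[:k]


def change_score(before: np.ndarray, after: np.ndarray) -> np.ndarray:
    """Cosine distance per patch between two dates (0 means unchanged)."""
    return 1.0 - np.sum(normalize(before) * normalize(after), axis=1)


# Synthetic stand-in data (an assumption for this sketch); real vectors
# would be read from the released GeoParquet files.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, EMBEDDING_DIM)).astype(np.float32)

# Search: use patch 42 as the query; it should be its own best match.
hits = top_k_similar(embeddings[42], embeddings, k=5)

# Change detection: replace patch 7 with a new vector, leave the rest as-is.
later = embeddings.copy()
later[7] = rng.normal(size=EMBEDDING_DIM)
scores = change_score(embeddings, later)
most_changed = int(np.argmax(scores))
```

At global scale a brute-force matrix product like this would likely be replaced by an approximate nearest-neighbor index, but the underlying comparison is the same.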
Why source.coop?
We believe that foundational planetary data should be open and accessible. Releasing these datasets on source.coop, a platform dedicated to fostering collaboration around open data for public good, aligns perfectly with this vision. Source.coop provides the infrastructure necessary to host and share these large-scale datasets, enabling the community to easily access, explore, and build upon them. Thank you, Radiant Earth, for Source Cooperative.
What Can You Build?
The combination of high-quality global imagery and corresponding embeddings opens up vast possibilities:
- Monitor deforestation or reforestation efforts with greater accuracy.
- Track the expansion of urban areas or infrastructure development globally.
- Identify potential sites for renewable energy installations based on terrain and land cover similarity.
- Analyze agricultural patterns and land use change across continents.
- Support disaster response by quickly identifying affected areas or similar pre-disaster landscapes.
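Several of these ideas come down to labeling a handful of example patches and finding more like them. One simple way to sketch that is a nearest-centroid classifier over the embeddings, shown below on synthetic data. The class names (`solar_farm`, `other`), the helper functions, and the choice of nearest-centroid are all assumptions made for illustration; a real project might use a different model and would use labeled patches from the released dataset.

```python
import numpy as np

EMBEDDING_DIM = 384  # embedding length stated in the post


def fit_centroids(embeddings: np.ndarray, labels: np.ndarray):
    """Compute one mean embedding (centroid) per class label."""
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(embeddings: np.ndarray, classes: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each embedding the label of its closest centroid."""
    diffs = embeddings[:, None, :] - centroids[None, :, :]
    distances = (diffs ** 2).sum(axis=2)  # squared Euclidean distance
    return classes[np.argmin(distances, axis=1)]


# Synthetic stand-in data: two well-separated clusters with small noise.
rng = np.random.default_rng(1)
center_a = rng.normal(size=EMBEDDING_DIM)
center_b = rng.normal(size=EMBEDDING_DIM)

train = np.vstack([
    center_a + 0.1 * rng.normal(size=(20, EMBEDDING_DIM)),
    center_b + 0.1 * rng.normal(size=(20, EMBEDDING_DIM)),
])
labels = np.array(["solar_farm"] * 20 + ["other"] * 20)  # hypothetical labels

classes, centroids = fit_centroids(train, labels)

# Unlabeled patches: three near cluster A, then two near cluster B.
unlabeled = np.vstack([
    center_a + 0.1 * rng.normal(size=(3, EMBEDDING_DIM)),
    center_b + 0.1 * rng.normal(size=(2, EMBEDDING_DIM)),
])
predicted = predict(unlabeled, classes, centroids)
```

Because the embeddings are precomputed, a lightweight model like this only needs the labeled vectors, not the underlying imagery.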
Get Started Today
We are incredibly excited to see what the global community will do with these resources. Both the imagery and the Global Embeddings datasets are now available on source.coop. Build and share your findings and creations with the broader community.
Access Earth Index Global Embeddings:
Access Earth Index Imagery:
