Gabriel Tseng, Ivan Zvonkov, Catherine Nakalembe, Hannah Kerner
Remote sensing datasets pose a number of interesting challenges to machine learning researchers and practitioners, from domain shift (spatial, semantic and temporal) to highly imbalanced labels. In addition, the outputs of models trained on remote sensing datasets can contribute to positive societal impacts, for example in food security and climate change. However, many barriers limit the accessibility of satellite data to the machine learning community, including a lack of large labeled datasets and a limited understanding of the range of satellite products available, how these products should be processed, and how to manage multi-dimensional geospatial data. To lower these barriers and facilitate the use of satellite datasets by the machine learning community, we present CropHarvest, a satellite dataset of more than 90,000 geographically diverse samples with agricultural labels. The data and accompanying Python package are available at https://github.com/nasaharvest/cropharvest.