
Out-of-Domain Land Cover Classification

If you have any questions about this dataset, please reach out to Sherrie Wang (sherwang@berkeley.edu).

While ground truth labels are scarce in low-income regions, they can be plentiful in high-income regions. This suggests that a second strategy for increasing performance in label-scarce regions is to transfer knowledge learned from classifying land cover in high-income regions to low-income ones.

Dataset Overview

We release the global dataset of satellite time series from [1]. The dataset covers 692 regions around the globe, each 10 km x 10 km. Within each region, 500 latitude/longitude coordinates are sampled, and the satellite time series and land cover type are recorded at each coordinate.

Input

The input is a time series from the MODIS satellite covering one year. Specifically, in each region, 500 points were sampled uniformly at random. At each point, the MODIS Terra Surface Reflectance 8-Day time series was exported for January 1, 2018 to December 31, 2018. This product provides 7 spectral bands, and NDVI was computed as an eighth feature, resulting in a time series of dimension 8 x 46 (8 features by 46 eight-day time steps).
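As a rough illustration of how the eighth feature can be derived, the sketch below appends NDVI, computed as (NIR - red) / (NIR + red), to a 7 x 46 band array. The row order (red first, NIR second, following MODIS band numbering), the function name `add_ndvi`, and the zero-denominator fallback are assumptions for this example; the released files may store the bands differently.

```python
# Assumed row order: MODIS band 1 (red) at index 0, band 2 (NIR) at index 1.
RED, NIR = 0, 1
N_STEPS = 46  # number of 8-day composites in one year


def add_ndvi(bands):
    """Append NDVI as an extra row.

    bands: list of 7 rows, each holding one reflectance value per time step.
    Returns a new list of 8 rows; the input is not modified.
    """
    ndvi = []
    for red, nir in zip(bands[RED], bands[NIR]):
        denom = nir + red
        # Fallback of 0.0 for a zero denominator is a choice made for this sketch.
        ndvi.append((nir - red) / denom if denom != 0 else 0.0)
    return [list(row) for row in bands] + [ndvi]


# Synthetic reflectances, for illustration only.
series = [[0.1] * N_STEPS for _ in range(7)]
series[NIR] = [0.5] * N_STEPS
features = add_ndvi(series)
print(len(features), len(features[0]))
```

The result has 8 rows and 46 columns, matching the 8 x 46 input dimension described above.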

Output

The output is the land cover type of the pixel in 2018. Global land cover labels came from the MODIS Terra+Aqua Combined Land Cover Product, which classifies every 500m-by-500m pixel into one of 17 land cover classes (e.g., grassland, cropland, desert).

Task

[1] defines the tasks as 1-shot, 2-way land cover classification within each region. Unlike other classification benchmarks in SustainBench, this benchmark uses the kappa statistic to evaluate models, because accuracy and F1-scores can vary widely across regions depending on the class distribution, and it is not clear from the values alone whether a given accuracy or F1-score is good or bad.
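To make the motivation concrete, here is a minimal sketch of Cohen's kappa, which compares the observed agreement p_o with the agreement p_e expected by chance from the label frequencies: kappa = (p_o - p_e) / (1 - p_e). The function name `cohen_kappa` and the choice to return 0.0 when p_e equals 1 are assumptions of this sketch, not the benchmark's evaluation code.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of points where the labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # degenerate case; this convention is a choice made for the sketch
    return (p_o - p_e) / (1 - p_e)


# An imbalanced 2-way task: 9 cropland points, 1 grassland point.
y_true = ["cropland"] * 9 + ["grassland"]
always_majority = ["cropland"] * 10

accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
kappa_majority = cohen_kappa(y_true, always_majority)
kappa_perfect = cohen_kappa(y_true, y_true)
print(accuracy, kappa_majority, kappa_perfect)
```

In this toy case, always predicting the majority class reaches 90% accuracy yet has a kappa of 0, while a perfect prediction has a kappa of 1. This is why kappa is easier to compare across regions with different class distributions.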

Meta-training, meta-validation, and meta-test set splits

The authors in [1] sampled 1000 regions uniformly at random from the Earth’s land surface, and removed regions that have fewer than 2 unique land cover classes and regions where one land cover type comprises more than 80% of the region’s area. This resulted in 692 regions. The authors placed the 103 regions from Sub-Saharan Africa into the meta-test set and split the remainder into 485 meta-train and 104 meta-val regions at random. We provide the user with the option of placing any continent into the meta-test set and splitting the other continents’ regions at random between the meta-train and meta-val sets.
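The filtering rule and the held-out split can be sketched as follows. This is an illustration under stated assumptions, not the code used in [1] or in the SustainBench dataloader: the region records with `id` and `continent` keys, the function names, and the `val_fraction` parameter are hypothetical, and a class's share of a region's area is approximated here by its share of the sampled points.

```python
import random
from collections import Counter


def keep_region(labels, max_share=0.8):
    """Keep a region only if it has at least 2 classes and no class exceeds max_share."""
    counts = Counter(labels)
    if len(counts) < 2:
        return False
    return max(counts.values()) / len(labels) <= max_share


def split_regions(regions, test_continent, val_fraction, seed=0):
    """Hold out one continent as meta-test; split the rest at random.

    regions: list of dicts with hypothetical keys 'id' and 'continent'.
    """
    meta_test = [r["id"] for r in regions if r["continent"] == test_continent]
    rest = [r["id"] for r in regions if r["continent"] != test_continent]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(rest)
    n_val = round(len(rest) * val_fraction)
    return {"meta_train": rest[n_val:], "meta_val": rest[:n_val], "meta_test": meta_test}


# Toy region list, for illustration only.
continents = ["Africa"] * 3 + ["Europe"] * 4 + ["Asia"] * 3
regions = [{"id": i, "continent": c} for i, c in enumerate(continents)]
splits = split_regions(regions, test_continent="Africa", val_fraction=0.2)
print({name: len(ids) for name, ids in splits.items()})
```

Holding out a whole continent, rather than sampling test regions at random, is what makes the evaluation out-of-domain: no region from the test geography is seen during meta-training or meta-validation.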

We note that, as previously mentioned, existing land cover products tend to be less accurate in low-income regions such as Sub-Saharan Africa than in high-income regions. As a result, the MODIS land cover product used as ground truth will have errors in low-income regions. We suggest users also apply meta-learning and other transfer learning algorithms using other continents (e.g., North America, Europe) as the meta-test set for algorithm evaluation purposes.

Dataloader Configuration

To load the Out-of-Domain Land Cover Classification dataset, use out_of_domain_land_cover in the SustainBench dataloader.

Download

The data can be downloaded here.

Citation

@inproceedings{wang2020meta,
  author={Wang, Sherrie and Rußwurm, Marc and Körner, Marco and Lobell, David B.},
  booktitle={IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium},
  title={Meta-Learning For Few-Shot Time Series Classification},
  year={2020},
  pages={7041-7044},
  doi={10.1109/IGARSS39084.2020.9441016}}

References

[1] S. Wang, M. Rußwurm, M. Körner, and D. B. Lobell. Meta-learning for few-shot time series classification. In IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium, pages 7041–7044, 2020. doi: 10.1109/IGARSS39084.2020.9441016.