The HistAerial dataset was built from grayscale images, each covering an area of 9 square kilometers (about 6k × 6k pixels per image). These images were acquired between 1970 and 1990 over France with an optical camera mounted on an aircraft. The acquisitions were made during sunny periods, when the sun was at its zenith, in order to avoid clouds and shadows. Only grayscale images were provided for the studied period. They have been stored and made freely available on the website of the French National Institute of Geography (IGN).
The historical aerial images have been manually segmented by geographers from the Cancer & Environment department of the Léon Bérard Center. Seven classes corresponding to coarse ground occupancy labels have been provided: Orchard, Vineyard, Urban, Forest, Water, Crop and Land. An additional class covering the "other" possibilities has been drawn but is not included in the HistAerial dataset.
Non-overlapping patches of different sizes have been extracted in a supervised manner from the manually segmented historical aerial images. Only patches belonging to a single class have been kept. The class was checked across the whole patch: if any two pixels did not belong to the same class, the patch was rejected. Since the patches were extracted without any overlap, the number of retrieved patches decreases as the patch size increases. The maximal size has been set to 100 × 100 pixels in order to keep enough patches available for each size and to allow the HistAerial dataset to be used with deep learning architectures.
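The selection rule described above can be illustrated with a minimal sketch. It is not the pipeline actually used to build HistAerial. The function name `extract_pure_patches`, its parameters, the integer label encoding and the `ignored` argument (a possible way to skip the "other" class) are all assumptions made for this example.

```python
import numpy as np


def extract_pure_patches(image, labels, size, ignored=()):
    """Collect non-overlapping size x size patches whose pixels share one class.

    image   : 2-D grayscale array
    labels  : 2-D integer array of the same shape (hypothetical class encoding)
    size    : patch side length in pixels
    ignored : class ids to skip, e.g. an "other" class (assumption)
    Returns a list of (top, left, class_id, patch) tuples.
    """
    height, width = labels.shape
    patches = []
    # Step by `size` so that consecutive tiles never overlap.
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            window = labels[top:top + size, left:left + size]
            first = window[0, 0]
            # Reject the tile as soon as two pixels disagree on the class.
            if not np.all(window == first):
                continue
            if first in ignored:
                continue
            patch = image[top:top + size, left:left + size]
            patches.append((top, left, int(first), patch))
    return patches


# Toy example: an 8 x 8 mask split vertically into two classes.
toy_labels = np.ones((8, 8), dtype=np.uint8)
toy_labels[:, 4:] = 2
toy_image = np.arange(64, dtype=np.uint8).reshape(8, 8)

for side in (3, 4):
    kept = extract_pure_patches(toy_image, toy_labels, side)
    print(side, len(kept))
```

Because the tiling step equals the patch size, an image of H × W pixels can yield at most (H // size) × (W // size) patches, which is why fewer patches are available at larger sizes.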
The use of patches has several advantages. It makes it possible to compare data that contain the same quantity of information. The patches can also be resized with bilinear interpolation without distorting their relative textures and shapes. This is particularly useful for Deep Convolutional Neural Networks, which require fixed-size samples at their input layer. Finally, the patches provide a natural way to compare the impact of the spatial context on ground occupancy classification.
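As an illustration of the bilinear resizing mentioned above, here is a minimal NumPy sketch for a square grayscale patch. The function name `resize_bilinear` and the corner-aligned sampling grid are assumptions for this example. Image libraries typically use a slightly different pixel-center convention, so their output may differ near the borders.

```python
import numpy as np


def resize_bilinear(patch, out_size):
    """Resize a 2-D grayscale patch to out_size x out_size by bilinear interpolation.

    Corner-aligned sampling is assumed: the first and last output pixels
    coincide with the first and last input pixels.
    """
    patch = np.asarray(patch, dtype=np.float64)
    in_h, in_w = patch.shape
    # Fractional source coordinates of every output row and column.
    rows = np.linspace(0, in_h - 1, out_size)
    cols = np.linspace(0, in_w - 1, out_size)
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, in_h - 1)
    c1 = np.minimum(c0 + 1, in_w - 1)
    # Interpolation weights along each axis.
    fr = (rows - r0)[:, None]
    fc = (cols - c0)[None, :]
    # Interpolate horizontally on the two neighbouring rows, then vertically.
    top = patch[r0][:, c0] * (1 - fc) + patch[r0][:, c1] * fc
    bottom = patch[r1][:, c0] * (1 - fc) + patch[r1][:, c1] * fc
    return top * (1 - fr) + bottom * fr


# Illustrative sizes only: upsample a 25 x 25 patch to 100 x 100.
small = np.random.default_rng(0).integers(0, 256, size=(25, 25))
large = resize_bilinear(small, 100)
print(large.shape)
```

Since the patches are square and resized to a square target, the same scale factor applies along both axes, so the aspect ratio of textures and shapes is preserved.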
More than 4.9 million non-overlapping samples are provided in the HistAerial dataset (1 set and 2 subsets counted separately).
The historical aerial images were acquired over a 20-year period (1970-1990).
Three different scales are provided for the non-overlapping samples of the HistAerial dataset. They represent the same classes with different levels of spatial context.