Processing

The processing module of the SPAI library provides several functions that simplify common satellite image processing tasks.

This is the structure of the processing module:

/processing
 |- autocategorize1D.py
 |- colorize_raster.py
 |- convert_array_to_vector.py
 |- mask_raster.py
 |- normalised_difference.py
 |- px_count.py
 |- read_raster.py
 |- save_table.py

`autocategorize1D`

This function is designed to categorize pixels in a raster into discrete clusters. It utilizes an iterative method to find the centers of predefined cluster categories and then assigns each pixel to the nearest cluster center.

def autocategorize1D(raster, iterations=200, centers=[0.1, 0.35, 0.50, 0.70]):

Parameters:

raster: An array representing the pixel values of an image or a signal to be categorized. It should be a NumPy array with numerical data that represents some property (e.g., intensity, color, elevation).
iterations (optional): An integer specifying the number of iterations for the clustering process to refine the centers of the clusters. The default value is 200. This number will be clipped to lie within the range [1, 200].
centers (optional): An initial list of numeric values indicating the starting centers for categorization. The default value is [0.1, 0.35, 0.50, 0.70]. These should be chosen based on the range and nature of the raster values and should represent distinct categories that you want to classify the pixels into.

Returns: An array of the same shape as the input raster, with each pixel’s value replaced by the index of the cluster it belongs to. Each index refers to the nearest center, taken either from the provided centers list or from the centers computed during the iterations.
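
The iterative refinement described above resembles one-dimensional k-means clustering. The following is a minimal sketch under that assumption, not the SPAI implementation: the name categorize_1d_sketch and the early-stopping check are illustrative only.

```python
import numpy as np


def categorize_1d_sketch(raster, iterations=200, centers=(0.1, 0.35, 0.50, 0.70)):
    """Illustrative 1D k-means style clustering; not the SPAI implementation."""
    iterations = int(np.clip(iterations, 1, 200))  # mirror the documented clipping
    values = np.asarray(raster, dtype=float)
    flat = values.ravel()
    centers = np.array(centers, dtype=float)

    for _ in range(iterations):
        # Assign every pixel to its nearest center.
        labels = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its pixels (keep it if the cluster is empty).
        new_centers = centers.copy()
        for k in range(centers.size):
            members = flat[labels == k]
            if members.size:
                new_centers[k] = members.mean()
        if np.allclose(new_centers, centers):
            break  # assumption: stop early once the centers no longer move
        centers = new_centers

    # Final assignment against the refined centers, reshaped like the input.
    labels = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
    return labels.reshape(values.shape)


raster = np.array([[0.05, 0.12, 0.33], [0.52, 0.48, 0.71], [0.69, 0.36, 0.10]])
categories = categorize_1d_sketch(raster)
```

Here categories has the same shape as raster and holds cluster indices from 0 to 3. The distance matrix has one column per center, so for very large rasters a chunked approach may be preferable.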

`colorize_raster`

The purpose of this function is to apply a color map to a single-band raster based on discrete values, creating a visually interpretable RGB image. This colorization is useful for visualizing categorical or classified raster data, where each unique value in the raster can be represented by a specific color.

def colorize_raster(raster, colors, colorize_zero=False):

Parameters:

raster: A NumPy array representing a raster map with a single data band. The raster should have two dimensions, or three dimensions with a single band (i.e., the band dimension has size 1).
colors: A list of color names or hex color codes that correspond to the values in the raster. Colors are assigned in order: the first unique raster value gets the first color in the list, and so on.
colorize_zero (optional): Whether to colorize pixels with value 0 or leave them transparent. The default value is False.

Returns: This function returns a NumPy array of the same width and height as the input raster but with four channels (RGBA) representing the applied color mapping.

Process/Behavior:

The function begins by squeezing the input raster to remove any singleton dimensions, in particular a band dimension of size 1 (e.g., from (1, height, width) to (height, width)).
It generates a mapping dictionary color_mapping, which correlates each raster value to a corresponding color from the colors list.
An empty RGBA image, colored_image, is created with the same height and width as the input raster and four channels initialised to zeros. This will hold the colorized output.
It iterates through color_mapping, applying the RGBA color (converted from the color name or hex code) to each pixel in colored_image where the raster value matches the mapping key.
The colorized RGBA image is returned, suitable for visualization or further image processing.

Example:

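import numpy as np

# colorize_raster is assumed to be imported from the SPAI processing module.
# Single-band raster with three classes (1, 2 and 3)
single_band_raster = np.array([
    [1, 2, 1],
    [3, 1, 2],
    [2, 3, 1]
])
# One color per unique raster value, in order
colors = ['darkgreen', 'red', 'blue']
colorized_image = colorize_raster(single_band_raster, colors=colors)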

This would colorize values of 1 in darkgreen, 2 in red, and 3 in blue in the output colorized_image.
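
To make the steps under Process/Behavior concrete, here is a minimal sketch of the same idea using only NumPy. It is not the SPAI code. The helper hex_to_rgba, the name colorize_sketch, the 0-255 uint8 output, and the way zero is skipped are all assumptions made for illustration, and the sketch accepts hex codes only, not color names.

```python
import numpy as np


def hex_to_rgba(hex_code):
    """Convert '#rrggbb' to an (r, g, b, a) tuple of 0-255 integers."""
    hex_code = hex_code.lstrip("#")
    r, g, b = (int(hex_code[i:i + 2], 16) for i in (0, 2, 4))
    return (r, g, b, 255)


def colorize_sketch(raster, colors, colorize_zero=False):
    """Illustrative colorization following the documented steps; not SPAI's code."""
    raster = np.squeeze(raster)  # drop singleton dimensions
    values = np.unique(raster)  # sorted unique raster values
    if not colorize_zero:
        values = values[values != 0]  # assumption: zero pixels stay transparent
    color_mapping = dict(zip(values.tolist(), colors))

    # Empty RGBA image: same height and width, four channels set to zero.
    colored_image = np.zeros(raster.shape + (4,), dtype=np.uint8)
    for value, color in color_mapping.items():
        colored_image[raster == value] = hex_to_rgba(color)
    return colored_image


single_band = np.array([[[0, 1, 2], [2, 1, 0]]])  # shape (1, 2, 3)
rgba = colorize_sketch(single_band, ["#006400", "#ff0000"])
```

In this sketch, pixels with value 1 become dark green, pixels with value 2 become red, and pixels with value 0 keep an alpha of 0.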

`convert_array_to_vector`

This function is designed to convert a binary mask array into a vector file format (shapefile), typically used for geographic information systems (GIS). This operation can be particularly useful for representing areas of interest (such as regions segmented by image classification algorithms) as geometric shapes.

def convert_array_to_vector(mask: np.array, img_path: str) -> gpd.GeoDataFrame:

Parameters:

mask: A NumPy array representing a binary mask. The mask array should have the same dimensions as the raster file pointed to by img_path, and it outlines the features to be converted to vector format.
img_path: A string that specifies the file path of the raster image file used for generating the binary mask. This raster file’s geographic transformation and coordinate reference system information are used to spatially align the generated vector shapes correctly.

Returns: The function returns a GeoPandas GeoDataFrame, the in-memory equivalent of a shapefile. The GeoDataFrame consists of polygon geometries representing the areas where the binary mask is present, preserving the spatial properties and attributes necessary for GIS operations.

`mask_raster`

This function is designed to mask a raster dataset with the geometries provided in a GeoDataFrame (gdf). This operation will “crop” the raster to the extent of the geometries, which can be useful for focusing on specific areas within the raster data, such as isolating particular areas of interest (AOI).

def mask_raster(raster_name, gdf, storage):

Parameters:

raster_name: A string representing the name or identifier of the raster dataset that will be masked.
gdf: A GeoDataFrame containing geometries that will be used to mask the raster dataset. These geometries define the spatial extent to which the raster will be cropped.
storage: A SPAI Storage object representing the storage system where the raster data resides. This object should have a read method capable of loading the raster dataset given its name or identifier.

Raises:

Exception: If the raster named raster_name cannot be found within the storage, an exception is raised indicating the raster was not found.

Returns: The function returns a tuple containing two elements:

A masked raster array where pixels outside the geometries in the gdf are set to nodata values.
A set of transformation metadata which includes affine transform parameters that relate pixel coordinates in the raster array to geographic coordinates in the spatial reference system of the original raster.

`normalised_difference`

This function calculates the normalized difference between two specified bands within a given raster dataset. It is commonly used to compute vegetation indices like NDVI (Normalized Difference Vegetation Index) or other similar indices, depending on the bands selected.

def normalised_difference(raster, bands=[1, 2]):

Parameters:

raster: A multiband raster dataset loaded into a NumPy array. The raster bands are expected to be arranged along the first array dimension.
bands (optional): A list of two integers indicating the band numbers to be used for calculating the normalized difference. The default band numbers are [1, 2]. Band numbers are assumed to be 1-indexed (i.e., the first band is 1, not 0).

Returns: A NumPy array representing the normalized difference between the two selected bands. The result has the same spatial dimensions as the input bands.
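
A normalized difference of two bands is conventionally computed as (b1 - b2) / (b1 + b2). The sketch below illustrates that formula with 1-indexed band numbers. It is not the SPAI implementation: the name normalised_difference_sketch, the band order in the numerator, and the choice to return NaN where the denominator is zero are assumptions.

```python
import numpy as np


def normalised_difference_sketch(raster, bands=(1, 2)):
    """Illustrative (b1 - b2) / (b1 + b2); not the SPAI implementation."""
    raster = np.asarray(raster, dtype=float)
    b1 = raster[bands[0] - 1]  # convert 1-indexed band numbers to 0-indexed
    b2 = raster[bands[1] - 1]
    total = b1 + b2
    # Assumption: pixels where both bands sum to zero are returned as NaN.
    return np.divide(b1 - b2, total, out=np.full(total.shape, np.nan), where=total != 0)


nir = np.array([[0.8, 0.6], [0.0, 0.5]])
red = np.array([[0.2, 0.2], [0.0, 0.5]])
# Bands are stacked along the first dimension, as the raster parameter expects.
ndvi = normalised_difference_sketch(np.stack([nir, red]))
```

With near-infrared as band 1 and red as band 2, this corresponds to NDVI. The result has the same height and width as each input band.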

`px_count`

The function px_count is designed to count the occurrences of each distinct pixel value in a raster image and optionally filter these counts based on specified values.

def px_count(raster, values=None):

Parameters:

raster: A raster image represented as a 2D NumPy array. The raster is composed of pixels, each holding an integer value that typically corresponds to a certain class, feature, or measurement.
values (optional): An array or list of pixel values for which to obtain counts. If None, the function counts all unique pixel values. If an empty list is provided, the function counts the number of non-zero unique pixel values.

Returns: A NumPy array containing the counts of pixel values. If values is None, the array includes the counts of all pixel values followed by the total count of pixels. If values is an empty list, the array includes the counts of pixel values followed by the count of non-zero unique pixel values. Otherwise, it returns the counts for the specified values followed by their total count.
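
The counting behavior can be sketched with NumPy as follows. This is an illustration, not the SPAI implementation: the name px_count_sketch is hypothetical, and only the values=None and explicit-values cases are covered (the empty-list case is left out).

```python
import numpy as np


def px_count_sketch(raster, values=None):
    """Illustrative pixel counting; not the SPAI implementation."""
    raster = np.asarray(raster)
    if values is None:
        # Count every distinct value found in the raster (sorted by value).
        _, counts = np.unique(raster, return_counts=True)
    else:
        # Count only the requested values, in the order given.
        counts = np.array([np.count_nonzero(raster == v) for v in values])
    # Append the total of the counts as the last element.
    return np.append(counts, counts.sum())


classified = np.array([[0, 1, 1], [2, 2, 2], [0, 1, 3]])
all_counts = px_count_sketch(classified)
subset_counts = px_count_sketch(classified, values=[1, 2])
```

Here all_counts holds the counts for values 0, 1, 2 and 3 followed by the total number of pixels, and subset_counts holds the counts for values 1 and 2 followed by their sum.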

`read_raster`

The read_raster function is designed to read and extract band information from a raster file, utilizing a specific storage system interface. The primary intention of the function is to handle satellite imagery or similar raster datasets, allowing users to retrieve the data associated with one or more specified spectral bands.

def read_raster(image_name, storage, bands=None):

Parameters:

image_name: A string representing the name or identifier of the raster file to be read from the storage.
storage: An object representing the storage system interface which has the capability to read the raster data files. This could be a file system, database, or cloud storage system that contains the raster imagery.
bands (optional): A list of integers indicating the band numbers that should be read from the raster file. The default value is None.

Returns: The function returns a tuple with two elements:

ds: The dataset object that contains metadata and structures to interact with the entire raster file.
raster: A data array representing the pixel values of the selected bands extracted from the raster file.

`save_table`

The save_table function is designed to save a data row into a table within a specified storage system. It ensures the columns provided match the data array’s length, checks if the table already exists, and either appends new data to it or creates a new table if it doesn’t exist. The function supports tables with a pd.DatetimeIndex and can accommodate time-stamped data entries.

def save_table(data, columns, table_name, date, storage):

Parameters:

data: An array or list of data values that corresponds to the columns provided. This data will be added as a new row in the table.
columns: A list of column names for the table that must match the structure of the data provided. The length of columns must equal the length of the data array.
table_name: The name of the table where the data row will be saved. This table will either be updated or created within the storage system.
date: The date associated with the data row. This will form the index of the data row if the table is created or will be used for updating the existing table’s index.
storage: The storage object that contains methods to manage the tables (list, read, and create). It represents the database or file system where the table is stored.

Returns: The save_table function returns a pandas DataFrame that contains the updated table after the new data row has been inserted.
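
The append-or-create logic described above could look roughly like the sketch below. It is not the SPAI implementation. The InMemoryStorage class is a hypothetical stand-in for a storage object, and the exact signatures of its list, read and create methods are assumptions, as are the name save_table_sketch and the sorting of rows by date.

```python
import pandas as pd


class InMemoryStorage:
    """Hypothetical stand-in for a storage object with list, read and create."""

    def __init__(self):
        self._tables = {}

    def list(self):
        return list(self._tables)

    def read(self, name):
        return self._tables[name]

    def create(self, table, name):
        self._tables[name] = table


def save_table_sketch(data, columns, table_name, date, storage):
    """Illustrative append-or-create logic; not the SPAI implementation."""
    if len(data) != len(columns):
        raise ValueError("data and columns must have the same length")
    # Build a one-row table indexed by the given date.
    row = pd.DataFrame([data], columns=columns, index=pd.DatetimeIndex([date]))
    if table_name in storage.list():
        # Table exists: append the new row and keep the index in date order.
        table = pd.concat([storage.read(table_name), row]).sort_index()
    else:
        table = row
    storage.create(table, table_name)
    return table


storage = InMemoryStorage()
save_table_sketch([10, 5], ["water", "land"], "coverage", "2024-01-01", storage)
table = save_table_sketch([12, 3], ["water", "land"], "coverage", "2024-02-01", storage)
```

After the second call, table contains two rows indexed by date, and the stored copy of "coverage" has been replaced with the updated table.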

Troubleshooting

If you encounter any issues during the installation of SPAI, please get in touch with us through our Discord server.