Thanks for the awesome package. I have been experimenting with different neural network architectures for document image binarization (DIBCO), and after trying multiple implementations of the DIBCO metrics, yours are much faster: they accelerated my training loop (built on Hugging Face Transformers' Trainer API) by a few orders of magnitude.
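For context, this is roughly how I plug the metrics into the evaluation loop. It is only a sketch: `logits_to_bitmaps` and the sign-of-logit thresholding are my own assumptions, not part of this package.

```python
import numpy as np

def logits_to_bitmaps(logits: np.ndarray) -> np.ndarray:
    # Hypothetical helper: threshold per-pixel logits into bitmaps, assuming
    # a positive logit means background (0 = ink, 1 = background, the
    # convention expected by the metrics below)
    return logits > 0.0

def compute_metrics(eval_pred):
    # The Trainer passes a (predictions, label_ids) pair here
    logits, labels = eval_pred
    preds = logits_to_bitmaps(np.asarray(logits))
    references = np.asarray(labels).astype(bool)
    # pseudo_fmeasure (reproduced below) returns one score per image
    return {"pseudo_fmeasure": float(np.mean(pseudo_fmeasure(references, preds)))}
```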
Below are a few open points and requests; let me know if you would rather I open them as separate issues:
```python
import numpy as np
import numpy.typing as np_typing
import scipy.ndimage

# A binary image, either 2D (H, W) or batched 3D (B, H, W)
Bitmap = np_typing.NDArray[np.bool_]

# Lookup table for the first thinning subiteration, indexed by the 8-bit
# neighborhood code produced by the kernel in `bwmorph_thin`
G123_LUT = np.array([
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
    0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0,
    1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0
], dtype=bool)

# Lookup table for the second thinning subiteration
G123P_LUT = np.array([
    0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
], dtype=bool)

def bwmorph_thin(bitmap: Bitmap, num_iters: int = -1) -> Bitmap:
    """The bwmorph thinning algorithm.

    Runs for at most ``num_iters`` iterations, or until convergence when
    ``num_iters == -1``. For more information on the algorithm, see
    https://it.mathworks.com/help/images/ref/bwmorph.html#bui7chk-1
    """
    if bitmap.ndim not in [2, 3]:
        raise ValueError("The bitmap must be a 2D array or a batched 3D tensor")
    if not np.all(np.isin(bitmap, (0, 1))):
        raise ValueError("The bitmap contains values other than 0 and 1")
    if num_iters <= 0 and num_iters != -1:
        raise ValueError("num_iters must be > 0 or equal to -1")
    bitmap = np.array(bitmap).astype(np.uint8)  # copy, so the input is untouched
    batched3d = bitmap.ndim == 3
    if not batched3d:
        bitmap = np.expand_dims(bitmap, 0)
    # The neighborhood kernel: each of the 8 neighbors contributes a distinct
    # power of two, so correlating with it encodes a neighborhood as 0..255
    kernel = np.array([
        [ 8,  4,   2],
        [16,  0,   1],
        [32, 64, 128]
    ], dtype=np.uint8)
    batch_size = bitmap.shape[0]
    finished = np.zeros(batch_size, dtype=bool)
    num_pixels_before = np.sum(bitmap, axis=(1, 2))
    # With num_iters == -1 the counter never reaches 0, so the loop only
    # stops once every image in the batch has converged
    while num_iters != 0:
        # The two subiterations
        for lut in [G123_LUT, G123P_LUT]:
            for idx in range(batch_size):  # It is faster than the batched operation
                if finished[idx]:
                    continue
                # Encode each pixel's 8-neighborhood, look up the deletion
                # decision in the LUT, and delete the matching pixels
                N = scipy.ndimage.correlate(bitmap[idx], kernel, mode="constant")
                D = np.take(lut, N)
                bitmap[idx][D] = 0
        num_pixels = np.sum(bitmap, axis=(1, 2))
        finished = num_pixels == num_pixels_before
        if np.all(finished):
            break
        num_pixels_before = num_pixels
        num_iters -= 1
    if not batched3d:
        bitmap = np.squeeze(bitmap, axis=0)
    return bitmap.astype(bool)
```
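As a quick sanity check, thinning reduces a thick stroke to a skeleton-like line; here is a minimal, self-contained example (the shape and stroke are arbitrary):

```python
import numpy as np

# A 7x11 bitmap with a 3-pixel-thick horizontal bar of ones
bar = np.zeros((7, 11), dtype=bool)
bar[2:5, 1:10] = True

thinned = bwmorph_thin(bar)          # run until convergence
thinned_once = bwmorph_thin(bar, 1)  # a single iteration

# The converged result is a thinner, skeleton-like stroke
assert thinned.sum() < bar.sum()
print(thinned.astype(int))
```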
```python
def pseudo_fmeasure(
    references: Bitmap, preds: Bitmap, eps: float = 1e-6, **kwargs
) -> np_typing.NDArray[np.float64]:
    """The pseudo F-measure metric, computed per image over a batch.

    Both ``references`` and ``preds`` are batched (B, H, W) bitmaps following
    the DIBCO convention: 0 = ink (text), 1 = background. Extra keyword
    arguments are forwarded to ``bwmorph_thin``.
    """
    # Flip the images so that ink pixels become 1
    neg_references = 1 - references
    neg_preds = 1 - preds
    # Skeleton of the ground-truth ink, used for the pseudo-recall
    skeletons = bwmorph_thin(neg_references, **kwargs).astype(np.uint8)
    # Precision over all ink pixels
    tpositives = neg_preds * neg_references
    fpositives = neg_preds * references
    num_tpositives = np.sum(tpositives, axis=(1, 2))
    num_fpositives = np.sum(fpositives, axis=(1, 2))
    precision = num_tpositives / (num_fpositives + num_tpositives + eps)
    # Pseudo-recall, counted on the skeleton pixels only
    pseudo_tpositives = neg_preds * skeletons
    pseudo_fnegatives = preds * skeletons
    num_pseudo_tpositives = np.sum(pseudo_tpositives, axis=(1, 2))
    num_pseudo_fnegatives = np.sum(pseudo_fnegatives, axis=(1, 2))
    pseudo_recall = num_pseudo_tpositives / (num_pseudo_fnegatives + num_pseudo_tpositives + eps)
    # Harmonic mean of precision and pseudo-recall
    pseudo_nume = 2 * (precision * pseudo_recall)
    pseudo_deno = precision + pseudo_recall + eps
    pseudo_score = pseudo_nume / pseudo_deno
    return pseudo_score
```
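And a minimal end-to-end call, assuming the 0 = ink convention above (the toy bitmaps are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 2 toy ground-truth pages: mostly background (1) with some ink (0)
references = rng.random((2, 32, 32)) > 0.2

# A "prediction" that flips a small fraction of the ground-truth pixels
preds = references.copy()
flip = rng.random(references.shape) < 0.01
preds[flip] = ~preds[flip]

scores = pseudo_fmeasure(references, preds)
print(scores.shape)  # (2,) -- one pseudo F-measure per image
print(scores)
```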