It seems that, for 2-D tensors, torch.kron is just a special case of ConvTranspose2d with the kernel size equal to the stride. In a simple benchmark with the tensors below (a 1024×1024 input and a 5×5 kernel), torch.kron took 41 µs while ConvTranspose2d took 26 µs, so torch.kron could be implemented via ConvTranspose2d.

import torch
import torch.nn.functional as F

k_s = (5, 5)        # kernel size (also used as the stride)
x_s = (1024, 1024)  # input size

weights = torch.randn(k_s)
inputs = torch.randn(x_s)

# Kronecker product: each element of `inputs` scales a copy of `weights`.
out_from_kron = torch.kron(inputs, weights)

# Transposed convolution with stride == kernel size places one non-overlapping
# copy of the kernel per input element, which is exactly the Kronecker product.
out_from_convtranspose2d = F.conv_transpose2d(
    inputs.reshape(1, 1, *x_s), weights.reshape(1, 1, *k_s), stride=k_s
).reshape(*out_from_kron.shape)

print(torch.allclose(out_from_kron, out_from_convtranspose2d))
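
For reference, here is a minimal sketch of how the two paths could be timed with torch.utils.benchmark; the iteration count is arbitrary and the absolute numbers will depend on hardware, device, and thread settings, so they need not match the 41 µs / 26 µs figures above.

import torch
import torch.nn.functional as F
from torch.utils import benchmark

k_s = (5, 5)
x_s = (1024, 1024)
weights = torch.randn(k_s)
inputs = torch.randn(x_s)

# Time torch.kron directly on the 2-D tensors.
t_kron = benchmark.Timer(
    stmt="torch.kron(inputs, weights)",
    globals={"inputs": inputs, "weights": weights},
)

# Time the equivalent strided transposed convolution.
t_conv = benchmark.Timer(
    stmt="F.conv_transpose2d(x, w, stride=k_s)",
    globals={
        "F": F,
        "x": inputs.reshape(1, 1, *x_s),
        "w": weights.reshape(1, 1, *k_s),
        "k_s": k_s,
    },
)

print(t_kron.timeit(100))
print(t_conv.timeit(100))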