the img2col function in swift

Here is an implementation of img2col function in Swift:

main.swift
func img2col(_ input: [[[[Double]]]], kernelSize: Int, stride: Int) -> [[Double]] {
    let batchSize = input.count
    let inputHeight = input[0].count
    let inputWidth = input[0][0].count
    let inputChannel = input[0][0][0].count
    let outputHeight = (inputHeight - kernelSize) / stride + 1
    let outputWidth = (inputWidth - kernelSize) / stride + 1
    var output: [[Double]] = Array(repeating: Array(repeating: 0, count: kernelSize * kernelSize * inputChannel), count: batchSize * outputHeight * outputWidth)
    var outputIndex = 0
    for b in 0..<batchSize {
        for i in stride(from: 0, to: inputHeight - kernelSize + 1, by: stride) {
            for j in stride(from: 0, to: inputWidth - kernelSize + 1, by: stride) {
                for c in 0..<inputChannel {
                    for k in 0..<kernelSize {
                        for l in 0..<kernelSize {
                            output[outputIndex][c * kernelSize * kernelSize + k * kernelSize + l] = input[b][i + k][j + l][c]
                        }
                    }
                }
                outputIndex += 1
            }
        }
    }
    return output
}
1154 chars
26 lines

The function takes input as a 4D array, where the first dimension is the batch size, the second dimension is the height of the image, the third dimension is the width of the image, and the fourth dimension is the number of channels. kernelSize is the size of the kernel, and stride is the stride of the convolution.

The output of the function is a 2D array, where each row represents one patch. The number of columns is equal to kernelSize * kernelSize * inputChannel.

related categories

gistlibby LogSnag