# PyTorch tensor operations

This post covers some of the key tensor operations used in PyTorch.

# argmax

Returns the indices of the maximum value of all elements in the input tensor.

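When a dim is given, this matches the second value (indices) returned by torch.max(input, dim). Without a dim, the result is an index into the flattened tensor.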

a = torch.randn(4, 3)
a
>>
tensor([[ 2.0149, 1.0420, -1.3816],
[-1.0265, -0.5212, -0.7570],
[-0.5141, 0.5674, 0.1039],
[-0.1549, -0.3003, -0.1086]])
torch.argmax(a)
>>
tensor(0)

b = torch.randn(4)
b
>>
tensor([0.6022, 1.1465, 0.3250, 1.0555])
torch.argmax(b)
>>
tensor(1)
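To make the flattened index more concrete, here is a minimal sketch that converts it back to a (row, column) pair and compares the per-row argmax with the indices from torch.max. The tensor values and variable names are made up for the illustration.

```python
import torch

# Small fixed tensor so the results are reproducible (values are illustrative).
a = torch.tensor([[2.0, 1.0, -1.0],
                  [-1.0, -0.5, 3.0]])

# Without dim, argmax indexes into the flattened tensor.
flat_idx = torch.argmax(a)

# Convert the flat index back to (row, col) by hand.
row, col = divmod(flat_idx.item(), a.shape[1])

# With dim=1, argmax returns one column index per row ...
per_row = torch.argmax(a, dim=1)

# ... which matches the indices field of torch.max along the same dim.
max_indices = torch.max(a, dim=1).indices

print(flat_idx.item(), (row, col), per_row.tolist())
```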

# max

torch.max(input)

Returns the maximum value of all elements in the input tensor.

torch.max(input, dim, keepdim=False, out=None)

Returns a namedtuple (values, indices) where values is the maximum value of each row of the input tensor in the given dimension dim. And indices is the index location of each maximum value found (argmax).

If keepdim is True, the output tensors are of the same size as input except in the dimension dim where they are of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensors having 1 fewer dimension than input.

torch.max(a)
>>
tensor(2.0149)

dim 0 -> max over all rows (i.e. for each column):

torch.max(a, 0)
>>
torch.return_types.max(
values=tensor([2.0149, 1.0420, 0.1039]),
indices=tensor([0, 0, 2]))

dim 1 -> max over all columns (i.e. for each row)

torch.max(a, 1)
>>
torch.return_types.max(
values=tensor([ 2.0149, -0.5212, 0.5674, -0.1086]),
indices=tensor([0, 1, 1, 2]))
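The effect of keepdim is easiest to see from the output shapes. The sketch below uses an illustrative tensor built with torch.arange and shows one reason to keep the reduced dimension: the result then broadcasts against the input.

```python
import torch

a = torch.arange(12, dtype=torch.float32).reshape(4, 3)

squeezed = torch.max(a, dim=1)            # values/indices have shape (4,)
kept = torch.max(a, dim=1, keepdim=True)  # values/indices have shape (4, 1)

# keepdim=True lets the result broadcast against the input directly:
# after subtracting the row maximum, every row has maximum 0.
shifted = a - kept.values

print(squeezed.values.shape, kept.values.shape)
```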


# view

Returns a new tensor with the same data as the self tensor but of a different shape.

The returned tensor shares the same data and must have the same number of elements, but may have a different size. For a tensor to be viewed, the new view size must be compatible with its original size and stride, i.e., each new view dimension must either be a subspace of an original dimension, or only span across original dimensions $d, d+1, \dots, d+k$ that satisfy the following contiguity-like condition that $\forall i = d, \dots, d+k-1$,

$$\text{stride}[i] = \text{stride}[i+1] \times \text{size}[i+1]$$

Otherwise, contiguous() needs to be called before the tensor can be viewed. See also: reshape(), which returns a view if the shapes are compatible, and copies (equivalent to calling contiguous()) otherwise.
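As a rough sketch of what this means in practice, the example below shows a view sharing data with its source, and a transposed tensor whose strides no longer satisfy the condition. The tensor and variable names are illustrative.

```python
import torch

x = torch.arange(12).reshape(3, 4)

# A view shares storage: writing through y changes x.
y = x.view(12)
y[0] = 100
shares_data = x[0, 0].item() == 100

# Transposing swaps the strides, so the result is no longer contiguous.
t = x.t()
print(t.is_contiguous(), t.stride())

# view() refuses to merge dimensions that fail the stride condition.
try:
    t.view(12)
    view_failed = False
except RuntimeError:
    view_failed = True

flat_copy = t.contiguous().view(12)  # explicit copy, then view
flat_reshape = t.reshape(12)         # reshape copies only when it has to
```

In this case reshape() and contiguous().view() give the same values; the difference is only whether the copy is explicit.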

x = torch.randn(4, 3)
x.size()
>> torch.Size([4, 3])
y = x.view(12)
y.size()
>> torch.Size([12])

Without a -1, every dimension must be given explicitly and the sizes must multiply to the number of elements. With a -1, that one dimension is inferred from the others:

z = x.view(-1, 2)  # the size -1 is inferred from other dimensions
z.size()
>> torch.Size([6, 2])
w = x.view(6, -1)  # the size -1 is inferred from other dimensions
w.size()
>> torch.Size([6, 2])

# sum

## torch.sum(input, dtype=None)

Returns the sum of all elements in the input tensor.

## torch.sum(input, dim, keepdim=False, dtype=None)

Returns the sum of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, reduce over all of them.

If keepdim is True, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).

a = torch.randn(1, 3)
a
>> tensor([[ 1.7224, -1.3243,  0.3586]])
torch.sum(a)
>> tensor(0.7567)

a = torch.randn(4, 3)
a
>> tensor([[-1.1065,  0.5816,  1.1932],
[ 0.3565, 1.9991, 0.2112],
[ 0.9671, -0.3203, -1.0331],
[-2.0222, -0.4018, -1.8219]])
# sum over all columns (i.e. for each row)
torch.sum(a, 1)
>> tensor([ 0.6683, 2.5669, -0.3863, -4.2459])
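The case where dim is a list of dimensions, together with keepdim and dtype, can be sketched as follows. The 3-D tensor here is an illustrative choice so that the sums are easy to check by hand.

```python
import torch

a = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)

total = torch.sum(a)                           # scalar: sum of 0..23
over_two = torch.sum(a, dim=(0, 2))            # reduce dims 0 and 2, shape (3,)
kept = torch.sum(a, dim=(0, 2), keepdim=True)  # reduced dims kept with size 1
as_double = torch.sum(a, dtype=torch.float64)  # cast to float64 before summing

print(total.item(), over_two.tolist(), kept.shape)
```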

# mean

## torch.mean(input)

Returns the mean value of all elements in the input tensor.

## torch.mean(input, dim, keepdim=False, out=None)

Returns the mean value of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, reduce over all of them.

If keepdim is True, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).

a = torch.randn(4, 3)
a
>> tensor([[ 1.1041, -0.4993,  1.8628],
[-0.6035, 0.6425, -1.3106],
[-0.6543, -0.4198, -0.4286],
[ 2.0873, 0.4965, -0.7824]])
a.mean()
>> tensor(0.1245)

dim = 1 -> over all columns (for each row)

torch.mean(a, 1)
>> tensor([ 0.8225, -0.4239, -0.5009,  0.6004])
torch.mean(a, 1, True)
>> tensor([[ 0.8225],
[-0.4239],
[-0.5009],
[ 0.6004]])
torch.mean(a, 0)
>> tensor([ 0.4834, 0.0549, -0.1647])

a = torch.randn(4, 3, 2)
a
>> tensor([[[-0.5509, -0.8295],
[-0.1816, 0.8299],
[-0.7890, 0.0698]],
[[-0.3103, -1.1878],
[-1.2422, -1.8429],
[-0.8061, -0.2843]],
[[ 0.3603, -1.9474],
[-0.2442, -0.8164],
[ 1.2880, 0.1848]],
[[-0.2814, -1.2271],
[ 0.2662, 0.3517],
[ 0.0496, 0.0306]]])
b = torch.mean(a, 0)
b
>> tensor([[-1.9556e-01, -1.2980e+00],
[-3.5046e-01, -3.6942e-01],
[-6.4377e-02, 2.2051e-04]])
b.shape
>> torch.Size([3, 2])
c = torch.mean(a, 1)
c
>> tensor([[-0.5072, 0.0234],
[-0.7862, -1.1050],
[ 0.4680, -0.8597],
[ 0.0115, -0.2816]])
c.shape
>> torch.Size([4, 2])
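A common use of keepdim with mean is centering each row. The sketch below uses made-up values, and also shows converting an integer tensor before averaging, since mean() expects a floating-point (or complex) dtype.

```python
import torch

a = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 6.0, 8.0]])

row_means = a.mean(dim=1, keepdim=True)  # shape (2, 1)
centered = a - row_means                 # broadcasts across the columns

# Integer tensors need converting to a floating-point dtype first.
ints = torch.arange(6)
int_mean = ints.float().mean()

print(row_means.flatten().tolist(), int_mean.item())
```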

# Random

Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 (also called the standard normal distribution).

torch.randn(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

torch.randn(2, 3)
>> tensor([[-0.5951,  1.4342, -1.2732],
[ 0.1727, 1.2753, -0.8301]])
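Because the values are random, the outputs in this post differ from run to run. A minimal sketch of making draws repeatable with torch.manual_seed, and of rescaling standard-normal samples to another mean and standard deviation (the values 5 and 2 are arbitrary):

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 3)
torch.manual_seed(0)
y = torch.randn(2, 3)  # same seed, so the same draws as x

# Shift and scale standard-normal samples to mean 5 and std 2.
samples = 5.0 + 2.0 * torch.randn(100_000)

print(torch.equal(x, y), samples.mean().item(), samples.std().item())
```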

# Flatten

Flattens a contiguous range of dims in a tensor.

torch.flatten(input, start_dim=0, end_dim=-1) → Tensor

input (Tensor) – the input tensor.
start_dim (python:int) – the first dim to flatten
end_dim (python:int) – the last dim to flatten

t = torch.tensor([[[1, 2],[3, 4]],[[5, 6],[7, 8]]])
t.shape
>> torch.Size([2, 2, 2])
torch.flatten(t)
>> tensor([1, 2, 3, 4, 5, 6, 7, 8])

We can use start_dim (and end_dim) to flatten only some of the dimensions, which is handy when you have a dimension of size 1:

torch.flatten(t, start_dim=1)
>> tensor([[1, 2, 3, 4],
[5, 6, 7, 8]])

a = torch.randn(4, 1, 3, 3)
a.shape
>> torch.Size([4, 1, 3, 3])
b = torch.flatten(a, start_dim=1, end_dim=2)
b.shape
>> torch.Size([4, 3, 3])
b
>> tensor([[[ 1.0668, 0.6964, -2.1182],
[-1.0142, 0.5931, -1.3457],
[ 0.7723, -0.5258, 1.3341]],
[[-0.1119, 1.6734, -1.6325],
[ 0.5137, -0.7176, -0.5566],
[-0.5263, -0.3947, 1.7352]],
[[-1.3183, 1.1556, 0.5092],
[-1.2826, -0.4203, -1.0321],
[-0.3116, -0.1535, -0.6810]],
[[-0.8669, 0.4939, 1.1409],
[ 0.2214, 0.0935, -0.2618],
[ 0.4363, -0.9791, 1.2344]]])
a
>> tensor([[[[ 1.0668, 0.6964, -2.1182],
[-1.0142, 0.5931, -1.3457],
[ 0.7723, -0.5258, 1.3341]]],
[[[-0.1119, 1.6734, -1.6325],
[ 0.5137, -0.7176, -0.5566],
[-0.5263, -0.3947, 1.7352]]],
[[[-1.3183, 1.1556, 0.5092],
[-1.2826, -0.4203, -1.0321],
[-0.3116, -0.1535, -0.6810]]],
[[[-0.8669, 0.4939, 1.1409],
[ 0.2214, 0.0935, -0.2618],
[ 0.4363, -0.9791, 1.2344]]]])
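For comparison, here is a small sketch relating flatten to reshape and squeeze on a tensor of the same shape as above. The values come from torch.arange purely so the comparison is deterministic.

```python
import torch

a = torch.arange(4 * 1 * 3 * 3).reshape(4, 1, 3, 3)

# Flatten everything except the first (batch) dimension.
per_sample = torch.flatten(a, start_dim=1)          # shape (4, 9)

# Merge only dims 1 and 2; since dim 1 has size 1 this acts like squeeze(1).
merged = torch.flatten(a, start_dim=1, end_dim=2)   # shape (4, 3, 3)

print(per_sample.shape, merged.shape)
```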

# Eye

Returns a 2-D tensor with ones on the diagonal and zeros elsewhere.

n (python:int) — the number of rows
m (python:int, optional) — the number of columns with default being n
out (Tensor, optional) — the output tensor.

dtype (torch.dtype, optional) — the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).

layout (torch.layout, optional) — the desired layout of returned Tensor. Default: torch.strided.

device (torch.device, optional) — the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

requires_grad (bool, optional) — If autograd should record operations on the returned tensor. Default: False.

torch.eye(3)
>> tensor([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
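The sketch below illustrates the optional m argument and two typical uses of the identity matrix. The one-hot trick via row indexing is one possible idiom, not the only way to do it, and the labels are made up.

```python
import torch

rect = torch.eye(2, 3)  # n=2 rows, m=3 columns

# Multiplying by the identity leaves a matrix unchanged.
m = torch.arange(9, dtype=torch.float32).reshape(3, 3)
unchanged = torch.eye(3) @ m

# Row i of eye(4) is the one-hot vector for class i.
labels = torch.tensor([2, 0, 3])
one_hot = torch.eye(4)[labels]

print(rect.tolist(), one_hot.tolist())
```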

# Arange

Returns a 1-D tensor of size ⌈(end−start) / step⌉ with values from the half-open interval [start, end) taken with common difference step beginning from start. The end value itself is never included.

torch.arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor

torch.arange(5)
>> tensor([0, 1, 2, 3, 4])
torch.arange(1, 2.5, 0.5)
>> tensor([1.0000, 1.5000, 2.0000])
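The size formula and the excluded end point can be checked with a short sketch. The start, end and step values are the ones from the example above plus one extra integer case.

```python
import math
import torch

start, end, step = 1, 2.5, 0.5
r = torch.arange(start, end, step)
expected_len = math.ceil((end - start) / step)

# end is excluded, so the last value here is 9, not 10.
ints = torch.arange(0, 10, 3)

print(len(r), expected_len, ints.tolist())
```

Integer arguments give an int64 tensor, while any float argument gives the default floating-point dtype.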

# Einsum

vec = torch.tensor([0, 1, 2, 3])
aten = torch.tensor([[11, 12, 13, 14],[21, 22, 23, 24],[31, 32, 33, 34],[41, 42, 43, 44]])
bten = torch.tensor([[1, 1, 1, 1],[2, 2, 2, 2],[3, 3, 3, 3],[4, 4, 4, 4]])

## 1) Matrix multiplication

PyTorch: torch.matmul(aten, bten) ; aten.mm(bten)

NumPy: np.einsum("ij, jk -> ik", arr1, arr2)

Each output entry is the dot product of a row of aten with a column of bten:

torch.einsum('ij, jk -> ik', aten, bten)
>> tensor([[130, 130, 130, 130],
[230, 230, 230, 230],
[330, 330, 330, 330],
[430, 430, 430, 430]])

#### Elementwise multiplication

torch.einsum('ij, ij -> ij', aten, bten)
>> tensor([[ 11, 12, 13, 14],
[ 42, 44, 46, 48],
[ 93, 96, 99, 102],
[164, 168, 172, 176]])

## 2) Extract elements along the main-diagonal

PyTorch: torch.diag(aten)

NumPy: np.einsum("ii -> i", arr)

torch.einsum('ii -> i', aten)
>> tensor([11, 22, 33, 44])

## 3) Hadamard product (i.e. element-wise product of two tensors)

PyTorch: aten * bten

NumPy: np.einsum("ij, ij -> ij", arr1, arr2)

torch.einsum('ij, ij -> ij', aten, bten)
>> tensor([[ 11,  12,  13,  14],
[ 42, 44, 46, 48],
[ 93, 96, 99, 102],
[164, 168, 172, 176]])

## 4) Element-wise squaring

PyTorch: aten ** 2

NumPy: np.einsum("ij, ij -> ij", arr, arr)

torch.einsum('ij, ij -> ij', aten, aten)
>> tensor([[ 121,  144,  169,  196],
[ 441, 484, 529, 576],
[ 961, 1024, 1089, 1156],
[1681, 1764, 1849, 1936]])

In general, the element-wise nth power can be computed by repeating the subscript string and the tensor n times. For example, the element-wise 4th power of a tensor:

torch.einsum('ij, ij, ij, ij -> ij', aten, aten, aten, aten)
>> tensor([[  14641,   20736,   28561,   38416],
[ 194481, 234256, 279841, 331776],
[ 923521, 1048576, 1185921, 1336336],
[2825761, 3111696, 3418801, 3748096]])

## 5) Trace (i.e. sum of main-diagonal elements)

PyTorch: torch.trace(aten)

NumPy einsum: np.einsum("ii -> ", arr)

torch.einsum('ii -> ', aten)
>> tensor(110)

## 6) Matrix transpose

PyTorch: torch.transpose(aten, 1, 0)

NumPy einsum: np.einsum("ij -> ji", arr)

torch.einsum('ij -> ji', aten)
>> tensor([[11, 21, 31, 41],
[12, 22, 32, 42],
[13, 23, 33, 43],
[14, 24, 34, 44]])

## 7) Outer Product (of vectors)

PyTorch: torch.outer(vec, vec) (torch.ger is a deprecated alias)

NumPy einsum: np.einsum("i, j -> ij", vec, vec)

torch.einsum('i, j -> ij', vec, vec)
>> tensor([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6],
[0, 3, 6, 9]])

## 8) Inner Product (of vectors)

PyTorch: torch.dot(vec1, vec2)

NumPy einsum: np.einsum("i, i -> ", vec1, vec2)

torch.einsum('i, i -> ', vec, vec)
>> tensor(14)

## 9) Sum along axis 0

PyTorch: torch.sum(aten, 0)

NumPy einsum: np.einsum("ij -> j", arr)

torch.einsum('ij -> j', aten)
>> tensor([104, 108, 112, 116])

## 10) Sum along axis 1

PyTorch: torch.sum(aten, 1)

NumPy einsum: np.einsum("ij -> i", arr)

torch.einsum('ij -> i', aten)
>> tensor([ 50,  90, 130, 170])
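As a sanity check, the einsum forms listed so far can be compared against the dedicated PyTorch functions. This is only a sketch on the small integer tensors defined at the start of this section; the checks dictionary is a hypothetical helper for the illustration.

```python
import torch

vec = torch.tensor([0, 1, 2, 3])
aten = torch.tensor([[11, 12, 13, 14],
                     [21, 22, 23, 24],
                     [31, 32, 33, 34],
                     [41, 42, 43, 44]])
bten = torch.tensor([[1, 1, 1, 1],
                     [2, 2, 2, 2],
                     [3, 3, 3, 3],
                     [4, 4, 4, 4]])

# Each entry compares an einsum expression with its named counterpart.
checks = {
    "matmul": torch.equal(torch.einsum("ij, jk -> ik", aten, bten), aten @ bten),
    "diag": torch.equal(torch.einsum("ii -> i", aten), torch.diag(aten)),
    "hadamard": torch.equal(torch.einsum("ij, ij -> ij", aten, bten), aten * bten),
    "trace": torch.equal(torch.einsum("ii -> ", aten), torch.trace(aten)),
    "transpose": torch.equal(torch.einsum("ij -> ji", aten), aten.t()),
    "outer": torch.equal(torch.einsum("i, j -> ij", vec, vec), torch.outer(vec, vec)),
    "dot": torch.equal(torch.einsum("i, i -> ", vec, vec), torch.dot(vec, vec)),
    "sum_axis0": torch.equal(torch.einsum("ij -> j", aten), aten.sum(0)),
    "sum_axis1": torch.equal(torch.einsum("ij -> i", aten), aten.sum(1)),
}

for name, ok in checks.items():
    print(name, ok)
```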

## 11) Batch Matrix Multiplication

PyTorch: torch.bmm(batch_tensor_1, batch_tensor_2)

NumPy: np.einsum("bij, bjk -> bik", batch_tensor_1, batch_tensor_2)

batch_tensor_1 = torch.arange(2 * 4 * 3).reshape(2, 4, 3)
batch_tensor_2 = torch.arange(2 * 3 * 4).reshape(2, 3, 4)
torch.bmm(batch_tensor_1, batch_tensor_2)
>> tensor([[[  20,   23,   26,   29],
[ 56, 68, 80, 92],
[ 92, 113, 134, 155],
[ 128, 158, 188, 218]],
[[ 632, 671, 710, 749],
[ 776, 824, 872, 920],
[ 920, 977, 1034, 1091],
[1064, 1130, 1196, 1262]]])
# sanity check with the shapes
torch.bmm(batch_tensor_1, batch_tensor_2).shape
>> torch.Size([2, 4, 4])
# batch matrix multiply using einsum
torch.einsum("bij, bjk -> bik", batch_tensor_1, batch_tensor_2)
>> tensor([[[ 20, 23, 26, 29],
[ 56, 68, 80, 92],
[ 92, 113, 134, 155],
[ 128, 158, 188, 218]],
[[ 632, 671, 710, 749],
[ 776, 824, 872, 920],
[ 920, 977, 1034, 1091],
[1064, 1130, 1196, 1262]]])
# sanity check with the shapes
torch.einsum("bij, bjk -> bik", batch_tensor_1, batch_tensor_2).shape
>> torch.Size([2, 4, 4])

## 12) Sum along axis 2

PyTorch: torch.sum(batch_ten, 2)

NumPy einsum: np.einsum("ijk -> ij", arr3D)

torch.einsum("ijk -> ij", batch_tensor_1)
>> tensor([[ 3, 12, 21, 30],
[39, 48, 57, 66]])

## 13) Sum all the elements in an nD tensor

PyTorch: torch.sum(batch_ten)

NumPy einsum: np.einsum("ijk -> ", arr3D)

torch.einsum("ijk -> ", batch_tensor_1)
>> tensor(276)

## 14) Sum over multiple axes (i.e. marginalization)

PyTorch: torch.sum(arr, dim=(dim0, dim1, dim2, dim3, dim4, dim6, dim7))

NumPy: np.einsum("ijklmnop -> n", nDarr)

# 8D tensor
nDten = torch.randn((3,5,4,6,8,2,7,9))
nDten.shape
>> torch.Size([3, 5, 4, 6, 8, 2, 7, 9])
# keep dimension 5 (i.e. "n" here) and marginalize out all the other axes
esum = torch.einsum("ijklmnop -> n", nDten)
esum
>> tensor([-111.1110, -263.9169])
# the same reduction with torch.sum (i.e. sum over the rest of the axes)
tsum = torch.sum(nDten, dim=(0, 1, 2, 3, 4, 6, 7))
torch.allclose(tsum, esum)
>> False
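The False from torch.allclose above is most likely a float32 rounding effect rather than a real disagreement: each output sums roughly 180,000 values, and the two calls may accumulate them in a different order. As a hedged check, the sketch below repeats the comparison in float64 on a smaller tensor (the shape is chosen arbitrarily to keep it fast), where the two results would be expected to agree.

```python
import torch

torch.manual_seed(0)

# Smaller 8-D tensor in float64; the shape is an arbitrary choice for the sketch.
t = torch.randn(2, 3, 2, 3, 2, 2, 3, 2, dtype=torch.float64)

# Keep axis 5 ("n") and sum over every other axis, in both styles.
esum = torch.einsum("ijklmnop -> n", t)
tsum = torch.sum(t, dim=(0, 1, 2, 3, 4, 6, 7))

print(esum.shape, torch.allclose(esum, tsum))
```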

## 15) Double Dot Products (same as: torch.sum(hadamard-product) cf. 3)

PyTorch: torch.sum(aten * bten)

NumPy: np.einsum("ij, ij -> ", arr1, arr2)

torch.einsum("ij, ij -> ", aten, bten)
>> tensor(1300)

## NumPy Ellipsis

from numpy import arange
a = arange(16).reshape(2,2,2,2)
a
>> array([[[[ 0,  1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]]],
[[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]]]])
a[..., 0].flatten()
>> array([ 0, 2, 4, 6, 8, 10, 12, 14])

Equivalent to:

a[:,:,:,0].flatten()
>> array([ 0,  2,  4,  6,  8, 10, 12, 14])
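The same ellipsis syntax works on PyTorch tensors, both for indexing and inside einsum subscripts. A minimal sketch, using torch.arange in place of the NumPy array above:

```python
import torch

a = torch.arange(16).reshape(2, 2, 2, 2)

last0 = a[..., 0]    # same as a[:, :, :, 0]
first0 = a[0, ...]   # same as a[0]

# In einsum, "..." stands for any leading dimensions, so this swaps
# only the last two axes whatever the number of batch dimensions.
swapped = torch.einsum("...ij -> ...ji", a)

print(last0.flatten().tolist(), swapped.shape)
```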

# Expand size of tensor along non singleton dimension

E.g. extend dim 2 from size 2 to size 3 by concatenating a block of zeros:

cuda0 = torch.device('cuda:0')
a = torch.ones([2, 1, 2, 2]).to(cuda0)
a.shape
>> torch.Size([2, 1, 2, 2])
a
>> tensor([[[[1., 1.],
[1., 1.]]],
[[[1., 1.],
[1., 1.]]]], device='cuda:0')
out = torch.cat([a, torch.zeros(2,1,1,2).to(cuda0)], 2)
out.shape
>> torch.Size([2, 1, 3, 2])
out
>> tensor([[[[1., 1.],
[1., 1.],
[0., 0.]]],
[[[1., 1.],
[1., 1.],
[0., 0.]]]], device='cuda:0')
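An alternative to concatenating zeros is torch.nn.functional.pad. The sketch below compares the two; it falls back to the CPU when no GPU is available, which is an assumption added here so the example runs anywhere.

```python
import torch
import torch.nn.functional as F

# Assumption for the sketch: use the GPU if present, otherwise the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
a = torch.ones(2, 1, 2, 2, device=device)

# Concatenate a block of zeros along dim 2, as in the example above.
zeros = torch.zeros(2, 1, 1, 2, device=device)
via_cat = torch.cat([a, zeros], dim=2)

# F.pad takes (before, after) pairs starting from the last dimension:
# (0, 0) for dim 3, then (0, 1) to append one row of zeros to dim 2.
via_pad = F.pad(a, (0, 0, 0, 1))

print(via_cat.shape, torch.equal(via_cat, via_pad))
```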

# Device

cuda0 = torch.device('cuda:0')
a = torch.randn((2,3), device=cuda0)
a.device
>> device(type='cuda', index=0)
b = torch.zeros(2,4).to(a.device)
b.device
>> device(type='cuda', index=0)
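The snippet above assumes a GPU is present. A device-agnostic variant might look like the following sketch, which also shows a few ways to create a new tensor on the same device as an existing one without a separate .to() call.

```python
import torch

# Assumption for the sketch: fall back to the CPU when CUDA is unavailable.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
a = torch.randn(2, 3, device=device)

b = torch.zeros(2, 4, device=a.device)  # allocate directly on a's device
c = a.new_zeros(2, 4)                   # inherits a's device and dtype
d = torch.zeros_like(a)                 # inherits a's shape, device and dtype

print(a.device, b.device, c.device, d.device)
```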

# Scatter

Writes all values from the tensor src into self at the indices specified in the index tensor. For each value in src, its output index is specified by its index in src for dimension != dim and by the corresponding value in index for dimension = dim.

For a 3-D tensor, self is updated as:

self[index[i][j][k]][j][k] = src[i][j][k]  # if dim == 0
self[i][index[i][j][k]][k] = src[i][j][k] # if dim == 1
self[i][j][index[i][j][k]] = src[i][j][k] # if dim == 2

x = torch.rand(2, 5)
x
>> tensor([[0.4183, 0.0121, 0.0719, 0.2705, 0.7525],
[0.1310, 0.4384, 0.3306, 0.8629, 0.6674]])
y = torch.zeros(3, 5)
scatter_pattern = torch.tensor([[0, 1, 2, 0, 0], [2, 0, 0, 1, 2]])
y.scatter_(0, scatter_pattern, x)
>> tensor([[0.4183, 0.4384, 0.3306, 0.2705, 0.7525],
[0.0000, 0.0121, 0.0000, 0.8629, 0.0000],
[0.1310, 0.0000, 0.0719, 0.0000, 0.6674]])

The scatter says "send each element of x to the row of y (dim 0) given by the matching entry of scatter_pattern, keeping its column".

i.e. for each element in the original x tensor, we specify a row index (0, 1 or 2) to send it to in the tensor we are scattering into (y).
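The dim == 0 rule can be written out as an explicit loop and compared with scatter_. The sketch below uses the same index pattern as above but a deterministic x, and adds one-hot encoding as an example of scattering along dim 1; the labels are made up.

```python
import torch

x = torch.arange(10, dtype=torch.float32).reshape(2, 5)
index = torch.tensor([[0, 1, 2, 0, 0],
                      [2, 0, 0, 1, 2]])

scattered = torch.zeros(3, 5).scatter_(0, index, x)

# The same rule by hand for dim == 0: the row comes from index,
# the column is unchanged.
manual = torch.zeros(3, 5)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        manual[index[i, j], j] = x[i, j]

# Scattering the scalar 1.0 along dim 1 one-hot encodes class labels.
labels = torch.tensor([2, 0, 1])
one_hot = torch.zeros(3, 4).scatter_(1, labels.unsqueeze(1), 1.0)

print(scattered.tolist())
print(one_hot.tolist())
```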

# Permute

x = torch.randn(2, 3, 5)
x.size()
>> torch.Size([2, 3, 5])
x.permute(2, 0, 1).size()
>> torch.Size([5, 2, 3])
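Permute reorders the axes without copying data, which is different from calling view with the same target shape. A small sketch, using torch.arange so individual elements can be tracked:

```python
import torch

x = torch.arange(30).reshape(2, 3, 5)
p = x.permute(2, 0, 1)  # p[k, i, j] == x[i, j, k]

same_element = p[4, 1, 2].item() == x[1, 2, 4].item()
is_view = p.data_ptr() == x.data_ptr()  # same memory, no copy was made

# view(5, 2, 3) only regroups elements in memory order, so it gives
# the same shape but a different arrangement of values.
differs_from_view = not torch.equal(p, x.view(5, 2, 3))

print(p.shape, p.is_contiguous(), differs_from_view)
```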

# Cat

Concatenates the given sequence of tensors in the given dimension. All tensors must have the same shape except in the concatenating dimension.

## 2D:

x = torch.randint(10, size=(2,3))
y = torch.randint(10, size=(2,3))
x,y
>> (tensor([[5, 1, 6],
[0, 9, 8]]),
tensor([[2, 6, 5],
[3, 0, 0]]))
torch.cat((x, y), 0)
>> tensor([[5, 1, 6],
[0, 9, 8],
[2, 6, 5],
[3, 0, 0]])
torch.cat((x, y), 1)
>> tensor([[5, 1, 6, 2, 6, 5],
[0, 9, 8, 3, 0, 0]])

## Three tensors:

z = torch.randint(10, size=(2,3))
torch.cat((x, y, z), 0)
>> tensor([[5, 1, 6],
[0, 9, 8],
[2, 6, 5],
[3, 0, 0],
[7, 0, 6],
[0, 2, 7]])
torch.cat((x, y, z), 1)
>> tensor([[5, 1, 6, 2, 6, 5, 7, 0, 6],
[0, 9, 8, 3, 0, 0, 0, 2, 7]])
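To round this off, here is a sketch contrasting cat with torch.stack and showing the shape rule for cat. The tensors are filled with zeros and ones only to keep the example deterministic.

```python
import torch

x = torch.zeros(2, 3)
y = torch.ones(2, 3)

rows = torch.cat((x, y), dim=0)       # (4, 3): an existing dim grows
cols = torch.cat((x, y), dim=1)       # (2, 6)
stacked = torch.stack((x, y), dim=0)  # (2, 2, 3): a new leading dim

# Sizes must match in every dimension except the one being concatenated.
w = torch.ones(1, 3)
taller = torch.cat((x, w), dim=0)     # (3, 3) is fine
try:
    torch.cat((x, w), dim=1)          # 2 rows vs 1 row: mismatch
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

print(rows.shape, cols.shape, stacked.shape, mismatch_raised)
```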