This post covers some of the key tensor operations used in PyTorch.
argmax
Returns the indices of the maximum value of all elements in the input tensor.
This is the second value returned by torch.max() when a dim argument is given.
a = torch.randn(4, 3)
a
>>tensor([[ 2.0149, 1.0420, -1.3816],
[-1.0265, -0.5212, -0.7570],
[-0.5141, 0.5674, 0.1039],
[-0.1549, -0.3003, -0.1086]])
torch.argmax(a)
>> tensor(0)
b = torch.randn(4)
b
>> tensor([0.6022, 1.1465, 0.3250, 1.0555])
torch.argmax(b)
>> tensor(1)
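argmax also takes a dim argument. With the same a as above it returns, for each row, the column of the maximum, matching the indices that torch.max(a, 1) gives below:
torch.argmax(a, dim=1)
>> tensor([0, 1, 1, 2])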
max
torch.max(input)
Returns the maximum value of all elements in the input tensor.
torch.max(input, dim, keepdim=False, out=None)
Returns a namedtuple (values, indices) where values is the maximum value of each row of the input tensor in the given dimension dim. And indices is the index location of each maximum value found (argmax).
If keepdim is True, the output tensors are of the same size as input except in the dimension dim where they are of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensors having 1 fewer dimension than input.
torch.max(a)
>>tensor(2.0149)
dim 0 -> max over all rows (i.e. for each column):
torch.max(a, 0)
>>torch.return_types.max(
values=tensor([2.0149, 1.0420, 0.1039]),
indices=tensor([0, 0, 2]))
dim 1 -> max over all columns (i.e. for each row)
torch.max(a, 1)
>>torch.return_types.max(
values=tensor([ 2.0149, -0.5212, 0.5674, -0.1086]),
indices=tensor([0, 1, 1, 2]))
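As a quick sketch of the keepdim behaviour described above, using the same a: the reduced dimension is kept with size 1, so the result stays 2-D:
values, indices = torch.max(a, 1, keepdim=True)
values
>> tensor([[ 2.0149],
[-0.5212],
[ 0.5674],
[-0.1086]])
values.shape
>> torch.Size([4, 1])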
view
Returns a new tensor with the same data as the self tensor but of a different shape.
The returned tensor shares the same data and must have the same number of elements, but may have a different size. For a tensor to be viewed, the new view size must be compatible with its original size and stride, i.e., each new view dimension must either be a subspace of an original dimension, or only span across original dimensions d, d+1, …, d+k that satisfy the following contiguity-like condition for all i = 0, …, k−1:
stride[i] = stride[i+1] × size[i+1]
Otherwise, contiguous() needs to be called before the tensor can be viewed. See also: reshape(), which returns a view if the shapes are compatible, and copies (equivalent to calling contiguous()) otherwise.
x = torch.randn(4, 3)
x.size()
>> torch.Size([4, 3])
y = x.view(12)
y.size()
>> torch.Size([12])
Passing -1 lets that dimension be inferred from the others; without it you have to work out every size yourself:
z = x.view(-1, 2)  # the size -1 is inferred from other dimensions
z.size()
>> torch.Size([6, 2])
w = x.view(6, -1)  # the size -1 is inferred from other dimensions
w.size()
>> torch.Size([6, 2])
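A sketch of the contiguity caveat above: a transposed tensor is typically not contiguous, so view() fails on it, while contiguous() (or reshape()) sorts it out:
xt = x.t()  # transpose of the (4, 3) tensor above; shares storage but is not contiguous
xt.is_contiguous()
>> False
# xt.view(12) raises a RuntimeError here
xt.contiguous().view(12).shape
>> torch.Size([12])
xt.reshape(12).shape  # reshape copies only when a view is not possible
>> torch.Size([12])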
sum
torch.sum(input, dtype=None)
Returns the sum of all elements in the input tensor.
torch.sum(input, dim, keepdim=False, dtype=None)
Returns the sum of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, reduce over all of them.
If keepdim is True, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).
a = torch.randn(1, 3)
a
>> tensor([[ 1.7224, -1.3243, 0.3586]])
torch.sum(a)
>> tensor(0.7567)
a = torch.randn(4, 3)
a
>> tensor([[-1.1065, 0.5816, 1.1932],
[ 0.3565, 1.9991, 0.2112],
[ 0.9671, -0.3203, -1.0331],
[-2.0222, -0.4018, -1.8219]])
# sum over all columns (i.e. for each row)
torch.sum(a, 1)
>> tensor([ 0.6683, 2.5669, -0.3863, -4.2459])
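dim can also be a tuple of dimensions, and keepdim keeps the reduced axes with size 1. A minimal sketch with a fresh tensor (only the shapes are the point here):
t = torch.randn(2, 3, 4)
torch.sum(t, dim=(0, 2)).shape
>> torch.Size([3])
torch.sum(t, dim=(0, 2), keepdim=True).shape
>> torch.Size([1, 3, 1])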
mean
torch.mean(input)
Returns the mean value of all elements in the input tensor.
torch.mean(input, dim, keepdim=False, out=None)
Returns the mean value of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, reduce over all of them.
If keepdim is True, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1. Otherwise, dim is squeezed (see torch.squeeze()), resulting in the output tensor having 1 (or len(dim)) fewer dimension(s).
a = torch.randn(4, 3)
a
>> tensor([[ 1.1041, -0.4993, 1.8628],
[-0.6035, 0.6425, -1.3106],
[-0.6543, -0.4198, -0.4286],
[ 2.0873, 0.4965, -0.7824]])
a.mean()
>> tensor(0.1245)
dim = 1 -> over all columns (for each row)
torch.mean(a, 1)
>> tensor([ 0.8225, -0.4239, -0.5009, 0.6004])
torch.mean(a, 1, True)
>> tensor([[ 0.8225],
[-0.4239],
[-0.5009],
[ 0.6004]])
torch.mean(a, 0)
>> tensor([ 0.4834, 0.0549, -0.1647])
a = torch.randn(4, 3, 2)
a
>> tensor([[[-0.5509, -0.8295],
[-0.1816, 0.8299],
[-0.7890, 0.0698]],

[[-0.3103, -1.1878],
[-1.2422, -1.8429],
[-0.8061, -0.2843]],

[[ 0.3603, -1.9474],
[-0.2442, -0.8164],
[ 1.2880, 0.1848]],

[[-0.2814, -1.2271],
[ 0.2662, 0.3517],
[ 0.0496, 0.0306]]])
b = torch.mean(a, 0)
b
>> tensor([[-1.9556e-01, -1.2980e+00],
[-3.5046e-01, -3.6942e-01],
[-6.4377e-02, 2.2051e-04]])
b.shape
>> torch.Size([3, 2])
c = torch.mean(a, 1)
c
>> tensor([[-0.5072, 0.0234],
[-0.7862, -1.1050],
[ 0.4680, -0.8597],
[ 0.0115, -0.2816]])
c.shape
>> torch.Size([4, 2])
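keepdim earns its keep when the result has to broadcast back against the input, e.g. centring each row. A minimal sketch with a fresh tensor:
x = torch.randn(4, 3)
centred = x - x.mean(dim=1, keepdim=True)  # (4, 3) - (4, 1) broadcasts across the columns
centred.mean(dim=1)  # each entry is ~0, up to floating point error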
Random
Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 (also called the standard normal distribution).
torch.randn(*size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor
torch.randn(2, 3)
>> tensor([[-0.5951, 1.4342, -1.2732],
[ 0.1727, 1.2753, -0.8301]])
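A few related samplers, as a quick sketch; torch.manual_seed makes the draws repeatable (outputs omitted since they are random):
torch.manual_seed(0)  # reproducible draws from here on
torch.rand(2, 3)  # uniform on [0, 1)
torch.randint(10, size=(2, 3))  # integers in [0, 10)
torch.randn_like(torch.zeros(2, 3))  # standard normal, same shape and dtype as the argument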
Flatten
Flattens a contiguous range of dims in a tensor.
torch.flatten(input, start_dim=0, end_dim=-1) → Tensor
input (Tensor) – the input tensor.
start_dim (python:int) – the first dim to flatten
end_dim (python:int) – the last dim to flatten
t = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
t.shape
>> torch.Size([2, 2, 2])
torch.flatten(t)
>> tensor([1, 2, 3, 4, 5, 6, 7, 8])
We can use start_dim (and end_dim) to flatten just a range of dimensions, which is handy when you want to fold away a dimension of size 1.
torch.flatten(t, start_dim=1)
>> tensor([[1, 2, 3, 4],
[5, 6, 7, 8]])
a = torch.randn(4, 1, 3, 3)
a.shape
>> torch.Size([4, 1, 3, 3])
b = torch.flatten(a, start_dim=1, end_dim=2)
b.shape
>> torch.Size([4, 3, 3])
b
>> tensor([[[ 1.0668, 0.6964, -2.1182],
[-1.0142, 0.5931, -1.3457],
[ 0.7723, -0.5258, 1.3341]],

[[-0.1119, 1.6734, -1.6325],
[ 0.5137, -0.7176, -0.5566],
[-0.5263, -0.3947, 1.7352]],

[[-1.3183, 1.1556, 0.5092],
[-1.2826, -0.4203, -1.0321],
[-0.3116, -0.1535, -0.6810]],

[[-0.8669, 0.4939, 1.1409],
[ 0.2214, 0.0935, -0.2618],
[ 0.4363, -0.9791, 1.2344]]])
a
>> tensor([[[[ 1.0668, 0.6964, -2.1182],
[-1.0142, 0.5931, -1.3457],
[ 0.7723, -0.5258, 1.3341]]],

[[[-0.1119, 1.6734, -1.6325],
[ 0.5137, -0.7176, -0.5566],
[-0.5263, -0.3947, 1.7352]]],

[[[-1.3183, 1.1556, 0.5092],
[-1.2826, -0.4203, -1.0321],
[-0.3116, -0.1535, -0.6810]]],

[[[-0.8669, 0.4939, 1.1409],
[ 0.2214, 0.0935, -0.2618],
[ 0.4363, -0.9791, 1.2344]]]])
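The same op is available as a module, torch.nn.Flatten (which defaults to start_dim=1), and is the usual way to go from conv feature maps to a linear layer. A sketch with made-up layer sizes:
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # (N, 1, 28, 28) -> (N, 8, 28, 28)
    nn.Flatten(),  # start_dim=1 by default -> (N, 8*28*28)
    nn.Linear(8 * 28 * 28, 10),
)
model(torch.randn(4, 1, 28, 28)).shape
>> torch.Size([4, 10])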
Eye
Returns a 2-D tensor with ones on the diagonal and zeros elsewhere.
n (python:int) — the number of rows
m (python:int, optional) — the number of columns with default being n
out (Tensor, optional) — the output tensor.
dtype (torch.dtype, optional) — the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
layout (torch.layout, optional) — the desired layout of returned Tensor. Default: torch.strided.
device (torch.device, optional) — the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
requires_grad (bool, optional) — If autograd should record operations on the returned tensor. Default: False.
torch.eye(3)
>> tensor([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
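A handy trick is indexing into the identity matrix to build one-hot vectors; labels here is a made-up tensor of class indices:
labels = torch.tensor([2, 0, 1])
torch.eye(3)[labels]
>> tensor([[0., 0., 1.],
[1., 0., 0.],
[0., 1., 0.]])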
Range
Returns a 1-D tensor of size ⌈(end−start) / step⌉ with values from the interval [start, end) taken with common difference step beginning from start.
torch.arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor
torch.arange(5)
>> tensor([0, 1, 2, 3, 4])
torch.arange(1, 2.5, 0.5)
>> tensor([1.0000, 1.5000, 2.0000])
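Note the inferred dtype: all-integer arguments give an integer tensor, while any floating-point argument gives the default float dtype; it can also be set explicitly. A small sketch (assuming the default dtype has not been changed):
torch.arange(5).dtype
>> torch.int64
torch.arange(1, 2.5, 0.5).dtype
>> torch.float32
torch.arange(5, dtype=torch.float32)
>> tensor([0., 1., 2., 3., 4.])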
Einsum
after https://stackoverflow.com/questions/55894693/understanding-pytorch-einsum
vec = torch.tensor([0, 1, 2, 3])
aten = torch.tensor([[11, 12, 13, 14],
[21, 22, 23, 24],
[31, 32, 33, 34],
[41, 42, 43, 44]])
bten = torch.tensor([[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4]])
1) Matrix multiplication
PyTorch: torch.matmul(aten, bten) ; aten.mm(bten)
NumPy: np.einsum("ij, jk -> ik", arr1, arr2)
torch.einsum('ij, jk -> ik', aten, bten)
>> tensor([[130, 130, 130, 130],
[230, 230, 230, 230],
[330, 330, 330, 330],
[430, 430, 430, 430]])
2) Extract elements along the main-diagonal
PyTorch: torch.diag(aten)
NumPy: np.einsum("ii -> i", arr)
torch.einsum('ii -> i', aten)
>> tensor([11, 22, 33, 44])
3) Hadamard product (i.e. element-wise product of two tensors)
PyTorch: aten * bten
NumPy: np.einsum("ij, ij -> ij", arr1, arr2)
torch.einsum('ij, ij -> ij', aten, bten)
>> tensor([[ 11, 12, 13, 14],
[ 42, 44, 46, 48],
[ 93, 96, 99, 102],
[164, 168, 172, 176]])
4) Element-wise squaring
PyTorch: aten ** 2
NumPy: np.einsum("ij, ij -> ij", arr, arr)
torch.einsum('ij, ij -> ij', aten, aten)
>> tensor([[ 121, 144, 169, 196],
[ 441, 484, 529, 576],
[ 961, 1024, 1089, 1156],
[1681, 1764, 1849, 1936]])
General: an element-wise nth power can be implemented by repeating the subscript string and the tensor n times. For example, the element-wise 4th power of a tensor:
torch.einsum('ij, ij, ij, ij -> ij', aten, aten, aten, aten)
>> tensor([[ 14641, 20736, 28561, 38416],
[ 194481, 234256, 279841, 331776],
[ 923521, 1048576, 1185921, 1336336],
[2825761, 3111696, 3418801, 3748096]])
5) Trace (i.e. sum of main-diagonal elements)
PyTorch: torch.trace(aten)
NumPy einsum: np.einsum("ii -> ", arr)
torch.einsum('ii -> ', aten)
>> tensor(110)
6) Matrix transpose
PyTorch: torch.transpose(aten, 1, 0)
NumPy einsum: np.einsum("ij -> ji", arr)
torch.einsum('ij -> ji', aten)
>> tensor([[11, 21, 31, 41],
[12, 22, 32, 42],
[13, 23, 33, 43],
[14, 24, 34, 44]])
7) Outer Product (of vectors)
PyTorch: torch.ger(vec, vec) (torch.outer in newer releases)
NumPy einsum: np.einsum("i, j -> ij", vec, vec)
torch.einsum('i, j -> ij', vec, vec)
>> tensor([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6],
[0, 3, 6, 9]])
8) Inner Product (of vectors)
PyTorch: torch.dot(vec1, vec2)
NumPy einsum: np.einsum("i, i -> ", vec1, vec2)
torch.einsum('i, i -> ', vec, vec)
>> tensor(14)
9) Sum along axis 0
PyTorch: torch.sum(aten, 0)
NumPy einsum: np.einsum("ij -> j", arr)
torch.einsum('ij -> j', aten)
>> tensor([104, 108, 112, 116])
10) Sum along axis 1
PyTorch: torch.sum(aten, 1)
NumPy einsum: np.einsum("ij -> i", arr)
torch.einsum('ij -> i', aten)
>> tensor([ 50, 90, 130, 170])
11) Batch Matrix Multiplication
PyTorch: torch.bmm(batch_tensor_1, batch_tensor_2)
NumPy: np.einsum("bij, bjk -> bik", batch_tensor_1, batch_tensor_2)
batch_tensor_1 = torch.arange(2 * 4 * 3).reshape(2, 4, 3)
batch_tensor_2 = torch.arange(2 * 3 * 4).reshape(2, 3, 4)
torch.bmm(batch_tensor_1, batch_tensor_2)
>> tensor([[[ 20, 23, 26, 29],
[ 56, 68, 80, 92],
[ 92, 113, 134, 155],
[ 128, 158, 188, 218]],

[[ 632, 671, 710, 749],
[ 776, 824, 872, 920],
[ 920, 977, 1034, 1091],
[1064, 1130, 1196, 1262]]])
# sanity check with the shapes
torch.bmm(batch_tensor_1, batch_tensor_2).shape
>> torch.Size([2, 4, 4])
# batch matrix multiply using einsum
torch.einsum("bij, bjk -> bik", batch_tensor_1, batch_tensor_2)
>> tensor([[[ 20, 23, 26, 29],
[ 56, 68, 80, 92],
[ 92, 113, 134, 155],
[ 128, 158, 188, 218]],

[[ 632, 671, 710, 749],
[ 776, 824, 872, 920],
[ 920, 977, 1034, 1091],
[1064, 1130, 1196, 1262]]])
# sanity check with the shapes
torch.einsum("bij, bjk -> bik", batch_tensor_1, batch_tensor_2).shape
>> torch.Size([2, 4, 4])
12) Sum along axis 2
PyTorch: torch.sum(batch_ten, 2)
NumPy einsum: np.einsum("ijk -> ij", arr3D)
torch.einsum("ijk -> ij", batch_tensor_1)
>> tensor([[ 3, 12, 21, 30],
[39, 48, 57, 66]])
13) Sum all the elements in an nD tensor
PyTorch: torch.sum(batch_ten)
NumPy einsum: np.einsum("ijk -> ", arr3D)
torch.einsum("ijk -> ", batch_tensor_1)
>> tensor(276)
14) Sum over multiple axes (i.e. marginalization)
PyTorch: torch.sum(arr, dim=(dim0, dim1, dim2, dim3, dim4, dim6, dim7))
NumPy: np.einsum("ijklmnop -> n", nDarr)
# 8D tensor
nDten = torch.randn((3, 5, 4, 6, 8, 2, 7, 9))
nDten.shape
>> torch.Size([3, 5, 4, 6, 8, 2, 7, 9])
# keep only dimension 5 (i.e. "n" here), summing out everything else
esum = torch.einsum("ijklmnop -> n", nDten)
esum
>> tensor([-111.1110, -263.9169])
# the same marginalization via torch.sum over the remaining axes
tsum = torch.sum(nDten, dim=(0, 1, 2, 3, 4, 6, 7))
torch.allclose(tsum, esum)
>> True
15) Double Dot Products (same as: torch.sum(hadamard-product) cf. 3)
PyTorch: torch.sum(aten * bten)
NumPy einsum: np.einsum("ij, ij -> ", arr1, arr2)
torch.einsum("ij, ij -> ", aten, bten)
>> tensor(1300)
NumPy Ellipsis
from numpy import arange
a = arange(16).reshape(2, 2, 2, 2)
a
>> array([[[[ 0, 1],
[ 2, 3]],

[[ 4, 5],
[ 6, 7]]],

[[[ 8, 9],
[10, 11]],

[[12, 13],
[14, 15]]]])
a[..., 0].flatten()
>> array([ 0, 2, 4, 6, 8, 10, 12, 14])
Equivalent to:
a[:, :, :, 0].flatten()
>> array([ 0, 2, 4, 6, 8, 10, 12, 14])
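torch.einsum understands the same ellipsis notation for leading dimensions that are carried through untouched, so the batch matmul from 11) can be written without naming the batch index. A quick sketch reusing batch_tensor_1 and batch_tensor_2:
torch.einsum("...ij, ...jk -> ...ik", batch_tensor_1, batch_tensor_2).shape
>> torch.Size([2, 4, 4])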
Expand the size of a tensor along a non-singleton dimension
e.g. extend dim 2 by concatenating zeros along it:
cuda0 = torch.device('cuda:0')
a = torch.ones([2, 1, 2, 2]).to(cuda0)
a.shape
>> torch.Size([2, 1, 2, 2])
a
>> tensor([[[[1., 1.],
[1., 1.]]],

[[[1., 1.],
[1., 1.]]]], device='cuda:0')
out = torch.cat([a, torch.zeros(2, 1, 1, 2).to(cuda0)], 2)
out.shape
>> torch.Size([2, 1, 3, 2])
out
>> tensor([[[[1., 1.],
[1., 1.],
[0., 0.]]],

[[[1., 1.],
[1., 1.],
[0., 0.]]]], device='cuda:0')
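For the singleton case, torch.Tensor.expand does the job without copying any memory; cat, as above, is what you need when the dimension is larger than 1. A quick sketch:
a = torch.ones(2, 1, 2, 2)
a.expand(2, 3, 2, 2).shape  # only size-1 dims can be expanded; the data is not copied
>> torch.Size([2, 3, 2, 2])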
Device
cuda0 = torch.device('cuda:0')
a = torch.randn((2, 3), device=cuda0)
a.device
>> device(type='cuda', index=0)
b = torch.zeros(2, 4).to(a.device)
b.device
>> device(type='cuda', index=0)
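A common pattern is to pick the device once, falling back to the CPU when CUDA is not available, and pass it around (a sketch; output omitted since it depends on the machine):
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
a = torch.randn(2, 3, device=device)
b = torch.zeros(3, 4).to(device)
(a @ b).device  # both operands must live on the same device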
Scatter
Writes all values from the tensor src into self at the indices specified in the index tensor. For each value in src, its output index is specified by its index in src for dimension != dim and by the corresponding value in index for dimension = dim.
For a 3-D tensor, self is updated as:
self[index[i][j][k]][j][k] = src[i][j][k] # if dim == 0
self[i][index[i][j][k]][k] = src[i][j][k] # if dim == 1
self[i][j][index[i][j][k]] = src[i][j][k] # if dim == 2
x = torch.rand(2, 5)
x
>> tensor([[0.4183, 0.0121, 0.0719, 0.2705, 0.7525],
[0.1310, 0.4384, 0.3306, 0.8629, 0.6674]])
y = torch.zeros(3, 5)
scatter_pattern = torch.tensor([[0, 1, 2, 0, 0], [2, 0, 0, 1, 2]])
y.scatter_(0, scatter_pattern, x)
>> tensor([[0.4183, 0.4384, 0.3306, 0.2705, 0.7525],
[0.0000, 0.0121, 0.0000, 0.8629, 0.0000],
[0.1310, 0.0000, 0.0719, 0.0000, 0.6674]])
The scatter says: send the elements of x to the following row indices in y, scattering along dim 0 (i.e. row-wise).
In other words, for each element of the original x tensor we specify the row index (0, 1 or 2) it should be written to in the tensor we are scattering into (y); its column stays the same.
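The classic use of scatter_ is one-hot encoding: write a 1 into each row at the column given by the label. A sketch with made-up labels (an alternative to the torch.eye indexing trick above):
labels = torch.tensor([2, 0, 3])  # one class index per sample
one_hot = torch.zeros(3, 4)  # (num_samples, num_classes)
one_hot.scatter_(1, labels.unsqueeze(1), 1.0)
>> tensor([[0., 0., 1., 0.],
[1., 0., 0., 0.],
[0., 0., 0., 1.]])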
Permute
x = torch.randn(2, 3, 5)
x.size()
>> torch.Size([2, 3, 5])
x.permute(2, 0, 1).size()
>> torch.Size([5, 2, 3])
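permute returns a non-contiguous view of the same storage, so a follow-up view() needs a contiguous() first (or use reshape()), as in the view section. A sketch reusing x:
y = x.permute(2, 0, 1)
y.is_contiguous()
>> False
y.contiguous().view(5, 6).shape
>> torch.Size([5, 6])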
Cat
Concatenates the given sequence of tensors in the given dimension.
2D:
x = torch.randint(10, size=(2, 3))
y = torch.randint(10, size=(2, 3))
x, y
>> (tensor([[5, 1, 6],
[0, 9, 8]]),
tensor([[2, 6, 5],
[3, 0, 0]]))
torch.cat((x, y), 0)
>> tensor([[5, 1, 6],
[0, 9, 8],
[2, 6, 5],
[3, 0, 0]])
torch.cat((x, y), 1)
>> tensor([[5, 1, 6, 2, 6, 5],
[0, 9, 8, 3, 0, 0]])
Three tensors:
z = torch.randint(10, size=(2, 3))
torch.cat((x, y, z), 0)
>> tensor([[5, 1, 6],
[0, 9, 8],
[2, 6, 5],
[3, 0, 0],
[7, 0, 6],
[0, 2, 7]])
torch.cat((x, y, z), 1)
>> tensor([[5, 1, 6, 2, 6, 5, 7, 0, 6],
[0, 9, 8, 3, 0, 0, 0, 2, 7]])
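Related: torch.stack concatenates along a new dimension, whereas cat extends an existing one. A quick sketch with the x and y above:
torch.stack((x, y), 0).shape  # new leading dimension
>> torch.Size([2, 2, 3])
torch.cat((x, y), 0).shape  # existing dimension grows
>> torch.Size([4, 3])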
link to github code here