PyTorch Detach

  • The detach method is mainly used to create a new tensor whose storage is shared with the original tensor but which is not involved in gradient computation.
  • It returns a new tensor that has no attachment to the current computation graph or its gradients.
  • Since no gradient is needed for the detached tensor, no gradients are computed for it.
  • Operations on the detached output are not tracked, so the result carries no gradients.

Working of detach

  • Let us consider two programs, one where detach is not used and one where it is. First, without detach:
import torch

x = torch.full((20,), 2.0, requires_grad=True)
y = x**4
z = x**6
i = (y + z).sum()
i.backward()
# printing the gradient of x
print(x.grad)
  • Here y equals x to the power of 4 and z equals x to the power of 6, so i equals the sum of x^4 + x^6.
  • The derivative with respect to x is 4x^3 + 6x^5; at x = 2 this is 4*2^3 + 6*2^5, which equals 224.
  • The output is a vector of 20 elements, each with the value 224.

Let us consider another example where we use detach:

x = torch.full((20,), 2.0, requires_grad=True)
y = x**3
z = x.detach()**6
i = (y + z).sum()
i.backward()
# printing the gradient of x
print(x.grad)
  • Here z is detached from the graph, so the x**6 term does not contribute to the gradient.
  • Therefore the derivative is 3x^2, which at x = 2 equals 12.
  • The output is a vector of 20 elements, each with the value 12.

Let us consider another program where detach is used:

a = torch.arange(5., requires_grad=True)
b = a**2
c = a.detach()   # c shares storage with a
c.zero_()        # zeroing c also zeroes a in place
b.sum().backward()
print(a.grad)
  • Running this program raises a RuntimeError, because the backward pass needs the original value of a, which was modified in place through c. If we remove the c.zero_() call, the gradient is printed normally. A sketch of a fix follows this list.
  • detach does not create a copy of the data; the detached tensor shares the original tensor's storage, so in-place updates to the detached tensor also update the original.
  • detach blocks gradients from flowing through the detached tensor, but it does not copy the data.
  • detach is used when a tensor's operations should not be recorded in the computational graph.
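
As a minimal sketch of the fix hinted at above, cloning after detach gives a copy that no longer shares storage with a, so zeroing it does not break the backward pass:

import torch

a = torch.arange(5., requires_grad=True)
b = a**2
c = a.detach().clone()   # clone copies the data, so c no longer shares storage with a
c.zero_()                # zeroing c leaves a untouched
b.sum().backward()       # backward now succeeds
print(a.grad)            # tensor([0., 2., 4., 6., 8.])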

Detach Method in PyTorch

  • PyTorch needs to track all operations on tensors that require gradients so that it can compute those gradients.
  • When gradients are not required, the detach method creates a view of the same data that is excluded from the graph.
  • The graph keeps no record of operations performed on the detached tensor, since tracking for it is removed from the graph.
  • The torchviz package can be used to visualize how the gradient is computed for a given tensor.
import torch
from torchviz import make_dot

x = torch.ones(5, requires_grad=True)
y = x**4
z = x**6
j = (y + z).sum()
make_dot(j).render("attached", format="jpg")
  • In the next snippet, the operation on the detached tensor will not be tracked.
y = x**4
z = x.detach()**6
j = (y + z).sum()
make_dot(j).render("detached", format="jpg")
  • The graph can no longer track the x.detach()**6 operation. This is how the detach method works in PyTorch; the tracking behaviour can also be checked directly, as sketched below.
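
A quick way to confirm this without torchviz is to inspect the grad_fn attribute; a minimal, self-contained check of the same idea:

import torch

x = torch.ones(5, requires_grad=True)
y = x**4                 # tracked: y has a grad_fn node in the graph
z = x.detach()**6        # not tracked: the detached branch records nothing
print(y.grad_fn)         # <PowBackward0 object at ...>
print(z.grad_fn)         # None
print(z.requires_grad)   # False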

Example of detach in PyTorch:

import torch

def samestorage(x, y):
    if x.storage().data_ptr() == y.storage().data_ptr():
        print("it is the same storage space")
    else:
        print("it is different storage space")

a = torch.ones((4, 5), requires_grad=True)
print(a)
b = a
c = a.data
d = a.detach()
e = a.data.clone()
f = a.clone()
g = a.detach().clone()
h = torch.empty_like(a).copy_(a)
k = torch.tensor(a)
  • To copy a tensor, PyTorch recommends sourceTensor.clone().detach() rather than torch.tensor(sourceTensor), which is why the last line above emits a UserWarning; a short sketch of the recommended pattern follows.
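
For illustration, a minimal, self-contained sketch of the recommended copy pattern (the variable names are placeholders, not from the program above):

import torch

src = torch.ones(3, requires_grad=True)
copy = src.clone().detach()   # recommended: an explicit copy that is detached from the graph
# copy = torch.tensor(src)    # also copies, but emits a UserWarning suggesting the line above
print(copy.requires_grad)     # False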

Program:

print("a",end='');samestorage(a,a)
print("b:",end='');samestorage(a,b)
print("c:",end='');samestorage(a,c)
print("d:",end='');samestorage(a,d)
print("e:",end='');samestorage(a,e)
print("f:",end='');samestorage(a,f)
print("g:",end='');samestorage(a,g)
print("h:",end='');samestorage(a,h)
  • The output shows, for each tensor, whether it shares storage with a: b, c, and d share the same storage, while e, f, g, h, and k use different storage.
  • PyTorch offers several ways to construct a copy of a tensor; the perfplot package can be used to compare how they perform.
import torch
import perfplot

perfplot.show(
    setup=lambda n: torch.randn(n),
    kernels=[
        lambda a: a.new_tensor(a),
        lambda a: a.clone().detach(),
        lambda a: torch.empty_like(a).copy_(a),
        lambda a: torch.tensor(a),
        lambda a: a.detach().clone(),
    ],
    labels=["new_tensor()", "clone().detach()", "empty_like().copy()", "tensor()", "detach().clone()"],
    n_range=[2 ** k for k in range(20)],
    xlabel="len(a)",
    logx=False,
    logy=False,
    title="Comparison of timing for copying a PyTorch tensor",
)
  • We should not use the clone method on its own, since gradients flowing through the cloned tensor are still propagated back to the original tensor; it should be combined with detach, as sketched below.
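
A minimal sketch of this behaviour (the tensor values are chosen only for illustration):

import torch

a = torch.ones(3, requires_grad=True)
b = a.clone()            # clone alone keeps the graph connection
b.sum().backward()
print(a.grad)            # tensor([1., 1., 1.]): the gradient reached a through the clone

a.grad = None
c = a.detach().clone()   # detached copy: no path back to a
print(c.requires_grad)   # False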

Conclusion

  • clone should be combined with detach when we want to copy a tensor and detach it from the computational graph.
  • We should understand exactly what detaching from the computational graph does, since problems such as in-place modification of shared storage are not always obvious from the code.