PyTorch Autograd Profiler

2020. 2. 2. 02:16 · Develop

 

Link: https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.profile

 


 

Autograd includes a profiler that lets you inspect the cost of different operators inside your model, both on the CPU and GPU. Two modes are implemented at the moment: CPU-only profiling using profile, and nvprof-based profiling (which registers both CPU and GPU activity) using emit_nvtx.
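
As a quick sketch of how the two modes are entered (the tensor shapes below are arbitrary, and emit_nvtx only produces visible output when the script is run under an external CUDA profiler such as nvprof or Nsight Systems):

import torch

x = torch.randn(4, 4, requires_grad=True)

# Mode 1: profile() records timings in-process and can be inspected from Python.
with torch.autograd.profiler.profile() as prof:
    (x * x).sum().backward()
print(prof.table())

# Mode 2: emit_nvtx() annotates each autograd operation with an NVTX range,
# to be captured by nvprof or Nsight Systems (requires a CUDA build).
if torch.cuda.is_available():
    with torch.autograd.profiler.emit_nvtx():
        y = (x.cuda() * x.cuda()).sum()
        y.backward()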

 

 

torch.autograd.profiler.profile(enabled=True, use_cuda=False, record_shapes=False)

 

Context manager that manages autograd profiler state and holds a summary of results. Under the hood it just records events of functions being executed in C++ and exposes those events to Python. You can wrap any code into it and it will only report runtime of PyTorch functions.

  • enabled (bool, optional) – Setting this to False makes this context manager a no-op. (Default: True).
  • use_cuda (bool, optional) – Enables timing of CUDA events as well, using the cudaEvent API. Adds approximately 4us of overhead to each tensor operation. (Default: False).
  • record_shapes (bool, optional) – If shape recording is enabled, information about input dimensions will be collected. This allows one to see which dimensions have been used under the hood and further group by them using prof.key_averages(group_by_input_shape=True). Please note that shape recording might skew your profiling data; it is recommended to use separate runs with and without shape recording to validate the timing. The skew will most likely be negligible for the bottommost events (in the case of nested function calls), but for higher-level functions the total self CPU time might be artificially increased because of the shape collection. A sketch of shape recording follows this list.
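
A minimal sketch of shape recording and shape-grouped averages (the model and input sizes are just for illustration):

import torch
import torch.nn as nn

model = nn.Linear(128, 64)
inputs = [torch.randn(32, 128), torch.randn(8, 128)]  # two different batch sizes

with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for x in inputs:
        model(x)

# With group_by_input_shape=True, the same operator called with different
# input shapes shows up as separate rows in the table.
print(prof.key_averages(group_by_input_shape=True).table(sort_by="self_cpu_time_total"))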

 

>>> x = torch.randn((1, 1), requires_grad=True)
>>> with torch.autograd.profiler.profile() as prof:
>>>     for _ in range(100):  # any normal python code, really!
>>>         y = x ** 2
>>>         y.backward()
>>> # NOTE: some columns were removed for brevity
>>> print(prof.key_averages().table(sort_by="self_cpu_time_total"))
-----------------------------------  ---------------  ---------------  ---------------
Name                                 Self CPU total   CPU time avg     Number of Calls
-----------------------------------  ---------------  ---------------  ---------------
mul                                  32.048ms         32.048ms         200
pow                                  27.041ms         27.041ms         200
PowBackward0                         9.727ms          55.483ms         100
torch::autograd::AccumulateGrad      9.148ms          9.148ms          100
torch::autograd::GraphRoot           691.816us        691.816us        100
-----------------------------------  ---------------  ---------------  ---------------

 

Useful Functions

export_chrome_trace(path): Exports an EventList as a Chrome tracing tools file. The trace can later be loaded and inspected in Chrome under the chrome://tracing URL (see the combined sketch below).
key_averages(group_by_input_shape=False): Averages all function events over their keys.
table(sort_by=None, row_limit=100, header=None): Prints an EventList as a nicely formatted table.
torch.autograd.profiler.emit_nvtx(enabled=True, record_shapes=False): Context manager that makes every autograd operation emit an NVTX range, for use with nvprof-based profiling. See https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx
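
Putting the first three together (a rough sketch; the trace file name is arbitrary):

import torch

x = torch.randn(16, 16, requires_grad=True)

with torch.autograd.profiler.profile() as prof:
    (x @ x).sum().backward()

# Averaged stats, sorted and truncated to 10 rows
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

# Write a trace file that can be opened in Chrome at chrome://tracing
prof.export_chrome_trace("trace.json")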