PyTorch Autograd Profiler
Link: https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.profile
Autograd includes a profiler that lets you inspect the cost of different operators inside your model - both on the CPU and GPU. There are two modes implemented at the moment - CPU-only using profile, and nvprof-based (registers both CPU and GPU activity) using emit_nvtx.
torch.autograd.profiler.profile(enabled=True, use_cuda=False, record_shapes=False)
Context manager that manages autograd profiler state and holds a summary of results. Under the hood it just records events of functions being executed in C++ and exposes those events to Python. You can wrap any code into it and it will only report runtime of PyTorch functions.
- enabled (bool, optional) – Setting this to False makes this context manager a no-op. (Default: True)
- use_cuda (bool, optional) – Enables timing of CUDA events as well using the cudaEvent API. Adds approximately 4us of overhead to each tensor operation. (Default: False)
- record_shapes (bool, optional) – If shapes recording is set, information about input dimensions will be collected. This allows one to see which dimensions have been used under the hood and further group by them using prof.key_averages(group_by_input_shape=True). Please note that shape recording might skew your profiling data; it is recommended to use separate runs with and without shape recording to validate the timing. Most likely the skew will be negligible for the bottom-most events (in the case of nested function calls), but for higher-level functions the total self CPU time might be artificially increased because of the shape collection.
>>> import torch
>>> x = torch.randn((1, 1), requires_grad=True)
>>> with torch.autograd.profiler.profile() as prof:
>>>     for _ in range(100):  # any normal python code, really!
>>>         y = x ** 2
>>>         y.backward()
>>> # NOTE: some columns were removed for brevity
>>> print(prof.key_averages().table(sort_by="self_cpu_time_total"))
----------------------------------- --------------- --------------- ---------------
Name Self CPU total CPU time avg Number of Calls
----------------------------------- --------------- --------------- ---------------
mul 32.048ms 32.048ms 200
pow 27.041ms 27.041ms 200
PowBackward0 9.727ms 55.483ms 100
torch::autograd::AccumulateGrad 9.148ms 9.148ms 100
torch::autograd::GraphRoot 691.816us 691.816us 100
----------------------------------- --------------- --------------- ---------------
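Shape recording pairs with key_averages(group_by_input_shape=True) to break the averaged stats down by input size. Below is a minimal sketch along those lines (the tensor shapes and the matmul workload are arbitrary, chosen only for illustration); on a GPU machine you could additionally pass use_cuda=True to time the CUDA kernels as well.

>>> import torch
>>> x = torch.randn(32, 128, requires_grad=True)  # arbitrary example shapes
>>> w = torch.randn(128, 64, requires_grad=True)
>>> with torch.autograd.profiler.profile(record_shapes=True) as prof:
>>>     for _ in range(100):
>>>         y = (x @ w).sum()  # matmul + reduction, then backward
>>>         y.backward()
>>> # group the averaged stats by the recorded input shapes
>>> print(prof.key_averages(group_by_input_shape=True).table(sort_by="self_cpu_time_total", row_limit=10))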
Useful Functions
- export_chrome_trace(path): Exports an EventList as a Chrome tracing tools file. The checkpoint can be later loaded and inspected under the chrome://tracing URL (a usage sketch follows this list).
- key_averages(group_by_input_shape=False): Averages all function events over their keys.
- table(sort_by=None, row_limit=100, header=None): Prints an EventList as a nicely formatted table.
torch.autograd.profiler.emit_nvtx(enabled=True, record_shapes=False): Context manager that makes every autograd operation emit an NVTX range, for use with nvprof. See https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx
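emit_nvtx only emits NVTX ranges around each autograd operation, so the script has to be run under nvprof (or a similar tool such as Nsight Systems) for anything to be collected. A minimal sketch, assuming a CUDA-capable machine (the tensor size and the output file name are arbitrary):

>>> # run as: nvprof --profile-from-start off -o trace.prof -- python script.py
>>> import torch
>>> x = torch.randn(64, 64, device="cuda", requires_grad=True)
>>> with torch.cuda.profiler.profile():
>>>     (x ** 2).sum().backward()  # warm-up iteration, outside emit_nvtx
>>>     with torch.autograd.profiler.emit_nvtx():
>>>         (x ** 2).sum().backward()  # every autograd op emits an NVTX range here

The resulting trace.prof can then be inspected with nvprof/nvvp, or loaded back into Python with torch.autograd.profiler.load_nvprof(path).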