{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Google Colab\uc5d0\uc11c \ub178\ud2b8\ubd81\uc744 \uc2e4\ud589\ud558\uc2e4 \ub54c\uc5d0\ub294 \n# http://tutorials.pytorch.kr/beginner/colab \ub97c \ucc38\uace0\ud558\uc138\uc694.\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyTorch \ubaa8\ub4c8 \ud504\ub85c\ud30c\uc77c\ub9c1\ud558\uae30\n===========================\n\n**Author:** [Suraj Subramanian](http://github.com/suraj813)\n\n: **\ubc88\uc5ed:** [\uc774\uc7ac\ubcf5](http://github.com/zzaebok)\n\nPyTorch\ub294 \ucf54\ub4dc \ub0b4\uc758 \ub2e4\uc591\ud55c Pytorch \uc5f0\uc0b0\uc5d0 \ub300\ud55c \uc2dc\uac04\uacfc \uba54\ubaa8\ub9ac \ube44\uc6a9\uc744\n\ud30c\uc545\ud558\ub294\ub370 \uc720\uc6a9\ud55c \ud504\ub85c\ud30c\uc77c\ub7ec(profiler) API\ub97c \ud3ec\ud568\ud558\uace0 \uc788\uc2b5\ub2c8\ub2e4.\n\ud504\ub85c\ud30c\uc77c\ub7ec\ub294 \ucf54\ub4dc\uc5d0 \uc27d\uac8c \ud1b5\ud569\ub420 \uc218 \uc788\uc73c\uba70, \ud504\ub85c\ud30c\uc77c\ub9c1 \uacb0\uacfc\ub294 \ud45c\ub85c\n\ucd9c\ub825\ub418\uac70\ub098 JSON \ud615\uc2dd\uc758 \ucd94\uc801(trace) \ud30c\uc77c\ub85c \ubc18\ud658\ub420 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n\n
NOTE:
\n
\n

\ud504\ub85c\ud30c\uc77c\ub7ec\ub294 \uba40\ud2f0\uc2a4\ub808\ub4dc\ud654\ub41c \ubaa8\ub378\ub4e4\uc744 \uc9c0\uc6d0\ud569\ub2c8\ub2e4.\ud504\ub85c\ud30c\uc77c\ub7ec\ub294 \uc5f0\uc0b0\uc774 \uc774\ub8e8\uc5b4\uc9c0\ub294 \uc2a4\ub808\ub4dc\uc640 \uac19\uc740 \uc2a4\ub808\ub4dc\uc5d0\uc11c \uc2e4\ud589\ub418\uc9c0\ub9cc \ub2e4\ub978 \uc2a4\ub808\ub4dc\uc5d0\uc11c \uc2e4\ud589\ub418\ub294 \uc790\uc2dd \uc5f0\uc0b0\ub610\ud55c \ud504\ub85c\ud30c\uc77c\ub9c1\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\ub3d9\uc2dc\uc5d0 \uc2e4\ud589\ub418\ub294 \ud504\ub85c\ud30c\uc77c\ub7ec\ub4e4\uc740 \uacb0\uacfc\uac00 \uc11e\uc774\uc9c0 \uc54a\ub3c4\ub85d \uac01\uc790\uc758 \uc2a4\ub808\ub4dc \ubc94\uc704\uc5d0 \ud55c\uc815\ub429\ub2c8\ub2e4.

\n
\n
NOTE:
\n
\n

Pytorch 1.8\uc740 \ubbf8\ub798\uc758 \ub9b4\ub9ac\uc988\uc5d0\uc11c \uae30\uc874\uc758 \ud504\ub85c\ud30c\uc77c\ub7ec API\ub97c \ub300\uccb4\ud560 \uc0c8\ub85c\uc6b4 API\ub97c \uc18c\uac1c\ud558\uace0 \uc788\uc2b5\ub2c8\ub2e4.\uc0c8\ub85c\uc6b4 API\ub97c \uc774 \ud398\uc774\uc9c0 \uc5d0\uc11c \ud655\uc778\ud558\uc138\uc694.

\n
\n\n\ud504\ub85c\ud30c\uc77c\ub7ec API \uc0ac\uc6a9\ubc95\uc5d0 \ub300\ud574 \ube60\ub974\uac8c \uc54c\uc544\ubcf4\uace0 \uc2f6\ub2e4\uba74 [\uc774 \ub808\uc2dc\ud53c\n\ubb38\uc11c](http://tutorials.pytorch.kr/recipes/recipes/profiler_recipe.html)\n\ub97c \uc0b4\ud3b4\ubcf4\uc138\uc694.\n\n------------------------------------------------------------------------\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\nimport numpy as np\nfrom torch import nn\nimport torch.autograd.profiler as profiler" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\ud504\ub85c\ud30c\uc77c\ub7ec\ub97c \uc774\uc6a9\ud558\uc5ec \uc131\ub2a5 \ub514\ubc84\uae45\ud558\uae30\n=====================================\n\n\ud504\ub85c\ud30c\uc77c\ub7ec\ub294 \ubaa8\ub378\uc5d0\uc11c \uc131\ub2a5\uc758 \ubcd1\ubaa9\uc744 \ud30c\uc545\ud560 \ub54c \uc720\uc6a9\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4. \uc774\ubc88\n\uc608\uc81c\uc5d0\uc11c, \ub450 \uac00\uc9c0 \ud558\uc704 \uc791\uc5c5\uc744 \uc218\ud589\ud558\ub294 \uc0ac\uc6a9\uc790 \uc815\uc758 \ubaa8\ub4c8\uc744 \ub9cc\ub4e4\uaca0\uc2b5\ub2c8\ub2e4:\n\n- \uc785\ub825\uc5d0 \ub300\ud55c \uc120\ud615 \ubcc0\ud658\n- \ubcc0\ud658 \uacb0\uacfc\ub97c \uc774\uc6a9\ud55c \ub9c8\uc2a4\ud06c \ud150\uc11c(mask Tensor)\uc5d0\uc11c \uc778\ub371\uc2a4 \ucd94\ucd9c\n\n\uac01 \ud558\uc704 \uc791\uc5c5\ub4e4\uc5d0 \ub300\ud55c \ucf54\ub4dc\ub294 `profiler.record_function(\"label\")` \uc744\n\uc774\uc6a9\ud558\uc5ec \ub808\uc774\ube14\ub41c \ucee8\ud14d\uc2a4\ud2b8 \ub9e4\ub2c8\uc800(context manager) \ub4e4\uc5d0 \uc758\ud574 \uac10\uc309\ub2c8\ub2e4.\n\ud504\ub85c\ud30c\uc77c\ub7ec\uc758 \ucd9c\ub825\uc5d0\uc11c, \ud558\uc704 \uc791\uc5c5\ub4e4\uc758 \ubaa8\ub4e0 \uc5f0\uc0b0\uc5d0 \ub300\ud55c \uc9d1\uacc4(aggregate)\n\uc131\ub2a5 \uc9c0\ud45c\ub4e4\uc774 \ud574\ub2f9 \ub808\uc774\ube14 \uc544\ub798 \ub098\ud0c0\ub098\uac8c \ub429\ub2c8\ub2e4.\n\n\ud504\ub85c\ud30c\uc77c\ub7ec\ub97c \uc0ac\uc6a9\ud558\ub294 \uac83\uc740 \uc57d\uac04\uc758 \uc624\ubc84\ud5e4\ub4dc\uac00 \ubc1c\uc0dd\ud558\uba70, \ucf54\ub4dc\ub97c \ubd84\uc11d\ud560\n\ub54c\uc5d0\ub9cc \uc0ac\uc6a9\ud558\ub294 \uac83\uc774 \uac00\uc7a5 \uc88b\uc2b5\ub2c8\ub2e4. \ub9cc\uc77c \uc2e4\ud589\uc2dc\uac04\uc744 \ubca4\uce58\ub9c8\ud0b9\ud558\ub294\n\uacbd\uc6b0\uc5d0\ub294 \uc774\ub97c \uc81c\uac70\ud558\ub294 \uac83\uc744 \uc78a\uc9c0 \ub9c8\uc2ed\uc2dc\uc624.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class MyModule(nn.Module):\n def __init__(self, in_features: int, out_features: int, bias: bool = True):\n super(MyModule, self).__init__()\n self.linear = nn.Linear(in_features, out_features, bias)\n\n def forward(self, input, mask):\n with profiler.record_function(\"LINEAR PASS\"):\n out = self.linear(input)\n\n with profiler.record_function(\"MASK INDICES\"):\n threshold = out.sum(axis=1).mean().item()\n hi_idx = np.argwhere(mask.cpu().numpy() > threshold)\n hi_idx = torch.from_numpy(hi_idx).cuda()\n\n return out, hi_idx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\uc21c\uc804\ud30c \ub2e8\uacc4(forward pass) \ud504\ub85c\ud30c\uc77c\ub9c1\ud558\uae30\n========================================\n\n\uc785\ub825\uacfc \ub9c8\uc2a4\ud06c \ud150\uc11c, \uadf8\ub9ac\uace0 \ubaa8\ub378\uc744 \uc784\uc758\ub85c \ucd08\uae30\ud654\ud569\ub2c8\ub2e4.\n\n\ud504\ub85c\ud30c\uc77c\ub7ec\ub97c \uc2e4\ud589\ud558\uae30 \uc804, \uc815\ud655\ud55c \uc131\ub2a5 \ubca4\uce58\ub9c8\ud0b9\uc744 \ubcf4\uc7a5\ud558\uae30 \uc704\ud574 CUDA\ub97c\n\uc6cc\ubc0d\uc5c5(warm-up) \uc2dc\ud0b5\ub2c8\ub2e4. \ubaa8\ub378\uc758 \uc21c\uc804\ud30c \ub2e8\uacc4\ub97c `profiler.profile`\n\ucee8\ud14d\uc2a4\ud2b8 \ub9e4\ub2c8\uc800\ub97c \ud1b5\ud574 \uac10\uc309\ub2c8\ub2e4. `with_stack=True` \uc778\uc790\ub294 \uc5f0\uc0b0\uc758\n\ucd94\uc801(trace) \ud30c\uc77c \ub0b4\ubd80\uc5d0 \ud30c\uc77c\uacfc \uc904\ubc88\ud638\ub97c \ub367\ubd99\uc785\ub2c8\ub2e4.\n\n
WARNING:
\n
\n

with_stack=True \ub294 \ucd94\uac00\uc801\uc778 \uc624\ubc84\ud5e4\ub4dc\ub97c \ubc1c\uc0dd\uc2dc\ud0a4\uae30 \ub54c\ubb38\uc5d0 \ucf54\ub4dc\ub97c \ubd84\uc11d\ud560 \ub54c\uc5d0 \uc0ac\uc6a9\ud558\ub294 \uac83\uc774 \ubc14\ub78c\uc9c1\ud569\ub2c8\ub2e4.\uc131\ub2a5\uc744 \ubca4\uce58\ub9c8\ud0b9\ud55c\ub2e4\uba74 \uc774\ub97c \uc81c\uac70\ud558\ub294 \uac83\uc744 \uc78a\uc9c0 \ub9c8\uc2ed\uc2dc\uc624.

\n
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "model = MyModule(500, 10).cuda()\ninput = torch.rand(128, 500).cuda()\nmask = torch.rand((500, 500, 500), dtype=torch.double).cuda()\n\n# \uc6cc\ubc0d\uc5c5(warm-up)\nmodel(input, mask)\n\nwith profiler.profile(with_stack=True, profile_memory=True) as prof:\n out, idx = model(input, mask)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\ud504\ub85c\ud30c\uc77c\ub7ec\uc758 \uacb0\uacfc \ucd9c\ub825\ud558\uae30\n==========================\n\n\ucd5c\uc885\uc801\uc73c\ub85c \ud504\ub85c\ud30c\uc77c\ub7ec\uc758 \uacb0\uacfc\ub97c \ucd9c\ub825\ud569\ub2c8\ub2e4. `profiler.key_averages` \ub294\n\uc5f0\uc0b0\uc790\uc758 \uc774\ub984\uc5d0 \ub530\ub77c \uacb0\uacfc\ub97c \uc9d1\uacc4\ud558\ub294\ub370, \uc120\ud0dd\uc801\uc73c\ub85c \uc785\ub825\uc758 shape\uacfc/\ub610\ub294\n\uc2a4\ud0dd \ucd94\uc801(stack trace) \uc774\ubca4\ud2b8\uc5d0 \ub530\ub77c\uc11c\ub3c4 \uacb0\uacfc\ub97c \uc9d1\uacc4\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n\uc785\ub825\uc758 shape\uc5d0 \ub530\ub77c\uc11c \uadf8\ub8f9\ud654 \ud558\ub294 \uac83\uc740 \uc5b4\ub5a0\ud55c shape\uc758 \ud150\uc11c\ub4e4\uc774 \ubaa8\ub378\uc5d0\n\uc758\ud574 \uc0ac\uc6a9\ub418\ub294\uc9c0 \ud30c\uc545\ud558\ub294 \ub370 \uc720\uc6a9\ud569\ub2c8\ub2e4.\n\n\uc5ec\uae30\uc11c, `group_by_stack_n=5` \ub97c \uc0ac\uc6a9\ud558\ub294\ub370 \uc774\ub294 \uc5f0\uc0b0(operation)\uacfc\ntraceback(\uac00\uc7a5 \ucd5c\uadfc 5\uac1c\uc758 \uc774\ubca4\ud2b8\uc5d0 \ub300\ud55c)\uc744 \uae30\uc900\uc73c\ub85c \uc2e4\ud589\uc2dc\uac04\uc744 \uc9d1\uacc4\ud558\ub294\n\uac83\uc774\uace0, \uc774\ubca4\ud2b8\ub4e4\uc774 \ub4f1\ub85d\ub41c \uc21c\uc11c\ub85c \uc815\ub82c\ub418\uc5b4 \ud45c\uc2dc\ub429\ub2c8\ub2e4. \uacb0\uacfc \ud45c\ub294\n`sort_by` \uc778\uc790 (\uc720\ud6a8\ud55c \uc815\ub82c \ud0a4\ub294\n[docs](http://pytorch.org/docs/stable/autograd.html#profiler) \uc5d0\uc11c\n\ud655\uc778\ud558\uc138\uc694) \ub97c \ub118\uaca8\uc90c\uc73c\ub85c\uc368 \uc815\ub82c\ub420 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n\n
NOTE:
\n
\n

notebook\uc5d0\uc11c \ud504\ub85c\ud30c\uc77c\ub7ec\ub97c \uc2e4\ud589\ud560 \ub54c \uc2a4\ud0dd \ucd94\uc801(stacktrace)\uc5d0\uc11c \ud30c\uc77c\uba85 \ub300\uc2e0<ipython-input-18-193a910735e8>(13): forward \uc640 \uac19\uc740 \ud56d\ubaa9\uc744 \ubcfc \uc218 \uc788\uc2b5\ub2c8\ub2e4.\uc774\ub294 <notebook-cell>(line number): calling-function \uc758 \ud615\uc2dd\uc5d0 \ub300\uc751\ub429\ub2c8\ub2e4.

\n
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(prof.key_averages(group_by_stack_n=5).table(sort_by='self_cpu_time_total', row_limit=5))\n\n\"\"\"\n(Some columns are omitted)\n\n------------- ------------ ------------ ------------ ---------------------------------\n Name Self CPU % Self CPU Self CPU Mem Source Location\n------------- ------------ ------------ ------------ ---------------------------------\n MASK INDICES 87.88% 5.212s -953.67 Mb /mnt/xarfuse/.../torch/au\n (10): forward\n /mnt/xarfuse/.../torch/nn\n (9): \n /mnt/xarfuse/.../IPython/\n\n aten::copy_ 12.07% 715.848ms 0 b (12): forward\n /mnt/xarfuse/.../torch/nn\n (9): \n /mnt/xarfuse/.../IPython/\n /mnt/xarfuse/.../IPython/\n\n LINEAR PASS 0.01% 350.151us -20 b /mnt/xarfuse/.../torch/au\n (7): forward\n /mnt/xarfuse/.../torch/nn\n (9): \n /mnt/xarfuse/.../IPython/\n\n aten::addmm 0.00% 293.342us 0 b /mnt/xarfuse/.../torch/nn\n /mnt/xarfuse/.../torch/nn\n /mnt/xarfuse/.../torch/nn\n (8): forward\n /mnt/xarfuse/.../torch/nn\n\n aten::mean 0.00% 235.095us 0 b (11): forward\n /mnt/xarfuse/.../torch/nn\n (9): \n /mnt/xarfuse/.../IPython/\n /mnt/xarfuse/.../IPython/\n\n----------------------------- ------------ ---------- ----------------------------------\nSelf CPU time total: 5.931s\n\n\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\uba54\ubaa8\ub9ac \uc131\ub2a5 \ud5a5\uc0c1\uc2dc\ud0a4\uae30\n======================\n\n\uba54\ubaa8\ub9ac\uc640 \uc2dc\uac04 \uce21\uba74\uc5d0\uc11c \uac00\uc7a5 \ube44\uc6a9\uc774 \ud070 \uc5f0\uc0b0\uc740 MASK INDICES \ub0b4\n`forward(10)` \uc5f0\uc0b0\uc785\ub2c8\ub2e4. \uba3c\uc800 \uba54\ubaa8\ub9ac \uc18c\ubaa8 \ubb38\uc81c\ub97c \ud574\uacb0\ud574\ubd05\uc2dc\ub2e4. 12\ubc88\uc9f8\n\uc904\uc758 `.to()` \uc5f0\uc0b0\uc740 953.67 Mb\ub97c \uc18c\ubaa8\ud558\ub294 \uac83\uc744 \ud655\uc778\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4. \uc774\n\uc5f0\uc0b0\uc740 `mask` \ub97c CPU\uc5d0 \ubcf5\uc0ac\ud569\ub2c8\ub2e4. `mask` \ub294 `torch.double` \ub370\uc774\ud130\n\ud0c0\uc785\uc73c\ub85c \ucd08\uae30\ud654\ub429\ub2c8\ub2e4. \uc774\ub97c `torch.float` \uc73c\ub85c \ubcc0\ud658\ud558\uc5ec \uba54\ubaa8\ub9ac \uc0ac\uc6a9\ub7c9\uc744\n\uc904\uc77c \uc218 \uc788\uc744\uae4c\uc694?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "model = MyModule(500, 10).cuda()\ninput = torch.rand(128, 500).cuda()\nmask = torch.rand((500, 500, 500), dtype=torch.float).cuda()\n\n# \uc6cc\ubc0d\uc5c5(warm-up)\nmodel(input, mask)\n\nwith profiler.profile(with_stack=True, profile_memory=True) as prof:\n out, idx = model(input, mask)\n\nprint(prof.key_averages(group_by_stack_n=5).table(sort_by='self_cpu_time_total', row_limit=5))\n\n\"\"\"\n(Some columns are omitted)\n\n----------------- ------------ ------------ ------------ --------------------------------\n Name Self CPU % Self CPU Self CPU Mem Source Location\n----------------- ------------ ------------ ------------ --------------------------------\n MASK INDICES 93.61% 5.006s -476.84 Mb /mnt/xarfuse/.../torch/au\n (10): forward\n /mnt/xarfuse/ /torch/nn\n (9): \n /mnt/xarfuse/.../IPython/\n\n aten::copy_ 6.34% 338.759ms 0 b (12): forward\n /mnt/xarfuse/.../torch/nn\n (9): \n /mnt/xarfuse/.../IPython/\n /mnt/xarfuse/.../IPython/\n\n aten::as_strided 0.01% 281.808us 0 b (11): forward\n /mnt/xarfuse/.../torch/nn\n (9): \n /mnt/xarfuse/.../IPython/\n /mnt/xarfuse/.../IPython/\n\n aten::addmm 0.01% 275.721us 0 b /mnt/xarfuse/.../torch/nn\n /mnt/xarfuse/.../torch/nn\n /mnt/xarfuse/.../torch/nn\n (8): forward\n /mnt/xarfuse/.../torch/nn\n\n aten::_local 0.01% 268.650us 0 b (11): forward\n _scalar_dense /mnt/xarfuse/.../torch/nn\n (9): \n /mnt/xarfuse/.../IPython/\n /mnt/xarfuse/.../IPython/\n\n----------------- ------------ ------------ ------------ --------------------------------\nSelf CPU time total: 5.347s\n\n\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\uc774 \uc5f0\uc0b0\uc744 \uc704\ud55c CPU \uba54\ubaa8\ub9ac \uc0ac\uc6a9\ub7c9\uc774 \uc808\ubc18\uc73c\ub85c \uc904\uc5c8\uc2b5\ub2c8\ub2e4.\n\n\uc2dc\uac04 \uc131\ub2a5 \ud5a5\uc0c1\uc2dc\ud0a4\uae30\n====================\n\n\uc18c\ubaa8\ub41c \uc2dc\uac04\uc774 \uc870\uae08 \uc904\uae34 \ud588\uc9c0\ub9cc, \uc774\ub294 \uc544\uc9c1\ub3c4 \ub108\ubb34 \ub192\uc740 \uc218\uce58\uc785\ub2c8\ub2e4. CUDA\n\uc5d0\uc11c CPU \ub85c \ud589\ub82c\uc744 \ubcf5\uc0ac\ud558\ub294 \uac83\uc774 \uaf64 \ube44\uc6a9\uc774 \ud070 \uc5f0\uc0b0\uc778 \uac83\uc774 \ubc1d\ud600\uc84c\uc2b5\ub2c8\ub2e4.\n`forward(12)` \uc758 `aten::copy_` \uc5f0\uc0b0\uc740 `mask` \ub97c CPU\uc5d0 \ubcf5\uc0ac\ud558\uc5ec NumPy \uc758\n`argwhere` \ud568\uc218\ub97c \uc0ac\uc6a9\ud560 \uc218 \uc788\uac8c \ud569\ub2c8\ub2e4. `forward(13)` \uc758 `aten::copy_`\n\ub294 \ubc30\uc5f4\uc744 \ub2e4\uc2dc \ud150\uc11c\ub85c CUDA\uc5d0 \ubcf5\uc0ac\ud569\ub2c8\ub2e4. \uc774\uacf3\uc5d0\uc11c `torch` \ud568\uc218\n`nonzero()` \ub97c \ub300\uc2e0 \uc0ac\uc6a9\ud55c\ub2e4\uba74 \ub450 \uc5f0\uc0b0\uc744 \ubaa8\ub450 \uc81c\uac70\ud560 \uc218 \uc788\uc2b5\ub2c8\ub2e4.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class MyModule(nn.Module):\n def __init__(self, in_features: int, out_features: int, bias: bool = True):\n super(MyModule, self).__init__()\n self.linear = nn.Linear(in_features, out_features, bias)\n\n def forward(self, input, mask):\n with profiler.record_function(\"LINEAR PASS\"):\n out = self.linear(input)\n\n with profiler.record_function(\"MASK INDICES\"):\n threshold = out.sum(axis=1).mean()\n hi_idx = (mask > threshold).nonzero(as_tuple=True)\n\n return out, hi_idx\n\n\nmodel = MyModule(500, 10).cuda()\ninput = torch.rand(128, 500).cuda()\nmask = torch.rand((500, 500, 500), dtype=torch.float).cuda()\n\n# \uc6cc\ubc0d\uc5c5(warm-up)\nmodel(input, mask)\n\nwith profiler.profile(with_stack=True, profile_memory=True) as prof:\n out, idx = model(input, mask)\n\nprint(prof.key_averages(group_by_stack_n=5).table(sort_by='self_cpu_time_total', row_limit=5))\n\n\"\"\"\n(Some columns are omitted)\n\n-------------- ------------ ------------ ------------ ---------------------------------\n Name Self CPU % Self CPU Self CPU Mem Source Location\n-------------- ------------ ------------ ------------ ---------------------------------\n aten::gt 57.17% 129.089ms 0 b (12): forward\n /mnt/xarfuse/.../torch/nn\n (25): \n /mnt/xarfuse/.../IPython/\n /mnt/xarfuse/.../IPython/\n\n aten::nonzero 37.38% 84.402ms 0 b (12): forward\n /mnt/xarfuse/.../torch/nn\n (25): \n /mnt/xarfuse/.../IPython/\n /mnt/xarfuse/.../IPython/\n\n INDEX SCORE 3.32% 7.491ms -119.21 Mb /mnt/xarfuse/.../torch/au\n (10): forward\n /mnt/xarfuse/.../torch/nn\n (25): \n /mnt/xarfuse/.../IPython/\n\naten::as_strided 0.20% 441.587us 0 b (12): forward\n /mnt/xarfuse/.../torch/nn\n (25): \n /mnt/xarfuse/.../IPython/\n /mnt/xarfuse/.../IPython/\n\n aten::nonzero\n _numpy 0.18% 395.602us 0 b (12): forward\n /mnt/xarfuse/.../torch/nn\n (25): \n /mnt/xarfuse/.../IPython/\n /mnt/xarfuse/.../IPython/\n-------------- ------------ ------------ ------------ ---------------------------------\nSelf CPU time total: 225.801ms\n\n\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\ub354 \uc77d\uc744\uac70\ub9ac\n===========\n\nPyTorch \ubaa8\ub378\uc5d0\uc11c \uc2dc\uac04\uacfc \uba54\ubaa8\ub9ac \ubcd1\ubaa9\uc744 \ubd84\uc11d\ud558\uae30 \uc704\ud574 \ud504\ub85c\ud30c\uc77c\ub7ec\uac00 \uc5b4\ub5bb\uac8c\n\uc0ac\uc6a9\ub420 \uc218 \uc788\ub294\uc9c0\ub97c \uc0b4\ud3b4\ubcf4\uc558\uc2b5\ub2c8\ub2e4. \uc544\ub798\uc5d0 \ud504\ub85c\ud30c\uc77c\ub7ec\uc5d0 \ub300\ud55c \uc77d\uc744\uac70\ub9ac\uac00\n\ub354 \uc788\uc2b5\ub2c8\ub2e4:\n\n- [\ud504\ub85c\ud30c\uc77c\ub7ec \uc0ac\uc6a9\n \ub808\uc2dc\ud53c](http://tutorials.pytorch.kr/recipes/recipes/profiler_recipe.html)\n- [Profiling RPC-Based\n Workloads](http://tutorials.pytorch.kr/recipes/distributed_rpc_profiling.html)\n- [Profiler API\n Docs](http://pytorch.org/docs/stable/autograd.html?highlight=profiler#profiler)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 0 }