Cheatsheet
Default Configuration
When the global config is not retrieved from a file, torchmeter initializes it with the following defaults.
They are shown below in hierarchical (YAML) form:
# time interval when rendering the profile
render_interval: 0.15 # unit: second

# Whether to fold the repeated parts when rendering the model structure tree
tree_fold_repeat: True

# Display settings for repeat blocks in the model structure tree
# This is actually a rich.panel.Panel object, refer to https://rich.readthedocs.io/en/latest/reference/panel.html#rich.panel.Panel
tree_repeat_block_args:
  title: '[i]Repeat [[b]<repeat_time>[/b]] Times[/]' # Title of the repeat block, accepts rich styling
  title_align: center # Title alignment: left, center, right
  subtitle: null # Subtitle of the repeat block, accepts rich styling
  subtitle_align: center # Subtitle alignment: left, center, right
  style: dark_goldenrod # Style of the repeat block, execute `python -m rich.theme` for more
  highlight: true # Whether to highlight values (numbers, strings, ...)
  box: HEAVY_EDGE # Box type, use its name directly as shown here! Execute `python -m rich.box` for more
  border_style: dim # Border style, execute `python -m rich.theme` for more
  width: null # Width of the repeat block, null means auto
  height: null # Height of the repeat block, null means auto
  padding: # Padding of the repeat block
    - 0 # top/bottom padding
    - 1 # left/right padding
  expand: false # Whether to expand the repeat block to the full screen size

# Fine-grained display settings for each level in the model structure tree
# Each level is actually a rich.tree.Tree object, refer to https://rich.readthedocs.io/en/latest/reference/tree.html#rich.tree.Tree
# The level key is required! It indicates which level the settings under it apply to.
# Level 0 is the root node (i.e. the model itself), level 1 is the first layer of model children, and so on.
tree_levels_args:
  default: # Required! Alternatively, use 'all' to apply the settings below to all levels
    label: '[b gray35](<node_id>) [green]<name>[/green] [cyan]<type>[/]' # Node display string, accepts rich styling
    style: tree # Style of the node, execute `python -m rich.theme` for more
    guide_style: light_coral # Guide style of the node, execute `python -m rich.theme` for more
    highlight: true # Whether to highlight values (numbers, strings, ...)
    hide_root: false # Whether to hide the node at this level
    expanded: true # Whether to display the node's children
  '0': # The number indicates which level the following settings apply to
    label: '[b light_coral]<name>[/]'
    guide_style: light_coral
    # Settings not specified here fall back to those defined under the `default` level

# Display settings for each column in the profile table
# Each column is actually a rich.table.Column object, refer to https://rich.readthedocs.io/en/latest/reference/table.html#rich.table.Column
table_column_args:
  style: none # Style of the column, execute `python -m rich.theme` for more
  justify: center # Justification of the column: left, center, right
  vertical: middle # Vertical alignment of the column: top, middle, bottom
  overflow: fold # Overflow method of the column: fold, crop, ellipsis, see https://rich.readthedocs.io/en/latest/console.html?highlight=overflow#overflow
  no_wrap: false # Whether to prevent wrapping of text within the column

# Display settings for the profile table
# This is actually a rich.table.Table object, refer to https://rich.readthedocs.io/en/latest/reference/table.html#rich.table.Table
table_display_args:
  style: spring_green4 # Style of the table, execute `python -m rich.theme` for more
  highlight: true # Whether to highlight values (numbers, strings, ...)
  width: null # The width in characters of the table, or null to fit automatically
  min_width: null # The minimum width of the table, or null for no minimum
  expand: false # Whether to expand the table to the full screen size
  padding: # Padding for cells
    - 0 # top/bottom padding
    - 1 # left/right padding
  collapse_padding: false # Whether to collapse the padding around cells
  pad_edge: true # Whether to pad edge cells
  leading: 0 # Number of blank lines between rows (precludes `show_lines` below)
  title: null # Title of the table, accepts rich styling
  title_style: bold # Style of the title, execute `python -m rich.theme` for more
  title_justify: center # Justification of the title: left, center, right
  caption: null # The table caption rendered below the table, accepts rich styling
  caption_style: null # Style of the caption, execute `python -m rich.theme` for more
  caption_justify: center # Justification of the caption: left, center, right
  show_header: true # Whether to show the header row
  header_style: bold # Style of the header, execute `python -m rich.theme` for more
  show_footer: false # Whether to show the footer row
  footer_style: italic # Style of the footer, execute `python -m rich.theme` for more
  show_lines: false # Whether to show lines between rows
  row_styles: null # Optional list of row styles; if more than one style is given, the styles alternate
  show_edge: true # Whether to show the edge of the table
  box: ROUNDED # Box type, use its name directly as shown here! Execute `python -m rich.box` for more
  safe_box: true # Whether to disable box characters that don't display on Windows legacy terminals with *raster* fonts
  border_style: null # Style of the border, execute `python -m rich.theme` for more

# Display settings for how the tree and table are combined in the profile
combine:
  horizon_gap: 2 # horizontal gap between the tree and table
Tree Level Index

What is the tree level index?

As the name implies, it is the hierarchical index of an operation tree.

Figuratively, within the model operation tree, each guide line represents a level. The level index starts from 0 and increments from left to right.

Additionally, the level index at which a tree node is mounted equals len(tree_node.node_id.split('.')). For instance, the node (1.1.6.2) 1 BatchNorm2d below is mounted at level 4.
AnyNet
├── (1) layers Sequential
│   ├── (1.1) 0 BasicBlock
│   │   ├── (1.1.1) conv1 Conv2d
│   │   ├── (1.1.2) bn1 BatchNorm2d
│   │   ├── (1.1.3) relu ReLU
│   │   ├── (1.1.4) conv2 Conv2d
│   │   ├── (1.1.5) bn2 BatchNorm2d
│   │   └── (1.1.6) downsample Sequential
│   │       ├── (1.1.6.1) 0 Conv2d
│   │       └── (1.1.6.2) 1 BatchNorm2d
│   └── (1.2) 1 BasicBlock
│       ├── (1.2.1) conv1 Conv2d
│       ├── (1.2.2) bn1 BatchNorm2d
│       ├── (1.2.3) relu ReLU
│       ├── (1.2.4) conv2 Conv2d
│       └── (1.2.5) bn2 BatchNorm2d
├── (2) avgpool AdaptiveAvgPool2d
└── (3) fc Linear

│   │   │   │
0   1   2   3   (level index, each pointing to a guide line)
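The mapping from node_id to level index described above is plain string arithmetic. The following sketch (ordinary Python, not torchmeter internals) makes it concrete:

```python
def level_index(node_id: str) -> int:
    """Level at which the node with this id is mounted.

    The root node ("0") sits at level 0; otherwise the level equals
    the number of dot-separated parts of the node id.
    """
    if node_id == "0":  # root node, i.e. the model itself
        return 0
    return len(node_id.split("."))

print(level_index("1.1.6.2"))  # the BatchNorm2d node above → 4
print(level_index("1.1"))      # a BasicBlock node → 2
```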
How to use the tree level index?

A valid level index empowers you to customize the operation tree with meticulous precision.

torchmeter regards the following values as valid tree level indexes:

- A non-negative integer (e.g. 0, 1, 2, ...): the configuration under a specific index applies only to the corresponding level.
- default: the configuration under this index applies to all levels that are not explicitly defined.
- all: the configuration under this index overrides that of any other level and is applied with the highest priority across all levels.
Please refer to Customize the Hierarchical Display for specific usage scenarios.
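The precedence just described (all > a specific level index > default) can be sketched as a dictionary merge. This is only an illustration of the resolution order, not torchmeter's actual implementation:

```python
def resolve_level_args(levels_args: dict, level: int) -> dict:
    """Merge display settings for one level: default < specific level < all."""
    settings = dict(levels_args.get("default", {}))   # base settings
    settings.update(levels_args.get(str(level), {}))  # level-specific override
    settings.update(levels_args.get("all", {}))       # highest priority
    return settings

# mirrors the default tree_levels_args shown in this cheatsheet
levels_args = {
    "default": {"guide_style": "light_coral", "highlight": True},
    "0": {"label": "[b light_coral]<name>[/]"},
}
print(resolve_level_args(levels_args, 0))  # level-0 label + inherited defaults
print(resolve_level_args(levels_args, 1))  # defaults only
```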
Tree Node Attributes

What is a tree node attribute?

- Upon the instantiation of a torchmeter.Meter with a Pytorch model, an automated scan of the model's architecture is executed. Subsequently, a tree structure is produced to depict the model.
- This tree structure is realized via torchmeter.engine.OperationTree. In this tree, each node is an instance of torchmeter.engine.OperationNode, which represents a layer or operation (such as nn.Conv2d, nn.ReLU, etc.) within the model.
- Therefore, the attributes of a tree node are the attributes / properties of an OperationNode instance.
What can a tree node attribute help me with?

All the attributes defined below are intended to:

- help you obtain supplementary information about a tree node;
- customize the display of the tree structure during the rendering procedure.
What are the available attributes of a tree node?
Illustrative Example
from collections import OrderedDict
import torch.nn as nn
class SimpleModel(nn.Module):
def __init__(self):
super(SimpleModel, self).__init__()
self.single_1 = nn.Linear(1, 10)
self.repeat_1x2 = nn.Sequential(OrderedDict({
"A": nn.Linear(10, 10),
"B": nn.Linear(10, 10),
}))
self.single_2 = nn.ReLU()
self.repeat_2x3 = nn.Sequential(OrderedDict({
"C": nn.Linear(10, 5),
"D": nn.ReLU(),
"E": nn.Linear(5, 10),
"F": nn.Linear(10, 5),
"G": nn.ReLU(),
"H": nn.Linear(5, 10),
"I": nn.Linear(10, 5),
"J": nn.ReLU(),
"K": nn.Linear(5, 10),
}))
self.single_3 = nn.Linear(10, 1)
Taking the above model as an example, the values of each attribute at each layer are as follows:

name, type, node_id, is_leaf, operation & module_repr
| node_id | name | type | is_leaf | operation | module_repr |
|---|---|---|---|---|---|
| 0 | SimpleModel | SimpleModel | False | instance created via SimpleModel() | SimpleModel |
| 1 | single_1 | Linear | True | instance created via nn.Linear(1, 10) in line 8 | Linear(in_features=1, out_features=10, bias=True) |
| 2 | repeat_1x2 | Sequential | False | instance created via nn.Sequential in line 10 | Sequential |
| 2.1 | A | Linear | True | instance created via nn.Linear(10, 10) in line 11 | Linear(in_features=10, out_features=10, bias=True) |
| 2.2 | B | Linear | True | instance created via nn.Linear(10, 10) in line 12 | Linear(in_features=10, out_features=10, bias=True) |
| 3 | single_2 | ReLU | True | instance created via nn.ReLU() in line 15 | ReLU() |
| 4 | repeat_2x3 | Sequential | False | instance created via nn.Sequential in line 17 | Sequential |
| 4.1 | C | Linear | True | instance created via nn.Linear(10, 5) in line 18 | Linear(in_features=10, out_features=5, bias=True) |
| 4.2 | D | ReLU | True | instance created via nn.ReLU() in line 19 | ReLU() |
| 4.3 | E | Linear | True | instance created via nn.Linear(5, 10) in line 20 | Linear(in_features=5, out_features=10, bias=True) |
| 4.4 | F | Linear | True | instance created via nn.Linear(10, 5) in line 22 | Linear(in_features=10, out_features=5, bias=True) |
| 4.5 | G | ReLU | True | instance created via nn.ReLU() in line 23 | ReLU() |
| 4.6 | H | Linear | True | instance created via nn.Linear(5, 10) in line 24 | Linear(in_features=5, out_features=10, bias=True) |
| 4.7 | I | Linear | True | instance created via nn.Linear(10, 5) in line 26 | Linear(in_features=10, out_features=5, bias=True) |
| 4.8 | J | ReLU | True | instance created via nn.ReLU() in line 27 | ReLU() |
| 4.9 | K | Linear | True | instance created via nn.Linear(5, 10) in line 28 | Linear(in_features=5, out_features=10, bias=True) |
| 5 | single_3 | Linear | True | instance created via nn.Linear(10, 1) in line 31 | Linear(in_features=10, out_features=1, bias=True) |
parent & childs

Here we use the node_id of each node's parent and children to simplify the display.
In actual usage, a node's parent is None or an instance of torchmeter.engine.OperationNode,
while childs is an OrderedDict with the node_id as key and the node instance as value.
| node_id | name | type | parent | childs |
|---|---|---|---|---|
| 0 | SimpleModel | SimpleModel | None | 1 ~ 5 |
| 1 | single_1 | Linear | 0 | |
| 2 | repeat_1x2 | Sequential | 0 | 2.1, 2.2 |
| 2.1 | A | Linear | 2 | |
| 2.2 | B | Linear | 2 | |
| 3 | single_2 | ReLU | 0 | |
| 4 | repeat_2x3 | Sequential | 0 | 4.1 ~ 4.9 |
| 4.1 | C | Linear | 4 | |
| 4.2 | D | ReLU | 4 | |
| 4.3 | E | Linear | 4 | |
| 4.4 | F | Linear | 4 | |
| 4.5 | G | ReLU | 4 | |
| 4.6 | H | Linear | 4 | |
| 4.7 | I | Linear | 4 | |
| 4.8 | J | ReLU | 4 | |
| 4.9 | K | Linear | 4 | |
| 5 | single_3 | Linear | 0 | |
repeat_winsz & repeat_time

| node_id | name | type | repeat_winsz | repeat_time | explanation |
|---|---|---|---|---|---|
| 0 | SimpleModel | SimpleModel | 1 | 1 | no repetition |
| 1 | single_1 | Linear | 1 | 1 | no repetition |
| 2 | repeat_1x2 | Sequential | 1 | 1 | no repetition |
| 2.1 | A | Linear | 1 | 2 | The repeating window covers 2.1 and 2.2. The two layers have the same definition, so one module can be considered as repeated twice. |
| 2.2 | B | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 3 | single_2 | ReLU | 1 | 1 | no repetition |
| 4 | repeat_2x3 | Sequential | 1 | 1 | no repetition |
| 4.1 | C | Linear | 3 | 3 | The repeating window takes 4.1 ~ 4.3 as a whole and covers 4.1 ~ 4.9. |
| 4.2 | D | ReLU | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.3 | E | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.4 | F | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.5 | G | ReLU | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.6 | H | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.7 | I | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.8 | J | ReLU | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.9 | K | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 5 | single_3 | Linear | 1 | 1 | no repetition |
| Attribute | Type | Explanation |
|---|---|---|
| operation | torch.nn.Module | The underlying pytorch module |
| type | str | The operation type. If the operation is a pytorch module, the name of its class is used |
| name | str | The module name defined in the underlying pytorch model |
| node_id | str | A globally unique module identifier, formatted as `<parent-node-id>.<node-number-in-current-level>`. The index starts from 1, since the root is denoted as 0 |
| is_leaf | bool | Whether the node is a leaf node (i.e. has no child nodes) |
| module_repr | str | The text representation of the current operation. For non-leaf nodes, it is the node type; for leaf nodes, it is the return value of the `__repr__()` method |
| parent | torchmeter.engine.OperationNode | The parent node of this node. Each node has only one parent |
| childs | OrderedDict[str, OperationNode] | An OrderedDict storing the children of this node in feed-forward order. The key is the child's node_id, the value is the child node itself |
| repeat_winsz | int | The size of the repeating window for the current node. Defaults to 1, meaning no repetition (the window contains only the node itself) |
| repeat_time | int | The number of repetitions of the window in which the current module is located. Defaults to 1, meaning no repetition |
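The repeat_winsz / repeat_time values shown earlier follow from scanning sibling modules for the smallest window of identical definitions. The sketch below illustrates the idea by comparing textual representations; it is a simplification, since torchmeter's actual analysis operates on OperationNode instances:

```python
def find_repeat(reprs: list[str]) -> tuple[int, int]:
    """Return (window_size, repeat_time) for the prefix of `reprs`.

    Tries the smallest window first; (1, 1) means no repetition.
    """
    n = len(reprs)
    for winsz in range(1, n // 2 + 1):
        window = reprs[:winsz]
        times = 1
        # count how many consecutive copies of the window follow it
        while reprs[times * winsz:(times + 1) * winsz] == window:
            times += 1
        if times > 1:
            return winsz, times
    return 1, 1

# mirrors repeat_2x3 above: the (C, D, E) window repeated three times
reprs = ["Linear(10,5)", "ReLU()", "Linear(5,10)"] * 3
print(find_repeat(reprs))  # → (3, 3)

# mirrors repeat_1x2 above: a single Linear repeated twice
print(find_repeat(["Linear(10,10)", "Linear(10,10)"]))  # → (1, 2)
```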
How to use the attributes of a tree node?

An attribute of a tree node can be used as a placeholder within the value of certain configurations. This allows the attribute's value to be retrieved dynamically during the tree-rendering procedure.

The configurations/scenarios supporting tree node attributes as placeholders are listed below.

| Configuration/Scenario | Default Value |
|---|---|
| tree_levels_args.[level-index].label ¹ | '[b gray35](<node_id>) [green]<name>[/green] [cyan]<type>[/]' ² |
| tree_repeat_block_args.title | '[i]Repeat [[b]<repeat_time>[/b]] Times[/]' |
| tree_renderer.repeat_footer | Supports text and functions, see Customize the footer |
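Conceptually, rendering fills each `<attr>` marker in the template with the corresponding node attribute. A minimal sketch of this substitution (illustrative only, not torchmeter's renderer):

```python
import re

def fill_placeholders(template: str, node_attrs: dict) -> str:
    """Replace <attr> markers with the node's attribute values.

    Unknown markers are left untouched; rich markup like [b] is unaffected
    because it uses square brackets, not angle brackets.
    """
    return re.sub(
        r"<(\w+)>",
        lambda m: str(node_attrs.get(m.group(1), m.group(0))),
        template,
    )

label = "[b gray35](<node_id>) [green]<name>[/green] [cyan]<type>[/]"
attrs = {"node_id": "1.1.6.2", "name": "1", "type": "BatchNorm2d"}
print(fill_placeholders(label, attrs))
# → [b gray35](1.1.6.2) [green]1[/green] [cyan]BatchNorm2d[/]
```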
Usage Example

For example, if you want to unify the titles of all repeat blocks into a bold "My Repeat Title", you can do this:
from rich import print
from torchmeter import Meter
from torchvision import models
resnet18 = models.resnet18()
model = Meter(resnet18)
model.tree_repeat_block_args.title = '[b]My Repeat Title[/b]' #(1)
print(model.structure)
That's all! You will see that the titles of all repeat blocks have been changed.
Unit Explanation

There are four types of units in torchmeter, listed as follows.

The raw-data tag in the subsequent content indicates that the unit marked with this tag is used in raw-data mode.

Used by param, cal
| unit | explanation | tag | example |
|---|---|---|---|
| null | Number of subjects | raw-data | 5: There are 5 semantic subjects |
| K | \(10^3\) | | 5 K: There are 5,000 ... |
| M | \(10^6\) | | 5 M: There are 5,000,000 ... |
| G | \(10^9\) | | 5 G: There are 5,000,000,000 ... |
| T | \(10^{12}\) | | 5 T: There are 5,000,000,000,000 ... |
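These are ordinary decimal prefixes. A small helper makes the scaling rule from the table above concrete (illustrative only, not torchmeter's formatting code):

```python
def to_count_unit(value: float) -> str:
    """Scale a raw count into the largest decimal unit listed above."""
    for unit, factor in (("T", 1e12), ("G", 1e9), ("M", 1e6), ("K", 1e3)):
        if value >= factor:
            return f"{value / factor:g} {unit}"
    return f"{value:g}"  # raw-data mode: no unit suffix

print(to_count_unit(5_000))      # → 5 K
print(to_count_unit(5_000_000))  # → 5 M
print(to_count_unit(5))          # → 5
```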
Used by mem
| unit | explanation | tag | example |
|---|---|---|---|
| B | \(2^0 = 1\) byte | raw-data | 5 B: \(5 \times 1 = 5\) bytes |
| KiB | \(2^{10} = 1024\) bytes | | 5 KiB: \(5 \times 2^{10} = 5120\) bytes |
| MiB | \(2^{20}\) bytes | | 5 MiB: \(5 \times 2^{20} = 5242880\) bytes |
| GiB | \(2^{30}\) bytes | | 5 GiB: \(5 \times 2^{30} = 5368709120\) bytes |
| TiB | \(2^{40}\) bytes | | 5 TiB: \(5 \times 2^{40} = 5497558138880\) bytes |
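The same scaling idea applies to memory, except the factors are binary (IEC) powers of 1024 rather than powers of 1000. A sketch of the rule from the table above (illustrative only):

```python
def to_mem_unit(n_bytes: int) -> str:
    """Scale a byte count into the largest binary (IEC) unit listed above."""
    for unit, exp in (("TiB", 40), ("GiB", 30), ("MiB", 20), ("KiB", 10)):
        if n_bytes >= 2 ** exp:
            return f"{n_bytes / 2 ** exp:g} {unit}"
    return f"{n_bytes:g} B"  # raw-data mode

print(to_mem_unit(5 * 2 ** 20))  # → 5 MiB
print(to_mem_unit(5120))         # → 5 KiB
```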
Used by ittp - inference time

Why do I obtain different results across attempts or inputs?

Don't worry, this is a normal phenomenon.

- Different results across attempts
    - Inference latency is measured in real time because it depends on dynamic factors such as the current machine load and the device the model resides on. Therefore, each time the ittp attribute (i.e., Meter(your_model).ittp) is accessed, the inference latency and throughput are re-measured to reflect the model's performance under the current working conditions.
- Different results with different inputs
    - Inference latency means the time it takes the model to complete one forward propagation with the given input. Different inputs therefore impose different workloads on the model, resulting in different inference latencies.
    - In TorchMeter, inference latency and throughput are measured based on the input received by the model in the most recent forward propagation. Hence, different input batches or sample shapes, combined with differences in machine load at different times, will lead to changes in inference latency.

It should additionally be mentioned that, due to automatic device synchronization, the input is synchronized to the device where the model is located before forward propagation is executed, so the results obtained from two inputs with the same content on different devices will be very similar.
| unit | explanation | tag | example |
|---|---|---|---|
| ns | nanosecond | | 5 ns: \(5 \times 10^{-9}\) seconds |
| us | microsecond | | 5 us: \(5 \times 10^{-6}\) seconds |
| ms | millisecond | | 5 ms: \(5 \times 10^{-3}\) seconds |
| s | second | raw-data | 5 s: \(5 \times 10^{0}\) seconds |
| min | minute | | 5 min: \(5 \times 60^{1}\) seconds |
| h | hour | | 5 h: \(5 \times 60^{2}\) seconds |
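Note that the sub-second units scale by powers of 10 while min and h scale by powers of 60. A helper sketching the rule from the table above (illustrative only):

```python
def to_time_unit(seconds: float) -> str:
    """Scale a duration in seconds into the largest unit listed above."""
    units = (("h", 3600), ("min", 60), ("s", 1),
             ("ms", 1e-3), ("us", 1e-6), ("ns", 1e-9))
    for unit, factor in units:
        if seconds >= factor:
            return f"{seconds / factor:g} {unit}"
    return f"{seconds:g} s"  # raw-data fallback for tiny values

print(to_time_unit(18000))  # → 5 h
print(to_time_unit(120))    # → 2 min
print(to_time_unit(5))      # → 5 s
```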
Used by ittp - throughput

What is the meaning of "Input" in the table below?

"Input" refers to all the inputs received by the model in your last execution of forward propagation. Torchmeter treats these inputs as a standard unit when calculating inference latency and throughput.

To facilitate comparisons between models, we recommend using the same input (such as a single sample with batch_size=1) for different models when measuring all statistics, so as to obtain more universally comparable results.

In the following example, "Input" in Case 1 refers to x=torch.randn(1, 3, 224, 224); y=0.1, while in Case 2 it refers to x=torch.randn(100, 3, 224, 224); y=0.1. You can see the difference between the two cases from the results: the inference latency when batch_size=100 is significantly higher than when batch_size=1.
import torch
import torch.nn as nn
from rich import print
from torchmeter import Meter
from torchvision import models
class ExampleModel(nn.Module):
def __init__(self):
super(ExampleModel, self).__init__()
self.backbone = models.resnet18()
def forward(self, x: torch.Tensor, y: int):
return self.backbone(x) + y
model = Meter(ExampleModel(), device="cuda")
# case1: batch size = 1 ------------------------------
ipt = torch.randn(1, 3, 224, 224)
model(ipt, 0.1)
print(model.ittp)
# case2: batch size = 100 ------------------------------
ipt = torch.randn(100, 3, 224, 224)
model(ipt, 0.1)
print(model.ittp)
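Throughput is simply the reciprocal of latency, counted in "inputs" as defined above (one input = everything the model received in one forward pass). A hedged sketch of the arithmetic and the unit scaling, not torchmeter's measurement code:

```python
def throughput_ips(latency_s: float) -> float:
    """Inputs processed per second, where one 'input' = one forward pass."""
    return 1.0 / latency_s

def to_ips_unit(ips: float) -> str:
    """Scale a raw IPS value into the largest decimal unit listed below."""
    for unit, factor in (("TIPS", 1e12), ("GIPS", 1e9),
                         ("MIPS", 1e6), ("KIPS", 1e3)):
        if ips >= factor:
            return f"{ips / factor:g} {unit}"
    return f"{ips:g} IPS"  # raw-data mode

# if one forward pass takes 4 ms, the model handles 250 inputs per second
print(to_ips_unit(throughput_ips(0.004)))
```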
| unit | explanation | tag | example |
|---|---|---|---|
| IPS | Inputs Per Second | raw-data | 5 IPS: process 5 inputs per second |
| KIPS | \(10^3\) IPS | | 5 KIPS: process 5,000 inputs per second |
| MIPS | \(10^6\) IPS | | 5 MIPS: process 5,000,000 inputs per second |
| GIPS | \(10^9\) IPS | | 5 GIPS: process 5,000,000,000 inputs per second |
| TIPS | \(10^{12}\) IPS | | 5 TIPS: process 5,000,000,000,000 inputs per second |
¹ For the value of [level-index], please refer to Tree Level Index. ↩

² Rich style markup and its abbreviations are supported when writing the value content. ↩