Cheatsheet
Default Configuration
When the global config is not retrieved from a file, torchmeter initializes it with the following defaults.
They are shown below in hierarchical (YAML) form:
# time interval when rendering the profile
render_interval: 0.15 # unit: second

# Whether to fold the repeated parts when rendering the model structure tree
tree_fold_repeat: True

# Display settings for repeat blocks in the model structure tree
# This is actually a rich.panel.Panel object, refer to https://rich.readthedocs.io/en/latest/reference/panel.html#rich.panel.Panel
tree_repeat_block_args:
  title: '[i]Repeat [[b]<repeat_time>[/b]] Times[/]' # Title of the repeat block, accepts rich styling
  title_align: center # Title alignment: left, center, right
  subtitle: null # Subtitle of the repeat block, accepts rich styling
  subtitle_align: center # Subtitle alignment: left, center, right
  style: dark_goldenrod # Style of the repeat block, execute `python -m rich.theme` for more
  highlight: true # Whether to highlight values (numbers, strings, ...)
  box: HEAVY_EDGE # Box type, use its name directly as shown here! Execute `python -m rich.box` for more
  border_style: dim # Border style, execute `python -m rich.theme` for more
  width: null # Width of the repeat block, null means auto
  height: null # Height of the repeat block, null means auto
  padding: # Padding of the repeat block
    - 0 # top/bottom padding
    - 1 # left/right padding
  expand: false # Whether to expand the repeat block to the full screen size

# Fine-grained display settings for each level in the model structure tree
# Each level is actually a rich.tree.Tree object, refer to https://rich.readthedocs.io/en/latest/reference/tree.html#rich.tree.Tree
# The level key is required! It indicates which level the settings under it apply to.
# Level 0 is the root node (i.e. the model itself), level 1 is the first layer of model children, and so on.
tree_levels_args:
  default: # Required! Alternatively, use 'all' to apply the settings below to all levels
    label: '[b gray35](<node_id>) [green]<name>[/green] [cyan]<type>[/]' # Node display string, accepts rich styling
    style: tree # Style of the node, execute `python -m rich.theme` for more
    guide_style: light_coral # Guide style of the node, execute `python -m rich.theme` for more
    highlight: true # Whether to highlight values (numbers, strings, ...)
    hide_root: false # Whether to hide the node at this level
    expanded: true # Whether to display the node's children
  '0': # The number indicates which level the following settings apply to
    label: '[b light_coral]<name>[/]'
    guide_style: light_coral
    # Settings not specified here fall back to those defined under the `default` level

# Display settings for each column in the profile table
# Each column is actually a rich.table.Column object, refer to https://rich.readthedocs.io/en/latest/reference/table.html#rich.table.Column
table_column_args:
  style: none # Style of the column, execute `python -m rich.theme` for more
  justify: center # Justification of the column: left, center, right
  vertical: middle # Vertical alignment of the column: top, middle, bottom
  overflow: fold # Overflow method of the column: fold, crop, ellipsis, see https://rich.readthedocs.io/en/latest/console.html?highlight=overflow#overflow
  no_wrap: false # Whether to prevent wrapping of text within the column

# Display settings for the profile table
# This is actually a rich.table.Table object, refer to https://rich.readthedocs.io/en/latest/reference/table.html#rich.table.Table
table_display_args:
  style: spring_green4 # Style of the table, execute `python -m rich.theme` for more
  highlight: true # Whether to highlight values (numbers, strings, ...)
  width: null # The width in characters of the table, or null to fit automatically
  min_width: null # The minimum width of the table, or null for no minimum
  expand: false # Whether to expand the table to the full screen size
  padding: # Padding for cells
    - 0 # top/bottom padding
    - 1 # left/right padding
  collapse_padding: false # Whether to collapse the padding around cells
  pad_edge: true # Whether to pad edge cells
  leading: 0 # Number of blank lines between rows (precludes `show_lines` below)
  title: null # Title of the table, accepts rich styling
  title_style: bold # Style of the title, execute `python -m rich.theme` for more
  title_justify: center # Justification of the title: left, center, right
  caption: null # The table caption rendered below the table, accepts rich styling
  caption_style: null # Style of the caption, execute `python -m rich.theme` for more
  caption_justify: center # Justification of the caption: left, center, right
  show_header: true # Whether to show the header row
  header_style: bold # Style of the header, execute `python -m rich.theme` for more
  show_footer: false # Whether to show the footer row
  footer_style: italic # Style of the footer, execute `python -m rich.theme` for more
  show_lines: false # Whether to show lines between rows
  row_styles: null # Optional list of row styles; if more than one style is given, the styles alternate
  show_edge: true # Whether to show the edge of the table
  box: ROUNDED # Box type, use its name directly as shown here! Execute `python -m rich.box` for more
  safe_box: true # Whether to disable box characters that don't display on Windows legacy terminals with *raster* fonts
  border_style: null # Style of the border, execute `python -m rich.theme` for more

# Display settings for how the tree and table are combined in the profile
combine:
  horizon_gap: 2 # horizontal gap between the tree and table
Tree Level Index

What is the tree level index?

As the name implies, it is the hierarchical index of an operation tree.

Figuratively, within the model operation tree, each guide line represents a level. The level index starts from 0 and increments from left to right.

Additionally, the level index at which a tree node is mounted equals len(tree_node.node_id.split('.')). For instance, the node (1.1.6.2) 1 BatchNorm2d below is mounted at level 4.
AnyNet
├── (1) layers Sequential
│   ├── (1.1) 0 BasicBlock
│   │   ├── (1.1.1) conv1 Conv2d
│   │   ├── (1.1.2) bn1 BatchNorm2d
│   │   ├── (1.1.3) relu ReLU
│   │   ├── (1.1.4) conv2 Conv2d
│   │   ├── (1.1.5) bn2 BatchNorm2d
│   │   └── (1.1.6) downsample Sequential
│   │       ├── (1.1.6.1) 0 Conv2d
│   │       └── (1.1.6.2) 1 BatchNorm2d
│   └── (1.2) 1 BasicBlock
│       ├── (1.2.1) conv1 Conv2d
│       ├── (1.2.2) bn1 BatchNorm2d
│       ├── (1.2.3) relu ReLU
│       ├── (1.2.4) conv2 Conv2d
│       └── (1.2.5) bn2 BatchNorm2d
├── (2) avgpool AdaptiveAvgPool2d
└── (3) fc Linear

│   │   │   │
0   1   2   3   (level index, each pointing to a guide line)
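The mapping from node_id to level index described above is plain string arithmetic. The following sketch (ordinary Python, not torchmeter internals) makes it concrete:

```python
def level_index(node_id: str) -> int:
    """Level at which the node with this id is mounted.

    The root node ("0") sits at level 0; otherwise the level equals
    the number of dot-separated parts of the node id.
    """
    if node_id == "0":  # root node, i.e. the model itself
        return 0
    return len(node_id.split("."))

print(level_index("1.1.6.2"))  # the BatchNorm2d node above → 4
print(level_index("1.1"))      # a BasicBlock node → 2
```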
How to use the tree level index?

A valid level index empowers you to customize the operation tree with meticulous precision.

torchmeter regards the following values as valid tree level indexes:

- A non-negative integer (e.g. 0, 1, 2, ...): the configuration under a specific index applies only to the corresponding level.
- default: the configuration under this index applies to all levels that are not explicitly defined.
- all: the configuration under this index overrides that of any other level and is applied with the highest priority across all levels.
Please refer to Customize the Hierarchical Display for specific usage scenarios.
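The precedence just described (all > a specific level index > default) can be sketched as a dictionary merge. This is only an illustration of the resolution order, not torchmeter's actual implementation:

```python
def resolve_level_args(levels_args: dict, level: int) -> dict:
    """Merge display settings for one level: default < specific level < all."""
    settings = dict(levels_args.get("default", {}))   # base settings
    settings.update(levels_args.get(str(level), {}))  # level-specific override
    settings.update(levels_args.get("all", {}))       # highest priority
    return settings

# mirrors the default tree_levels_args shown in this cheatsheet
levels_args = {
    "default": {"guide_style": "light_coral", "highlight": True},
    "0": {"label": "[b light_coral]<name>[/]"},
}
print(resolve_level_args(levels_args, 0))  # level-0 label + inherited defaults
print(resolve_level_args(levels_args, 1))  # defaults only
```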
Tree Node Attributes

What is a tree node attribute?

- Upon the instantiation of a torchmeter.Meter with a Pytorch model, an automated scan of the model's architecture is executed. Subsequently, a tree structure is produced to depict the model.
- This tree structure is realized via torchmeter.engine.OperationTree. In this tree, each node is an instance of torchmeter.engine.OperationNode, which represents a layer or operation (such as nn.Conv2d, nn.ReLU, etc.) within the model.
- Therefore, the attributes of a tree node are the attributes / properties of an OperationNode instance.
What can a tree node attribute help me with?

All the attributes defined below are intended to:

- help you obtain supplementary information about a tree node;
- customize the display of the tree structure during the rendering procedure.
What are the available attributes of a tree node?
Illustrative Example
from collections import OrderedDict
import torch.nn as nn
class SimpleModel(nn.Module):
def __init__(self):
super(SimpleModel, self).__init__()
self.single_1 = nn.Linear(1, 10)
self.repeat_1x2 = nn.Sequential(OrderedDict({
"A": nn.Linear(10, 10),
"B": nn.Linear(10, 10),
}))
self.single_2 = nn.ReLU()
self.repeat_2x3 = nn.Sequential(OrderedDict({
"C": nn.Linear(10, 5),
"D": nn.ReLU(),
"E": nn.Linear(5, 10),
"F": nn.Linear(10, 5),
"G": nn.ReLU(),
"H": nn.Linear(5, 10),
"I": nn.Linear(10, 5),
"J": nn.ReLU(),
"K": nn.Linear(5, 10),
}))
self.single_3 = nn.Linear(10, 1)
Taking the above model as an example, the values of each attribute at each layer are as follows:

name, type, node_id, is_leaf, operation & module_repr
| node_id | name | type | is_leaf | operation | module_repr |
|---|---|---|---|---|---|
| 0 | SimpleModel | SimpleModel | False | instance created via SimpleModel() | SimpleModel |
| 1 | single_1 | Linear | True | instance created via nn.Linear(1, 10) in line 8 | Linear(in_features=1, out_features=10, bias=True) |
| 2 | repeat_1x2 | Sequential | False | instance created via nn.Sequential in line 10 | Sequential |
| 2.1 | A | Linear | True | instance created via nn.Linear(10, 10) in line 11 | Linear(in_features=10, out_features=10, bias=True) |
| 2.2 | B | Linear | True | instance created via nn.Linear(10, 10) in line 12 | Linear(in_features=10, out_features=10, bias=True) |
| 3 | single_2 | ReLU | True | instance created via nn.ReLU() in line 15 | ReLU() |
| 4 | repeat_2x3 | Sequential | False | instance created via nn.Sequential in line 17 | Sequential |
| 4.1 | C | Linear | True | instance created via nn.Linear(10, 5) in line 18 | Linear(in_features=10, out_features=5, bias=True) |
| 4.2 | D | ReLU | True | instance created via nn.ReLU() in line 19 | ReLU() |
| 4.3 | E | Linear | True | instance created via nn.Linear(5, 10) in line 20 | Linear(in_features=5, out_features=10, bias=True) |
| 4.4 | F | Linear | True | instance created via nn.Linear(10, 5) in line 22 | Linear(in_features=10, out_features=5, bias=True) |
| 4.5 | G | ReLU | True | instance created via nn.ReLU() in line 23 | ReLU() |
| 4.6 | H | Linear | True | instance created via nn.Linear(5, 10) in line 24 | Linear(in_features=5, out_features=10, bias=True) |
| 4.7 | I | Linear | True | instance created via nn.Linear(10, 5) in line 26 | Linear(in_features=10, out_features=5, bias=True) |
| 4.8 | J | ReLU | True | instance created via nn.ReLU() in line 27 | ReLU() |
| 4.9 | K | Linear | True | instance created via nn.Linear(5, 10) in line 28 | Linear(in_features=5, out_features=10, bias=True) |
| 5 | single_3 | Linear | True | instance created via nn.Linear(10, 1) in line 31 | Linear(in_features=10, out_features=1, bias=True) |
parent & childs

Here we use the node_id of each node's parent and children to simplify the display.
In actual usage, a node's parent is None or an instance of torchmeter.engine.OperationNode,
while childs is an OrderedDict with the node_id as key and the node instance as value.
| node_id | name | type | parent | childs |
|---|---|---|---|---|
| 0 | SimpleModel | SimpleModel | None | 1 ~ 5 |
| 1 | single_1 | Linear | 0 | |
| 2 | repeat_1x2 | Sequential | 0 | 2.1, 2.2 |
| 2.1 | A | Linear | 2 | |
| 2.2 | B | Linear | 2 | |
| 3 | single_2 | ReLU | 0 | |
| 4 | repeat_2x3 | Sequential | 0 | 4.1 ~ 4.9 |
| 4.1 | C | Linear | 4 | |
| 4.2 | D | ReLU | 4 | |
| 4.3 | E | Linear | 4 | |
| 4.4 | F | Linear | 4 | |
| 4.5 | G | ReLU | 4 | |
| 4.6 | H | Linear | 4 | |
| 4.7 | I | Linear | 4 | |
| 4.8 | J | ReLU | 4 | |
| 4.9 | K | Linear | 4 | |
| 5 | single_3 | Linear | 0 | |
repeat_winsz & repeat_time

| node_id | name | type | repeat_winsz | repeat_time | explanation |
|---|---|---|---|---|---|
| 0 | SimpleModel | SimpleModel | 1 | 1 | no repetition |
| 1 | single_1 | Linear | 1 | 1 | no repetition |
| 2 | repeat_1x2 | Sequential | 1 | 1 | no repetition |
| 2.1 | A | Linear | 1 | 2 | The repeating window covers 2.1 and 2.2. The two layers have the same definition, so one module can be considered as repeated twice. |
| 2.2 | B | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 3 | single_2 | ReLU | 1 | 1 | no repetition |
| 4 | repeat_2x3 | Sequential | 1 | 1 | no repetition |
| 4.1 | C | Linear | 3 | 3 | The repeating window takes 4.1 ~ 4.3 as a whole and covers 4.1 ~ 4.9. |
| 4.2 | D | ReLU | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.3 | E | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.4 | F | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.5 | G | ReLU | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.6 | H | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.7 | I | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.8 | J | ReLU | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 4.9 | K | Linear | 1 | 1 | Already included in a repeating window, so repetitiveness analysis is skipped and the default values are used. |
| 5 | single_3 | Linear | 1 | 1 | no repetition |
| Attribute | Type | Explanation |
|---|---|---|
| operation | torch.nn.Module | The underlying pytorch module |
| type | str | The operation type. If the operation is a pytorch module, the name of its class is used |
| name | str | The module name defined in the underlying pytorch model |
| node_id | str | A globally unique module identifier, formatted as `<parent-node-id>.<node-number-in-current-level>`. The index starts from 1, since the root is denoted as 0 |
| is_leaf | bool | Whether the node is a leaf node (i.e. has no child nodes) |
| module_repr | str | The text representation of the current operation. For non-leaf nodes, it is the node type; for leaf nodes, it is the return value of the `__repr__()` method |
| parent | torchmeter.engine.OperationNode | The parent node of this node. Each node has only one parent |
| childs | OrderedDict[str, OperationNode] | An OrderedDict storing the children of this node in feed-forward order. The key is the child's node_id, the value is the child node itself |
| repeat_winsz | int | The size of the repeating window for the current node. Defaults to 1, meaning no repetition (the window contains only the node itself) |
| repeat_time | int | The number of repetitions of the window in which the current module is located. Defaults to 1, meaning no repetition |
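The repeat_winsz / repeat_time values shown earlier follow from scanning sibling modules for the smallest window of identical definitions. The sketch below illustrates the idea by comparing textual representations; it is a simplification, since torchmeter's actual analysis operates on OperationNode instances:

```python
def find_repeat(reprs: list[str]) -> tuple[int, int]:
    """Return (window_size, repeat_time) for the prefix of `reprs`.

    Tries the smallest window first; (1, 1) means no repetition.
    """
    n = len(reprs)
    for winsz in range(1, n // 2 + 1):
        window = reprs[:winsz]
        times = 1
        # count how many consecutive copies of the window follow it
        while reprs[times * winsz:(times + 1) * winsz] == window:
            times += 1
        if times > 1:
            return winsz, times
    return 1, 1

# mirrors repeat_2x3 above: the (C, D, E) window repeated three times
reprs = ["Linear(10,5)", "ReLU()", "Linear(5,10)"] * 3
print(find_repeat(reprs))  # → (3, 3)

# mirrors repeat_1x2 above: a single Linear repeated twice
print(find_repeat(["Linear(10,10)", "Linear(10,10)"]))  # → (1, 2)
```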
How to use the attributes of a tree node?

An attribute of a tree node can be used as a placeholder within the value of certain configurations. This allows the attribute's value to be retrieved dynamically during the tree-rendering procedure.

The configurations/scenarios supporting tree node attributes as placeholders are listed below.

| Configuration/Scenario | Default Value |
|---|---|
| tree_levels_args.[level-index].label ¹ | '[b gray35](<node_id>) [green]<name>[/green] [cyan]<type>[/]' ² |
| tree_repeat_block_args.title | '[i]Repeat [[b]<repeat_time>[/b]] Times[/]' |
| tree_renderer.repeat_footer | Supports text and functions, see Customize the footer |
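Conceptually, rendering fills each `<attr>` marker in the template with the corresponding node attribute. A minimal sketch of this substitution (illustrative only, not torchmeter's renderer):

```python
import re

def fill_placeholders(template: str, node_attrs: dict) -> str:
    """Replace <attr> markers with the node's attribute values.

    Unknown markers are left untouched; rich markup like [b] is unaffected
    because it uses square brackets, not angle brackets.
    """
    return re.sub(
        r"<(\w+)>",
        lambda m: str(node_attrs.get(m.group(1), m.group(0))),
        template,
    )

label = "[b gray35](<node_id>) [green]<name>[/green] [cyan]<type>[/]"
attrs = {"node_id": "1.1.6.2", "name": "1", "type": "BatchNorm2d"}
print(fill_placeholders(label, attrs))
# → [b gray35](1.1.6.2) [green]1[/green] [cyan]BatchNorm2d[/]
```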
Usage Example

For example, if you want to unify the titles of all repeat blocks into a bold "My Repeat Title", you can do this:
from rich import print
from torchmeter import Meter
from torchvision import models
resnet18 = models.resnet18()
model = Meter(resnet18)
model.tree_repeat_block_args.title = '[b]My Repeat Title[/b]' #(1)
print(model.structure)
That's all! You will see that the titles of all repeat blocks have been changed.
Unit Explanation

There are four types of units in torchmeter, listed as follows.

The raw-data tag in the subsequent content indicates that the unit marked with this tag is used in raw-data mode.

Used by param, cal
| unit | explanation | tag | example |
|---|---|---|---|
| null | Number of subjects | raw-data | 5: There are 5 semantic subjects |
| K | \(10^3\) | | 5 K: There are 5,000 ... |
| M | \(10^6\) | | 5 M: There are 5,000,000 ... |
| G | \(10^9\) | | 5 G: There are 5,000,000,000 ... |
| T | \(10^{12}\) | | 5 T: There are 5,000,000,000,000 ... |
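These are ordinary decimal prefixes. A small helper makes the scaling rule from the table above concrete (illustrative only, not torchmeter's formatting code):

```python
def to_count_unit(value: float) -> str:
    """Scale a raw count into the largest decimal unit listed above."""
    for unit, factor in (("T", 1e12), ("G", 1e9), ("M", 1e6), ("K", 1e3)):
        if value >= factor:
            return f"{value / factor:g} {unit}"
    return f"{value:g}"  # raw-data mode: no unit suffix

print(to_count_unit(5_000))      # → 5 K
print(to_count_unit(5_000_000))  # → 5 M
print(to_count_unit(5))          # → 5
```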
Used by mem
| unit | explanation | tag | example |
|---|---|---|---|
| B | \(2^0 = 1\) byte | raw-data | 5 B: \(5 \times 1 = 5\) bytes |
| KiB | \(2^{10} = 1024\) bytes | | 5 KiB: \(5 \times 2^{10} = 5120\) bytes |
| MiB | \(2^{20}\) bytes | | 5 MiB: \(5 \times 2^{20} = 5242880\) bytes |
| GiB | \(2^{30}\) bytes | | 5 GiB: \(5 \times 2^{30} = 5368709120\) bytes |
| TiB | \(2^{40}\) bytes | | 5 TiB: \(5 \times 2^{40} = 5497558138880\) bytes |
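The same scaling idea applies to memory, except the factors are binary (IEC) powers of 1024 rather than powers of 1000. A sketch of the rule from the table above (illustrative only):

```python
def to_mem_unit(n_bytes: int) -> str:
    """Scale a byte count into the largest binary (IEC) unit listed above."""
    for unit, exp in (("TiB", 40), ("GiB", 30), ("MiB", 20), ("KiB", 10)):
        if n_bytes >= 2 ** exp:
            return f"{n_bytes / 2 ** exp:g} {unit}"
    return f"{n_bytes:g} B"  # raw-data mode

print(to_mem_unit(5 * 2 ** 20))  # → 5 MiB
print(to_mem_unit(5120))         # → 5 KiB
```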
Used by ittp - inference time

Why do I obtain different results across attempts or inputs?

Don't worry, this is a normal phenomenon.

- Different results across attempts
    - Inference latency is measured in real time because it depends on dynamic factors such as the current machine load and the device the model resides on. Therefore, each time the ittp attribute (i.e., Meter(your_model).ittp) is accessed, the inference latency and throughput are re-measured to reflect the model's performance under the current working conditions.
- Different results with different inputs
    - Inference latency means the time it takes the model to complete one forward propagation with the given input. Different inputs therefore impose different workloads on the model, resulting in different inference latencies.
    - In TorchMeter, inference latency and throughput are measured based on the input received by the model in the most recent forward propagation. Hence, different input batches or sample shapes, combined with differences in machine load at different times, will lead to changes in inference latency.

It should additionally be mentioned that, due to automatic device synchronization, the input is synchronized to the device where the model is located before forward propagation is executed, so the results obtained from two inputs with the same content on different devices will be very similar.
| unit | explanation | tag | example |
|---|---|---|---|
| ns | nanosecond | | 5 ns: \(5 \times 10^{-9}\) seconds |
| us | microsecond | | 5 us: \(5 \times 10^{-6}\) seconds |
| ms | millisecond | | 5 ms: \(5 \times 10^{-3}\) seconds |
| s | second | raw-data | 5 s: \(5 \times 10^{0}\) seconds |
| min | minute | | 5 min: \(5 \times 60^{1}\) seconds |
| h | hour | | 5 h: \(5 \times 60^{2}\) seconds |
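Note that the sub-second units scale by powers of 10 while min and h scale by powers of 60. A helper sketching the rule from the table above (illustrative only):

```python
def to_time_unit(seconds: float) -> str:
    """Scale a duration in seconds into the largest unit listed above."""
    units = (("h", 3600), ("min", 60), ("s", 1),
             ("ms", 1e-3), ("us", 1e-6), ("ns", 1e-9))
    for unit, factor in units:
        if seconds >= factor:
            return f"{seconds / factor:g} {unit}"
    return f"{seconds:g} s"  # raw-data fallback for tiny values

print(to_time_unit(18000))  # → 5 h
print(to_time_unit(120))    # → 2 min
print(to_time_unit(5))      # → 5 s
```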
Used by ittp - throughput

What is the meaning of "Input" in the table below?

"Input" refers to all the inputs received by the model in your last execution of forward propagation. Torchmeter treats these inputs as a standard unit when calculating inference latency and throughput.

To facilitate comparisons between models, we recommend using the same input (such as a single sample with batch_size=1) for different models when measuring all statistics, so as to obtain more universally comparable results.

In the following example, "Input" in Case 1 refers to x=torch.randn(1, 3, 224, 224); y=0.1, while in Case 2 it refers to x=torch.randn(100, 3, 224, 224); y=0.1. You can see the difference between the two cases from the results: the inference latency when batch_size=100 is significantly higher than when batch_size=1.
import torch
import torch.nn as nn
from rich import print
from torchmeter import Meter
from torchvision import models
class ExampleModel(nn.Module):
def __init__(self):
super(ExampleModel, self).__init__()
self.backbone = models.resnet18()
def forward(self, x: torch.Tensor, y: int):
return self.backbone(x) + y
model = Meter(ExampleModel(), device="cuda")
# case1: batch size = 1 ------------------------------
ipt = torch.randn(1, 3, 224, 224)
model(ipt, 0.1)
print(model.ittp)
# case2: batch size = 100 ------------------------------
ipt = torch.randn(100, 3, 224, 224)
model(ipt, 0.1)
print(model.ittp)
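Throughput is simply the reciprocal of latency, counted in "inputs" as defined above (one input = everything the model received in one forward pass). A hedged sketch of the arithmetic and the unit scaling, not torchmeter's measurement code:

```python
def throughput_ips(latency_s: float) -> float:
    """Inputs processed per second, where one 'input' = one forward pass."""
    return 1.0 / latency_s

def to_ips_unit(ips: float) -> str:
    """Scale a raw IPS value into the largest decimal unit listed below."""
    for unit, factor in (("TIPS", 1e12), ("GIPS", 1e9),
                         ("MIPS", 1e6), ("KIPS", 1e3)):
        if ips >= factor:
            return f"{ips / factor:g} {unit}"
    return f"{ips:g} IPS"  # raw-data mode

# if one forward pass takes 4 ms, the model handles 250 inputs per second
print(to_ips_unit(throughput_ips(0.004)))
```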
| unit | explanation | tag | example |
|---|---|---|---|
| IPS | Inputs Per Second | raw-data | 5 IPS: process 5 inputs per second |
| KIPS | \(10^3\) IPS | | 5 KIPS: process 5,000 inputs per second |
| MIPS | \(10^6\) IPS | | 5 MIPS: process 5,000,000 inputs per second |
| GIPS | \(10^9\) IPS | | 5 GIPS: process 5,000,000,000 inputs per second |
| TIPS | \(10^{12}\) IPS | | 5 TIPS: process 5,000,000,000,000 inputs per second |
¹ For the value of [level-index], please refer to Tree Level Index. ↩

² Rich style markup and its abbreviations are supported when writing the value content. ↩