
Intros

TorchMeter Banner

🚀 𝒀𝒐𝒖𝒓 𝑨𝒍𝒍-𝒊𝒏-𝑢𝒏𝒆 𝑻𝒐𝒐𝒍 𝒇𝒐𝒓 𝑷𝒚𝒕𝒐𝒓𝒄𝒉 𝑴𝒐𝒅𝒆𝒍 𝑨𝒏𝒂𝒍𝒚𝒔𝒊𝒔 🚀

PyPI-Version Python-Badge Pytorch-Badge Ruff-Badge Static Badge

  • Repo: https://github.com/TorchMeter/torchmeter
  • Intro: Provides comprehensive measurement of a PyTorch model's Parameters, FLOPs/MACs, Memory Cost, Inference Time, and Throughput, with a highly customizable result display ✨

π’œ. π»π’Ύπ‘”π’½π“π’Ύπ‘”π’½π“‰π“ˆ

πš‰πšŽπš›πš˜-π™Έπš—πšπš›πšžπšœπš’πš˜πš— π™Ώπš›πš˜πš‘πš’
  • Acts as a drop-in decorator, requiring no changes to the underlying model
  • Seamlessly integrates with PyTorch modules while preserving full compatibility (attributes and methods)
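To make the proxy idea concrete, here is a minimal plain-Python sketch of the delegation pattern a zero-intrusion wrapper relies on. `ProxyMeter` and `TinyModel` are hypothetical names used only for illustration; this is not TorchMeter's actual implementation.

```python
# Minimal sketch of attribute delegation, the pattern behind a
# zero-intrusion proxy. Illustrative only, not TorchMeter's code.

class TinyModel:
    example_attr = "ABC"

    def forward(self, x):
        return x * 2

class ProxyMeter:
    def __init__(self, model):
        self._model = model  # keep the underlying object untouched

    def __getattr__(self, name):
        # Invoked only when `name` is not found on the proxy itself,
        # so every lookup falls through to the wrapped model.
        return getattr(self._model, name)

proxied = ProxyMeter(TinyModel())
print(proxied.example_attr)  # "ABC" -> attribute of the underlying model
print(proxied.forward(21))   # 42    -> method of the underlying model
```

Because delegation happens in `__getattr__`, only names the proxy defines itself shadow the model's own attributes; everything else passes through untouched.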
π™΅πšžπš•πš•-πš‚πšπšŠπšŒπš” π™Όπš˜πšπšŽπš• π™°πš—πšŠπš•πš’πšπš’πšŒπšœ

Holistic performance analytics across 5 dimensions:

  • Parameter Analysis

    • Total/trainable parameter quantification
    • Layer-wise parameter distribution analysis
    • Gradient state tracking (requires_grad flags)
  • Computational Profiling

    • FLOPs/MACs precision calculation
    • Operation-wise calculation distribution analysis
    • Dynamic input/output detection (number, type, shape, ...)
  • Memory Diagnostics

    • Input/output tensor memory awareness
    • Hierarchical memory consumption analysis
  • Inference Latency & Throughput Benchmarking

    • Auto warm-up phase execution (eliminates cold-start bias)
    • Device-specific high-precision timing
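To illustrate why the warm-up phase matters, here is a stdlib-only sketch of the warm-up + repeat timing pattern. `benchmark` is a hypothetical helper, deliberately device-agnostic: real GPU timing additionally needs device synchronization, which TorchMeter handles for you.

```python
import time

def benchmark(fn, *args, warmup=10, repeat=50):
    """Return (avg_latency_seconds, throughput_per_second) for fn(*args)."""
    # Warm-up: run the workload a few times first so one-off costs
    # (lazy initialization, caches, autotuning) don't skew the timing.
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    avg_latency = (time.perf_counter() - start) / repeat
    return avg_latency, 1.0 / avg_latency

latency, throughput = benchmark(lambda x: x * x, 12345)
print(f"{latency:.2e} s/call, {throughput:,.0f} calls/s")
```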
πšπš’πšŒπš‘ πš…πš’πšœπšžπšŠπš•πš’πš£πšŠπšπš’πš˜πš—
  • Programmable tabular report

    • Dynamic table structure adjustment
    • Style customization and real-time rendering
    • Programmatic real-time data analysis
  • Rich-text hierarchical operation tree

    • Style customization and real-time rendering
    • Smart module folding based on structural equivalence detection for intuitive model structure insights
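As a toy illustration of module folding, the sketch below run-length-folds consecutive identical layer signatures. The signature strings are invented, and TorchMeter's actual structural-equivalence detection is more general than simple string comparison.

```python
def fold(signatures):
    """Collapse consecutive runs of identical signatures into 'sig (xN)'."""
    folded, i = [], 0
    while i < len(signatures):
        j = i
        while j < len(signatures) and signatures[j] == signatures[i]:
            j += 1  # extend the run of structurally identical entries
        run = j - i
        folded.append(signatures[i] if run == 1 else f"{signatures[i]} (x{run})")
        i = j
    return folded

# Two structurally identical inner blocks collapse into one folded entry
sigs = ["Conv2d(3, 10, 3)", "InnerNet", "InnerNet", "Conv2d(10, 3, 1)"]
print(fold(sigs))  # ['Conv2d(3, 10, 3)', 'InnerNet (x2)', 'Conv2d(10, 3, 1)']
```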
π™΅πš’πš—πšŽ-π™Άπš›πšŠπš’πš—πšŽπš π™²πšžπšœπšπš˜πš–πš’πš£πšŠπšπš’πš˜πš—
  • Real-time hot-reload rendering:
    Dynamic adjustment of rendering configuration for operation trees, report tables and their nested components

  • Progressive update:
    Namespace assignment + dictionary batch update
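The two update styles can be sketched with a toy settings object. `RenderCfg` and its option names are hypothetical; TorchMeter's real configuration objects may expose a different interface.

```python
class RenderCfg:
    """Toy settings holder accepting both update styles described above:
    attribute (namespace) assignment and dictionary batch update."""

    def __init__(self, **defaults):
        self.__dict__.update(defaults)

    def update(self, options: dict):
        for key, value in options.items():
            if not hasattr(self, key):  # reject unknown options early
                raise KeyError(f"unknown option: {key}")
            setattr(self, key, value)

cfg = RenderCfg(style="bold", fold_repeats=True, max_depth=3)
cfg.style = "dim italic"                             # namespace assignment
cfg.update({"fold_repeats": False, "max_depth": 5})  # dictionary batch update
print(cfg.style, cfg.fold_repeats, cfg.max_depth)    # dim italic False 5
```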

π™²πš˜πš—πšπš’πš-π™³πš›πš’πšŸπšŽπš— πšπšžπš—πšπš’πš–πšŽ π™ΌπšŠπš—πšŠπšπšŽπš–πšŽπš—πš
  • Centralized control:
    Singleton-managed global configuration for dynamic behavior adjustment

  • Portable presets:
    Export/import YAML profiles for runtime behaviors, eliminating repetitive setup
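The centralized-control idea boils down to the singleton pattern. Below is a stdlib-only sketch: `GlobalConfig`, its option names, and the use of `json` in place of YAML are all illustrative assumptions, not TorchMeter's real API.

```python
import json

class GlobalConfig:
    """Every instantiation returns the same shared object (singleton)."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.options = {"render_interval": 0.1, "show_units": True}
        return cls._instance

    def dump(self, path):
        # Persist the current behavior profile (TorchMeter uses YAML
        # profiles; json is used here only to stay dependency-free).
        with open(path, "w") as f:
            json.dump(self.options, f, indent=2)

    def load(self, path):
        with open(path) as f:
            self.options.update(json.load(f))

cfg_a, cfg_b = GlobalConfig(), GlobalConfig()
print(cfg_a is cfg_b)  # True: one shared configuration everywhere
```

Because every handle points at the same object, a change made anywhere is visible everywhere, which is what makes dynamic behavior adjustment possible.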

π™Ώπš˜πš›πšπšŠπš‹πš’πš•πš’πšπš’ πšŠπš—πš π™Ώπš›πšŠπšŒπšπš’πšŒπšŠπš•πš’πšπš’
  • Decoupled pipeline:
    Separation of data collection and visualization

  • Automatic device synchronization:
    Maintains production-ready status by keeping model and data co-located

  • Dual-mode reporting with export flexibility:

    • Measurement units mode vs. raw data mode
    • Multi-format export (CSV/Excel) for analysis integration
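The dual-mode idea can be sketched with the stdlib `csv` module. The layer names and numbers below are invented, and this is not TorchMeter's export code.

```python
import csv
import io

rows = [  # invented example measurements
    {"layer": "conv1", "params": 280, "flops": 286720},
    {"layer": "fc", "params": 8, "flops": 12},
]

def to_csv(rows, raw=True):
    """Render rows as CSV: raw numbers, or human-readable units."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["layer", "params", "flops"])
    writer.writeheader()
    for row in rows:
        if raw:  # raw data mode: plain numbers, ready for pandas/Excel
            writer.writerow(row)
        else:    # measurement units mode: readable at a glance
            writer.writerow({**row, "flops": f"{row['flops'] / 1e3:.1f} KFLOPs"})
    return buf.getvalue()

print(to_csv(rows, raw=False))
```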

ℬ. πΌπ“ƒπ“ˆπ“‰π’Άπ“π“π’Άπ“‰π’Ύπ‘œπ“ƒ

π™²πš˜πš–πš™πšŠπšπš’πš‹πš’πš•πš’πšπš’
  • OS: windows / linux / macOS
  • Python: >= 3.8
  • PyTorch: >= 1.7.0
πšƒπš‘πš›πš˜πšžπšπš‘ π™Ώπš’πšπš‘πš˜πš— π™ΏπšŠπšŒπš”πšŠπšπšŽ π™ΌπšŠπš—πšŠπšπšŽπš›

The most convenient way; suitable for installing the latest released stable version.

# pip series
pip/pipx/pipenv install torchmeter

# Or via conda
conda install torchmeter

# Or via uv
uv add torchmeter

# Or via poetry
poetry add torchmeter

# For other package managers, refer to their own documentation
πšƒπš‘πš›πš˜πšžπšπš‘ π™±πš’πš—πšŠπš›πš’ π™³πš’πšœπšπš›πš’πš‹πšžπšπš’πš˜πš—

Suitable for installing previously released versions.

  1. Download the .whl file from PyPI or GitHub Releases.

  2. Install locally:

    pip install torchmeter-x.x.x.whl # (1)
    
    1. πŸ™‹β€β™‚οΈ Replace x.x.x with actual version
πšƒπš‘πš›πš˜πšžπšπš‘ πš‚πš˜πšžπš›πšŒπšŽ π™²πš˜πšπšŽ

Suitable for those who want to try out upcoming features (which may have unknown bugs).

git clone https://github.com/TorchMeter/torchmeter.git
cd torchmeter

# If you want to install the released stable version, use this: 
git checkout vx.x.x # Stable (1)

# If you want to try the latest development version(alpha/beta), use this:
git checkout master  # Development version

pip install .
  1. πŸ™‹β€β™‚οΈ Don't forget to eplace x.x.x with actual version. You can check all available versions with git tag -l

π’ž. 𝒒𝑒𝓉𝓉𝒾𝓃𝑔 π“ˆπ“‰π’Άπ“‡π“‰π‘’π’Ή

π™³πšŽπš•πšŽπšπšŠπšπšŽ πš’πš˜πšžπš› πš–πš˜πšπšŽπš• 𝚝𝚘 πšπš˜πš›πšŒπš‘πš–πšŽπšπšŽπš›
Implementation of ExampleNet
Python
import torch.nn as nn

class ExampleNet(nn.Module):
    def __init__(self):
        super(ExampleNet, self).__init__()

        self.backbone = nn.Sequential(
            self._nested_repeat_block(2),
            self._nested_repeat_block(2)
        )

        self.gap = nn.AdaptiveAvgPool2d(1)

        self.classifier = nn.Linear(3, 2)

    def _inner_net(self):
        return nn.Sequential(
            nn.Conv2d(10, 10, 1),
            nn.BatchNorm2d(10),
            nn.ReLU(),
        )

    def _nested_repeat_block(self, repeat:int=1):
        inners = [self._inner_net() for _ in range(repeat)]
        return nn.Sequential(
            nn.Conv2d(3, 10, 3, stride=1, padding=1),
            nn.BatchNorm2d(10),
            nn.ReLU(),
            *inners,
            nn.Conv2d(10, 3, 1),
            nn.BatchNorm2d(3),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.backbone(x)
        x = self.gap(x)
        x = x.squeeze(3).squeeze(2)  # tuple `dim` for squeeze needs PyTorch >= 2.0
        return self.classifier(x)
Python
import torch.nn as nn
from torchmeter import Meter
from torch.cuda import is_available as is_cuda

# 1️⃣ Prepare your PyTorch model; here is a simple example
underlying_model = ExampleNet() # (1)

# Set an extra attribute to the model to show 
# how torchmeter acts as a zero-intrusion proxy later
underlying_model.example_attr = "ABC"

# 2️⃣ Wrap your model with torchmeter
model = Meter(underlying_model)

# 3️⃣ Validate the zero-intrusion proxy

# Get the model's attribute
print(model.example_attr)

# Get the model's method
# `_inner_net` is a method defined in ExampleNet
print(hasattr(model, "_inner_net")) 

# Move the model to another device (currently on cpu)
print(model)
if is_cuda():
    model.to("cuda")
    print(model) # now on cuda
  1. πŸ™‹β€β™‚οΈ see above for implementation of ExampleNet
π™ΆπšŽπš πš’πš—πšœπš’πšπš‘πšπšœ πš’πš—πšπš˜ πšπš‘πšŽ πš–πš˜πšπšŽπš• πšœπšπš›πšžπšŒπšπšžπš›πšŽ
Python
from rich import print

print(model.structure)
πš€πšžπšŠπš—πšπš’πšπš’ πš–πš˜πšπšŽπš• πš™πšŽπš›πšπš˜πš›πš–πšŠπš—πšŒπšŽ πšπš›πš˜πš– πšŸπšŠπš›πš’πš˜πšžπšœ πšπš’πš–πšŽπš—πšœπš’πš˜πš—πšœ
Python
# Parameter Analysis
# Suppose that the `backbone` part of ExampleNet is frozen
_ = model.backbone.requires_grad_(False)
print(model.param)
tb, data = model.profile('param', no_tree=True)

# Before measuring computation, you should first execute a forward pass
import torch
input = torch.randn(1, 3, 32, 32)
output = model(input) # (1)

# Computational Profiling
print(model.cal) # (2)
tb, data = model.profile('cal', no_tree=True)

# Memory Diagnostics
print(model.mem) # (3)
tb, data = model.profile('mem', no_tree=True)

# Performance Benchmarking
print(model.ittp) # (4)
tb, data = model.profile('ittp', no_tree=True)

# Overall Analytics
print(model.overview())
  1. πŸ™‹β€β™‚οΈ you do not need to concern about the device mismatch, just feed the model with the input.
  2. πŸ™‹β€β™‚οΈ cal for calculation
  3. πŸ™‹β€β™‚οΈ mem for memory
  4. πŸ™‹β€β™‚οΈ ittp for inference time & throughput
π™΄πš‘πš™πš˜πš›πš πš›πšŽπšœπšžπš•πšπšœ πšπš˜πš› πšπšžπš›πšπš‘πšŽπš› πšŠπš—πšŠπš•πš’πšœπš’πšœ
Python
# export to csv
tb, data = model.profile('param', show=False, save_to="params.csv")

# export to excel
tb, data = model.profile('cal', show=False, save_to="../calculation.xlsx")
π™°πšπšŸπšŠπš—πšŒπšŽπš 𝚞𝚜𝚊𝚐𝚎
  1. Attribute/method access of the underlying model
  2. Automatic device synchronization
  3. Smart module folding
  4. Performance gallery
  5. Customized visualization
  6. Best practice of programmable tabular report
  7. Instant export and postponed export
  8. Centralized configuration management
  9. Submodule exploration

π’Ÿ. π’žπ‘œπ“ƒπ“‰π“‡π’Ύπ’·π“Šπ“‰π‘’

Thank you for wanting to make TorchMeter even better!

There are several ways to make a contribution.

Before jumping in, let's ensure smooth collaboration by reviewing our 📋 contribution guidelines first.

Thanks again!

β„°. π’žπ‘œπ’Ήπ‘’ π‘œπ’» π’žπ‘œπ“ƒπ’Ήπ“Šπ’Έπ“‰

Refer to the official code-of-conduct file for more details.

  • TorchMeter is an open-source project built by developers worldwide. We're committed to fostering a friendly, safe, and inclusive environment for all participants.

  • This code applies to all community spaces including but not limited to GitHub repositories, community forums, etc.

β„±. πΏπ’Ύπ’Έπ‘’π“ƒπ“ˆπ‘’

  • TorchMeter is released under the AGPL-3.0 License, see the LICENSE file for the full text.
  • Please carefully review the terms in the LICENSE file before using or distributing TorchMeter.
  • Ensure compliance with the licensing conditions, especially when integrating this project into larger systems or proprietary software.