
Intros

TorchMeter Banner

🚀 𝒀𝒐𝒖𝒓 𝑨𝒍𝒍-𝒊𝒏-𝑢𝒏𝒆 𝑻𝒐𝒐𝒍 𝒇𝒐𝒓 𝑷𝒚𝒕𝒐𝒓𝒄𝒉 𝑴𝒐𝒅𝒆𝒍 𝑨𝒏𝒂𝒍𝒚𝒔𝒊𝒔 🚀

PyPI-Version Python-Badge Pytorch-Badge Ruff-Badge Static Badge

  • Repo: https://github.com/TorchMeter/torchmeter
  • Intro: Provides comprehensive measurement of a PyTorch model's Parameters, FLOPs/MACs, Memory Cost, Inference Time, and Throughput, with a highly customizable result display ✨

π’œ. π»π’Ύπ‘”π’½π“π’Ύπ‘”π’½π“‰π“ˆ

πš‰πšŽπš›πš˜-π™Έπš—πšπš›πšžπšœπš’πš˜πš— π™Ώπš›πš˜πš‘πš’
  • Acts as a drop-in decorator, requiring no changes to the underlying model
  • Seamlessly integrates with PyTorch modules while preserving full compatibility (attributes and methods)
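To make the proxy idea concrete, here is a minimal plain-Python sketch of the delegation pattern a zero-intrusion wrapper relies on. `ProxyMeter` and `TinyModel` are hypothetical names used only for illustration; this is not TorchMeter's actual implementation.

```python
# Minimal sketch of attribute delegation, the pattern behind a
# zero-intrusion proxy. Illustrative only, not TorchMeter's code.

class TinyModel:
    example_attr = "ABC"

    def forward(self, x):
        return x * 2

class ProxyMeter:
    def __init__(self, model):
        self._model = model  # keep the underlying object untouched

    def __getattr__(self, name):
        # Invoked only when `name` is not found on the proxy itself,
        # so every lookup falls through to the wrapped model.
        return getattr(self._model, name)

proxied = ProxyMeter(TinyModel())
print(proxied.example_attr)  # "ABC" -> attribute of the underlying model
print(proxied.forward(21))   # 42    -> method of the underlying model
```

Because delegation happens in `__getattr__`, only names the proxy defines itself shadow the model's own attributes; everything else passes through untouched.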
π™΅πšžπš•πš•-πš‚πšπšŠπšŒπš” π™Όπš˜πšπšŽπš• π™°πš—πšŠπš•πš’πšπš’πšŒπšœ

Holistic performance analytics across 5 dimensions:

  • Parameter Analysis

    • Total/trainable parameter quantification
    • Layer-wise parameter distribution analysis
    • Gradient state tracking (requires_grad flags)
  • Computational Profiling

    • FLOPs/MACs precision calculation
    • Operation-wise calculation distribution analysis
    • Dynamic input/output detection (number, type, shape, ...)
  • Memory Diagnostics

    • Input/output tensor memory awareness
    • Hierarchical memory consumption analysis
  • Inference Latency & Throughput Benchmarking

    • Auto warm-up phase execution (eliminates cold-start bias)
    • Device-specific high-precision timing
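To illustrate why the warm-up phase matters, here is a stdlib-only sketch of the warm-up + repeat timing pattern. `benchmark` is a hypothetical helper, deliberately device-agnostic: real GPU timing additionally needs device synchronization, which TorchMeter handles for you.

```python
import time

def benchmark(fn, *args, warmup=10, repeat=50):
    """Return (avg_latency_seconds, throughput_per_second) for fn(*args)."""
    # Warm-up: run the workload a few times first so one-off costs
    # (lazy initialization, caches, autotuning) don't skew the timing.
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    avg_latency = (time.perf_counter() - start) / repeat
    return avg_latency, 1.0 / avg_latency

latency, throughput = benchmark(lambda x: x * x, 12345)
print(f"{latency:.2e} s/call, {throughput:,.0f} calls/s")
```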
πšπš’πšŒπš‘ πš…πš’πšœπšžπšŠπš•πš’πš£πšŠπšπš’πš˜πš—
  • Programmable tabular report

    • Dynamic table structure adjustment
    • Style customization and real-time rendering
    • Programmatic real-time data analysis
  • Rich-text hierarchical operation tree

    • Style customization and real-time rendering
    • Smart module folding based on structural equivalence detection for intuitive model structure insights
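As a toy illustration of module folding, the sketch below run-length-folds consecutive identical layer signatures. The signature strings are invented, and TorchMeter's actual structural-equivalence detection is more general than simple string comparison.

```python
def fold(signatures):
    """Collapse consecutive runs of identical signatures into 'sig (xN)'."""
    folded, i = [], 0
    while i < len(signatures):
        j = i
        while j < len(signatures) and signatures[j] == signatures[i]:
            j += 1  # extend the run of structurally identical entries
        run = j - i
        folded.append(signatures[i] if run == 1 else f"{signatures[i]} (x{run})")
        i = j
    return folded

# Two structurally identical inner blocks collapse into one folded entry
sigs = ["Conv2d(3, 10, 3)", "InnerNet", "InnerNet", "Conv2d(10, 3, 1)"]
print(fold(sigs))  # ['Conv2d(3, 10, 3)', 'InnerNet (x2)', 'Conv2d(10, 3, 1)']
```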
π™΅πš’πš—πšŽ-π™Άπš›πšŠπš’πš—πšŽπš π™²πšžπšœπšπš˜πš–πš’πš£πšŠπšπš’πš˜πš—
  • Real-time hot-reload rendering:
    Dynamic adjustment of rendering configuration for operation trees, report tables and their nested components

  • Progressive update:
    Namespace assignment + dictionary batch update
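The two update styles can be sketched with a toy settings object. `RenderCfg` and its option names are hypothetical; TorchMeter's real configuration objects may expose a different interface.

```python
class RenderCfg:
    """Toy settings holder accepting both update styles described above:
    attribute (namespace) assignment and dictionary batch update."""

    def __init__(self, **defaults):
        self.__dict__.update(defaults)

    def update(self, options: dict):
        for key, value in options.items():
            if not hasattr(self, key):  # reject unknown options early
                raise KeyError(f"unknown option: {key}")
            setattr(self, key, value)

cfg = RenderCfg(style="bold", fold_repeats=True, max_depth=3)
cfg.style = "dim italic"                             # namespace assignment
cfg.update({"fold_repeats": False, "max_depth": 5})  # dictionary batch update
print(cfg.style, cfg.fold_repeats, cfg.max_depth)    # dim italic False 5
```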

π™²πš˜πš—πšπš’πš-π™³πš›πš’πšŸπšŽπš— πšπšžπš—πšπš’πš–πšŽ π™ΌπšŠπš—πšŠπšπšŽπš–πšŽπš—πš
  • Centralized control:
    Singleton-managed global configuration for dynamic behavior adjustment

  • Portable presets:
    Export/import YAML profiles for runtime behaviors, eliminating repetitive setup
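The centralized-control idea boils down to the singleton pattern. Below is a stdlib-only sketch: `GlobalConfig`, its option names, and the use of `json` in place of YAML are all illustrative assumptions, not TorchMeter's real API.

```python
import json

class GlobalConfig:
    """Every instantiation returns the same shared object (singleton)."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.options = {"render_interval": 0.1, "show_units": True}
        return cls._instance

    def dump(self, path):
        # Persist the current behavior profile (TorchMeter uses YAML
        # profiles; json is used here only to stay dependency-free).
        with open(path, "w") as f:
            json.dump(self.options, f, indent=2)

    def load(self, path):
        with open(path) as f:
            self.options.update(json.load(f))

cfg_a, cfg_b = GlobalConfig(), GlobalConfig()
print(cfg_a is cfg_b)  # True: one shared configuration everywhere
```

Because every handle points at the same object, a change made anywhere is visible everywhere, which is what makes dynamic behavior adjustment possible.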

π™Ώπš˜πš›πšπšŠπš‹πš’πš•πš’πšπš’ πšŠπš—πš π™Ώπš›πšŠπšŒπšπš’πšŒπšŠπš•πš’πšπš’
  • Decoupled pipeline:
    Separation of data collection and visualization

  • Automatic device synchronization:
    Maintains production-ready status by keeping model and data co-located

  • Dual-mode reporting with export flexibility:

    • Measurement units mode vs. raw data mode
    • Multi-format export (CSV/Excel) for analysis integration
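The dual-mode idea can be sketched with the stdlib `csv` module. The layer names and numbers below are invented, and this is not TorchMeter's export code.

```python
import csv
import io

rows = [  # invented example measurements
    {"layer": "conv1", "params": 280, "flops": 286720},
    {"layer": "fc", "params": 8, "flops": 12},
]

def to_csv(rows, raw=True):
    """Render rows as CSV: raw numbers, or human-readable units."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["layer", "params", "flops"])
    writer.writeheader()
    for row in rows:
        if raw:  # raw data mode: plain numbers, ready for pandas/Excel
            writer.writerow(row)
        else:    # measurement units mode: readable at a glance
            writer.writerow({**row, "flops": f"{row['flops'] / 1e3:.1f} KFLOPs"})
    return buf.getvalue()

print(to_csv(rows, raw=False))
```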

ℬ. πΌπ“ƒπ“ˆπ“‰π’Άπ“π“π’Άπ“‰π’Ύπ‘œπ“ƒ

π™²πš˜πš–πš™πšŠπšπš’πš‹πš’πš•πš’πšπš’
  • OS: windows / linux / macOS
  • Python: >= 3.8
  • PyTorch: >= 1.7.0
πšƒπš‘πš›πš˜πšžπšπš‘ π™Ώπš’πšπš‘πš˜πš— π™ΏπšŠπšŒπš”πšŠπšπšŽ π™ΌπšŠπš—πšŠπšπšŽπš›

The most convenient way; suitable for installing the latest released stable version.

# pip series
pip/pipx/pipenv install torchmeter

# Or via conda
conda install torchmeter

# Or via uv
uv add torchmeter

# Or via poetry
poetry add torchmeter

# For other package managers, refer to their own documentation
πšƒπš‘πš›πš˜πšžπšπš‘ π™±πš’πš—πšŠπš›πš’ π™³πš’πšœπšπš›πš’πš‹πšžπšπš’πš˜πš—

Suitable for installing previously released versions.

  1. Download the .whl file from PyPI or GitHub Releases.

  2. Install locally:

    pip install torchmeter-x.x.x.whl # (1)
    
    1. πŸ™‹β€β™‚οΈ Replace x.x.x with actual version
πšƒπš‘πš›πš˜πšžπšπš‘ πš‚πš˜πšžπš›πšŒπšŽ π™²πš˜πšπšŽ

Suitable for those who want to try out upcoming features (which may have unknown bugs).

git clone https://github.com/TorchMeter/torchmeter.git
cd torchmeter

# If you want to install the released stable version, use this: 
git checkout vx.x.x # Stable (1)

# If you want to try the latest development version(alpha/beta), use this:
git checkout master  # Development version

pip install .
  1. πŸ™‹β€β™‚οΈ Don't forget to eplace x.x.x with actual version. You can check all available versions with git tag -l

π’ž. 𝒒𝑒𝓉𝓉𝒾𝓃𝑔 π“ˆπ“‰π’Άπ“‡π“‰π‘’π’Ή

π™³πšŽπš•πšŽπšπšŠπšπšŽ πš’πš˜πšžπš› πš–πš˜πšπšŽπš• 𝚝𝚘 πšπš˜πš›πšŒπš‘πš–πšŽπšπšŽπš›
Implementation of ExampleNet
Python
import torch.nn as nn

class ExampleNet(nn.Module):
    def __init__(self):
        super(ExampleNet, self).__init__()

        self.backbone = nn.Sequential(
            self._nested_repeat_block(2),
            self._nested_repeat_block(2)
        )

        self.gap = nn.AdaptiveAvgPool2d(1)

        self.classifier = nn.Linear(3, 2)

    def _inner_net(self):
        return nn.Sequential(
            nn.Conv2d(10, 10, 1),
            nn.BatchNorm2d(10),
            nn.ReLU(),
        )

    def _nested_repeat_block(self, repeat:int=1):
        inners = [self._inner_net() for _ in range(repeat)]
        return nn.Sequential(
            nn.Conv2d(3, 10, 3, stride=1, padding=1),
            nn.BatchNorm2d(10),
            nn.ReLU(),
            *inners,
            nn.Conv2d(10, 3, 1),
            nn.BatchNorm2d(3),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.backbone(x)
        x = self.gap(x)
        x = x.squeeze(3).squeeze(2)  # tuple `dim` for squeeze needs PyTorch >= 2.0
        return self.classifier(x)
Python
import torch.nn as nn
from torchmeter import Meter
from torch.cuda import is_available as is_cuda

# 1️⃣ Prepare your PyTorch model; here is a simple example
underlying_model = ExampleNet() # (1)

# Set an extra attribute to the model to show 
# how torchmeter acts as a zero-intrusion proxy later
underlying_model.example_attr = "ABC"

# 2️⃣ Wrap your model with torchmeter
model = Meter(underlying_model)

# 3️⃣ Validate the zero-intrusion proxy

# Get the model's attribute
print(model.example_attr)

# Get the model's method
# `_inner_net` is a method defined in ExampleNet
print(hasattr(model, "_inner_net")) 

# Move the model to another device (currently on cpu)
print(model)
if is_cuda():
    model.to("cuda")
    print(model) # now on cuda
  1. πŸ™‹β€β™‚οΈ see above for implementation of ExampleNet
π™ΆπšŽπš πš’πš—πšœπš’πšπš‘πšπšœ πš’πš—πšπš˜ πšπš‘πšŽ πš–πš˜πšπšŽπš• πšœπšπš›πšžπšŒπšπšžπš›πšŽ
Python
from rich import print

print(model.structure)
πš€πšžπšŠπš—πšπš’πšπš’ πš–πš˜πšπšŽπš• πš™πšŽπš›πšπš˜πš›πš–πšŠπš—πšŒπšŽ πšπš›πš˜πš– πšŸπšŠπš›πš’πš˜πšžπšœ πšπš’πš–πšŽπš—πšœπš’πš˜πš—πšœ
Python
# Parameter Analysis
# Suppose that the `backbone` part of ExampleNet is frozen
_ = model.backbone.requires_grad_(False)
print(model.param)
tb, data = model.profile('param', no_tree=True)

# Before measuring computation, you should first execute a forward pass
import torch
input = torch.randn(1, 3, 32, 32)
output = model(input) # (1)

# Computational Profiling
print(model.cal) # (2)
tb, data = model.profile('cal', no_tree=True)

# Memory Diagnostics
print(model.mem) # (3)
tb, data = model.profile('mem', no_tree=True)

# Performance Benchmarking
print(model.ittp) # (4)
tb, data = model.profile('ittp', no_tree=True)

# Overall Analytics
print(model.overview())
  1. πŸ™‹β€β™‚οΈ you do not need to concern about the device mismatch, just feed the model with the input.
  2. πŸ™‹β€β™‚οΈ cal for calculation
  3. πŸ™‹β€β™‚οΈ mem for memory
  4. πŸ™‹β€β™‚οΈ ittp for inference time & throughput
π™΄πš‘πš™πš˜πš›πš πš›πšŽπšœπšžπš•πšπšœ πšπš˜πš› πšπšžπš›πšπš‘πšŽπš› πšŠπš—πšŠπš•πš’πšœπš’πšœ
Python
# export to csv
tb, data = model.profile('param', show=False, save_to="params.csv")

# export to excel
tb, data = model.profile('cal', show=False, save_to="../calculation.xlsx")
π™°πšπšŸπšŠπš—πšŒπšŽπš 𝚞𝚜𝚊𝚐𝚎
  1. Attribute/method access of the underlying model
  2. Automatic device synchronization
  3. Smart module folding
  4. Performance gallery
  5. Customized visualization
  6. Best practice of programmable tabular report
  7. Instant export and postponed export
  8. Centralized configuration management
  9. Submodule exploration

π’Ÿ. π’žπ‘œπ“ƒπ“‰π“‡π’Ύπ’·π“Šπ“‰π‘’

Thank you for wanting to make TorchMeter even better!

There are several ways to make a contribution.

Before jumping in, let's ensure smooth collaboration by reviewing our 📋 contribution guidelines first.

Thanks again!

β„°. π’žπ‘œπ’Ήπ‘’ π‘œπ’» π’žπ‘œπ“ƒπ’Ήπ“Šπ’Έπ“‰

Refer to the official code-of-conduct file for more details.

  • TorchMeter is an open-source project built by developers worldwide. We're committed to fostering a friendly, safe, and inclusive environment for all participants.

  • This code applies to all community spaces including but not limited to GitHub repositories, community forums, etc.

β„±. πΏπ’Ύπ’Έπ‘’π“ƒπ“ˆπ‘’

  • TorchMeter is released under the AGPL-3.0 License, see the LICENSE file for the full text.
  • Please carefully review the terms in the LICENSE file before using or distributing TorchMeter.
  • Ensure compliance with the licensing conditions, especially when integrating this project into larger systems or proprietary software.