VTune Call Graph Utilization Report

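This article describes how to use the VTune Performance Analyzer to collect program flow information, including function call counts and durations, and uses an example to show how to read a call graph report and use it to optimize application performance.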

 


The call graph collector of the VTune(TM) Performance Analyzer collects information about the program flow of an application: the number of calls each function makes to other functions, and the amount of time each function spends executing its own code and calling other functions.

1.       Pre-Condition

The application to be optimized must be built with the linker option /FIXED:NO; otherwise call graph profiling cannot be used.

2.       Instrumentation for Call Graph Profiling

                Instrumentation is the process of modifying a program so that dynamic information is recorded during program execution. Data collection routines invoked at specific points in the execution of the target program record run-time information. These routines provide information about time spent in each function, and the call sequence that leads to a specific function. By default, the VTune Performance Analyzer instruments all application functions and system-level exports.

Note: By default VTune instruments so many dynamic libraries that the application may fail to start, so the Activity configuration should be modified. In the Configure Call Graph window, click the Advanced button and set the instrumentation level of System DLL and User EXE to Minimal.

Instrumentation does not change the functionality of the program, but it slows execution at run time. The VTune analyzer keeps track of the entry and exit points, records the number of times each function is called, establishes the relationship between the caller (parent) and callee (child) functions, and stores this data.

 

3.       Process of Call Graph Data Collection

            When you select an Activity with the call graph collector in the Tuning Browser and click Run Activity to begin performance data collection, the VTune Performance Analyzer performs the following steps:

1)      Instruments the application and/or modules of interest defined during Activity creation.

2)      Launches and profiles the instrumented application and/or modules of interest until the application terminates, or until you stop running the application. It keeps track of the exit and entry points, records the number of times each function was called, establishes a relationship between the caller (parent) and callee (child) function, and stores this data.

3)      Analyzes the profile data, generates a new Activity result, which is stored in the Tuning Browser, and displays the call graph data in tabular and graph view.

4.      Viewing Call Graph Data

After collecting call graph data with the VTune Performance Analyzer, you can view the call graph profiling information in the following views:

 

Graph: provides a graphical presentation of the application's execution.

Call list: provides full information on the selected function, its callers (parents), and its callees (children) in table format.

Function summary: provides full information on all instrumented functions of the application in table format.

The upper section of the call graph window displays the function information in table format. Rows in the function summary are shown with different background colors according to their position in the hierarchy. The default view groups the data by the following four levels:

Module

    Thread

        Class(optional)

            Function
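
The hierarchy above explains the subtotal rows that appear in the tables of section 5. The following is a minimal sketch of how Self Time rolls up from functions to threads to the module, using the Self Time values of Table 3. It assumes the unlabeled subtotal rows correspond to threads; the labels `thread-1` and `thread-2` are hypothetical, since the report does not name the threads.

```python
from collections import defaultdict

# (module, thread, function, self_time_us)
# Thread labels are hypothetical; Self Time values are taken from Table 3.
rows = [
    ("3gpp_r99_sccp_DLL.dll", "thread-1", "pa_DLLGetPtr", 1),
    ("3gpp_r99_sccp_DLL.dll", "thread-1", "pa_DLLGetTable", 0),
    ("3gpp_r99_sccp_DLL.dll", "thread-1", "pal_InitKFE", 0),
    ("3gpp_r99_sccp_DLL.dll", "thread-2", "pal_PreDecodeKFE", 16823),
    ("3gpp_r99_sccp_DLL.dll", "thread-2", "pal_PreDecodeTO", 5812),
]


def subtotals(rows):
    """Sum Self Time per module and per (module, thread) pair."""
    by_module = defaultdict(int)
    by_thread = defaultdict(int)
    for module, thread, _function, self_us in rows:
        by_module[module] += self_us
        by_thread[(module, thread)] += self_us
    return dict(by_module), dict(by_thread)


module_totals, thread_totals = subtotals(rows)
print(module_totals)
print(thread_totals)
```

Under this assumption the two thread subtotals (1 and 22635) add up to the module total (22636), which matches the rows of Table 3.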

The function summary view provides several columns. The most important for our test are Self Time (microseconds), Total Time (microseconds), and % in function.

Self Time: Time (microseconds) spent in the function itself.

Total Time: Time (microseconds) spent in the function and in all the callees it called.

% in function: Ratio displaying how much time was spent in the function itself. You can calculate the ratio using the following formula:

% in function = (Self Time/Total Time)* 100
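
As a worked example, the formula can be applied to the pal_PreDecodeKFE row of Table 1 (Self Time 17869, Total Time 78826). The helper name `percent_in_function` is an illustrative choice, and returning 0.0 for a zero Total Time is an assumption made here to handle the all-zero rows.

```python
def percent_in_function(self_time_us, total_time_us):
    """Return (Self Time / Total Time) * 100; 0.0 when Total Time is 0 (assumed)."""
    if total_time_us == 0:
        return 0.0
    return self_time_us / total_time_us * 100


# pal_PreDecodeKFE, Table 1
kfe = percent_in_function(17869, 78826)
print(round(kfe, 2))        # percentage form
print(round(kfe / 100, 2))  # plain ratio form
```

The tables in section 5 list 0.23 for this row, which matches the plain ratio Self Time / Total Time rather than the value multiplied by 100, so the column there appears to be reported as a fraction.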

 

 


Even in the same environment, these three values may vary slightly between runs, so we can use the average of several measurements.
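
A minimal sketch of this averaging, using the Self Time values of the two unoptimized runs reported in Tables 1 and 2. The dictionary layout and the spread measure (difference between the largest and smallest run, relative to the mean) are illustrative choices, not part of VTune.

```python
from statistics import mean

# Self Time (microseconds) from the two unoptimized runs (Tables 1 and 2).
runs = {
    "pal_PreDecodeKFE": [17869, 17837],
    "pal_PreDecodeTO": [11191, 11094],
}

averages = {name: mean(values) for name, values in runs.items()}

# Run-to-run spread as a percentage of the mean.
spread_pct = {
    name: (max(values) - min(values)) / mean(values) * 100
    for name, values in runs.items()
}

for name in runs:
    print(f"{name}: mean={averages[name]} us, spread={spread_pct[name]:.2f}%")
```

With only two runs the spread is a rough indication of measurement noise; a difference between libraries is meaningful only if it is clearly larger than this spread.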

Note: the % in function column is displayed with limited precision. If more precision is needed, compute the ratio by hand from Self Time and Total Time.

In theory, when one of the protocol decode libraries is replaced, an increase or decrease in Self Time and % in function can be taken as a change in that library's performance.

 

 

5.      Examples

The key data from the call graph are shown below. The first two tables come from runs in the same environment with the unoptimized library; the third uses the optimized SCCP library.

Table 1. Unoptimized SCCP, run 1

| Module | Function | Self Time | Total Time | % in function |
| --- | --- | --- | --- | --- |
| 3gpp_r99_sccp_DLL.dll – Total | | 29060 | | |
| 3gpp_r99_sccp_DLL.dll | | 0 | | |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetPtr | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetTable | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pal_Initialize | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pal_Terminate | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | | 29060 | | |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeKFE | 17869 | 78826 | 0.23 |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeTO | 11191 | 32389 | 0.35 |

 

Table 2. Unoptimized SCCP, run 2

| Module | Function | Self Time | Total Time | % in function |
| --- | --- | --- | --- | --- |
| 3gpp_r99_sccp_DLL.dll – Total | | 28931 | | |
| 3gpp_r99_sccp_DLL.dll | | 0 | | |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetPtr | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetTable | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pal_Initialize | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pal_Terminate | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | | 28931 | | |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeKFE | 17837 | 78844 | 0.23 |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeTO | 11094 | 32234 | 0.34 |


Table 3. Optimized SCCP

| Module | Function | Self Time | Total Time | % in function |
| --- | --- | --- | --- | --- |
| 3gpp_r99_sccp_DLL.dll - Total | | 22636 | | |
| 3gpp_r99_sccp_DLL.dll | | 1 | | |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetPtr | 1 | 1 | 1 |
| 3gpp_r99_sccp_DLL.dll | pa_DLLGetTable | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | pal_InitKFE | 0 | 0 | 0 |
| 3gpp_r99_sccp_DLL.dll | | 22635 | | |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeKFE | 16823 | 77956 | 0.22 |
| 3gpp_r99_sccp_DLL.dll | pal_PreDecodeTO | 5812 | 40065 | 0.15 |
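
The three tables can be compared with a short calculation. The sketch below averages the Self Time of the two unoptimized runs as a baseline and reports the relative change for the optimized library. The function name `percent_change` and the data layout are illustrative; all numbers are copied from Tables 1 to 3.

```python
from statistics import mean

# Self Time in microseconds: (unoptimized run 1, unoptimized run 2, optimized).
self_time_us = {
    "module total": (29060, 28931, 22636),
    "pal_PreDecodeKFE": (17869, 17837, 16823),
    "pal_PreDecodeTO": (11191, 11094, 5812),
}


def percent_change(baseline, new):
    """Relative change from baseline to new, in percent (negative = faster)."""
    return (new - baseline) / baseline * 100


changes = {}
for name, (run1, run2, optimized) in self_time_us.items():
    baseline = mean([run1, run2])
    changes[name] = percent_change(baseline, optimized)
    print(f"{name}: baseline={baseline} us, optimized={optimized} us, "
          f"change={changes[name]:.2f}%")
```

On these figures the module's Self Time drops by roughly 22%, most of it in pal_PreDecodeTO (about 48%), while pal_PreDecodeKFE changes by about 6%. Total Time for pal_PreDecodeTO is higher in Table 3 (40065 versus 32389 and 32234) even though its Self Time falls, so the two columns should be read separately.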

 
