Thursday, August 17, 2017

Understanding the basics behind a profiling report - Part 2: distribution of the execution time

Hi! This article directly follows this one and explains what is going on in the second part of the profiling report.

First, some things to remember:
1) When profiling the same code several times, you will not get exactly the same results or the exact same percentages. Also, depending on what is running in parallel in your image, some extra results may show up in the report (it is quite unusual, but it can happen).
2) If you want to profile code that runs for only a short amount of time, you'll need to execute it more than once to get a significant number of samples (remember the profiler samples at ~1.3 kHz).
3) If you want to use the profiler to understand why your program is slow and how you could improve its code, I suggest you first use MessageTally (in Pharo, right-click and "Profile it"; in Squeak, right-click and "spy on it"), as in the sketch below. You can read this chapter about profiling in the Deep Into Pharo manual, and this post.
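
For instance, here is a minimal MessageTally sketch you can evaluate in a playground or workspace (the profiled expression is just the benchmark used later in this article; MessageTally spyOn: is the standard entry point in both Pharo and Squeak):

"Profile a block at the image level, before reaching for the VM profiler"
MessageTally spyOn: [
    100000000 timesRepeat: [ Object new ] ]

MessageTally tells you in which Smalltalk methods the time is spent, whereas the report discussed here tells you where the VM itself spends its time.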

Part 2 - Distribution of the execution time



So, let's have a look at this part of the report:

% of generated vm code (% of total) (samples) (cumulative)
25.55%    (24.04%) Behavior>>new (449) (25.55%)
24.36%    (22.91%) UndefinedObject>>DoIt    (428) (49.91%)
22.25%    (20.93%) Behavior>>basicNew (391) (72.17%)
19.07%    (17.93%) PIC basicNew (335) (91.24%)
  5.98%   (  5.62%) PIC initialize (105) (97.21%)
  2.79%   (  2.62%) ProtoObject>>initialize (49) (100.0%)

% of vanilla vm code (% of total) (samples) (cumulative)
34.57%    (  1.50%) scavengeReferentsOf (28) (34.57%)
23.46%    (  1.02%) doScavenge (19) (58.02%)
22.22%    (  0.96%) copyAndForward (18) (80.25%)
  9.88%   (  0.43%) shouldRemapObj (8) (90.12%)
  3.70%   (  0.16%) mapStackPages (3) (93.83%)
  2.47%   (  0.11%) scavengingGCTenuringIf (2) (96.30%)
  1.23%   (  0.05%) addToFreeListbytes (1) (97.53%)
  1.23%   (  0.05%) mapInterpreterOops (1) (98.77%)
  1.23%   (  0.05%) ioUTCMicroseconds (1) (100.0%)

30 samples in the rest    1.61% of total
% of rest (% of total) (samples) (cumulative)
100.0%    (  1.61%) Samples Not In Any Function (30) (100.0%)

It displays how the execution time is divided between two types of code: the generated vm code and the vanilla vm code. In both parts, the results are presented the same way: you get a list of functions and the share of the execution time spent in each of them.



1. Generated vm code 


First, the generated vm code part.
It deals with the code generated by the JIT compiler of the VM, that is to say all the functions, trampolines/enilopmarts and PICs present in the machine code zone (remember the "Code size" displayed at the very beginning of the report? It refers to the size of this machine code zone).
You can tell the different types of entries apart as follows:

  • Class>>selector refers to a native code function, i.e. a method that has been optimized by the JIT and compiled to native code (like "Behavior>>new")
  • PIC selector refers to a PIC, which can be either an open PIC or a closed PIC (like "PIC basicNew"); for more information about PICs and what they are, you can refer to this, this. This short article is very clear as well. A small sketch of how a PIC arises follows this list.
  • ce... refers to trampolines/enilopmarts (not shown in this example, but "ceTraceStoreTrampoline" is a typical one)
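
To make the PIC entries more concrete, here is a small sketch of a send site that becomes polymorphic (the literal array and the printString send are just an illustration, not taken from the profiled code): the same send site successively sees a SmallInteger, a String, a Character and a Float, so the JIT replaces its inline cache with a closed PIC, and falls back to an open PIC if too many different receiver classes show up.

"One send site, several receiver classes: a PIC is born"
#(1 'a string' $a 3.14) do: [ :each | each printString ]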

The profiling of [100000000 timesRepeat: [Object new]] gave this result:

% of generated vm code (% of total) (samples) (cumulative)
25.55%    (24.04%) Behavior>>new (449) (25.55%)
24.36%    (22.91%) UndefinedObject>>DoIt    (428) (49.91%)
22.25%    (20.93%) Behavior>>basicNew (391) (72.17%)
19.07%    (17.93%) PIC basicNew (335) (91.24%)
  5.98%   (  5.62%) PIC initialize (105) (97.21%)
  2.79%   (  2.62%) ProtoObject>>initialize (49) (100.0%)


Looking at these results, we can say that 25.55% of the time spent in generated vm code was spent in the Behavior>>new function, corresponding to 24.04% of the total execution time (generated and vanilla vm code taken together). During profiling, 449 samples fell in Behavior>>new, and since it is the first row, its cumulative value in generated vm code is also 25.55%.
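
As a quick sanity check on how the columns relate, the percentages can be recomputed from the sample counts of the report (the sums below just add up the samples column of each table):

"449 samples out of 1757 generated-code samples (449+428+391+335+105+49),
 and out of 1868 samples overall (1757 generated + 81 vanilla + 30 in the rest)"
449 / 1757 * 100.0.   "~ 25.55, the first column"
449 / 1868 * 100.0.   "~ 24.04, the second column"
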
We can roughly trace what happened in the VM when Object new was executed 100000000 times.
Indeed, Object new ends up running Behavior>>new, and if you browse Behavior>>new, you'll find:

Behavior>>new

^ self basicNew initialize

That's the reason why you come across Behavior>>new, Behavior>>basicNew and ProtoObject>>initialize. As basicNew and initialize are sent very frequently, their send sites soon get PICs (hence the "PIC basicNew" and "PIC initialize" entries). The profiler is meant to tell whether a PIC is open or closed, but that will be the topic of another post.
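
Conceptually, each iteration of the benchmark therefore boils down to something like the following sketch (Behavior>>new inlined by hand; the comments map each send to an entry of the report above):

| obj |
obj := Object basicNew.    "allocation: Behavior>>basicNew, with a PIC basicNew at the send site"
obj initialize.            "ProtoObject>>initialize, with a PIC initialize at the send site"
obj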

2. Vanilla vm code 

So much for the generated vm code part; let's now have a look at the vanilla vm code part:

% of vanilla vm code (% of total) (samples) (cumulative)
34.57%    (  1.50%) scavengeReferentsOf (28) (34.57%)
23.46%    (  1.02%) doScavenge (19) (58.02%)
22.22%    (  0.96%) copyAndForward (18) (80.25%)
  9.88%   (  0.43%) shouldRemapObj (8) (90.12%)
  3.70%   (  0.16%) mapStackPages (3) (93.83%)
  2.47%   (  0.11%) scavengingGCTenuringIf (2) (96.30%)
  1.23%   (  0.05%) addToFreeListbytes (1) (97.53%)
  1.23%   (  0.05%) mapInterpreterOops (1) (98.77%)
  1.23%   (  0.05%) ioUTCMicroseconds (1) (100.0%)

I am definitely not as proficient here as with the generated vm code part, but as usual, when you want to know what these functions do, the best place to look is directly in the VMMaker package!
Usually, at least a few of these functions relate to garbage collection. Here, this is indeed the case for the first four. So according to these results, 90.12% of the time spent in vanilla vm code was spent scavenging (see the 90.12% in the cumulative column). I copied the comments of these functions below.

  • scavengeReferentsOf: inspects all the pointers in referrer.  If any are new objects, it has them moved to FutureSurvivorSpace, and answers truth. If there are no new referents, it answers falsity
  • doScavenge: the inner shell for scavenge, abstracted out so globalGarbageCollect can use it.
  • copyAndForward: copies a survivor object either to futureSurvivorSpace or, if it is to be promoted, to oldSpace.
  • shouldRemapObj: Answer if the obj should be scavenged (or simply followed)
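
If you want to see these functions at work, you can trigger the collectors by hand from the image (both messages exist in Pharo and Squeak; on current Spur VMs garbageCollectMost runs the scavenger):

Smalltalk garbageCollectMost.   "scavenge: collect new space only"
Smalltalk garbageCollect.       "full garbage collection"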

If you want more global information about memory management, you can have a look directly at the third part of the report, in which you will find the total number of full GCs, scavenges, tenures performed...
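
In the meantime, you can already peek at similar counters from inside the image. Here is a hedged sketch in Pharo syntax (in Squeak, use Smalltalk vmParameterAt: instead of Smalltalk vm parameterAt:); the parameter indices below follow the usual Cog/Spur numbering, but they are an assumption on my side, so double-check them on your VM:

"7: full GCs since startup, 9: scavenges/incremental GCs, 11: tenures
 (assumed indices, verify them on your VM)"
{ 'full GCs'  -> (Smalltalk vm parameterAt: 7).
  'scavenges' -> (Smalltalk vm parameterAt: 9).
  'tenures'   -> (Smalltalk vm parameterAt: 11) }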

3. Samples in the rest 

The last samples, which could not be linked to any known symbol, are gathered in this part. Usually, this represents an insignificant number of samples.

30 samples in the rest    1.61% of total
% of rest (% of total) (samples) (cumulative)
100.0%    (  1.61%) Samples Not In Any Function (30) (100.0%)


The next article will be about the third part of the report, which gathers data about memory usage, among other things. It will also provide some insight into how to tune the VM according to the results.
Thanks for reading!
