Information on Solaris’ DTrace Technology

Submitted by Bryan Cantrill 2003-11-18 Solaris 22 Comments

Now that Solaris Express 11/03 is publicly available for download, Bryan Cantrill (Solaris Kernel Development) from the DTrace engineering team wanted to take a moment to post some slightly more elaborate examples of this new technology.The following was posted on the Solaris newsgroup:

Before we get started, note that there is quite a bit of publicly available documentation on DTrace at BigAdmin:
http://www.sun.com/bigadmin/content/dtrace

Also, note that this post isn’t meant to be a comprehensive tutorial — it’s really more of a quick demonstration of how DTrace might be used to solve problems that you may come across in your day-to-day workings with Solaris.

On with the show: let’s say you’re looking at mpstat(1) output on your multiuser server. You might see something like this:



CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl

 12    1  27 3504   338  206  765   27  114   65    1   337    9  19  41  31

 13    1  19 5725    98   68  723   22  108  120    0   692    3  17  20  61

 14    0  57 3873   224  192  670   22   75   86    1   805    7  10  35  48

 15   36   7 1551    42   28  689    6   68   59    0   132    2   9  35  54

 16   14   7 7209   504  457 1031   37  125  244    0   459    6  30   4  60

 17    5   5 4960   150  108  817   37   98  154    0   375    6  26   6  62

 18    5   6 6085  1687 1661  741   60   76  248    0   434    3  33   0  64

 19    0  15 10037    72   41  876   23  100  291    1   454    2  19   9  71

 20   12   5 5890   746  711  992   32  122  216    2   960   10  33   4  53

 21   60   5 1567   729  713  467   15   80   59    0   376    2  35  10  53

 22    0   6 4378   315  291  751   17   84  142    1   312    3  16   1  80

 23    0   6 12119    33    3  874   20   82  384    1   513    4  24  11  62

And well you may wonder (as perhaps you often have) — what the hell is
causing all of those cross calls, anyway? (Cross calls appear in the “xcal”
column; see mpstat(1).)

Using DTrace, investigating this is a snap:

# dtrace -n xcalls'{@[execname] = count()}’
dtrace: description ‘xcalls’ matched 4 probes
[ letting this run for a few seconds ] ^C

mozilla-bin 1
lockd 1
in.mpathd 2
nsrmmd 5
grep 6
chmod 6
cat 6
nwadmin 13
ls 24
in.tftpd 28
nsrindexd 34
fsflush 38
cut 42
find 42
mkdir 66
rm 76
ipop3d 78
scp 79
inetd 96
dtrace 111
nawk 118
imapd-simmonmt 126
rmdir 132
sshd 138
rpc.rstatd 159
mv 398
save 1292
gzip 1315
get_all2 1678
sched 1712
nfsd 3709
tar 27054

So the bulk of these are coming from tar(1) processes. To explore the cross calls coming from tar specifically, we could add a predicate to the above:



# dtrace -n xcalls’/execname == “tar”/{@[pid] = count()}’

dtrace: description ‘xcalls’ matched 4 probes

[ again letting this run for a few seconds ]
^C
   938339              507

   938415              784

   938229             2438

   938390             3282

   938363             6464

   937067            13984

But now we notice a strange thing: none of these processes seems to bearound by the time we’re back on the shell, e.g.:

# pflags 937067
pflags: cannot examine 937067: no such process or core file

So is someone launching a bunch of short-lived tar processes? To explore this as a rough first cut, we could just print the pid whenever a tar
process is launched:



# dtrace -n syscall::exec\*:return’/execname == “tar”/{trace(pid)}’

CPU     ID                    FUNCTION:NAME

 18    112                     exece:return    954538

 21    112                     exece:return    954519

 17    112                     exece:return    954584

 21    112                     exece:return    954579

 21    112                     exece:return    954607

 22    112                     exece:return    954632

 21    112                     exece:return    954658

 20    112                     exece:return    954684

 20    112                     exece:return    954705

 15    112                     exece:return    954727

 14    112                     exece:return    954763

 16    112                     exece:return    954775

 12    112                     exece:return    954798

 18    112                     exece:return    954818

 22    112                     exece:return    954873

 22    112                     exece:return    954889

 23    112                     exece:return    954893

 14    112                     exece:return    954916

 14    112                     exece:return    954971

…

Seeing that this is happening with some frequency (the above has output every second or so), we want to explore more generally what is being exec(2)’d on the machine:

# dtrace -n syscall::exec\*:entry'{@[copyinstr(arg0)] = count()}’
[ letting this run for a bit ] ^C

/usr/lib/mail.local 1
/usr/bin/true 1
/usr/bin/pgrep 1
/usr/bin/wc 1
/usr/bin/touch 1
/usr/bin/sleep 1
/usr/sbin/modinfo 4
/bin/awk 4
/bin/sh 4
/usr/bin/chmod 14
/usr/bin/cat 14
/usr/bin/tar 14
./do1 14
/usr/bin/cut 15
/usr/bin/grep 15
/usr/bin/find 15
/usr/bin/ls 15
/usr/bin/mv 15
/usr/bin/rm 16
/usr/bin/mkdir 17
/usr/bin/gzip 29
/usr/bin/nawk 44
/usr/bin/rmdir 48
/home/ellenp/rmdir 48
/home1/netscape/rmdir 48
/usr/dt/bin/rmdir 96
/usr/openwin/bin/rmdir 96

There’s a bunch that’s interesting here — but for now we’re interested in those tar processes. In particular, when someone is exec’ing a tar
process, we want to know who exactly they are. To do this, we write the following DTrace script, and call it “whotar.d”:



syscall::exec*:entry

/copyinstr(arg0) == “/usr/bin/tar”/

{

@[execname] = count();

}

Running this points us straight to the culprit:

# dtrace -s whotar.d
[ again letting this run for a bit ] ^C

get_all2 22

So get_all2 is the problem. We can now use traditional tools to hone in
on this:

# ptree `pgrep get_all2`
100578 /usr/sbin/inetd -s
900870 in.rlogind
900872 -csh
901382 /bin/sh get_all3
901383 /bin/sh ./get_all2 80/80acc080 90/00fccc90
959045 /bin/sh ./get_all2 80/80acc080 90/00fccc90
959046 tar xvf – explorer.80c23d8e.ptgw02is-2003.10.27.09.35/README
959047 gzip -dc /net/pamer.central/proactive/rawdata/8e/80c23d8e
901384 /bin/sh ./get_all2 90/80bdbb90 a0/1274eda0
959565 /bin/sh ./get_all2 90/80bdbb90 a0/1274eda0
959566 tar xvf – explorer.8089ed9d.curly-2002.05.06.10.00/README exp
959567 gzip -dc /net/pamer.central/proactive/rawdata/9d/8089ed9d
901385 /bin/sh ./get_all2 a0/80aabda0 b0/1d6c05b0
959619 mv disks/diskinfo disks/format.out disks/mount.out etc/release
901386 /bin/sh ./get_all2 b0/80b264b0 c0/335419c0
959612 /bin/sh ./get_all2 b0/80b264b0 c0/335419c0
959613 tar xvf – explorer.82a86abd.seq22-2002.05.31.21.49/README exp
959614 gzip -dc /net/pamer.central/proactive/rawdata/bd/82a86abd

So that explains where the tar processes (and thus the cross calls) are coming
from! And just how short-lived _are_ these tar processes? We can answer this
question with DTrace, natch. To do this, we’ll use an associative array and
the “timestamp” variable. We are going to linearly quantize the result from 0
seconds to 60 seconds. We’ll also use a “tick-1sec” probe to automatically
stop after a minute. The script (which we call “longtar.d”):

syscall::exec*:return
/execname == “tar”/
{
start[pid] = timestamp;
}

fbt::exit:entry
/start[pid]/
{
@[“tar time, in seconds”] =
lquantize((timestamp – start[pid]) / 1000000000, 0, 60);
start[pid] = 0;
}

tick-1sec
/n++ >= 60/
{
exit(0);
}

Running this:

# dtrace -s longtar.d
dtrace: script ‘longtar.d’ matched 4 probes
CPU ID FUNCTION:NAME
23 38110 :tick-1sec

tar time, in seconds
value ————- Distribution ————- count
< 0 | 0 0 |@ 1 1 |@@@@@ 4 2 |@@@@@@@@@@@ 8 3 |@@@@@@@ 5 4 |@@@@@@@ 5 5 |@@ 2 6 | 0 7 |@@@@ 3 8 | 0

This highlights just how difficult this problem would be to find using
traditional (process-centric) tools: each tar process is only alive
for (roughly) two seconds! And yet, taken together, the tar processes
are accounting for the vast majority of cross calls on the system.

This is a good example of using DTrace because the problem here is _not_ a
single application — it’s an insurgent army of tar processes that are
slipping away before they can be noticed. Indeed, running prstat(1) on this
machine doesn’t show tar — or _any_ of the get_all2 progeny processes — as
top resource-consuming processes. Using DTrace, however, one can quickly
root-cause aberrant systemic behavior of this nature.

Here you can download Solaris Express 11/03 and try DTrace yourself!

Enjoy,
Bryan Cantrill, Mike Shapiro and Adam Leventhal
DTrace Engineering Team

About The Author

Eugenia Loli

Ex-programmer, ex-editor in chief at OSNews.com, now a visual artist/filmmaker.

Follow me on Twitter @EugeniaLoli

22 Comments

2003-11-18 2:52 am
Anonymous
I think I am not knowlegeable enough to understand what’s going on with this piece of news.
2003-11-18 3:20 am
Anonymous
My best guess is that it describes a way of determining what’s eating up system resources, after the fact.
2003-11-18 3:20 am
Anonymous
Neither am I. We can only hope that someone will come along and enlighten us.
2003-11-18 3:39 am
Anonymous
DTrace allows administrators/users to dynamically add a number of tracing probes that can monitor a number of various facilities in a process agnostic manner, and assign actions for when these actions “fire”. The number of uses for such a facility are unimaginable, everything from monitoring which users might be causing a particular resource contention to diagnosing software bugs or bad hardware.
2003-11-18 4:26 am
Anonymous
Is there a Linux equivalent of this command?
2003-11-18 6:07 am
Anonymous
The example might make a little more sense when viewed in a fixed-width font. (It’s actually not as cryptic as it may first appear.) This example may be a little hard to appreciate if you’ve never had a box with performance problems, but suffice it to say that DTrace allows one to quickly explore aberrant system behavior. It’s been of great use internally at Sun, so we’re excited to finally get it into customers’ hands…
2003-11-18 6:15 am
Anonymous
Here it is from google; the formatting is as intendend,
and makes it a lot easier to read.
http://groups.google.com/groups?selm=bp4fa1%24civ%241%4…
2003-11-18 6:30 am
Anonymous
IE, Safari and Opera render the article just fine with the <PRE> tag. It is NS4 and Mozilla that don’t.
2003-11-18 6:36 am
Anonymous
FYI: Safari renders it fine as well.
2003-11-18 8:36 am
Anonymous
Memory and disk activity monitoring in Solaris has always been fantastic. We recently discovered a problem with the Oracle shared memory configuration on one of our boxen, thanks to… that command that gives periodical snapshots on memory usage, page-ins and outs etc. what’s the name…
2003-11-18 11:24 am
Anonymous
The <pre> tags look just fine in Mozilla (good ol 1.4) for me. Never had a problem.
Are people getting the pre text in a variable with font??
2003-11-18 11:52 am
Anonymous
Obviously linux has had this for quite some time. Innovation _comes_ from linux (just generally not the mainstream kernel for obvious reasons).
http://www-124.ibm.com/linux/projects/dprobes/
It doesn’t interface with shell scripts though.
2003-11-18 2:29 pm
Anonymous
Obviously linux has had this for quite some time. Innovation _comes_ from linux (just generally not the mainstream kernel for obvious reasons).
Yes, I suppose that’s why Solaris has had massive CPU scalability, O(1) scheduling and countless other Linux “innovations” for years now. Give it a rest already, there’s very little Linux has brought to the table that the commercial UNIX variants didn’t already have ages ago.
2003-11-18 2:50 pm
Anonymous
I have used dtrace for a little bit now and it is a great tool for helping quickly isolate kernel bug or performance problems. It’s also a much better replacement to trace/truss/etc for userland. If you do any form of performance or kernel development on Solaris its a must check out tool.
2003-11-18 3:30 pm
Anonymous
the tool whose name I couldn’t remember. It’s fantastic!!
2003-11-18 4:58 pm
Anonymous
Obviously linux has had this for quite some time. Innovation _comes_ from linux (just generally not the mainstream kernel for obvious reasons).
fwiw, dtrace seems to be quite a bit beyond what dprobes can do. look at the dtrace doc in comparison to the dprobes docs; there isn’t much of a comparison
2003-11-18 7:14 pm
Anonymous
vmstat is certainly useful for certain classes of problems. Part of the difficulty has been mapping back numbers seen in vmstat to activity on the system.
For example, let’s suppose you saw a number in the “re” column of vmstat that you wanted to explore. (“re” denotes page reclaims from the cache list.) Prior to DTrace, you were basically stuck resorting to “black art” techniques. But with DTrace, it can be explored succinctly, e.g.:
# dtrace -n pgrec'{@[execname] = count()}’
dtrace: description ‘pgrec’ matched 2 probes
[ waiting a bit ]
^C
java_vm 2
cat 4
chmod 4
cut 4
ls 4
rm 5
grep 5
mkdir 7
mutt 12
find 12
nawk 18
inetd 19
rmdir 24
tar 45
mv 45
get_all2 76
mail.local 305
sendmail 416
iCald.pl6+RPATH 538
sched 2096
gzip 2201
imapd 4054
nfsd 9719
(Note that the above would normally line up properly — but I don’t think OSNews allows <pre> tags in posts.) The above output indicates that a plurality of our reclaims our coming from providing NFS service (the above was run on a big NFS server), with some interesting other players in the top five. The next avenue of exploration would probably depend on the expected behavior of the machine. (And given that I was just running this on a production NFS server at Sun, I think I’m going to go back and understand why we’re running reclaim-intensive gzips…)
2003-11-18 10:10 pm
Anonymous
>The <pre> tags look just fine in Mozilla (good ol 1.4) for me. Never had a problem.
>Are people getting the pre text in a variable with font??
No, that’s not what we mean. Mozilla and NS4 also show it with fixed fonts, but the spacing they use is not right. On IE, Opera and Safari the columns are all as they suppose to be, in line, while on NS4 and Mozilla some of the lines are not rendering as they supposed to.
2003-11-19 12:41 am
Anonymous
Wow, that was sure a useful post – thanks!
(the point I was trying to make originally, is that Solaris has so many tools of this kind, that make real life much easier).
2003-11-19 2:17 am
Anonymous
Wow, that was sure a useful post – thanks!
As it turns out, posting it was useful to me, too: the gzips on that machine were due to someone running a really nasty shell script that was copying and manipulating a large amount of data. Stopping the script lightened the load significantly (this is an NFS server that serves 1,000 home directories), and the user running the script has been asked to run it only during off-hours. A good DTrace object lesson, as the processes were short-lived (in part because the script was inefficient) and did not show up in a prstat…
2003-11-19 11:10 am
Anonymous
Hey Bryan,
I’ll have to check this out! I’ve used prstat and a lot of the other p-commands to figure things out (preap is fun:). This gives a lot more info on the processses, and that’s great. So this is standard in Solaris 10? Have they decided what to name it yet? Solaris 10 / SunOS 6.0?
2003-11-19 4:10 pm
Anonymous
Yeah, we think preap is very cute.
If you use the p-tools quite a bit, you’ll find a lot to like in DTrace. DTrace is standard in the next version of Solaris; you can download it now in Solaris Express:
http://wwws.sun.com/software/solaris/solaris-express/get.html
As for what it’s called: uname returns 5.10. As far as I’m concerned, everything else is just spin and hype…