Now that Solaris Express 11/03 is publicly available for download, Bryan Cantrill (Solaris Kernel Development) from the DTrace engineering team wanted to take a moment to post some slightly more elaborate examples of this new technology.The following was posted on the Solaris newsgroup:
Before we get started, note that there is quite a bit of publicly available documentation on DTrace at BigAdmin:
http://www.sun.com/bigadmin/content/dtrace
Also, note that this post isn’t meant to be a comprehensive tutorial — it’s really more of a quick demonstration of how DTrace might be used to solve problems that you may come across in your day-to-day workings with Solaris.
On with the show: let’s say you’re looking at mpstat(1) output on your multiuser server. You might see something like this:
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
12 1 27 3504 338 206 765 27 114 65 1 337 9 19 41 31
13 1 19 5725 98 68 723 22 108 120 0 692 3 17 20 61
14 0 57 3873 224 192 670 22 75 86 1 805 7 10 35 48
15 36 7 1551 42 28 689 6 68 59 0 132 2 9 35 54
16 14 7 7209 504 457 1031 37 125 244 0 459 6 30 4 60
17 5 5 4960 150 108 817 37 98 154 0 375 6 26 6 62
18 5 6 6085 1687 1661 741 60 76 248 0 434 3 33 0 64
19 0 15 10037 72 41 876 23 100 291 1 454 2 19 9 71
20 12 5 5890 746 711 992 32 122 216 2 960 10 33 4 53
21 60 5 1567 729 713 467 15 80 59 0 376 2 35 10 53
22 0 6 4378 315 291 751 17 84 142 1 312 3 16 1 80
23 0 6 12119 33 3 874 20 82 384 1 513 4 24 11 62
And well you may wonder (as perhaps you often have) — what the hell is
causing all of those cross calls, anyway? (Cross calls appear in the “xcal”
column; see mpstat(1).)
Using DTrace, investigating this is a snap:
# dtrace -n xcalls'{@[execname] = count()}’
dtrace: description ‘xcalls’ matched 4 probes
[ letting this run for a few seconds ] ^Cmozilla-bin 1
lockd 1
in.mpathd 2
nsrmmd 5
grep 6
chmod 6
cat 6
nwadmin 13
ls 24
in.tftpd 28
nsrindexd 34
fsflush 38
cut 42
find 42
mkdir 66
rm 76
ipop3d 78
scp 79
inetd 96
dtrace 111
nawk 118
imapd-simmonmt 126
rmdir 132
sshd 138
rpc.rstatd 159
mv 398
save 1292
gzip 1315
get_all2 1678
sched 1712
nfsd 3709
tar 27054
So the bulk of these are coming from tar(1) processes. To explore the cross calls coming from tar specifically, we could add a predicate to the above:
# dtrace -n xcalls’/execname == “tar”/{@[pid] = count()}’
dtrace: description ‘xcalls’ matched 4 probes
[ again letting this run for a few seconds ] ^C938339 507
938415 784
938229 2438
938390 3282
938363 6464
937067 13984
But now we notice a strange thing: none of these processes seems to bearound by the time we’re back on the shell, e.g.:
# pflags 937067
pflags: cannot examine 937067: no such process or core file
So is someone launching a bunch of short-lived tar processes? To explore this as a rough first cut, we could just print the pid whenever a tar
process is launched:
# dtrace -n syscall::exec\*:return’/execname == “tar”/{trace(pid)}’
CPU ID FUNCTION:NAME
18 112 exece:return 954538
21 112 exece:return 954519
17 112 exece:return 954584
21 112 exece:return 954579
21 112 exece:return 954607
22 112 exece:return 954632
21 112 exece:return 954658
20 112 exece:return 954684
20 112 exece:return 954705
15 112 exece:return 954727
14 112 exece:return 954763
16 112 exece:return 954775
12 112 exece:return 954798
18 112 exece:return 954818
22 112 exece:return 954873
22 112 exece:return 954889
23 112 exece:return 954893
14 112 exece:return 954916
14 112 exece:return 954971
…
Seeing that this is happening with some frequency (the above has output every second or so), we want to explore more generally what is being exec(2)’d on the machine:
# dtrace -n syscall::exec\*:entry'{@[copyinstr(arg0)] = count()}’
[ letting this run for a bit ] ^C/usr/lib/mail.local 1
/usr/bin/true 1
/usr/bin/pgrep 1
/usr/bin/wc 1
/usr/bin/touch 1
/usr/bin/sleep 1
/usr/sbin/modinfo 4
/bin/awk 4
/bin/sh 4
/usr/bin/chmod 14
/usr/bin/cat 14
/usr/bin/tar 14
./do1 14
/usr/bin/cut 15
/usr/bin/grep 15
/usr/bin/find 15
/usr/bin/ls 15
/usr/bin/mv 15
/usr/bin/rm 16
/usr/bin/mkdir 17
/usr/bin/gzip 29
/usr/bin/nawk 44
/usr/bin/rmdir 48
/home/ellenp/rmdir 48
/home1/netscape/rmdir 48
/usr/dt/bin/rmdir 96
/usr/openwin/bin/rmdir 96
There’s a bunch that’s interesting here — but for now we’re interested in those tar processes. In particular, when someone is exec’ing a tar
process, we want to know who exactly they are. To do this, we write the following DTrace script, and call it “whotar.d”:
syscall::exec*:entry
/copyinstr(arg0) == “/usr/bin/tar”/
{
@[execname] = count();
}
Running this points us straight to the culprit:
# dtrace -s whotar.d
[ again letting this run for a bit ] ^Cget_all2 22
So get_all2 is the problem. We can now use traditional tools to hone in
on this:# ptree `pgrep get_all2`
100578 /usr/sbin/inetd -s
900870 in.rlogind
900872 -csh
901382 /bin/sh get_all3
901383 /bin/sh ./get_all2 80/80acc080 90/00fccc90
959045 /bin/sh ./get_all2 80/80acc080 90/00fccc90
959046 tar xvf – explorer.80c23d8e.ptgw02is-2003.10.27.09.35/README
959047 gzip -dc /net/pamer.central/proactive/rawdata/8e/80c23d8e
901384 /bin/sh ./get_all2 90/80bdbb90 a0/1274eda0
959565 /bin/sh ./get_all2 90/80bdbb90 a0/1274eda0
959566 tar xvf – explorer.8089ed9d.curly-2002.05.06.10.00/README exp
959567 gzip -dc /net/pamer.central/proactive/rawdata/9d/8089ed9d
901385 /bin/sh ./get_all2 a0/80aabda0 b0/1d6c05b0
959619 mv disks/diskinfo disks/format.out disks/mount.out etc/release
901386 /bin/sh ./get_all2 b0/80b264b0 c0/335419c0
959612 /bin/sh ./get_all2 b0/80b264b0 c0/335419c0
959613 tar xvf – explorer.82a86abd.seq22-2002.05.31.21.49/README exp
959614 gzip -dc /net/pamer.central/proactive/rawdata/bd/82a86abdSo that explains where the tar processes (and thus the cross calls) are coming
from! And just how short-lived _are_ these tar processes? We can answer this
question with DTrace, natch. To do this, we’ll use an associative array and
the “timestamp” variable. We are going to linearly quantize the result from 0
seconds to 60 seconds. We’ll also use a “tick-1sec” probe to automatically
stop after a minute. The script (which we call “longtar.d”):syscall::exec*:return
/execname == “tar”/
{
start[pid] = timestamp;
}fbt::exit:entry
/start[pid]/
{
@[“tar time, in seconds”] =
lquantize((timestamp – start[pid]) / 1000000000, 0, 60);
start[pid] = 0;
}tick-1sec
/n++ >= 60/
{
exit(0);
}Running this:
# dtrace -s longtar.d
dtrace: script ‘longtar.d’ matched 4 probes
CPU ID FUNCTION:NAME
23 38110 :tick-1sectar time, in seconds
value ————- Distribution ————- count
< 0 | 0 0 |@ 1 1 |@@@@@ 4 2 |@@@@@@@@@@@ 8 3 |@@@@@@@ 5 4 |@@@@@@@ 5 5 |@@ 2 6 | 0 7 |@@@@ 3 8 | 0
This highlights just how difficult this problem would be to find using
traditional (process-centric) tools: each tar process is only alive
for (roughly) two seconds! And yet, taken together, the tar processes
are accounting for the vast majority of cross calls on the system.
This is a good example of using DTrace because the problem here is _not_ a
single application — it’s an insurgent army of tar processes that are
slipping away before they can be noticed. Indeed, running prstat(1) on this
machine doesn’t show tar — or _any_ of the get_all2 progeny processes — as
top resource-consuming processes. Using DTrace, however, one can quickly
root-cause aberrant systemic behavior of this nature.
Here you can download Solaris Express 11/03 and try DTrace yourself!
Enjoy,
Bryan Cantrill, Mike Shapiro and Adam Leventhal
DTrace Engineering Team
I think I am not knowlegeable enough to understand what’s going on with this piece of news.
My best guess is that it describes a way of determining what’s eating up system resources, after the fact.
Neither am I. We can only hope that someone will come along and enlighten us.
DTrace allows administrators/users to dynamically add a number of tracing probes that can monitor a number of various facilities in a process agnostic manner, and assign actions for when these actions “fire”. The number of uses for such a facility are unimaginable, everything from monitoring which users might be causing a particular resource contention to diagnosing software bugs or bad hardware.
Is there a Linux equivalent of this command?
The example might make a little more sense when viewed in a fixed-width font. (It’s actually not as cryptic as it may first appear.) This example may be a little hard to appreciate if you’ve never had a box with performance problems, but suffice it to say that DTrace allows one to quickly explore aberrant system behavior. It’s been of great use internally at Sun, so we’re excited to finally get it into customers’ hands…
Here it is from google; the formatting is as intendend,
and makes it a lot easier to read.
http://groups.google.com/groups?selm=bp4fa1%24civ%241%4…
IE, Safari and Opera render the article just fine with the <PRE> tag. It is NS4 and Mozilla that don’t.
FYI: Safari renders it fine as well.
Memory and disk activity monitoring in Solaris has always been fantastic. We recently discovered a problem with the Oracle shared memory configuration on one of our boxen, thanks to… that command that gives periodical snapshots on memory usage, page-ins and outs etc. what’s the name…
The <pre> tags look just fine in Mozilla (good ol 1.4) for me. Never had a problem.
Are people getting the pre text in a variable with font??
Obviously linux has had this for quite some time. Innovation _comes_ from linux (just generally not the mainstream kernel for obvious reasons).
http://www-124.ibm.com/linux/projects/dprobes/
It doesn’t interface with shell scripts though.
Obviously linux has had this for quite some time. Innovation _comes_ from linux (just generally not the mainstream kernel for obvious reasons).
Yes, I suppose that’s why Solaris has had massive CPU scalability, O(1) scheduling and countless other Linux “innovations” for years now. Give it a rest already, there’s very little Linux has brought to the table that the commercial UNIX variants didn’t already have ages ago.
I have used dtrace for a little bit now and it is a great tool for helping quickly isolate kernel bug or performance problems. It’s also a much better replacement to trace/truss/etc for userland. If you do any form of performance or kernel development on Solaris its a must check out tool.
the tool whose name I couldn’t remember. It’s fantastic!!
Obviously linux has had this for quite some time. Innovation _comes_ from linux (just generally not the mainstream kernel for obvious reasons).
fwiw, dtrace seems to be quite a bit beyond what dprobes can do. look at the dtrace doc in comparison to the dprobes docs; there isn’t much of a comparison
vmstat is certainly useful for certain classes of problems. Part of the difficulty has been mapping back numbers seen in vmstat to activity on the system.
For example, let’s suppose you saw a number in the “re” column of vmstat that you wanted to explore. (“re” denotes page reclaims from the cache list.) Prior to DTrace, you were basically stuck resorting to “black art” techniques. But with DTrace, it can be explored succinctly, e.g.:
# dtrace -n pgrec'{@[execname] = count()}’
dtrace: description ‘pgrec’ matched 2 probes
[ waiting a bit ]
^C
java_vm 2
cat 4
chmod 4
cut 4
ls 4
rm 5
grep 5
mkdir 7
mutt 12
find 12
nawk 18
inetd 19
rmdir 24
tar 45
mv 45
get_all2 76
mail.local 305
sendmail 416
iCald.pl6+RPATH 538
sched 2096
gzip 2201
imapd 4054
nfsd 9719
(Note that the above would normally line up properly — but I don’t think OSNews allows <pre> tags in posts.) The above output indicates that a plurality of our reclaims our coming from providing NFS service (the above was run on a big NFS server), with some interesting other players in the top five. The next avenue of exploration would probably depend on the expected behavior of the machine. (And given that I was just running this on a production NFS server at Sun, I think I’m going to go back and understand why we’re running reclaim-intensive gzips…)
>The <pre> tags look just fine in Mozilla (good ol 1.4) for me. Never had a problem.
>Are people getting the pre text in a variable with font??
No, that’s not what we mean. Mozilla and NS4 also show it with fixed fonts, but the spacing they use is not right. On IE, Opera and Safari the columns are all as they suppose to be, in line, while on NS4 and Mozilla some of the lines are not rendering as they supposed to.
Wow, that was sure a useful post – thanks!
(the point I was trying to make originally, is that Solaris has so many tools of this kind, that make real life much easier).
Wow, that was sure a useful post – thanks!
As it turns out, posting it was useful to me, too: the gzips on that machine were due to someone running a really nasty shell script that was copying and manipulating a large amount of data. Stopping the script lightened the load significantly (this is an NFS server that serves 1,000 home directories), and the user running the script has been asked to run it only during off-hours. A good DTrace object lesson, as the processes were short-lived (in part because the script was inefficient) and did not show up in a prstat…
Hey Bryan,
I’ll have to check this out! I’ve used prstat and a lot of the other p-commands to figure things out (preap is fun:). This gives a lot more info on the processses, and that’s great. So this is standard in Solaris 10? Have they decided what to name it yet? Solaris 10 / SunOS 6.0?
Yeah, we think preap is very cute.
If you use the p-tools quite a bit, you’ll find a lot to like in DTrace. DTrace is standard in the next version of Solaris; you can download it now in Solaris Express:
http://wwws.sun.com/software/solaris/solaris-express/get.html
As for what it’s called: uname returns 5.10. As far as I’m concerned, everything else is just spin and hype…