The Big *BSD Interview

Matt Dillon, not the famous actor but the FreeBSD kernel/VM hacker also well known for writing the DICE C compiler for the Amiga, is here with us today for an in-depth interview about everything regarding FreeBSD 5.0. This is the release that technically minded people are waiting for, and many are presenting it as the most technically advanced free OS available today. We have also included two mini interviews: one with Theo de Raadt, the OpenBSD founder, and one with Jun-ichiro “itojun” Hagino from the NetBSD Core Team.

1. What goodies is FreeBSD 5.0 going to bring us?


Matt Dillon: There are at least a dozen projects going on in parallel, and we have ripped up and replaced a great deal of code all over the kernel. So much so that we recently decided to push the 5.0 release date back another year to give the many projects a chance to get out of development mode and into stabilization mode. As a result, a good chunk of -current’s features are also slowly being MFC’d (merged from -current) into -stable. Not KSEs or SMPng, though. I think Julian’s comment in his KSE commit message was something along the lines of “X-MFC after: ha ha ha ha” 🙂


KSEs and SMPng are the most visible projects (SMPng is being spearheaded by John Baldwin), but we have also done a great deal of work on the network stack, checksum offloading, network drivers (especially GigE drivers), devfs (which is now the default in -current), crypto-quality random number generation, scalability, and new machine ports. We are actively porting to IA64, PowerPC, and Sparc64, and even though the Alpha is dying away, our already-operational Alpha port (in -stable) is being actively developed to ensure that our code remains 64-bit clean and to break ground for the other ports.


A good chunk of the work has already been MFC’d to -stable. For example, -stable can push 900+ Mbits/sec (120 MBytes/sec by ‘netstat -in 1’) over
a TCP connection on a Dell 2550 with GigE using *normal* sized frames (MTU 1500). That’s full saturation.
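For context, a back-of-the-envelope calculation shows why 900+ Mbits/sec counts as saturation at a 1500-byte MTU. This is a sketch that assumes the standard Ethernet, IPv4, and TCP header sizes and ignores VLAN tags, IP options, and the ACK traffic on the reverse path; the function name is ours, not anything in FreeBSD:

```python
# Rough ceiling on TCP goodput over gigabit Ethernet with full-size frames.
# Header sizes are the standard Ethernet/IPv4/TCP values; VLAN tags, IP
# options, and reverse-path ACK traffic are ignored (simplifying assumptions).
PREAMBLE_SFD = 8      # preamble + start-of-frame delimiter
ETH_HEADER = 14       # destination MAC, source MAC, EtherType
FCS = 4               # frame check sequence
INTERFRAME_GAP = 12   # minimum idle time between frames, in byte times
IP_HEADER = 20        # IPv4 header without options
TCP_HEADER = 20       # TCP header without options


def max_goodput_mbit(mtu: int = 1500, line_rate_mbit: float = 1000.0,
                     tcp_options: int = 0) -> float:
    """Upper bound on TCP payload throughput, in Mbit/s, for back-to-back frames."""
    payload = mtu - IP_HEADER - TCP_HEADER - tcp_options
    wire_bytes = PREAMBLE_SFD + ETH_HEADER + mtu + FCS + INTERFRAME_GAP
    return line_rate_mbit * payload / wire_bytes


if __name__ == "__main__":
    print(f"no TCP options:  {max_goodput_mbit():.1f} Mbit/s")
    print(f"with timestamps: {max_goodput_mbit(tcp_options=12):.1f} Mbit/s")
```

Under those assumptions the ceiling is roughly 949 Mbit/s without TCP options and roughly 941 Mbit/s with TCP timestamps (12 bytes of options), so a sustained 900+ Mbit/s is within about 5% of what the wire can carry.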


A great deal of filesystem work has also been completed. Filesystem snapshot support (for UFS+Softupdates) is progressing nicely and may
prove to be one of the most important new filesystem features in the 5.x system once it stabilizes. There are also two new native features
in UFS that have stabilized and have in fact been MFC’d to -stable: dirpref and dirhash. dirpref comes directly from Grigoriy Orlov of
OpenBSD and reworks the way directories are laid out on disk, resulting in huge performance gains for stat/open/create/remove on directories and files.
Some filesystem operations have improved by over 60x (6000%), and many of the common ones have improved by over 400%. dirhash is a very
low-overhead in-kernel whole-directory hashing mechanism that radically improves the performance of directory operations.
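To illustrate why hashing helps, here is a minimal sketch, assuming a directory is modeled as an in-memory list of (name, inode) entries kept in on-disk order. The class and method names are hypothetical. The real dirhash lives in the kernel and, as we understand it, also skips small directories and caps the total memory it uses; none of that is modeled here:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DirEntry:
    name: str
    inode: int


@dataclass
class Directory:
    """Hypothetical model of a directory whose entries are stored in on-disk order."""
    entries: List[DirEntry] = field(default_factory=list)
    # name -> index into entries; built lazily, like a per-directory hash cache
    _hash: Optional[Dict[str, int]] = field(default=None, init=False, repr=False)

    def add(self, name: str, inode: int) -> None:
        self.entries.append(DirEntry(name, inode))
        if self._hash is not None:  # keep the cache coherent with the directory
            self._hash[name] = len(self.entries) - 1

    def lookup_linear(self, name: str) -> Tuple[Optional[int], int]:
        """Classic lookup: scan entries in order. Returns (inode, entries examined)."""
        for examined, entry in enumerate(self.entries, start=1):
            if entry.name == name:
                return entry.inode, examined
        return None, len(self.entries)

    def lookup_hashed(self, name: str) -> Optional[int]:
        """dirhash-style lookup: pay for one full scan to build the table, then O(1)."""
        if self._hash is None:
            self._hash = {e.name: i for i, e in enumerate(self.entries)}
        index = self._hash.get(name)
        return None if index is None else self.entries[index].inode


if __name__ == "__main__":
    d = Directory()
    for i in range(10_000):
        d.add(f"file{i}", 100 + i)
    print(d.lookup_linear("file9999"))   # (10099, 10000): every entry examined
    print(d.lookup_hashed("file9999"))   # 10099: one table probe after the build
```

The design choice is the usual cache trade-off: one full scan builds the table, after which every lookup, create, or remove in a large directory avoids the linear walk.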


There is much more. Some things, like dirhash and dirpref, can be easily MFC’d to -stable. Others, such as soft updates snapshot support,
probably won’t be, due to major dependencies on other -current-only projects.


2. What is the one feature you would most like to add to the BSD kernel?


Matt Dillon: I would like to see a native process and device-level descriptor migration capability. This isn’t a new idea (few ideas in computer architecture can ever be called ‘new’), but it is an idea whose time has come. I would like to see an ability to migrate processes as
well as their device-level state and couple that with external rerouting. So an I/O descriptor representing a TCP connection could
be migrated entirely off the original machine, for example.


Process migration is a good basis for supporting Q.O.S. and maintenance on today’s platforms. As computing hardware becomes more
powerful and we run more services (and more connections, and more users) on any given box, the ability to migrate everything off a
box in order to take it down for maintenance without users noticing has become the ultimate IT grail. The
reason is simple: if you use a modern machine to its fullest potential and the system crashes, you are potentially interrupting
thousands of users rather than just dozens. The concept of the ‘maintenance window’ introduces the same problem, even with
load balancing and connection management and distribution technologies. The ‘maintenance window’ concept is rapidly becoming unacceptable
in today’s always-on world.


In short, process migration would allow the open-source community to begin to provide Q.O.S. levels that only mainframes can provide
today. And if you didn’t hear me mention so-called ‘clustering’ solutions currently available from unnamed vendors, it’s because they
can’t actually deliver these things — not true Q.O.S. That’s my opinion, anyway. Using a cluster to hide the fact that the underlying
systems crash regularly is an extremely dangerous way to manage a computing environment.


I’m going to cheat a bit and also give you my #2 feature wish: I want native filesystem replication. I don’t care a whit about
common server-based disk stores: you don’t get reliability or scalability that way. I want to see distributed (replicated,
not partitioned) filesystems that are transactionally coherent, to go along with the process migration of course 🙂

3. Soft updates seem to be a step beyond journaling, the “modern” way of doing it, and FreeBSD has that feature. However, do you have plans to add some of the features found in XFS or JFS to FreeBSD’s filesystem?


Matt Dillon: I would not characterize soft-updates as being a step ahead of
journaling. At least not meta-data journaling. It’s just another
way of doing things. Even though soft-updates can theoretically
perform better than even meta-data journaling, the plain fact of the
matter is that linear disk bandwidth has at least 25x the throughput
of random seeks/writes. So journaling meta-data has a fairly small
performance impact if you can make everything *else* asynchronous.
Softupdates works extremely well for UFS, but the softupdates concept
can break down with other filesystems – it could very well be impossible
to implement softupdates-like operation on a filesystem which implements
directories as B-trees or hashes, for example. On the other hand,
softupdates can commit meta-data operations out of order while still
maintaining filesystem integrity, and it can do so in an infinitely
fine-grained fashion, which naturally leads to better parallelism.
Journaled filesystems typically can’t do that. So the usefulness of
the theory depends heavily on what your goals are. For general-purpose
work both theories work equally well.
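The journaling side of this comparison can be sketched in a few lines. This is a minimal model, assuming meta-data is just a name-to-inode map and that the journal is never checkpointed or truncated (real journals do both); all names are hypothetical. Note that replay applies records strictly in log order, which is exactly the serialization that softupdates avoids by tracking dependencies between individual updates instead:

```python
from __future__ import annotations

import json
from typing import Dict, List


class MetadataJournal:
    """Toy write-ahead journal for filesystem meta-data (hypothetical model).

    Every meta-data change is first appended to a sequential log (cheap
    linear writes), then applied to the in-place structure, which a real
    filesystem would flush lazily. After a crash, replaying the log in order
    rebuilds consistent meta-data without scanning the whole disk.
    """

    def __init__(self) -> None:
        self.log: List[str] = []            # stands in for the on-disk journal area
        self.metadata: Dict[str, int] = {}  # name -> inode, the "in-place" structure

    def _append(self, record: dict) -> None:
        self.log.append(json.dumps(record))  # a real journal would write and flush here

    def create(self, name: str, inode: int) -> None:
        self._append({"op": "create", "name": name, "inode": inode})
        self.metadata[name] = inode

    def remove(self, name: str) -> None:
        self._append({"op": "remove", "name": name})
        self.metadata.pop(name, None)

    @staticmethod
    def replay(log: List[str]) -> Dict[str, int]:
        """Crash recovery: re-apply journal records strictly in log order."""
        state: Dict[str, int] = {}
        for record in map(json.loads, log):
            if record["op"] == "create":
                state[record["name"]] = record["inode"]
            elif record["op"] == "remove":
                state.pop(record["name"], None)
        return state


if __name__ == "__main__":
    j = MetadataJournal()
    j.create("a", 2)
    j.create("b", 3)
    j.remove("a")
    # Pretend the in-place copy was lost in a crash; the log alone recovers it.
    print(MetadataJournal.replay(j.log))
```

Because each record is a small sequential append, the cost of journaling is mostly linear bandwidth, which is why the meta-data overhead stays small as long as data writes are asynchronous.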


Most filesystem-specific ‘super’ features are highly specialized and
not actually useful in the vast majority of system installations.
XFS has data zoning features and (at least under IRIX) the ability to
guarantee data stream latency and bandwidth. I can count the number
of applications that actually need those features on one hand with a
few fingers cut off. XFS’s major advantage, as with all journaled
filesystems, is instant crash recovery. All else being equal this is
a journaled filesystem’s biggest advantage for general purpose
computing but, even so, supplying the proper options to newfs when
creating a UFS filesystem can drop fsck times by an order of magnitude
on large filesystems. People using UFS are not really at that much
of a disadvantage. You can’t provide any sort of Q.O.S. if you depend
on fast crash recovery. Q.O.S. means having redundant
hardware at the very least. I can’t comment on JFS, I’ve never used it.
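Dillon doesn’t say which newfs options he means. One plausible lever, and this is our assumption, is inode density, set with newfs -i as bytes of data per inode: the classic UFS1 fsck reads every allocated inode in its first pass whether or not it is in use, so a filesystem with fewer inodes has less to scan. A back-of-the-envelope sketch, with an illustrative filesystem size and illustrative densities (not FreeBSD defaults):

```python
GIB = 1024 ** 3
UFS1_INODE_BYTES = 128  # size of an on-disk UFS1 inode


def inode_count(fs_bytes: int, bytes_per_inode: int) -> int:
    """Inodes allocated at a given newfs density (ignores cylinder-group rounding)."""
    return fs_bytes // bytes_per_inode


def inode_table_bytes(fs_bytes: int, bytes_per_inode: int) -> int:
    """Inode-table bytes fsck has to read in its first pass."""
    return inode_count(fs_bytes, bytes_per_inode) * UFS1_INODE_BYTES


if __name__ == "__main__":
    size = 100 * GIB                 # illustrative filesystem size
    for density in (8192, 65536):    # illustrative -i values, not FreeBSD defaults
        n = inode_count(size, density)
        mib = inode_table_bytes(size, density) / 2**20
        print(f"-i {density:>6}: {n:>10,} inodes, {mib:,.0f} MiB of inode table")
```

An 8x lower density means 8x fewer inodes for the first pass to read. fsck’s other passes also do work proportional to what is actually in use, so the real-world gain depends on the data stored.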


All of the BSD camps make stability priority #1 and performance
priority #2. Performance and fast crash recovery are completely
irrelevant if the filesystem corrupts the data or causes a crash!
This is especially true as HD capacities increase and filesystems
become larger. I have never quite understood why the Linux community
gets so revved up about the huge number of filesystems it supports, as
if sheer numbers combined to provide a more effective system!
You don’t get reliability, performance, and long term stability by
playing with filesystems, you get it by choosing or focusing on one or
two filesystems that deliver those characteristics. Depending on
filesystem-specific ‘super’ features makes code non-portable and is not
usually a good idea.


In any case, most BSD developers are happy with UFS. Oh, when I say UFS
I really mean UFS+FFS or UFS+FFS+SOFTUPDATES. UFS is not the ancient
creaking beast that some people have stereotyped it as. The basic
theory and structure was sound and is still sound to this very day.
Over the years we’ve fixed bugs (what few bugs we find), added
capability support, better caching, reorganized the layout in a
backwards-compatible fashion, re-introduced reblocking (basically
on-the-fly defragmentation), softupdates, snapshot support, etc etc etc.


4. After the open-source bubble burst recently, a lot of companies withdrew their support and stopped contributing code to Linux and BSD. How has this affected BSD development?


Matt Dillon: It creates a short-term disruption for the people involved in regard
to their ability to contribute but I do not believe company layoffs
will have any effect on the open-source movement itself or on Linux
and BSD development in the long term. The biggest contributors to
open-source are not staff employees of a company who are hired
specifically to interact with the open-source community. They are
people who have a real interest and love of open-source who happen to
be working at a company in a leverageable position.


While there have been BSD-related layoffs, it’s nothing that was
unexpected, and it has had much less of an impact on us than I’m sure the
huge number of Linux-centric companies going bust has had on the Linux
psyche. All I can say is: it ain’t our (the open-source community’s)
fault. Most of the Linux-centric companies were leeching off the
Linux name, and those that weren’t didn’t fail because they were
using Linux; they failed because they didn’t have a business model
with a chance in hell of (ever) turning a profit. Open source operates
behind the scenes far more than it operates in the public eye, and
it’s hard to sell support to hackers who actually have *fun*
trying to figure out a problem. In some respects Linux and the BSDs
are poor commercialization candidates because they are *too* good:
they simply do not require the level of support that something
like Windows NT or Oracle might require in a back-office setting.


Open source has created far more disruption and change in commercial
interests than the other way around. I think it has been for the
better, though I’m sure many commercial entities (such as MS) aren’t
too happy about being forced to be more honest with their customers.
(hmm… actually I think they still haven’t learned, and look at the
effect. MS has gotten its fingers burned so many times in their
dirty war against open-source that even long-time commercial partners
don’t believe what they say any more!).

5. How do you feel about Linux getting most of the attention over the last couple of years and moving a bit faster into the desktop arena? Does the desktop market interest the FreeBSD people at all?


Matt Dillon: I find it to be an interesting exercise in social engineering,
economics, and psychology. Oh, you want to know what I *really*
think?


I think the biggest winner here is open source. A great deal of what
people label as ‘Linux’ isn’t actually Linux. It’s open-source software
that compiles just as easily on FreeBSD (*without* Linux emulation)
as it does on Linux. Take GNOME and KDE, for example. No Linux emulation
necessary there! The areas where FreeBSD has problems are almost entirely
confined to commercial binary-only distributions. Now, that said,
Linux is certainly the largest driver of interest that leads to the
development of many of these projects. I don’t think we would have
GNOME or KDE without Linux. As a driver of interest, Linux has earned
its place at the top of the heap.


In regards to the desktop… well, I’m not sure exactly what you are
asking. Both Linux and FreeBSD are in the same boat there… the only
way to drive desktop acceptance is to ship machines pre-installed with
the OS (whatever OS) and preconfigured with a desktop so when you turn
the thing on, you are ready to rock. The only way to do that is for
the PC vendors to pre-install Linux (or FreeBSD, or whatever).


Other than that common issue, there really is no difference between
FreeBSD and Linux in regard to the desktop. Oh, we could integrate the
sound a little better and it would be nice to get a native OpenGL
implementation working, but everything else is already there, because
both platforms are running the same GUI software.


6. Please explain to us what SMPng (next-generation symmetric multi-processing) and KSE (kernel scheduled entities) are; both are features found in FreeBSD 5-CURRENT.


Matt Dillon: SMPng is FreeBSD’s fine-grained mutex, interrupt threading, and
Giant-removal implementation. Kernel preemption is potentially
also part of the equation, but the jury is still out on that.
The purpose is to be able to have several mainline processes and/or
interrupts operating in kernel mode simultaneously. This is the primary
scalability issue in any SMP system. The work being done here is
roughly comparable to the SMP work being done in Linux. Linux is
about a year ahead of us, but both Linux and the BSDs have a great
deal of work to do to catch up with Solaris.
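The difference between a single Giant lock and fine-grained mutexes can be shown structurally. This is a minimal sketch using Python’s threading.Lock, with hypothetical class and method names standing in for kernel subsystems; it is not FreeBSD’s mutex API (mtx_lock and friends), and because of CPython’s global interpreter lock it demonstrates the locking structure, not a speedup:

```python
from __future__ import annotations

import threading
from typing import Callable, Tuple


class GiantLockedKernel:
    """Pre-SMPng model: one 'Giant' mutex serializes every trip into the kernel."""

    def __init__(self) -> None:
        self.giant = threading.Lock()
        self.packets = 0
        self.writes = 0

    def net_input(self) -> None:
        with self.giant:      # network work also shuts out filesystem work
            self.packets += 1

    def fs_write(self) -> None:
        with self.giant:
            self.writes += 1


class FineGrainedKernel:
    """SMPng-style model: each subsystem guards its own data with its own mutex."""

    def __init__(self) -> None:
        self.net_lock = threading.Lock()
        self.fs_lock = threading.Lock()
        self.packets = 0
        self.writes = 0

    def net_input(self) -> None:
        with self.net_lock:   # contends only with other network work
            self.packets += 1

    def fs_write(self) -> None:
        with self.fs_lock:    # contends only with other filesystem work
            self.writes += 1


def _repeat(fn: Callable[[], None], n: int) -> None:
    for _ in range(n):
        fn()


def run(kernel, n: int = 10_000) -> Tuple[int, int]:
    """Drive network and filesystem work from two threads at once."""
    threads = [threading.Thread(target=_repeat, args=(kernel.net_input, n)),
               threading.Thread(target=_repeat, args=(kernel.fs_write, n))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return kernel.packets, kernel.writes


if __name__ == "__main__":
    print(run(GiantLockedKernel()), run(FineGrainedKernel()))
```

Both models produce the same counts; the point is that in the fine-grained version a network interrupt and a filesystem write never wait on each other, which is what lets several CPUs be in the kernel at once.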


KSE is a totally new way (though based on an old idea) of implementing userland threads.
The idea here is twofold: (1) to remove any requirement that userland
code understand which system calls might block and which system
calls might not block. (2) to do all primary thread scheduling and
switching in userland, where any given cpu can switch between threads
with approximately the same overhead as a userland subroutine call.


With KSEs if a userland process makes a system call which blocks, the
kernel will detach the kernel context (which is now blocked) and return
directly to the user mode scheduler using an ‘upcall’. The userland
scheduler can then immediately switch to another thread. Another
system call will be given a new, fresh, KSE to play with. The blocked
kernel context runs completely asynchronously from the userland process
until it finishes and can potentially run concurrently with other
detached KSEs for the same process. When a KSE completes the kernel
notifies the userland scheduler allowing the userland scheduler to
reschedule the ‘blocked’ thread which is now ‘returning’ from the
system call that originally blocked.
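The upcall flow described above can be modeled as a tiny simulation. This is a sketch under loud assumptions: user threads are Python generators, a “blocking system call” is a yield, time advances in abstract ticks, and every name here is hypothetical rather than part of the FreeBSD KSE interface:

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Deque, Generator, List, Tuple

# A user thread is a generator. Yielding ("block", ticks) models a system call
# that will block for `ticks` scheduler steps; the value sent back in when the
# thread is resumed models the system call's return value.
UserThread = Generator[Tuple[str, int], object, None]


@dataclass
class KernelContext:
    """A detached, blocked kernel context (hypothetical stand-in for a KSE)."""
    ticks_left: int
    name: str
    thread: UserThread


class UserScheduler:
    """Toy model of the KSE upcall idea: all thread switching happens in userland."""

    def __init__(self) -> None:
        self.runnable: Deque[Tuple[str, UserThread, object]] = deque()
        self.in_kernel: List[KernelContext] = []
        self.trace: List[str] = []

    def spawn(self, name: str, thread: UserThread) -> None:
        self.runnable.append((name, thread, None))

    def _advance_kernel(self) -> None:
        """Let blocked kernel contexts make progress; completions upcall to userland."""
        still_blocked = []
        for ctx in self.in_kernel:
            ctx.ticks_left -= 1
            if ctx.ticks_left <= 0:
                self.trace.append(f"upcall: {ctx.name} syscall done")
                self.runnable.append((ctx.name, ctx.thread, "ok"))  # "returns" from the call
            else:
                still_blocked.append(ctx)
        self.in_kernel = still_blocked

    def run(self) -> None:
        while self.runnable or self.in_kernel:
            if self.runnable:
                name, thread, value = self.runnable.popleft()
                try:
                    _, ticks = thread.send(value)
                except StopIteration:
                    self.trace.append(f"{name} exits")
                else:
                    # The call blocked: detach its kernel context and switch straight
                    # to another user thread instead of putting the process to sleep.
                    self.trace.append(f"{name} blocks for {ticks}")
                    self.in_kernel.append(KernelContext(ticks, name, thread))
            self._advance_kernel()


def one_blocking_call(ticks: int) -> UserThread:
    yield ("block", ticks)  # the thread resumes here after its upcall


if __name__ == "__main__":
    sched = UserScheduler()
    sched.spawn("A", one_blocking_call(3))
    sched.spawn("B", one_blocking_call(1))
    sched.run()
    print("\n".join(sched.trace))
```

In this run, thread B’s shorter call completes and B exits while A is still blocked in the kernel, and the process as a whole never sleeps: exactly the “detach, upcall, reschedule” sequence in the paragraph above.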


The essential difference between KSEs and both select/kqueue-based
threads and rfork based threads is that with KSEs you get all the
parallelism of the SMP box and all the power of a userland-only context
switch between threads (read: *very* fast switch times) without
*any* of the kernel overhead. A program can literally be running
thousands of threads with no significant kernel overhead. Only
blocked system calls eat kernel resources. In addition to this,
we can manage kernel resources in the face of thousands of threads
by limiting the ‘pool’ of KSEs we assign to any given process or user
or whatever. So if 500 of those 1000 threads block in a syscall we
just get a little less cpu-efficient and don’t blow out kernel memory.
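The pool-limiting idea can be sketched with an ordinary thread pool standing in for the kernel’s pool of KSEs. This is an assumption-laden model: the class name, pool size, and sleep durations are hypothetical, and a real kernel would bound kernel stacks and contexts, not Python threads:

```python
from __future__ import annotations

import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class KsePool:
    """Hypothetical model of a per-process cap on blocked kernel contexts.

    An ordinary thread pool stands in for the kernel's pool of KSEs: at most
    `size` blocked system calls are serviced at once, and further calls wait
    in a queue instead of consuming more kernel memory.
    """

    def __init__(self, size: int) -> None:
        self._executor = ThreadPoolExecutor(max_workers=size)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # most kernel contexts ever in use at the same time

    def _blocking_syscall(self, seconds: float) -> str:
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        try:
            time.sleep(seconds)  # stands in for waiting on disk or network I/O
            return "done"
        finally:
            with self._lock:
                self._active -= 1

    def submit(self, seconds: float) -> Future:
        return self._executor.submit(self._blocking_syscall, seconds)

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


if __name__ == "__main__":
    pool = KsePool(size=16)
    # 1000 user threads; only the 500 that block ever touch the pool.
    futures = [pool.submit(0.001) for t in range(1000) if t % 2 == 0]
    completed = sum(f.result() == "done" for f in futures)
    pool.shutdown()
    print(f"{completed} blocked calls completed, peak kernel contexts: {pool.peak}")
```

However many calls block, the peak stays at or below the pool size; the cost of a small pool is that some blocked calls wait their turn, which is the “a little less cpu-efficient” trade Dillon describes.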


Currently FreeBSD can use both select/kqueue and rfork (linux-style)
threading. KSEs bring us to the next level.

7. From the technical point of view, how would you rate the Linux 2.4 kernel compared to BSD’s?


Matt Dillon: I don’t know enough about recent linux kernels to be able to rate
them, nor would it be P.C. I do follow the VM work being done in
Linux and in particular Rik van Riel’s work. I think Linux is going
through a somewhat painful transition as it moves away from a
Wild-West/Darwinist development methodology into something a bit
more thoughtful. I will admit to wanting to take a clue-bat to
some of the people arguing against Rik’s VM work who simply do not
understand the difference between optimizing a few nanoseconds out
of a routine that is rarely called versus spending a few extra cpu
cycles to choose the best pages to recycle in order to avoid
disk I/O that would cost tens of millions of cpu cycles later on.
It is an attitude I had when I was maybe 16 years old… that every
clock cycle matters no matter how it’s spent. Bull!
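The trade-off Dillon describes can be made concrete with a toy comparison. This is a sketch, not Rik van Riel’s algorithm or FreeBSD’s (both are far more elaborate): it pits cheap FIFO eviction against LRU, which does extra bookkeeping on every reference, on a made-up trace in which one hot page is touched repeatedly. The cycle costs are assumptions chosen for illustration, not measurements:

```python
from __future__ import annotations

from collections import OrderedDict, deque
from typing import Deque, List, Set

# Illustrative costs (assumptions, not measurements): roughly 10 ms of disk I/O
# per page fault on a 1 GHz CPU, and a generous 100 cycles of bookkeeping per
# memory reference to keep recency information up to date.
FAULT_CYCLES = 10_000_000
BOOKKEEPING_CYCLES = 100


def faults_fifo(trace: List[int], frames: int) -> int:
    """Cheap policy: evict whichever page arrived first, even if it is hot."""
    resident: Set[int] = set()
    arrival: Deque[int] = deque()
    faults = 0
    for page in trace:
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            resident.discard(arrival.popleft())
        resident.add(page)
        arrival.append(page)
    return faults


def faults_lru(trace: List[int], frames: int) -> int:
    """Costlier policy: track recency on every reference, evict the coldest page."""
    recency: OrderedDict[int, None] = OrderedDict()
    faults = 0
    for page in trace:
        if page in recency:
            recency.move_to_end(page)  # the extra work done on every hit
            continue
        faults += 1
        if len(recency) == frames:
            recency.popitem(last=False)  # least recently used page
        recency[page] = None
    return faults


if __name__ == "__main__":
    # One hot page (1) interleaved with a stream of pages touched only once.
    trace = [1, 2, 3, 1, 4, 1, 5, 1, 6, 1, 7, 1]
    fifo = faults_fifo(trace, frames=3) * FAULT_CYCLES
    lru = faults_lru(trace, frames=3) * FAULT_CYCLES + len(trace) * BOOKKEEPING_CYCLES
    print(f"FIFO: {fifo:,} cycles  LRU: {lru:,} cycles")
```

On this trace FIFO takes 9 faults and LRU takes 7, so 1,200 cycles of bookkeeping buy back about 20 million cycles of disk wait under the assumed costs.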


8. How is the “relationship” between the FreeBSD programmers and the OpenBSD/NetBSD ones? Do you share code and opinions, and chat regularly? Or are all these BSD projects completely independent of each other?


Matt Dillon: The BSD groups are like high school social circles. No, really!
That’s the best analogy I can think of! Many developers focus
on just their little clique but a good chunk run in multiple
circles. There are developers that maintain the same driver code
across several BSD distributions. There are developers who focus
their work in one BSD distribution but have ties to developers
in others. If the work is interesting enough, such as the ‘dirpref’
work, developers that focus on coding in other BSD distributions
will pick up the patch set and bring it in. That is how FreeBSD
got the dirpref code. Kirk imported it from OpenBSD into FreeBSD-current
and I MFC’d it to -stable after it had been proven out in -current.


In many respects this development methodology gives us the best of
both worlds. Developers are free to focus on the distribution they are
most familiar with, and if the work is interesting enough it gets several
eyes from the other distributions who not only port the code in, but
also review it. Testing can wind up occurring in all the distributions
simultaneously, and with something like ‘dirpref’, if someone finds a
bug it will almost certainly wind up being fixed in the other
distributions within a few days. Security bugs are independently
verified but often the fix is common to all the BSDs and no duplicate
work need occur. There is constant borrowing going on between the
BSDs and even between BSD and Linux, especially in regards to driver
code.


9. What is your opinion on .NET, and do you think .NET could change the OS “map” as we know it?


Matt Dillon: I believe .NET is Vapor. It’s a marketing term dreamed up by Microsoft
that will magically morph into whatever Microsoft eventually winds up
delivering. MS announces grandiose ideas with cute catch phrases
all the time, and as with any good vapor there is always some
basis in truth (if only a little pinprick). The reality is a little
different though… remember, these are the people who hyped Windows ME
up the wazoo, and all we got out of it was a speech-synthesized Windows
installation wizard! These are the people who called NT the UNIX killer
and told people it was as reliable as UNIX. .NOT is probably a more
descriptive term for .NET. My guess is that it will turn into
Microsoft-proprietary rent-a-service glue, and that it will introduce
an order of magnitude more security issues than IIS.


10. Some say that FreeBSD has the best VM ever when compared to any other operating system. Do you think there is still room for improvement, and are there still features to be added?


Matt Dillon: I think we made great progress stabilizing the VM system and working
out performance issues related to machine scaling in the -4.x series
of FreeBSD releases. The machines have proven to be great workhorses
in a wide range of applications and are able to provide the long term
stability and performance required by their users. Generally speaking,
the technology behind the VM system is quite sound and does not need
much more in the way of improvement. Obviously in -5.x we will be
multi-threading pieces of it for SMPng, but the core algorithms appear
to extend cleanly to MP and 64 bit platforms and we do not expect to
have to make any fundamental changes. There is always room for
improvement, of course! While we are likely to stand pat with the
VM core in early 5.x releases, there is a great deal of work planned
to improve the I/O and buffer cache subsystems a little later on.
My personal goal is to eventually remove the buffer cache entirely
or at least morph it into nothing more complex than an I/O staging
subsystem.

1. How is the “relationship” between the NetBSD programmers and the OpenBSD/FreeBSD ones? Do you share code and opinions, and chat regularly? Or are all these BSD projects completely independent of each other?


Itojun: Yes we do chat with each other and share code/opinions. Some of the developers do have commit access (can modify source code tree) for multiple BSDs.


2. Do you incorporate code into NetBSD from OpenBSD or FreeBSD when important changes are made to these OSes?


Itojun: Yes, but it depends on the characteristics of the changes. If it is a one-line change for a security issue, we integrate it right away. If it is a big feature addition, we review it carefully and sometimes integrate the changes, sometimes not (because we get similar changes from others, or we implement it ourselves, or we integrate it with a lot of improvements).


3. What goodies is the next version of NetBSD scheduled to bring us?


Itojun: SMP (for multiple platforms!) and fine-grained thread support are the biggest targets we are attacking. Support for more platforms, of course.


4. NetBSD’s goal is to port the OS to as many platforms as it can. Which platforms does NetBSD still need to be ported to, and is it a priority to do so?


Itojun: Sony PlayStation2 (port exists, needs integration).

1. How is the “relationship” between the OpenBSD programmers and the FreeBSD/NetBSD ones? Do you share code and opinions, and chat regularly? Or are all these BSD projects completely independent of each other?


Theo de Raadt: There are no formal relationships of any kind. That said, since it is a free world, there are numerous developers who do talk to their counterparts in the other group. Even when that does not happen, public mailing lists and the mainstay product of our projects — source code — is completely visible. What more could one want?


2. Do you incorporate code into OpenBSD from NetBSD or FreeBSD when important changes are made to these OSes?


Theo de Raadt: Sure, why wouldn’t we?


3. What goodies is the next version of OpenBSD scheduled to bring us?


Theo de Raadt: First off, I should reiterate what I have been saying for 5 years: OpenBSD development is not revolutionary, but evolutionary. That means that between one release and another, not a lot of big things happen, but instead we should view it as a series of about 10,000–20,000 small changes. Over a series of OpenBSD releases, this amounts to a very big deal. Any release from 2 years back feels very different from the current codebase we have, but actually labelling the big changes between two consecutive releases is very difficult. Thousands of these changes are bug fixes, minor conformance improvements… things which I would argue matter MUCH MORE than “new features”.


That said, this next release has one big thing that people are waiting to try out: We have written a whole new packet filter / nat engine, and fully integrated it into the system. People who are used to ipf will find that pf is much like ipf, but has some improvements which we have always wanted to make (and which the old ipf license had blocked us from doing).


The Alpha port has been significantly improved to support many of the higher-end models (kind of funny, considering the entire platform has now been end-of-lifed…), and we will be releasing our first UltraSPARC beta.


Other than that and the thousands of little fixes and improvements everywhere, and probably a bunch of other things I have already forgotten,


4. OpenBSD’s goal is to bring ultimate security to a server. Do you think that patching the holes and only accepting proven software slows your development down, keeping you from implementing something new at the OS level and releasing it quickly?


Theo de Raadt: No, I think it does not affect our release schedule or development process.
