I was asked by co-workers how to establish a RAID-1 (mirror) for the system disk partitions of a FreeBSD 5 system. Giving an answer wasn’t easy, because FreeBSD 5 offers multiple possible solutions, each with its pros and cons, and for none of them is it obvious how to deploy it step by step without destroying the existing system, especially when the constraint is to migrate the production system fully remotely from the plain single-disk to the mirrored two-disk setup. I have chosen the newer GEOM mirror class because IMHO it is currently both the most flexible and the most robust solution, and it also works for the boot partition without any tricks. Nevertheless, the necessary steps are not obvious to a FreeBSD administrator, so I’ve written them down in detail as one of the possible answers to this FAQ about mirroring system disk partitions.
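For orientation, here is a minimal sketch of the whole-disk variant of such a migration. The device names ad0 (the current system disk) and ad1 (the new, empty disk) are placeholders, and the FAQ entry itself remains the authoritative step-by-step procedure:

kldload geom_mirror
gmirror label -v -b round-robin gm0 /dev/ad1       # start a one-disk mirror on the new disk
echo 'geom_mirror_load="YES"' >> /boot/loader.conf
fdisk -BI /dev/mirror/gm0                          # slice the mirror and make it bootable
bsdlabel -wB /dev/mirror/gm0s1                     # write a disklabel plus boot blocks
newfs /dev/mirror/gm0s1a
mount /dev/mirror/gm0s1a /mnt
dump -L -0 -f - / | (cd /mnt && restore -rf -)     # copy the running root filesystem over
# then point /mnt/etc/fstab at /dev/mirror/gm0s1a, reboot from the new disk,
# and finally attach the old one with:  gmirror insert gm0 /dev/ad0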
I was already looking for alternatives to my page-faulting vinum configuration on 5.3-release. So this comes in quite handy! Probably will switch to gmirror.
We just lost one of our servers, which hadn’t been RAID’ed, and now we’re going to use your notes to rebuild the new system. Excellent!
Doesn’t seem to work on 5.3_RELENG-p2. When I shutdown -r now, the array comes up degraded with da0s1 disconnected. I see there are some patches around for this, but as near as I can tell, they won’t go into 5.3_RELENG, as they are considered too ‘hacky.’
I’m wondering now what I should do. It works well while it is running, but the reboot breaks it. Should I just go with it and try to figure out the right patches to apply or scrap it? If scrapping, then what would be the procedure to back out of this?
Hmm….
One other thing I noticed with the second procedure, the mirror-on-a-slice: the instructions tell you to zero out the MBR of the first disk. Don’t do this, or you’ll then have to rescue the system and boot0cfg the first-stage boot loader back onto the first disk. Zeroing works for the first procedure, as the whole disk, including the MBR, is mirrored, but zeroing the MBR in the second scenario is wrong, as it won’t get re-written from the mirror.
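If you’ve already wiped it, something roughly like this should get you back (assuming da0 is the affected disk and you can still boot, e.g. from a fixit CD; adjust to your real slice layout):

fdisk -BI /dev/da0    # re-initialize the MBR and slice table with standard boot code
boot0cfg -B da0       # reinstall the boot0 boot manager into the MBR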
> Doesn’t seem to work on 5.3_RELENG-p2. When I shutdown -r
> now, the array comes up degraded with da0s1 disconnected.
Perhaps you forgot the swapoff=YES in /etc/rc.conf?
If not, my recommendation is to just upgrade to 5.3-STABLE,
which has this and many other small GEOM mirror bugs fixed,
and where the RAID-1 works flawlessly for me.
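For reference, the line in /etc/rc.conf is simply:

swapoff="YES"

so that swap on the mirror can be released cleanly at shutdown.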
> One other thing I noticed with the second procedure for the
> mirror-on-a-slice. The instructions tell you to zero out the
> MBR of the first disk, don’t do this, you’ll then have to
> rescue and boot0cfg the first stage boot loader back onto
> the first disk. This works for the first procedure, as the
> whole disk, including MBR are mirrored, but zeroing the MBR
> in the second scenario is wrong, as it won’t get re-written
> from the mirror.
Yes, correct. My fault. I’ve now fixed it by adding the missing
step where the MBR and slice are re-created. Just not zeroing
the first disk works only if you initially created the slice on the second disk with exactly the same size as on the first disk. Because this usually isn’t the case, it’s better to zero the first disk and re-establish a fresh slice there (now with _exactly_ the same size as the one on the second disk). Thanks for catching this bug and giving feedback.
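Roughly speaking, re-creating the slice with a matching size can be done by replaying the second disk’s slice table onto the first one, e.g. (assuming da0/da1 as disk names):

fdisk -p da1 > /tmp/fdisk.da1     # dump the slice layout of the good second disk
fdisk -f /tmp/fdisk.da1 -i da0    # re-create exactly that layout on the zeroed first disk
boot0cfg -B da0                   # put the boot manager back into the fresh MBR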
>Perhaps you forgot the swapoff=YES in /etc/rc.conf?
Alas, no, that directive exists in that file.
>If not, my recommendation is to just upgrade to 5.3-STABLE
>which has this plus many other small GEOM mirror bugs fixed
>and where the RAID-1 works flawlessly for me.
Well, this is eventually going to be a production box at a remote location, so I’d rather it be on the RELENG track. I don’t mind playing around with it now, and even having it patched for a while in production, but in the long run, if it isn’t going to work right in this release, I guess I should just back out of it for now.
Man, I’d really like to use it.
I’ve a single partition filling the whole slice (da0s1a), mounted as ‘/’, not split up into /, /usr, and /var. Could that be a problem?
I’m sorry, the machine is out of reach right now, but I was getting an error saying something about an error writing metadata to /dev/da0s1; it then disconnected da0s1 from the array.
I wonder if going the whole-disk (da0,da1) route would get around this problem.
Another hmmm….
I’ve got it to work, more or less, with a clean cvsup of RELENG 5.3-p4 and the patch mentioned here, with the mount_delay set to ’10’ as the poster stated:
http://lists.freebsd.org/pipermail/freebsd-geom/2004-October/000338…
and the ‘Whole disk’ option (Approach #1).
Approach #2 suffered from the problem that da0s1 would always get disconnected because GEOM couldn’t ‘update the metadata for da0s1.’ It would disconnect the provider da0s1 and not reconnect it. I had to ‘forget’ and ‘insert’ da0s1 for it to be recognized again.
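Concretely, with gm0s1 being the mirror name I had used for the slice-based approach, that meant something like:

gmirror forget gm0s1               # drop the stale, disconnected component from the mirror
gmirror insert gm0s1 /dev/da0s1    # re-attach da0s1 and let it resynchronize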
With the current setup I’m no longer getting the metadata update error, and both providers get connected. I’m still seeing a problem, though: after a reboot da1 wants to rebuild, because gmirror first tries to connect da1s1, which I had originally configured for gm0s1 in the ‘mirror-on-a-slice’ approach. It looks like some artifacts from the original setup are getting in the way of the new one. While the original gm0s1 isn’t listed anywhere, I still get messages that gmirror is attempting to connect da1s1 to it, and then destroys it when it cannot connect to gm0s1, because it doesn’t exist. After gm0s1 is destroyed, it rebuilds gm0 with da1 and everything runs OK. I just need to figure out how to get rid of that original gm0s1 for good and it should be good to go.
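Presumably wiping the stale gmirror metadata off the old slice would do it, something along these lines (da1s1 being the leftover provider, ideally done before da1 is re-attached to gm0 so the write doesn’t bypass the running mirror):

gmirror clear da1s1    # erase the old gm0s1 metadata so it never gets tasted again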
So I ended up only having to add:
kern.geom.mirror.timeout=0
in /boot/loader.conf and it works fine now.