Absolute BSD - The Ultimate Guide To FreeBSD (2002).pdf

code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL=0
current process         = 5 (syncer)
interrupt mask          = bio
trap number             = 12
panic: page fault

...............................................................................................

If you're an inexperienced sysadmin, messages like this can turn your blood cold, but don't fret yet. FreeBSD sometimes gives somewhat friendly messages that describe what's wrong, which give you a specific place to start looking, or at least a term to Google. I've seen panics that give very specific instructions on kernel options that should be set to prevent their recurrence. Other panic messages, like this one, are much more puzzling.

The only thing that looks even vaguely familiar in this panic message is on the fourth line from the bottom, where we see that the current process is something called "syncer". Most people don't know what the syncer is, and most of those who recognize it know better than to try to fix it. The "mysterious panic" is among the worst situations you can have in FreeBSD.

Responding to a Panic

If you get a system panic, the first thing to do is get a copy of the panic message. Since FreeBSD is no longer running at this point, the standard UNIX commands will not work: the system won't let you SSH in or out, and even simple commands like script(1) will not work. The console might be utterly locked up, or it could be in a debugger. In either event, you need the error message.

The first time I received an error message like the preceding one, I scrambled for paper and pen. Eventually I found an old envelope and a broken stub of pencil, and crawled between the server rack and the rough brick wall. I balanced the six-inch black-and-white monitor that I'd dragged back there in one hand, while with my other hand I held the old envelope against the wall. Apparently I had a third hand to copy the panic message to the envelope, because it somehow got there. Finally, scraped and cramped, I slithered back out of the rack and victoriously typed the whole mess into an email. Surely the crack FreeBSD developers would be able to look at this garbage and tell me exactly what had happened.

After all of this struggle, the initial response was quite frustrating: "Can you send a backtrace?"

I've seen many, many messages to a FreeBSD mailing list reporting problems like this, and they always get this same response. Most of the people who send these messages are never heard from again, and I understand exactly how they feel. When you've been dealing with a server that crashes, or (worse) keeps crashing, the last thing you want to do is reconfigure it.

The problem with the panic message on my envelope was that it only gave a tiny scrap of the story. It was so vague, in fact, that it was like describing a stolen car as "red, with a scratch on the fender." If you don't give the car's make, model, and VIN or license plate, you cannot expect the police to make much headway. Similarly, without more information from your crashing kernel, the FreeBSD developers can't catch the criminal code.

There's a simple way around this problem, however: Set up your server to handle a panic before the panic happens. Set it up when you install the server. That way, you'll get a backtrace automatically if it ever crashes. This might seem like a novel idea, and it certainly isn't emphasized in the FreeBSD documentation, but it makes sense to be ready for disaster. If it never happens, well, you don't have anything to complain about. If you get a panic, you're ready and you'll be able to present the FreeBSD folks with a complete debugging dump the second a problem appears.

Prerequisites

To prepare for a kernel panic, you need to have the system source code installed. You'll also need at least one swap partition that is at least 1MB larger than your physical memory, and preferably twice as large as your RAM. If you have 512MB of RAM, for example, you need a swap partition that is 513MB or larger, with 1024MB being preferable. (On a server, you should certainly have multiple swap partitions on multiple drives!) If your swap partition isn't large enough, you'll have to either add another hard drive with an adequate swap partition, or reinstall. (While having a /var partition with at least that much disk space free is helpful, it isn't necessary.)
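The sizing rule above is simple arithmetic, and a few lines of shell can check it. This is only a sketch: the function name swap_verdict and its three labels are made up for this example, and both sizes are passed in as whole megabytes rather than read from the system.

```shell
#!/bin/sh
# Sketch of the swap-size rule of thumb from the text.
# swap_verdict RAM_MB SWAP_MB prints too-small, adequate, or preferred.
# The function name and labels are hypothetical, chosen for this example.
swap_verdict() {
    ram_mb=$1
    swap_mb=$2
    if [ "$swap_mb" -lt $((ram_mb + 1)) ]; then
        echo "too-small"     # less than RAM + 1MB: cannot hold a full dump
    elif [ "$swap_mb" -lt $((ram_mb * 2)) ]; then
        echo "adequate"      # at least RAM + 1MB
    else
        echo "preferred"     # twice RAM or more
    fi
}

# The 512MB example from the text.
swap_verdict 512 512     # prints: too-small
swap_verdict 512 513     # prints: adequate
swap_verdict 512 1024    # prints: preferred
```

On a running system you would feed it real numbers. sysctl hw.physmem reports physical memory in bytes and swapinfo(8) lists the swap devices and their sizes, so both need converting to megabytes first.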

If you followed the installation suggestions in the beginning of the book, you're all set.

Crash Dump Process

The kernel crash-capturing process works somewhat like this. If a properly configured system crashes, it will save a core dump of the system memory. It can't save the dump to a file, because the crashed kernel doesn't know about files; it only knows about partitions. The simplest place to write this dump is the swap partition, and the dump is placed as close to the end of the swap partition as possible. Once the crashing system saves the core to swap, it reboots the computer.

During the reboot, /etc/rc enables the swap partition. It then (probably) runs fsck on the crashed disks. It has to enable swapping before running fsck, because fsck might need to use swap space. Hopefully, you have enough swap space that fsck can get everything it needs without overwriting the dump file lurking in your swap partition.

Once the system has a place where it can save a core dump, it checks the swap partition for a dump. Upon finding a core dump, savecore copies the dump from swap to the proper file, clears the dump from swap, and lets the reboot proceed. You now have a kernel core file, and can use that to get a backtrace.

The Debugging Kernel

The standard FreeBSD kernel install removes all the debugging information from the kernel before installing it, including symbols, which provide a map between the machine code and the source code. Such a map can be larger than the actual program, and nobody wants to run a kernel that's three times larger than it has to be! However, we need this map, and other debugging information, to diagnose what went wrong in the crash.

This map also includes a complete list of source-code line numbers, so the developer can learn exactly where a problem occurred. Without this information, the developer is stuck trying to map a kernel core to the source code by hand, which is somewhat like trying to assemble a million-piece puzzle without a picture, or even knowing that you have all the pieces. Overall, this is an ugly job. It's even uglier when you consider that the developer who needs to do the work is a volunteer. That's why your debugging kernel should include its symbols.

To keep the symbols, add these lines to your kernel configuration:


...............................................................................................

options DDB
makeoptions DEBUG=-g

...............................................................................................

The DDB option installs the DDB kernel debugger. (This isn't strictly necessary, but it can be helpful and it doesn't take up that much room.) The makeoptions you set here tell the system to build a debugging kernel.

Post-Panic Behavior

When configuring your system, you'll need to decide how you want the system to behave after a panic. Do you want the computer to reboot automatically, or do you want it to stay at the panic screen until you manually trigger a reboot? If the system is at a remote location, you'll almost certainly want the computer to reboot automatically, but if you're at the console debugging kernel changes, you might want it to wait for you to tell it to reboot.

To reboot automatically, include the kernel option DDB_UNATTENDED:

...............................................................................................

options DDB_UNATTENDED

...............................................................................................

If you don't include this option, the system will wait for you to tell it to reboot.

kernel.debug

Once you have the kernel configured the way you want, do the usual dance (described in Chapter 4) to configure and install it.

Once you've installed your new kernel, you'll find a file in the kernel compile directory called kernel.debug. This is your kernel with symbols; save it somewhere. The next time you upgrade your system or customize the kernel, this debugging kernel will be overwritten by a new debugging kernel. If you've built a kernel just for testing, you want to be sure that you have your known-to-be-good debugging kernel available.

One of the frequent causes of a failed debugging process is losing the debugging kernel and trying to debug a crashed kernel with a different kernel.debug. This won't work, because the symbols must come from the same build as the kernel that crashed. I generally copy kernel.debug to /var/crash/kernel.debug.date, so I can tell when a particular debug kernel was built. This lets me date-match the current kernel to a debugging kernel, and also tells me when a kernel.debug is old enough that I can delete it.
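The date-stamped copy is easy to script so that it never gets forgotten. What follows is a minimal sketch, not a standard FreeBSD tool: the function name save_debug_kernel is hypothetical, and the YYYYMMDD stamp format is my assumption about what "date" means in the filename.

```shell
#!/bin/sh
# Sketch: keep a dated copy of the debugging kernel.
# save_debug_kernel SRC DESTDIR [STAMP]
# Copies SRC to DESTDIR/kernel.debug.STAMP and prints the new path.
# STAMP defaults to today's date as YYYYMMDD (an assumed format).
save_debug_kernel() {
    src=$1
    destdir=$2
    stamp=${3:-$(date +%Y%m%d)}
    dest="$destdir/kernel.debug.$stamp"
    mkdir -p "$destdir" || return 1    # make sure the target directory exists
    cp -p "$src" "$dest" || return 1   # -p keeps the original timestamps
    echo "$dest"
}

# Example call; MYKERNEL is a placeholder for your own kernel's name:
#   save_debug_kernel /usr/src/sys/compile/MYKERNEL/kernel.debug /var/crash
```

Run it right after installing each new kernel, while the kernel.debug in the compile directory still matches the kernel you are about to boot.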

With any luck, you'll never need these debugging kernels, though personally, I've found my luck to be unreliable. Debugging kernels take little disk space and provide quick answers when trouble hits, so I strongly suggest using them.


Dumpon

Now it's time to tell the system where to write the core dump. This location is the dump device, or dumpdev. FreeBSD uses the swap partition as the dump device, which is why it has to be slightly larger than your physical memory. (You can use a UFS partition, but after the crash it won't be a usable UFS partition any more!)

You can get the device name from /etc/fstab. Look for a line with an FStype entry of swap; the first entry in that line is the physical device name. For example, on my laptop, the swap entry in /etc/fstab looks like this:

...............................................................................................

/dev/ad0s4b none swap sw 0 0

...............................................................................................

Tell the system to use a dump device with dumpon(8), which must be set each time the system boots. Of course, as you might guess, there's an rc.conf switch for this. My swap partition is /dev/ad0s4b, so I specify this as the dump device in /etc/rc.conf:

...............................................................................................

dumpdev="/dev/ad0s4b"

...............................................................................................
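If you manage more than a couple of machines, the lookup in /etc/fstab and the resulting rc.conf line can be generated rather than typed. Here is a sketch under a few assumptions: the function names swap_devices and dumpdev_line are made up for this example, the fstab columns are in the usual order (device, mount point, FStype, options, dump, pass), and when several swap partitions exist the first one listed is used.

```shell
#!/bin/sh
# Sketch: derive the rc.conf dumpdev line from an fstab file.
# Both function names are hypothetical, chosen for this example.

# swap_devices FSTAB
# Prints the device column of every entry whose FStype column is "swap",
# skipping comment lines.
swap_devices() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "$1"
}

# dumpdev_line FSTAB
# Prints an rc.conf line for the first swap device, or fails if none exists.
dumpdev_line() {
    dev=$(swap_devices "$1" | head -n 1)
    [ -n "$dev" ] || return 1
    printf 'dumpdev="%s"\n' "$dev"
}

# Example call:
#   dumpdev_line /etc/fstab
```

With the laptop fstab entry shown earlier, dumpdev_line prints the same dumpdev line I put in /etc/rc.conf by hand. Check the output before appending it to rc.conf, particularly on a machine with several swap partitions of different sizes.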

Savecore

Next, tell your system where to save the dump after the reboot using savecore(8). You can change the default, /var/crash, with rc.conf's dumpdir setting. (This directory must exist; savecore will not create it!)
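Because savecore will not create the directory for you, it's worth making sure it exists before you ever need it. A minimal sketch follows; the function name ensure_dumpdir is hypothetical, and restricting the directory to mode 700 is my own choice rather than something savecore requires, made because a kernel dump contains everything that was in memory at the time of the crash.

```shell
#!/bin/sh
# Sketch: make sure the crash dump directory exists.
# ensure_dumpdir DIR
# Creates DIR if it is missing, then restricts it to its owner.
# The function name and the 700 mode are choices made for this example.
ensure_dumpdir() {
    dir=$1
    [ -d "$dir" ] || mkdir -p "$dir" || return 1
    chmod 700 "$dir"
}

# Example call, using the default location named in the text:
#   ensure_dumpdir /var/crash
```

If you change dumpdir in rc.conf, run the same check against the new path.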

As you become more experienced in saving panics, you may find that you need to adjust the core-saving behavior. Read savecore(8), and set any appropriate options in savecore_flags in /etc/rc.conf. One popular flag is -z, which compresses the core file and can save some disk space. Savecore is now smart enough to automatically eliminate unused memory from the dump, which can save a lot of room.

Upon a Crash

If you're in front of your computer when it crashes, you'll see the panic message. If the system is set to reboot automatically, numbers will start to flow by, counting the megs of memory being dumped. Finally, the computer will reboot, fsck will run, and you can watch savecore copy the memory dump from swap to a file.

If your system doesn't reboot automatically, you'll need to enter two commands at the debugger prompt after the panic: first panic, to sync the disks, and then continue, to start the reboot process. (FreeBSD supports many other debugging options, but you have to know how to use the kernel debugger to make use of them.)

Dumps and Bad Kernels

Some kernels just crash and die during boot, or won't stay up long enough to fix a problem. In that case, you need to boot with a different kernel.
