Linux+ Certification Bible.pdf

Chapter 16 Linux Troubleshooting Basics

Linux administrators really earn their pay when troubleshooting the Linux system. If a system process or application halts, and the user is unable to perform his or her job, the problem must be fixed as soon as possible. Even minor problems can quickly fill up an administrator’s time if they aren’t dealt with efficiently and correctly.

Troubleshooting is an art, but if you use the proper methodology and best practices, your troubleshooting attempts can be accurate and efficient, too. For example, you don’t do yourself any favors by replacing several hardware parts if the actual problem is software-related. Using a proper step-by-step methodology will help you avoid wasting your time on such procedures.

This chapter covers the basics of troubleshooting and focuses on step-by-step methods that you can use to solve a problem efficiently. It emphasizes the importance of log files as a troubleshooting resource, along with techniques for stopping and starting processes and modifying their configuration to aid troubleshooting. A wide variety of command-line tools can help when you need to examine your processes and log files carefully.

Finally, this chapter includes a section devoted to troubleshooting resources that you can refer to for help when fixing a problem.

Identifying the Problem

6.1 Identify and locate the problem by determining whether the problem is hardware, operating system, application software, configuration, or the user

When troubleshooting, stick to a step-by-step method to determine a solution to your problem. This way, you avoid making common mistakes, such as ignoring the obvious or following a path of examination that directs you away from the cause of the problem.

Methodology and Best Practices

6.2 Describe troubleshooting best practices (i.e., methodology)

This section provides a sample step-by-step process to follow when troubleshooting a problem. It is a good general overview of how you should examine a problem. Sometimes, however, extended downtime for a close examination is not an option. With some luck and quick thinking, though, you can arrive at a solution as quickly as you would by working through every step.
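When downtime is short, some routine fact-gathering can be scripted so that the first facts arrive quickly. The following Python sketch is an illustration only and is not part of the original text. It reports how full a filesystem is and pulls lines containing warning or error keywords from a log file. The function names, the keyword list, and the path /var/log/messages are assumptions chosen for this example; log locations and message formats differ between distributions.

```python
import shutil

# Hypothetical keyword list; adjust it for the log format in use.
KEYWORDS = ("error", "warning", "fail")


def disk_usage_percent(path="/"):
    """Return the percentage of space used on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword, ignoring case."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line.lower() for keyword in keywords)
    ]


def scan_log(log_path, keywords=KEYWORDS):
    """Scan one log file; return an empty list if it cannot be read."""
    try:
        with open(log_path, errors="replace") as handle:
            return suspicious_lines(handle, keywords)
    except OSError:
        return []


if __name__ == "__main__":
    print(f"Disk used on /: {disk_usage_percent('/'):.1f}%")
    # /var/log/messages is an assumed location; some systems use /var/log/syslog.
    for line in scan_log("/var/log/messages")[-20:]:
        print(line)
```

A script like this only speeds up fact-gathering. It does not replace the methodical examination, and reading the system log usually requires root privileges.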


Part VI Troubleshooting and Maintaining System Hardware

It is much too easy to overlook the most obvious things when troubleshooting. Always start with the simplest things. (Is the machine plugged in?)

1. Examine the symptoms: Take the time to get all the facts when the first signs of the problem are reported. Explore the following questions: Is this happening to one user or is it happening to everyone? Does the problem only happen on one particular system? Does it happen in an application, or is this a system process problem? By gathering as many facts as possible, you can get started in the right direction.

2. Examine the obvious: The seemingly most difficult problems often have a simple source. Don’t overlook the obvious! Even simple things, such as loose power cords or network cables, malfunctioning fans, or an engaged Caps Lock key, can cause larger problems than you may think. On the software side, make sure that the user knows how to use a particular program. Does the system have enough disk space? Is this a simple permissions problem? By checking the obvious problems first, you can quickly move to more in-depth examinations of the systems that you are checking.

3. Work your way from the simple to the complex: Always troubleshoot from the simplest systems toward the more complex ones. For example, if the problem is reported at a user’s system, start troubleshooting from the user’s system, and then work your way up the chain from the network to the server. By using this methodical practice, you can eliminate the simplest and most obvious systems first.

4. Hardware or software: You should also quickly narrow down whether the problem is hardware- or software-related. You will waste a great deal of time and money by swapping and replacing hardware parts if the source of the problem is actually software-related (and vice versa). Make sure that all of the hardware is operating normally, and that no warning lights, strange sounds, or smells are emanating from any mechanical or electrical components. On the software side, take the time to recreate the problem with the same system. Try the same thing on another person’s workstation to attempt to recreate the problem, and then narrow it down to the server or a workstation.

5. OS or application: After you have determined that the problem is software-related, you must again narrow the issue down to either an operating system or application issue. If it is an operating system issue, something within the system itself is causing the problem, such as incompatible versions or conflicting programming libraries. You can easily test application problems by trying to recreate the problem on another machine with the same application.

6. Examine log files: Check all log files for the operating system and applications. Examine the system log file for any warnings or error messages, and check the application logs for malfunctions.

7. Examine configuration: If you have narrowed down the problem to a specific process or application, examine its configuration files to ensure that they have been set up properly. Compare them to configuration files on other servers, and ensure that they don’t contain any errors. If you make a change to a