A Case for Centralized Logging
Dec 7, 2001
There are three main reasons why every system administrator will eventually need event logging: troubleshooting, resource tracking (proactive system administration), and security. Unfortunately, unless log data is collected centrally for monitoring and storage, it is nearly impossible to manage. Harnessing this resource is one of the most important issues facing an IT department, and how well the department accomplishes it is a true indicator of its success.
Each of the three key reasons for logging is discussed below in greater detail, along with its key points of importance. While each reason might be managed on its own without the benefit of a centralized solution, the intent here is to demonstrate that it is impossible to achieve all three crucially important components by any other means. Once it is shown that the best logical solution for each component is a centralized one, it should become evident that the best overall solution is centralized as well.
Troubleshooting
The event logs are usually the best source of information for determining whether a system or network is experiencing problems. Critical events, such as a disk drive or swap space filling to capacity, or the failure of a power supply or other necessary piece of equipment, are noted in the event logs (and, we hope, with an immediate on-screen error message as well). Less serious events, such as the failure of a driver to load, or the detection of an IP address conflict, are also logged. Informational events, such as a user logging on, or a change in the configuration of an application, can be noted as well. And for most drivers and background processes, the event logs are the only facility available for reporting diagnostic information.
While local logging mechanisms can store this information for your computers, that is not the case for your network equipment. The only way to solve this is to configure a system dedicated to receiving and storing all the log data from these remote devices. This is termed remote logging. Designating an existing computer system to perform extra duty as a remote log server can place a tremendous additional load on that system. Additionally, a system that receives and stores information as sensitive as network log data is usually required to be more secure than the standard systems on a company's local area network. For these reasons, it is quite common to designate a dedicated system for this particular duty... yes, a log server. This system then assists in troubleshooting by allowing the log data of the suspected equipment to be parsed in one place.
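As a minimal sketch of remote logging, the Python fragment below forwards log records to a log server using the standard library's syslog handler over UDP. The function name make_remote_logger, the logger name, and the server address are assumptions made for illustration; a real deployment would substitute the address of its own dedicated log server.

```python
import logging
import logging.handlers

# Placeholder address for the dedicated log server (an assumption for this sketch).
LOG_SERVER = ("127.0.0.1", 514)


def make_remote_logger(name, host, port=514):
    """Return a logger that forwards its records to a syslog server over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler uses UDP by default when given a (host, port) address.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_remote_logger("router-monitor", *LOG_SERVER)
    log.warning("IP address conflict detected on interface eth0")
```

UDP delivery is not guaranteed, so a sketch like this complements local logging rather than replacing it.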
Here are some key benefits of centralized logging for troubleshooting:
Having the answers to "why" quickly and accurately... for your boss, the president, the board of directors, or the television crew outside your lobby if you're a large e-commerce company. With all your logs in one location, you can quickly access them and find the trouble.
Troubleshooting while a system is down and unable to tell you what happened. (If you have remote copies of all your system logs, you can see exactly what has been going on with that system. If the system cannot reboot, having a copy of its logs elsewhere keeps troubleshooting time to a minimum.)
Prevention of recurrence. (How many times has a DBA changed kernel parameters that now cause the system to crash? Only, you don't know that until you recover the system data and get the system back up. But before you can change the kernel parameters back, down goes the system again, and you lose more data because of another system panic.)
Reduced risk of losing log information. (Since all the logs are copied to a central repository, a crash on the originating system cannot corrupt the copied set of log files.)
Resource Tracking - Proactive System Administration
The event logs may also contain important information on the capacity and usage of system resources. Any type of system metric that can change over time is fair game to be reported and logged. Such metrics include the frequency of users logging on and off, the duration of the use of specific applications, the amount of available disk space crossing a threshold, and print spooler activity.
It is only practical to use the event logs for tracking resource metrics that change infrequently. Metrics that must be collected at a very fast sampling rate to be useful (e.g., CPU and memory usage) are not good candidates for storage in the event logs, which would quickly fill to capacity with such information. Instead, tools such as perfmon, NT's Task Manager, and performance monitors from third-party vendors should be used to show these types of metrics in a "close to real-time" display.
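The distinction between infrequent events and fast-sampled metrics can be sketched in Python as follows. The monitor below may be sampled as often as desired, but it writes a log entry only when usage crosses the threshold, so the log records the change rather than every reading. The class name ThresholdMonitor, the helper disk_used_pct, and the 90% figure are assumptions chosen for illustration.

```python
import logging
import shutil

logger = logging.getLogger("resource-tracker")


class ThresholdMonitor:
    """Log an event only when a metric crosses a threshold, not on every sample."""

    def __init__(self, name, threshold_pct):
        self.name = name
        self.threshold_pct = threshold_pct
        self.above = False  # state seen at the previous sample

    def sample(self, used_pct):
        """Record one sample; return the message logged, or None if nothing changed."""
        now_above = used_pct >= self.threshold_pct
        if now_above == self.above:
            return None  # no crossing, so nothing is written to the log
        self.above = now_above
        if now_above:
            msg = f"{self.name} usage {used_pct:.0f}% crossed above {self.threshold_pct}%"
            logger.warning(msg)
        else:
            msg = f"{self.name} usage {used_pct:.0f}% dropped below {self.threshold_pct}%"
            logger.info(msg)
        return msg


def disk_used_pct(path="/"):
    """Return the percentage of disk space in use on the volume holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    monitor = ThresholdMonitor("disk /", 90)  # 90% is an arbitrary example value
    monitor.sample(disk_used_pct("/"))
```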
Here are some key benefits of centralized logging for resource tracking:
Proactive aid for system tuning... disk usage, memory usage, maximum number of users, DB resources (number of processes), etc. (All these logged errors and warnings will help you tune your systems before disaster strikes.)
Discovery of Hardware Problems... Soft ECC memory errors, soft disk errors, trap pointers to CPU panics. Many times these errors start being reported days, weeks, or months before a system crashes. All these messages must be saved to help the computer manufacturer diagnose the hardware failure. Without them, you're dead in the water. Start surfing dice.com for a new job.
Discovery of Software Problems... Preventing System/Application crashes ahead of time. OS and system utilities are all designed to log errors and warnings to system logs. Most commercial applications report errors and warnings to the standard system logging mechanism. All custom software written by your employees should report to system logs as well.
Resource Tracking prior to failure - cost benefits...
How much money is saved fixing something before it breaks?
(Consider the cost of Scheduled Outages vs. unscheduled down-time. Field Service charges outside standard M-F/8-5.)
How much money are you saving not having to recover from a system crash?
(Have you lost data? How many users are idle until the system is repaired?)
How much money is saved avoiding a network outage?
(How many users are also down because of the network?)
Improvement of Design... Centralized logging can provide clues for making things better. (It can also be an added layer of cost justification for improvement by reporting evidence of issues).
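To illustrate how centrally stored logs can surface hardware trouble before a crash, here is a rough Python sketch that counts soft-error messages per host. It assumes traditional syslog-style lines of the form "Mon DD HH:MM:SS host tag: message"; the marker strings and the function names are hypothetical, since real messages vary by vendor and operating system.

```python
from collections import Counter

# Hypothetical substrings; real soft-error messages differ between vendors.
SOFT_ERROR_MARKERS = ("soft error", "ecc error", "corrected")


def soft_error_counts(lines, markers=SOFT_ERROR_MARKERS):
    """Count lines per host whose message contains any soft-error marker."""
    counts = Counter()
    for line in lines:
        # Assumed layout: month, day, time, host, then the rest of the message.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # skip lines that do not fit the assumed layout
        host, message = fields[3], fields[4].lower()
        if any(marker in message for marker in markers):
            counts[host] += 1
    return counts


def hosts_needing_attention(lines, minimum=3):
    """Return hosts that reported at least `minimum` soft errors, sorted by name."""
    counts = soft_error_counts(lines)
    return sorted(host for host, n in counts.items() if n >= minimum)
```

Run against the central repository on a schedule, a report like this gives the administrator a list of machines to investigate while the errors are still recoverable.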
Security
Event logging is a very important part of computer system security. Try as we might to make our systems secure, it is impossible to certify or guarantee that any computer system is 100% secure. There will always be security flaws that can be exploited, and unfortunately the greatest risks to security are the human users themselves. If illegal entry and access of a computer cannot be completely prevented, such problems at least need to be recorded and tracked in an audit log for the purpose of revealing the security flaws in our systems and possibly identifying those (humans or computers) that have exploited these flaws. The review of system logs for signs of problems, or potential problems, is referred to generically as auditing.
Auditing is the policy and procedure of recording pre-selected system events, and the use of the recorded events to discover and resolve security problems in the system. Under NT, auditing is performed automatically by the operating system. On UNIX and Linux variants, similar authentication reporting can be configured through the syslog "auth" facility. Auditing policies are controlled by the system administrator, who determines which events will be reported and logged. You can think of auditing as a preemptive form of troubleshooting.
When you hear or read about a security expert solving a mysterious problem on a computer system it almost always turns out that a significant part of the solution involved looking through the event logs, even if it's only to say that "we looked through the system logs and didn't find any reports of unusual activity or events."
Information logging is of utmost importance in a properly secured setting. However, logs by themselves offer the admin little if the files aren't being reviewed. Admins must recognize log files as tools and must learn to use them effectively by scanning through them regularly. Unfortunately, the commonly overworked admin often uses the logs only in a reactive way, because of a lack of time and the sheer size of the logs. Scanning through megabytes of text each day, most of which looks mind-numbingly similar, is not conducive to the exacting scrutiny that a secure machine needs. Fortunately, by using a tool such as eBuzzSaw(1), the admin can automate log parsing and begin to use the logs in a proactive manner, thereby reducing the lead time, or head start, that is one of the bad guys' best advantages. Remember that although we may have in place the most advanced firewall devices and IDS software, new vulnerabilities are discovered at such a rate that if we're not regularly monitoring our logs, we ourselves become the weak link in the security fence.
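The idea of automated log parsing can be sketched in a few lines of Python. This is a generic illustration, not a description of how eBuzzSaw works: the watch list, its labels, and the function name scan are all assumptions, and a real watch list would be tuned to the messages your own systems produce.

```python
import re

# Hypothetical watch list: each entry pairs a pattern with a short label.
WATCH_PATTERNS = [
    (re.compile(r"authentication failure|failed password", re.I), "auth-failure"),
    (re.compile(r"file system full|no space left", re.I), "disk-full"),
    (re.compile(r"panic|segfault", re.I), "crash"),
]


def scan(lines, patterns=WATCH_PATTERNS):
    """Yield (label, line) for every line that matches a watch pattern."""
    for line in lines:
        for regex, label in patterns:
            if regex.search(line):
                yield label, line.rstrip("\n")
                break  # report each line once, under the first matching label
```

Because scan accepts any iterable of lines, it can be fed an open log file directly, which reduces megabytes of similar-looking text to the handful of lines that deserve a human's attention.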
Three important points to consider are the following:
Logging is a primary part of IT Security... (Intrusion Detection). It is the part most often overlooked, yet security experts consider it your first step of protection.
Cheapest line of defense for a company... all systems are already capable of reporting. There is no reason for the sysadmin group not to have implemented this already in one's corporate environment.
Monitoring authentication logs is the quickest and most accurate way of discovering "brute-force" attacks. All other alternatives can give false-positive alerts - centralized logging can't!
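The point about brute-force attacks can be illustrated with a short Python sketch that counts failed logins per source address. It assumes OpenSSH-style "Failed password for ... from ..." messages; the function name brute_force_suspects and the threshold of five failures are arbitrary choices for this example.

```python
import re
from collections import Counter

# Assumes OpenSSH-style messages; other services word their failures differently.
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def brute_force_suspects(lines, threshold=5):
    """Return {source_address: failure_count} for sources at or above threshold."""
    failures = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(2)] += 1  # group 2 is the source address
    return {src: n for src, n in failures.items() if n >= threshold}
```

When the authentication logs of every system arrive at one server, the same count also exposes an attacker who spreads a few attempts across many machines, something no single host's log would reveal.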
A brief note on C2...
Because event logging provides an auditing mechanism, it's a key component of "C2-level security." The U.S. federal government's National Computer Security Center (NCSC) conducts evaluations of the security of computer systems. C2 is a particular level of trust at which a system can be evaluated and rated. Auditing is a key requirement of the C2 level.
Many people today are implementing Intrusion Detection Systems, which is a good thing and a valuable complement to any firewall or perimeter security. The problem is that many people rely on these intrusion detection systems instead of, rather than in addition to, effectively and proactively managing logging and log files (i.e., a centralized logging solution). In fact, the FBI ranks "Non-existent or Incomplete Logging" as the 6th highest priority on its list of the top 20 vulnerabilities. Its answer for determining whether a company is vulnerable is: "Review the system logs for each major system. ...if they are not centrally stored and backed-up, you are vulnerable." Here's the complete description of this vulnerability (No. G-6) from the "FBI/SANS Top 20 List" document:
G6 - Non-existent or incomplete logging
One of the maxims of security is, "Prevention is ideal, but detection is a must." As long as you allow traffic to flow between your network and the Internet, the opportunity for an attacker to sneak in and penetrate the network, is there. New vulnerabilities are discovered every week, and there are very few ways to defend yourself against an attacker using a new vulnerability. Once you are attacked, without logs, you have little chance of discovering what the attackers did. Without that knowledge, your organization must choose between completely reloading the operating system from original media, and then hoping the data back-ups were OK, or taking the risk that you are running a system that a hacker still controls.
You cannot detect an attack if you do not know what is occurring on your network. Logs provide the details of what is occurring, what systems are being attacked, and what systems have been compromised.
Logging must be done on a regular basis on all key systems, and logs should be archived and backed up because you never know when you might need them. Most experts recommend sending all of your logs to a central log server that writes the data to a write once media, so that the attacker cannot overwrite the logs and avoid detection.
G6.2 Systems impacted:
All operating systems and network devices.
G6.5 How to protect against it:
Set up all systems to log information locally, and to send the log files to a remote system. This provides redundancy and an extra layer of security. Now the two logs can be compared against one another. Any differences could indicate suspicious activity on the system. In addition, this allows cross checking of log files. One line in a log file on a single server may not be suspicious, but the same entry on 50 servers across an organization within a minute of each other, may be a sign of a major problem.
Wherever possible, send logging information to a device that uses write-once media.
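The two checks described in the quoted recommendation, comparing a host's local log against its central copy and spotting the same entry on many servers within a minute, might be sketched in Python as follows. The record layout (timestamp in seconds, host, message) and both function names are assumptions made for this example; the defaults of 50 hosts and 60 seconds simply mirror the figures in the quoted text.

```python
from collections import defaultdict


def missing_from_local(local_lines, central_lines):
    """Return central-copy lines that no longer appear in the host's local log."""
    local = set(local_lines)
    return [line for line in central_lines if line not in local]


def widespread_events(records, min_hosts=50, window=60):
    """Find messages seen on at least min_hosts hosts within window seconds.

    records is an iterable of (timestamp_seconds, host, message) tuples.
    Returns {message: number_of_hosts_in_the_first_qualifying_window}.
    """
    by_message = defaultdict(list)
    for ts, host, message in records:
        by_message[message].append((ts, host))

    flagged = {}
    for message, seen in by_message.items():
        seen.sort()
        start = 0
        for end in range(len(seen)):
            # Slide the window start forward until it spans at most `window` seconds.
            while seen[end][0] - seen[start][0] > window:
                start += 1
            hosts = {host for _, host in seen[start:end + 1]}
            if len(hosts) >= min_hosts:
                flagged[message] = len(hosts)
                break
    return flagged
```

A non-empty result from missing_from_local suggests that entries were removed from the host after they had been forwarded, which is exactly the discrepancy the recommendation says to look for.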
Finally, always remember, layers of security are the most effective methods to date to stop intrusions. Centralized logging is one of those layers. Configure all your systems and network devices to send logging events to a central log server as described above.
For the three main reasons why logging is necessary (troubleshooting, resource tracking, and security), a good centralized logging solution is essential. Whether you attempt to develop your own proprietary system or purchase one commercially is beyond the scope of this paper. Much effort is currently devoted to the various pieces required to make up the entire mechanism of centralized logging, from the front-end "monitoring/alerting client" to the back-end "database and archiving server". Only one turn-key solution is currently available: eBuzzSaw(1) from Sentry+. It's about time someone took all the pieces of the puzzle and bundled them into one complete package.
Considering the monumental task of developing your own proprietary solution and making each part of the mechanism work equally well, buying a polished off-the-shelf product seems like a no-brainer, especially when considering that support is included. One of the amazing features of eBuzzSaw(1) is that it seems to be equally handy at troubleshooting, resource tracking, and security. In any case, the issue should no longer be "why do we need centralized logging" but "why don't we have it!"
Morton, Matt, Logging and critical logs files:...SANS Institute, Dec. 9, 2000.
Shaul, Matthew, Using Swatch to Utilize Your Logs, May 7, 2001.
Murray, James D., Windows NT Event Logging, O'Reilly & Associates, Inc., Sept. 1998.
The Twenty Most Critical Internet Security Vulnerabilities (Updated) The Experts' Consensus, http://www.sans.org/top20.htm
(1) eBuzzSaw, Product of Sentry Plus, http://eBuzzSaw.com
© 2001 - 2002 Sentry+ inc. All rights reserved. v1.0