Handbook of Local Area Networks, 1998 Edition:LAN Basics
1-6 Designing Power Distribution Systems for Fault-Tolerant Networks
DAVID FENCL
What does it mean to build a fault-tolerant network? Fault tolerance is usually taken to mean redundancy, and it is true that redundancy is an important part of any fault-tolerant system plan. But, according to an old parable, a wise man builds his house upon rock, while a foolish man builds his house upon sand. The network administrator who looks only at redundancy as a measure of fault tolerance may be like someone who builds two houses without considering whether the foundation is like rock or like sand. The building power distribution and grounding system is a significant part of the foundation for the network. Its importance is often overlooked, but a marginal power distribution network is often the major cause of unexplained system crashes.
Those who have made a study of design techniques that enhance reliability may consider attending to the building power distribution network, and enhancing it with power conditioning devices, as a fault avoidance technique. Installation of back-up power systems (i.e., uninterruptible power supplies [UPSs]) might be considered a fault tolerance technique because it introduces an element of redundancy in the power path. Whatever the label, both solutions make positive contributions to overall system reliability.
Deploying a UPS is certainly a good start toward improving a network's fault tolerance. Without a UPS, if power stops, even for a fraction of a second, many devices on the network will lock up or reset. The effect can range from slight inconvenience to business-stopping catastrophe. The severity depends on the role of the network in the business process and the characteristics of the devices affected. Today, most microcomputer networks use a UPS to protect the data on the file server.
Reliability expectations increase, however, as LANs become necessary to the minute-by-minute activities of a business. More LAN administrators, systems planners, and support professionals are learning by experience that a UPS by itself does not always prevent system-crashing abends caused by spurious hardware interrupts. These intermittent faults occur often because not all UPS architectures isolate the server from transient anomalies in the electrical environment.
Environmental concerns are not new to computing. Every mainframe and minicomputer vendor has requirements for the system's physical environment: in particular, cool ambient temperatures, dust-free air, and system power that is isolated from other building loads and supplied through distribution and grounding schemes that isolate computer circuitry from lightning-induced surges and high-frequency electrical interference.
For the most part, the independent network integrators that install and support most LANs have not had as much experience with the environmental disciplines as have vendors of traditional large computing systems. For this reason, the link between environmental factors and LAN device performance is often overlooked. This chapter is intended as a reference for good environmental practices translated for high-performance, high-reliability distributed computing systems.
Managing fault tolerance in a distributed systems environment is actually harder than in a conventional big-iron environment. Networks are at a disadvantage because they are more complex, the electronics operate in much harsher environments, and staffing and staff training are less thorough. The situation is summarized in Exhibit 1-6-1.
Exhibit 1-6-1. Support Comparison of Mainframe/Midrange Systems and LANs
If the goal of building a fault-tolerant system is to achieve higher systems reliability through lower failure rates, all the components of the system should be designed with the goal of reducing the overall number of failures. Such a comprehensive approach includes concerns for the distributed operating environments, system administration, network management tools, device redundancy, disaster recovery, and contingency planning.
LAN SYSTEMS RELIABILITY NEEDS ASSESSMENTS
How much is system reliability worth? In one survey, 35% of respondents indicated that downtime costs exceeded $10,000 per hour, 42% said downtime cost their operations as much as $1,000 per hour, and 23% did not have any idea what downtime cost their organizations.
A standard model would be useful in assessing the potential value of reliability enhancement techniques. Such a model would consider the cost of downtime as well as the risk of downtime. The following is a simple equation for modeling the value of reliability:
Reliability Value = Cost of Downtime × System MTBF × Site Risk Probability
where
Cost of Downtime = (System Time Value × Mean Time to Repair) + Cost to Repair
MTBF = Mean Time between Failures
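As a rough illustration, the model above can be written as two small functions. This is only a sketch of the equation as stated: the function and parameter names are hypothetical, the chapter does not specify units (the example assumes dollars per hour, hours, and dollars), and every input figure is invented for illustration.

```python
def cost_of_downtime(system_time_value, mean_time_to_repair, cost_to_repair):
    """Cost of Downtime = (System Time Value x Mean Time to Repair) + Cost to Repair.

    Assumed units (not given in the text): dollars per hour, hours, dollars.
    """
    return system_time_value * mean_time_to_repair + cost_to_repair


def reliability_value(downtime_cost, system_mtbf, site_risk_probability):
    """Reliability Value = Cost of Downtime x System MTBF x Site Risk Probability.

    The range check is an assumption: the text calls the last term a
    probability, so it is treated here as a number between 0 and 1.
    """
    if not 0.0 <= site_risk_probability <= 1.0:
        raise ValueError("site_risk_probability must be between 0 and 1")
    return downtime_cost * system_mtbf * site_risk_probability


# Hypothetical inputs, chosen only to exercise the formula.
downtime = cost_of_downtime(
    system_time_value=1000.0,   # dollars per hour of lost work
    mean_time_to_repair=4.0,    # hours
    cost_to_repair=500.0,       # dollars
)
value = reliability_value(downtime, system_mtbf=2000.0, site_risk_probability=0.01)
print(f"Cost of downtime: {downtime:.2f}")
print(f"Reliability value: {value:.2f}")
```

Keeping the two terms as separate functions makes it easy to substitute a different estimate of the cost of downtime without touching the outer model.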
Understanding the Time Value of a System
One of the first steps in modeling the value of reliability is to assess the function of the planned network or network segment. The network manager must understand the time value of the work group served by the system and whether the system is part of the critical path of a larger process. If the latter is the case, the time value of the system segment has to include the value of the total business effort served by the system's users.
Determining the Direct Costs of Downtime
The Time Value of Users
One method of determining the time value of users is to sum the burdened payroll costs of all system users. This could be refined by multiplying each user's payroll rate by an estimate of that user's system utilization rate. For example, users involved in continuous data entry activities would have a 100% system utilization rate: all the time they are working, they are working on the system. A user who uses the system for only one hour each standard eight-hour work day would have a system utilization rate of only 12.5% (1/8).
This method discounts the time value of users who could shift their systems tasks into other time periods, presumably after the system has recovered from whatever fault has caused the downtime. This payroll cost valuation method might make sense for systems that are important but are not in the critical path. For systems in the critical path, a valuation method based on work group output makes more sense.
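The payroll-based method described above can be sketched in a few lines. The names `User`, `utilization_rate`, and `system_time_value` are hypothetical, as are the hourly rates; only the 100% and 12.5% utilization figures come from the text, and the eight-hour work day is the one implied by the 1/8 example.

```python
from dataclasses import dataclass


@dataclass
class User:
    burdened_hourly_rate: float  # payroll cost per hour, including overhead
    utilization: float           # fraction of working time spent on the system


def utilization_rate(hours_on_system, hours_per_work_day=8.0):
    """Fraction of the work day a user spends on the system."""
    return hours_on_system / hours_per_work_day


def system_time_value(users):
    """Sum each user's payroll rate weighted by system utilization."""
    return sum(u.burdened_hourly_rate * u.utilization for u in users)


# Hypothetical work group: one continuous data entry user and one
# occasional user who is on the system one hour per eight-hour day.
work_group = [
    User(burdened_hourly_rate=40.0, utilization=utilization_rate(8.0)),
    User(burdened_hourly_rate=80.0, utilization=utilization_rate(1.0)),
]
print(f"System time value per hour: {system_time_value(work_group):.2f}")
```

Under this weighting, the occasional user contributes only one-eighth of their payroll rate, which reflects the assumption that such users can shift their system work to another time.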
Time Value for Critical Path Systems
A critical path system is one in which a fault affects not only the immediate users of the system but also the total output of the business effort that depends on those direct users.
Copyright (c) 1996-1999 EarthWeb, Inc.. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of EarthWeb is prohibited.