Handbook of Local Area Networks, 1998 Edition: LAN Basics

1-6 Designing Power Distribution Systems for Fault-Tolerant Networks

DAVID FENCL

What does it mean to build a fault-tolerant network? Fault tolerance is usually taken to mean redundancy, and it is true that redundancy is an important part of any fault-tolerant system plan. But, according to an old parable, “a wise man builds his house upon rock, a foolish man builds his house upon sand.” The network administrator who looks only at redundancy as a measure of fault tolerance may be like someone who builds two houses without considering whether the foundation is like rock or like sand.

The building power distribution and grounding system is a significant part of the foundation for the network. Its importance is often overlooked, but a marginal power distribution network is often the major cause of unexplained system crashes.

Those who have made a study of design techniques that enhance reliability may consider attending to the building power distribution network, and enhancing it with power conditioning devices, as a fault avoidance technique. Installation of back-up power systems (i.e., uninterruptible power supplies [UPSs]) might be considered a fault tolerance technique because it introduces an element of redundancy in the power path. Whatever the label, both solutions make positive contributions to overall system reliability.
Deploying a UPS is certainly a good start toward improving a network’s fault tolerance. Without a UPS, if power stops, even for a fraction of a second, many devices on the network will lock up or reset. The effect can range from slight inconvenience to business-stopping catastrophe. The severity depends on the role of the network in the business process and the characteristics of the devices affected.

Today, most microcomputer networks use a UPS to protect the data on the file server. Reliability expectations increase, however, as LANs become necessary to the minute-by-minute activities of a business. More LAN administrators, systems planners, and support professionals are learning by experience that a UPS by itself does not always prevent system-crashing abends caused by spurious hardware interrupts. These intermittent faults occur often because not all UPS architectures isolate the server from transient anomalies in the electrical environment.

Environmental concerns are not new to computing. Every mainframe and minicomputer vendor has requirements for the system’s physical environment: in particular, cool ambient temperatures, dust-free air, and system power that is isolated from other building loads and supplied through distribution and grounding schemes that isolate computer circuitry from lightning-induced surges and high-frequency electrical interference. For the most part, the independent network integrators that install and support most LANs have not had as much experience with the environmental disciplines as have vendors of traditional large computing systems. For this reason, the link between environmental factors and LAN device performance is often overlooked.

This chapter is intended as a reference for good environmental practices translated for high-performance, high-reliability distributed computing systems. Managing fault tolerance in a distributed systems environment is actually harder than in a conventional “big iron” environment.
Networks are at a disadvantage because they are more complex, the electronics operate in much harsher environments, and staffing and staff training are less thorough. The situation is summarized in Exhibit 1-6-1.

Exhibit 1-6-1. Support Comparison of Mainframe Midrange Systems and LANs

If the goal of building a fault-tolerant system is to achieve higher systems reliability through lower failure rates, all the components of the system should be designed with the goal of reducing the overall number of failures. Such a comprehensive approach includes concerns for the distributed operating environments, system administration, network management tools, device redundancy, disaster recovery, and contingency planning.

LAN SYSTEMS RELIABILITY NEEDS ASSESSMENTS

How much is system reliability worth? In one survey, 35% of respondents indicated that downtime costs exceeded $10,000 per hour, 42% said downtime cost their operations as much as $1,000 per hour, and 23% did not have any idea what downtime cost their organizations.

A standard model would be useful in assessing the potential value of reliability enhancement techniques. Such a model would consider the cost of downtime as well as the risk of downtime. The following is a simple equation for modeling the value of reliability:

Reliability Value = Cost of Downtime × System MTBF × Site Risk Probability

where

Cost of Downtime = (System Time Value × Mean Time to Repair) + Cost to Repair
MTBF = Mean Time between Failures

Understanding the Time Value of a System

One of the first steps in modeling the value of reliability is to assess the function of the planned network or network segment. The network manager must understand the time value of the work group served by the system and whether the system is part of the critical path of a larger process. If the latter is the case, the time value of the system segment has to include the value of the total business effort served by the system’s users.
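The reliability value equation given earlier in this section can be worked through in a few lines of code. The following is a minimal sketch that follows the formula exactly as printed, including the multiplication by MTBF. The function names, parameter names, and sample figures are hypothetical and chosen only for illustration; the chapter does not specify units, so consistent units (for example, dollars and hours) are assumed.

```python
def cost_of_downtime(system_time_value, mean_time_to_repair, cost_to_repair):
    """Cost of Downtime = (System Time Value x Mean Time to Repair) + Cost to Repair.

    Assumes system_time_value is a cost per unit time and mean_time_to_repair
    is expressed in the same time unit.
    """
    return system_time_value * mean_time_to_repair + cost_to_repair


def reliability_value(downtime_cost, system_mtbf, site_risk_probability):
    """Reliability Value as printed in the text:
    Cost of Downtime x System MTBF x Site Risk Probability.
    """
    return downtime_cost * system_mtbf * site_risk_probability


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    outage_cost = cost_of_downtime(
        system_time_value=1000.0,   # assumed cost per hour of lost system time
        mean_time_to_repair=4.0,    # assumed hours to restore service
        cost_to_repair=500.0,       # assumed parts and labor
    )
    print("Cost of downtime:", outage_cost)
    print("Reliability value:", reliability_value(outage_cost, 2.0, 0.5))
```

Keeping the two calculations in separate functions mirrors the text, where Cost of Downtime is defined on its own before it feeds the overall model.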
Determining the Direct Costs of Downtime

The Time Value of Users

One method of determining the time value of users is to sum the burdened payroll costs of all system users. This could be refined by multiplying each user’s payroll rate by an estimate of the user’s system utilization rate. For example, users involved in continuous data entry activities would have a 100% system utilization rate: all the time they are working, they are working on the system. A user who uses the system for only one hour each standard work day would have a system utilization rate of only 12.5% (1/8). This method discounts the time value of users who could shift their systems tasks into other time periods, presumably after the system has recovered from whatever fault has caused the downtime.

This payroll cost valuation method might make sense for systems that are important but are not in the critical path. For systems in the critical path, a valuation method based on work group output makes more sense.

Time Value for Critical Path Systems

A critical path system is one in which a system fault affects not only the immediate users of the system but also the total output of the business effort that depends on those users.
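The payroll-based method for valuing user time described above can be sketched as follows. This is an illustrative sketch only: the `User` class, its field names, the eight-hour work day constant, and the sample rates are assumptions made for the example, not figures from the chapter.

```python
from dataclasses import dataclass

WORKDAY_HOURS = 8.0  # assumed standard work day, matching the 1/8 example in the text


@dataclass
class User:
    burdened_hourly_rate: float     # burdened payroll cost per hour (hypothetical)
    system_hours_per_day: float     # hours per work day spent on the system


def utilization_rate(user):
    """Fraction of the work day the user spends on the system, capped at 100%."""
    return min(user.system_hours_per_day / WORKDAY_HOURS, 1.0)


def system_time_value(users):
    """Sum of each user's payroll rate weighted by system utilization rate."""
    return sum(u.burdened_hourly_rate * utilization_rate(u) for u in users)


if __name__ == "__main__":
    users = [
        User(burdened_hourly_rate=30.0, system_hours_per_day=8.0),  # continuous data entry
        User(burdened_hourly_rate=40.0, system_hours_per_day=1.0),  # one hour per day
    ]
    print("Hourly time value of users:", system_time_value(users))
```

The result is a cost per hour that could serve as the System Time Value input for a system that is not in the critical path; for critical path systems the text recommends valuing work group output instead.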
