
CHAPTER 13. ANALYTICAL SYSTEM ADMINISTRATION

vehicles, or their power efficiency), but when it comes down to it anyone can claim that those numbers do not matter because both vehicles fulfill their purpose identically.

This example is not entirely contrived. System administration requires tools. Often such tools acquire a following of users who grow to like them, regardless of what the tools allow them to achieve. Also, the marketing skills of one software producer might be better than those of another. Thus one cannot rely on counting the number of users of a specific tool as an indication of its power or usefulness. On the other hand, one has little choice but to rely on the evaluations of the tools by their users.

In some cases one technology might be better than another only in a certain context. There might be room for several different solutions. For example, are transistors better than thermionic valve devices for building computers? Most people think so, because valve technology is large and cumbersome. But Russian military aerospace programs developed miniature valves precisely because they were robust against electromagnetic pulse interference. One can think of many examples of technologies which have clear advantages, but whose superiority cannot be proved numerically, because it boils down to what people prefer to believe about them. This last case also indicates that there is not necessarily a single universal solution to a problem.

Although questionnaires and verbal evaluations which examine experienced users’ impressions can be amongst the best methods of evaluating a hypothesis with many interacting components, the problems in making such a study objective are great. Questionnaires, in particular, can give misleading results, since they are often returned only by users who are already basically satisfied. Completely dissatisfied users will usually not waste time filling out a questionnaire for what they consider to be a worthless pursuit.

13.5 Evaluating a hierarchical system

Evaluating a model of system administration is a little bit like evaluating the concept of a bridge. Clearly a bridge is a structure with many components each of which contributes to the whole. The bridge either fulfills its purpose in carrying traffic past obstacles or it does not. In evaluating the bridge, should one then consider the performance of each brick and wire individually? Should one consider the aesthetic qualities of the bridge? There might be many different designs each with slightly different goals. Can one bridge be deemed better than another on the basis of objective measurement? Perhaps only the bridge’s maintainer is in a position to gain a feeling for which bridge is the most successful, but the success criterion might be rather vague: a collection of small differences which make the perceptible performance of the bridge optimal, but with no measurably significant data to support the conclusion. These are the dilemmas of evaluating a complex technology.

In references [69, 334] and many others it is clear that computer scientists are embarrassed by this difficulty in bringing respectability to the field of study. In fact the difficulty is general to all fields of technology. In order to evaluate an approach to the solution of a problem it is helpful to create a model. A model comprises a principle of operation, a collection of rules and the implementation of these rules through specific algorithms. It involves a conceptual decomposition of the problem and a number of assertions or hypotheses. System administration is full of intangibles; this restricts model building to those aspects of the problem which can be addressed in schematic terms. It is also sufficiently complex that it must be addressed at several different levels in an approximately hierarchical fashion.

In brief, the options we have for performing experimental studies are,

Measurements

Simulations

Case studies

User surveys

with all of the attendant difficulties which these entail.

13.5.1 Evaluation of the conceptual decomposition

It is a general principle in analysis that the details of lower-level structure, insofar as they function, do not change the structural organization of higher levels. In physics this is called the separation of scales; in computer science it is called procedural structure or object orientation. The structure of lower levels does not, for example, affect the optimal structure of higher levels. An important part of a meaningful evaluation is to sort out the conceptual hierarchy. Is the separation between high-level abstractions and low-level primitives sufficient? Is it flexible, or overly restrictive?

13.5.2 Simplicity

Conceptual and practical simplicity are often deemed to be positive attributes of software systems and procedures. User surveys can be used to collect evidence of what users believe about this. The system designer’s belief about the relative simplicity of his/her creation is a scientific irrelevancy.

13.5.3 Efficiency

The efficiency of a program or procedure might be an interesting way to evaluate it. Efficiency can mean many things, so the first step is to establish precisely what is meant by efficiency in context.

Most system administration tasks are not resource intensive for individual hosts. The efficiency with which they are carried out is less important than the care with which they are carried out. The reason is simple: the time required to complete most system administration tasks is very short compared with the time most users are prepared to wait.

Efficiency in terms of the consumption of human time is a much more pertinent factor. An automatic system which aims to avoid human interaction is by definition more efficient in man-hours than one which places humans in the driving seat. This presupposes, of course, that the setup and maintenance of the automatic system is not so time-consuming in itself as to outweigh the advantages provided by such an approach.

13.5.4 Evaluation of system administration as a collective effort

Few system administrators work alone. In most cases they are part of a team, all of whom need to keep abreast of the behavior of the system and of changes made in administration policy. Automation of system administration does not alter this. One issue for human administrators is how well a model for administration allows them to achieve this cooperation in practice. Does the automatic system make it easier for them to follow the development of the system in i) theory and ii) practice? Here theory refers to the conceptual design of the system as a whole, and practice refers to the extent to which the theoretical design has been implemented. How is the task distributed between people, systems, procedures and tools? How is responsibility delegated and how does this affect individuals? Is time saved? Are accuracy and consistency improved? These issues can be evaluated in a heuristic way from the experiences of administrators. Longer-term, more objective studies could also be performed by analyzing the behavior of system administrators in action. Such studies will not be performed here.

13.5.5 Cooperative software: dependency

The fragile tower of components in any functional system is the fundament of its operation. If one component fails, how resilient is the remainder of the system to this failure? This is a relevant question to pose in the evaluation of a system administration model. How do software systems depend on one another for their operation? If one system fails, will this have a knock-on effect for other systems? What are the core systems which form the basis of system operation? In the present work it is relevant to ask how the model continues to work in the event of the failure of DNS, NFS and other network services which provide infrastructure. Is it possible to immobilize an automatic system administration model?

13.5.6 Evaluation of individual mechanisms

For individual pieces of software, it is sometimes possible to evaluate the efficiency and correctness of the components. Efficiency is a relative concept and, if used, it must be placed in a context. For example, efficiency of low-level algorithms is conceptually irrelevant to the higher levels of a program, but it might be practically relevant, i.e. one must say what is meant by efficiency before quoting results. The correctness of the results yielded by a mechanism/algorithm can be measured in relation to its design specifications. Without a clear mapping of input/output the correctness of any result produced by a mechanism is a heuristic quality. Heuristics can only be evaluated by experienced users expressing their informed opinions.

13.5.7 Evidence of bugs in the software

Occasionally bugs significantly affect the performance of software. Strictly speaking an evaluation of bugs is not part of the software evaluation itself, but of the process of software development, so while bugs should probably be mentioned they may or may not be relevant to the issues surrounding the software itself. In this work software bugs have not played any appreciable role in either the development or the effectiveness of the results so they will not be discussed in any detail.

13.5.8 Evidence of design faults

In the course of developing a program one occasionally discovers faults which are of a fundamental nature, faults which cause one to rethink the whole operation of the program. Sometimes these are fatal flaws, but that need not be the case. Cataloguing design faults is important for future reference to avoid making similar mistakes again. Design faults may be caused by faults in the model itself or merely in its implementation. Legacy issues might also be relevant here: how do outdated features or methods affect software by placing demands on onward compatibility, or by restricting optimal design or performance?

13.5.9 Evaluation of system policies

System administration does not exist without human attitudes, behaviors and policies. These three fit together inseparably. Policies are adjusted to fit behavioral patterns, and behavioral patterns are local phenomena. The evaluation of a system policy therefore has only limited relevance for the wider community: normally only relative changes are of interest, i.e. how changes in policy can move one closer to a desirable solution.

Evaluating the effectiveness of a policy in relation to the applicable social boundary conditions presents practical problems which sociologists have wrestled with for decades. The problems lie in obtaining statistically significant samples of data to support or refute the policy. Controlled experiments are not usually feasible since they would tie up resources over long periods. No one can afford this in practice. In order to test a policy in a real situation the best one can do is to rely on heuristic information from an experienced observer (in this case the system administrator). Only an experienced observer would be able to judge the value of a policy on the basis of incomplete data. Such information is difficult to trust however unless it comes from several independent sources. A better approach might be to test the policy with simulated data spanning the range from best to worst case. The advantage with simulated data is that the results are reproducible from those data and thus one has something concrete to show for the effort.


13.5.10 Reliability

Reliability cannot be measured until we define what we mean by it. One common definition uses the average (mean) time before failure as a measure of system reliability. This is quite simply the average amount of time we expect to elapse between serious failures of the system. Another way of expressing this is to use the average uptime, or the amount of time for which the system is responsive (waiting no more than a fixed length of time for a response). A complementary figure is then the average downtime, which is the average amount of time the system is unavailable for work (a kind of informational entropy). We can define the reliability as the probability that the system is available:

ρ = Mean uptime / Total elapsed time

Some like to define this in terms of the Mean Time Before Failure (MTBF) and the Mean Time To Repair (MTTR), i.e.

ρ = MTBF / (MTBF + MTTR).

This is clearly a number between 0 and 1. Many network device vendors quote this value by the number of 9’s it yields, e.g. 0.99999 (‘five nines’).
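As a quick illustration, the availability figure and its count of nines can be computed directly from MTBF and MTTR. This is a minimal sketch; the one-failure-per-year, five-minute-repair numbers are chosen purely for illustration and do not come from the text:

```python
import math

def availability(mtbf: float, mttr: float) -> float:
    """Reliability rho = MTBF / (MTBF + MTTR); a number between 0 and 1."""
    return mtbf / (mtbf + mttr)

def nines(rho: float) -> int:
    """Count the leading 9's in an availability figure, e.g. 0.99999 -> 5."""
    return int(-math.log10(1.0 - rho))

# Hypothetical device: one failure per year (8760 h), ~5 minutes to repair.
rho = availability(mtbf=8760.0, mttr=0.0876)
print(round(rho, 6), nines(rho))  # 0.99999 5
```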

The effect of parallelism or redundancy on reliability can be treated as a facsimile of the Ohm’s law problem, by noting that service provision is just like a flow of work (see also section 6.3 for examples of this).

Rate of service (delivery) = change in information / rate of failure

This is directly analogous to Ohm’s law for the flow of current through a resistance:

I = V/R

The analogy is captured in this table:

Potential difference V    Change in information
Current I                 Rate of service (flow of information)
Resistance R              Rate of failure

This relation is simplistic. For one thing, it does not take into account variable latencies (although these could be defined as failures to respond), and it rests on several unwarranted assumptions; yet its very simplicity justifies its use for rough hand-waving. If we consider figure 6.10, it is clear that a flow of service can continue, when servers work in parallel, even if one or more of them fails. In figure 6.11 it is clear that systems which are dependent on other systems are coupled in series, and a failure prevents the flow of service. Because of the linear relationship, we can use the usual Ohm’s law expressions for combining failure rates:

R_series = R1 + R2 + R3 + ...

and

1/R_parallel = 1/R1 + 1/R2 + 1/R3 + ...

These simple expressions can be used to hand-wave about the reliability of combinations of hosts. For instance, let us define the rate of failure to be a probability of failure, with a value between 0 and 1. Suppose we find that the rate of failure of a particular kind of server is 0.1. If we couple two in parallel (a double redundancy) then we obtain an effective failure rate of

1/R = 1/0.1 + 1/0.1

i.e. R = 0.05, the failure rate is halved. This estimate is clearly naive. It assumes, for instance, that both servers work all the time in parallel. This is seldom the case. If we run parallel servers, normally a default server will be tried first, and, if there is no response, only then will the second backup server be contacted. Thus, in a fail-over model, this is not really applicable. Still, we use this picture for what it is worth, as a crude hand-waving tool.
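The series and parallel combination rules can be sketched as code. This is a rough sketch under the same crude assumptions the text warns about (independent, always-active components); the function names are our own:

```python
def series_rate(*rates: float) -> float:
    """Dependent (series-coupled) systems: failure rates simply add."""
    return sum(rates)

def parallel_rate(*rates: float) -> float:
    """Redundant (parallel) systems: rates combine like parallel resistances,
    1/R = 1/R1 + 1/R2 + ..."""
    return 1.0 / sum(1.0 / r for r in rates)

print(parallel_rate(0.1, 0.1))            # 0.05 : double redundancy halves the rate
print(round(series_rate(0.05, 0.02), 3))  # 0.07 : a dependency chain accumulates failures
```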

The Mean Time Before Failure (MTBF) is used by electrical engineers, who find that the failure times of many similar components (say, light bulbs) have an exponential distribution. In other words, over large numbers of similar component failures, it is found that the probability of failure has the form

P(t) = exp(−t/τ)

i.e. the probability of a component surviving to time t decays exponentially, where τ is the mean time before failure and t is the failure time of a given component. There are many reasons why a computer system would not be expected to have this simple form. One is dependency. Computer systems are formed from many interacting components. The interactions with third-party components mean that the environmental factors are always different. Again, the issue of fail-over and service latencies arises, spoiling the simple independent-component picture. Mean time before failure doesn’t mean anything unless we define the conditions under which the quantity was measured. In one test at Oslo College, the following values were measured for various operating systems, averaged over several hosts of the same type.

Solaris 2.5    86 days
GNU/Linux      36 days
Windows 95     0.5 days

While we might feel that these numbers agree with our general intuition of how these operating systems perform in practice, this is not a fair comparison, since the patterns of usage are different in each case. An insider could tell us that the users treat the PCs with a casual disregard, switching them on and off at will; and in spite of efforts to prevent it, the same users tend to pull the plug on GNU/Linux hosts also. The Solaris hosts, on the other hand, live in glass cages where prying fingers cannot reach. Of course, we then need to ask: what is the reason why users reboot and pull the plug on the PCs? The numbers above cannot have any meaning until this has been determined; i.e. the software components of a computer system are not atomic; they are composed of many parts whose behavior is difficult to catalogue.
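The exponential form is easy to probe by simulation. The sketch below draws component lifetimes from an exponential distribution and confirms that their sample mean approaches the MTBF τ; the 36-day figure is borrowed from the table above purely as an example, and the helper name is our own:

```python
import random
import statistics

def simulate_lifetimes(tau: float, n: int, seed: int = 1) -> list[float]:
    """Draw n lifetimes t whose survival probability is P(t) = exp(-t/tau)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / tau) for _ in range(n)]

lifetimes = simulate_lifetimes(tau=36.0, n=100_000)
mean = statistics.fmean(lifetimes)
print(round(mean))  # close to the MTBF of 36 days
```

For a real host the conditions are not fixed, so, as the text argues, the measured mean cannot be interpreted this cleanly.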

Thus the problem with these measures of system reliability is that they are almost impossible to quantify and assigning any real meaning to them is fraught with subtlety. Unless the system fails regularly, the number of points over which it is possible to average is rather small. Moreover, the number of external factors which can lead to failure makes the comparison of any two values at different sites meaningless. In short, this quantity cannot be used for anything other than illustrative purposes. Changes in the reliability, for constant external conditions, can be used as a measure to show the effect of a single parameter from the environment. This is perhaps the only instance in which this can be made meaningful, i.e. as a means of quantitative comparison within a single experiment.

13.5.11 Metrics generally

The quantities which can usefully be measured or recorded on operating systems are the variables which can provide quantitative support for or against a hypothesis about system behavior. System auditing functionality can be used to record just about every operation which passes through the kernel of an operating system, but most hosts do not perform system auditing because of the huge negative effect it has on performance. Here we consider only metrics which do not require extensive auditing beyond what is normally available.

Operating system metrics are normally used for operating system performance tuning. System performance tuning requires data about the efficiency of an operating system. This is not necessarily compatible with the kinds of measurement required for evaluating the effectiveness of a system administration model. System administration is concerned with maintaining resource availability over time in a secure and fair manner. It is not about optimizing specific performance criteria.

Operating system metrics fall into two main classes: current values for stable variables and average values for drifting ones. Current (immediate) values are not usually directly useful, unless the values are basically constant, since they seldom reflect any changing property of an operating system adequately. They can, however, be used for fluctuation analysis over some coarse-graining period. An averaging procedure over some time interval is the main approach of interest. The Nyquist law for sampling a continuous signal is that the sampling rate needs to be twice the rate of the fastest peak cycle in the data if one is to resolve the data accurately. This includes data which are intended for averaging, since this rule is not about accuracy of resolution but about the possible complete loss of data. The granularity required for measurement in current operating systems is summarized in the following table.

0–5 secs         Fine-grain work
10–30 secs       For peak measurement
10–30 mins       For coarse-grain work
Hourly average   Software activity
Daily average    User activity
Weekly average   User activity
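The Nyquist rule above translates into a simple bound on the sampling interval. The helper below is an illustrative sketch of that arithmetic, not a tool from the text:

```python
def max_sampling_interval(fastest_cycle: float) -> float:
    """Nyquist: to resolve (or even detect) a cycle, sample at least twice
    per cycle, i.e. at an interval no longer than half its period."""
    return fastest_cycle / 2.0

# A load peak lasting ~30 s must be sampled at least every 15 s;
# sampling more slowly risks missing the peak entirely, not merely blurring it.
print(max_sampling_interval(30.0))  # 15.0
```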


Although kernel switching times are of the order of microseconds, this time scale is not relevant to users’ perceptions of the system. Inter-system cooperation requires many context-switch cycles and I/O waits. These compound themselves into intervals of the order of seconds in practice. Users themselves spend long periods of time idle, i.e. not interacting with the system on an immediate basis. An interval of seconds is therefore sufficient. Peaks of activity can appear sudden to users, but they often last for protracted periods; thus ten to thirty seconds is appropriate here. Coarse-grained behavior requires lower resolution, but as long as one is looking for peaks, a faster rate of sampling will always include the lower rate. There is also the issue of how quickly the data can be collected. Since the measurement process itself affects the performance of the system and uses its resources, measurement needs to be kept to a level where it does not play a significant role in loading the system or consuming disk and memory resources.

The variables which characterize resource usage fall into various categories. Some variables are devoid of any apparent periodicity, while others are strongly periodic in the daily and weekly rhythms of the system. The amount of periodicity in a variable depends on how strongly it is coupled to a periodic driving force, such as the user community’s daily and weekly rhythms, and also how strong that driving force is (users’ behavior also has seasonal variations, vacations and deadlines etc). Since our aim is to find a sufficiently complete set of variables which characterize a macrostate of the system, we must be aware of which variables are ignorable, which variables are periodic (and can therefore be averaged over a periodic interval) and which variables are not periodic (and therefore have no unique average).

Studies of total network traffic have shown an allegedly self-similar (fractal) structure to network traffic when viewed in its entirety [192, 324]. This is in contrast to telephonic voice traffic on traditional phone networks, which is bursty, the bursts following a random (Poisson) distribution in arrival time. This almost certainly precludes total network traffic from a characterization of host state, but it does not preclude the use of the numbers of connections/conversations between different protocols, which one would still expect to have a Poissonian profile. A periodicity of ‘none’ means that any apparent peak is much smaller than the error bars (standard deviation of the mean) of the measurements when averaged over the presumed trial period. The periodic quantities are plotted on a periodic time scale, with each traversal of the period adding to the averages and variances. Non-periodic data are plotted on a straightforward, unbounded real line as absolute values. A running average can also be computed, and an entropy, if a suitable division of the vertical axis into cells is defined [42]. We shall return to the definition of entropy later.
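To make the last point concrete, here is a minimal sketch of a running average and of an entropy computed by dividing the vertical axis into cells. The window size and cell count are arbitrary illustrative choices, not values from the text:

```python
import math
from collections import Counter

def running_average(samples, window=5):
    """Trailing mean over the last `window` samples."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def entropy(samples, cells=8):
    """Shannon entropy after dividing the vertical axis into `cells`
    equal intervals; 0 for a constant (perfectly ordered) signal."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / cells or 1.0  # constant signal: everything in one cell
    counts = Counter(min(int((s - lo) / width), cells - 1) for s in samples)
    n = len(samples)
    return sum(c / n * math.log2(n / c) for c in counts.values())

print(entropy([7.0] * 100))                          # 0.0 : perfectly ordered
print(entropy([float(i % 8) for i in range(800)]))   # 3.0 : spread evenly over 8 cells
```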

The average type referred to below divides into two categories: pseudo-continuous and discrete. In point of fact, virtually all of the measurements made have discrete results (excepting only those which are already system averages). This categorization refers to the extent to which it is sensible to treat the average value of the variable as a continuous quantity. In some cases, it is utterly meaningless. For the reasons already indicated, there are advantages to treating measured values as continuous, so it is with this motivation that we claim a pseudo-continuity for the averaged data.

In this initial instance, the data are all collected from Oslo College’s own computer network, which is an academic environment with moderate resources. One might expect our data to lie somewhere in the middle of the extreme cases which might be found amongst the sites of the world, but one should be cognizant of the limited validity of a single set of such data. We re-emphasize that the purpose of the present work is to gauge possibilities rather than to extract actualities.

Net

Total number of packets: Characterizes the totality of traffic, incoming and outgoing on the subnet. This could have a bearing on latencies and thus influence all hosts on a local subnet.

Amount of IP fragmentation: This is a function of the protocols in use in the local environment. It should be fairly constant, unless packets are being fragmented for scurrilous reasons.

Density of broadcast messages: This is a function of local network services. This would not be expected to have a direct bearing on the state of a host (other than the host transmitting the broadcast), unless it became so high as to cause a traffic problem.

Number of collisions: This is a function of the network community traffic. Collision numbers can significantly affect the performance of hosts wishing to communicate, thus adding to latencies. High collision rates can be brought on by the sheer amount of traffic (a threshold transition), by errors in the physical network, or by software errors. In a well-configured site, the number of collisions should be random. A strong periodic signal would tend to indicate a burdened network with too low a capacity for its users.

Number of sockets (TCP) in and out: This gives an indication of service usage. Measurements should be separated so as to distinguish incoming and outgoing connections. We would expect outgoing connections to follow the periodicities of the local site, whereas incoming connections would be a superposition of weak periodicities from many sites, with no net result. See figure 13.1.

Number of malformed packets: This should be zero, i.e. a non-zero value here specifies a problem in some networked host, or an attack on the system.

Storage

Disk usage in bytes: This indicates the actual amount of data generated and downloaded by users, or the system. Periodicities here will be affected by whatever policy one has for garbage collection. Assuming that users do not produce only garbage, there should be a periodicity superposed on top of a steady rise.

Disk operations per second: This is an indication of the physical activity of the disk on the local host. It is a measure of load and a significant contribution to latency both locally and for remote hosts. The level of periodicity in this signal must depend on the relative magnitude of forces driving the host. If a host runs no network services, then it is driven mainly by users, yielding a strong periodicity. If system services dominate, these could be either random or periodic. The values are thus likely to be periodic, but not necessarily strong.

Figure 13.1: The daily rhythm of the external logins shows a strong unambiguous peak during work hours.

Paging (out) rate (free memory and thrashing): These variables measure the activity of the virtual memory subsystem. In principle they can reveal problems with load. In our tests, they have proved singularly irrelevant, though we realize that we might be spoiled with the quality of our resources here. See figures 13.2 and 13.3.

Processes

Number of privileged processes: The number of processes running the system provides an indication of the number of forked processes or active threads which are carrying out the work of the system. This should be relatively constant, with a weak periodicity indicating responses to local users’ requests. This is separated from the processes of ordinary users, since one expects the behavior of privileged (root/Administrator) processes to follow a different pattern. See figure 13.4.

Number of non-privileged processes: This measure counts not only the number of processes but provides an indication of the range of tasks being performed by users, and the number of users by implication. This measure has a strong periodic quality, relatively quiescent during weekends, rising sharply on Monday to a peak on Tuesday, followed by a gradual decline towards the weekend again. See figures 13.5 and 13.6.

Figure 13.2: The daily rhythm of the paging data illustrates the problems one faces in attaching meaning directly to measurements. Here we see that the error bars (signifying the standard deviation) are much larger than the variation of the graph itself. Nonetheless, there is a marginal rise in the paging activity during daytime hours, and a corresponding increase in the error bars, indicating that there is a real effect, albeit of little analytical value.

Maximum percentage CPU used in processes: This is an experimental measure which characterizes the most CPU expensive process running on the host at a given moment. The significance of this result is not clear. It seems to have a marginally periodic behavior, but is basically inconclusive. The error bars are much larger than the variation of the average, but the magnitude of the errors increases also with the increasing average, thus, while for all intents and purposes this measure’s average must be considered irrelevant, a weak signal can be surmised. The peak value of the data might be important however, since a high max-cpu task will significantly load the system. See figure 13.7.

Users

Number logged on: This follows the classic pattern of low activity during the weekends, followed by a sharp rise on Monday, peaking on Tuesday and declining steadily towards the weekend again.

Total number: This value should clearly be constant except when new user accounts are added. The average value has no meaning, but any change in this value can be significant from a security perspective.


Figure 13.3: The weekly rhythm of the paging data show that there is a definite daily rhythm, but again, it is drowned in the huge variances due to random influences on the system, and is therefore of no use in an analytical context.

Average time spent logged on per user: Can signify patterns of behavior, but has a questionable relevance to the behavior of the system.

Load average: This is the system’s own back-of-the-envelope calculation of resource usage. It provides a continuous indication of load, but on an exaggerated scale. It remains to be seen whether any useful information can be obtained from this value; its value can be quite disordered (high entropy).

Disk usage rise per session per user per hour: The average increase in disk space per user per session indicates the way in which the system is becoming loaded. This can be used to diagnose problems caused by a single user downloading a huge amount of data from the network. During normal behavior, if users have an even productivity, this might be periodic.

Latency of services: The latency is the amount of time we wait for an answer to a specific request. This value only becomes significant when the system passes a certain threshold (a kind of phase transition). Once latency begins to restrict the practices of users, we can expect it to feed back and exacerbate latencies. Thus the periodicity of latencies would only be expected in a phase of the system in which user activity was in competition with the cause of the latency itself.

Figure 13.4: The weekly average of privileged (root) processes shows a constant daily pulse, steady on week days. During weekends, there is far less activity, but wider variance. This might be explained by assuming that root process activity is dominated by service requests from users.

Part of what one wishes to identify in looking at such variables is patterns of change. These are classifiable but not usually quantifiable. They can be relevant to policy decisions as well as in the fine tuning of the parameters of an automatic response. Patterns of behavior include

Social patterns of the users

Systematic patterns caused by software systems.

Identifying such patterns in the variation of the metrics listed above is not an easy task, but it is the closest one can expect to come to a measurable effect in a system administration context.
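One simple, sketchy way to test for such a rhythm is the autocorrelation of a series of hourly samples: a strong value at a lag of 24 hours signals a daily pattern. The synthetic sinusoidal signal below is purely illustrative, standing in for a measured metric:

```python
import math

def autocorr(samples, lag):
    """Normalized autocorrelation of the series at the given lag."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) or 1.0
    return sum((samples[i] - mean) * (samples[i + lag] - mean)
               for i in range(n - lag)) / var

# One week of hourly samples with a synthetic daily rhythm.
week = [10.0 + 5.0 * math.sin(2 * math.pi * h / 24) for h in range(168)]
print(autocorr(week, 24) > 0.8)  # True : strong daily periodicity
print(autocorr(week, 12) < 0.0)  # True : anti-correlated at half a day
```

Real metrics are far noisier than this, which is exactly the difficulty the text describes.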

In addition to measurable quantities, humans have the ability to form value judgments in a way that formal statistical analyses cannot. Human judgment is based on compounded experience and associative thinking, and while it lacks scientific rigor it can be intuitively correct in a way that is difficult to quantify. The downside of human perception is that prejudice is also a factor, and one which is difficult to eliminate. Also, not everyone is in a position to offer useful evidence in every judgment:

User satisfaction: software, system-availability, personal freedom

Sysadmin satisfaction: time-saving, accuracy, simplifying, power, ease of use, utility of tools, security, adaptability.

Other heuristic impressions include the amount of dependency of a software component on other software systems, hosts or processes; also the dependency of a software system on the presence of a human being. In ref. [186] Kubicki discusses metrics for measuring customer satisfaction. These involve validated questionnaires, system availability, system response time, availability of tools, failure analysis, and time before reboot measurements.


Figure 13.5: The daily average of non-privileged (user) processes shows an indisputable, strong daily rhythm. The variation of the graph is now greater than the uncertainty reflected in the error bars.


Figure 13.6: The weekly average of non-privileged (user) processes shows a constant daily pulse, quiet at the weekends, strong on Monday, rising to a peak on Tuesday and falling off again towards the weekend.