Forum for Academic Software Engineering
Volume 4, Number 11, Tue May 17 14:51:14 CDT 1994

Topics:
  Re: Formal methods
  Who owns the software?
  Re: Who owns the software?
  Book announcement
  Denver International Airport DOA, Again / The Art of Troubleshooting

A-------------------------------------------------------
From: mjl@cs.rit.edu (Michael J Lutz)
Subject: Re: Formal methods

Keith:

I'm hardly surprised that the mathematics department is uneasy with the Gries approach. As an undergraduate in mathematics, I learned a lot about proofs and proof techniques, but at what would be called a "rigorous" vs. "formal" level. I'm hardly a working mathematician, but we have a couple of such renegades in our C.S. department, and having seen their work, I know they aren't all that formal either. It doesn't mean they do sloppy work, but it does mean that the culture of mathematics is not as strongly rooted in formal logic as most people think -- despite the work of Hilbert, Godel, etc.

I don't want to reopen the Perlis vs. Dijkstra et al. debates. But I do believe that we have to recognize that formal methods, in their current form, occupy the netherworld between mathematics and symbolic logic. With any luck we will be able to raise the level of discourse so that formal methods are viewed as legitimate mathematics, and, not incidentally, become codified and stylized enough to serve the same role in software development that continuous mathematics serves for natural science and engineering.

By the way, I think that the Gries text is an *excellent* introduction to the foundations of formal methods, serving roughly the same purpose as the "old" calculus, where limit definitions of derivatives, delta/epsilon proofs, etc., provided the foundation for continuous mathematics. The challenge is to move to the next level, where we don't have to appeal to first principles all the time.

Mike Lutz
mjl@cs.rit.edu

A-------------------------------------------------------
From: dhecker@sju.edu (D. Hecker)
Subject: Who owns the software?

I will be teaching a project course this fall. The students, with my help, will develop a large software package that, with some additional work, may be marketable. Also, it may take several runnings of this course to finish the project. Thus, it could take several years and the effort of many students (and myself) to finish the software.

Questions:

(1) Who owns the software? Me? The students? The University? Some combination of the above? What combination?

(2) Should I get the students to sign some sort of release form? If so, is it legal/ethical/moral to REQUIRE them to do so as a course requirement? What if they refuse?

(3) Is it worth all the trouble, considering I really don't expect to make a bundle on this?

I'm sure some of you out there have done this before, or your University has a policy in place covering just this issue. I was told that I am breaking new ground for our institution. I'd like to hear what the rest of the world is doing before I go and talk to our University's lawyers.

Thanks for your help,

David Hecker
St. Joseph's University
Philadelphia, Pa.
dhecker@sju.edu

DISCLAIMER: This post reflects my opinions only, and not my employer's.

A-------------------------------------------------------
From: bh@anarres.CS.Berkeley.EDU (Brian Harvey)
Subject: Re: Who owns the software?

In my opinion, you should set a good example for your students by distributing your software free under the GNU General Public License. This is a good thing for two reasons.
One, as you say, is that it would be very hard to work out a fair distribution of profits, and that trying to do so would lead to bad feelings among the group. The other is that students should be encouraged to feel part of a community, giving as well as taking, rather than to feel that the only reason to do anything is greed. So yes, get both the students and the university to agree to this.

A-------------------------------------------------------
From: marco@edserve.cs.shinshu-u.ac.jp (Marco Amaral Henriques)
Subject: Book announcement

Title: "Capacity Planning and Performance Modeling: From Mainframes to Client-Server Systems"

Authors:
  Daniel Menasce, George Mason University
  Virgilio Almeida, Federal University of Minas Gerais
  Larry Dowdy, Vanderbilt University

Specifics: Prentice-Hall, March 1994, ISBN: 0-13-035494-5, 432 pages, $44

Table of Contents:
  Foreword by Peter J. Denning
  Preface
  Chapter 1. The Capacity Planning Problem
  Chapter 2. A Capacity Planning Methodology
  Chapter 3. Performance Models
  Chapter 4. Intuitive Solutions for a Single Processor Computer System Performance Model
  Chapter 5. Efficient Solutions for Computer System Models
  Chapter 6. Multiple Class Models
  Chapter 7. Performance of Client-Server Architectures
  Chapter 8. Models of Practical Computer Systems
  Chapter 9. Obtaining Input Parameters
  Chapter 10. Model Calibration and Validation
  Chapter 11. Performance of New Applications
  Chapter 12. Concluding Remarks
  Appendix A. Glossary of Terms
  Appendix B. Glossary of Notation
  Appendix C. Formulae
  Appendix D. Quick Manual for QSolver/1

Usage:
1) As a text for a first course in performance evaluation, either at the undergraduate level or at the beginning graduate level. The book has a large number of exercises of diverse complexity.
2) As a self-study reference for professionals working in the area of capacity planning, performance analysis, or system management. The concepts are illustrated by means of many examples taken from modern computer systems such as client-server systems.

Unique Features:
1) Example Based. Each chapter begins with a motivating problem. The solution to each problem is "discovered" by reasoning from first principles and intuition. Only then is the theory necessary to generalize the solution presented.
2) Software Tools. Provided with the book is a diskette that contains the Pascal source code for programs to solve multiple class open and closed queueing network models with load dependent servers. In addition, an educational version of a PC-based capacity planning software package, QSolver/1, is provided.

Key Words: capacity planning, performance modeling, service levels, service demands, performance metrics, data collection techniques, workload characterization, workload forecasting, queueing network models, passive resources, open/closed/mixed systems, single/multiple class models, class aggregation, multiple resource possession, client-server architectures, Markov analysis, Little's Result, birth-death systems, stochastic/operational analysis, convolution, mean value analysis, decomposition/aggregation, bounding techniques, approximations, mixed networks, load dependent servers, paging models, memory queuing, priority scheduling, I/O models, model calibration, model validation, software performance engineering, performance prediction, system sizing/tuning, fun.
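[Editorial illustration: the key words above include Little's Result and mean value analysis, and the diskette's Pascal programs solve queueing network models of this kind. As a hedged sketch only -- not the book's code; the function and variable names are mine, and the example demand values are invented -- here is a minimal exact single-class MVA solver for a closed network of fixed-rate (load-independent) servers:]

    # Exact single-class Mean Value Analysis (MVA) for a closed queueing
    # network with load-independent servers. Illustrative sketch only; the
    # book's diskette reportedly ships Pascal programs that also handle
    # multiple classes and load-dependent servers.

    def mva(service_demands, num_customers):
        """service_demands: list of D_k = visits * service time per device.
        num_customers: closed-network population N.
        Returns (throughput X, residence times R_k, queue lengths Q_k)."""
        K = len(service_demands)
        X, R = 0.0, [0.0] * K
        Q = [0.0] * K                          # queue lengths with n-1 customers
        for n in range(1, num_customers + 1):
            # Residence time at device k: R_k = D_k * (1 + Q_k(n-1))
            R = [service_demands[k] * (1.0 + Q[k]) for k in range(K)]
            X = n / sum(R)                     # system throughput (Little's Result)
            Q = [X * R[k] for k in range(K)]   # new queue lengths (Little's Result)
        return X, R, Q

    # Hypothetical example: a CPU and one disk, 20 customers.
    if __name__ == "__main__":
        X, R, Q = mva(service_demands=[0.05, 0.08], num_customers=20)
        print("throughput = %.2f jobs/sec, response time = %.2f sec" % (X, sum(R)))

[The recurrence is simply Little's Result applied twice per population size, which matches the "intuitive, first-principles" style the announcement describes.]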
A-------------------------------------------------------
From: tmp@netcom.com
Subject: Denver International Airport DOA / The Art of Troubleshooting

====

The Apr 30 Denver, Colorado Rocky Mountain News front-page story `Debacle at DIA -- Project out of control; airport won't open for months' describes how the faulty baggage system has now caused the fourth delay in the multi-billion-dollar project. Tests on Friday resulted in jammed baggage carts on elevated tracks, 100-cart pileups, bags falling onto tracks, and slow delivery. ``At times, tempers wore short as workers below the surface of the $3.7 billion airport toiled to clear jams...''. Further large-scale tests of the baggage system were cancelled. ``It's not ready. I'm not going to do anything to embarrass the city,'' said Mayor Wellington Webb.

The newspaper reports that ``problems with the one-of-a-kind system are becoming so complex that Webb is searching for a NASA level consultant to assist BAE Automated Systems'' (the baggage system contractor). ``We're dealing in areas we don't even understand,'' said the mayor.

On Friday Continental tested transit of 429 bags from its ticket counters to its concourse; 38 were lost in the system, and an additional 15% of the bags that arrived had to have their bar-coded tags scanned twice. An afternoon test by Continental went worse, with a 61% success rate for the 425 bags sent (113 had to be rescanned, 53 were lost). United ran a low-volume 300-bag test (less than 10 minutes of business) in the morning that had a 95% success rate. Their afternoon test involving outbound, inbound, and transfer baggage of 2,100 items was cancelled because of `a significant number of jams', according to Public Works spokeswoman Amy Lingg.

The Rocky Mountain News reports that the baggage system is suffering problems in two basic areas: software glitches and mechanical failures. In the former case, the system probably employs real-time software of around 1 million lines of code, with `two to three errors per 1,000 lines of code typical' (on the order of 2,000 to 3,000 errors in a million-line system). ``Industry average for debugging errors takes 25% of the project's time or as many as 24 man-hours per error.'' On the hardware side, the 30,000 feet of conveyors, 19,000 bearings, 4,600 synchronous drives, 3,000 conveyor drives, 2,750 photo cells, 2,100 linear induction motors, 1,300 conveyor power turns, 350 motor control panels, 300 radio frequency readers, and 112 programmable logic controllers add up to a fiendish complexity that is proving intensely difficult to fine-tune to success. Currently a major problem is vibration that causes connections between rails to separate from each other over time, with cart wheels jamming as they pass over the warped section.

``The Frankfurt Airport in Germany is the only other airport in the world to employ such a large system, and it took AEG Inc. six years to install the system and two years to correct all the software and hardware bugs.''

The newspaper devoted an entire column, albeit speculative in tone, to the software problems that plague the system, interviewing various industry experts on the difficulties of real-time software (apparently BAE engineers were unavailable or unwilling to comment). ``The BAE system employs laser scanners that read bar-coded labels placed on baggage. Experts say that means the BAE system probably employs real-time, numerical-control software.
If running off-the-shelf software is the equivalent of riding a bicycle with training wheels, running real-time software is like running a unicycle.'' John Sloan, software engineer for the National Center for Atmospheric Research in Boulder, commented: ``You run a test and something won't work. What you'd like to do is say `We'll fix the problem and run exactly the same test again.' But you can never produce the same circumstance two times in a row.''

The column compared the DIA baggage software problems to the now-infamous AT&T switching failure: ``Even the smallest error can cause a ripple effect that turns into a tidal wave of the kind that swamped AT&T's main switching system several years ago and shut down nearly 90% of the phone company's domestic long-distance operations for hours.'' Well-known Boulder software programmer Phil Zimmermann, developer of the Pretty Good Privacy encryption software, commented: ``Software is the most complex of human inventions. It's kind of like if airplanes were like software systems, if someone forgot to tighten a screw under a seat, it would cause the engines to explode.''

The airport is now costing about $15 million a month in interest payments. Bond rating agencies Moody's and Standard & Poor's currently rate DIA airport bonds `BBB', two notches above `junk', and have notified the city that a downgrade may be imminent. This would raise interest rates on any future borrowing by the city to finance further delays in opening. Delays at this point cost the city about $1 million a day to service the $3.1 billion in bonded debt and pay operating expenses.

===

The Art of Troubleshooting
tmp@netcom.com

As a passionate software engineering practitioner, I can't resist taking this opportunity to offer some editorial comments on DIA's woes. The preeminent software engineering theme of managing the complexity of a massive system is well represented in the DIA `debacle'. I hope others knowledgeable on the subject will be able to comment on whether the DIA problems can really be attributed to inherent, inescapable, inexplicable difficulties of complex system design that torment even the most brilliant and farsighted engineers, or to a poor grasp of these issues by engineers and managers ill-equipped to deal with them. (At least the mayor's comment, `We're dealing in areas we don't even understand', does not exactly inspire confidence or supply evidence for the former case!)

Personally, I find very annoying the attitude that complex systems such as BAE's baggage system or AT&T's switching network are somehow so monumental as to transcend the centuries-old realm of basic, sound engineering principles. I think the correct perspective is not that the complexity of these projects represents some kind of limit on human capability, but that they are exposing how inadequately -- and even ineptly -- we have faced the task of developing design and troubleshooting techniques for the relatively new discipline of software engineering. We constantly hear the refrain that building a piece of software is not like building a bridge (something that supposedly has reached and represents a plateau in engineering excellence), that it is infinitely more complex. To the extent that this requires more sophistication on the part of the engineers and the managers tackling the design, I heartily concur. To the extent that this excuses failures in foresight or planning or performance in the final project, I angrily dissent.
The basic message that failures in software teach us, I think, is that a new breed of engineer is now required, one who attains a level of attention to detail, excellence, and perfection that is historically unprecedented within the engineering profession. It seems to me that the `revise a design until it is adequate' approach permeates the classic engineering mentality, and that software, hardware, and complex system design show that mentality to be fundamentally flawed. In the past, if a bridge was not strong enough, the engineers just built more columns or strengthened the existing ones. What determined whether it was strong enough? It was strong enough if it didn't break! Such simplistic approaches, circular in their reasoning and ridiculous-sounding on their face, nevertheless seem to lie at the heart of how many software engineers approach design problems, and in my opinion they also lie at the heart of their failures. More than in any other engineering profession, poor initial design decisions in software development cannot easily be undone, or even ameliorated, by `incremental remodelling' -- quite to the contrary, the flaws are often exacerbated by it and the system's stability further eroded -- and any feeble, lame, or conspicuously absent attention and *thought* lurking behind the process and its edifices is inevitably, rudely, mercilessly exposed. What I want to emphasize is that it is not the case that sophisticated software is beginning to transcend inherent human engineering capabilities -- rather, it is beginning to show the rips, holes, shears, and shoddiness in routine practices that masquerade as sound, traditional engineering approaches.

Now let me apologize and atone for all this self-indulgent, platitudinous philosophizing by offering some crisp, concrete, and tangible suggestions. One area of software development that seems to me very ill-addressed and largely unexplored to date is the possibility of elevating troubleshooting to a discipline in itself. We must find ways of using the power of software itself to manage the fiendish complexities of design and smooth operation. As a case in point, a software engineer quoted in the DIA article above repeats the cliche about the impossibility of replicating the conditions that led to a software failure in a complex, real-time system. But what is to stop engineers from designing an auxiliary monitoring system that watches and tracks failures in the original system? For example, DIA engineers could mount video cameras and develop sophisticated monitoring software that actually records the exact configuration of the system that led to a failure -- making replication unnecessary! What lines of code were executing, and what was the state of the internal variables, at the moment a particular suitcase was misrouted? What was the state of all the switches or power sources when a switch failed? This is all basic, critical, *obvious* information. Yet today, DIA engineers (and I use them as a representative microcosm of the field) are apparently reduced to employing techniques not noticeably more advanced than psychic clairvoyance to determine what led to failures! I am really quite flabbergasted at how little information some software engineers have about failures in the code *they themselves* have designed -- often it is nothing more than `it broke'! (Or, in the DIA case, with luggage-litter strewn about the floor under the conveyors, `it dropped'!) Do they believe that they will never be required to troubleshoot it?
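[Editorial illustration: to make the `record the exact configuration' suggestion concrete, here is a minimal, hypothetical sketch. Every name, event type, and state field below is invented for illustration and is not drawn from BAE's or DIA's actual system. The idea is a bounded `flight recorder' that the control software updates on every event and dumps automatically whenever a fault is detected:]

    # Hypothetical "flight recorder" for a control system: every event updates
    # a bounded in-memory history, and any detected fault dumps that history
    # plus a snapshot of the last known state, so the failure context does not
    # have to be re-created by rerunning the test.

    import collections, json, time

    class FlightRecorder:
        def __init__(self, capacity=10000):
            self.events = collections.deque(maxlen=capacity)  # bounded event history
            self.state = {}                                   # last known value per component

        def record(self, source, event, **details):
            entry = {"t": time.time(), "source": source, "event": event, **details}
            self.events.append(entry)
            self.state[source] = {"event": event, **details}  # running state snapshot

        def dump(self, path):
            # Persist the recent history and the current state snapshot to disk.
            with open(path, "w") as f:
                json.dump({"state": self.state, "history": list(self.events)}, f, indent=2)

    recorder = FlightRecorder()

    def on_fault(description):
        # Called by the control loop when a jam, misroute, or timeout is detected.
        recorder.record("system", "FAULT", description=description)
        recorder.dump("fault-%d.json" % int(time.time()))

    # Example use inside a (hypothetical) cart-routing loop:
    recorder.record("scanner-17", "BAG_SCANNED", tag="0047", destination="concourse-B")
    recorder.record("junction-3", "SWITCH_SET", position="left")
    on_fault("cart wheel jam at track segment 42")

[The point is not this particular design, but that the system captures its own failure context at the moment of failure instead of asking engineers to reproduce it afterward.]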
Why can't the hardware and software for troubleshooting itself be designed into the system? In the best systems it is -- robustly and elegantly -- but how long will it be before *every* software engineer realizes that a debugger is to a program as `Yin' is to `Yang'? Today, software engineers routinely, repeatedly build the equivalent of our national highway system -- without any emergency lanes! And very soon they find that they sorely need them! There is no theoretical limitation that prevents an engineer from introducing an unlimited array of auxiliary monitoring and `fault anticipating' structures into a system, but the concept is completely alien to many.

Many engineers will regard these suggestions with intense, conservative pessimism, saying that these approaches are already in use, are too expensive to use, etc. My response is that (1) where they are used, they are often unrefined and undeveloped, sometimes most poorly developed precisely where they are most sorely needed, (2) the cost of `information-scarce' troubleshooting is astronomical in comparison to the alternative, and (3) in any case, these objections disguise the horrifyingly naive or ignorant attitude that troubleshooting is not an inherent part of design. What is wrong with starting a design with the premise, `the system will inevitably have failures that must be tracked'? Perhaps some engineers consider this approach an insult to their competence -- but those who do may later become intimately familiar with far more unequivocal and damaging intimations of inadequacy, such as the eyes of the world's plane travellers focused in terror on a $3.7 billion luggage-blender! I would be delighted if a DIA engineer wrote to RISKS to describe how sophisticated their troubleshooting tools are, and that the failures in the system exist in spite of them and are simply intrinsically inescapable. But it appears to me from fragmentary press accounts that the engineers doing the troubleshooting have only the equivalent of stone-age clubs at their disposal, if any at all, and in fact are quite embarrassed, or too intimidated or oppressed, to admit it publicly.

To further this idea of elevating the practice of troubleshooting to a new discipline, here are some further suggestions. How about separating it out as an entirely new engineering profession and developing college curricula around it? Also, if there are very many firms that specialize in consulting on software *debugging* alone, they have escaped my attention. If we had consulting firms solely dedicated to `complex system troubleshooting', i.e., to developing the techniques, tools, science, profession, and art thereof, surely DIA and the BAE baggage-manglers would be banging down their doors today, falling over themselves to sign a lucrative contract, and the mayor of Denver, Colorado would not be diminished and humiliated to the extent of seeking deliverance from a nebulous, heroic ``NASA level consultant'' to help DIA engineers with all the ``areas we don't even understand''.

E-------------------------------------------------------------------
FASE Volume 4 Number 11

Send newsletter articles to fase-submit@d.umn.edu or fase@d.umn.edu

Send requests to add, delete, or modify a subscription to fase-request@d.umn.edu

Send problem reports, returned mail, or other correspondence about this newsletter to fase-owner@d.umn.edu or kpierce@d.umn.edu

You can retrieve back issues by anonymous FTP from ricis.cl.uh.edu.
You can access them through WWW at URL http://ricis.cl.uh.edu/FASE/

Keith Pierce, Editor
Department of Computer Science
University of Minnesota, Duluth
Duluth, MN 55812-2496
Telephone: (218) 726-7194
Fax: (218) 726-6360
Email: kpierce@d.umn.edu

Laurie Werth, Advisory Committee
Dept. of Computer Science
Taylor Hall 2.124
University of Texas at Austin
Austin, Texas 78712
Telephone: (512) 471-9535
Fax: (512) 471-8885
Email: lwerth@cs.utexas.edu