Forum for Advancing Software engineering Education (FASE)
Volume 8 Number 06 (101st Issue) - June 15, 1998
824 subscribers

Note: If you have problems with the format of this document, try
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Table of Contents

This Month's Topic: Software Metrics
Next Month's Topic: Licensing of Professional Engineers in SE
Upcoming Topics

News Items
   Texas Board of Professional Engineers: Update

Calls for Participation
   CSEE&T 99 Expanded CFP - Note Workshop Proposals are due July 1!
   Software Engineering Education Symposium (SEES) '98
   Internet-Based Survey on Software Engineering

Contact and General Information about FASE
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
From: Susan Mengel, Guest Editor

This Month's Topic: Software Metrics

GUEST EDITOR: SUSAN MENGEL (CS Prof, TTU)

CONTENTS
1) Preface (Susan Mengel)
2) Safety as a Metric (Matthias Felleisen and Corky Cartwright)
3) Static Analysis using Verilog Logiscope (Suzanne Kocurek)
4) Software Metrics from the Viewpoint of the Trainer (Katsuhiko Hirai)
5) Using Software Metrics as a Plagiarism Detection Tool (Warren Harrison)
6) Some Recommended Sources (Susan Mengel)

######################################################################

1) Preface

Susan Mengel
Assistant Prof of Computer Science
Texas Tech University
mengel@ttu.edu

I would like to thank those who contributed to this issue of FASE on the topic of software metrics. There are articles on using safety as a metric, static analysis, metrics in training, and using metrics to detect plagiarism. I also have a section on where you can go to find more information on software metrics in case you are interested in teaching a class on it or starting a metrics program.

Software metrics is fast becoming a critical field for helping to produce software on time and on budget. I first started looking at software metrics out of, for lack of a better word, frustration. I kept noticing that students would reach a certain level of programming ability and then not really continue to improve. Better programming examples helped to a point, but did not seem to give the skill improvement needed.

I resolved to find out what was happening, so I undertook a project to analyze introductory-level programs submitted by students, using static analysis to collect metrics such as McCabe's cyclomatic complexity, number of lines of code, and number of paths through the program. The results were dramatic, with wide-ranging values for the metrics. The differences in control flow graphs among the students' code were astounding, and it was clear the grading process was not helping students improve.

From this small measurement study, I am embarking, as others have before me, on ways to introduce metrics into the computing curriculum so that students have a means to visualize their code through measurements and control flow graphs. In the short term, I hope to raise students' programming ability and, by showing them control flow graphs, help them understand the connection between their original design and what their program actually looks like. The effort required of students will be minimal with a commercial static analysis tool that can analyze their code at compile time. I am currently using Verilog Logiscope, which is provided free to educators with a limited number of licenses.
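To give readers unfamiliar with the measurements an idea of what is being collected, here is a small, invented C function (not one of the student programs from the study) annotated with the counts a static analyzer would report. The counting convention in the comments - each binary decision, including each short-circuit operator, adds one to the cyclomatic number - is the usual one, but individual tools may differ.

Example (invented, for illustration only):
------------------------------------------
/* A tiny function with three binary decisions.  McCabe's cyclomatic
   complexity for a single function can be taken as the number of
   decisions plus one (equivalently, VG = E - N + 2 for its control
   flow graph). */
int classify(int score)
{
    if (score < 0 || score > 100)   /* 2 decisions (the || counts)   */
        return -1;                  /* invalid input                 */
    if (score >= 60)                /* 1 decision                    */
        return 1;                   /* pass                          */
    return 0;                       /* fail                          */
}
/* VG = 3 decisions + 1 = 4; the analyzer would also report lines of
   code and the number of paths (here, 4 distinct paths). */
-----

A grading report that shows such numbers next to the control flow graph gives the student something concrete to compare against the original design.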
Oddly enough, the use of static analysis in the classroom is not widespread, or at least is not chronicled in the literature. My concern with this lack of use is that industry is relying more and more on metrics to assure safety-critical systems and to improve the development process to a specific level of the Software Engineering Institute's Capability Maturity Model. Students need exposure to metrics in the educational setting, or they will need extensive training from the firm that hires them. Further, by the time they get to industry, any bad habits will be ingrained and difficult to change.

######################################################################

2) Safety as a Metric
---------------------

Matthias Felleisen and Corky Cartwright
Rice University

Background: Mechanical engineers design systems so that they stabilize in the least-damaging position when [a part of] the device malfunctions. For example, the bars at a railroad crossing fall to the blocking position if both the primary power supply and the backup battery fail to provide electricity. In contrast, many existing software systems typically continue to compute after a primitive operation [computation step] misinterprets data.

The problem is due to the widespread use of unsafe programming languages. An unsafe language does not ensure that primitive program operations are applied to arguments of the proper form. For example, a program may misinterpret an array A of 10 elements as an array of 11 elements, corrupting whatever data is stored in the location corresponding to the 11th element and continuing execution using the corrupt data. Since unsafe languages generally do not detect the misapplication of program operations, they cannot detect malfunctions, much less respond by taking remedial action. As a result, a malfunctioning program may print outputs or control external devices based on corrupted and meaningless data values.

A safe language prevents this basic misinterpretation of data. In contrast to an unsafe language, a safe language completely specifies the behavior of each primitive program operation. In particular, it specifies the set of data for which the operation is defined and, implicitly, those data for which it is undefined. The language implementation then ensures, with a combination of compile-time and run-time checks, that program data is never corrupted. If the required check is deferred until run time, the language generates an exception if the check fails, giving the program the opportunity to take remedial action. Unfortunately, a program written in a safe language can only recover from potential malfunctions anticipated by the programmer; all others abort program execution and print a diagnostic error message. While such error detection may be sufficient in applications with a human operator (such as patient monitoring) who can manually recover from the fault, it is not very helpful in applications where no human intervention is possible (such as rocket guidance).

A Semantics-Based Metric: We can make programs written in safe languages even more reliable by performing much more precise static analysis than conventional static type checking. Proof systems can reliably pinpoint all potential fault sites for a software system. More precisely, a programming environment may analyze a program in a safe language using a conservative approximation of the semantics and isolate the program operations that the analysis cannot prove are always applied correctly.
Such an environment can then lead the programmer through an examination of the isolated operations and help the programmer determine whether the site is a bug or exposes imprecision in the program analysis. In the former case, the programmer will correct the program text; in the latter case, the programmer may wish to rewrite the program, use a stronger proof system, or document why the fault cannot occur. The fewer potential fault sites a system contains, the less likely it is to signal an error. Hence, an obvious semantics-based program metric is the number of potential fault sites remaining in a program after a comprehensive static analysis. Note that this form of analysis is not meaningful for unsafe languages because a single faulting operation can corrupt any program data structure and invalidate all analysis results.

Over the past nine years, the Rice Programming Languages Team has built several safety-based programming environment tools for various dialects of Scheme. The first two, constructed by Fagan [Fagan90, Cartwright & Fagan90] and Wright [Wright94, Wright & Cartwright97], were based on Milner-style type unification and used an algebra of record types. The third one, by Flanagan et al. [Flanagan96], employs a form of set-based analysis [Flanagan97, Heintze94]. The analysis infers the flow of data by interpreting program operations as naive set operations. The analysis tool, dubbed MrSpidey, is an integral part of our Scheme programming environment [Findler97]. Currently, a team at DEC SRC is re-implementing MrSpidey for Java and will combine it with SRC's extended static checking tool [Detlefs].

Course Use: The authors have obtained an NSF grant to develop a sequence of courses that introduce students to the foundations of safety and the use of safety-based metrics in programming environments. (The course sequence will also introduce students to the use of object-oriented design patterns in sequential and distributed programs.) The first version of the course will be taught in the fall of 1998. The authors intend to post descriptions of their experiences to the mailing list. In addition, Rice's CS department will organize a workshop for interested faculty members and industrial trainers during the summer of 2000.

This article is based on two previous papers by the authors:
- Felleisen. Safety: a semantics-based software metric. Submitted for publication.
- Cartwright & Felleisen. Program Verification through Soft Typing. ACM Computing Surveys, June 1996.

References:

Cartwright, R. and M. Fagan. Soft typing. In Proc. SIGPLAN '91 Conference on Programming Language Design and Implementation, 278--292.

Detlefs, D. An overview of the extended static checking system. In Proc. First Workshop on Formal Methods in Software Practice, ACM SIGSOFT, January 1996, 1--9.

Fagan, M. Soft Typing: An Approach to Type Checking for Dynamically Typed Languages. Ph.D. dissertation, Rice University, October 1990.

Findler, R., C. Flanagan, M. Flatt, Shriram K., and M. Felleisen. DrScheme: a pedagogic programming environment for Scheme. In Proc. 1997 Symposium on Programming Languages: Implementations and Logics.

Flanagan, C. Effective Static Debugging via Componential Set-Based Analysis. Ph.D. dissertation, Rice University, August 1997.

Flanagan, C., M. Flatt, Shriram K., S. Weirich, and M. Felleisen. Catching bugs in the web of program invariants. In Proc. SIGPLAN '96 Conference on Programming Language Design and Implementation, 22--32.

Heintze, N. Set based analysis of ML programs.
Technical Report CMU-CS-93-193, Carnegie Mellon University, July 1993. Also in 1994 ACM Conference on LISP and Functional Programming.

Wright, A.K. Practical Soft Typing. Ph.D. dissertation, Rice University, August 1994.

Wright, A.K. and R. Cartwright. A practical soft type system for Scheme. ACM TOPLAS, 1997. Also in 1994 ACM Conference on LISP and Functional Programming, 250--262.

######################################################################

3) Static Analysis using Verilog Logiscope

Suzanne Kocurek
Verilog, Inc.

Abstract: Software metrics were originally used to evaluate the quality of previously developed (legacy) code, and are now beginning to be used in the early stages of the development process. There is a positive correlation between the use of metrics at development time and the final software quality. It has been shown that using metrics improves the developer's ability to produce more reliable, more efficient, and more maintainable code. It also drives project members toward a more common style, or "in-house standard". When software metrics are used in the development stage, quality is increased by catching errors early, thus significantly reducing the cost of test and maintenance. Since it is clear that using metrics directly affects the way developers write code, they are now widely accepted as a way to improve the quality of source code.

What kinds of software metrics are available, and how can quality be modeled? In general there are two types of software metrics: quality metrics and metrics that reflect process and project activity. Software quality analyzers, like CodeChecker, parse existing source code and only provide quality-level metrics. Static analyzers cannot obtain process and project metrics.

* Modeling quality

Software must first be modeled in order to evaluate its quality. The modeling approach used by LOGISCOPE is similar to those defined by Boehm [BOEHM, 75] and McCall [McCALL, 77]. According to this approach, software quality can be defined as a set of characteristics which:
- are important to the user: the quality factors,
- can be decided by the designers: the quality criteria,
- can be measured for verification purposes: the quality metrics.

The advantage of this approach is that quality is:
- specified in terms of factors,
- designed in terms of criteria,
- built with the help of programming rules,
- assessed by means of metrics.

For example, if maintainability is among the most crucial qualities of an application, our first aim will be to make it easier to detect, locate, and correct anomalies, to introduce minor changes, and then to accomplish the required functions with the anticipated resources [GAM-T17 (V2) July 1989]. In this case, the most crucial characteristics are determined from the maintenance engineer's viewpoint, which defines the quality factor. To satisfy this factor, the software designers will opt for, among other things, a self-descriptive application, defined as "(...) attributes of the software that provide explanation of the implementation of a function" [McCALL, 77]. The quality characteristics which are determined by software designers are quality criteria. In order to reach a sufficient level of self-documentation, the following programming rule will be respected when building the application: the source code must contain at least 1 comment for every 5 executable statements and at most 1 comment per statement.
To check that this rule is respected throughout development, CodeChecker will measure the frequency of comments (number of comments / number of statements). This frequency must be between 0.2 and 1. The quality characteristic can thus be verified: it is a quality metric.

For example, the self-descriptiveness, modularity, readability, and simplicity criteria all contribute toward satisfying the maintainability quality factor. Similarly, quality criteria can be assessed by means of a set of quality metrics (e.g., a correct comment frequency value helps satisfy the self-descriptiveness criterion). These contribution relationships allow the quality of an application to be modeled in tree form:

FACTOR                CRITERIA                     METRICS

                                                   ---- COMMENT FREQUENCY
                      ---- SELF-DESCRIPTIVENESS ---|
                      |                            ---- ...
                      |
MAINTAINABILITY ------|                            ---- NUMBER OF STATEMENTS
                      |---- SIMPLICITY ------------|
                      |                            ---- ...
                      |
                      ---- ...

* Quality model:

The quality model is usually a text file containing the definition of parameters, which are consulted by CodeChecker to assess the quality of a program (the functions of a program are classified according to the determined quality factor). CodeChecker interprets the quality model in order to calculate a report according to the current quality factor for the current LOGISCOPE Application. The resulting report is based on information extracted from the source code files composing the Application's functions. In a quality model, quality factors are associated with criteria and metrics. The threshold values associated with the metrics constitute the limit values of the quality model.

- Quality factor: A quality factor is defined by a set of quality criteria. For example, the self-descriptiveness, modularity, readability, and simplicity criteria contribute to satisfying the maintainability quality factor.

- Quality criterion: A quality criterion is a combination of weighted metrics. The weighting coefficient associated with a metric is proportional to the metric's importance in the evaluation of the criterion. The criterion is calculated taking into account the limit values for each metric included in its definition. Classes are defined for each criterion, and each function falls into one of a criterion's classes according to the combination of its weighted metrics.

- Metrics: Metrics are static measurements (i.e., obtained without executing the program). They provide indications of the analyzed program's complexity (how easy it is to understand, its readability, its maintainability, and its reliability). These indications are given by four types of measurements:

  - Textual metrics: based on counting the operands and operators used by a program. Textual metrics make it possible to identify problems in understanding the code, due, for example, to code repetitions, or to a component or statement that is too big. These metrics are based on the software characterization techniques developed by HALSTEAD.

  - Structural metrics: quantify the complexity of a function's control flow, in order to evaluate the effort required to understand its algorithm, or to test the various paths that pass through it.

  - Data flow metrics: count the number of input and output parameters to a function, or the number of variables or macro-instructions used by the program. These evaluate the effort that will be required to understand and modify the program.

  - Comments metrics: check whether or not there are enough comments to help the reader understand the source code.

These metrics can be combined to make new ones.
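The comment-frequency metric above is simple enough to compute with a few lines of code. The following is a minimal, illustrative sketch in C; it is not CodeChecker, and its counting rules (every "/* ... */" block counts as one comment, every semicolon outside a comment counts as one statement, while string literals and C++-style "//" comments are ignored) are simplifications chosen only for this example.

A rough comment-frequency counter:
----------------------------------
#include <stdio.h>

/* Counts C comments and semicolon-terminated statements in a file and
   reports comments / statements.  A teaching sketch, not a
   replacement for a real static analyzer. */
int main(int argc, char *argv[])
{
    FILE *f;
    int c, prev = 0, in_comment = 0;
    long comments = 0, statements = 0;

    if (argc != 2 || (f = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: cfreq file.c\n");
        return 1;
    }
    while ((c = getc(f)) != EOF) {
        if (in_comment) {
            if (prev == '*' && c == '/') {   /* end of comment */
                in_comment = 0;
                c = 0;                       /* do not reuse the '/' */
            }
        } else {
            if (prev == '/' && c == '*') {   /* start of comment */
                in_comment = 1;
                comments++;
                c = 0;                       /* do not reuse the '*' */
            } else if (c == ';') {           /* crude statement count */
                statements++;
            }
        }
        prev = c;
    }
    fclose(f);
    if (statements > 0)
        printf("comment frequency = %.2f (rule: 0.20 to 1.00)\n",
               (double)comments / (double)statements);
    else
        printf("no statements found\n");
    return 0;
}
-----

Run over a student submission, a value below 0.20 would flag the file as under-commented with respect to the rule stated above; a value above 1 would suggest the comments are being padded.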
How can LOGISCOPE help?

LOGISCOPE, by VERILOG, Inc., is a scalable toolset consisting of CodeChecker, RuleChecker, ImpactChecker, TestChecker and Viewer. LOGISCOPE is used at many universities to familiarize students with the concepts of software measurement and testing. These concepts are used in many industrial areas of software development. LOGISCOPE shows students who are new to a programming language how they can improve their style, and it educates engineers on the impact of certain software styles/constructs in terms of testing, maintenance and performance.

What types of results can be obtained with LOGISCOPE?

There are many different aspects to consider when developing and analyzing source code. First, students will immediately obtain information about certain criteria or metric violations after parsing the source code, or pieces thereof. Any metric violation is linked to the source code with an explanation of the measurement theory itself and, more importantly, recommendations for modification and the benefits of each possible modification. A typical result would look like:

SIMPLICITY  FAIR  - VG 13 - N_STMTS 54 - AVG_S 4.76

Number of statements: STMT = 54
Range of acceptable values: [1 50]

Definition:
-----------
Number of STateMents that can be executed between the function's header and the closing curly bracket. The following are statements:
   ;            (empty statement)
   IF [ELSE]   SWITCH   WHILE   DO   FOR
   GOTO   BREAK   CONTINUE   RETURN   THROW   TRY   ASM
   expression;  (simple statement)
Statements located in the external declarations are not taken into account.

Explanation:
------------
This metric is a good indicator of a function's maintainability. Experience has shown that the number of statements is correlated with most of the metrics defined by Halstead's theory (see the textual metrics section). In fact, the greater the number of statements contained in a function, the greater the number of its operands and operators, which means that a greater effort will be required to understand the function. It is therefore desirable that the number of statements be limited in each of the program's functions.

Action:
-------
In order to reduce the size of a function, it must be broken down into several subfunctions. This breakdown makes it possible to establish a better hierarchy of the functions performed by the program, and therefore improves maintainability.
-----

As you can see in this example, there are 3 metrics used by a criterion called SIMPLICITY. LOGISCOPE follows the ISO 9126 factor-criteria-metrics concept, which allows you to group several metrics into one criterion (i.e., merging VG, AVG_S and N_STMTS into SIMPLICITY). This approach will help the student differentiate between groups of metrics and criteria, since some of them have inverse relationships (e.g., efficiency and portability). In addition, the use of criteria will show that only a combination of metrics makes it possible to give priority to the most complex components.

Consider three different C functions: function (A) using a switch and seven case statements; function (B) using seven sequential 'if/else' constructs; function (C) built of seven nested 'if/elseif' constructs. The cyclomatic number, VG, is identical for all three, whereas their complexities are obviously different. The number of possible paths NPATH(A) is equal to NPATH(C), and the number of nesting levels N_LVL(A) is equal to N_LVL(B). This shows the developer that more than one metric has to be taken into account to evaluate a given requirement; a sketch of three such functions follows below.
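To make the comparison concrete, here is one possible form the three functions could take. The function names and bodies are invented for this newsletter, and the metric values noted in the comments follow the usual decisions-plus-one counting; they are not LOGISCOPE output.

Three functions with the same VG:
---------------------------------
/* Each function contains seven decisions, so the cyclomatic number VG
   is the same (8) for all three, yet path counts and nesting differ. */

/* (A) switch with seven cases: few paths (NPATH around 8), shallow
   nesting */
int price_a(int code) {
    switch (code) {
    case 1:  return 10;
    case 2:  return 20;
    case 3:  return 30;
    case 4:  return 40;
    case 5:  return 50;
    case 6:  return 60;
    case 7:  return 70;
    default: return 0;
    }
}

/* (B) seven sequential if/else constructs: nesting stays shallow like
   (A), but the independent decisions multiply, giving NPATH = 2^7 = 128 */
int price_b(int f1, int f2, int f3, int f4, int f5, int f6, int f7) {
    int p = 0;
    if (f1) p += 10; else p -= 1;
    if (f2) p += 20; else p -= 2;
    if (f3) p += 30; else p -= 3;
    if (f4) p += 40; else p -= 4;
    if (f5) p += 50; else p -= 5;
    if (f6) p += 60; else p -= 6;
    if (f7) p += 70; else p -= 7;
    return p;
}

/* (C) seven nested if/else-if decisions: NPATH is about 8, close to
   (A), but each "else if" nests one level deeper than the last */
int price_c(int code) {
    if (code == 1)      return 10;
    else if (code == 2) return 20;
    else if (code == 3) return 30;
    else if (code == 4) return 40;
    else if (code == 5) return 50;
    else if (code == 6) return 60;
    else if (code == 7) return 70;
    else                return 0;
}
-----

A reader looking only at VG would rank all three functions the same; the path-count metric separates (B), and the nesting-level metric separates (C), which is exactly the point made above.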
Use of metrics as an indicator of critical programming habits.

According to different requirements and standards, students must learn to adapt the use of metrics to diverse projects, ranging from writing code in compliance with certain naming conventions to the ultimate goal of identifying critical pieces of code for particular problems. Typical objectives are:
i)   Detect code duplication, by means of Vocabulary Frequency.
ii)  Inspect functions with auxiliary exits, by means of Number of IO.
iii) Look for complex functions, by means of VG and Nested Level.
iv)  Detect highly solicited components, by means of Call Paths.

What comes after metrics?

Metrics and criteria are very good indicators of software quality; however, they are entirely textual and are usually used only as indicators. To deepen understanding, it is recommended to visualize the results graphically in order to investigate the structure of a subroutine and its calling relationships to other subroutines, as is done with the LOGISCOPE Viewer. Complementary to static analysis, the student should become confident in developing defined rules, such as "No Goto", which are often strongly related to metrics (see ii above). Typical rules cover programming, naming, and presentation; for example, constants have to be in upper-case letters, or variables have to be assigned a value at declaration time. RuleChecker introduces this kind of concept.

Bibliography

[BOEHM]  B.W. Boehm. Characteristics of Software Quality. TRW / North-Holland, 1975. See also Software Engineering Economics, Prentice Hall.
[McCALL] J.A. McCall. Factors in Software Quality. General Electric, n77C1502, June 1977.
[HAL]    M.H. Halstead. Elements of Software Science. North-Holland, Elsevier, 1977.

######################################################################

4) Software Metrics from the Viewpoint of the Trainer

Katsuhiko Hirai
Assistant General Manager of Human Competence Development Center
Matsushita Systems Engineering Co.

I think that the process students follow in studying software engineering is important enough that they need advice based on personal statistical data collected about them. In the past, we focused only on the results of their study: by examining their memorized knowledge, we would satisfy ourselves about the performance of our lectures. Despite that focus, the students would lose our basic training from their minds unless we continued to have a friendly relationship with them. I would like to know the trial-and-error behavior of my students as they study programming, and to understand what they most need in order to progress in my lectures. I therefore developed a tool that measures how students edit, assemble, and link their programs, based on my lectures and practice in software engineering education with the freshmen of our company in 1997. This year I am going to teach the freshmen in my lectures and to consult the statistical data of my customers. I think that I can follow students like a shepherd using data such as the following:

1. Editor:   editing-spent-time, source-copied-count, source-combined-count, source-pasted-count, source-cut-count
2. Assemble: assemble-error-categorized-count, used-instruction-base-appeared-count
3. Link:     linker-error-categorized-count, program-size
4. Mentality data

I am going to prepare a paper that describes the relation between the learning process and mentality.
This year, I will also develop a "C" program interpreter that measures the learning procedure with my students, for use with next year's freshmen.

######################################################################

5) Using Software Metrics as a Plagiarism Detection Tool

Warren Harrison
Department of Computer Science
Portland State University
Portland, OR 97207-0751
warren@cs.pdx.edu
http://www.cs.pdx.edu/~warren

1. Introduction

Like many other institutions, we have begun to feel the need for increased "delivery efficiency" in our courses. As a consequence, a freshman programming course that had three sections of 30 students each six years ago might now have a single section with an enrollment approaching 100. Parallel to the increase in "delivery efficiency", we have also been experimenting with increasing the "administrative efficiency" of the course. This entails all aspects of management and organization, but particularly submitting homework, grading it, and returning the graded material to the student.

2. Techniques for Administrative Efficiency

Most of our students' homework entails computer programs, or other material which will almost always be in an electronic form suitable for e-mail. Therefore, it has become expected that students will submit their programming solutions by e-mail. This is administered by having students mail their submissions to a special account. A mail filter on this account saves the message in one of several exercise directories, depending upon the Subject: header, with a file name consisting of the student's login name. In order to provide the student with verification of submission, the mail filter also runs a script that e-mails a response back to the student acknowledging receipt of their e-mail.

As soon as the due date passes, a teaching assistant prepares a special test harness consisting of a combination of shell and Perl scripts. The test harness in turn compiles and runs each submission against a carefully engineered set of test data. Each submission's run is automatically compared against the "correct" behavior. The test harness yields a report for the class listing each program run and the number of test cases it passed. Submissions that did not compile or encountered run-time errors are also flagged, so the teaching assistant can investigate the cause further. Likewise, submissions can also be run through other automated analyzers, such as "lint", as part of the test harness, and further stylistic issues can be raised on the automated grading report. The goal of this process is, of course, to minimize the amount of human contact involved in evaluating the homework submissions, thereby increasing the class's administrative efficiency.

3. Dangers of Plagiarism in an Automated Grading Environment

Because an automated grading environment reduces the amount of time humans spend looking at student programs, it also reduces the opportunities to find untoward similarities between submissions. In a manual environment, illicit copying is often found by noticing a strange misspelling in two different programs, a unique approach to the problem, etc. However, in both automated environments and manual environments where programs may be graded by different individuals, such opportunities are few and far between. Of course, manual examination of programs is not in and of itself adequate to catch plagiarized computer programs. Dishonest students have used simple syntactic changes for years to hide similarities.
Changes such as moving statements around, reordering declarations, and globally replacing variable names are often enough to keep a human reviewer in the dark about similarities between two programs. Effectively combating plagiarism requires a process which overcomes the reduced human analysis time inherent in an automated environment and which mitigates the effect of simple syntactic changes.

4. Using Code Metrics to Detect Similarities

We can very easily utilize software metric tools to capture various characteristics of a program. Because this analysis is automated, an entire class of submissions can be processed in a matter of minutes. Further, many popular code metrics defeat the effect of naming and ordering changes. In particular, we utilize Halstead's Software Science measures [1] in our efforts at detecting plagiarism. Halstead's metrics are based on four fundamental measures which are derived from a program's source code:

   n1 - number of unique operators
   n2 - number of unique operands
   N1 - number of total operators
   N2 - number of total operands

Clearly, simply moving statements around and globally replacing variable names will change neither the number of unique operators and operands nor the number of total operators or operands. Likewise, changes in comments, spacing, and indentation, which are other favorite ploys of dishonest students, will have no effect on the Software Science measures either. For instance, here are two C functions; the second was created from the first by judicious use of the global "substitute" command in vi and a change in indentation rules:

Code Example 1:
===============
void put_str()
{
    if (j > 0) {
        if (sflg != 0) {
            ptabs();
            sflg = 0;
            if (aflg == 1) {
                aflg = 0;
                if (tabs > 0)
                    fprintf(f2," ");
            }
        }
        string[j] = '\0';
        fprintf(f2,"%s",string);
        j = 0;
    }
    else if (sflg != 0) {
        sflg = 0;
        aflg = 0;
    }
}

Code Example 2:
===============
void dump_text()
{
  if (index > ZERO)
  {
    if (status_flag != ZERO)
    {
      ptabs();
      status_flag = ZERO;
      if (action_flag == 1)
      {
        action_flag = ZERO;
        if (tabs > ZERO)
          fprintf(f2," ");
      }
    }
    text[index] = '\0';
    fprintf(f2,"%s",text);
    index = ZERO;
  }
  else if (status_flag != ZERO)
  {
    status_flag = ZERO;
    action_flag = ZERO;
  }
}

The similarities between the two code segments are not immediately obvious. However, both possess the same basic Software Science measurements as computed by SET Laboratories' UX-METRIC for C++ analysis product [2]:

Measurements
============
Function      n1  n2   N1   N2    N   N^   P/R    V     E  VG1  VG2  LOC
-----------  ---  --  ---  ---  ---  ---  ----  ---  ----  ---  ---  ---
put_str()     14  11   46   28   74   91  1.23  344  6123    6    6   14
dump_text()   14  11   46   28   74   91  1.23  344  6123    6    6   26

Clearly, the fact that two pieces of code have the same Software Science measurements is not in and of itself evidence of plagiarism. However, it should alert the grader that perhaps the two files should be studied manually to determine if they have too many similarities.

Use of metrics to identify potentially plagiarized code can suffer from two sorts of errors:

   Two code components are in fact the same, and the metrics don't
   detect it (Type I Error)

   Two code components are not related, but the metrics suggest they
   may be, calling for a manual analysis (Type II Error)

Depending upon the resources at the disposal of the grader, one type of error may be less acceptable than the other. By adjusting the metrics we consider, we can decrease the probability of one error at the expense of increasing the probability of the alternate type of error.
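To show how a whole class can be screened at once, here is a minimal sketch of the flagging step. The struct layout, the names, and the decision to compare only the four base counts are assumptions made for this illustration; this is neither UX-METRIC nor the author's actual scripts. The sort-then-scan idea it uses is the one discussed in section 5 below.

A sort-and-flag sketch:
-----------------------
#include <stdio.h>
#include <stdlib.h>

/* One record per submitted function: the four Halstead base counts
   plus the owner's login name (all invented for this sketch). */
struct measure {
    int  n1, n2, N1, N2;
    char who[32];
};

/* Order records by their (n1, n2, N1, N2) tuple so that identical
   tuples end up adjacent after sorting. */
static int cmp(const void *a, const void *b)
{
    const struct measure *x = a, *y = b;
    if (x->n1 != y->n1) return x->n1 - y->n1;
    if (x->n2 != y->n2) return x->n2 - y->n2;
    if (x->N1 != y->N1) return x->N1 - y->N1;
    return x->N2 - y->N2;
}

/* Sort (N log N), then one linear pass flags candidate pairs for the
   grader to inspect by hand. */
static void flag_matches(struct measure *m, size_t n)
{
    qsort(m, n, sizeof m[0], cmp);
    for (size_t i = 1; i < n; i++)
        if (cmp(&m[i - 1], &m[i]) == 0)
            printf("possible match: %s and %s\n", m[i - 1].who, m[i].who);
}

int main(void)
{
    /* Invented data echoing the table above. */
    struct measure class_[] = {
        { 14, 11, 46, 28, "alice" },   /* put_str()   */
        { 12,  9, 40, 25, "bob"   },
        { 14, 11, 46, 28, "carol" },   /* dump_text() */
    };
    flag_matches(class_, sizeof class_ / sizeof class_[0]);
    return 0;
}
-----

Which counts (or derived measures such as V or E) to include in cmp() is exactly the tuning decision discussed next: comparing fewer measures catches more copying but sends the grader more false alarms.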
For instance, we could limit our comparison to simply n2; n1 and n2; n2 and N2; etc. Ottenstein [3] noted that the probability of two functions having identical Software Science metrics is low if the metrics are near the mean of the sample. However, if the measurements are near the tails of the sample, the probability that they are in fact different pieces of code becomes quite small. This approach minimizes Type II Errors, and protects valuable grader time from being spent examining programs for plagiarism that are in fact different. On the other hand, it is likely that many plagiarized pieces of code will "slip through".

5. Comparing the Measures

Potentially, every program or function turned in by the members of a class could be a plagiarized copy of someone else's. If each of the "N" programs were to be compared with every other program, the amount of work to do would be on the order of N-squared. For a class size of 100 students, the metrics from each of the programs would have to be compared against the metrics of the other 99 programs. Should we be interested in comparing the metrics for each function against the metrics of the functions in all the other programs (since functions may also be renamed and moved about inside the files to further obscure copied code), the activity becomes even more time-consuming. However, by collecting the metrics from all the programs/functions and sorting them (N log N), we can identify matching measurements in linear time by simply making a sequential pass through the sorted observations.

6. Experiences

Currently, our experiences using this approach are only anecdotal. We know that there are numerous ways the metrics can be foiled by a resourceful student bent on cheating. As we "tighten the net", more and more resources are required to examine functions that are mistakenly identified as potential plagiarism (of course, the probability that we will identify copied code also increases). On the other hand, techniques such as Ottenstein's help conserve resources at the expense of missing some plagiarized code. However, we find that using automated plagiarism detection serves as an effective deterrent to cheating. Students are not quite sure what the limitations of the technique are, and by carefully publicizing cases that are correctly identified, their uncertainty is maintained. We are currently planning a comprehensive study to determine the effect of automated plagiarism detection upon student behavior.

7. References

[1] Halstead, M., Elements of Software Science, Elsevier Publishers, New York, 1977.

[2] UX-METRIC Users' Guide, SET Laboratories, Inc., Mulino, Oregon 97042, 1994.

[3] Ottenstein, K., "An Algorithmic Approach to the Detection and Prevention of Plagiarism", ACM SIGCUE Bulletin, Volume 8, No. 4, 1976.

######################################################################

6) Some Recommended Sources

Susan Mengel
CS Prof, TTU
mengel@ttu.edu

Most sources on metrics are from an industry viewpoint; i.e., what metrics program is in use and how it was established. These are good tutorials on how to get started on metrics. There are fewer academic textbooks on software metrics, and what is available is at the graduate level.

BOOKS
-----

1) "Software Metrics: A Rigorous & Practical Approach, 2nd ed.", Norman E. Fenton & Shari Lawrence Pfleeger, PWS Publishing, 1997.

   This is one of the few metrics books written as a textbook with exercises. It has a good focus on experimental techniques and data collection.
   It also contains good material on the field of software metrics, with an analysis of the work performed by researchers.

2) "Software Metrics: Measurement for Software Process Improvement", Barbara Kitchenham, NCC Blackwell, 1996.

3) "Software Metrics: Establishing a Company-Wide Program", Robert B. Grady & Deborah L. Caswell, Prentice-Hall, 1987.
   "Practical Software Metrics for Project Management and Process Improvement", Robert B. Grady, Prentice-Hall, 1992.

   These books relate the effort taken to start the use and collection of metrics at Hewlett-Packard. Very good, with practical advice.

4) "Object-Oriented Software Metrics", Mark Lorenz and Jeff Kidd, Prentice Hall, 1994.

   This book got me started on thinking about what object-oriented metrics would be useful to collect. A very practical book with a large number of metrics and case studies on the authors' own and other software. You should also look at the work by Chidamber and Kemerer for a more formal analysis of the use of object-oriented metrics.

5) "A Discipline for Software Engineering", Watts Humphrey, Addison-Wesley, 1995.

   A great book for analyzing your own personal programming habits.

TECHNICAL JOURNALS
------------------
1) IEEE Transactions on Software Engineering
2) ACM Software Engineering Notes (SIGSOFT publication)
3) IEEE Software
4) Journal of Systems and Software

WEB
---
1) Object-Oriented Metrics: an Annotated Bibliography
   http://dec.bournemouth.ac.uk/ESERG/bibliography.html
2) Metrics 98
   http://aaron.cs.umd.edu/Metrics98
3) ACM SIGMETRICS
   http://www.cs.du.edu/sigmetrics/
4) Verilog Corporation - Logiscope
   http://www.verilogusa.com

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
By: Don Bagert (Academic/Misc Editor)

Next Month's Topic: Licensing of Professional Engineers in SE

Very few countries currently license software engineers. As has been reported in FASE during the last several months (starting with the December 1997 issue), Texas has been planning to license professional engineers (PEs) in software engineering, with the final vote planned for 17 June 1998 (see the article elsewhere in this issue). It is expected that the rest of the United States will follow suit soon thereafter.

Licensing of software engineers by a governmental unit has an obvious impact on the higher education of students who plan to pursue a license. For instance, in the United States, it is much easier to obtain a PE license if one has a degree from a program accredited by ABET (Accreditation Board for Engineering and Technology). As reported in previous issues of FASE, ABET is considering program criteria for accrediting software engineering programs. Another issue involves the fact that computer science programs accredited by the Computing Sciences Accreditation Board (CSAB) may be considered equivalent to an ABET-approved degree for the purposes of licensing. Finally, the morning session of the Fundamentals of Engineering (FE) exam, which is given in the United States to engineering students around the time of their graduation, includes sections on topics such as statics, thermodynamics, and materials. (The nature of the morning section may change because of the inclusion of software engineering as one of the disciplines; for instance, there may be additional sections on the exam, with the student picking X out of Y parts of the exam to take. However, nothing has yet been decided on this issue.)
Position statements are requested on the education of students who could potentially be licensed as software engineers. Some of the questions that could be addressed are:

1. How much should a software engineering curriculum be affected by a licensing exit exam such as the FE?

2. (a) How will software engineering programs (and the potential for new programs) be affected if accredited computer science programs are considered an equivalent educational background?
   (b) If CS students are allowed to take the same exit exam for engineer licensing as SE students, should the exam be SE-oriented, CS-oriented, or a combination?
   (c) How much should the computer science curriculum be affected by such an exit exam?

Although the licensing issue is currently of special interest to the United States due to ongoing events, I encourage people from outside the U.S. to share their thoughts on these issues as they relate to education and licensing in their respective countries. Please send any articles (using the format provided at the end of each issue when making submissions) to me at bagert@ttu.edu no later than July 8.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
By: Don Bagert (Academic/Misc Editor)

Upcoming topics

Aug 1998:  Object Technology Education and Training
           Guest Editor: TBA

Sept 1998: Graduate SE Program Survey Results and Evaluation
           Guest Editor: Pete Knoke, University of Alaska Fairbanks
           ffpjk@aurora.alaska.edu

Oct 1998:  SEE&T Outside of the U.S.
           Guest Editor: Michael Ryan, Dublin City University
           mryan@compapp.dcu.ie

Nov 1998:  To be scheduled

Dec 1998:  Software Engineering Ethics Education and Training
           Guest Editor: Don Gotterbarn, East Tennessee State
           gotterba@etsu.edu

All dates are subject to change. For more information about a particular issue's topic, please contact the corresponding guest editor. Please refer to the article format provided at the end of each issue when making submissions.

Here are some of the other topics planned for future issues:
* Accreditation
* CASE Tools
* Curriculum Models
* Distance Learning
* Software Process Improvement Education
* Software Survivability Education
* Student Team Projects

Please send any suggestions for future topics to bagert@ttu.edu.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
News Items

######################################################################

Texas Board of Professional Engineers: Update

[Editor's Note: This is the latest in a continuing series of articles on the efforts of the Texas Board of Professional Engineers in the area of software engineering. Please refer to the December 1997, January 1998, March 1998, April 1998, and May 1998 issues of FASE for a summary of previous events.]

The Texas Board is still scheduled to meet on 17 June in Fort Worth to "discuss and possibly enact rules that will recognize software engineering as a distinct engineering discipline" (http://www.main.org/peboard/softweng.htm). If the Board does proceed with such a recognition, it is still required to wait at least 20 days after the vote before issuing any licenses in the new engineering branch. The outcome of the 17 June meeting will be reported as "breaking news" on the FASE-TALK listserv, and will subsequently be included in the next issue of FASE, to be mailed on 15 July.

[Editor's Note: Please refer elsewhere in this issue to a call for position statements on how the licensing of software engineers may affect their education, to be included in the July issue of FASE.]
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Calls for Participation

######################################################################

From: Hossein Saiedian

CSEE&T 99 Expanded CFP - Note Workshop Proposals are due July 1!

CALL FOR PAPERS AND PARTICIPATION

12th Conference on Software Engineering Education & Training
March 22-24, 1999
New Orleans, LA, USA

Sponsored by IEEE Computer Society (Pending)
Supported by the Software Engineering Institute
In Cooperation with the ACM SIGCSE

Please join a host of international educators and trainers in the software engineering discipline for the premier conference on the education and training of professional software developers. The 12th Conference on Software Engineering Education and Training (CSEE&T '99) continues a tradition of offering direction, promoting innovation and collaboration, and stimulating new instructional approaches to software engineering education and training. CSEE&T is the only conference devoted entirely to improvement in software engineering education and training.

CSEE&T '99 will focus on a different theme each day. The themes are as follows:
1. Training Curricula and Distance Education
2. Professional Issues (e.g., Accreditation, Licensing, Ethics)
3. Undergraduate and Graduate Curricula

Submissions. You are invited to submit research papers, experience reports, proposals for panel discussions and tutorials, and position statements for workshops in the above and other areas of software engineering education and training. You are also invited to suggest innovative topics for informal meetings and birds-of-a-feather sessions. Accepted contributions will appear in the conference proceedings published by the IEEE Computer Society Press. Selected papers and experience reports will be published in a special issue of the Journal of Systems and Software (published by Elsevier Science).

CSEE&T '99 will also include several half-day workshops designed to provide a forum for a group of participants to exchange opinions on topics in software engineering research and practice, and related education and training issues. Participation in a workshop typically depends on submission of a position statement. Workshop proposals will only be accepted on one of the three conference themes. The submission format for workshop proposals is on the CSEE&T web page. Information concerning the workshops (needed in order to submit position papers) will be available by July 15, 1998. Accepted position statements will be published in the conference proceedings.

The 1999 conference will coordinate and synchronize its schedule with ACM's SIGCSE Symposium on Computer Science Education, which will be held in New Orleans. Joint events are planned to provide an opportunity for both software engineers and computer scientists to exchange ideas on how their activities can be more effectively integrated. For additional details, please contact the Program Chair or see the web address.
Conference Chair: Hossein Saiedian, University of Nebraska at Omaha, USA, hossein@cs.unomaha.edu
Program Chair: Don Bagert, Texas Tech University, USA, bagert@ttu.edu

Steering Committee
  Don Bagert, Texas Tech University
  David Budgen, Keele University
  Neal Coulter, Florida Atlantic University
  Dennis Frailey, Texas Instruments
  Michael Lutz, Rochester Institute of Technology
  Mike McCracken, Georgia Tech
  Nancy Mead, Software Engineering Institute
  Michael Ryan, Dublin City University
  Hossein Saiedian, University of Nebraska at Omaha

Program Committee
  Doris Carver, IEEE-CS President, Louisiana State U., USA
  James Cross, Auburn University, USA
  Jorge Diaz-Herrera, Monmouth University, USA
  Tom Hilburn, Embry-Riddle Aeronautical U., USA
  Greg Hislop, Drexel U., USA (IS Liaison)
  Michael Jackson, Software Development Consultant, UK
  Mehdi Jazayeri, TU Vienna, Austria
  Pete Knoke, U. of Alaska Fairbanks, USA
  Renee McCauley, U. Southern Louisiana, USA
  Susan A. Mengel, Texas Tech University, USA
  Bob Noonan, College of William and Mary, USA (SIGCSE'99 Program Chair)
  Dale Oexmann, Rose-Hulman Institute of Technology, USA
  David Parnas, McMaster University, Hamilton, Canada
  Jane Prey, U. of Virginia, USA (SIGCSE'99 Conference Chair)
  Michael Ryan, Dublin City U., Ireland (International Liaison)
  David Umphress, Seattle University, USA
  Ray Vaughn, Mississippi State U., USA

Tutorials/Workshops Chair: Tom Hilburn (hilburn@db.erau.edu)
Panels Chair: David Umphress (umphress@seattleu.edu)
Birds-of-a-Feather Coordinator: Susan A. Mengel (mengel@ttu.edu)

Submission Due Dates
  Workshop Proposals: July 1, 1998
  Notification of Workshop Proposal Acceptance: July 10, 1998
  Research Papers, Experience Reports, Workshop Position Papers,
    Tutorial and Panel Proposals: September 15, 1998
  Notification of Acceptance: November 16, 1998
  Camera-ready copies due: December 15, 1998

For additional information about submission requirements and conference updates, please see: http://cs.unomaha.edu/CSEET99

About the location, New Orleans: "She was founded by the French, matured under the Spanish, and came to world prominence as America's second largest port city. Peoples of all colors and many nationalities have contributed to the wonderful mixture of sights, sounds, smells and tastes that are New Orleans." For additional information on New Orleans, see one of the following web pages: http://www.neworleans.com, http://www.neworleanscvb.com or http://www.yatcom.com/neworl/vno.html.

######################################################################

From: Jerzy R. Nawrocki
Software Engineering Education Symposium (SEES) '98

SECOND CALL FOR PAPERS

SEES'98
Software Engineering Education Symposium
18-20 November, 1998
Poznan, Poland
http://www.cs.put.poznan.pl/~sees98/

Submission deadline: June 30, 1998
====================

Sponsors
--------
* MOTOROLA
* Copernicus-INSPIRE and Q-Labs Software Engineering
* European Association for Programming Languages and Systems (EAPLS)
* Polish Information Processing Society

Organisers
----------
* Institute of Computing Science, Poznan University of Technology
* Committee of Informatics, Polish Academy of Sciences

Keynote speakers
----------------
* Nancy Mead, Carnegie-Mellon University
* David Parnas, McMaster University
* Ian Sommerville, University of Lancaster

Satellite event
---------------
* November 21, 1998 - IEEE Workshop on Real Time Systems Education (approval is pending)

Important Dates
---------------
Deadline for papers: June 30, 1998
Notification of acceptance: September 10, 1998
Camera-ready copies of papers: October 15, 1998
Symposium: November 18, 1998

Submission procedure
--------------------
Send articles no longer than 8 pages of A4 format (as a PostScript file) to Adam Czajka at aczajka@man.poznan.pl, not later than June 30, 1998. Detailed guidelines are available on our WWW site. Proceedings will be available to the participants before the symposium. Authors of the best papers will be invited to submit extended versions to a special issue of IEEE Proceedings - Software Engineering.

For the registration and hotel reservation form, see our WWW site. The conference fee will be 350 DM. For early registration (i.e., payment before October 10, 1998), the fee will be reduced to 300 DM. It will include:
* participation in standard and plenary sessions,
* the Conference Proceedings,
* coffee breaks, lunches and the social program.

The venue is the Centre of the Polish Academy of Sciences in Poznan, Poland (near the Poznan Opera House).

######################################################################

From: Peter Grillo

Internet-Based Survey on Software Engineering

Peter Grillo
Computer and Information Science Department
Temple University
Philadelphia, PA 19422
e-mail: pgrillo@thunder.ocis.temple.edu

Dear Colleague,

I am conducting an Internet-based survey to identify the differences between industry and academic views on issues related to the current software engineering field and software engineering education. I would appreciate your participation in determining the gap between the skills a student gains from a university software engineering education and the skills that industry requires of these graduates. Our goal is to identify what skills a graduating software engineer would have to master in order to be effective in today's business environment.

Your response is very important to me, and I would be most appreciative of your time and effort in filling out this survey. Please complete the Internet-based survey by July 31st, 1998, and feel free to contact me by email if you have any questions about the survey or this research. The URL address of the survey is http://rajiv1.cis.temple.edu/survey/ and the survey password is "software".

To have results that are truly representative, I need a high rate of participation in the completion of the survey. We encourage more than one person from an organization/department to complete this questionnaire. Our goal is to obtain a broad sampling of the software profession.
To this end, please feel free to distribute the URL and password to other members of your organization/department who might be interested in participating.

All information you provide will be held strictly confidential. Your answers will not be attributable to you personally or to your company. Only aggregate results of this study will be published, and no information is recorded pertaining to your identity. These results will be submitted for general publication to provide the software engineering profession with an understanding of the difference in perspectives. If you would like to receive a summary of the results, please provide your email address at the end of the survey form.

Thank you,
Peter Grillo

Personal URL: http://rajiv1.cis.temple.edu/grillo/grillo.html
Survey URL:   http://rajiv1.cis.temple.edu/survey (password=software)
CIS Dept URL: http://joda.cis.temple.edu

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Contact and General Information about FASE

The Forum for Advancing Software engineering Education (FASE) is published on the 15th of each month by the FASE editorial board. Send newsletter articles to one of the editors, preferably by category: articles pertinent to corporate and government training to Kathy Beckman; academic education and all other categories to Don Bagert. Items must be submitted by the 8th of the month in order to be considered for inclusion in that month's issue. Also, please see the submission guidelines immediately below.

FASE submission format guidelines: All submissions must be in ASCII format and contain no more than 70 characters per line (71 including the newline character). This 70-character/line format must be viewable in a text editor such as Microsoft Notepad WITHOUT using a "word wrap" facility. All characters (outside of the newline) should be in the ASCII code range from 32 to 126 (i.e., "printable" in DOS text mode). (A small sketch of an automated check of these requirements appears at the end of this section.)

Everyone receiving this is on the FASE mailing list. If you wish to leave this list, write to the listserv and, in the text of your message (not the subject line), write: signoff fase

To rejoin (or have someone else join) the FASE mailing list, write to the listserv and, in the text of your message (not the subject line), write: subscribe fase

But what if you have something that you want to share with everyone else before the next issue? For more real-time discussion, there is the FASE-TALK discussion list. It is our hope that it will be to FASE readers what the SIGCSE.members listserv is to that group. (For those of you who don't know, SIGCSE is the ACM Special Interest Group on Computer Science Education.) To subscribe to the FASE-TALK list, write to the listserv and, in the text of your message (not the subject line), write: subscribe fase-talk

Please try to limit FASE-TALK to discussion items related to software engineering education and training; CFPs and other such items can still be submitted to the editor for inclusion in FASE. Anyone who belongs to the FASE-TALK mailing list can post to it. FASE-TALK is also used by the editors for "breaking stories", i.e., news that we feel you would want to hear about before the next issue of FASE comes out. (We do this sparingly, though.) As always, there is no cost for subscribing to either FASE or FASE-TALK!

Send requests for information, problem reports, returned mail, or other correspondence about this newsletter to the editors. Back issues (dating from the very first issue) can be found on the web (with each Table of Contents) or through ftp.
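The 70-character and printable-ASCII requirements above are mechanical enough to check automatically before submitting. The following is a small illustrative sketch, not an official FASE tool; the program name and messages are invented.

A format-check sketch:
----------------------
#include <stdio.h>

/* Reports lines longer than 70 characters and any character outside
   the printable ASCII range 32-126 (newlines excepted). */
int main(int argc, char *argv[])
{
    FILE *f;
    int c, col = 0;
    long line = 1;

    if (argc != 2 || (f = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "usage: fasecheck article.txt\n");
        return 1;
    }
    while ((c = getc(f)) != EOF) {
        if (c == '\n') {
            if (col > 70)
                printf("line %ld: %d characters (limit is 70)\n",
                       line, col);
            line++;
            col = 0;
        } else {
            col++;
            if (c < 32 || c > 126)
                printf("line %ld: non-printable character (code %d)\n",
                       line, c);
        }
    }
    fclose(f);
    return 0;
}
-----

Running such a check over an article before mailing it catches over-long lines and stray non-ASCII characters that word processors sometimes introduce.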
The FASE Staff:

Don Bagert -- Academic/Misc Editor, ListMaster, and Archivist
  Dept. of Computer Science, 8th and Boston
  Texas Tech University, Lubbock TX 79409-3104 USA
  Phone: 806-742-1189   Fax: 806-742-3519
  Email: bagert@ttu.edu

Kathy Beckman -- Corporate/Government Editor
  Computer Data Systems, One Curie Ct.
  Rockville MD 20850 USA
  Phone: 301-921-7027   Fax: 301-921-1004
  Email: Kathy.Beckman@cdsi.com

Laurie Werth -- Advisory Committee
  Taylor Hall 2.124, University of Texas at Austin
  Austin TX 78712 USA
  Phone: 512-471-9535   Fax: 512-471-8885
  Email: lwerth@cs.utexas.edu

Nancy Mead -- Advisory Committee
  Software Engineering Institute, 5000 Forbes Ave.
  Pittsburgh, PA 15213 USA
  Phone: 412-268-5756   Fax: 412-268-5758
  Email: nrm@sei.cmu.edu