Extend the size estimation practices established in the last two lessons toward estimation of resource and schedule usage
Read chapter 6 of the textbook
Write program 5A using PSP1.1
This chapter, on estimating development time, is the one I rather wish I'd read before the last assignment-- it relates the linear regression process to prediction and estimation, and while I'd figured much of the information out by the end of lesson 4, it would have been nice to have it earlier.
In a nutshell, the estimation part of the PSP tries to relate a set of historical estimated data (past estimates) to a set of actual data (past actual results), then use mathematics to make a prediction for a new estimate. The chapter focuses on development time, so we'll use that as an example, but the processes can be used to relate any two quantities which might be correlated (and further lessons will evidently give us tools to determine that correlation). Humphrey gives several scenarios for estimating project time:
If you don't have enough historical data (estimates and results) to make a prediction for size, you take a known quantity (historical productivity in LOC/hour), and use that to estimate the shortest and longest likely times; in other words, using an example from the text, if you had written two programs and had time and size data, you might get the following information:
Table 5-1. Example: figuring estimated time from productivity data
       | LOC | Hours | LOC/Hour
       | 172 |  7.6  | 22.63
       | 242 | 15.3  | 15.82
Total: | 414 | 22.9  | 18.07
Using the average productivity (18.07 LOC/hr), you might estimate the development time for a 156 LOC program at 8.63 hours; using your lowest and highest productivities (15.82 and 22.63 LOC/hr), you could also derive a maximum time of 9.86 hours and a minimum of 6.9 hours. You now have a most likely schedule plus upper and lower bounds.
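The whole productivity calculation is small enough to sketch in a few lines of Python; this uses the two-program example numbers above (everything else -- names, structure -- is my own):

```python
# Sketch of the "no regression data yet" method: estimate development
# time from historical productivity (LOC/hour). Data is the two-program
# example from the text.
history = [(172, 7.6), (242, 15.3)]  # (actual LOC, actual hours)

total_loc = sum(loc for loc, _ in history)
total_hours = sum(hrs for _, hrs in history)
avg_productivity = total_loc / total_hours        # ~18.07 LOC/hour
rates = [loc / hrs for loc, hrs in history]       # per-program productivities

def time_estimate(est_loc):
    """Return (min, likely, max) hours for an estimated program size."""
    return (est_loc / max(rates),          # best case: highest productivity
            est_loc / avg_productivity,    # most likely: average productivity
            est_loc / min(rates))          # worst case: lowest productivity

low, likely, high = time_estimate(156)     # roughly 6.9, 8.63, 9.86 hours
```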
If you have at least three sets of data, however, Humphrey advocates the use of statistics, particularly the fairly simple (if arduous by hand) linear regression calculation. Essentially, this takes pairs of numbers (estimated LOC and actual development hours, etc.) as X-Y coordinates, finds the best-fit line for the data, and uses that line to extrapolate a schedule. This is a good deal more accurate, because rather than running on productivity, it gives you a statistical prediction of estimated-size-to-schedule based on historical data; the first method applied actual productivity numbers to an estimated size, ignoring the possibility of errors in your size estimate. The linear regression method assumes that any errors you do make are consistent, and incorporates that history into the prediction. A likely "envelope" (prediction interval) around your most likely schedule is derived using the t-distribution and some further calculation. The entire process is much more math-intensive than the simple productivity calculation, but once automated would be just as easy and much more accurate. It can also be used to relate many types of correlated variables-- such as comparisons between Eiffel and C++ source code size and development time.
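The regression itself is just the usual least-squares fit; here's a stdlib-only sketch with made-up (estimated LOC, actual hours) pairs -- not data from the book. The prediction-interval step needs a t-distribution quantile, which isn't in the standard library, so it's left as a comment:

```python
from math import sqrt

# Hypothetical (estimated LOC, actual hours) pairs for illustration only.
data = [(130, 8.2), (650, 40.5), (99, 6.1), (150, 10.0), (128, 8.0)]

n = len(data)
xs = [x for x, _ in data]
ys = [y for _, y in data]
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Least-squares best-fit line: y = b0 + b1 * x
b1 = (sum(x * y for x, y in data) - n * x_mean * y_mean) / \
     (sum(x * x for x in xs) - n * x_mean ** 2)
b0 = y_mean - b1 * x_mean

def predict(est_loc):
    """Most likely development hours for a new size estimate."""
    return b0 + b1 * est_loc

# Standard deviation of the residuals (n - 2 degrees of freedom).
sigma = sqrt(sum((y - predict(x)) ** 2 for x, y in data) / (n - 2))

# The prediction interval around predict(x_k) would be:
#   t(p, n-2) * sigma * sqrt(1 + 1/n + (x_k - x_mean)**2
#                            / sum((x - x_mean)**2 for x in xs))
# with t taken from a t-distribution table (or scipy.stats.t.ppf).
```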
The linear regression calculation is fairly simple, but very time-consuming if done by hand (a great deal of summation, etc). A tool such as program 4A could be helpful indeed (in fact, it takes a great deal of restraint not to enhance it for other parameters now instead of waiting for the proper assignments!).
The rest of the chapter involves more mathematical concepts: how to combine resource estimates to get both an estimated result and an estimated prediction interval (in which we discover that combining multiple estimates gives a proportionally smaller prediction interval than a single estimate), and how to create a large estimate out of many smaller estimates. He also introduces multiple regression, a process which allows one to estimate the relative contributions of different variables to a single outcome (here, the relative contributions of the work on new, reused, and modified code [Humphrey95]). The math for this looks formidable and, of course, suitable for automation.
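The shrinking-interval effect is easy to demonstrate numerically: interval widths combine in quadrature while the estimates themselves add linearly, so the combined percentage error drops. A toy illustration with numbers of my own (not the book's):

```python
from math import sqrt

# Toy illustration of combining estimates: interval widths add in
# quadrature, estimates add linearly, so the relative interval shrinks.
parts = [10.0, 12.0, 8.0]       # estimated hours per component
intervals = [3.0, 3.5, 2.5]     # +/- hours on each component estimate

total = sum(parts)                                    # 30 hours
combined_interval = sqrt(sum(r * r for r in intervals))

single_relative = intervals[0] / parts[0]      # 30% error on one part
combined_relative = combined_interval / total  # smaller percentage overall
```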
Given a time estimate, the remainder of the chapter is devoted to creating a schedule from it: identifying possible working hours, allocating project hours to work hours, creating the schedule, etc. Humphrey introduces earned value tracking to measure the progress of a project; essentially, EV tracking assigns a value to each step in a project based on its estimated work time as a percentage of the estimated total work time. Adding tasks, then, reduces the estimated value of tasks already accounted for, but this does produce a fairly decent way to measure progress (a more traditional approach, the use of miniature milestones, accomplishes much the same thing without the added concept of value; any one milestone is often treated the same as any other, with no sense of additional value or progress for difficult or time-consuming tasks).
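The earned value scheme above boils down to one division per task. A minimal sketch (task names and hours are invented for illustration), including the effect of adding a task mid-project:

```python
# Sketch of earned value assignment: each task's value is its estimated
# time as a percentage of the total estimated time. Tasks are made up.
tasks = {"design": 4.0, "code": 6.0, "test": 5.0, "postmortem": 1.0}

def earned_values(tasks):
    """Map each task to its share of 100 total earned-value points."""
    total = sum(tasks.values())
    return {name: 100.0 * hours / total for name, hours in tasks.items()}

ev = earned_values(tasks)          # e.g. completing "code" earns 6/16 = 37.5
# Adding a new task re-divides the pie, lowering every existing task's value.
tasks["document"] = 4.0
ev_after = earned_values(tasks)    # "code" now worth only 6/20 = 30.0
```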