Program 7a, like several others, analyzes two series of numbers and their relationship
to each other. With that in mind, I see no reason not to recycle much of
program 6a (which did linear regression and prediction), adding the correlation
and significance calculations to produce output like the following:
Historical data read.
Beta-0: -0.351494
Beta-1: 0.0949624
Standard deviation: 19.7304
Correlation: 0.9543
Significance t: 9.0335
2*(1-p): 1.80e-5
Estimate at x=300
Projected y: 28.1372
t (70 percent): 1.10815
t (90 percent): 1.85955
Range (70 percent): 23.2687; UPI: 51.406; LPI: 4.86851
Range (90 percent): 39.0466; UPI: 67.1838; LPI: -10.9094
The input file format will be the same as for program 6a:
pairs of comma-separated doubles in a file, optionally commented
with two hyphens ("--") to begin an inline comment. The list
of numbers will be terminated by the lowercase word "stop"; any
single numbers on lines below that will constitute predictions
and will generate predictions as shown above based on the linear
regression parameters.
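For concreteness, a small input file in this format might look like the
following (the pairs here are made-up values, not my actual historical data):

```
-- size/time pairs: x = new/changed LOC, y = minutes
130, 186
650, 699
99, 132
stop
-- each number below "stop" generates a prediction
300
```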
Using program 6a and historical data, the estimated size
of new and changed code is 179 LOC.
Using data for the last three programs (PROBE estimates
of new/changed LOC
vs actual development time), estimated total development time
is 228 minutes.
Using Dia, a freeware diagramming
tool, I've made a simplified diagram of the new features for appropriate
classes, which should give a good idea of what's changing, supplemented
by brief notes. Not much changes from program 6a, but that's to be
expected from the requirements.
First design review-- and it did catch some things! I get the
feeling from the design review checklist provided by Humphrey that I'm meant to
do a full design, including pseudocode for all methods (with checks in the
review checklist like "loops are properly initiated, incremented, and terminated").
I think many of these details are specific to the implementation language and
should probably be left to coding, but that's my personal opinion of the design
process (in fact, I'm even against the parentheses that the UML template put
on my features when diagramming-- that's language-dependent! I may try
and switch to BON, if I can figure out how to convince Dia to do different
diagram styles).
I did find a few errors, mostly due to missing contracts (which
are not really part of the implementation, but I do code them in both
C++ and Eiffel). The only real algorithmic problem was the omission of
the square root in the correlation calculation-- a significant omission!
No surprises, although some of the number_list additions (tail,
etc) required additional code not included in the design.
The simple_input_parser, simpson_integrator,
paired_number_list, and
single_variable_function classes were reused in full, as
were the gamma_function, error_log,
is_double_equal, t_distribution,
t_distribution_base, t_integral, and
whitespace_stripper modules.
number_list added the head
and tail methods, as well as the mapped_to,
multiplied_by_list, and append features.
#ifndef NUMBER_LIST_H
#define NUMBER_LIST_H
#ifndef SINGLE_VARIABLE_FUNCTION_H
#include "single_variable_function.h"
#endif
#include <list>
#include <iostream>
using std::list;
//a class which encapsulates a list of double values, adding the features of
//mean and standard deviation
class number_list:public list < double >
{
public:double sum (void) const;
double mean (void) const;
double standard_deviation (void) const;
int entry_count (void) const;
void add_entry (double new_entry);
double head( void ) const;
number_list tail( void ) const;
number_list mapped_to( const single_variable_function& f ) const;
number_list multiplied_by_list( const number_list& rhs ) const;
void append( const number_list& rhs );
};
#endif
#include "number_list.h"
#include <assert.h>
#include <stdlib.h>
#include <math.h>
#ifndef CONTRACT_H
#include "contract.h"
#endif
double
number_list::sum (void) const
{
double
Result = 0;
for (list < double >::const_iterator iter = begin ();
iter != end (); ++iter)
{
Result += *iter;
}
return Result;
}
double
number_list::mean (void) const
{
assert (entry_count () > 0);
return sum () / entry_count ();
}
double
number_list::standard_deviation (void) const
{
assert (entry_count () > 1);
double
sum_of_square_differences = 0;
for (list < double >::const_iterator iter = begin ();
iter != end (); ++iter)
{
const double
this_square_difference = *iter - mean ();
sum_of_square_differences +=
this_square_difference * this_square_difference;
}
return sqrt (sum_of_square_differences / (entry_count () - 1));
}
int
number_list::entry_count (void) const
{
return size ();
}
void
number_list::add_entry (double new_entry)
{
push_back (new_entry);
}
double
number_list::head( void ) const
{
REQUIRE( entry_count() > 0 );
return *begin();
}
number_list
number_list::tail( void ) const
{
REQUIRE( entry_count() > 0 );
number_list Result;
list < double >::const_iterator iter = begin();
++iter;
Result.insert( Result.begin(), iter, end() );
return Result;
}
number_list
number_list::mapped_to( const single_variable_function& f ) const
{
number_list Result;
for (list < double >::const_iterator iter = begin ();
iter != end (); ++iter)
{
Result.add_entry( f.at(*iter) );
}
return Result;
}
number_list
number_list::multiplied_by_list( const number_list& rhs ) const
{
REQUIRE( entry_count() == rhs.entry_count() );
number_list Result;
if ( entry_count() > 0 )
{
Result.add_entry( head() * rhs.head() );
Result.append( tail().multiplied_by_list( rhs.tail() ) );
}
return Result;
}
void
number_list::append( const number_list& rhs )
{
insert( end(), rhs.begin(), rhs.end() );
}
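As a sanity check on the recursive pattern above, here is a stand-alone sketch
of multiplied_by_list written against plain std::list<double>; it does not use
the project's headers, and assert stands in for the REQUIRE contract:

```cpp
#include <cassert>
#include <list>

// Element-wise product of two equal-length lists, computed recursively as
// head*head followed by the product of the tails -- the same shape as
// number_list::multiplied_by_list above.
std::list<double> multiplied_by_list (const std::list<double>& lhs,
                                      const std::list<double>& rhs)
{
  assert (lhs.size () == rhs.size ()); // mirrors REQUIRE(entry_count() == ...)
  std::list<double> result;
  if (!lhs.empty ())
    {
      result.push_back (lhs.front () * rhs.front ());
      // build the tails explicitly, then recurse and append
      std::list<double> lhs_tail (++lhs.begin (), lhs.end ());
      std::list<double> rhs_tail (++rhs.begin (), rhs.end ());
      std::list<double> tail_product = multiplied_by_list (lhs_tail, rhs_tail);
      result.splice (result.end (), tail_product);
    }
  return result;
}
```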
#ifndef PAIRED_NUMBER_LIST_PREDICTOR_H
#define PAIRED_NUMBER_LIST_PREDICTOR_H
#ifndef PAIRED_NUMBER_LIST_H
#include "paired_number_list.h"
#endif
#ifndef T_DISTRIBUTION_H
#include "t_distribution.h"
#endif
class paired_number_list_predictor:public paired_number_list
{
public:double variance (void) const;
double standard_deviation (void) const;
double projected_y (double x) const;
double prediction_range (double x, double range) const;
double lower_prediction_interval (double x, double range) const;
double upper_prediction_interval (double x, double range) const;
double t (double range) const;
static double correlation_bottom_term( const number_list& numbers );
double correlation_top( void ) const;
double correlation( void ) const;
double significance_t( void ) const;
double significance( void ) const;
protected:t_distribution m_t_distribution;
double prediction_range_base (void) const;
};
#endif
#include "paired_number_list_predictor.h"
#ifndef CONTRACT_H
#include "contract.h"
#endif
#ifndef SQUARE_H
#include "square.h"
#endif
#ifndef T_INTEGRAL_H
#include "t_integral.h"
#endif
#include <math.h>
double
paired_number_list_predictor::variance (void) const
{
REQUIRE (entry_count () > 2);
double
Result = 0;
list < double >::const_iterator x_iter;
list < double >::const_iterator y_iter;
for (x_iter = m_xs.begin (), y_iter = m_ys.begin ();
(x_iter != m_xs.end ()) && (y_iter != m_ys.end ()); ++x_iter, ++y_iter)
{
Result += pow (*y_iter - beta_0 () - beta_1 () * (*x_iter), 2);
}
Result *= 1.0 / (entry_count () - 2.0);
return Result;
}
double
paired_number_list_predictor::standard_deviation (void) const
{
return sqrt (variance ());
}
double
paired_number_list_predictor::projected_y (double x) const
{
return beta_0 () + beta_1 () * x;
}
double
paired_number_list_predictor::t (double range) const
{
const_cast <
paired_number_list_predictor *
>(this)->m_t_distribution.set_n (entry_count () - 2);
return m_t_distribution.at (range);
}
double
paired_number_list_predictor::prediction_range (double x, double range) const
{
REQUIRE (entry_count () > 0);
return t (range) * standard_deviation ()
* sqrt (1.0 + 1.0 / static_cast < double >(entry_count ())
+ pow (x - x_mean (), 2) / prediction_range_base ());
}
double
paired_number_list_predictor::lower_prediction_interval (double x,
double range) const
{
return projected_y (x) - prediction_range (x, range);
}
double
paired_number_list_predictor::upper_prediction_interval (double x,
double range) const
{
return projected_y (x) + prediction_range (x, range);
}
double
paired_number_list_predictor::prediction_range_base (void) const
{
double
Result = 0;
for (std::list < double >::const_iterator x_iter = m_xs.begin ();
x_iter != m_xs.end (); ++x_iter)
{
Result += pow ((*x_iter) - x_mean (), 2);
}
return Result;
}
double
paired_number_list_predictor::correlation_bottom_term( const number_list& numbers )
{
REQUIRE( numbers.entry_count() > 0 );
REQUIRE( numbers.sum() != 0 );
square a_square;
double Result = numbers.entry_count() * ( numbers.mapped_to( a_square ).sum() )
- a_square.at( numbers.sum() );
return Result;
}
double
paired_number_list_predictor::correlation_top( void ) const
{
double Result = entry_count() * m_xs.multiplied_by_list( m_ys ).sum() - (x_sum() * y_sum());
return Result;
}
double
paired_number_list_predictor::correlation( void ) const
{
REQUIRE( correlation_bottom_term( m_xs ) != 0 );
REQUIRE( correlation_bottom_term( m_ys ) != 0 );
double Result = correlation_top()
/ sqrt( correlation_bottom_term( m_xs ) * correlation_bottom_term( m_ys ) );
return Result;
}
double
paired_number_list_predictor::significance_t( void ) const
{
REQUIRE( correlation() != 1.0 );
REQUIRE( entry_count() >= 2 );
double Result = ( fabs( correlation() ) * sqrt( entry_count() - 2.0 ) )
/ sqrt( 1 - pow( correlation(), 2 ) );
return Result;
}
double
paired_number_list_predictor::significance( void ) const
{
t_integral t;
t.set_n( entry_count() - 2 );
const double p = t.at( significance_t() );
double Result = 2.0 * ( 1.0 - p );
return Result;
}
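To double-check the formulas above in isolation, here is a self-contained
sketch of the correlation and significance-t calculations from [Humphrey95].
This is a rewrite over std::vector, not the project's classes, and the data in
the test is made up:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// r = (n*sum(xy) - sum(x)*sum(y))
//     / sqrt((n*sum(x^2) - sum(x)^2) * (n*sum(y^2) - sum(y)^2))
// The outer square root is the term my design review caught as missing.
double correlation (const std::vector<double>& xs,
                    const std::vector<double>& ys)
{
  const std::size_t n = xs.size ();
  double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      sx += xs[i];
      sy += ys[i];
      sxx += xs[i] * xs[i];
      syy += ys[i] * ys[i];
      sxy += xs[i] * ys[i];
    }
  return (n * sxy - sx * sy)
    / std::sqrt ((n * sxx - sx * sx) * (n * syy - sy * sy));
}

// t = |r| * sqrt(n - 2) / sqrt(1 - r^2); needs n >= 2 and |r| < 1
double significance_t (const std::vector<double>& xs,
                       const std::vector<double>& ys)
{
  const double r = correlation (xs, ys);
  return std::fabs (r) * std::sqrt (xs.size () - 2.0)
    / std::sqrt (1.0 - r * r);
}
```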
#ifndef PREDICTOR_PARSER_H
#define PREDICTOR_PARSER_H
#ifndef SIMPLE_INPUT_PARSER_H
#include "simple_input_parser.h"
#endif
#ifndef PAIRED_NUMBER_LIST_PREDICTOR_H
#include "paired_number_list_predictor.h"
#endif
class predictor_parser:public simple_input_parser
{
public:virtual void reset (void);
virtual std::string transformed_line (const std::string & line) const;
virtual void parse_last_line (void);
predictor_parser (void);
protected:bool found_end_of_historical_data;
paired_number_list_predictor number_list;
void parse_last_line_as_historical_data (void);
void parse_last_line_as_end_of_historical_data (void);
void parse_last_line_as_prediction (void);
bool last_line_is_blank (void);
static const std::string & historical_data_terminator;
static const std::string & inline_comment_begin;
bool is_double (const std::string & str);
double double_from_string (const std::string & str);
std::string string_stripped_of_whitespace (const std::string & str) const;
std::string string_stripped_of_comments (const std::string & str) const;
};
#endif
#include "predictor_parser.h"
#ifndef WHITESPACE_STRIPPER_H
#include "whitespace_stripper.h"
#endif
#ifndef ERROR_LOG_H
#include "error_log.h"
#endif
#ifndef CONTRACT_H
#include "contract.h"
#endif
void
predictor_parser::reset (void)
{
simple_input_parser::reset ();
found_end_of_historical_data = false;
number_list.reset ();
}
std::string predictor_parser::transformed_line (const std::string & str) const
{
return whitespace_stripper::string_stripped_of_whitespace (string_stripped_of_comments (str));
}
std::string
predictor_parser::string_stripped_of_comments (const std::string & str) const
{
std::string::size_type comment_index = str.find (inline_comment_begin);
return str.substr (0, comment_index);
}
void
predictor_parser::parse_last_line (void)
{
if (last_line_is_blank ())
{
return;
}
else if (last_line () == historical_data_terminator)
{
parse_last_line_as_end_of_historical_data ();
}
else
{
if (!found_end_of_historical_data)
{
parse_last_line_as_historical_data ();
}
else
{
parse_last_line_as_prediction ();
}
}
}
bool predictor_parser::last_line_is_blank (void)
{
if (last_line ().length () == 0)
{
return true;
}
else
{
return false;
}
}
void
predictor_parser::parse_last_line_as_historical_data (void)
{
//6 reused, 6 modified?
error_log
errlog;
//split the string around the comma
const
std::string::size_type comma_index = last_line ().find (',');
errlog.check_error (comma_index == last_line ().npos, "No comma");
std::string x_string = last_line ().substr (0, comma_index);
std::string y_string =
last_line ().substr (comma_index + 1, last_line ().length ());
//get values for each double and ensure they're valid
errlog.check_error (!is_double (x_string), "X invalid:" + x_string);
errlog.check_error (!is_double (y_string), "Y invalid:" + y_string);
if (!errlog.error_flag ())
{
double
new_x = double_from_string (x_string);
double
new_y = double_from_string (y_string);
//add the entry
cout << "added: " << new_x << ", " << new_y << "\n";
number_list.add_entry (new_x, new_y);
}
}
void
predictor_parser::parse_last_line_as_end_of_historical_data (void)
{
REQUIRE (last_line () == historical_data_terminator);
cout << "Historical data read.\n"
<< "Beta-0: " << number_list.beta_0 () << "\n"
<< "Beta-1: " << number_list.beta_1 () << "\n"
<< "Standard deviation: " << number_list.standard_deviation () << "\n";
if ( ( number_list.entry_count() >= 2 )
&& ( number_list.x_sum() != 0 )
&& ( number_list.y_sum() != 0 ) )
{
cout << "Correlation: " << number_list.correlation() << "\n"
<< "Significance t: " << number_list.significance_t() << "\n"
<< "2*(1-p): " << number_list.significance() << "\n\n";
}
else
{
cout << "Too few numbers for correlation calc, or sums do not permit correlation calc\n\n";
}
found_end_of_historical_data = true;
}
predictor_parser::predictor_parser (void)
{
reset ();
}
void
predictor_parser::parse_last_line_as_prediction (void)
{
error_log
errlog;
errlog.check_error (!is_double (last_line ()),
"Not a double: " + last_line ());
if (!errlog.error_flag ())
{
const double
x = double_from_string (last_line ());
cout << "Estimate at x=" << x << "\n"
<< "Projected y: " << number_list.projected_y (x) << "\n"
<< "t (70 percent): " << number_list.t (0.7) << "\n"
<< "t (90 percent): " << number_list.t (0.9) << "\n"
<< "Range (70 percent): " << number_list.prediction_range (x, 0.7)
<< "; UPI: " << number_list.upper_prediction_interval (x, 0.7)
<< "; LPI: " << number_list.lower_prediction_interval (x, 0.7)
<< "\nRange (90 percent): " << number_list.prediction_range (x, 0.9)
<< "; UPI: " << number_list.upper_prediction_interval (x, 0.9)
<< "; LPI: " << number_list.lower_prediction_interval (x, 0.9)
<< "\n";
}
}
bool predictor_parser::is_double (const std::string & str)
{
bool
Result = true;
char *
conversion_end = NULL;
strtod (str.c_str (), &conversion_end);
if (conversion_end == str.data ())
{
Result = false;
}
return Result;
}
double
predictor_parser::double_from_string (const std::string & str)
{
REQUIRE (is_double (str));
return strtod (str.c_str (), NULL);
}
const
std::string & predictor_parser::historical_data_terminator = "stop";
const
std::string & predictor_parser::inline_comment_begin = "--";
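The strtod idiom in is_double above deserves a note: strtod reports how far it
got through the string via its end pointer, so a string "is a double" when
that pointer has moved past the start. A stand-alone sketch of the same check
(like the parser's version, it accepts trailing junk such as "3.5abc"; the
parser strips whitespace and comments before calling it):

```cpp
#include <cstdlib>
#include <string>

// True when strtod consumed at least one character -- the same test
// predictor_parser::is_double performs with its conversion_end pointer.
bool looks_like_double (const std::string& str)
{
  char* conversion_end = NULL;
  std::strtod (str.c_str (), &conversion_end);
  return conversion_end != str.c_str ();
}
```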
#ifndef SQUARE_H
#define SQUARE_H
#ifndef SINGLE_VARIABLE_FUNCTION_H
#include "single_variable_function.h"
#endif
class square : public single_variable_function
//returns the square of the argument
{
public:
virtual double at( double x ) const;
};
#endif
#include "square.h"
#include <math.h>
double
square::at( double x ) const
{
return pow( x, 2 );
}
#include <fstream>
#include <iostream>
#include "string.h"
#ifndef PREDICTOR_PARSER_H
#include "predictor_parser.h"
#endif
using namespace std;
istream *
input_stream_from_args (int arg_count, const char **arg_vector)
{
istream *Result = NULL;
if (arg_count == 1)
{
Result = &cin;
}
else
{
const char *help_text =
"PSP exercise 7A: Calculate a prediction and interval, with correlation and significance, given historical data.\nUsage:\n\tpsp_7a\n\n";
cout << help_text;
}
return Result;
}
int
main (int arg_count, const char **arg_vector)
{
//get the input stream, or print the help text as appropriate
istream *input_stream = input_stream_from_args (arg_count, arg_vector);
if (input_stream != NULL)
{
predictor_parser parser;
parser.set_input_stream (input_stream);
parser.parse_until_eof ();
}
}
class PAIRED_NUMBER_LIST_PREDICTOR
--reads a set of paired numbers, does linear regression, predicts results
inherit
PAIRED_NUMBER_LIST
redefine make
end;
creation {ANY}
make
feature {ANY}
variance: DOUBLE is
local
i: INTEGER;
do
Result := 0;
from
i := xs.lower;
until
not (xs.valid_index(i) and ys.valid_index(i))
loop
Result := Result + (ys.item(i) - beta_0 - beta_1 * xs.item(i)) ^ 2;
i := i + 1;
end;
Result := Result / (entry_count - 2);
end -- variance
standard_deviation: DOUBLE is
do
Result := variance.sqrt;
end -- standard_deviation
projected_y(x: DOUBLE): DOUBLE is
--projected value of given x, using linear regression
--parameters from xs and ys
do
Result := beta_0 + beta_1 * x;
end -- projected_y
prediction_range_base: DOUBLE is
--base of the prediction range, used in prediction_range
local
i: INTEGER;
do
Result := 0;
from
i := xs.lower;
until
not (xs.valid_index(i) and ys.valid_index(i))
loop
Result := Result + (xs.item(i) - xs.mean) ^ 2;
i := i + 1;
end;
end -- prediction_range_base
prediction_range(x, range: DOUBLE): DOUBLE is
--prediction range, based on given estimate and % range
require
entry_count > 0;
do
Result := (1.0 + (1.0 / entry_count.to_double) + (((x - xs.mean) ^ 2) / prediction_range_base)).sqrt;
Result := t(range) * standard_deviation * Result;
end -- prediction_range
lower_prediction_interval(x, range: DOUBLE): DOUBLE is
--LPI, from [Humphrey95]
do
Result := projected_y(x) - prediction_range(x,range);
end -- lower_prediction_interval
upper_prediction_interval(x, range: DOUBLE): DOUBLE is
--UPI, from [Humphrey95]
do
Result := projected_y(x) + prediction_range(x,range);
end -- upper_prediction_interval
t_distribution: T_DISTRIBUTION;
make is
do
Precursor;
!!t_distribution.make;
end -- make
t(range: DOUBLE): DOUBLE is
--gets the size of the t-distribution at the given alpha range
do
t_distribution.set_n(entry_count - 2);
Result := t_distribution.at(range);
end -- t
correlation_bottom_term( numbers: NUMBER_LIST ): DOUBLE is
-- bottom term of the correlation
require
entry_count > 0
numbers.sum /= 0
local
square : SQUARE
do
!!square
Result := ( entry_count * ( numbers.mapped_to( square ).sum ) ) - square.at( numbers.sum )
end
correlation_top : DOUBLE is
--top term of the correlation equation
do
Result := entry_count * xs.multiplied_by_list( ys ).sum - ( xs.sum * ys.sum )
end
correlation : DOUBLE is
--correlation (r, not rsquared)
require
correlation_bottom_term( xs ) /= 0
correlation_bottom_term( ys ) /= 0
do
Result := correlation_top / ( correlation_bottom_term( ys ) * correlation_bottom_term( xs ) ).sqrt
end
significance_t : DOUBLE is
--t-portion of the significance (see [Humphrey95])
require
correlation /= 1
entry_count >= 2
do
Result := ( ( correlation.abs * ( entry_count - 2 ).sqrt ) / ( 1 - ( correlation ^ 2 ) ).sqrt )
end
significance : DOUBLE is
--2( 1 - p ); significance of the correlation
local
a_t : T_INTEGRAL
p : DOUBLE
do
!!a_t.make
a_t.set_n( entry_count - 2 )
p := a_t.at( significance_t )
Result := 2 * ( 1 - p )
end
end -- class PAIRED_NUMBER_LIST_PREDICTOR
class PREDICTOR_PARSER
--reads a list of number pairs, and performs linear regression analysis
inherit
SIMPLE_INPUT_PARSER
redefine parse_last_line, transformed_line
end;
creation {ANY}
make
feature {ANY}
inline_comment_begin: STRING is "--";
string_stripped_of_comment(string: STRING): STRING is
--strip the string of any comment
local
comment_index: INTEGER;
do
if string.has_string(inline_comment_begin) then
comment_index := string.index_of_string(inline_comment_begin);
if comment_index = 1 then
Result := "";
else
Result := string.substring(1,comment_index - 1);
end;
else
Result := string;
end;
end -- string_stripped_of_comment
string_stripped_of_whitespace(string: STRING): STRING is
--strip string of whitespace
do
Result := string;
Result.left_adjust;
Result.right_adjust;
end -- string_stripped_of_whitespace
transformed_line(string: STRING): STRING is
--strip comments and whitespace from parseable line
do
Result := string_stripped_of_whitespace(string_stripped_of_comment(string));
end -- transformed_line
number_list: PAIRED_NUMBER_LIST_PREDICTOR;
feature {ANY} --parsing
found_end_of_historical_data: BOOLEAN;
reset is
--resets the parser and makes it ready to go again
do
found_end_of_historical_data := false;
number_list.reset;
end -- reset
make is
do
!!number_list.make;
reset;
end -- make
parse_last_line_as_historical_data is
--interpret last_line as a pair of comma-separated values
local
error_log: ERROR_LOG;
comma_index: INTEGER;
x_string: STRING;
y_string: STRING;
new_x: DOUBLE;
new_y: DOUBLE;
do
!!error_log.make;
comma_index := last_line.index_of(',');
error_log.check_for_error(comma_index = last_line.count + 1,"No comma:" + last_line);
x_string := last_line.substring(1,comma_index - 1);
y_string := last_line.substring(comma_index + 1,last_line.count);
error_log.check_for_error(not (x_string.is_double or x_string.is_integer),"invalid X:" + last_line);
error_log.check_for_error(not (y_string.is_double or y_string.is_integer),"invalid Y:" + last_line);
if not error_log.error_flag then
new_x := double_from_string(x_string);
new_y := double_from_string(y_string);
number_list.add_entry(new_x,new_y);
std_output.put_string("added: ");
std_output.put_double(new_x);
std_output.put_string(", ");
std_output.put_double(new_y);
std_output.put_new_line;
end;
end -- parse_last_line_as_historical_data
double_from_string(string: STRING): DOUBLE is
require
string.is_double or string.is_integer;
do
if string.is_double then
Result := string.to_double;
elseif string.is_integer then
Result := string.to_integer.to_double;
end;
end -- double_from_string
historical_data_terminator: STRING is "stop";
parse_last_line_as_end_of_historical_data is
--interpret last line as the end of historical data
require
last_line.compare(historical_data_terminator) = 0;
do
found_end_of_historical_data := true;
std_output.put_string("Historical data read.%NBeta-0: ");
std_output.put_double(number_list.beta_0);
std_output.put_string("%NBeta-1: ");
std_output.put_double(number_list.beta_1);
std_output.put_string("%NStandard Deviation: ");
std_output.put_double(number_list.standard_deviation);
std_output.put_string("%NCorrelation: ");
std_output.put_double(number_list.correlation);
std_output.put_string("%NSignificance t: ");
std_output.put_double(number_list.significance_t);
std_output.put_string("%N2*(1-p):");
std_output.put_double(number_list.significance);
std_output.put_string("%N%N");
end -- parse_last_line_as_end_of_historical_data
parse_last_line_as_prediction is
--interpret last line as a single x, for a predictive y
local
error_log: ERROR_LOG;
x: DOUBLE;
do
!!error_log.make;
error_log.check_for_error(not (last_line.is_double or last_line.is_integer),"Not a double : " + last_line);
if not error_log.error_flag then
x := double_from_string(last_line);
std_output.put_string("Estimate at x=");
std_output.put_double(x);
std_output.put_string("%NProjected y: ");
std_output.put_double(number_list.projected_y(x));
std_output.put_string("%Nt (70 percent): ");
std_output.put_double(number_list.t(0.7));
std_output.put_string("%Nt (90 percent): ");
std_output.put_double(number_list.t(0.9));
std_output.put_string("%NRange (70 percent): ");
std_output.put_double(number_list.prediction_range(x,0.7));
std_output.put_string("; UPI: ");
std_output.put_double(number_list.upper_prediction_interval(x,0.7));
std_output.put_string("; LPI: ");
std_output.put_double(number_list.lower_prediction_interval(x,0.7));
std_output.put_string("%NRange (90 percent): ");
std_output.put_double(number_list.prediction_range(x,0.9));
std_output.put_string("; UPI: ");
std_output.put_double(number_list.upper_prediction_interval(x,0.9));
std_output.put_string("; LPI: ");
std_output.put_double(number_list.lower_prediction_interval(x,0.9));
std_output.put_new_line;
end;
end -- parse_last_line_as_prediction
parse_last_line is
--parse the last line according to state
do
if not last_line.empty then
if last_line.compare(historical_data_terminator) = 0 then
parse_last_line_as_end_of_historical_data;
else
if found_end_of_historical_data then
parse_last_line_as_prediction;
else
parse_last_line_as_historical_data;
end;
end;
end;
end -- parse_last_line
end -- class PREDICTOR_PARSER
class SQUARE
inherit
SINGLE_VARIABLE_FUNCTION
redefine
at
end;
feature {ANY}
at( x: DOUBLE ) : DOUBLE is
do
Result := x ^ 2;
end
end
class MAIN
creation {ANY}
make
feature {ANY}
make is
local
parser: PREDICTOR_PARSER;
gamma: GAMMA_FUNCTION;
do
!!parser.make;
parser.set_input(io);
parser.parse_until_eof;
end -- make
end -- MAIN
Mostly minor defects caught: I occasionally forgot to return values, and
had a header/implementation parity problem, but nothing significant.
Annoyingly enough, my design review missed some obvious fumbles--
missing #includes, wrong return types, etc. I'll be curious to see how the
postmortem turns out in terms of yield, because I don't feel
like this was terribly effective, but we'll see how it goes. In any case,
it's my first attempt at reviews, and they did pick up several defects before compile,
so I feel somewhat better.
Perhaps inspections have some benefits after all! Only one error
in test: a plus sign should have been a minus. And that's all. Not bad!
Table 8-3. Test Results Format: Program 7A
Test | Expected Value | Actual Value -- C++ | Actual Value -- Eiffel |
| | | |
Table D12 | | 0.954316 | 9.03351 | 1.80318e-5 |
| |
Actual LOC vs Development Time | | | |
Estimated LOC vs Development Time | | | |
Among other things, this shows that my development time
is indeed related to my estimates. According to [Humphrey95]
p. 513, an r^2 value of about 0.8 (as is the
case with actual LOC vs time) indicates "a strong correlation.
The relationship is adequate for planning purposes." An r^2
value of 0.95 (as for estimated LOC vs actual development
time) indicates a relationship which is "predictive and you can use it with
high confidence."
This sort of news sounds peculiar (there's a closer relationship
with my predicted LOC and time
than there is with actual LOC and time?),
but the difference in significances is telling (there's only about a 4% chance that
the actual-LOC-to-time correlation would happen by chance,
but around a 14% chance that the estimated-LOC-to-time
correlation would happen by chance).
If I believe the numbers, the design and code reviews were indeed
helpful, the design review being 2.5 times as effective as test, and the
code review 3.3 times as effective, in terms of defects removed per hour
spent in the activity. Of course, some judgement should be used, as the
compilation step was 8.5 times as effective-- but the errors found were
automatically caught, and not particularly significant.
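For reference, those DRL figures follow directly from the phase times and
defect counts in Table 8-4 (defects removed per hour in each phase, divided by
the test-phase rate):

```
design review: 4 defects / 16 min = 15.0 defects/hour; DRL = 15.0 / 6.0 = 2.5
code review:   4 defects / 12 min = 20.0 defects/hour; DRL = 20.0 / 6.0 = 3.3
compile:       6 defects /  7 min = 51.4 defects/hour; DRL = 51.4 / 6.0 = 8.5
test:          1 defect  / 10 min =  6.0 defects/hour (the baseline)
```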
Table 8-4. Project Plan Summary
Student: | Victor B. Putz | Date: | 000121 |
Program: | Correlation | Program# | 7A |
Instructor: | Wells | Language: | C++ |
Summary | Plan | Actual | To date |
Loc/Hour | 49 | 26 | 46 |
Planned time | 228 | | 484 |
Actual time | | 198 | 513 |
CPI (cost/performance index) | | | 0.94 |
%reused | 77 | 87 | 41 |
Test Defects/KLOC | ? | 11 | 34 |
Total Defects/KLOC | ? | 232.55 | 137 |
Yield (defects before test/total defects) | ? | 95 | 75 |
Program Size | Plan | Actual | To date |
Base | 227 | 227 | |
Deleted | 0 | 0 | |
Modified | 0 | 0 | |
Added | 68 | 86 | |
Reused | 365 | 365 | 988 |
Total New and Changed | 179 | 86 | 1175 |
Total LOC | 771 | 678 | 2390 |
Total new/reused | 0 | 0 | 0 |
Upper Prediction Interval (70%) | 281 | | |
Lower Prediction Interval (70%) | 77 | | |
Time in Phase (min): | Plan | Actual | To Date | To Date% |
Planning | 41 | 56 | 297 | 20 |
Design | 23 | 38 | 168 | 11 |
Design Review | ? | 16 | 16 | 1 |
Code | 62 | 40 | 391 | 26 |
Code Review | ? | 12 | 12 | 1 |
Compile | 16 | 7 | 96 | 6 |
Test | 70 | 10 | 430 | 28 |
Postmortem | 16 | 19 | 111 | 7 |
Total | 228 | 198 | 1521 | 100 |
Total Time UPI (70%) | 280 | | | |
Total Time LPI (70%) | 176 | | | |
Defects Injected | | Actual | To Date | To Date % |
Plan | | 0 | 0 | 0 |
Design | 7 | 9 | 51 | 32 |
Design Review | ? | 0 | 0 | 0 |
Code | 15 | 11 | 104 | 65 |
Code Review | ? | 0 | 0 | 0 |
Compile | 1 | 0 | 3 | 2 |
Test | 1 | 0 | 3 | 2 |
Total development | 23 | 20 | 161 | 100 |
Defects Removed | | Actual | To Date | To Date % |
Planning | | 0 | 0 | 0 |
Design | | 0 | 0 | 0 |
Design Review | ? | 4 | 4 | 3 |
Code | 5 | 5 | 36 | 22 |
Code Review | ? | 4 | 4 | 3 |
Compile | 12 | 6 | 77 | 48 |
Test | 6 | 1 | 40 | 23 |
Total development | 23 | 20 | 161 | 100 |
After Development | 0 | 0 | 0 | |
Defect Removal Efficiency | Plan | Actual | To Date | |
Defects/Hour - Design Review | ? | 15 | 15 | |
Defects/Hour - Code Review | ? | 20 | 20 | |
Defects/Hour - Compile | ? | 51.4 | 51.4 | |
Defects/Hour - Test | ? | 6 | 6 | |
DRL (design review/test) | ? | 2.5 | 2.5 | |
DRL (code review/test) | ? | 3.3 | 3.3 | |
DRL (compile/test) | ? | 8.5 | 8.5 | |
Time in Phase (min) | Actual | To Date | To Date % |
Code | 25 | 242 | 50 |
Code Review | 10 | 10 | 2 |
Compile | 6 | 112 | 23 |
Test | 3 | 120 | 25 |
Total | 44 | 484 | 100 |
Defects Injected | Actual | To Date | To Date % |
Design | 0 | 4 | 4 |
Code | 11 | 97 | 95 |
Compile | 0 | 0 | 0 |
Test | 0 | 1 | 1 |
Total | 11 | 102 | 100 |
Defects Removed | Actual | To Date | To Date % |
Code | 0 | 1 | 1 |
Code Review | 5 | 5 | 5 |
Compile | 5 | 66 | 65 |
Test | 1 | 30 | 29 |
Total | 11 | 102 | 100 |
Defect Removal Efficiency | Actual | To Date | |
Defects/Hour - Code Review | 30 | 30 | |
Defects/Hour - Compile | 50 | 50 | |
Defects/Hour - Test | 20 | 20 | |
DRL (code review/test) | 1.5 | 2.5 | |
DRL (compile/test) | 2.5 | 2.5 | |
Table 8-5. Time Recording Log
Student: | Victor Putz | Date: | 000121 |
| | Program: | 7a |
Start | Stop | Interruption Time | Delta time | Phase | Comments |
000121 08:55:21 | 000121 10:05:58 | 14 | 56 | plan | |
000121 10:26:13 | 000121 11:04:10 | 0 | 37 | design | |
000121 11:13:47 | 000121 11:29:20 | 0 | 15 | design review | |
000121 11:33:08 | 000121 12:13:44 | 0 | 40 | code | |
000121 12:14:56 | 000121 12:26:44 | 0 | 11 | code review | |
000121 12:29:30 | 000121 12:36:38 | 0 | 7 | compile | |
000121 12:38:01 | 000121 12:48:00 | 0 | 9 | test | |
000121 12:58:53 | 000121 13:18:04 | 0 | 19 | postmortem | |
| | | | | |
Table 8-6. Time Recording Log
Student: | | Date: | 000123 |
| | Program: | |
Start | Stop | Interruption Time | Delta time | Phase | Comments |
000123 12:55:47 | 000123 13:20:32 | 0 | 24 | code | |
000123 13:20:33 | 000123 13:30:47 | 0 | 10 | code review | |
000123 13:30:55 | 000123 13:36:31 | 0 | 5 | compile | |
000123 13:36:57 | 000123 13:39:29 | 0 | 2 | test | |
| | | | | |
Table 8-7. Defect Recording Log
Student: | Victor Putz | Date: | 000121 |
| | Program: | 7a |
Defect found | Type | Reason | Phase Injected | Phase Removed | Fix time | Comments |
000121 11:14:48 | ct | ig | design | design review | 0 | Missing contract for correlation_bottom_term; requires n > 0, numbers.sum != 0 |
000121 11:16:30 | ct | ig | design | design review | 1 | Missed requirement for bottom terms to not be equal to zero |
000121 11:18:00 | mc | om | design | design review | 1 | missed square root call in correlation |
000121 11:20:00 | ct | ig | design | design review | 1 | missed requirement for correlation != 1 in significance_t, and entry_count >= 2 |
000121 11:47:21 | wt | om | design | code | 0 | correlation_bottom_term should be static |
000121 11:52:05 | we | om | design | code | 0 | was setting n to entry_count - 1; should be entry_count - 2 |
000121 11:57:45 | ct | om | design | code | 3 | forgot contract in head() |
000121 12:01:20 | md | om | design | code | 2 | forgot copy operator (useful in some algs) |
000121 12:12:00 | md | om | design | code | 1 | |
000121 12:15:31 | mc | om | code | code review | 0 | forgot to return the head value in head()! |
000121 12:16:54 | wa | om | code | code review | 0 | strange loop logic in mapped_to |
000121 12:18:08 | ma | om | code | code review | 0 | forgot to return Result in mapped_to, multiplied_by_list |
000121 12:26:00 | sy | om | code | code review | 0 | forgot to declare at as const |
000121 12:30:32 | sy | om | code | compile | 0 | Darn it, header/implementation parity! correlation_bottom_term static/const parity troubles |
000121 12:32:13 | wt | cm | code | compile | 0 | wrong argument type in correlation_bottom_term |
000121 12:33:57 | sy | om | code | compile | 0 | Gr... forgot an #include for T_INTEGRAL |
000121 12:34:48 | sy | om | code | compile | 0 | forgot parentheses on no-parameter feature call |
000121 12:35:34 | sy | om | code | compile | 0 | forgot to #include contract.h |
000121 12:36:07 | wt | cm | code | compile | 0 | wrong return type (missed &) on multiplied_by_list |
000121 12:39:28 | wa | cm | code | test | 6 | + sign should have been - in correlation_bottom_term |
| | | | | | |
Table 8-8. Defect Recording Log
Student: | | Date: | 000123 |
| | Program: | |
Defect found | Type | Reason | Phase Injected | Phase Removed | Fix time | Comments |
000123 13:24:40 | ma | cm | code | code review | 0 | Forgot to increment loop counter |
000123 13:24:58 | ct | om | code | code review | 0 | forgot contracts |
000123 13:25:21 | ct | om | code | code review | 0 | forgot correlation contract |
000123 13:26:06 | ct | om | code | code review | 0 | forgot significance contract |
000123 13:28:10 | ma | cm | code | code review | 0 | forgot to initialize t in significance |
000123 13:31:12 | sy | cm | code | compile | 0 | used = instead of := for assignment |
000123 13:31:45 | sy | cm | code | compile | 0 | used := instead of = for comparison |
000123 13:32:30 | wn | om | code | compile | 0 | name clash between local and feature |
000123 13:33:33 | sy | ig | code | compile | 1 | had to change visibility of xs, ys in paired_number_list |
000123 13:35:41 | ma | om | code | compile | 0 | forgot to initialize square in correlation_bottom_term |
000123 13:37:12 | wn | ig | code | test | 1 | problems with append-- using features from Current instead of rhs for indices, not handling empty rhs. |
| | | | | | |