Publications
Journal Article
When 2 + 2 should be 5: The summation fallacy in time prediction
Journal of Behavioral Decision Making 35, no. 3 (2022): e2265. Status: Published
Predictions of time (e.g., work hours) are often based on the aggregation of estimates of elements (e.g., activities, subtasks). The only types of estimates that can be safely aggregated by summation are those reflecting predicted average outcomes (expected values). The sums of other types of estimates, such as bounds of confidence intervals or estimates of the mode, do not have the same interpretation as their components (e.g., the sum of the 90% upper bounds is not the appropriate 90% upper bound of the sum). This is a potential source of bias in predictions of time, as shown in Studies 1 and 2, where professionals with experience in estimation provided total estimates of time that were inconsistent with their estimates of individual tasks. Study 3 shows that this inconsistency can be attributed to improper aggregation of time estimates and demonstrates how this can produce both over- and underestimation, as well as time prediction intervals that are far too wide. Study 4 suggests that the results may reflect a more general fallacy in the aggregation of probabilistic quantities. Our observations are consistent with the idea that the inconsistencies and biases are driven by a tendency to apply naïve summation (2 + 2 = 4) to probabilistic (stochastic) values in situations where this is not appropriate. This summation fallacy may be particularly consequential in contexts where informal, expert-judgment-based estimation methods are used.
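The fallacy the abstract describes is easy to demonstrate with a small Monte Carlo sketch. The task distributions and parameters below are hypothetical assumptions for illustration, not data from the paper: the sum of per-task 90% upper bounds clearly exceeds the appropriate 90% upper bound of the total.

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims, n_tasks = 100_000, 5

# Assumption: each task's duration is independently log-normally distributed.
durations = rng.lognormal(mean=2.0, sigma=0.5, size=(n_sims, n_tasks))

# Naive aggregation: sum the per-task 90% upper bounds.
sum_of_p90s = np.percentile(durations, 90, axis=0).sum()

# Appropriate aggregation: the 90% upper bound of the summed durations.
p90_of_sum = np.percentile(durations.sum(axis=1), 90)

print(f"sum of per-task P90s: {sum_of_p90s:.1f}")
print(f"P90 of the sum:       {p90_of_sum:.1f}")  # noticeably smaller
```

Only expected values add up exactly; summing upper bounds ignores that independent tasks rarely all run late at once, which is why summed bounds yield prediction intervals that are far too wide.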
Affiliation | Software Engineering |
Project(s) | Department of IT Management |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | Journal of Behavioral Decision Making |
Volume | 35 |
Issue | 3 |
Pagination | e2265 |
Publisher | Wiley |
When should we (not) use the mean magnitude of relative error (MMRE) as an error measure in software development effort estimation?
Information and Software Technology 143 (2022): 106784. Status: Published
Context: The mean magnitude of relative error (MMRE) is an error measure frequently used to evaluate and compare the estimation performance of prediction models and software professionals.
Objective: This paper examines conditions for proper use of MMRE in effort estimation contexts.
Method: We apply research on scoring functions to identify the type of estimates that minimizes the expected value of the MMRE.
Results: We show that the MMRE is a proper error measure for estimates of the most likely (mode) effort, but not for estimates of the median or mean effort, provided that the effort usage is approximately log-normally distributed, which we argue is a reasonable assumption in many software development contexts. The relevance of the findings is demonstrated on real-world software development data.
Conclusion: MMRE is not a proper measure of the accuracy of estimates of the median or mean effort, but may be used for the accuracy evaluation of estimates of most likely effort.
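The paper's central result can be checked with a quick simulation. The log-scale parameters below are illustrative assumptions, not values from the paper: for log-normally distributed effort, a constant estimate equal to the mode yields a lower MMRE than estimates of the median or the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 0.8  # assumed log-scale parameters of effort usage
actuals = rng.lognormal(mu, sigma, size=200_000)

mode = np.exp(mu - sigma**2)      # most likely effort
median = np.exp(mu)
mean = np.exp(mu + sigma**2 / 2)  # expected effort

def mmre(estimate, actuals):
    """Mean magnitude of relative error of a constant estimate."""
    return np.mean(np.abs(actuals - estimate) / actuals)

for name, est in [("mode", mode), ("median", median), ("mean", mean)]:
    print(f"{name:6s} {est:8.1f}  MMRE = {mmre(est, actuals):.3f}")
# The mode attains the lowest MMRE of the three.
```

This is why rewarding a low MMRE encourages estimates of the most likely effort rather than of the median or mean, which for right-skewed effort distributions are systematically higher.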
Affiliation | Software Engineering |
Project(s) | Department of IT Management, EDOS: Effective Digitalization of Public Sector |
Publication Type | Journal Article |
Year of Publication | 2022 |
Journal | Information and Software Technology |
Volume | 143 |
Pagination | 106784 |
Date Published | 03/2022 |
Publisher | Elsevier |
Journal Article
Evaluation of Probabilistic Project Cost Estimates
IEEE Transactions on Engineering Management (2021): 1-16. Status: Published
Evaluation of cost estimates should be fair and give incentives for accuracy. These goals, we argue, are challenged by a lack of precision in what is meant by a cost estimate and the use of evaluation measures that do not reward the most accurate cost estimates. To improve the situation, we suggest the use of probabilistic cost estimates and propose guidelines on how to evaluate such estimates. The guidelines emphasize the importance of a match between the type of cost estimate provided by the estimators and the chosen cost evaluation measure, and the need for an evaluation of both the calibration and the informativeness of probabilistic cost estimates. The feasibility of the guidelines is exemplified in an analysis of a set of 69 large Norwegian governmental projects. The evaluation indicated that the projects had quite accurate and unbiased P50 estimates and that the prediction intervals were reasonably well calibrated. It also showed that the cost prediction intervals were non-informative with respect to differences in cost uncertainty and, consequently, not useful to identify projects with higher cost uncertainty. The results demonstrate the usefulness of applying the proposed cost estimation evaluation guidelines.
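A minimal sketch of the calibration part of the guidelines, using made-up project numbers rather than the 69 governmental projects analysed in the paper: the share of projects whose actual cost falls at or below the P50 and P85 estimates should be close to 50 and 85 per cent, respectively.

```python
import numpy as np

# Invented per-project figures: P50 estimate, P85 estimate, actual cost.
p50    = np.array([100, 250,  80, 500, 120, 300,  90, 150])
p85    = np.array([130, 310, 100, 640, 150, 380, 115, 190])
actual = np.array([ 95, 270,  78, 510, 160, 290, 110, 145])

hit_p50 = np.mean(actual <= p50)  # ideally close to 0.50
hit_p85 = np.mean(actual <= p85)  # ideally close to 0.85

print(f"P50 hit rate: {hit_p50:.0%}")
print(f"P85 hit rate: {hit_p85:.0%}")
```

As the abstract stresses, calibration alone is not enough: the intervals must also be informative, i.e. wider for projects with genuinely higher cost uncertainty.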
Affiliation | Software Engineering |
Project(s) | Department of IT Management |
Publication Type | Journal Article |
Year of Publication | 2021 |
Journal | IEEE Transactions on Engineering Management |
Pagination | 1-16 |
Date Published | 08/2021 |
Publisher | IEEE |
Journal Article
Sequence effects in the estimation of software development effort
Journal of Systems and Software 159, no. January 2020 (2020): 110448. Status: Published
Currently, little is known about how much the sequence in which software development tasks or projects are estimated affects judgment-based effort estimates. To gain more knowledge, we examined estimation sequence effects in two experiments. In the first experiment, 362 software professionals estimated the effort of three large tasks of similar sizes, whereas in the second experiment 104 software professionals estimated the effort of four large and five small tasks. The sequence of the tasks was randomised in both experiments. The first experiment, with tasks of similar size, showed a mean increase of 10% from the first to the second and a 3% increase from the second to the third estimate. The second experiment showed that estimating a larger task after a smaller one led to a mean decrease in the estimate of 24%, and that estimating a smaller task after a larger one led to a mean increase of 25%. There was no statistically significant reduction in the sequence effect with higher competence. We conclude that more awareness about how the estimation sequence affects the estimates may reduce potentially harmful estimation biases. In particular, it may reduce the likelihood of a bias towards too low effort estimates.
Affiliation | Software Engineering |
Project(s) | Department of IT Management |
Publication Type | Journal Article |
Year of Publication | 2020 |
Journal | Journal of Systems and Software |
Volume | 159 |
Issue | January 2020 |
Pagination | 110448 |
Publisher | Elsevier |
Technical reports
Estimering av kostnader i store statlige prosjekter: Hvor gode er estimatene og usikkerhetsanalysene i KS2-rapportene?
In Concept-rapport nr. 59. Trondheim: Ex ante akademisk forlag, 2019. Status: Published
The external quality assurance scheme for large government investment projects (the QA scheme / the state project model) aims, among other things, to ensure that budgets are realistic and that the risk analyses of the cost estimates reflect real cost uncertainty. The extent to which budgets, estimates and risk analyses are realistic, and where there may be potential for improvement, are the main themes of this study.
Chapter 1 describes the background and motivation for the study. The starting point is that the Concept research programme collects final costs in projects that have been through QA2 (quality assurance of the cost estimate before the parliament's investment decision). That provides a basis for studies of cost performance. As the sample of projects increases, more detailed studies of the estimates that formed the basis for the parliament's investment decision become possible.
The study has three main topics. We look at:
The realism in the projects’ budgets
The realism in the point estimates in the QA2 reports, and
The realism and information value in the prediction intervals and estimation distributions.
Chapter 2 provides a review of previous studies of cost performance in projects that have been through QA2. They all show relatively good results, both in terms of deviation from budgets and in the risk assessments. While average cost overruns reported in international studies have typically been around 30 per cent, Norwegian studies report average overruns of between two and six per cent. Other studies also typically report a strong underestimation of uncertainty. The P50 and P85 estimates from the QA2 reports, on the other hand (that is, estimates that are not expected to be exceeded in 50 and 85 per cent of cases, respectively), seem to have been reasonably well calibrated. However, several authors have pointed out that the dispersion of final costs relative to the budgets has been somewhat higher than assumed at the time of the investment decision.
The data used in the study, which is described in Chapter 3, is based on a larger sample of projects than previous studies. The analyses focus more on the estimates than previous studies have done. The analysis of the P50 and P85 estimates is based on samples of 83 and 85 projects respectively. Sufficient data for our analysis of the cost estimates were found for 70 of these projects.
In Chapter 4, we outline detailed research questions and the methodology for the analyses. Here, we motivate and indicate, based on the latest research in the area, how probability-based cost estimates should be evaluated.
We introduce an analysis of estimate deviations and estimation bias based on what is a reasonable "loss function", where the loss function is what we attempt to minimise in the estimates. We evaluate the extent to which we have been successful in estimating the real uncertainty of projects ex ante. We also assess how informative prediction intervals and estimate distributions have been. We argue that well-calibrated probability-based estimates (e.g., that 50 per cent of P50 estimates should not be exceeded) are not a sufficient evaluation criterion. In addition, we need indicators for how informative the probability-based estimates have been.
In Chapter 5, we find that the median deviation between actual costs and the P50, measured as absolute percentage deviation, is 10 per cent (mean = 12.5), and that the median deviation from the P85 is 1.5 per cent (mean = 3.4). In other words, across all the projects there is only a slight tendency towards overruns, much lower than what has been reported in international studies. Over time, however, there has been a somewhat worrying development. While there was a tendency towards cost underruns in the past (an average of 6 per cent underrun of the P50 for projects with an investment decision between 2001 and 2003), there has been a tendency towards cost overruns in later years (an average of 12 per cent overrun in the period 2010-2012).
Given well-calibrated estimates, the actual cost should be below the P50 in about 50 per cent of the cases and below the P85 in about 85 per cent of the cases. However, we find that this holds in only 40 per cent of the cases for the P50 and 73 per cent for the P85. These shares have been declining over time. While in 2001-2003, 62 and 100 per cent of the projects were within the P50 and P85 respectively, in 2010-2012 only 21 and 43 per cent were within, albeit based on a smaller sample than in the earlier time periods. The reason why the hit rates for the P50 and the P85 for all projects together are not so far from the intended targets is that we have gone from overestimation to underestimation. The tendency towards underestimation should be reversed through better estimation and governance in future projects.
The analyses of the estimates in Chapter 6 find about the same degree of overruns and estimate deviations for the P50 and P85 estimates as those reported in Chapter 5. The P50 estimates showed a median estimation bias of -1 per cent (mean = 3 per cent). The median percentage deviation (regardless of sign) was 12 per cent (mean = 14 per cent). We calculated that the expected deviation from the P50 budget could not be less than 8-10 per cent, given certain assumptions, including that the projects do not adapt deliveries to reduce deviations. Although the latter assumption is hardly met, this calculation suggests that the deviations are not particularly high.
We observe that there is typically a reduction from estimate to budget. The P50 budget was on average seven per cent lower than the P50 estimate and the P85 budget seven per cent lower than the P85 estimate. Although there were several projects that should have retained the original P50 and P85 estimates as P50 and P85 budget, respectively, we did not find that the adjustments overall reduced the realism. Many of the adjustments seem to be well justified.
The estimates in the QA2 reports include point estimates, prediction intervals and estimate distributions (S-curves). Our analyses cover all of these, and the main findings are as follows:
The estimate distributions and prediction intervals are typically too narrow to reflect actual uncertainty. For example, as many as 19 per cent of the projects have a lower cost than the P10 estimate and 20 per cent a higher cost than the P90 estimate. Future estimation should take into account that the range of possible project costs is wider than has typically been assumed.
Estimated cost uncertainty, measured through the width of the prediction interval and estimate distribution, does not correlate with actual cost uncertainty, measured by cost deviations and overruns. This indicates a low ability to distinguish between projects with high and low cost uncertainty. If we become better at identifying the high-risk projects, we could potentially reduce the need for risk contingency without compromising cost performance and project execution. We show, given some assumptions, that the P85 could be 17 per cent lower if the ability to distinguish between low- and high-risk projects had been better. Measures to improve this capability should be given priority in the estimation work.
There are differences in estimation performance between agencies and between the consultancies carrying out the external QA. Defence projects stand out by having a strong tendency to overestimate costs (their average underrun of the P50 estimate is 19 per cent) and overly narrow prediction intervals (29 per cent of projects within the 80 per cent prediction interval). The Norwegian Public Roads Administration also tends to estimate too narrow prediction intervals (57 per cent of projects within the 80 per cent prediction interval). Among the QA consultancies, there are no major differences in estimate deviations, but larger differences in how realistically the uncertainty is estimated. Differences in project complexity or other factors may explain these differences.
Given the inability to distinguish between low- and high-risk projects in the estimation work, a simple mechanical mark-up model could in theory do just as well as the more demanding QA2 estimation work. We investigated this with mark-ups based on historical estimate deviations, but found that the QA2 estimates did better. This indicates that the QA2 estimation work provides added value compared with simple mark-up models.
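The informativeness analysis described among the findings above can be sketched in a few lines. The per-project numbers below are invented for illustration, not QA2 data: one checks whether the relative width of the 80 per cent prediction interval correlates with the actual absolute deviation from the P50.

```python
import numpy as np

# Invented per-project figures: relative interval width ((P90 - P10) / P50)
# and absolute percentage deviation of the actual cost from the P50.
width = np.array([0.15, 0.30, 0.22, 0.40, 0.18, 0.35, 0.25, 0.28])
deviation = np.array([0.08, 0.12, 0.20, 0.05, 0.10, 0.30, 0.07, 0.15])

r = np.corrcoef(width, deviation)[0, 1]
print(f"width-deviation correlation: {r:.2f}")
# A correlation near zero indicates non-informative intervals: the stated
# uncertainty does not separate high-risk from low-risk projects.
```

With real data one would prefer a larger sample and a rank correlation, but the principle is the same: informative intervals should be wider where deviations turn out larger.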
In Chapter 7, we summarize and discuss the findings. Overall, the main conclusions are that the QA2 framework is useful and that cost estimates appear to be realistic and reasonably well calibrated. However, developments over time are worrying and should lead to improvements in the estimation work. Two major areas of improvement are to specify broader estimate distributions, that is, to recognize that cost uncertainty is typically greater than that which has previously been identified in the estimation work, as well as to better distinguish between projects with low and high cost uncertainty.
Affiliation | Software Engineering |
Project(s) | Department of IT Management |
Publication Type | Technical reports |
Year of Publication | 2019 |
Secondary Title | Concept-rapport nr. 59 |
Publisher | Ex ante akademisk forlag |
Place Published | Trondheim |
ISBN Number | 78-82-93253-81-5 |
ISSN Number | 0803-9763 |
Book
Time predictions: Understanding and avoiding unrealism in project planning and everyday life
In Simula SpringerBriefs on Computing. Switzerland: Springer, 2018. Status: Published
Affiliation | Software Engineering |
Project(s) | Department of IT Management |
Publication Type | Book |
Year of Publication | 2018 |
Secondary Title | Simula SpringerBriefs on Computing |
Series Volume | 5 |
Number of Pages | 110 |
Publisher | Springer |
Place Published | Switzerland |
Poster
Time Perception Can Influence Performance Time Predictions
Long Beach, CA: Society for Judgment and Decision Making, 2014. Status: Published
People are often inaccurate when they predict how much time they will need to do future tasks. Two experiments show that bias in task performance time predictions is correlated with individual differences in time perception. The longer participants think a given time unit (e.g., 60 seconds) is, as measured with prospective duration judgment tasks, the less time they predict they will need to do other tasks they are asked to perform. The effect only occurs when the time perception tasks precede the performance time predictions, indicating that the time unit used in a performance time prediction can be influenced by a duration judgment task.
Affiliation | Software Engineering |
Publication Type | Poster |
Year of Publication | 2014 |
Publisher | Society for Judgment and Decision Making |
Place Published | Long Beach, CA |
Journal Article
From Origami to Software Development: a Review of Studies on Judgment-Based Predictions of Performance Time
Psychological Bulletin 138 (2012): 238-271. Status: Published
Affiliation | Software Engineering |
Publication Type | Journal Article |
Year of Publication | 2012 |
Journal | Psychological Bulletin |
Volume | 138 |
Number | 2 |
Pagination | 238-271 |
How Does Project Size Affect Cost Estimation Error? Statistical Artifacts and Methodological Challenges
International Journal of Project Management 30 (2012): 751-862. Status: Published
Empirical studies differ in what they report as the underlying relation between project size and percent cost overrun. As a consequence, the studies also differ in their project management recommendations. We show that studies with a project size measure based on the actual cost systematically report an increase in percent cost overrun with increased project size, whereas studies with a project size measure based on the estimated cost report a decrease or no change in percent cost overrun with increased project size. The observed pattern is, we argue, to some extent a statistical artifact caused by imperfect correlation between the estimated and the actual cost. We conclude that the previous observational studies cannot be considered as providing reliable evidence in favor of an underlying project size related cost estimation bias. The more robust evidence from controlled experiments, limited to small tasks, suggests an increase in underestimation with increased project size.
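The artifact the abstract identifies is a regression-to-the-mean effect, and a toy Monte Carlo reproduces it. The model below is an assumption for illustration (unbiased, equally noisy estimates on the log scale), not data from the reviewed studies:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
log_size = rng.normal(0, 1, n)              # latent project size
log_act = log_size + rng.normal(0, 0.3, n)  # actual cost (log scale)
log_est = log_size + rng.normal(0, 0.3, n)  # unbiased but imperfect estimate
actual, estimate = np.exp(log_act), np.exp(log_est)
overrun = (actual - estimate) / estimate    # percent cost overrun

def mean_overrun_by(size_measure):
    """Mean overrun for 'big' vs 'small' projects under a size measure."""
    big = size_measure > np.median(size_measure)
    return overrun[big].mean(), overrun[~big].mean()

big_a, small_a = mean_overrun_by(actual)    # size measured by actual cost
big_e, small_e = mean_overrun_by(estimate)  # size measured by estimated cost

print(f"by actual cost:    big {big_a:+.2f}  small {small_a:+.2f}")
print(f"by estimated cost: big {big_e:+.2f}  small {small_e:+.2f}")
```

Even though the simulated estimates are unbiased at every size, splitting by actual cost shows overrun growing with size while splitting by estimated cost shows the opposite, which is exactly the divergence between the two groups of observational studies.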
Affiliation | Software Engineering |
Publication Type | Journal Article |
Year of Publication | 2012 |
Journal | International Journal of Project Management |
Volume | 30 |
Number | 7 |
Pagination | 751-862 |
Journal Article
To Read Two Pages, I Need 5 Minutes, But Give Me 5 Minutes and I Will Read Four: How to Change Productivity Estimates by Inverting the Question
Applied Cognitive Psychology 25 (2011): 314-323. Status: Published
Past research has shown that people underestimate the time they need to complete large tasks, whereas completion times for smaller tasks are often overestimated, suggesting higher productivity estimates for larger than for smaller tasks. By replacing the traditional question about how much time a given piece of work will take with a question about how much work can be completed within a given amount of time, we found the opposite pattern. This could reflect a general tendency to underestimate large amounts relative to small ones, both for durations and for amounts of work. We explored this idea in two studies where students estimated reading tasks, a third where IT professionals estimated software projects, and a fourth where participants imagined a familiar walk, divided into time segments or part distances of varying lengths.
Affiliation | Software Engineering |
Publication Type | Journal Article |
Year of Publication | 2011 |
Journal | Applied Cognitive Psychology |
Volume | 25 |
Number | 2 |
Pagination | 314-323 |
Journal Article
The Effects of Request Formats on Judgment-Based Effort Estimation
Journal of Systems and Software 83 (2010): 29-36. Status: Published
In this paper we study the effects of a change from the traditional request “How much effort is required to complete X?” to the alternative “How much can be completed in Y work-hours?”. Studies 1 and 2 report that software professionals receiving the alternative format provided much lower, and presumably more optimistic, effort estimates of the same software development work than those receiving the traditional format. Studies 3 and 4 suggest that the effect belongs to the family of anchoring effects. An implication of our results is that project managers and clients should avoid the alternative estimation request format.
Affiliation | Software Engineering |
Publication Type | Journal Article |
Year of Publication | 2010 |
Journal | Journal of Systems and Software |
Volume | 83 |
Number | 1 |
Pagination | 29-36 |