Commentary: What's Wrong with Average?

W. Allen Wallace
Oct 21, 2020
Updated: Oct 13, 2021

Image: Large and small man, both wearing average-size clothing that does not fit

“A horse that can count to ten is a remarkable horse, not a remarkable mathematician.” - Samuel Johnson

Dear Family, Friends, and Clients:

On September 18, 1947, the United States Air Force was designated as a separate branch of the United States Armed Forces as part of the National Security Act of 1947. Previously, the branch was known as the United States Army Air Forces and was a subdivision of the U.S. Army. The organization had roots going back to 1907 when it was officially established as the Aeronautical Division, Signal Corps. The new leadership of the USAF encountered a serious problem within its fledgling branch: US pilots could not maintain control of their aircraft and were crashing at an unexpected rate. In fact, on a single day during this period, there were 17 independent incidents or accidents logged. Almost all of these events were classified as pilot error because mechanical failure was highly irregular. It was an incredible mystery why expertly trained and decorated US pilots were having so many mishaps.

After several thorough investigations could not identify obvious mechanical or training insufficiencies, USAF engineers turned their attention to cockpit design, which was based on data collected from measuring hundreds of pilots in 1926. For over 20 years, cockpit layout had been consistently tailored to fit the “average” 1926 pilot; the size of the cockpit, seat depth, windshield size, and even flight suits and helmet straps were based on these measurements. Several theories, including a large increase in pilot size, were examined with no obvious conclusions. In response, the USAF commissioned the largest study on pilot size ever attempted. In 1950, thousands of pilots were measured on 140 criteria including Stature, Chest Circumference, Sleeve Length, Crotch Height, Hip Circumference, Thigh Circumference, and Crotch Length.

A 23-year-old lieutenant named Gilbert Daniels was hired to help with the measurement study. Daniels had been a physical anthropology major at Harvard University, a specialty that studies the unique anatomy of the human body. As Lt. Daniels measured and compiled data on 4,063 USAF pilots at Wright Air Force Base, he started to harbor doubts that “average” was relevant when it comes to human anatomy. He examined the 10 measurements most important to cockpit design and published his findings in his report, The “Average Man”. In the report, he identified the following:

of the original 4063 men

1055 were of approximately average stature

of these 1055 men

302 were also of approximately average chest circumference

of these 302 men

143 were also of approximately average sleeve length

of these 143 men

73 were also of approximately average crotch height

of these 73 men

28 were also of approximately average torso circumference

of these 28 men

12 were also of approximately average hip circumference

of these 12 men

6 were also of approximately average neck circumference

of these 6 men

3 were also of approximately average waist circumference

of these 3 men

2 were also of approximately average thigh circumference

of these 2 men

0 were also of approximately average crotch length

His report stated:

“The tendency to think in terms of the ‘average man’ is a pitfall into which many persons blunder when attempting to apply human body size data to design problems. Actually, it is virtually impossible to find an ‘average man’ in the Air Force population. This is not because of any unique traits of this group of men, but because of the great variability of bodily dimensions which is characteristic of all men.”
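The narrowing in Daniels’ table is easier to appreciate when each step is expressed as a pass rate. The sketch below is only an illustration: the counts are copied from the table quoted above, while the function name `funnel` and the output layout are my own assumptions rather than anything from the report.

```python
# Counts copied from Daniels' table as quoted in this letter.
TOTAL_PILOTS = 4063
STAGES = [
    ("stature", 1055),
    ("chest circumference", 302),
    ("sleeve length", 143),
    ("crotch height", 73),
    ("torso circumference", 28),
    ("hip circumference", 12),
    ("neck circumference", 6),
    ("waist circumference", 3),
    ("thigh circumference", 2),
    ("crotch length", 0),
]


def funnel(total, stages):
    """Return (name, remaining, pass_rate, share_of_total) for each stage.

    pass_rate is the fraction of the previous stage's survivors who are
    also roughly average on this measurement.
    """
    rows = []
    previous = total
    for name, remaining in stages:
        pass_rate = remaining / previous if previous else 0.0
        rows.append((name, remaining, pass_rate, remaining / total))
        previous = remaining
    return rows


rows = funnel(TOTAL_PILOTS, STAGES)
for name, remaining, pass_rate, share in rows:
    print(f"{name:<22}{remaining:>6}{pass_rate:>9.1%}{share:>10.3%}")
```

Each individual filter lets a meaningful share of the men through, yet stacking ten of them leaves no one standing.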

Luckily, the USAF took this report seriously. They realized that the cockpit must be fit to the man, and that it is nearly impossible to fit the man to the cockpit. Over manufacturers’ protests that it could not be done, adjustable seats, levers, and flight suits were designed and incorporated into the aircraft. These are standard features that we now take for granted in our automobiles. Incidents and accidents plummeted and flying became much safer as the USAF became the greatest military air division in the world. It appears that “any system designed around the average person is doomed to fail”, but why did we focus so much on average?

The Beginnings of Average:

What does it mean to be average? Merriam-Webster says that average is “a single value (such as a mean, mode, or median) that summarizes or represents the general significance of a set of unequal values”.

The concept of “average” is revolutionary because it seeks to make data more useful by discarding information; namely, the individuality of each observation is ignored, and the summary statistic is used for analysis. Certain important knowledge like the mental state of the observer, or the age and accuracy of the equipment utilized, or the current relative humidity in the testing environment is ignored and assumed to cancel with equal and opposite magnitude effects of other anomalies. The first documented use of a “mean observation” was from 1635 according to The Seven Pillars of Statistical Wisdom, and was used to establish “variation of the needle” in calculating the adjustment factor needed to reconcile “true north” from “magnetic north” for documentation in navigational maps.

In 1874 Venus made a transit across the face of the sun for the first time in 105 years, and astronomers worldwide were intent on calculating the precise duration of the journey. They flocked to the most advantageous viewing locations across the globe. The intent was to use the time of transit to calculate the size of the solar system. Unfortunately, there was wide variation in the times the astronomers calculated. Differences in skill, equipment, weather, location, and time all contributed to this variance. The agreed-upon solution was to take the average calculated time as the official statistic, as had been the convention in astronomy for decades.

Adolphe Quetelet (born in Belgium in 1796) was an unquestionably brilliant polymath who specialized in astronomy, mathematics, statistics, and sociology. As an astronomer he had observed first-hand the difficulty of measuring astronomical distances using eyes and stopwatches, and he learned the power of averaging to reconcile differing measurements among astronomers, the same technique that would later be applied to the transit of Venus in 1874. In 1823 Quetelet convinced the Belgian government to build its first observatory and name him as the director. Unfortunately, during the Belgian Revolution in 1830, the observatory was occupied by rebel forces, and Quetelet was unable to return to his position. The experience had a deep effect on a man who for most of his life had been insulated from social forces and politics. He concluded that if averaging was sufficient to measure the heavens, surely it was also sufficient for measuring humans.

Quetelet set out to build an academic discipline of Social Physics. His goal was to use mathematics to develop laws and policies that would lead to stability and prevent social disorder. In the 1840s Quetelet analyzed a dataset that measured the chest size of 5,738 Scottish soldiers for use in constructing uniforms. He added all the observations together, divided by the number of observations, and determined that the average chest size of a Scottish soldier was 39.75 inches. So what?

Quetelet set out to define the importance of this new statistic. Is it an estimate of normal? Is it what one would expect in a randomly selected individual? Some other esoteric meaning? Quetelet, having come from astronomy, took the astronomer’s view of the importance of the mean. In astronomy it was assumed that all observations taken by a human observer contained some sort of error factor. The mean was taken to minimize all these errors and to represent the true value of the quantity being measured. It seems that average was the pinnacle of perfection, and that all deviations from average were errors or deformities. This opinion was corroborated by a proof completed by Carl Friedrich Gauss showing that the average was as close to the truth as possible. It seems that in Quetelet’s mind, we are all a collection of errors deviating from the divine template of design intended by our creator. Quetelet went on to establish the Body Mass Index (BMI) to allow us all to revel in our divergence from perfection.

A former pupil of Quetelet, Sir Francis Galton, was a wealthy Englishman whose family money had come from gun manufacturing and banking. While he admired Quetelet, he disagreed on one major point: average is not universal perfection; it should be improved upon at all costs. Average was synonymous with mediocre in his opinion. He stratified individuals by their relation to average into three categories: Eminent, Mediocre, and Imbecile. Further, Galton believed that one’s relation to average was universal. Eminent individuals were inherently smart, athletic, and savvy in business matters. The mediocre were doomed to remain so in every way. To facilitate his shift from the ideal of average to the mediocrity of average, he sought to transform the definition from “error” to “rank”. An individual’s relative position to average was no longer the imperfection; it was the order in which this individual existed among his peers. Lest you scoff at the wisdom of this thinking, consider our modern usage of things like standardized tests and entrance exams in evaluating the academic potential of our children. So here we are in modern society, with Quetelet’s concept of the average man being perfect, and Galton’s idea of rank in determining an individual’s worth; one man striving to achieve average, another trying to be as far from average as possible.

Standardization and Ergodicity:

“In the past man was first, in the future the system will be first.” - Frederick Winslow Taylor

Frederick Winslow Taylor was born in 1856 to a wealthy Pennsylvania family. His father practiced law. During his childhood he spent two years studying in Prussia, one of the first European countries to adopt the theories of Quetelet in their educational systems. When he returned home, he attended Phillips Exeter Academy in New Hampshire. One of the formative moments of his education was the way homework assignments were calculated using the time it took an average student to complete a problem. The instructor would have the class complete problems and snap their fingers upon completion so that the correct number of problems could be assigned to take the average boy two hours to complete. This type of organizational methodology inspired Taylor’s later work.

Taylor came of age amid the industrial revolution and had a front row seat to the migration from agrarian economics to industrial economics. He took a divergent path from his father and instead of studying law at Harvard he took a series of manufacturing jobs and eventually worked his way up to chief engineer at Midvale Steelworks. Before this time, factory processes were disorganized and undocumented. Companies hired the most talented workers available and had them design processes that worked for them to get the job completed in the most efficient manner. Taylor believed that innovation had no place in a modern factory. He wanted average employees who could poke a button in a certain way, at a certain time, and he believed that all innovation and improvements should be performed on the system, by managers.

Taylor revolutionized the manufacturing industry by designing factories and systems that were incredibly efficient and that produced consistent outcomes. One can wonder if this was rewarding at all for the drones who labored in the factories, but the owners and managers were greatly satisfied by the results. Taylor even determined at one point that exactly 21 pounds of coal was the correct amount to shovel in one scoop. Shovel size, number of scoops per minute, and distance of the coal pile from the furnace were all carefully calibrated to produce ultimate efficiency.

This idea carried over into our education system. In the 1920s, instead of teaching children to excel in school, the goal was to give a minimum education to everyone to build an efficient and obedient workforce to run Taylor’s factories. The General Education Board, founded by John D. Rockefeller, published a report in 1912 that said: “We shall not try to make these people or any of their children into philosophers or men of learning or of science. We are not to raise up from among them authors, orators, poets, or men of letters. We shall not search for embryo great artists, painters, musicians…nor lawyers, doctors, preachers, politicians, statesmen, of whom we have ample supply…The task that we set before ourselves is very simple as well as very beautiful…we will organize our children into a little community and teach them to do in a perfect way the things their fathers and mothers are doing in an imperfect way.”

Well, that sounds awful. One final note on the evolution of our education system involves Edward Thorndike. Thorndike was a Taylorist at heart but disagreed that providing a minimum of education to all was the ideal. He was a little more Galton than Quetelet and proposed a system for ranking students to pluck the gifted and talented students from the masses and give them a little better than average education and put them on a path to go to college and graduate into the professional class. Thorndike spent a good part of his later career developing and implementing systems and processes for testing and ranking students to decide who should get the good education, and who should get the standard. This system is not only the method by which our students are measured, it also greatly impacts our evaluation of teachers.

Peter Molenaar was a Professor in the Netherlands who asked an important question about the way we currently use averages to evaluate people: “how can we make decisions about an individual, when by definition averaging removes the individual’s characteristics from the data”? His question was met with outrage by his peers, and his proposal was even accused of being “anarchy”. But the important conclusion is a matter of process. The methodology of evaluating variables with averages uses the following process: aggregate, then analyze. Molenaar wondered what would happen if that process was reversed to: analyze, then aggregate. If this entire diatribe so far seems like an allegory, you have a keen eye. I would like to build on Molenaar’s question and ask what would happen if we analyzed our securities first, then aggregated. Oddly enough, that is the way it used to be done.

Modern Finance:

“Carefully and expertly formed judgments concerning the potentialities and weaknesses of securities form the best basis upon which to analyze portfolios”. - Harry M. Markowitz, Portfolio Selection

Modern Finance seems to be a little too much Quetelet and not quite enough Galton. The entire industry attempts to achieve average results, and maybe even earn enough to cover expenses. In addition, deviations from the mean in either direction are treated with equal disdain, and securities that have the nerve to outperform our expectations are punished equally to those that disappoint. In our opinion, upside surprises should be sought and downside surprises should be protected against. Markowitz touches on this conundrum in Portfolio Selection, but his decision to use Standard Deviation instead of Semi-Variance (which penalizes only downside deviations) seems to have rested mostly on the difficulty involved in the necessary calculations, which he explicitly discloses.
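To make the distinction concrete, here is a minimal sketch comparing the two measures. The return series are hypothetical numbers chosen for illustration, and `downside_semivariance` is my own helper, which assumes deviations are measured against the series average and divided by the full number of observations; other definitions measure against a target return instead.

```python
from statistics import fmean, pstdev


def downside_semivariance(returns):
    """Mean squared shortfall below the average return.

    Assumption: shortfalls are measured against the mean of the series
    and the sum is divided by the total number of observations.
    """
    avg = fmean(returns)
    return sum(min(r - avg, 0.0) ** 2 for r in returns) / len(returns)


# Hypothetical annual returns. Both series average 6% per year.
upside_surprise = [0.02, 0.02, 0.02, 0.02, 0.22]     # one pleasant shock
downside_surprise = [0.10, 0.10, 0.10, 0.10, -0.10]  # one unpleasant shock

for label, series in (
    ("upside surprise", upside_surprise),
    ("downside surprise", downside_surprise),
):
    print(
        f"{label:<18} std dev {pstdev(series):.4f}  "
        f"semi-variance {downside_semivariance(series):.5f}"
    )
```

Standard deviation scores the two series identically because one is a mirror image of the other around the same mean. Semi-variance is larger for the series whose surprise was a loss.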

Why does an entire industry of analysts and portfolio managers attempt to be average? It seems the most likely reason is that nobody ever got fired for failing conventionally. There is great risk in following your own path, even when evidence and common sense are clearly on your side. If you fail unconventionally people call you an imbecile, to use Galton’s language, and when you succeed conventionally your peers consider you eminent. In order to blaze a trail unconquered by convention, you must be more right and more careful than those seeking warmth and refuge in the center of the herd. Our contention, to paraphrase Benjamin Graham, is that we are never right or wrong because people agree with us, we are right because of the merits of our analysis.

Considering quarterly, or even annual, performance in a vacuum when your goal extends over decades is a risky proposition. Hopping from one style to the next and embracing fads that are characterized by frequent gains, followed by infrequent catastrophic losses, is no way to comfortably build wealth over time. Consistently swinging for Grand Slams even when the bases are empty, when singles and doubles will score more runs over time, is an exercise in insanity. When the neighbor is up 5% and you are only up 3% it sometimes feels like something needs to be done quickly, lest we get left behind in the dust while everyone else gets rich. These are all normal emotions, but we firmly believe that maximizing returns without regard to risk can be hazardous to your wealth. Keep in mind you will never hear about your brother-in-law’s losses; it seems he is always so lucky at the casino, but strange how he never offers to pick up the tab at dinner. Focusing on your goals, and your progress toward those goals, is the only rational methodology for maintaining your assets over a lifetime of fads, scams, and terrible ideas backed with math.

So why do we evaluate our portfolios based on average annual returns? The history of mark-to-market accounting is a fantastically boring journey, but in the 1930s it was suspended by Franklin Roosevelt to protect the solvency of banking institutions. Later, in the 1960s, it was brought back to make corporate balance sheets more current. There was a period where the important metric by which an individual would track her investments was current dividends in relation to historical cost. Growing dividends can be a signal that revenue and earnings are expanding, and declining or eliminated dividends can be a key indicator that a company has fallen on hard times. This methodology has a built-in check to detect success and catastrophe. While this system is not recommended because it does not encapsulate all the relevant metrics involved in evaluating the merits of a particular investment portfolio, it does minimize panic in relation to security price fluctuations, and it protects your capital in many cases before calamity strikes.

In modern times we take the price of a security or portfolio at the end of the year, subtract the price at the beginning of the year, add all dividends and interest received for the current year, and divide by the price at the beginning of the year. I contend that this is an ineffective method for evaluating the strength of a portfolio of volatile securities. Imagine what happens if a portfolio rises by 10% on the last day of the year only to fall 10% on the first day of the next year. In this example, returns for the previous year look incredible, but the new year starts with a large loss that may take the entire year to resolve. Was any additional wealth created or destroyed by these security price fluctuations? I would contend that it was not.
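The calculation described above, and the year-end example, can be sketched in a few lines. The prices are hypothetical and assume the portfolio is otherwise flat and pays no income; `annual_total_return` is an illustrative name, not an industry-standard function.

```python
def annual_total_return(start_price, end_price, income=0.0):
    """(ending price - beginning price + income received) / beginning price."""
    return (end_price - start_price + income) / start_price


# Hypothetical portfolio: flat at 100 all year, then up 10% on the last day.
year1_start = 100.0
year1_end = year1_start * 1.10

# It falls 10% on the first day of the next year and stays flat afterward.
year2_start = year1_end
year2_end = year2_start * 0.90

year1 = annual_total_return(year1_start, year1_end)
year2 = annual_total_return(year2_start, year2_end)
two_year = annual_total_return(year1_start, year2_end)

print(f"Year 1 return:   {year1:+.1%}")
print(f"Year 2 return:   {year2:+.1%}")
print(f"Two-year change: {two_year:+.1%}")
```

Under these assumptions the first year reports a gain of 10% and the second a loss of 10%, yet over the two years the portfolio moved by about 1%. The calendar boundary, not the underlying businesses, produced the two dramatic annual figures.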

This confusion is why we track the value of our investments and use the price to help us decide what actions we should take in relation to our securities. The price we are offered in a market is solely for our convenience, and if other market participants are willing to overpay us for our securities, we should oblige, and if they are asking to underpay us we should decline. So long as the earnings power of our holdings continues to remain stable, the company will continue to become more valuable over time, if interest rates do not rise a considerable amount.

A further difficulty in achieving high investment performance comes from the assertion by Markowitz (and other proponents of financial theory) that “diversification is the only free lunch in finance”. Markowitz tells us in Portfolio Selection that the degree of correlation among securities is what gives diversification its power. He cites insurance underwriting, and how the lack of correlation among calamitous events allows insurers to earn an underwriting profit by making a lot of different bets. He further states that if all securities were 100% correlated to each other, diversification would not reduce risk at all. Since the correlation among securities typically lies somewhere between 0 and 1, we can use diversification to reduce risk.
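A rough sketch of that math follows. It is a textbook simplification rather than Markowitz’s full method: it assumes every security has the same volatility, every pair shares the same correlation, and the portfolio is equally weighted. The function name and the 20% volatility figure are assumptions for illustration.

```python
import math


def equal_weight_volatility(asset_vol, n_assets, correlation):
    """Volatility of an equally weighted portfolio of n_assets securities.

    Simplifying assumptions: identical volatility for each security and
    one common pairwise correlation between 0 and 1.
    variance = vol^2 * (1/n + (1 - 1/n) * correlation)
    """
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("this sketch only covers correlations from 0 to 1")
    variance = asset_vol ** 2 * (1 / n_assets + (1 - 1 / n_assets) * correlation)
    return math.sqrt(variance)


ASSET_VOL = 0.20  # hypothetical 20% annual volatility per security

for correlation in (0.0, 0.5, 1.0):
    row = ", ".join(
        f"n={n}: {equal_weight_volatility(ASSET_VOL, n, correlation):.1%}"
        for n in (1, 10, 100)
    )
    print(f"correlation {correlation:.1f} -> {row}")
```

With a correlation of 1 the portfolio is exactly as volatile as a single security no matter how many names are added. With any correlation below 1, adding names lowers volatility, but only down to a floor set by the correlation itself.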

I don’t disagree with the math of this statement, but the sentiment leaves me uneasy. Markowitz previously stated that the techniques he proposes are “concerned with the analysis of portfolios containing large numbers of securities” and that he is attempting to determine “a most suitable portfolio for the large private or institutional investor”. He uses the example of being able to be reasonably certain that a large number of coin flips will be approximately half heads and half tails if the flips are not correlated to each other. The same law of large numbers is at work in investment portfolios. The more securities we own, the closer to “average” our return will be on both the upside and the downside. The issue with settling for average investment returns lies in the fact that they are characterized by wide fluctuations from fair value and significant volatility. While we may hum along earning high returns for a decade, if we do not pay attention to the prices we pay, the majority of our returns can be taken in an instant.
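The coin-flip example is easy to try. The short simulation below is a sketch using a seeded pseudo-random generator so the run is repeatable; `share_of_heads` and the flip counts are illustrative choices, and the exact shares printed depend on the seed.

```python
import random


def share_of_heads(n_flips, seed=0):
    """Simulate n_flips independent fair coin flips; return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


for n_flips in (10, 100, 10_000, 100_000):
    share = share_of_heads(n_flips, seed=1)
    print(f"{n_flips:>7} flips: {share:.3f} heads")
```

With only a handful of flips the share of heads can land well away from one half. With many independent flips it settles close to one half, which is the same pull toward average that a portfolio of many securities experiences.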

Research has found that the more concentrated an investment portfolio, the better the results are in the hands of a skillful investment practitioner. This is why we attempt to build portfolios with exposure to a small number of carefully selected securities that we know something about. We attempt to analyze first, then aggregate, so that all the individual characteristics of the portfolio that averaging eliminates are in our favor before we discard this information to make up our minds about whether or not we like our portfolio. The biggest problem with most modern-day investment portfolios is that they contain too many securities to exploit the superior security selection skills of an able and competent investment practitioner. The fear of deviating too far from “normal” dooms a portfolio to track average, less expenses, of course.

Our evaluation of performance on an annual or quarterly basis also works against our interests. If we have purposely chosen to deviate from average in an attempt to avoid large, infrequent, but catastrophic losses, we have to be patient and understand that the seeds of that eventual destruction are sown in the pulling forward of future investment returns in securities untethered from reality. One surefire way to determine whether your returns are earned or not is by looking at the value of your underlying companies. It is an undeniable law of nature that more money cannot be taken out of a company over time from all shareholders collectively than flows into it. These flows can be in the form of earnings, bond sales or stock sales. If someone takes out $1,000 from a company that never earns a dollar, either other shareholders will have to put in funds in the form of equity or bond sales, or some future shareholder will lose her money. While it is possible for this relationship to break down over a few years, it is impossible for it to be violated over decades.

Most research studies and portfolio selection activities in finance use the opposite approach. They are stuck in the world of Quetelet and Galton because they attempt to pick a useful metric first, such as PE Ratio or Revenue Growth, and then build portfolios using these metrics with a blind eye to all the individual data that contribute to an individual company’s success or failure. They further go down the road of Frederick Winslow Taylor and try to systematize every decision until all the nuance and creativity are removed from the process and the entire industry owns the same securities. In addition, they use regulatory bodies and government agencies to try to promulgate and enforce their “system” as the only rational methodology of managing your money.

Your life, your dreams, and your goals are not average. To you and your family they are the most eminent goals in existence. Our job is to help you make decisions that are calculated and mathematical when it is useful, and to interject common sense into the situation when mathematics will not do. Further, we are trying to help you reach goals over decades, not over months. Our approach and our principles are built to avoid faulty logic and temporary trends that will lead to financial ruin, even if it is what everyone else is doing. I find no comfort in failure just because all my peers failed too. Our commitment to you is indestructible, no matter how strong the wind, how high the water, or how dark the night.

Warm Regards,

Allen

W. Allen Wallace, CFA, CPA/PFS, CFP®

Chief Investment Officer

Works Consulted:

Rose, T. (2016). The End of Average: How We Succeed in a World That Values Sameness (Illustrated ed.). HarperOne. The beginning of this article is a book report on Todd Rose’s The End of Average. It is a fascinating read that I cannot recommend enough if you liked the opening stories about how average became such a fixture in our thinking, and what the unforeseen consequences are.
Wright Air Development Center, & Daniels, G. (1952, December). The Average Man (No. AD010203). Aero Medical Laboratory.
Stigler, S. M. (2016). The Seven Pillars of Statistical Wisdom (Illustrated ed.). Harvard University Press.
Markowitz, H. M. (1991). Portfolio Selection: Efficient Diversification of Investments (2nd ed.). Wiley.