
Monte Carlo Simulation - Part 1

Preliminary: How do mathematicians model randomness?

In this two-part post, I will introduce Monte Carlo methods, an umbrella term for techniques that involve many random simulations of a phenomenon of interest in order to obtain a numerical result (or range of results). This post is a response (albeit a late one; my apologies for that) to a reader request from Anonymous dated August 2016.

In Part 1, I will explain at a high level what a Monte Carlo simulation is, what kinds of typical inputs and outputs we may expect, and the benefits and limitations of Monte Carlo methods. I use the example of retirement/investment planning software, which Anonymous also mentioned in his request.

In Part 2, I will walk through a considerably simpler example in detail: a Monte Carlo method to approximate the value of $\pi$. I will also explain two well-known statistical results, the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT), which justify the use of Monte Carlo methods and allow us to quantify their accuracy. The $\pi$ simulation will directly illustrate the LLN and CLT.


Monte Carlo methods - high-level overview


In the most general terms, a Monte Carlo simulation is a method whereby we simulate one or more random inputs many times (usually thousands or millions of times), making assumptions about their probability distributions, and combine the results in a deterministic (i.e. non-random) way to obtain a numerical output. In this section, I will explain in layman's terms why and how we may do this, as well as some limitations of these methods.

When to employ a Monte Carlo method

In the reader request, Anonymous mentioned a typical situation in which an analyst may employ a Monte Carlo method: a wealth manager wishes to build a portfolio of various assets for a client with the goal of providing sufficient income for the client after his retirement. More precisely, for a prospective portfolio to be considered acceptable, it must have a sufficiently high probability of providing enough cash to cover the client's estimated living costs at all times after his retirement date.

Suppose we would like to assess the viability of a proposed portfolio consisting of a number of equities (stocks) and fixed-income assets (bonds). We will need to make static assumptions about the client's age, retirement age, initial portfolio size, and post-retirement income and living costs. We will also need to make certain assumptions regarding the probability distributions of interest rates, equity returns, debt yields, dividend rates, inflation, mortality, etc. Note that we may re-categorize some of the "static" assumptions as variable and vice versa, depending on the goals of the analysis.
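
To make this concrete, here is one way we might organize such assumptions in code. Every name and value below is a hypothetical placeholder for illustration, not taken from any particular software:

```python
# Hypothetical static assumptions, fixed across all simulation runs
static_inputs = {
    "current_age": 45,
    "retirement_age": 65,
    "initial_portfolio": 1_000_000,   # dollars
    "annual_living_cost": 50_000,     # dollars, post-retirement
}

# Hypothetical distributional assumptions, randomized on each run:
# (distribution family, mean, standard deviation)
random_inputs = {
    "equity_return": ("normal", 0.06, 0.15),
    "bond_yield":    ("normal", 0.03, 0.05),
    "inflation":     ("normal", 0.02, 0.01),
}
```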

In any case, with so many variable inputs, it is no simple task to determine the probability that the proposed portfolio will be acceptable at any given time, let alone at all times. The number of random inputs could be enormous, and we do not necessarily have a tractable way of combining them all to arrive at a probability distribution for the portfolio value at some time $t$, so we cannot solve the problem analytically. However, we do have clear assumptions for all the inputs, and we have computers.

Instead of an analytical solution, we can use a Monte Carlo simulation to arrive at a numerical solution.

Typical inputs, outputs, and interpretation

I mentioned before that a Monte Carlo simulation consists of many "runs" of a deterministic procedure based on randomized inputs.

To perform a single run of a Monte Carlo simulation, we program a computer to simulate each input by drawing random numbers from an assumed probability distribution. In the retirement example, we would randomly generate a sample path over time for each relevant interest rate, stock price, bond price, etc. Given these paths, we can (deterministically!) compute the value of the portfolio at each time, and then it is a matter of arithmetic to determine whether the portfolio covered the living costs at each time or failed. This run represents one of the many possible scenarios which could occur.
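
Here is a minimal Python sketch of a single run, under a deliberately simplified model: annual (rather than continuous) time steps, a single Normally distributed portfolio return instead of separate asset paths, and entirely hypothetical parameter values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def single_run(initial_value, annual_cost, years, mu=0.06, sigma=0.15):
    """One simulated scenario: does the portfolio cover costs every year?"""
    value = initial_value
    for _ in range(years):
        r = rng.normal(mu, sigma)              # draw one year's return
        value = value * (1 + r) - annual_cost  # grow, then withdraw costs
        if value < 0:
            return False                       # failed to cover costs this year
    return True                                # covered costs in every year
```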

We repeat this process for, say, 100,000 runs (in Part 2, I will elaborate further on how many runs we need to use), calculating the simulated probability of the portfolio's success as the number of successes divided by 100,000. If that probability exceeds a predetermined threshold (e.g. 90%), then we deem the portfolio "acceptable".
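
Continuing the sketch, the full simulation is just a loop over runs, reusing the hypothetical single_run function above:

```python
def success_probability(initial_value, annual_cost, years, n_runs=100_000):
    """Fraction of runs in which the portfolio never failed."""
    successes = sum(single_run(initial_value, annual_cost, years)
                    for _ in range(n_runs))
    return successes / n_runs

p = success_probability(1_000_000, 50_000, 30)
print(f"Simulated success probability: {p:.1%}")
# Deem the portfolio acceptable if p exceeds a chosen threshold, e.g. 90%
```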

Typical software would also provide more sophisticated chart outputs such as those below (which I conveniently found on Google). The first image shows median portfolio values at each time based on certain fixed input parameters, with the median taken over all the simulation runs.

The second image shows a "heat map" of success probabilities based on different values of the input parameters. Each pixel on the heat map summarizes a full probability distribution (presumably of portfolio shortfall) for a fixed set of input parameter values: a green pixel indicates that the parameter values (the $x$- and $y$-axis values) lead to a portfolio which is likely to cover costs in every year after retirement, while a red pixel indicates parameter values which lead to a portfolio that is more likely to fail, i.e. to not cover costs (even with previous years' excesses) in at least one year after retirement.
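
Such a heat map can be produced by sweeping a grid of input parameter values and running the full simulation at each grid point. A rough sketch, reusing the hypothetical success_probability function above and assuming (purely for illustration) that the axes are initial portfolio size and annual living costs:

```python
initial_values = np.linspace(500_000, 2_000_000, 20)  # hypothetical x-axis
annual_costs = np.linspace(30_000, 100_000, 20)       # hypothetical y-axis

# One success probability per (cost, value) grid point; fewer runs per
# point keep the total computation manageable
heat = np.array([[success_probability(v, c, 30, n_runs=2_000)
                  for v in initial_values]
                 for c in annual_costs])
# Colour cells by probability, e.g. green where heat > 0.9, red where low
```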



The software must make assumptions about the statistical properties of the portfolio in order to generate chart outputs like those above. With respect to the probabilistic inputs (as opposed to deterministic inputs such as retirement age, tax rates, and initial portfolio size), the images show only a mean and standard deviation of the investment returns. This software therefore likely assumes Normally distributed returns.

It's also worth noting that Life Expectancy appears to be a fixed input here, with a value of 95, which suggests that the simulations may not be randomizing that input. In interpreting this output, it would be important to read the software documentation and understand the major assumptions employed.


Benefits and limitations of Monte Carlo methods

The most obvious benefit of a Monte Carlo simulation is that it allows us to run millions of complex scenarios in just seconds or minutes, a task which would be impossible without a computer. Another key benefit is a bit more subtle.

If we didn't have computers available, we might attempt to answer our retirement question using "what-if scenarios": we would choose a few sets of very simple assumptions about equity and debt returns which allow us to use "back-of-the-envelope" arithmetic to calculate the portfolio value. For example, we may assume fixed, non-random annual returns under three cases, say pessimistic, base, and optimistic.

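The back-of-the-envelope arithmetic for a single fixed-return case is just compound growth with annual withdrawals; the three cases below use entirely hypothetical numbers:

```python
def fixed_return_value(initial_value, annual_cost, years, r):
    """Deterministic portfolio value under a fixed annual return r."""
    value = initial_value
    for _ in range(years):
        value = value * (1 + r) - annual_cost
    return value

# Hypothetical pessimistic / base / optimistic return cases
for label, r in [("pessimistic", 0.02), ("base", 0.05), ("optimistic", 0.08)]:
    print(f"{label}: {fixed_return_value(1_000_000, 50_000, 30, r):,.0f}")
```
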
We may even go a step further and assume probability distributions for these returns (e.g. Normal distributions) which allow us to arrive at an analytical solution for the portfolio value's probability distribution over time. This is certainly a reasonable approach; however, these what-if scenarios give us no indication of how likely the three different cases are. A Monte Carlo simulation, on the other hand, gives us a distribution of possible outcomes and the probabilities of those outcomes. Unlikely scenarios will not be over-represented among the simulation runs.

Of course, no method is without limitations. On the practical side, without ready-made software, we typically need to write our own code to perform a Monte Carlo simulation, which takes time and effort that may not always be justified. For very complex simulations, we may also need to worry about the integrity of the random number generator(s) and, in some cases, the speed of the code, though these concerns are beyond the scope of this post.

From a more qualitative standpoint, we have the "garbage in, garbage out" principle: a Monte Carlo simulation, like any analysis, is based on certain input assumptions. We must always keep in mind which assumptions went into a simulation when interpreting its results. These assumptions include both input probability distributions and the input parameter values used to calibrate those distributions.

For example, the chart outputs above probably assumed "normal" market conditions and Normally distributed returns. The above simulation may therefore underestimate the probability of portfolio failure due to a financial crisis (a non-normal and certainly non-Normal event). Similarly, if life expectancy was indeed a static input, then the analysis may underestimate the probability of failure for people who are very healthy and thus likely to live longer than average.

Finally, we must determine "reasonable" values of the input parameters. More often than not, we base these on historical data as our best guess. This implies that the simulation will weight possible scenarios based on their historical relative frequencies, while future frequencies may differ. On the other hand, assuming different parameter values based on our own informed estimates clearly introduces a different bias into the model.
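
For example, a simple (and here entirely hypothetical) calibration would set the Normal return parameters to the sample mean and standard deviation of historical annual returns:

```python
import numpy as np

# Made-up historical annual returns, purely for illustration
historical_returns = np.array([0.12, -0.05, 0.08, 0.20, -0.15, 0.07])

mu_hat = historical_returns.mean()           # calibrated mean return
sigma_hat = historical_returns.std(ddof=1)   # calibrated volatility
```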

In conclusion, Monte Carlo simulation is a powerful tool, but it is still just that: a tool. We must remain critical of our analysis and the assumptions that underlie it.

Stay tuned for Part 2, in which we will walk through a simple Monte Carlo method to estimate $\pi$ and introduce the Law of Large Numbers and Central Limit Theorem.