
THE TIME VALUE OF MONEY IN FINANCE

PROFESSOR’S NOTE

The examples we use in this reading are meant to show how the time value of money appears throughout finance. Don’t worry if you are not yet familiar with the securities we describe in this reading. We will see these examples again when we cover bonds and forward interest rates in Fixed Income, stocks in Equity Investments, foreign exchange in Economics, and options in Derivatives.

WARM-UP: USING A FINANCIAL CALCULATOR

For the exam, you must be able to use a financial calculator when working time value of money problems. You simply do not have the time to solve these problems any other way.

CFA Institute allows only two types of calculators to be used for the exam: (1) the Texas Instruments® TI BA II Plus™ (including the BA II Plus Professional™) and (2) the HP® 12C (including the HP 12C Platinum). This reading is written primarily with the TI BA II Plus in mind. If you do not already own a calculator, purchase a TI BA II Plus! However, if you already own the HP 12C and are comfortable with it, by all means, continue to use it.

Before we begin working with financial calculators, you should familiarize yourself with your TI BA II Plus by locating the keys noted below. These are the only keys you need to know to calculate virtually all of the time value of money problems:

N = number of compounding periods

I/Y = interest rate per compounding period

PV = present value

FV = future value

PMT = annuity payments, or constant periodic cash flow

CPT = compute

The TI BA II Plus comes preloaded from the factory with the periods per year function (P/Y) set to 12. This automatically converts the annual interest rate (I/Y) into monthly rates. While appropriate for many loan-type problems, this feature is not suitable for the vast majority of the time value of money applications we will be studying. So, before using our SchweserNotes™, please set your P/Y key to “1” using the following sequence of keystrokes:

[2nd] [P/Y] “1” [ENTER] [2nd] [QUIT]

As long as you do not change the P/Y setting, it will remain set at one period per year until the battery from your calculator is removed (it does not change when you turn the calculator on and off). If you want to check this setting at any time, press [2nd] [P/Y]. The display should read P/Y = 1.0. If it does, press [2nd] [QUIT] to get out of the “programming” mode. If it does not, repeat the procedure previously described to set the P/Y key. With P/Y set to equal 1, it is now possible to think of I/Y as the interest rate per compounding period and N as the number of compounding periods under analysis. Thinking of these keys in this way should help you keep things straight as we work through time value of money problems.

PROFESSOR’S NOTE

We have provided an online video in the Resource Library on how to use the TI calculator. You can view it by logging in to your account at www.schweser.com.

MODULE 2.1: DISCOUNTED CASH FLOW VALUATION


LOS 2.a: Calculate and interpret the present value (PV) of fixed income and equity instruments based on expected future cash flows.

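In our Rates and Returns reading, we gave examples of the relationship between present values and future values. We can simplify that relationship as follows:

FV = PV × (1 + r)^t, or equivalently, PV = FV / (1 + r)^t

where r is the interest rate per compounding period and t is the number of compounding periods.

If we are using continuous compounding, this is the relationship:

FV = PV × e^(rt), or equivalently, PV = FV × e^(−rt)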

Fixed-Income Securities

One of the simplest examples of the time value of money concept is a pure discount debt instrument, such as a zero-coupon bond. With a pure discount instrument, the investor pays less than the face value to buy the instrument and receives the face value at maturity. The price the investor pays depends on the instrument’s yield to maturity (the discount rate applied to the face value) and the time until maturity. The amount of interest the investor earns is the difference between the face value and the purchase price.

EXAMPLE: Zero-coupon bond

A zero-coupon bond with a face value of $1,000 will mature 15 years from today. The bond has a yield to maturity of 4%. Assuming annual compounding, what is the bond’s price?

Answer:

PV = $1,000 / (1.04)^15 = $555.26

We can infer a bond’s yield from its price using the same relationship. Rather than solving for r with algebra, we typically use our financial calculators. For this example, if we were given the price of $555.26, the face value of $1,000, and annual compounding over 15 years, we would enter the following:

N = 15; PV = −555.26; PMT = 0; FV = 1,000

Then, to get the yield, CPT I/Y = 4.00.
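The same two calculations can be checked without a calculator. The following is a minimal Python sketch, assuming annual compounding; the function names are our own and are used only for illustration.

```python
def zero_coupon_price(face_value, ytm, years):
    """Price of a zero-coupon bond: the face value discounted at the yield."""
    return face_value / (1 + ytm) ** years


def zero_coupon_yield(price, face_value, years):
    """Annual yield implied by a price (the PV relation solved for r)."""
    return (face_value / price) ** (1 / years) - 1


price = zero_coupon_price(1000, 0.04, 15)
implied = zero_coupon_yield(555.26, 1000, 15)

print(f"price = {price:.2f}")    # price = 555.26
print(f"yield = {implied:.2%}")  # yield = 4.00%
```

Because the price is rounded to the nearest cent, the implied yield is 4% only to within rounding.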

PROFESSOR’S NOTE

Remember to enter cash outflows as negative values and cash inflows as positive values. From the investor’s point of view, the purchase price (PV) is an outflow, and the return of the face value at maturity (FV) is an inflow.

In some circumstances, interest rates can be negative. A zero-coupon bond with a negative yield would be priced at a premium, which means its price is greater than its face value.

EXAMPLE: Zero-coupon bond with a negative yield

If the bond in the previous example has a yield to maturity of −0.5%, what is its price, assuming annual compounding?

Answer:

PV = $1,000 / (1 − 0.005)^15 = $1,000 / (0.995)^15 = $1,078.09

A fixed-coupon bond is only slightly more complex. With a coupon bond, the investor receives a cash interest payment each period in addition to the face value at maturity. The bond’s coupon rate is a percentage of the face value and determines the amount of the interest payments. For example, a 3% annual coupon, $1,000 bond pays 3% of $1,000, or $30, each year.

The coupon rate and the yield to maturity are two different things. We only use the coupon rate to determine the coupon payment (PMT). The yield to maturity (I/Y) is the discount rate implied by the bond’s price.

EXAMPLE: Price of an annual coupon bond

Consider a 10-year, $1,000 par value, 10% coupon, annual-pay bond. What is the value of this bond if its yield to maturity is 8%?

Answer:

The coupon payments will be 10% × $1,000 = $100 at the end of each year. The $1,000 par value will be paid at the end of Year 10, along with the last coupon payment.

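The value of this bond with a discount rate (yield to maturity) of 8% is:

$100 / 1.08 + $100 / (1.08)^2 + … + $100 / (1.08)^9 + $1,100 / (1.08)^10 = $1,134.20

The calculator solution is:

N = 10; PMT = 100; FV = 1,000; I/Y = 8; CPT PV = −1,134.20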

The bond’s value is $1,134.20.
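To see what the calculator is doing, we can discount each cash flow individually. This is a minimal Python sketch, assuming annual coupons paid at the end of each year; the function name is our own.

```python
def bond_price(face_value, coupon_rate, ytm, years):
    """PV of the annual coupons plus the PV of the face value at maturity."""
    coupon = face_value * coupon_rate
    pv_coupons = sum(coupon / (1 + ytm) ** t for t in range(1, years + 1))
    pv_face = face_value / (1 + ytm) ** years
    return pv_coupons + pv_face


value = bond_price(1000, 0.10, 0.08, 10)
print(f"{value:.2f}")  # 1134.20
```

Note that the coupon rate only sets the size of the payments, while the yield to maturity is the rate used for discounting.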

PROFESSOR’S NOTE

For this reading where we want to illustrate time value of money concepts, we are only using annual coupon payments and compounding periods. In the Fixed Income topic area, we will also perform these calculations for semiannual-pay bonds.

Some bonds exist that have no maturity date. We refer to these as perpetual bonds or perpetuities. We cannot speak meaningfully of the future value of a perpetuity, but its present value simplifies mathematically to the following:

PV of a perpetuity = payment / r

where r is the discount rate per period.

An amortizing bond is one that pays a level amount each period, including its maturity period. The difference between an amortizing bond and a fixed-coupon bond is that for an amortizing bond, each payment includes some portion of the principal. With a fixed-coupon bond, the entire principal is paid to the investor on the maturity date.

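Amortizing bonds are an example of an annuity instrument. For an annuity, the payment each period is calculated as follows:

annuity payment = (r × PV) / [1 − (1 + r)^(−t)]

where r is the interest rate per period, t is the number of periods, and PV is the present value (principal).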

We can also determine an annuity payment using a financial calculator.

EXAMPLE: Computing a loan payment

Suppose you are considering applying for a $2,000 loan that will be repaid with equal end-of-year payments over the next 13 years. If the annual interest rate for the loan is 6%, how much are your payments?

Answer:

The size of the end-of-year loan payment can be determined by inputting values for the three known variables and computing PMT. Note that FV = 0 because the loan will be fully paid off after the last payment:

N = 13; I/Y = 6; PV = −2,000; FV = 0; CPT PMT = $225.92
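The loan payment can also be checked with the standard level-payment annuity relation, payment = r × PV / [1 − (1 + r)^(−t)]. The sketch below is a minimal Python illustration, assuming end-of-year payments; the function name and the amortization loop are our own additions.

```python
def annuity_payment(principal, rate, periods):
    """Level end-of-period payment that fully repays the principal."""
    return rate * principal / (1 - (1 + rate) ** -periods)


payment = annuity_payment(2000, 0.06, 13)
print(f"{payment:.2f}")  # 225.92

# Amortization check: each payment covers the period's interest first,
# and the remainder reduces the outstanding balance.
balance = 2000.0
for _ in range(13):
    interest = balance * 0.06
    balance -= payment - interest
print(abs(balance) < 1e-6)  # True: the loan is fully paid off
```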

Equity Securities

As with fixed-income securities, we value equity securities such as common and preferred stock as the present value of their future cash flows. The key differences are that equity securities do not mature, and their cash flows may change over time.

Preferred stock pays a fixed dividend that is stated as a percentage of its par value (similar to the face value of a bond). As with bonds, we must distinguish between the stated percentage that determines the cash flows and the discount rate we apply to the cash flows. We say that equity investors have a required return that will induce them to own an equity share. This required return is the discount rate we use to value equity securities.

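Because we can consider a preferred stock’s fixed stream of dividends to be infinite, we can use the perpetuity formula to determine its value:

preferred stock value = D_p / k_P

where D_p is the preferred dividend per period and k_P is the required return on the preferred stock.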

EXAMPLE: Preferred stock valuation

A company’s $100 par preferred stock pays a $5.00 annual dividend and has a required return of 8%. Calculate the value of the preferred stock.

Answer:

Value of the preferred stock: D_p/k_P = $5.00/0.08 = $62.50

Common stock is a residual claim to a company’s assets after it satisfies all other claims. Common stock typically does not promise a fixed dividend payment. Instead, the company’s management decides whether and when to pay common dividends.

Because the future cash flows are uncertain, we must use models to estimate the value of common stock. Here, we will look at three approaches analysts use frequently, which we call dividend discount models (DDMs). We will return to these examples in the Equity Investments topic area and explain when each model is appropriate.

1. Assume a constant future dividend. Under this assumption, we can value a common stock the same way we value a preferred stock, using the perpetuity formula.

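2. Assume a constant growth rate of dividends. With this assumption, we can apply the constant growth DDM, also known as the Gordon growth model. In this model, we state the value of a common share as follows:

V₀ = D₁ / (k_e − g_c)

where V₀ is the value of a share today, D₁ is the dividend expected next period, k_e is the required return on common equity, and g_c is the constant growth rate of dividends.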

In this model, V₀ represents the PV of all the dividends in future periods, beginning with D₁. Note that k_e must be greater than g_c or the math will not work.

EXAMPLE: Gordon growth model valuation

Calculate the value of a stock that is expected to pay a $1.62 dividend next year, if dividends are expected to grow at 8% forever and the required return on equity is 12%.

Answer:

V₀ = $1.62 / (0.12 − 0.08) = $40.50

3. Assume a changing growth rate of dividends. This can be done in many ways. The example we will use here (and the one that is required for the Level I CFA exam) is known as a multistage DDM. Essentially, we assume a pattern of dividends in the short term, such as a period of high growth, followed by a constant growth rate of dividends in the long term.

To use a multistage DDM, we discount the expected dividends in the short term as individual cash flows, then apply the constant growth DDM to the long term. As we saw in the previous example, the constant growth DDM gives us a value for an equity share one period before the dividend we use in the numerator.

EXAMPLE: Multistage growth

Consider a stock with dividends that are expected to grow at 15% per year for two years, after which they are expected to grow at 5% per year, indefinitely. The last dividend paid was $1.00, and k_e = 11%. Calculate the value of this stock using the multistage growth model.

Answer:

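Calculate the dividends over the high growth period:

D₁ = D₀(1 + g) = $1.00(1.15) = $1.15

D₂ = D₁(1 + g) = $1.15(1.15) = $1.3225 ≈ $1.32

Calculate the first dividend of the constant-growth period:

D₃ = D₂(1 + g_c) = $1.3225(1.05) = $1.3886

Use the constant growth model to get P₂, a value for all the (infinite) dividends expected from time = 3 onward:

P₂ = D₃ / (k_e − g_c) = $1.3886 / (0.11 − 0.05) = $23.14

Finally, we can sum the present values of dividends 1 and 2 and of P₂ to get the present value of all the expected future dividends during both the high-growth and constant-growth periods:

V₀ = $1.15 / 1.11 + ($1.32 + $23.14) / (1.11)^2 = $20.89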

PROFESSOR’S NOTE

A key point to notice in this example is that when we applied the dividend in Period 3 to the constant growth model, it gave us a value for the stock in Period 2. To get a value for the stock today, we had to discount this value back by two periods, along with the dividend in Period 2 that was not included in the constant growth value.
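The constant growth and multistage examples can be reproduced with a short script. This is a minimal Python sketch under the assumptions stated in the examples (annual dividends, one high-growth stage followed by constant growth); the function and parameter names are our own.

```python
def gordon_value(next_dividend, required_return, growth):
    """Constant growth DDM: value one period before next_dividend is paid."""
    if required_return <= growth:
        raise ValueError("required return must exceed the growth rate")
    return next_dividend / (required_return - growth)


def two_stage_value(last_dividend, high_growth, high_years,
                    long_growth, required_return):
    """PV of the high-growth dividends plus the PV of the terminal value."""
    value = 0.0
    dividend = last_dividend
    for t in range(1, high_years + 1):
        dividend *= 1 + high_growth                     # D_t
        value += dividend / (1 + required_return) ** t  # PV of D_t
    # Terminal value at the end of the high-growth stage (P_2 in the example).
    terminal = gordon_value(dividend * (1 + long_growth),
                            required_return, long_growth)
    return value + terminal / (1 + required_return) ** high_years


print(f"{gordon_value(1.62, 0.12, 0.08):.2f}")                 # 40.50
print(f"{two_stage_value(1.00, 0.15, 2, 0.05, 0.11):.2f}")     # 20.89
```

The terminal value is discounted by the number of high-growth years, not by one more, because the constant growth model values the stock one period before the dividend in its numerator.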

MODULE QUIZ 2.1

1. Terry Corporation preferred stock is expected to pay a $9 annual dividend in perpetuity. If the required rate of return on an equivalent investment is 11%, one share of Terry preferred should be worth:

A. $81.82.

B. $99.00.

C. $122.22.

2. Dover Company wants to issue a $10 million face value of 10-year bonds with an annual coupon rate of 5%. If the investors’ required yield on Dover’s bonds is 6%, the amount the company will receive when it issues these bonds (ignoring transactions costs) will be:

A. less than $10 million.

B. equal to $10 million.

C. greater than $10 million.

MODULE 2.2: IMPLIED RETURNS AND CASH FLOW ADDITIVITY

LOS 2.b: Calculate and interpret the implied return of fixed-income instruments and required return and implied growth of equity instruments given the present value (PV) and cash flows.

The examples we have seen so far illustrate the relationships among present value, future cash flows, and the required rate of return. We can easily rearrange these relationships and solve for the required rate of return, given a security’s price and its future cash flows.

EXAMPLE: Yield of an annual coupon bond

Consider the 10-year, $1,000 par value, 10% coupon, annual-pay bond we examined in an earlier example, when its price was $1,134.20 at a yield to maturity of 8%.

What is its yield to maturity if its price decreases to $1,085.00?

Answer:

N = 10; PMT = 100; FV = 1,000; PV = −1,085; CPT I/Y = 8.6934

The bond’s yield to maturity increased to 8.69%.

Notice that the relationship between prices and yields is inverse. When the price decreases, the yield to maturity increases. When the price increases, the yield to maturity decreases. Or, equivalently, when the yield increases, the price decreases. When the yield decreases, the price increases. We will use this concept again and again when we study bonds in the Fixed Income topic area.
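There is no closed-form solution for the yield of a coupon bond, so the calculator searches for it numerically. The sketch below shows one way to do that in Python, using a simple bisection search that relies on the inverse price-yield relationship. The search bounds and tolerance are our own assumptions, and this is not necessarily the method a financial calculator uses.

```python
def bond_price(face_value, coupon_rate, ytm, years):
    """PV of annual coupons plus face value, discounted at ytm."""
    coupon = face_value * coupon_rate
    pv_coupons = sum(coupon / (1 + ytm) ** t for t in range(1, years + 1))
    return pv_coupons + face_value / (1 + ytm) ** years


def yield_to_maturity(price, face_value, coupon_rate, years,
                      low=-0.5, high=1.0, tol=1e-10):
    """Bisection search for the discount rate that reproduces the price."""
    for _ in range(200):
        mid = (low + high) / 2
        if bond_price(face_value, coupon_rate, mid, years) > price:
            low = mid    # model price too high, so the yield must be higher
        else:
            high = mid   # model price too low, so the yield must be lower
        if high - low < tol:
            break
    return (low + high) / 2


print(f"{yield_to_maturity(1134.20, 1000, 0.10, 10):.2%}")  # 8.00%
print(f"{yield_to_maturity(1085.00, 1000, 0.10, 10):.2%}")  # 8.69%
```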

In our examples for equity share values, we assumed the investor’s required rate of return. In practice, the required rate of return on equity is not directly observable. Instead, we use share prices that we can observe in the market to derive implied required rates of return on equity, given our assumptions about their future cash flows.

For example, if we assume a constant rate of dividend growth, we can rearrange the constant growth DDM to solve for the required rate of return:

k_e = D₁ / V₀ + g_c

That is, the required rate of return on equity is the ratio of the expected dividend to the current price (which we refer to as a share’s dividend yield) plus the assumed constant growth rate.

We can also rearrange the model to solve for a stock’s implied growth rate, given a required rate of return:

g_c = k_e − D₁ / V₀

That is, the implied growth rate is the required rate of return minus the dividend yield.
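As a quick check, we can apply both rearrangements to the numbers from the Gordon growth example (a $1.62 expected dividend, 8% growth, and a 12% required return, which gave a value of $40.50). This is a minimal Python sketch, and the function names are our own.

```python
def implied_required_return(next_dividend, price, growth):
    """Required return = dividend yield + constant growth rate."""
    return next_dividend / price + growth


def implied_growth(next_dividend, price, required_return):
    """Implied growth = required return - dividend yield."""
    return required_return - next_dividend / price


print(f"{implied_required_return(1.62, 40.50, 0.08):.2%}")  # 12.00%
print(f"{implied_growth(1.62, 40.50, 0.12):.2%}")           # 8.00%
```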

LOS 2.c: Explain the cash flow additivity principle, its importance for the no arbitrage condition, and its use in calculating implied forward interest rates, forward exchange rates, and option values.

The cash flow additivity principle refers to the fact that the PV of any stream of cash flows equals the sum of the PVs of the cash flows. If we have two series of cash flows, the sum of the PVs of the two series is the same as the PVs of the two series taken together, adding cash flows that will be paid at the same point in time. We can also divide up a series of cash flows any way we like, and the PV of the “pieces” will equal the PV of the original series.

For example, a series that pays 100 at the end of each of Years 1, 2, and 4 and 400 at the end of Year 3 can be divided into a payment of 100 at the end of each of the four years plus a single payment of 300 at the end of Year 3. The PVs of these two pieces sum to the PV of the original series.

This is a simple example of replication. In effect, we created the equivalent of the given series of uneven cash flows by combining a 4-year annuity of 100 with a 3-year zero-coupon bond of 300.
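We can verify this replication numerically. The sketch below is a minimal Python illustration in which the 10% discount rate is an assumption chosen only for the demonstration. The additivity result holds for any rate.

```python
def present_value(cash_flows, rate):
    """PV of end-of-period cash flows; cash_flows[0] is paid at t = 1."""
    return sum(cf / (1 + rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


rate = 0.10                        # assumed discount rate (illustration only)
original = [100, 100, 400, 100]    # uneven series implied by the replication
annuity = [100, 100, 100, 100]     # 4-year annuity of 100
zero = [0, 0, 300, 0]              # 3-year zero-coupon bond of 300

pv_original = present_value(original, rate)
pv_pieces = present_value(annuity, rate) + present_value(zero, rate)

print(f"{pv_original:.2f}")  # 542.38
print(f"{pv_pieces:.2f}")    # 542.38
```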

We rely on the cash flow additivity principle in many of the pricing models we see in the Level I CFA curriculum. It is the basis for the no-arbitrage principle, or “law of one price,” which says that if two sets of future cash flows are identical under all conditions, they will have the same price today (or if they don’t, investors will quickly buy the lower-priced one and sell the higher-priced one, which will drive their prices together).

Three examples of valuation based on the no-arbitrage condition are forward interest rates, forward exchange rates, and option pricing using a binomial model. We will explain each of these examples in greater detail when we address the related concepts in the Fixed Income, Economics, and Derivatives topic areas. For now, just focus on how they apply the principle that equivalent future cash flows must have the same present value.

Forward Interest Rates

A forward interest rate is the interest rate for a loan to be made at some future date. The notation used must identify both the length of the loan and when in the future the money will be borrowed. Thus, 1y1y is the rate for a 1-year loan to be made one year from now; 2y1y is the rate for a 1-year loan to be made two years from now; 3y2y is the 2-year forward rate three years from now; and so on.

By contrast, a spot interest rate is an interest rate for a loan to be made today. We will use the notation S₁ for a 1-year rate today, S₂ for a 2-year rate today, and so on.

The cash flow additivity principle applies here because, for example, borrowing for three years at the 3-year spot rate, or borrowing for one-year periods in three successive years, should have the same cost today. This relation is illustrated as follows: (1 + S₃)³ = (1 + S₁)(1 + 1y1y)(1 + 2y1y).

In fact, any combination of spot and forward interest rates that cover the same time period should have the same cost. Using this idea, we can derive implied forward rates from spot rates that are observable in the fixed-income markets.
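To illustrate, the sketch below derives implied forward rates from a set of spot rates in Python. The spot rates (2%, 3%, and 4%) are hypothetical values chosen only for illustration, and the function name is our own.

```python
def implied_forward(spot_near, years_near, spot_far, years_far):
    """Annualized forward rate covering years_near to years_far."""
    growth_far = (1 + spot_far) ** years_far
    growth_near = (1 + spot_near) ** years_near
    span = years_far - years_near
    return (growth_far / growth_near) ** (1 / span) - 1


s1, s2, s3 = 0.02, 0.03, 0.04   # hypothetical 1-, 2-, and 3-year spot rates

f_1y1y = implied_forward(s1, 1, s2, 2)   # 1-year rate, one year from now
f_2y1y = implied_forward(s2, 2, s3, 3)   # 1-year rate, two years from now

# No-arbitrage check: both borrowing routes must have the same 3-year cost.
three_year_spot = (1 + s3) ** 3
rolled_over = (1 + s1) * (1 + f_1y1y) * (1 + f_2y1y)

print(f"1y1y = {f_1y1y:.4%}, 2y1y = {f_2y1y:.4%}")
print(abs(three_year_spot - rolled_over) < 1e-12)  # True
```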

Forward Currency Exchange Rates

An exchange rate is the price of one country’s currency in terms of another country’s currency. For example, an exchange rate of 1.416 USD/EUR means that one euro (EUR) is worth 1.416 U.S. dollars (USD). The Level I CFA curriculum refers to the currency in the numerator (USD, in this example) as the price currency and the one in the denominator (EUR in this example) as the base currency.

Like interest rates, exchange rates can be quoted as spot rates for currency exchanges to be made today, or as forward rates for currency exchanges to be made at a future date.

The percentage difference between forward and spot exchange rates is approximately the difference between the two countries’ interest rates. This is because there is an arbitrage trade with a riskless profit to be made when this relation does not hold.

The possible arbitrage is as follows: borrow Currency A at Interest Rate A, convert it to Currency B at the spot rate and invest it to earn Interest Rate B, and sell the proceeds from this investment forward at the forward rate to turn it back into Currency A. If the forward rate does not correctly reflect the difference between interest rates, such an arbitrage could generate a profit to the extent that the return from investing Currency B and converting it back to Currency A with a forward contract is greater than the cost of borrowing Currency A for the period.

For spot and forward rates expressed as price currency/base currency, the no-arbitrage relation is as follows:

forward / spot = (1 + interest rate of price currency) / (1 + interest rate of base currency)

This formula can be rearranged as necessary to solve for specific values of the relevant terms.

EXAMPLE: Calculating the arbitrage-free forward exchange rate

Consider two currencies, the ABE and the DUB. The spot ABE/DUB exchange rate is 4.5671, the 1-year riskless ABE rate is 5%, and the 1-year riskless DUB rate is 3%. What is the 1-year forward exchange rate that will prevent arbitrage profits?

Answer:

Rearranging our formula, we have:

forward = spot × (1 + interest rate of price currency) / (1 + interest rate of base currency)

and we can calculate the forward rate as:

forward = 4.5671 × (1.05 / 1.03) = 4.6558 ABE/DUB

As you can see, the forward rate is greater than the spot rate by 4.6558 / 4.5671 − 1 = 1.94%. This is approximately equal to the interest rate differential of 5% − 3% = 2%.
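The example can be reproduced, and the arbitrage trade described earlier can be checked, with a few lines of Python. This is a minimal sketch for a one-year horizon with annual compounding; the function and variable names are our own.

```python
def no_arbitrage_forward(spot, price_rate, base_rate):
    """1-year forward rate (price/base) that leaves no riskless profit."""
    return spot * (1 + price_rate) / (1 + base_rate)


spot = 4.5671                 # ABE per DUB
abe_rate, dub_rate = 0.05, 0.03

forward = no_arbitrage_forward(spot, abe_rate, dub_rate)
print(f"{forward:.4f}")       # 4.6558

# Arbitrage check: borrow ABE, convert to 1 DUB, invest at the DUB rate,
# and sell the DUB proceeds forward for ABE.
borrowed_abe = spot                           # buys exactly 1 DUB today
owed_abe = borrowed_abe * (1 + abe_rate)      # loan repayment in one year
proceeds_abe = 1 * (1 + dub_rate) * forward   # DUB proceeds converted back
print(abs(proceeds_abe - owed_abe) < 1e-9)    # True: no riskless profit
```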

Option Pricing Model

An option is the right, but not the obligation, to buy or sell an asset on a future date for a specified price. The right to buy an asset is a call option, and the right to sell an asset is a put option.

Valuing options is different from valuing other securities because the owner can let an option expire unexercised. A call option owner will let the option expire if the underlying asset can be bought in the market for less than the price specified in the option. A put option owner will let the option expire if the underlying asset can be sold in the market for more than the price specified in the option. In these cases, we say an option is out of the money. If an option is in the money on its expiration date, the owner has the right to buy the asset for less, or sell the asset for more, than its market price— and, therefore, will exercise the option.

An approach to valuing options that we will use in the Derivatives topic area is a binomial model. A binomial model is based on the idea that, over the next period, some value will change to one of two possible values. To construct a one-period binomial model for pricing an option, we need the following:

A value for the underlying asset at the beginning of the period

An exercise price for the option; the exercise price can be different from the value of the underlying, and we assume the option expires one period from now

Returns that will result from an up-move and a down-move in the value of the underlying over one period

The risk-free rate over the period

As an example, we can model a call option with an exercise price of $55 on a stock that is currently valued (S₀) at $50. Let us assume that in one period, the stock’s value will either increase (S₁^u) to $60 or decrease (S₁^d) to $42. We state the return from an up-move (R^u) as $60 / $50 = 1.20, and the return from a down-move (R^d) as $42 / $50 = 0.84.

Figure 2.1: One-Period Binomial Tree

The call option will be in the money after an up-move or out of the money after a down-move. Its value at expiration after an up-move, c₁^u, is $60 − $55 = $5. Its value after a down-move, c₁^d, is zero.

Now, we can use no-arbitrage pricing to determine the initial value of the call option (c₀). We do this by creating a portfolio of the option and the underlying stock, such that the portfolio will have the same value following either an up-move (V₁^u) or a down-move (V₁^d) in the stock. For our example, we would write the call option (that is, we grant someone else the option to buy the stock from us) and buy a number of shares of the stock that we will denote as h. We must solve for the h that results in V₁^u = V₁^d:

The initial value of our portfolio, V₀, is hS₀ − c₀ (we subtract c₀ because we are short the call option).

The portfolio value after an up-move, V₁^u, is hS₁^u − c₁^u.

The portfolio value after a down-move, V₁^d, is hS₁^d − c₁^d.

In our example, V₁^u = h($60) − $5 and V₁^d = h($42) − 0. Setting V₁^u = V₁^d and solving for h, we get the following:

h($60) − $5 = h($42)

h = $5 / ($60 − $42) = 0.278

This result, the number of shares of the underlying we would buy for each call option we would write, is known as the hedge ratio for this option.

With V₁^u = V₁^d, the value of the portfolio after one period is known with certainty. This means we can say that either V₁^u or V₁^d must equal V₀ compounded at the risk-free rate for one period. In this example, V₁^d = 0.278($42) = $11.68, or V₁^u = 0.278($60) − $5 = $11.68. Let us assume the risk-free rate over one period is 3%. Then, V₀ = $11.68 / 1.03 = $11.34.

Now, we can solve for the value of the call option, c₀. Recall that V₀ = hS₀ − c₀, so c₀ = hS₀ − V₀. Here, c₀ = 0.278($50) − $11.34 = $2.56.
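The steps of the one-period binomial example can be collected into a single function. This is a minimal Python sketch of the hedge-portfolio approach described above; the function and variable names are our own. Because the code does not round h to 0.278, its intermediate portfolio values differ by a cent or so from the $11.68 and $11.34 shown in the text, but the option value still rounds to $2.56.

```python
def one_period_call_value(s0, s_up, s_down, strike, risk_free):
    """Value a call by building a portfolio that is riskless over one period."""
    c_up = max(s_up - strike, 0.0)      # option payoff after an up-move
    c_down = max(s_down - strike, 0.0)  # option payoff after a down-move

    # Hedge ratio: shares bought per call written so that V1_up == V1_down.
    h = (c_up - c_down) / (s_up - s_down)

    v1 = h * s_down - c_down            # equals h * s_up - c_up
    v0 = v1 / (1 + risk_free)           # riskless portfolio earns risk-free rate
    c0 = h * s0 - v0                    # from V0 = h * S0 - c0
    return h, v0, c0


h, v0, c0 = one_period_call_value(50, 60, 42, 55, 0.03)
print(f"h = {h:.3f}, c0 = {c0:.2f}")    # h = 0.278, c0 = 2.56
```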

MODULE QUIZ 2.2

1. For an equity share with a constant growth rate of dividends, we can estimate its:

A. value as the next dividend discounted at the required rate of return.

B. growth rate as the sum of its required rate of return and its dividend yield.

C. required return as the sum of its constant growth rate and its dividend yield.

2. An investment of €5 million today is expected to produce a one-time payoff of €7 million three years from today. The annual return on this investment, assuming annual compounding, is closest to:

A. 12%.

B. 13%.

C. 14%.

KEY CONCEPTS

LOS 2.a

The value of a fixed-income instrument or an equity security is the present value of its future cash flows, discounted at the investor’s required rate of return:

PV = CF₁ / (1 + r) + CF₂ / (1 + r)^2 + … + CF_N / (1 + r)^N

The PV of a perpetual bond or a preferred stock = payment / r, where r = required rate of return.

The PV of a common stock with a constant growth rate of dividends is:

V₀ = D₁ / (k_e − g_c)

LOS 2.b

By rearranging the present value relationship, we can calculate a security’s required rate of return based on its price and its future cash flows. The relationship between prices and required rates of return is inverse.

For an equity share with a constant rate of dividend growth, we can estimate the required rate of return as the dividend yield plus the assumed constant growth rate, or we can estimate the implied growth rate as the required rate of return minus the dividend yield.

LOS 2.c

Using the cash flow additivity principle, we can divide up a series of cash flows any way we like, and the present value of the pieces will equal the present value of the original series. This principle is the basis for the no-arbitrage condition, under which two sets of future cash flows that are identical must have the same present value.

ANSWER KEY FOR MODULE QUIZZES

Module Quiz 2.1

1. A 9 / 0.11 = $81.82 (LOS 2.a)

2. A Because the required yield is greater than the coupon rate, the present value of the bonds is less than their face value: N = 10; I/Y = 6; PMT = 0.05 × $10,000,000 = $500,000; FV = $10,000,000; and CPT PV = −$9,263,991. (LOS 2.a)

Module Quiz 2.2

1. C Using the constant growth dividend discount model, we can estimate the required rate of return as k_e = D₁ / V₀ + g_c. The estimated value of a share is all of its future dividends discounted at the required rate of return, which simplifies to V₀ = D₁ / (k_e − g_c) if we assume a constant growth rate. We can estimate the constant growth rate as the required rate of return minus the dividend yield. (LOS 2.b)

2. A (7 / 5)^(1/3) − 1 = 11.87%, or N = 3; PV = −5; FV = 7; PMT = 0; and CPT I/Y = 11.87. (LOS 2.b)

READING 3

STATISTICAL MEASURES OF ASSET RETURNS

MODULE 3.1: CENTRAL TENDENCY AND DISPERSION

LOS 3.a: Calculate, interpret, and evaluate measures of central tendency and location to address an investment problem.

Measures of Central Tendency

Measures of central tendency identify the center, or average, of a dataset. This central point can then be used to represent the typical, or expected, value in the dataset.

The arithmetic mean is the sum of the observation values divided by the number of observations. It is the most widely used measure of central tendency. An example of an arithmetic mean is a sample mean, which is the sum of all the values in a sample of a population, ΣX, divided by the number of observations in the sample, n. It is used to make inferences about the population mean. The sample mean is expressed as follows:

X̄ = (X₁ + X₂ + … + X_n) / n = ΣX_i / n

The median is the midpoint of a dataset, where the data are arranged in ascending or descending order. Half of the observations lie above the median, and half are below. To determine the median, arrange the data from the highest to lowest value, or lowest to highest value, and find the middle observation.

The median is important because the arithmetic mean can be affected by outliers, which are extremely large or small values. When this occurs, the median is a better measure of central tendency than the mean because it is not affected by extreme values that may actually be the result of errors in the data.

EXAMPLE: The median using an odd number of observations

What is the median return for five portfolio managers with a 10-year annualized total returns record of 30%, 15%, 25%, 21%, and 23%?

Answer:

First, arrange the returns in descending order:

30%, 25%, 23%, 21%, 15%

Then, select the observation that has an equal number of observations above and below it—the one in the middle. For the given dataset, the third observation, 23%, is the median value.

EXAMPLE: The median using an even number of observations

Suppose we add a sixth manager to the previous example with a return of 28%. What is the median return?

Answer:

Arranging the returns in descending order gives us this:

30%, 28%, 25%, 23%, 21%, 15%

With an even number of observations, there is no single middle value. The median value, in this case, is the arithmetic mean of the two middle observations, 25% and 23%. Thus, the median return for the six managers is 24% = 0.5(25 + 23).

The mode is the value that occurs most frequently in a dataset. A dataset may have more than one mode, or even no mode. When a distribution has one value that appears most frequently, it is said to be unimodal. When a dataset has two or three values that occur most frequently, it is said to be bimodal or trimodal, respectively.

EXAMPLE: The mode

What is the mode of the following dataset? Dataset: [30%, 28%, 25%, 23%, 28%, 15%, 5%]

Answer:

The mode is 28% because it is the value appearing most frequently.

For continuous data, such as investment returns, we typically do not identify a single outcome as the mode. Instead, we divide the relevant range of outcomes into intervals, and we identify the modal interval as the one into which the largest number of observations fall.
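The three examples above can be reproduced with Python’s standard `statistics` module. This is a minimal sketch; the variable names are our own, and `multimode` requires Python 3.8 or later. The arithmetic mean of the five managers’ returns is included for comparison, although it is not computed in the examples.

```python
from statistics import mean, median, multimode

five_managers = [30, 15, 25, 21, 23]          # returns in percent
six_managers = five_managers + [28]
mode_dataset = [30, 28, 25, 23, 28, 15, 5]

print(mean(five_managers))      # 22.8
print(median(five_managers))    # 23
print(median(six_managers))     # 24.0 (mean of the two middle values)
print(multimode(mode_dataset))  # [28]
```

`multimode` returns a list because a dataset can have more than one mode.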

Methods for Dealing With Outliers

In some cases, a researcher may decide that outliers should be excluded from a measure of central tendency. One technique for doing so is to use a trimmed mean. A trimmed mean excludes a stated percentage of the most extreme observations. A 1% trimmed mean, for example, would discard the lowest 0.5% and the highest 0.5% of the observations.

Another technique is to use a winsorized mean. Instead of discarding the highest and lowest observations, we substitute a value for them. To calculate a 90% winsorized mean, for example, we would determine the 5th and 95th percentile of the observations, substitute the 5th percentile for any values lower than that, substitute the 95th percentile for any values higher than that, and then calculate the mean of the revised dataset. Percentiles are measures of location, which we will address next.
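To make the difference between the two techniques concrete, the sketch below applies both to a small dataset in Python. The 20 returns are hypothetical and include one extreme value on each side. As a simplifying assumption, the code works with whole observations: it treats `percent` as the total share of observations considered extreme (half in each tail) and uses the nearest remaining observation as the substitute value rather than an interpolated percentile. With `percent=10`, it therefore corresponds to the 10% trimmed mean and the 90% winsorized mean.

```python
def trimmed_mean(values, percent):
    """Mean after discarding percent/2 of the observations from each tail."""
    ordered = sorted(values)
    k = len(ordered) * percent // 200      # observations dropped per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, percent):
    """Mean after replacing each tail's extremes with the nearest kept value."""
    ordered = sorted(values)
    n = len(ordered)
    k = n * percent // 200
    low, high = ordered[k], ordered[n - k - 1]
    clipped = [min(max(x, low), high) for x in ordered]
    return sum(clipped) / n


# Hypothetical returns (percent) with one outlier in each tail.
returns = [-40, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 12, 14, 60]

print(sum(returns) / len(returns))             # plain arithmetic mean
print(round(trimmed_mean(returns, 10), 2))     # drops -40 and 60
print(round(winsorized_mean(returns, 10), 2))  # replaces -40 with 2, 60 with 14
```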

Measures of Location

Quantile is the general term for a value at or below which a stated proportion of the data in a distribution lies. Examples of quantiles include the following:

Quartile. The distribution is divided into quarters.

Quintile. The distribution is divided into fifths.

Decile. The distribution is divided into tenths.

Percentile. The distribution is divided into hundredths (percentages).

Note that any quantile may be expressed as a percentile. For example, the third quartile partitions the distribution at a value such that three-fourths, or 75%, of the observations fall below that value. Thus, the third quartile is the 75th percentile. The difference between the third quartile and the first quartile (25th percentile) is known as the interquartile range.

To visualize a dataset based on quantiles, we can create a box and whisker plot, as shown in Figure 3.1. In a box and whisker plot, the box represents the central portion of the data, such as the interquartile range. The vertical line represents the entire range. In Figure 3.1, we can see that the largest observation is farther away from the center than is the smallest observation. This suggests that the data might include one or more outliers on the high side.

Figure 3.1: Box and Whisker Plot

LOS 3.b: Calculate, interpret, and evaluate measures of dispersion to address an investment problem.

Dispersion is defined as the variability around the central tendency. The common theme in finance and investments is the tradeoff between reward and variability, where the central tendency is the measure of the reward and dispersion is a measure of risk.

The range is a relatively simple measure of variability, but when used with other measures, it provides useful information. The range is the distance between the largest and the smallest value in the dataset: range = maximum value − minimum value

EXAMPLE: The range

What is the range for the 5-year annualized total returns for five investment managers if the managers’ individual returns were 30%, 12%, 25%, 20%, and 23%?

Answer: range = 30 − 12 = 18%

The mean absolute deviation (MAD) is the average of the absolute values of the deviations of individual observations from the arithmetic mean:

MAD = Σ|X_i − X̄| / n

The computation of the MAD uses the absolute values of each deviation from the mean because the sum of the actual deviations from the arithmetic mean is zero.
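A minimal sketch of the range and the MAD, using the five manager returns from the range example. The variable names are illustrative only. It also checks that the signed deviations sum to zero, which is why absolute values are needed.

```python
# Manager returns (%) from the range example.
returns = [30, 12, 25, 20, 23]

# Range: largest value minus smallest value.
value_range = max(returns) - min(returns)

# Arithmetic mean of the returns.
mean_return = sum(returns) / len(returns)

# MAD: average absolute deviation from the arithmetic mean.
mad = sum(abs(r - mean_return) for r in returns) / len(returns)

# The signed deviations cancel out, so their sum carries no information.
signed_sum = sum(r - mean_return for r in returns)

print(value_range, mean_return, mad, signed_sum)
```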

The sample variance, s², is the measure of dispersion that applies when we are evaluating a sample of n observations from a population. The sample variance is calculated using the following formula:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

The denominator for s² is n − 1, one less than the sample size n. Based on the mathematical theory behind statistical procedures, the use of the entire number of sample observations, n, instead of n − 1 as the divisor in the computation of s², will systematically underestimate the population variance—particularly for small sample sizes. This systematic underestimation causes the sample variance to be a biased estimator of the population variance. Using n − 1 instead of n in the denominator, however, improves the statistical properties of s² as an estimator of the population variance.

Thus, the sample variance of 44.5(%²) can be interpreted to be an unbiased estimator of the population variance. Note that 44.5 “percent squared” is 0.00445, and you will get this value if you put the percentage returns in decimal form [e.g., (0.30 − 0.22)²].

A major problem with using variance is the difficulty of interpreting it. The computed variance, unlike the mean, is in terms of squared units of measurement. How does one interpret squared percentages, squared dollars, or squared yen? This problem is mitigated through the use of the standard deviation. The units of standard deviation are the same as the units of the data (e.g., percentage return, dollars, euros). The sample standard deviation is the square root of the sample variance. The sample standard deviation, s, is calculated as follows:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

EXAMPLE: Sample standard deviation

Compute the sample standard deviation based on the result of the preceding example.

Answer:

Because the sample variance for the preceding example was computed to be 44.5(%²), this is the sample standard deviation:

$$s = \sqrt{44.5} = 6.67\%$$

This means that on average, an individual return from the sample will deviate ±6.67% from the mean return of 22%. The sample standard deviation can be interpreted as an unbiased estimator of the population standard deviation.
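The 44.5 and 6.67% figures can be checked with a short sketch. It assumes the sample is the five manager returns used in the range example (30%, 12%, 25%, 20%, 23%), which have a mean of 22% and reproduce the variance quoted above.

```python
import math
import statistics

# Assumed sample: the five manager returns (%) from the range example.
returns = [30, 12, 25, 20, 23]

# statistics.variance divides the sum of squared deviations by n - 1.
sample_variance = statistics.variance(returns)

# statistics.pvariance divides by n, which gives a smaller number.
divide_by_n_variance = statistics.pvariance(returns)

# The sample standard deviation is the square root of the sample variance.
sample_std = math.sqrt(sample_variance)

# The same variance with returns in decimal form (0.30 rather than 30).
variance_decimal = statistics.variance([r / 100 for r in returns])

print(sample_variance, divide_by_n_variance, sample_std, variance_decimal)
```

Dividing by n gives 35.6 rather than 44.5, which shows the direction of the underestimation described earlier.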

A direct comparison between two or more measures of dispersion may be difficult. For instance, suppose you are comparing the annual returns distribution for retail stocks with a mean of 8% and an annual returns distribution for a real estate portfolio with a mean of 16%. A direct comparison between the dispersion of the two distributions is not meaningful because of the relatively large difference in their means. To make a meaningful comparison, a relative measure of dispersion must be used. Relative dispersion is the amount of variability in a distribution around a reference point or benchmark. Relative dispersion is commonly measured with the coefficient of variation (CV), which is computed as follows:

$$\text{CV} = \frac{s}{\bar{X}} = \frac{\text{standard deviation of } X}{\text{average value of } X}$$

CV measures the amount of dispersion in a distribution relative to the distribution’s mean. This is useful because it enables us to compare dispersion across different sets of data. In an investments setting, the CV is used to measure the risk (variability) per unit of expected return (mean). A lower CV is better.

EXAMPLE: Coefficient of variation

You have just been presented with a report that indicates that the mean monthly return on T-bills is 0.25% with a standard deviation of 0.36%, and the mean monthly return for the S&P 500 is 1.09% with a standard deviation of 7.30%. Your manager has asked you to compute the CV for these two investments and to interpret your results.

Answer:

CV (T-bills) = 0.36 / 0.25 = 1.44

CV (S&P 500) = 7.30 / 1.09 = 6.70

These results indicate that there is less dispersion (risk) per unit of monthly return for T-bills than for the S&P 500 (1.44 vs. 6.70).
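The two ratios can be reproduced with a small sketch. The helper name coefficient_of_variation is illustrative, not something defined in this reading.

```python
def coefficient_of_variation(std_dev, mean):
    """Return dispersion per unit of mean: standard deviation / mean."""
    return std_dev / mean

# Monthly figures (%) from the example.
cv_tbills = coefficient_of_variation(std_dev=0.36, mean=0.25)
cv_sp500 = coefficient_of_variation(std_dev=7.30, mean=1.09)

print(round(cv_tbills, 2), round(cv_sp500, 2))
```

Because the CV is a ratio of two quantities in the same units, it is unit free, which is what makes the comparison across assets possible.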

PROFESSOR’S NOTE

To remember the formula for CV, remember that the CV is a measure of variation, so standard deviation goes in the numerator. CV is variation per unit of return.

When we use variance or standard deviation as risk measures, we calculate risk based on outcomes both above and below the mean. In some situations, it may be more appropriate to consider only outcomes less than the mean (or some other specific value) in calculating a risk measure. In this case, we are measuring downside risk.

One measure of downside risk is target downside deviation, which is also known as target semideviation. Calculating target downside deviation is similar to calculating standard deviation, but in this case, we choose a target value against which to measure each outcome and only include deviations from the target value in our calculation if the outcomes are below that target.

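The formula for target downside deviation is stated as follows:

$$s_{\text{target}} = \sqrt{\frac{\sum_{\text{all } X_i < B} (X_i - B)^2}{n - 1}}, \quad \text{where } B = \text{target value}$$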

Note that the denominator remains the sample size n minus one, even though we are not using all of the observations in the numerator.

EXAMPLE: Target downside deviation

Calculate the target downside deviation based on the data in the preceding examples, for a target return equal to the mean (22%), and for a target return of 24%.
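One way to work this example is sketched below. It assumes the data are the five manager returns from the range example (mean 22%), squares only the shortfalls below the target, and keeps n − 1 in the denominator as described above. The function name is illustrative.

```python
import math

def target_downside_deviation(observations, target):
    """Square only the deviations below the target; divide by n - 1."""
    shortfalls = [(x - target) ** 2 for x in observations if x < target]
    return math.sqrt(sum(shortfalls) / (len(observations) - 1))

# Assumed data: the five manager returns (%) from the range example.
returns = [30, 12, 25, 20, 23]

tdd_at_mean = target_downside_deviation(returns, target=22)
tdd_at_24 = target_downside_deviation(returns, target=24)

print(round(tdd_at_mean, 2), round(tdd_at_24, 2))
```

With a 22% target, only the 12% and 20% returns fall short, giving about 5.10%. Raising the target to 24% adds the 23% return to the shortfalls and increases the result to about 6.34%.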

1. A dataset has 100 observations. Which of the following measures of central tendency will be calculated using a denominator of 100?

A. The winsorized mean, but not the trimmed mean.

B. Both the trimmed mean and the winsorized mean.

C. Neither the trimmed mean nor the winsorized mean.

2. XYZ Corp. Annual Stock Returns

What is the sample standard deviation?

A. 9.8%.

B. 72.4%.

C. 96.3%.

3. XYZ Corp. Annual Stock Returns

Assume an investor has a target return of 11% for XYZ stock. What is the stock’s target downside deviation?

A. 9.39%.

B. 12.10%.

C. 14.80%.

MODULE 3.2: SKEWNESS, KURTOSIS, AND CORRELATION

Video covering this content is available online.

LOS 3.c: Interpret and evaluate measures of skewness and kurtosis to address an investment problem.

A distribution is symmetrical if it is shaped identically on both sides of its mean.

Distributional symmetry implies that intervals of losses and gains will exhibit the same frequency. For example, a symmetrical distribution with a mean return of zero will have losses in the −6% to −4% interval as frequently as it will have gains in the +4% to +6% interval. The extent to which a returns distribution is symmetrical is important because the degree of symmetry tells analysts if deviations from the mean are more likely to be positive or negative.

Skewness, or skew, refers to the extent to which a distribution is not symmetrical. Nonsymmetrical distributions may be either positively or negatively skewed and result from the occurrence of outliers in the dataset. Outliers are observations extraordinarily far from the mean, either above or below:

A positively skewed distribution is characterized by outliers greater than the mean (in the upper region, or right tail). A positively skewed distribution is said to be skewed right because of its relatively long upper (right) tail.

A negatively skewed distribution has a disproportionately large number of outliers less than the mean that fall within its lower (left) tail. A negatively skewed distribution is said to be skewed left because of its long lower tail.

Skewness affects the location of the mean, median, and mode of a distribution:

For a symmetrical distribution, the mean, median, and mode are equal.

For a positively skewed, unimodal distribution, the mode is less than the median, which is less than the mean. The mean is affected by outliers; in a positively skewed distribution, there are large, positive outliers, which will tend to pull the mean upward, or more positive. An example of a positively skewed distribution is that of housing prices. Suppose you live in a neighborhood with 100 homes; 99 of them sell for $100,000, and one sells for $1,000,000. The median and the mode will be $100,000, but the mean will be $109,000. Hence, the mean has been pulled upward (to the right) by the existence of one home (outlier) in the neighborhood.

For a negatively skewed, unimodal distribution, the mean is less than the median, which is less than the mode. In this case, there are large, negative outliers that tend to pull the mean downward (to the left).
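The housing example can be checked directly. This sketch builds the 100 sale prices described above and computes the three measures of central tendency.

```python
import statistics

# 99 homes sell for $100,000 and one sells for $1,000,000.
prices = [100_000] * 99 + [1_000_000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
mode_price = statistics.mode(prices)

print(mean_price, median_price, mode_price)
```

The single outlier moves the mean to $109,000 while leaving the median and mode at $100,000, which is the pattern expected under positive skew.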

PROFESSOR’S NOTE

The key to remembering how measures of central tendency are affected by skewed data is to recognize that skew affects the mean more than the median and mode, and the mean is pulled in the direction of the skew. The relative location of the mean, median, and mode for different distribution shapes is shown in Figure 3.2. Note that the median is between the other two measures for positively or negatively skewed distributions.

Figure 3.2: Effect of Skewness on Mean, Median, and Mode

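Sample skewness is equal to the sum of the cubed deviations from the mean divided by the cubed standard deviation and by the number of observations. Sample skewness for large samples is approximated as follows:

$$\text{sample skewness} \approx \frac{1}{n} \, \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3}, \quad \text{where } s = \text{sample standard deviation}$$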

PROFESSOR’S NOTE

The LOS requires us to “interpret and evaluate” measures of skewness and kurtosis, but not to calculate them.

Note that the denominator is always positive, but that the numerator can be positive or negative depending on whether observations above the mean or observations below the mean tend to be farther from the mean, on average. When a distribution is right skewed, sample skewness is positive because the deviations above the mean are larger, on average. A left-skewed distribution has a negative sample skewness.

Dividing by standard deviation cubed standardizes the statistic and allows interpretation of the skewness measure. If relative skewness is equal to zero, the data are not skewed. Positive levels of relative skewness imply a positively skewed distribution, whereas negative values of relative skewness imply a negatively skewed distribution. Values of sample skewness in excess of 0.5 in absolute value are considered significant.
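Although the calculation is not required, a short sketch can make the sign of the statistic concrete. It follows the verbal definition above (sum of cubed deviations divided by n times the cubed standard deviation) and assumes s is the sample standard deviation. The three datasets are hypothetical.

```python
import statistics

def sample_skewness(data):
    """Large-sample approximation: sum of cubed deviations / (n * s^3)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return sum((x - mean) ** 3 for x in data) / (n * s ** 3)

# Hypothetical datasets chosen only to show the sign of the measure.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 15]   # one large positive outlier
left_skewed = [-x for x in right_skewed]      # mirror image of the above
symmetric = [1, 2, 3, 4, 5]

skew_right = sample_skewness(right_skewed)
skew_left = sample_skewness(left_skewed)
skew_symmetric = sample_skewness(symmetric)

print(skew_right, skew_left, skew_symmetric)
```

The large positive outlier produces a positive value, its mirror image produces the same value with a negative sign, and the symmetric dataset gives zero.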

Kurtosis is a measure of the degree to which a distribution is more or less peaked than a normal distribution. Leptokurtic describes a distribution that is more peaked than a normal distribution, whereas platykurtic refers to a distribution that is less peaked, or flatter than a normal one. A distribution is mesokurtic if it has the same kurtosis as a normal distribution.

As indicated in Figure 3.3, a leptokurtic return distribution will have more returns clustered around the mean and more returns with large deviations from the mean (fatter tails). Relative to a normal distribution, a leptokurtic distribution will have a greater percentage of small deviations from the mean and a greater percentage of extremely large deviations from the mean. This means that there is a relatively greater probability of an observed value being either close to the mean or far from the mean. Regarding an investment returns distribution, a greater likelihood of a large deviation from the mean return is often perceived as an increase in risk.

Figure 3.3: Kurtosis

A distribution is said to exhibit excess kurtosis if it has either more or less kurtosis than the normal distribution. The computed kurtosis for all normal distributions is three. Statisticians, however, sometimes report excess kurtosis, which is defined as kurtosis minus three. Thus, a normal distribution has excess kurtosis equal to zero, a leptokurtic distribution has excess kurtosis greater than zero, and platykurtic distributions will have excess kurtosis less than zero.

Kurtosis is critical in a risk management setting. Most research about the distribution of securities returns has shown that returns are not normally distributed. Actual securities returns tend to exhibit both skewness and kurtosis. Skewness and kurtosis are critical concepts for risk management because when securities returns are modeled using an assumed normal distribution, the predictions from the models will not take into account the potential for extremely large, negative outcomes. In fact, most risk managers put very little emphasis on the mean and standard deviation of a distribution and focus more on the distribution of returns in the tails of the distribution—that is where the risk is. In general, greater excess kurtosis and more negative skew in returns distributions indicate increased risk.

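Sample kurtosis for large samples is approximated using deviations raised to the fourth power:

$$\text{sample kurtosis} \approx \frac{1}{n} \, \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4}, \quad \text{where } s = \text{sample standard deviation}$$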
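A rough sketch of excess kurtosis follows, assuming the large-sample approximation (sum of fourth-power deviations divided by n times s to the fourth power, with s the sample standard deviation) and then subtracting three. Both datasets are hypothetical, and such small samples are used only to show the sign of the result.

```python
import statistics

def sample_excess_kurtosis(data):
    """Approximate kurtosis as sum of (x - mean)^4 / (n * s^4), minus 3."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    kurtosis = sum((x - mean) ** 4 for x in data) / (n * s ** 4)
    return kurtosis - 3

# Hypothetical data: many values near the mean plus two extreme values.
fat_tailed = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]
# Hypothetical data: values spread evenly, with no extreme observations.
flat = [-2, -1, 0, 1, 2]

excess_fat = sample_excess_kurtosis(fat_tailed)
excess_flat = sample_excess_kurtosis(flat)

print(excess_fat, excess_flat)
```

The first dataset comes out leptokurtic (excess kurtosis above zero) and the second platykurtic (excess kurtosis below zero).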

problem.

Scatter plots are a method for displaying the relationship between two variables. With one variable on the vertical axis and the other on the horizontal axis, their paired observations can each be plotted as a single point. For example, in Panel A of Figure 3.4, the point farthest to the upper right shows that when one of the variables (on the horizontal axis) equaled 9.2, the other variable (on the vertical axis) equaled 8.5.

The scatter plot in Panel A is typical of two variables that have no clear relationship. Panel B shows two variables that have a strong linear relationship, that is, a high correlation coefficient.

A key advantage of creating scatter plots is that they can reveal nonlinear relationships, which are not described by the correlation coefficient. Panel C illustrates such a relationship. Although the correlation coefficient for these two variables is close to zero, their scatter plot shows clearly that they are related in a predictable way.
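The point about nonlinear relationships can be illustrated numerically. In this sketch (hypothetical data, illustrative function name), one series is an exact linear function of x and another is an exact quadratic function of x. Both are perfectly predictable from x, but only the linear one shows up in the correlation.

```python
import math

def correlation(xs, ys):
    """Sample correlation: covariance / (std dev of x * std dev of y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # exact linear relationship
curved = [x ** 2 for x in xs]      # exact but nonlinear relationship

corr_linear = correlation(xs, linear)
corr_curved = correlation(xs, curved)

print(corr_linear, corr_curved)
```

The quadratic series has a correlation of zero with x even though x determines it completely, which is the kind of pattern a scatter plot would reveal.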

Figure 3.4: Scatter Plots

Covariance is a measure of how two variables move together. The calculation of the sample covariance is based on the following formula:

$$s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

In practice, the covariance is difficult to interpret. The value of covariance depends on the units of the variables. The covariance of daily price changes of two securities priced in yen will be much greater than their covariance if the securities are priced in dollars. Like the variance, the units of covariance are the square of the units used for the data.
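The dependence on units can be seen in a small sketch. The price changes and the exchange rate of 150 yen per dollar are hypothetical, chosen only to show how restating the same data in another currency rescales the covariance.

```python
def sample_covariance(xs, ys):
    """Sum of cross-products of deviations from the means, over n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical daily price changes in dollars for two securities.
usd_changes_a = [0.5, -0.2, 0.1, 0.4, -0.3]
usd_changes_b = [0.3, -0.1, 0.0, 0.5, -0.2]

# Hypothetical exchange rate used to restate the same changes in yen.
yen_per_usd = 150
yen_changes_a = [x * yen_per_usd for x in usd_changes_a]
yen_changes_b = [x * yen_per_usd for x in usd_changes_b]

cov_usd = sample_covariance(usd_changes_a, usd_changes_b)
cov_yen = sample_covariance(yen_changes_a, yen_changes_b)

print(cov_usd, cov_yen)
```

The yen covariance is larger by the square of the exchange rate even though the relationship between the two securities has not changed.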

Additionally, we cannot interpret the relative strength of the relationship between two variables. Knowing that the covariance of X and Y is 0.8756 tells us only that they tend to move together because the covariance is positive. A standardized measure of the linear relationship between two variables is called the correlation coefficient, or simply correlation. The correlation between two variables, X and Y, is calculated as follows:

$$\rho_{XY} = \frac{s_{XY}}{s_X \, s_Y}$$

where s_XY is the sample covariance of X and Y, and s_X and s_Y are their sample standard deviations.

The properties of the correlation of two random variables, X and Y, are summarized here:

Correlation measures the strength of the linear relationship between two random variables.

Correlation has no units.

The correlation ranges from −1 to +1. That is, −1 ≤ ρ_XY ≤ +1.

If ρ_XY = 1.0, the random variables have perfect positive correlation. This means that a movement in one random variable results in a proportional positive movement in the other relative to its mean.

If ρ_XY = −1.0, the random variables have perfect negative correlation. This means that a movement in one random variable results in an exact opposite proportional movement in the other relative to its mean.

If ρ_XY = 0, there is no linear relationship between the variables, indicating that prediction of Y cannot be made on the basis of X using linear methods.

EXAMPLE: Correlation

The variance of returns on Stock A is 0.0028, the variance of returns on Stock B is 0.0124, and their covariance of returns is 0.0058. Calculate and interpret the correlation of the returns for Stocks A and B.

Answer:

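First, it is necessary to convert the variances to standard deviations:

$$s_A = \sqrt{0.0028} = 0.0529 \qquad s_B = \sqrt{0.0124} = 0.1114$$

Now, the correlation between the returns of Stock A and Stock B can be computed as follows:

$$\rho_{AB} = \frac{0.0058}{0.0529 \times 0.1114} = 0.9842$$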

The fact that this value is close to +1 indicates that the linear relationship is not only positive, but also is very strong.
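The same two steps, written as a brief sketch using the inputs from the example:

```python
import math

# Inputs from the example.
var_a = 0.0028    # variance of returns on Stock A
var_b = 0.0124    # variance of returns on Stock B
cov_ab = 0.0058   # covariance of returns on A and B

# Step 1: convert the variances to standard deviations.
std_a = math.sqrt(var_a)
std_b = math.sqrt(var_b)

# Step 2: divide the covariance by the product of the standard deviations.
corr_ab = cov_ab / (std_a * std_b)

print(round(std_a, 4), round(std_b, 4), round(corr_ab, 4))
```

Small differences in the fourth decimal place can arise depending on whether the standard deviations are rounded before dividing.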

Care should be taken when drawing conclusions based on correlation. Causation is not implied just from significant correlation. Even if it were, which variable is causing change in the other is not revealed by correlation. It is more prudent to say that two variables exhibit positive (or negative) association, suggesting that the nature of any causal relationship is to be separately investigated or based on theory that can be subject to additional tests.

One question that can be investigated is the role of outliers (extreme values) in the correlation of two variables. If removing the outliers significantly reduces the calculated correlation, further inquiry is necessary into whether the outliers provide information or are caused by noise (randomness) in the data used.

Spurious correlation refers to correlation that is either the result of chance or present due to changes in both variables over time that are caused by their association with a third variable. For example, we can find instances where two variables that are both related to the inflation rate exhibit significant correlation, but for which causation in either direction is not present.

In his book Spurious Correlations,¹ Tyler Vigen presents the following examples. The correlation between the age of each year’s Miss America and the number of films Nicolas Cage appeared in that year is 87%. This seems a bit random. The correlation between U.S. spending on science, space, and technology and suicides by hanging, strangulation, and suffocation over the 1999–2009 period is 99.87%. Impressive correlation, but both variables increased in an approximately linear fashion over the period.

MODULE QUIZ 3.2

1. Which of the following is most accurate regarding a distribution of returns that has a mean greater than its median?

A. It is positively skewed.

B. It is a symmetric distribution.

C. It has positive excess kurtosis.

2. A distribution of returns that has a greater percentage of small deviations from the mean and a greater percentage of extremely large deviations from the mean compared with a normal distribution:

A. is positively skewed.

B. has positive excess kurtosis.

C. has negative excess kurtosis.

3. The correlation between two variables is +0.25. The most appropriate way to interpret this value is to say:

A. a scatter plot of the two variables is likely to show a strong linear relationship.

B. when one variable is above its mean, the other variable tends to be above its mean as well.

C. a change in one of the variables usually causes the other variable to change in the same direction.

KEY CONCEPTS

LOS 3.a

The arithmetic mean is the average of observations. The sample mean is the arithmetic mean of a sample:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

The median is the midpoint of a dataset when the data are arranged from largest to smallest.

The mode of a dataset is the value that occurs most frequently. The modal interval is a measure of mode for continuous data.

A trimmed mean omits outliers, and a winsorized mean replaces outliers with given values, reducing the effect of outliers on the mean in both cases.
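The difference between the two adjustments can be sketched as follows. The data and the choice to adjust one observation in each tail are hypothetical, and the function names are illustrative. The trimmed mean averages fewer observations, while the winsorized mean still averages all n.

```python
def trimmed_mean(data, k):
    """Drop the k smallest and k largest observations, then average."""
    kept = sorted(data)[k:len(data) - k]
    return sum(kept) / len(kept)

def winsorized_mean(data, k):
    """Replace the k smallest and k largest observations with the
    nearest remaining values, then average all n observations."""
    ordered = sorted(data)
    low, high = ordered[k], ordered[-k - 1]
    clipped = [min(max(x, low), high) for x in ordered]
    return sum(clipped) / len(clipped)

# Hypothetical returns (%) with one extreme value in each tail.
data = [-40, 2, 3, 4, 5, 6, 7, 8, 10, 60]

plain = sum(data) / len(data)      # uses all 10 values as given
trimmed = trimmed_mean(data, 1)    # averages the middle 8 values
winsorized = winsorized_mean(data, 1)  # averages 10 values after clipping

print(plain, trimmed, winsorized)
```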

Quantile is the general term for a value at or below which lies a stated proportion of the data in a distribution. Examples of quantiles include the following:

Quartile. The distribution is divided into quarters.

Quintile. The distribution is divided into fifths.

Decile. The distribution is divided into tenths.

Percentile. The distribution is divided into hundredths (percentages).

LOS 3.b

The range is the difference between the largest and smallest values in a dataset.

Mean absolute deviation (MAD) is the average of the absolute values of the deviations from the arithmetic mean:

$$\text{MAD} = \frac{\sum_{i=1}^{n} \left| X_i - \bar{X} \right|}{n}$$

Variance is defined as the mean of the squared deviations from the arithmetic mean. For a sample, the divisor is n − 1:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Standard deviation is the positive square root of the variance, and it is frequently used as a quantitative measure of risk.

The coefficient of variation (CV) for sample data, $\text{CV} = s / \bar{X}$, is the ratio of the standard deviation of the sample to its mean.

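Target downside deviation or semideviation is a measure of downside risk:

$$s_{\text{target}} = \sqrt{\frac{\sum_{\text{all } X_i < B} (X_i - B)^2}{n - 1}}, \quad \text{where } B = \text{target value}$$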

LOS 3.c

Skewness describes the degree to which a distribution is not symmetric about its mean. A right-skewed distribution has positive skewness. A left-skewed distribution has negative skewness.

For a positively skewed, unimodal distribution, the mean is greater than the median, which is greater than the mode. For a negatively skewed, unimodal distribution, the mean is less than the median, which is less than the mode.

Kurtosis measures the peakedness of a distribution and the probability of extreme outcomes (thickness of tails):

Excess kurtosis is measured relative to a normal distribution, which has a kurtosis of 3.

Positive values of excess kurtosis indicate a distribution that is leptokurtic (fat tails, more peaked), so the probability of extreme outcomes is greater than for a normal distribution.

Negative values of excess kurtosis indicate a platykurtic distribution (thin tails, less peaked).

LOS 3.d

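Correlation is a standardized measure of association between two random variables. It ranges in value from −1 to +1 and is equal to the covariance of the two variables divided by the product of their standard deviations:

$$\rho_{XY} = \frac{s_{XY}}{s_X \, s_Y}$$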

Scatter plots are useful for revealing nonlinear relationships that are not measured by correlation.

Correlation does not imply that changes in one variable cause changes in the other. Spurious correlation may result by chance, or from the relationships of two variables to a third variable.

ANSWER KEY FOR MODULE QUIZZES

Module Quiz 3.1

1. A The winsorized mean substitutes a value for some of the largest and smallest observations. The trimmed mean removes some of the largest and smallest observations. (LOS 3.a)

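2. A The sample mean is [22% + 5% + −7% + 11% + 2% + 11%] / 6 = 7.3%. The sample standard deviation is the square root of the sample variance:

$$s = \sqrt{\frac{(22 - 7.3)^2 + (5 - 7.3)^2 + (-7 - 7.3)^2 + (11 - 7.3)^2 + (2 - 7.3)^2 + (11 - 7.3)^2}{6 - 1}} = \sqrt{96.3} = 9.8\%$$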

(LOS 3.b)

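3. A Here are deviations from the target return: the returns below the 11% target are 5%, −7%, and 2%, with deviations of −6%, −18%, and −9%. The returns of 22%, 11%, and 11% are not below the target and contribute nothing to the numerator.

$$s_{\text{target}} = \sqrt{\frac{(-6)^2 + (-18)^2 + (-9)^2}{6 - 1}} = \sqrt{88.2} = 9.39\%$$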

(LOS 3.b)
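Answers 2 and 3 can be checked with a short sketch. It assumes the XYZ Corp. returns are the six values listed in answer 2 (22%, 5%, −7%, 11%, 2%, and 11%) and uses the 11% target from question 3.

```python
import math
import statistics

# XYZ Corp. annual returns (%) as listed in answer 2.
xyz_returns = [22, 5, -7, 11, 2, 11]

# Question 2: sample standard deviation (n - 1 divisor).
sample_std = statistics.stdev(xyz_returns)

# Question 3: square only the shortfalls below the target,
# but still divide by n - 1.
target = 11
shortfall_sq = sum((r - target) ** 2 for r in xyz_returns if r < target)
target_downside_dev = math.sqrt(shortfall_sq / (len(xyz_returns) - 1))

print(round(sample_std, 1), round(target_downside_dev, 2))
```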

Module Quiz 3.2

1. A A distribution with a mean greater than its median is positively skewed, or skewed to the right. The skew pulls the mean. Kurtosis deals with the overall shape of a distribution, not its skewness. (LOS 3.c)

2. B A distribution that has a greater percentage of small deviations from the mean and a greater percentage of extremely large deviations from the mean will be leptokurtic and will exhibit excess kurtosis (positive). The distribution will be more peaked and have fatter tails than a normal distribution. (LOS 3.c)

3. B A correlation of +0.25 indicates a positive linear relationship between the variables—one tends to be above its mean when the other is above its mean. The value 0.25 indicates that the linear relationship is not particularly strong. Correlation does not imply causation. (LOS 3.d)

¹ “Spurious Correlations,” Tyler Vigen, www.tylervigen.com

READING 4

PROBABILITY TREES AND CONDITIONAL EXPECTATIONS

MODULE 4.1: PROBABILITY MODELS, EXPECTED VALUES, AND BAYES’ FORMULA

Video covering this content is available online.

LOS 4.a: Calculate expected values, variances, and standard deviations and demonstrate their application to investment problems.

The expected value of a random variable is the weighted average of the possible outcomes for the variable. The mathematical representation for the expected value of random variable X, that can take on any of the values from x₁ to x_n, is:

$$E(X) = \sum_{i=1}^{n} P(x_i)\, x_i = P(x_1)x_1 + P(x_2)x_2 + \dots + P(x_n)x_n$$

EXAMPLE: Expected earnings per share

The probability distribution of earnings per share (EPS) for Ron’s Stores is given in the following figure. Calculate the expected EPS.

EPS Probability Distribution

Answer:

The expected EPS is simply a weighted average of each possible EPS, where the weights are the probabilities of each possible outcome:

Variance and standard deviation measure the dispersion of a random variable around its expected value, sometimes referred to as the volatility of a random variable. Variance (from a probability model) can be calculated as the probability-weighted sum of the squared deviations from the mean (or expected value). The standard deviation is the positive square root of the variance. The following example illustrates the calculations for a probability model of possible returns.

Note that in a previous reading, we estimated the standard deviation of a distribution from sample data, rather than from a probability model of returns. For the sample standard deviation, we divided the sum of the squared deviations from the mean by n − 1, where n was the size of the sample. Here, we have no “n” because we have no observations; a probability model is forward-looking. We use the probability weights instead, as they describe the entire distribution of outcomes.
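A minimal sketch of these calculations for a probability model follows. The three-outcome return distribution is hypothetical and is not the one used in the examples of this reading. Note that the weights are probabilities and that no n − 1 divisor appears.

```python
import math

# Hypothetical probability model of returns (decimal form).
probabilities = [0.25, 0.50, 0.25]
outcomes = [-0.05, 0.10, 0.25]

# Expected value: probability-weighted average of the outcomes.
expected_return = sum(p * x for p, x in zip(probabilities, outcomes))

# Variance: probability-weighted sum of squared deviations from E(X).
variance = sum(
    p * (x - expected_return) ** 2 for p, x in zip(probabilities, outcomes)
)

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

print(expected_return, variance, std_dev)
```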

LOS 4.b: Formulate an investment problem as a probability tree and explain the use of conditional expectations in investment application.

You may wonder where the returns and probabilities used in calculating expected values come from. A general framework, called a probability tree, is used to show the probabilities of various outcomes. In Figure 4.1, we have shown estimates of EPS for four different events: (1) a good economy and relatively good results at the company, (2) a good economy and relatively poor results at the company, (3) a poor economy and relatively good results at the company, and (4) a poor economy and relatively poor results at the company. Using the rules of probability, we can calculate the probabilities of each of the four EPS outcomes shown in the boxes on the right-hand side of the probability tree.

The expected EPS of $1.51 is simply calculated as follows:

Note that the probabilities of the four possible outcomes sum to 1.

Figure 4.1: A Probability Tree

Expected values or expected returns can be calculated using conditional probabilities. As the name implies, conditional expected values are contingent on the outcome of some other event. An analyst would use a conditional expected value to revise his expectations when new information arrives.

Consider the effect a tariff on steel imports might have on the returns of a domestic steel producer’s stock in the previous example. The stock’s conditional expected return, given that the government imposes the tariff, will be higher than the conditional expected return if the tariff is not imposed.
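The mechanics of a probability tree and of conditional expected values can be sketched in code. Figure 4.1 is not reproduced here, so every probability and EPS value below is an assumption, chosen so that the unconditional expected EPS comes out to the $1.51 quoted above. The actual figure may use different inputs.

```python
# Assumed tree (not taken from Figure 4.1): each economic state maps to
# (probability of the state, [(conditional probability, EPS), ...]).
tree = {
    "good economy": (0.60, [(0.30, 1.80), (0.70, 1.70)]),
    "poor economy": (0.40, [(0.60, 1.30), (0.40, 1.00)]),
}

joint = []                    # (joint probability, EPS) for each end node
conditional_expectation = {}  # expected EPS given each economic state

for state, (p_state, branches) in tree.items():
    # Conditional expected EPS uses only the conditional probabilities.
    conditional_expectation[state] = sum(p * eps for p, eps in branches)
    # Joint probability = P(state) * P(company result | state).
    joint.extend((p_state * p, eps) for p, eps in branches)

# Unconditional expected EPS: weight each end node by its joint probability.
expected_eps = sum(p * eps for p, eps in joint)
total_probability = sum(p for p, _ in joint)

print(conditional_expectation, expected_eps, total_probability)
```

Weighting the two conditional expected values by the probabilities of the economic states gives the same unconditional expected EPS, so an analyst who learns which state has occurred can simply switch to the corresponding conditional value.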

LOS 4.c: Calculate and interpret an updated probability in an investment setting using Bayes’ formula.

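Bayes’ formula is used to update a given set of prior probabilities for a given event in response to the arrival of new information. The rule for updating the prior probability of an event A, given new information B, is as follows:

$$P(A \mid B) = \frac{P(B \mid A)}{P(B)} \times P(A)$$

We can derive Bayes’ formula using the multiplication rule and noting that P(AB) = P(BA):

$$P(B \mid A) \times P(A) = P(BA) \quad \text{and} \quad P(A \mid B) \times P(B) = P(AB)$$

Because P(BA) = P(AB), we can write P(B | A) × P(A) = P(A | B) × P(B), and P(A | B) = P(B | A) × P(A) / P(B) equals P(BA) / P(B), the joint probability of A and B divided by the unconditional probability of B.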

The following example illustrates the use of Bayes’ formula. Note that A is outperform and A^C is underperform, P(BA) is (outperform + gains), P(A^CB) is (underperform + gains), and the unconditional probability P(B) is P(AB) + P(A^CB), by the total probability rule.

We sum the probability of stock gains in both states (outperform and underperform) to get 42% + 8% = 50%. Given that the stock has gains and using Bayes’ formula, the probability that the economy has outperformed is P(AB) / P(B) = 42% / 50% = 84%.
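The update can be written out as a brief sketch using the two joint probabilities quoted above (42% for outperform with gains and 8% for underperform with gains). The variable names are illustrative.

```python
# Joint probabilities quoted in the text.
p_outperform_and_gains = 0.42     # P(AB)
p_underperform_and_gains = 0.08   # P(A^C B)

# Total probability rule: P(B) = P(AB) + P(A^C B).
p_gains = p_outperform_and_gains + p_underperform_and_gains

# Bayes' formula: P(A | B) = P(AB) / P(B).
p_outperform_given_gains = p_outperform_and_gains / p_gains

print(p_gains, p_outperform_given_gains)
```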

MODULE QUIZ 4.1

1. Given the conditional probabilities in the following table and the unconditional probabilities P(Y = 1) = 0.3 and P(Y = 2) = 0.7, what is the expected value of X?

A. 5.0.

B. 5.3.

C. 5.7.

2. An analyst believes that Davies Company has a 40% probability of earning more than $2 per share. She estimates that the probability that Davies Company’s credit rating will be upgraded is 70% if its earnings per share (EPS) are greater than $2, and 20% if its EPS are $2 or less. Given the information that Davies Company’s credit rating has been upgraded, what is the updated probability that its EPS are greater than $2?

A. 50%.

B. 60%.

C. 70%.

KEY CONCEPTS

LOS 4.a

The expected value of a random variable is the weighted average of its possible outcomes:

$$E(X) = \sum_{i=1}^{n} P(x_i)\, x_i = P(x_1)x_1 + P(x_2)x_2 + \dots + P(x_n)x_n$$

Variance can be calculated as the probability-weighted sum of the squared deviations from the mean or expected value. The standard deviation is the positive square root of the variance.

LOS 4.b

A probability tree shows the probabilities of two events and the conditional probabilities of two subsequent events:

Conditional expected values depend on the outcome of some other event. Forecasts of expected values for a stock’s return, earnings, and dividends can be refined, using conditional expected values, when new information arrives that affects the expected outcome.

LOS 4.c

Bayes’ formula for updating probabilities based on the occurrence of an event O is as follows:

Equivalently, based on the following tree diagram,

ANSWER KEY FOR MODULE QUIZZES

Module Quiz 4.1

1. B

(LOS 4.a)

2. C This is an application of Bayes’ formula. As the following tree diagram shows, the updated probability that EPS are greater than $2 is (0.40 × 0.70) / [(0.40 × 0.70) + (0.60 × 0.20)] = 0.28 / 0.40 = 70%.
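The Davies Company update can be sketched as a small function that starts from a prior and two conditional probabilities rather than from joint probabilities. The function and parameter names are illustrative only.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(event | information) from a prior and two likelihoods."""
    # Joint probability of the event and the new information.
    joint_true = prior * likelihood_if_true
    # Joint probability of the complement and the new information.
    joint_false = (1 - prior) * likelihood_if_false
    # Divide by the unconditional probability of the new information.
    return joint_true / (joint_true + joint_false)

# Inputs from question 2: P(EPS > $2) = 40%,
# P(upgrade | EPS > $2) = 70%, P(upgrade | EPS <= $2) = 20%.
updated = bayes_update(
    prior=0.40, likelihood_if_true=0.70, likelihood_if_false=0.20
)

print(updated)
```

The unconditional probability of an upgrade is 0.28 + 0.12 = 0.40, so the upgrade raises the probability that EPS exceed $2 from 40% to 70%.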
