# Correlation Coefficient

28th October, 2011 - Posted in Two-Asset Binary Options

#### An Introduction to the Correlation Coefficient, Rho

When options concern two assets, S1 and S2, the relationship between the prices of the two assets needs to be considered. This relationship is measured by the correlation coefficient, rho (ρ).

The correlation coefficient ranges from -1 to +1 inclusive and measures the linear dependence between the two assets' prices. The most commonly used method of establishing the correlation coefficient is the Pearson Product-Moment Correlation Coefficient (PPMCC).

The following section provides an overview explaining what Rho actually means, how to calculate it, and in what circumstances it works. As ever, assumptions are made in order to create a mathematical model with which to work and, as ever, various elements of the model are assumed constant when of course they never can be.

#### Rho

The standard notation for the PPMCC is r for a sample correlation coefficient and ρ for the population correlation coefficient. Since r already denotes interest rates, the section on two-asset binary options uses ρ to represent the sample PPMCC.

Rho provides a number that measures by how much a scatter of points deviates from a line of best fit through those points. If the deviation is zero, i.e. the points all lie on the straight line of best fit, then Rho equals ±1; in such a case, if the gradient of the line of best fit is positive then Rho would be 1, while if the gradient were negative then Rho would equal -1. If the deviation is such that there appears no relationship between the two variables then Rho = 0.

ρ ≠ the gradient of the line of best fit

Figure 1 – Correlation Coefficient Scatter Diagram

Figure 1 illustrates three scattergrams with the first and third showing relatively high negative and positive correlation. β is the gradient of the line of best fit.

The third scattergram has a Rho of 0.99, yet over that last 1% the points would still have to move considerably for all of them to lie on the line of best fit. This feature is analysed further later on in this chapter.

Figure 2 – Correlation Coefficient Scatter Diagram

Figure 2 outlines the fact that β and ρ are fundamentally measuring different variables. The middle scattergram reflects this best with near vertical and horizontal lines of best fit as ρ→0.

Whether the line of best fit is based on S2 = α + βS1 or S1 = α + βS2, the Rho would be the same.
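This symmetry can be checked numerically. The following is a minimal sketch in plain Python, using made-up prices (not the data behind the figures) and hypothetical helper names `gradient` and `rho`. It assumes the gradient is the least squares gradient, i.e. the summed cross-deviations divided by the summed squared deviations of the explanatory variable.

```python
from math import sqrt

def _dev(values):
    """Deviations of each value from the mean of the list."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def gradient(x, y):
    """Least squares gradient of the line y = alpha + beta * x."""
    dx, dy = _dev(x), _dev(y)
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

def rho(x, y):
    """Pearson correlation coefficient of x and y."""
    dx, dy = _dev(x), _dev(y)
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / (sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy)))

# Illustrative prices only (assumed for this sketch).
s1 = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
s2 = [50.0, 50.5, 51.5, 52.0, 54.0, 53.0]

beta_21 = gradient(s1, s2)  # S2 regressed on S1
beta_12 = gradient(s2, s1)  # S1 regressed on S2

print(beta_21, beta_12)          # the two gradients differ
print(rho(s1, s2), rho(s2, s1))  # Rho is the same either way
```

The product of the two gradients equals Rho squared, which is one way to see that the two regression lines coincide only when Rho = ±1.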

#### Least Squares Regression Analysis

The above scattergrams each consist of a series of fifty prices of two assets with a solid, straight line superimposed. This line takes the simple form:

S2 = α + βS1

where:

S1 = Price of Asset 1

S2 = Price of Asset 2

α = Constant

β = Regression Coefficient (Gradient)

The simple tenet is that if one knows S1, α and β then one can establish the price of S2. The concept involves calculating α and β so that the sum of the squared vertical distances between the points on the scatter graph and the regression line is minimised.

The regression coefficient, β, delineates the gradient of the regression line and has the formula:

$\beta = \frac{COV(x,y)}{VAR(x)} = \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}$

where $\bar{x}$ and $\bar{y}$ are the means of x and y respectively.

Example:

The prices of Assets 1 and 2 are displayed for 5 time periods:

| Period | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Asset S1 | 150.00 | 154.72 | 159.48 | 153.11 | 156.45 |
| Asset S2 | 200.00 | 196.49 | 196.02 | 194.45 | 191.77 |

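The means of the prices of S1 and S2 are 154.752 and 195.746 respectively. The COV and VAR figures in this example are the summed deviations; the common divisor n is omitted because it cancels in the ratio.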

$COV(S_{1}, S_{2}) = (150.00-154.752)(200.00-195.746)+(154.72-154.752)(196.49-195.746)+\dots+(156.45-154.752)(191.77-195.746) = -23.5666$

$VAR(S_{1}) = (150.00-154.752)^{2}+(154.72-154.752)^{2}+\dots+(156.45-154.752)^{2} = 50.51588$

$\beta = \frac{-23.5666}{50.51588} = -0.46652$

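Since the regression line passes through the point of means, $\alpha = \bar{S}_{2} - \beta\bar{S}_{1}$:

$\alpha = 195.746-(-0.46652)(154.752) = 267.9406$

(evaluated with the unrounded value of β).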

Hence the line of regression is:

$S_{2} = 267.9406+(-0.46652)S_{1}$
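The arithmetic above can be reproduced with a short script. This is a minimal sketch in plain Python; the variable names are assumptions made for illustration, and only the five prices come from the example.

```python
# Prices from the example above.
s1 = [150.00, 154.72, 159.48, 153.11, 156.45]
s2 = [200.00, 196.49, 196.02, 194.45, 191.77]

n = len(s1)
mean_s1 = sum(s1) / n
mean_s2 = sum(s2) / n

# Summed cross-deviations and summed squared deviations of S1.
# The common divisor n is omitted because it cancels in beta.
cov_sum = sum((a - mean_s1) * (b - mean_s2) for a, b in zip(s1, s2))
var_sum_s1 = sum((a - mean_s1) ** 2 for a in s1)

beta = cov_sum / var_sum_s1
alpha = mean_s2 - beta * mean_s1  # the line passes through the two means

print(f"beta  = {beta:.5f}")
print(f"alpha = {alpha:.4f}")
```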

#### Evaluating Rho

The formula for Rho is:

$\rho = \frac{COV(x, y)}{\sigma_{1}\sigma_{2}} = \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\sqrt{\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}}}$

where σ1 and σ2 are the standard deviations of x and y respectively.

Example:

Following on from the above example, since the standard deviation is just the square root of the variance the only variable further required is:

$VAR(S_{2}) = (200.00-195.746)^{2}+(196.49-195.746)^{2}+\dots+(191.77-195.746)^{2} = 36.21332$

$\rho = \frac{-23.5666}{\sqrt{50.51588}\sqrt{36.21332}} = -0.551$
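As a check on the figure above, Rho can be computed directly from the five pairs of prices. This is a minimal sketch in plain Python; the variable names are assumptions made for illustration.

```python
from math import sqrt

# Prices from the example above.
s1 = [150.00, 154.72, 159.48, 153.11, 156.45]
s2 = [200.00, 196.49, 196.02, 194.45, 191.77]

mean_s1 = sum(s1) / len(s1)
mean_s2 = sum(s2) / len(s2)

# Summed deviations; the divisor n cancels between numerator and denominator.
cov_sum = sum((a - mean_s1) * (b - mean_s2) for a, b in zip(s1, s2))
var_sum_s1 = sum((a - mean_s1) ** 2 for a in s1)
var_sum_s2 = sum((b - mean_s2) ** 2 for b in s2)

rho = cov_sum / (sqrt(var_sum_s1) * sqrt(var_sum_s2))

print(f"rho = {rho:.3f}")
```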

N.B. The equations for β and ρ are similar, the difference being in the denominator. This difference ensures that the correlation coefficient, ρ, is not the gradient of the regression line, β, passing through the scatter diagram.
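The link between the two can be made explicit. Dividing the formula for ρ by the formula for β leaves only the ratio of the two standard deviations, so, as a worked check using the figures from the examples above:

$\rho = \beta\,\frac{\sigma_{1}}{\sigma_{2}} = -0.46652\times\frac{\sqrt{50.51588}}{\sqrt{36.21332}} = -0.46652\times 1.18108 = -0.551$

Hence ρ equals β only when the two standard deviations happen to be equal.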

#### Assumptions

The following assumptions provide the basis to Pearson’s rho:

a)      The variables must be roughly normally distributed.

In general most financial models propound the lognormal distribution as the underlying distribution, and the analysis of two-asset binary options in this section is compatible with that approach. Therefore, in evaluating the appropriateness of Pearson’s Rho, there is no more reason to question the normality assumption here than in any other form of financial modelling.

b)      A linear relationship needs to exist between the two assets.

This, on the face of it, constitutes a ‘chicken or egg’ argument, since Pearson’s Rho may itself be assigned the role of arbiter as to whether a linear relationship exists or not. Figure 1 showed three examples of Rho, two of which visibly illustrate a linear relationship while the middle example visibly does not: it is hardly a surprise that the linear relationships possess Rhos of -0.85 and 0.99 and the non-linear example has a Rho of 0.03. The problem arises where Rho is significant yet the scattergram clearly depicts a non-linear relationship. This state is shown in Figure 3 where, incidentally, ρ and β have opposite signs.

Figure 3 – Correlation Coefficient Scatter Diagram

All three scattergrams contain non-linear data, yet the absolute value of Rho for the first two is as high as 0.58 and 0.62. When using Rho to evaluate the asset price relationship, this ability of non-linear data to generate meaningfully high levels of Rho is a problem. A visual inspection is required to establish whether there is a linear relationship between the asset prices prior to evaluating Rho.
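The point can be illustrated with a deliberately artificial case. The sketch below uses made-up data (not the data behind Figure 3) and a hypothetical helper `rho`. A purely quadratic relationship produces a very high Rho when x is restricted to positive values, and a Rho of zero when x is symmetric about zero, even though y is fully determined by x in both cases.

```python
from math import sqrt

def rho(x, y):
    """Pearson correlation coefficient of x and y."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sqrt(sxx) * sqrt(syy))

# Assumed data: y = x squared in both cases, so the relationship is non-linear.
x_pos = list(range(10))       # 0, 1, ..., 9
y_pos = [v ** 2 for v in x_pos]

x_sym = list(range(-4, 5))    # -4, ..., 4
y_sym = [v ** 2 for v in x_sym]

print(f"one-sided quadratic: rho = {rho(x_pos, y_pos):.2f}")
print(f"symmetric quadratic: rho = {rho(x_sym, y_sym):.2f}")
```

Neither number says anything about whether a straight line is an appropriate description of the data, which is why the scattergram needs to be looked at first.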

c)      There must not be a preponderance of outliers.

Outliers can have a significant effect on Rho, in general markedly reducing it. But the outliers reflect the tails of the distribution. Black swans exist, outliers exist; removing them distorts the data and presents a fantasy world where October 19th 1987 (when stock markets around the world collapsed) never happened, ditto October 2008, a world where oil didn’t lurch \$6 in a day, etc. Certainly, including outliers will, the majority of the time, underestimate the day-to-day correlation of two assets, but conversely they can also engender a discipline and counter the complacency that may exist should they be removed.

Figure 4 – Correlation Coefficient Scatter Diagram

Figure 4 illustrates the effect that outliers can have on Rho. The first scattergram shows ten points with two outliers circled and the points have a rho of just 0.26. The second scattergram has eliminated the two outliers and rho has now jumped to 0.76.
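The same effect can be reproduced with a small made-up data set. The sketch below is illustrative only: the ten points are assumptions, not the points plotted in Figure 4, and `rho` is a hypothetical helper.

```python
from math import sqrt

def rho(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sqrt(sxx) * sqrt(syy))

# Eight points lying close to a straight line (assumed data).
core = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 4.2),
        (5, 4.8), (6, 6.1), (7, 7.0), (8, 8.1)]

# Two points well away from that line (assumed data).
outliers = [(2, 8.0), (8, 1.5)]

rho_with = rho(core + outliers)
rho_without = rho(core)

print(f"with outliers:    rho = {rho_with:.2f}")
print(f"without outliers: rho = {rho_without:.2f}")
```

Two points out of ten are enough to pull Rho from close to 1 down to a level that would normally be read as a weak relationship.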

d)     The data exhibits homoscedasticity.

Finally, possibly the most over-optimistic assumption is that homoscedasticity exists.

Figure 5 – Homoscedasticity & Heteroscedasticity

Homoscedasticity describes a distribution whose variance remains constant over the length of the linear relationship. Heteroscedasticity is the situation where the variance changes. Clearly when trading options, if the variance were constant then the options market-maker would have all vega risk removed from their options portfolio, which in itself would generate increasingly competitive quotes from the market-maker. Obviously in the real world the volatility of an asset, or pair of assets, is itself volatile, so yet again this assumption has to be recognised as hardly reflective of the real marketplace.
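One crude way to look for heteroscedasticity is to fit the regression line and compare the size of the residuals in the lower and upper halves of the S1 range. The sketch below is an informal diagnostic under assumed, made-up data, not a formal statistical test; `fit_line`, `mean_square` and `residual_ratio` are hypothetical helper names.

```python
def fit_line(x, y):
    """Least squares intercept and gradient for y = alpha + beta * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    return my - beta * mx, beta

def mean_square(values):
    """Average of the squared values."""
    return sum(v * v for v in values) / len(values)

def residual_ratio(x, y):
    """Mean squared residual for the upper half of x over the lower half."""
    alpha, beta = fit_line(x, y)
    pairs = sorted(zip(x, y))  # order the points by x
    residuals = [b - (alpha + beta * a) for a, b in pairs]
    half = len(residuals) // 2
    return mean_square(residuals[half:]) / mean_square(residuals[:half])

# Assumed data: a line of gradient 2 with alternating positive/negative noise.
x = [float(i) for i in range(1, 21)]
sign = [1 if i % 2 == 0 else -1 for i in range(1, 21)]

y_const = [2 * a + 0.5 * s for a, s in zip(x, sign)]     # noise of constant size
y_grow = [2 * a + 0.1 * a * s for a, s in zip(x, sign)]  # noise grows with x

print(f"constant noise: ratio = {residual_ratio(x, y_const):.2f}")
print(f"growing noise:  ratio = {residual_ratio(x, y_grow):.2f}")
```

A ratio close to 1 is consistent with homoscedasticity; a ratio well away from 1 suggests the variance changes along the line.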

#### Summary

Rho is not the most robust of measures when assessing the relationship between the price movements of two assets. Linearity, outliers and heteroscedasticity can all have distorting effects on the representation of rho. Nevertheless, if these features of a distribution are not show-stoppers then the rho number can be used with confidence when pricing and risk managing two-asset binary options.