What is Covariance?


By gobrain

Jun 14th, 2024

Covariance is a statistical measure of how two variables change together. It assesses whether large values of one variable tend to occur with large values of the other (and small values with small values), or whether large values of one tend to occur with small values of the other.

Here are some key points about covariance:

  • It measures the direction of the linear relationship between two variables.
  • A positive covariance indicates that the variables tend to move in the same direction (i.e., high values of one with high values of the other, and low values of one with low values of the other).
  • A negative covariance indicates that the variables tend to move in opposite directions (i.e., high values of one with low values of the other, and vice versa).
  • Covariance is measured in units that are the product of the units of the two variables it's comparing.

How To Calculate Covariance

  • Using the population formula: This is ideal if you have data for the entire population you're interested in.
Cov(x, y) = Σ [(xi - μx) * (yi - μy)] / N
  • Cov(x, y) is the covariance between variables x and y
  • Σ (sigma) represents the sum over all data points
  • xi is the value of the i-th data point for variable x
  • μx is the mean of variable x (average of all x values)
  • yi is the value of the i-th data point for variable y
  • μy is the mean of variable y (average of all y values)
  • N is the total number of data points in the population
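
As a minimal sketch of the population formula above, the following plain-Python function sums the products of deviations and divides by N. The function name `population_covariance` and the data values are illustrative assumptions, not part of any library.

```python
def population_covariance(xs, ys):
    """Population covariance: sum of products of deviations, divided by N."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical data treated as the entire population
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

cov = population_covariance(x, y)
print(cov)  # 4.0
```

The result is positive because y rises whenever x rises.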

  • Using the sample formula: This is more commonly used because obtaining data for the entire population can be impractical. It's used when you only have data for a sample of the population.

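The formula is similar to the population formula, but it uses the sample means and divides by n - 1 instead of N. Dividing by n - 1 (known as Bessel's correction) compensates for measuring deviations from the sample means rather than the true population means, which would otherwise bias the estimate toward zero: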

Cov(x, y) = Σ [(xi - x̄) * (yi - ȳ)] / (n - 1)
  • Cov(x, y) is the covariance between variables x and y
  • Σ (sigma) represents the sum over all data points in the sample
  • xi is the value of the i-th data point for variable x
  • x̄ is the mean of variable x in the sample (average of all x values in the sample)
  • yi is the value of the i-th data point for variable y
  • ȳ is the mean of variable y in the sample (average of all y values in the sample)
  • n is the number of data points in the sample

Both formulas average the products of each pair's deviations from the respective means. A positive result indicates positive covariance, a negative result indicates negative covariance, and a result of zero indicates no linear relationship.
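
The sample formula can be sketched in plain Python as follows. The function name `sample_covariance` and the data are assumptions made for illustration. The only difference from a population calculation is the n - 1 divisor.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data treated as a sample drawn from a larger population
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

cov = sample_covariance(x, y)
print(cov)  # 5.0
```

For the same five points, dividing by N would give 4.0, so the sample estimate is slightly larger in magnitude. The gap shrinks as n grows.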

Covariance vs Correlation

While covariance tells you the direction of the relationship, its magnitude depends on the units of the variables, so it doesn't tell you the strength of that relationship. That's where the correlation coefficient comes in: it is covariance divided by the product of the two variables' standard deviations.

Similarities:

  • Measure linear relationship: They assess how two variables tend to change together in a linear fashion.
  • Zero indicates no linear relationship: If the covariance or correlation is zero, it signifies no linear association between the variables. It does not rule out a nonlinear relationship.
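
To illustrate why zero only rules out a linear association, here is a small sketch in which y is fully determined by x (y = x squared), yet the covariance is exactly zero. The helper name and the data are assumptions chosen for the example.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# x is symmetric around zero, and y depends on x through a nonlinear rule
x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]  # [4, 1, 0, 1, 4]

cov = sample_covariance(x, y)
print(cov)  # 0.0
```

The positive and negative products of deviations cancel out, so a zero covariance here hides a perfectly predictable relationship.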

Differences:

  • Strength vs. Direction:

    • Covariance: Indicates the direction of the relationship (positive or negative) but not its strength.
      • Positive covariance: Variables tend to move in the same direction (high values together, low values together).
      • Negative covariance: Variables tend to move in opposite directions (high value of one with low value of the other).
    • Correlation: Measures both the direction and strength of the linear relationship. It's a scaled version of covariance, ranging from -1 to +1.
      • +1: Perfect positive linear relationship (as one increases, the other increases proportionally).
      • -1: Perfect negative linear relationship (as one increases, the other decreases proportionally).
      • Values between -1 and +1: The closer the absolute value is to 1, the stronger the linear relationship.
  • Units:

    • Covariance: Measured in units that are the product of the units of the two variables being compared.
    • Correlation: Unitless (scaled between -1 and +1), making it easier to compare relationships between different data sets.
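
The unit difference can be sketched with a short example: changing heights from centimetres to metres rescales the covariance by a factor of 100 but leaves the correlation unchanged. The function names and the height and weight values are hypothetical, and the correlation here is the Pearson coefficient computed from sample statistics.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical measurements
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 60, 65, 80]
heights_m = [h / 100 for h in heights_cm]

cov_cm = sample_covariance(heights_cm, weights_kg)  # units: cm * kg
cov_m = sample_covariance(heights_m, weights_kg)    # units: m * kg
r_cm = correlation(heights_cm, weights_kg)          # unitless
r_m = correlation(heights_m, weights_kg)            # unitless

print(cov_cm, cov_m)  # differ by a factor of 100
print(r_cm, r_m)      # agree up to floating-point rounding
```

This is why correlation is the more convenient measure when comparing relationships across data sets recorded in different units.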

Conclusion

Finding the relationship between two variables is important in statistical analysis. In this article, we covered covariance and how it differs from correlation.

Thank you for reading.