Overview of Statistics Formulas

Mean

  • Definition: The mean is the average of a set of numbers. It is calculated by summing all the values and dividing by the number of values.
  • Formula: $$\bar{x} = \frac{\sum x_i}{n}$$, where $$x_i$$ are the data points and $$n$$ is the number of data points[1][3].
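
As a minimal sketch of the formula above, the mean can be computed directly from the sum and the count. The function name `mean` and the sample data are illustrative assumptions, not taken from the cited sources; Python's standard `statistics.fmean` is shown alongside for comparison.

```python
from statistics import fmean

def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data only
print(mean(data))   # 5.0  (40 / 8)
print(fmean(data))  # 5.0  (standard-library equivalent)
```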

Median

  • Definition: The median is the middle value in a data set when the numbers are arranged in order. If there is an even number of observations, the median is the average of the two middle numbers.
  • Calculation: Arrange data in increasing order and find the middle value[3].
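
The two cases in the definition (odd and even number of observations) can be sketched as follows. The function name `median` and the inputs are illustrative assumptions.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle
    values when the number of observations is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))     # 3    (sorted: 1, 3, 7)
print(median([7, 1, 3, 5]))  # 4.0  (average of 3 and 5)
```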

Range

  • Definition: The range is the difference between the highest and lowest values in a data set.
  • Formula: $$\text{Range} = \text{Maximum value} - \text{Minimum value}$$[2][4].
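
A short sketch of the range calculation, using a hypothetical helper named `value_range` (chosen to avoid shadowing Python's built-in `range`) and illustrative data:

```python
def value_range(values):
    """Difference between the largest and smallest value."""
    if not values:
        raise ValueError("range requires at least one value")
    return max(values) - min(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data only
print(value_range(data))  # 7  (9 - 2)
```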

Variance

  • Definition: Variance measures how spread out the values are around the mean. It is the average of the squared deviations from the mean.
  • Formula for Population Variance: $$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
  • Formula for Sample Variance: $$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$$, where $$x_i$$ are data points, $$\mu$$ is the population mean, $$\bar{x}$$ is the sample mean, $$N$$ is the population size, and $$n$$ is the sample size[1][3].
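
The only difference between the two variance formulas is the divisor, which the sketch below makes explicit. The function name `variance`, its `sample` flag, and the data are illustrative assumptions.

```python
def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1
    (sample variance) or by N (population variance)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]      # illustrative data; mean is 5
print(variance(data, sample=False))  # 4.0  (32 / 8)
print(variance(data))                # 32 / 7, about 4.571
```

The standard library offers the same two calculations as `statistics.pvariance` and `statistics.variance`.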

Standard Deviation

  • Definition: Standard deviation is a measure of the amount of variation or dispersion in a set of values. It is the square root of variance.
  • Formula for Population Standard Deviation: $$\sigma = \sqrt{\sigma^2}$$
  • Formula for Sample Standard Deviation: $$s = \sqrt{s^2}$$[1][2][3].
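
Since standard deviation is the square root of variance, a sketch only needs one extra step. The function name `std_dev` and the data are illustrative assumptions.

```python
import math

def std_dev(values, sample=True):
    """Square root of the sample (n - 1) or population (N) variance."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1 if sample else n))

data = [2, 4, 4, 4, 5, 5, 7, 9]     # illustrative data; mean is 5
print(std_dev(data, sample=False))  # 2.0  (square root of 4.0)
print(std_dev(data))                # about 2.138 (square root of 32/7)
```

Unlike variance, the result is in the same units as the data, which is why it is usually the figure reported.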

Correlation (Pearson’s r)

  • Definition: Pearson’s r measures the linear correlation between two variables, giving a value between -1 and 1.
  • Formula: $$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2} \cdot \sqrt{\sum (y_i - \bar{y})^2}}$$, where $$x_i$$ and $$y_i$$ are individual sample points, and $$\bar{x}$$ and $$\bar{y}$$ are their respective means.
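
The formula translates term by term into code. The sketch below uses a hypothetical function `pearson_r` and made-up data; it rejects constant inputs, for which the denominator is zero and r is undefined.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of co-deviations divided by the product
    of the square roots of each variable's summed squared deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

xs = [1, 2, 3, 4, 5]  # illustrative data only
ys = [2, 1, 4, 3, 5]
print(round(pearson_r(xs, ys), 3))  # 0.8  (8 / (sqrt(10) * sqrt(10)))
```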

Correlation (Spearman’s rho)

  • Definition: Spearman’s rho assesses how well an arbitrary monotonic function describes the relationship between two variables without assuming a linear relationship.
  • Formula: Replace each value with its rank within its own variable (tied values share the average of their ranks), then apply Pearson’s formula to the ranks.
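
A minimal sketch of the rank-then-correlate procedure follows. The helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical, the data is made up, and ties are handled by assigning the average rank, which is one common convention.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length, non-constant samples."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
# A monotonic but non-linear relationship still gives rho close to 1.
print(round(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]), 3))  # 1.0
```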

t-test (Independent and Dependent)

  • Independent t-test: Compares means from two different groups to see if they are statistically different from each other.
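  • Formula: $$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$, where $$\bar{x}_1$$ and $$\bar{x}_2$$ are the group means, $$s_1^2$$ and $$s_2^2$$ are the sample variances, and $$n_1$$ and $$n_2$$ are the group sizes. This is the unpooled (Welch) form, which does not assume equal variances in the two groups.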
  • Dependent t-test (paired): Compares means from the same group at different times (e.g., before and after treatment).
  • Formula: $$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$, where $$\bar{d}$$ is the mean difference between paired observations, $$s_d$$ is the sample standard deviation of those differences, and $$n$$ is the number of pairs[3].
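
Both t statistics can be sketched with the standard library. The function names `independent_t` and `paired_t` and all data are illustrative assumptions; the sketch computes only the t statistic, not degrees of freedom or a p-value.

```python
import math
from statistics import mean, stdev, variance

def independent_t(group1, group2):
    """t for two independent groups, using each group's own sample
    variance (unpooled form of the independent-samples formula)."""
    standard_error = math.sqrt(
        variance(group1) / len(group1) + variance(group2) / len(group2)
    )
    return (mean(group1) - mean(group2)) / standard_error

def paired_t(before, after):
    """t for paired observations: mean difference divided by its
    standard error, s_d / sqrt(n)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Illustrative data only.
print(independent_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))            # -1.0
print(round(paired_t([10, 12, 14, 16], [11, 14, 15, 18]), 3))     # 5.196
```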

Chi-Square Test

  • Definition: The chi-square test compares observed frequencies with the frequencies expected under a hypothesis (goodness of fit), or tests whether two categorical variables are independent.
  • Formula for Goodness-of-Fit Test: $$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$, where $$O_i$$ are observed frequencies, and $$E_i$$ are expected frequencies.
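
As a sketch of the goodness-of-fit statistic, consider 120 rolls of a die that is assumed fair, so each face is expected 20 times. The function name `chi_square_gof` and the counts are made up for illustration; the sketch returns only the statistic and does not look up a p-value.

```python
def chi_square_gof(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 20, 25, 15, 20]  # illustrative counts for faces 1-6
expected = [20] * 6                  # fair die: 120 rolls / 6 faces
print(round(chi_square_gof(observed, expected), 2))  # 2.9  (58 / 20)
```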

These statistical tools are fundamental for analyzing data sets, allowing researchers to summarize data, assess relationships, and test hypotheses.

Citations:
[1] https://www.geeksforgeeks.org/mathematics-mean-variance-and-standard-deviation/
[2] https://www.sciencing.com/median-mode-range-standard-deviation-4599485/
[3] https://www.csueastbay.edu/scaa/files/docs/student-handouts/marija-stanojcic-mean-median-mode-variance-standard-deviation.pdf
[4] https://www.youtube.com/watch?v=179ce7ZzFA8
[5] https://www.youtube.com/watch?v=mk8tOD0t8M0
[6] https://eng.libretexts.org/Bookshelves/Industrial_and_Systems_Engineering/Chemical_Process_Dynamics_and_Controls_(Woolf)/13:_Statistics_and_Probability_Background/13.01:_Basic_statistics-_mean_median_average_standard_deviation_z-scores_and_p-value
[7] https://www.ituc-africa.org/IMG/pdf/ITUC-Af_P4_Wks_Nbo_April_2010_Doc_8.pdf
[8] https://www.calculator.net/mean-median-mode-range-calculator.html