10.6 Chapter summary | Statistics

10.6 Chapter summary (EMA7D)

Data refer to the pieces of information that have been observed and recorded from an experiment or a survey.
Quantitative data are data that can be written as numbers. Quantitative data can be discrete or continuous.
Qualitative data are data that cannot be written as numbers. There are two common types of qualitative data: categorical and anecdotal data.
The mean is the sum of a set of values divided by the number of values in the set.
\begin{align*} \overline{x} & = \frac{1}{n}\sum _{i=1}^{n}{x}_{i} \\ & = \frac{{x}_{1} + {x}_{2} + \cdots + {x}_{n}}{n} \end{align*}
The median of a data set is the value in the central position when the data set has been arranged from the lowest to the highest value. If there is an odd number of data values, the median will be equal to one of the values in the data set. If there is an even number of data values, the median will lie halfway between the two middle values in the data set.
The mode of a data set is the value that occurs most often in the set.
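The three measures of central tendency above can be checked with a short Python sketch. The data values are made up for illustration, and the standard-library `statistics` module is only one of several ways to do this.

```python
from statistics import mean, median, multimode

# Hypothetical data set, chosen only for illustration.
data = [3, 7, 7, 2, 9, 7, 4, 2]

# Mean: the sum of the values divided by the number of values.
data_mean = mean(data)

# Median: the central value of the sorted data. There are 8 values
# (an even number), so it lies halfway between the two middle values.
data_median = median(data)

# Mode: the value(s) that occur most often. multimode returns a list
# because a data set can have more than one mode.
data_modes = multimode(data)

print(data_mean, data_median, data_modes)
```

Sorting the data by hand gives 2, 2, 3, 4, 7, 7, 7, 9, so the two middle values are 4 and 7 and the median lies halfway between them.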
An outlier is a value in the data set that is not typical of the rest of the set. It is usually a value that is much greater or much less than all the other values in the data set.
Continuous quantitative data can be grouped by dividing the full range of values into a few sub-ranges. By assigning each continuous value to the sub-range or class within which it falls, the data set changes from continuous to discrete.
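As a rough illustration of grouping, the sketch below assigns made-up height measurements to classes of equal width and counts how many values fall in each class. The data, the class width of 10 and the convention that a class includes its lower boundary but not its upper boundary are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical continuous data: heights in centimetres.
heights_cm = [152.3, 158.0, 160.0, 161.4, 164.9, 166.2, 171.5, 173.8, 179.1]

LOWEST_BOUNDARY = 150  # assumed lower boundary of the first class
CLASS_WIDTH = 10       # assumed width of every class


def class_of(value):
    """Return the (lower, upper) class containing value, with lower <= value < upper."""
    index = int((value - LOWEST_BOUNDARY) // CLASS_WIDTH)
    lower = LOWEST_BOUNDARY + index * CLASS_WIDTH
    return (lower, lower + CLASS_WIDTH)


# Count how many values fall in each class.
frequency = Counter(class_of(h) for h in heights_cm)

for (lower, upper), count in sorted(frequency.items()):
    print(f"{lower} <= h < {upper}: {count}")
```

After grouping, each height is represented only by its class, so the nine continuous measurements become counts for three discrete classes.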
Dispersion is a general term for different statistics that describe how values are distributed around the centre.
The range of a data set is the difference between the maximum and minimum values in the set.
The \(p^{\text{th}}\) percentile is the value, \(v\), that divides a data set into two parts, such that \(p\%\) of the values in the data set are less than \(v\) and \((100 - p)\%\) of the values are greater than \(v\). The general formula for finding the rank, \(r\), of the \(p^{\text{th}}\) percentile in an ordered data set of \(n\) values is
\[r = \frac{p}{100}\left(n - 1\right) + 1\]
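The rank formula can be sketched in Python as follows. The data set is made up, and the treatment of a non-integer rank is an assumption: this sketch interpolates linearly between the two neighbouring values, whereas some texts take the mean of those two values instead.

```python
import math


def percentile(ordered, p):
    """Return the p-th percentile of an already sorted list.

    Uses the rank r = p/100 * (n - 1) + 1, counting positions from 1.
    Assumption: a non-integer rank is handled by linear interpolation.
    """
    n = len(ordered)
    r = p / 100 * (n - 1) + 1
    lower_rank = math.floor(r)
    fraction = r - lower_rank
    lower_value = ordered[lower_rank - 1]  # convert 1-based rank to 0-based index
    if fraction == 0:
        return lower_value
    upper_value = ordered[lower_rank]
    return lower_value + fraction * (upper_value - lower_value)


# Hypothetical ordered data set with n = 9 values.
ordered = [14, 18, 22, 25, 31, 36, 40, 47, 52]

print(percentile(ordered, 50))  # rank 5, the median
print(percentile(ordered, 25))  # rank 3
print(percentile(ordered, 90))  # rank 8.2, between the 8th and 9th values
```

For \(p = 50\) and \(n = 9\) the rank is \(r = 0.5 \times 8 + 1 = 5\), so the 50th percentile is the fifth value in the ordered set.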
The quartiles are the three data values that divide an ordered data set into four groups, where each group contains an equal number of data values. The lower quartile is denoted \(Q1\), the median is \(Q2\) and the upper quartile is \(Q3\).
The interquartile range is a measure of dispersion, which is calculated by subtracting the lower (first) quartile from the upper (third) quartile. This gives the range of the middle half of the data set.
The semi-interquartile range is half of the interquartile range.
The five number summary consists of the minimum value, the maximum value and the three quartiles (\(Q1\), \(Q2\) and \(Q3\)).
The box-and-whisker plot is a graphical representation of the five number summary.
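The measures of dispersion and the five number summary can be computed together, as in the sketch below. The data set is made up. The quartiles come from `statistics.quantiles` with `method="inclusive"`, which is chosen here because it places the quartiles at ranks based on \(n - 1\), in line with the percentile rank formula in this summary; other quartile conventions can give slightly different values.

```python
from statistics import quantiles

# Hypothetical data set, chosen only for illustration.
data = [31, 14, 47, 22, 52, 18, 40, 25, 36]

ordered = sorted(data)
minimum = ordered[0]
maximum = ordered[-1]

# Range: difference between the maximum and minimum values.
data_range = maximum - minimum

# Quartiles Q1, Q2 (the median) and Q3.
q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")

# Interquartile range and semi-interquartile range.
iqr = q3 - q1
semi_iqr = iqr / 2

five_number_summary = (minimum, q1, q2, q3, maximum)
print(five_number_summary, data_range, iqr, semi_iqr)
```

The five values in `five_number_summary` are the ones a box-and-whisker plot displays: the whiskers end at the minimum and maximum, the box runs from \(Q1\) to \(Q3\), and a line inside the box marks \(Q2\).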
