Cartoonist guide to statistics — Part 1
The name says it all. This is an excellent book to read if you would like to brush up on statistics with some humor along the way. The field I am in (Web Analytics) needs a good amount of stats so that we can derive good insights and act upon them.
So what does statistics mean to you?
1. Data collection and Analysis
2. Applying Probability
3. Modeling based on information from #1 & #2. In other words, deriving inferences.
In this post I will focus on the basic concepts you need to get familiar with for “Data collection and analysis”. In the next two parts I will cover probability and modeling.
Different ways to calculate the center of a data set
Mean: the sum of all data values divided by the number of data values.
X = (10+20+30+40+60)/5 = 32
Median: order the data from smallest to largest. If the number of data points is odd, the median is the middle data point. If the number of data points is even, the median is the average of the middle 2 numbers.
X = 3,5,7,6,8 → ordered: 3,5,6,7,8 → median = 6
X = 3,5,7,7 → median = (5+7)/2 = 6
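Here is a minimal Python sketch of the mean and median examples above. It assumes the standard library `statistics` module; the variable names are mine and only for illustration. Note that `median` sorts the data first, which is why 3,5,7,6,8 gives 6 and not the 7 that happens to sit in the middle of the unsorted list.

```python
from statistics import mean, median

# Mean: sum of the values divided by how many values there are.
values = [10, 20, 30, 40, 60]
data_mean = mean(values)          # (10+20+30+40+60)/5

# Median with an odd number of points: the middle value after sorting.
odd_data = [3, 5, 7, 6, 8]
odd_median = median(odd_data)     # sorted: 3,5,6,7,8

# Median with an even number of points: average of the two middle values.
even_data = [3, 5, 7, 7]
even_median = median(even_data)   # (5+7)/2

print(data_mean, odd_median, even_median)
```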
Measures of Spread
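There are 2 common ways to measure the spread of the data around the center.
1. Interquartile Range: here the sorted data is divided into four equal groups (quartiles), and we measure the distance between the first and third quartiles, which is the range covered by the middle 50% of the data.
2. Standard Deviation: this is roughly the typical distance of a data point from the mean (strictly, the square root of the average squared distance).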
SD = sqrt(s²) = sqrt( (1/(n−1)) × Σ(xᵢ − x̄)² )
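The two spread measures above can be sketched in Python as follows. This is only an illustration under a few assumptions: it reuses the data from the mean example, it relies on `statistics.quantiles` (Python 3.8+) with its default "exclusive" method, and other tools may place the quartiles slightly differently, so the interquartile range can vary with the convention used.

```python
import math
from statistics import mean, quantiles, stdev

data = [10, 20, 30, 40, 60]

# Interquartile range: distance between the first and third quartiles.
# Quartile conventions differ; this uses the library default.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Sample standard deviation written out from the formula,
# dividing by n - 1 rather than n.
x_bar = mean(data)
manual_sd = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))

# The library function should agree with the manual calculation.
library_sd = stdev(data)

print(iqr, manual_sd, library_sd)
```

For this data the squared distances from the mean add up to 1480, so the variance is 1480/4 = 370 and the standard deviation is sqrt(370), a little over 19.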
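“Z scores” express how many standard deviations a data point lies from the mean: z = (x − mean) / SD. A positive z score means the point is above the mean, a negative one means it is below.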
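As a rough sketch, z scores can be computed like this in Python. The data set is the one from the mean example, and using the sample standard deviation (n − 1 in the denominator) is my assumption here; some texts use the population version instead.

```python
from statistics import mean, stdev

data = [10, 20, 30, 40, 60]

x_bar = mean(data)   # center of the data
sd = stdev(data)     # sample standard deviation (assumed convention)

# z score: distance from the mean, measured in standard deviations.
z_scores = [(x - x_bar) / sd for x in data]

for x, z in zip(data, z_scores):
    print(f"{x}: z = {z:.2f}")
```

The z scores of a data set always sum to zero (up to rounding), since the positive and negative distances from the mean cancel out.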
These are some of the basic concepts that you need to understand in order to summarize your data. They are also the foundation for applying probability and then doing some data modeling.