## Lecture 7 Part 1.pdf

Original name: Lecture 7_Part 1.pdf
Title: Lecture7
Author: Giuliana Cortese

This PDF 1.3 document was generated by pdftopdf filter / Mac OS X 10.7.5 Quartz PDFContext and was uploaded to fichier-pdf.fr on 12/10/2015 at 18:38.
Document size: 5.2 MB (10 pages).
Privacy: public file

### Document preview

Lecture 7
How certain are we?
Sampling and Estimation
(the average case)

Overview
• We take a simple random sample
– from a well-defined population

• The sample mean is “probably/usually” “close” to
the population mean
• By “probably/usually” we mean “in 95% of all
samples”
• By “close” we mean “within ~2 standard errors”

Populations and samples

Overview

• Defining a population
• Taking a simple random sample
• How similar is the sample to the population?

The sampling problem
• We care about population.
• We can only afford to look at samples.
• How do we know our sample is relevant?

Population and Samples
• Population (of size N)
– All the cases (individuals, objects, or groups) in
which the researcher is interested.

• Sample (of size n)
– A relatively small subset from a population.

Examples:
– The US population: ~300 million people
– The General Social Survey (GSS):
• a sample from the US population: about 3,000 people.
– Survey about quality of Universities in Italy:
• a sample from the Universities population: about 30
Universities.

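[Figure: diagrams of three sampling designs (stratified sampling, cluster sampling, and two-stage sampling). Each shows a population divided into groups or areas (Area 1 to Area 5) of sizes N1 to N4, with samples of sizes n1 to n4 drawn from them; in the cluster sampling diagram the selected groups are taken whole (N2 = n2, N3 = n3).]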
Simple random sampling:
Definition

• Define the population
– label every person

• Sample the labels

– randomly
– so everyone in the population has the same
probability of being sampled

Simple random sampling
A good example

• Define the population: this class=52 people

– label every person: Give everyone a playing card

• Sample the labels: Draw 5 cards from a second deck

– randomly: after shuffling
– so everyone in the population has the same probability of
being sampled: Everyone’s card appears once in the deck.
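
The card-drawing procedure above can be sketched in a few lines of Python. This is only an illustration: the card labels and the helper name `draw_simple_random_sample` are assumptions made for the example, not part of the lecture.

```python
import random

# Hypothetical labels: one playing card per person in a class of 52.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [f"{rank} of {suit}" for suit in suits for rank in ranks]


def draw_simple_random_sample(labels, n, seed=None):
    """Draw n distinct labels; every label has the same chance of selection."""
    rng = random.Random(seed)
    # random.sample draws without replacement, like dealing cards
    # from a shuffled deck.
    return rng.sample(labels, n)


sample = draw_simple_random_sample(deck, 5, seed=7)
print(sample)
```

Each label appears exactly once in `deck`, which is what gives every person the same probability of being drawn.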

Sampling:
• Define the population: This class=52 people
– label every person: Put everyone in a seat

• Sample the labels: Choose 5 people from the first row.
– Problems
• not random
• not everyone has the same probability: only the first row has any chance

• Sample the labels: Choose 5 volunteers.
– Problem
• not everyone has the same probability: favors extroverts

• Sample the labels: Choose 5 people without a system.
– Problem
• Is it random?
Myth: Simple random samples
are “representative”
• Actually can be quite different from population
• But
– we can usually place bounds on the difference

Estimation

• One of the major applications of statistics is
estimating population parameters from
sample statistics

– Point estimates: the estimate is a single value or
point.
– Confidence intervals: estimate of an interval
that contains the unknown population parameter
with a certain level of “confidence”.

Example

A poll may seek to estimate the proportion of adult residents of
a city that support a proposition to build a new sports stadium.
Out of a random sample of 200 people, 106 say they support
this proposition.

– Unknown population parameter (p): proportion of adults in the
city in favour of this proposition.
– Point estimate: p̂ = 106/200 = 0.53 (the proportion of people in the
sample who support the proposition).
– Confidence interval: the pollster uses a method that, if applied to 100
random samples, would produce an interval containing the unknown
parameter p about 95 times and would miss it about 5 times.
With this sample he arrives at a 95% confidence interval: 0.46 < p < 0.60, and
he concludes that a proportion between 0.46 and 0.60 of the
population supports the proposal (the media usually report this as: 53%
favor the proposition, with a margin of error of 7%).
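
The slide does not say which method the pollster uses. As a sketch, assuming the usual normal-approximation interval for a proportion (p̂ ± 1.96 standard errors), the numbers of this example can be reproduced as follows. The function name `proportion_confidence_interval` is hypothetical.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    Assumption: this is one common method, not necessarily the one used
    in the lecture. z = 1.96 corresponds to 95% confidence.
    """
    p_hat = successes / n                                 # point estimate
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated SE of p_hat
    margin = z * standard_error                           # margin of error
    return p_hat, margin, (p_hat - margin, p_hat + margin)


p_hat, margin, (low, high) = proportion_confidence_interval(106, 200)
print(f"point estimate  = {p_hat:.2f}")
print(f"margin of error = {margin:.3f}")
print(f"95% interval    = ({low:.2f}, {high:.2f})")
```

Under this assumption the interval rounds to (0.46, 0.60) and the margin of error to about 7 percentage points, in line with the figures quoted in the example.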

Estimation Process

Point Estimate
A point estimate is a statistic taken from a sample and is
used to estimate a population parameter.
• best single-number guess at a parameter
– the sample mean Ȳ is a point estimate of the population mean µY
– \$28.834K is a point estimate for the average salary of new sociology BAs

• rarely right, often close: a point estimate is only as good as the sample
it represents. If other random samples are taken from the population,
the point estimates derived from those samples are likely to vary.

Confidence Intervals

• Because of variation in sample statistics, estimating a
population parameter with a confidence interval is often
preferable to using a point estimate.

• Range (interval) of guesses at a parameter

• Width reflects uncertainty: vague but “usually” right

– an interval around the sample mean Ȳ is a confidence interval for the population mean µY

– (\$27.384K, \$30.284K) is a 95% confidence interval for the
average salary of new sociology BAs.

Example

10 people are randomly sampled from the population of women
in Houston, Texas, between the ages of 21 and 35 years.
– The mean height of the sample is computed.
– The sample mean is a sample statistic and the population
parameter is the population mean.
– This sample mean would not equal the population mean
exactly (the mean height of all women aged 21 to 35 in Houston).
It might be somewhat lower or higher.

Population parameters and Statistics
─ Population measures are called Parameters.
─ Population parameters use Greek letters
µ = Greek m
π = Greek p
σ = Greek s
─ Sample measures are called Statistics.

Characteristics of
estimators/statistics
• Bias of a statistic: 'bias' refers to the systematic error
committed by the estimator, that is, whether it tends to
overestimate or underestimate the population parameter.
• Sampling variability of a statistic:
– It refers to how much the statistic varies from sample
to sample and from the population parameter.
– It is usually measured by its standard error: the smaller
the standard error, the less the sampling variability.

How much means from different samples
differ from each other

How close your particular sample mean
is likely to be to the population mean

This information is directly available
from a sampling distribution.

Sampling distribution
of a sample mean

Population: US households
Variable: Y (# of children)
Population mean: µY = 1.75
Population standard deviation: σY = 1.62

Sampling error for a mean

A simple random sample of n = 4 cases.

CHILDS: 1, 0, 2, 2
Sample mean: Ȳ = 1.25

The sample mean Ȳ = 1.25 differs from the population mean µY = 1.75.
The sampling error is 1.25 − 1.75 = −0.5.

Repeated sampling

Suppose 5000 different researchers took simple random samples of n = 4 cases.
Each Y represents the number of children in a household.

Sampling distribution

| Sample | CHILDS | Sample mean | Sampling error |
|---|---|---|---|
| 1 | 1 0 2 2 | 1.25 | -0.5 |
| 2 | 2 4 0 4 | 2.50 | 0.75 |
| … | … | … | … |
| 5000 | 0 4 0 0 | 1.00 | -0.75 |
| Mean | | 1.75 | 0.00 |
| Std. Err. | | 0.81 | 0.81 |

Sampling variation

A different simple random sample of n = 4 cases.

CHILDS: 2, 4, 0, 4
Sample mean: Ȳ = 2.50

This sample mean is different from the population mean (1.75),
and different from the mean of the first sample (1.25).
The variation from one sample to another is called sampling
variation.
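
Sampling variation can be seen in a small simulation. The sketch below rests on two assumptions: the population is a made-up list of 100 households whose mean is 1.75 (its standard deviation is not the lecture's 1.62), and samples are drawn with replacement to mimic sampling from a very large population. The function name `simulate_sample_means` is hypothetical.

```python
import math
import random
import statistics

# Hypothetical population: number of children in each of 100 households.
population = [0] * 30 + [1] * 15 + [2] * 25 + [3] * 15 + [4] * 10 + [5] * 5
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation


def simulate_sample_means(population, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their means."""
    rng = random.Random(seed)
    # rng.choices samples with replacement (an assumption, see lead-in).
    return [statistics.fmean(rng.choices(population, k=n))
            for _ in range(num_samples)]


n = 4
sample_means = simulate_sample_means(population, n, num_samples=5000)
sampling_errors = [m - mu for m in sample_means]

print("population mean:           ", mu)
print("first five sample means:   ", sample_means[:5])
print("mean of sample means:      ", statistics.fmean(sample_means))
print("mean sampling error:       ", statistics.fmean(sampling_errors))
print("std. dev. of sample means: ", statistics.pstdev(sample_means))
print("sigma / sqrt(n):           ", sigma / math.sqrt(n))
```

Individual sample means differ from one another and from the population mean, while their average stays close to it and their spread stays close to sigma / sqrt(n).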

Sampling distribution
of the sample mean
• The distribution of sample means over all possible samples

Mean of the sampling distribution
(of the sample mean)
The mean of the sampling distribution of the sample mean Ȳ equals the
population mean, here µY = 1.75.

In other words, the mean of the sampling errors is zero.

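Variation of the sampling distribution
(of the sample mean)

The variation of the sample means is less than the variation of the
original variable: here the standard deviation of the sample means is 0.81
(for n = 4), against σY = 1.62 for the original variable.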

Variation of the sampling distribution
(of the sample mean)
• The most common measure of how much sample
means differ from each other is the standard
deviation of the sampling distribution of the mean.
• This standard deviation is called the standard error
of the mean (standard deviation of the sample
means)
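• It is exactly $\sigma_{\bar{Y}} = \sigma_Y / \sqrt{n}$: the population standard
deviation divided by the square root of the sample size.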

Variation of the sampling distribution
(of the sample mean)
As the sample size increases, the standard
deviation of the sample means becomes
smaller and smaller, because the population
standard deviation is being divided by larger
and larger values of the square root of n.
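
As a worked illustration with the lecture's population standard deviation σY = 1.62, the standard error of the mean for three sample sizes is computed below (the notation $\sigma_{\bar{Y}}$ for the standard error is a choice made for this example):

```latex
\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}}
\qquad
\begin{aligned}
n = 4:  &\quad \frac{1.62}{\sqrt{4}}  = \frac{1.62}{2} = 0.81   \\
n = 16: &\quad \frac{1.62}{\sqrt{16}} = \frac{1.62}{4} = 0.405  \\
n = 64: &\quad \frac{1.62}{\sqrt{64}} = \frac{1.62}{8} = 0.2025
\end{aligned}
```

Each time the sample size is multiplied by four, the standard error is cut in half.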

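Across samples…

Population of samples (# of samples: infinite), with just n = 4 cases per sample:

| Sample | CHILDS | Sample mean Ȳ |
|---|---|---|
| 1 | 1 0 2 2 | 1.25 |
| 2 | 2 4 0 4 | 2.50 |
| … | … | … |

Mean of the sample means: here 1.75
Standard error (std. dev. of sample means): here 1.62 / 4^(1/2) = 1.62 / 2 = 0.81

As sample size (n) grows…
…standard error shrinks!
…shape of sampling distribution gets closer to “normal”!

[Figure: sampling distributions of the sample mean Ȳ for n = 1, n = 4, n = 16, and n = 64.]

| n | Mean of sample means | Standard error |
|---|---|---|
| 4 | 1.75 | .81 |
| 16 | 1.75 | .405 |
| 64 | 1.75 | .2025 |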
Summary
Sampling distribution of the mean
Mean µM
• randomly draw out all possible samples of the given size
n from the population
• compute the sample means and determine the mean
of sample means
• this mean, denoted by µM, is the mean of the population
from which the values were sampled (that is µM equals
the population mean µ).

Summary
Sampling distribution of the mean
Standard error σM
• take all possible samples of a given size n from a
population
• compute the sample means, and determine the
standard deviation of sample means.
• σM is computed by using the population standard
deviation divided by the square root of the sample
size.
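
Both summary statements can be checked exactly on a tiny example by listing every possible sample. The sketch below makes two assumptions: the four-household population is invented for the illustration, and samples are ordered and drawn with replacement (without replacement from a small population, the standard error would need a finite population correction).

```python
import itertools
import math
import statistics

# Hypothetical tiny population: number of children in four households.
population = [0, 1, 2, 4]
n = 2  # sample size

mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# Every possible ordered sample of size n, drawn with replacement.
all_samples = list(itertools.product(population, repeat=n))
sample_means = [statistics.fmean(sample) for sample in all_samples]

mu_M = statistics.fmean(sample_means)      # mean of the sample means
sigma_M = statistics.pstdev(sample_means)  # std. dev. of the sample means

print("number of possible samples:", len(all_samples))
print("population mean mu:        ", mu)
print("mean of sample means mu_M: ", mu_M)
print("sigma / sqrt(n):           ", sigma / math.sqrt(n))
print("std. dev. of sample means: ", sigma_M)
```

Here `mu_M` matches `mu`, and `sigma_M` matches `sigma / math.sqrt(n)` up to floating-point rounding.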