# PDF File

## ch1.pdf

Original filename: ch1.pdf

This PDF 1.2 document was generated by GNU Ghostscript 7.07 and uploaded to fichier-pdf.fr on 08/05/2012. Document size: 58 KB (4 pages).

### Document preview

1. Introduction

• Nonparametric Inference

1.1 Nonparametric Inference

• Foundations of Nonparametric Inference

• Model: f ∈ P = {P_θ : θ ∈ Θ}

• Confidence Sets

• Infinite-dimensional inference

• Useful Inequalities

• Distribution-free theory: the more common name for classical nonparametric statistics, such as ranking.

Inference: curve estimation

• Estimating the cdf F(x) = Pr(X ≤ x)
• Estimating functionals T(F). Example: the mean, T(F) = ∫ x dF(x)
• Estimating the density function f(x) = F′(x)
• Regression estimation: E(Y | X = x) given (X_1, Y_1), …, (X_n, Y_n)
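The curve-estimation targets above can be illustrated with plug-in estimates built from the empirical cdf. A minimal sketch (the helper names `ecdf` and `plugin_mean` are mine, not from the notes):

```python
import random

def ecdf(sample):
    """Empirical cdf F_hat(x) = (1/n) #{X_i <= x}, the plug-in
    estimate of F(x) = Pr(X <= x)."""
    xs = sorted(sample)
    n = len(xs)
    def F_hat(x):
        return sum(1 for v in xs if v <= x) / n
    return F_hat

def plugin_mean(sample):
    """Plug-in estimate of T(F) = integral of x dF(x): integrating x
    against the empirical cdf gives the sample mean."""
    return sum(sample) / len(sample)

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]
F_hat = ecdf(data)
print(F_hat(0.0))         # close to Pr(X <= 0) = 0.5 for N(0, 1) data
print(plugin_mean(data))  # close to the true mean 0
```

The same plug-in idea extends to any functional of F, while density and regression estimation require smoothing and are treated later.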

1.2 Foundations of Nonparametric Inference

Minimax Framework

• Asymptotic equivalence holds between nonparametric curve estimation and infinite-dimensional Normal mean estimation (Brown and Low, 1996; Nussbaum, 1996).

• In the problem of estimating a finite-dimensional Normal mean θ, the estimator θ̂ is optimal in the minimax sense:

$$ \sup_{\theta} R(\hat{\theta}, \theta) = \inf_{\tilde{\theta}} \sup_{\theta} R(\tilde{\theta}, \theta), $$

where R(θ̂, θ) = E[L(θ̂ − θ)] and L(t) is symmetric and convex.

The minimax framework for nonparametric inference consists of three basic parts:

1. The functional class F. Example: the L_p[a, b] space. A function f belongs to the L_p[a, b] class if

$$ \int_a^b |f(y)|^p \, dy < \infty. $$

2. The risk function

$$ R(\hat{f}, f) \equiv E\,\|\hat{f} - f\|_{L_2}^2 = E \int (\hat{f}(y) - f(y))^2 \, dy. $$
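The L2 risk can be approximated by Monte Carlo for a concrete estimator. A sketch using a histogram estimator of the Uniform(0, 1) density (an example of my choosing, not one worked in the notes):

```python
import random

def hist_density(sample, bins):
    """Histogram density estimator on [0, 1] with equal-width bins;
    returns the bin heights (proportion / bin width)."""
    n = len(sample)
    counts = [0] * bins
    for x in sample:
        counts[min(int(x * bins), bins - 1)] += 1
    return [c / n * bins for c in counts]

def l2_loss(heights):
    """Integral of (f_hat(y) - f(y))^2 dy against the true Uniform(0,1)
    density f = 1; f_hat is piecewise constant, so this is a finite sum."""
    width = 1.0 / len(heights)
    return sum((h - 1.0) ** 2 * width for h in heights)

def mc_risk(n, bins, reps=200):
    """Monte Carlo approximation of R(f_hat, f) = E integral (f_hat - f)^2."""
    losses = [l2_loss(hist_density([random.random() for _ in range(n)], bins))
              for _ in range(reps)]
    return sum(losses) / reps

random.seed(1)
r_small, r_large = mc_risk(100, 10), mc_risk(1000, 10)
print(r_small, r_large)  # the risk shrinks as n grows
```

With the bin count held fixed, the risk here is pure variance and decays like 1/n; choosing bins as a function of n is exactly the bias-variance trade-off behind optimal convergence rates.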

3. The optimal convergence rate.

(a) f̂ is called minimax if

$$ \sup_{f \in F} R(\hat{f}, f) = \inf_{\tilde{f}} \sup_{f \in F} R(\tilde{f}, f). $$

(b) f̂ is called asymptotically minimax if

$$ \sup_{f \in F} R(\hat{f}, f) - \inf_{\tilde{f}} \sup_{f \in F} R(\tilde{f}, f) = o(1). $$

(c) f̂ is said to attain the optimal convergence rate if

$$ \sup_{f \in F} R(\hat{f}, f) \Big/ \inf_{\tilde{f}} \sup_{f \in F} R(\tilde{f}, f) = O(1). $$

Here, f̃ varies over all estimators of f.

1.3 Confidence Sets

Coverage of the Wald confidence interval

Brown et al. (2001) showed the erratic behavior of the coverage probability of the standard Wald confidence interval

$$ \hat{p}_n \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_n (1 - \hat{p}_n)}{n}}; $$

the coverage depends on n AND p.

| sample size | 592   | 954   | 1279  |
|-------------|-------|-------|-------|
| coverage    | 0.792 | 0.852 | 0.875 |

Table 1: Confidence interval coverage for p = 0.005
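The erratic coverage in Table 1 can be reproduced by simulation. A minimal sketch (function name and simulation sizes are mine):

```python
import math, random

def wald_coverage(n, p, z=1.96, reps=5000):
    """Fraction of simulated Binomial(n, p) samples whose Wald interval
    p_hat +/- z * sqrt(p_hat (1 - p_hat) / n) contains the true p."""
    hits = 0
    for _ in range(reps):
        k = sum(random.random() < p for _ in range(n))
        ph = k / n
        half = z * math.sqrt(ph * (1.0 - ph) / n)
        hits += (ph - half <= p <= ph + half)
    return hits / reps

random.seed(2)
cov = wald_coverage(592, 0.005)
print(cov)  # far below the nominal 0.95; Table 1 reports 0.792
```

The failure is structural: when the observed count is 0 the estimated standard error is 0 and the interval collapses to a point, which is common for p this small.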

Confidence Set C_n

• C_n is a finite-sample 1 − α confidence set if

$$ \inf_{f \in F} \Pr(f \in C_n) \ge 1 - \alpha \quad \forall n. $$

• C_n is a uniform asymptotic 1 − α confidence set if

$$ \liminf_{n} \inf_{f \in F} \Pr(f \in C_n) \ge 1 - \alpha. $$

• C_n is a pointwise asymptotic 1 − α confidence set if, for every f ∈ F,

$$ \liminf_{n} \Pr(f \in C_n) \ge 1 - \alpha. $$

Confidence balls/bands for f

• Confidence ball for f:

$$ C_n = \{ f \in F : \| f - \hat{f}_n \| \le s_n \}. $$

• Confidence band for f:

$$ \inf_{f \in F} \Pr\big( l(x) \le f(x) \le u(x) \ \forall x \in \mathcal{X} \big) \ge 1 - \alpha. $$
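One concrete finite-sample confidence band, not constructed in these notes, is the band for the cdf given by the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality: l(x) = F̂_n(x) − ε_n and u(x) = F̂_n(x) + ε_n with ε_n = sqrt(log(2/α)/(2n)). A sketch checking its coverage by simulation (helper names are mine):

```python
import bisect, math, random

def dkw_band(sample, alpha=0.05):
    """Finite-sample 1 - alpha confidence band for the cdf F via the
    DKW inequality: F_hat(x) +/- sqrt(log(2/alpha) / (2n))."""
    xs = sorted(sample)
    n = len(xs)
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    def band(x):
        F_hat = bisect.bisect_right(xs, x) / n
        return max(0.0, F_hat - eps), min(1.0, F_hat + eps)
    return band

def coverage_freq(n=300, alpha=0.05, reps=200):
    """How often the band traps the true Uniform(0,1) cdf F(x) = x
    at every point of a grid."""
    grid = [i / 50 for i in range(51)]
    hits = 0
    for _ in range(reps):
        band = dkw_band([random.random() for _ in range(n)], alpha)
        hits += all(band(x)[0] <= x <= band(x)[1] for x in grid)
    return hits / reps

random.seed(3)
freq = coverage_freq()
print(freq)  # typically at or above the nominal 0.95
```

This is a finite-sample band in the sense defined above: the guarantee holds for every n and uniformly over all distributions, at the price of a width that ignores the data beyond n.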

Example: Let X_1, …, X_n ∼ Bernoulli(p). The Wald confidence interval is a pointwise confidence interval for p. A finite-sample confidence interval is

$$ \hat{p}_n \pm \sqrt{\frac{1}{2n} \log \frac{2}{\alpha}}. $$

1.4 Useful Inequalities

Probability inequalities

• Markov's and Chebyshev's inequalities provide polynomial bounds.

• Hoeffding's and Bernstein's inequalities provide exponential bounds.

• Hoeffding's and Bernstein's inequalities play an important role in estimating the error rate in classification problems.
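The two Bernoulli intervals can be compared numerically. A brief sketch (helper names are mine):

```python
import math

def wald_half(p_hat, n, z=1.96):
    """Half-width of the pointwise-asymptotic Wald interval."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

def hoeffding_half(n, alpha=0.05):
    """Half-width of the finite-sample interval sqrt(log(2/alpha) / (2n))."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

n = 1000
for p_hat in (0.5, 0.1, 0.005):
    print(p_hat, wald_half(p_hat, n), hoeffding_half(n))
```

The finite-sample half-width does not depend on p̂_n, so it is much wider than Wald's near 0 or 1; that extra width is the price of a guarantee that holds for every n and every p.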

Hoeffding's inequality

Let Y_1, …, Y_n be independent observations such that E(Y_i) = 0 and a_i ≤ Y_i ≤ b_i. Let ε > 0. Then for any t > 0,

$$ \Pr\left( \sum_{i=1}^{n} Y_i \ge \epsilon \right) \le e^{-t\epsilon} \prod_{i=1}^{n} e^{t^2 (b_i - a_i)^2 / 8}. $$

Proof:

$$ \Pr\left( \sum_{i=1}^{n} Y_i \ge \epsilon \right) = \Pr\left( e^{t \sum_{i=1}^{n} Y_i} \ge e^{t\epsilon} \right) \le e^{-t\epsilon} \prod_{i=1}^{n} E(e^{t Y_i}), $$

using Markov's inequality and independence in the last step. By Jensen's inequality (convexity of y ↦ e^{ty}),

$$ e^{t Y_i} \le \frac{Y_i - a_i}{b_i - a_i}\, e^{t b_i} + \frac{b_i - Y_i}{b_i - a_i}\, e^{t a_i}. $$

Hence, since E(Y_i) = 0,

$$ E(e^{t Y_i}) \le -\frac{a_i}{b_i - a_i}\, e^{t b_i} + \frac{b_i}{b_i - a_i}\, e^{t a_i} = r e^{u(1-r)} + (1 - r) e^{-ru} = e^{-ru}(1 + r e^{u} - r), $$

where r = −a_i / (b_i − a_i) and u = t(b_i − a_i). (We are abusing notation here; it should be r = r_i and u = u_i.)

Let g(u) = −ru + log(1 − r + re^u). Then g(0) = g′(0) = 0, and g″(u) ≤ 1/4 for all u > 0.
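As a sanity check of the inequality, the empirical tail probability can be compared with the bound optimized over t (choosing t to minimize the exponent gives exp(−2ε²/∑(b_i − a_i)²), a step not taken in the notes). A sketch assuming Uniform(−1, 1) summands, which satisfy the hypotheses with a_i = −1, b_i = 1:

```python
import math, random

def hoeffding_bound(eps, n, a=-1.0, b=1.0):
    """Hoeffding bound with the minimizing choice of t:
    exp(-2 eps^2 / (n (b - a)^2)) for identically bounded summands."""
    return math.exp(-2.0 * eps ** 2 / (n * (b - a) ** 2))

def tail_freq(eps, n, reps=5000):
    """Empirical Pr(sum Y_i >= eps) for Y_i ~ Uniform(-1, 1),
    which have mean 0 and are bounded."""
    hits = 0
    for _ in range(reps):
        hits += sum(random.uniform(-1.0, 1.0) for _ in range(n)) >= eps
    return hits / reps

random.seed(4)
n, eps = 100, 20.0
emp, bnd = tail_freq(eps, n), hoeffding_bound(eps, n)
print(emp, bnd)  # the empirical tail frequency sits below the bound
```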

By a Taylor expansion,

$$ g(u) = g(0) + g'(0)u + g''(\xi)\,\frac{u^2}{2} \le \frac{u^2}{8} = \frac{t^2 (b_i - a_i)^2}{8}, \qquad \xi \in (0, u). $$

Therefore,

$$ \Pr\left( \sum_{i=1}^{n} Y_i \ge \epsilon \right) \le e^{-t\epsilon} \prod_{i=1}^{n} E(e^{t Y_i}) \le e^{-t\epsilon} \prod_{i=1}^{n} \exp\left( \frac{t^2 (b_i - a_i)^2}{8} \right). $$

Bernstein's inequality

Let X_1, …, X_n be independent, E(X_i) = 0, and |X_i| ≤ M. Then

$$ \Pr\left( \left| \sum_{i=1}^{n} X_i \right| > t \right) \le 2 \exp\left( -\frac{1}{2}\,\frac{t^2}{v + M t / 3} \right), $$

where v ≥ ∑_{i=1}^{n} Var(X_i).
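The two exponential bounds can be compared numerically. A sketch using the optimized two-sided form of Hoeffding's bound for |X_i| ≤ M, a form not derived in these notes, alongside Bernstein's bound:

```python
import math

def hoeffding_two_sided(t, n, M=1.0):
    """Optimized two-sided Hoeffding bound for |X_i| <= M
    (so b_i - a_i = 2M): 2 exp(-t^2 / (2 n M^2))."""
    return 2.0 * math.exp(-t ** 2 / (2.0 * n * M ** 2))

def bernstein(t, n, var, M=1.0):
    """Bernstein's bound 2 exp(-(1/2) t^2 / (v + M t / 3)), v = n * var."""
    v = n * var
    return 2.0 * math.exp(-0.5 * t ** 2 / (v + M * t / 3.0))

n, t = 1000, 30.0
b_small = bernstein(t, n, var=0.01)  # small per-variable variance
b_large = bernstein(t, n, var=1.0)   # variance at its maximum M^2
h = hoeffding_two_sided(t, n)
print(h, b_small, b_large)
```

With var = 0.01 the Bernstein bound is orders of magnitude smaller than Hoeffding's, since it exploits the variance; with var = M² the two bounds are nearly the same.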

• Bernstein's inequality uses variance information.

• When v + M t/3 < n M², Bernstein's inequality gives a tighter bound than Hoeffding's inequality does.

REFERENCES

1. Brown, L., Cai, T. and DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Science 16, 101–133.
2. Brown, L. and Low, M. (1996). Asymptotic equivalence of nonparametric regression and white noise. Annals of Statistics 24, 2384–2398.
3. Efron, B. (1967). The two sample problem with censored data. Proc. Fifth Berkeley Symp. Math. Statist. Probab. 4, 831–883.
4. Nussbaum, M. (1996). Asymptotic equivalence of density estimation and Gaussian white noise. Annals of Statistics 24, 2399–2430.