class: center, middle, inverse, title-slide

# Random variability
## What If: Chapter 10
### Elena Dudukina
### 2021-03-18

---

# 10.1 Identification versus estimation

.pull-left[
- In the previous chapters we ignored random variability and focused on identification problems
- Estimand: the quantity we want to estimate, e.g., the probability of the event in the super population
- Estimator: a rule that takes the sample data and produces a numerical value for the estimand
- Estimate: the numerical value that the estimator produces for a given sample
]

.pull-right[
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">For most of my career, I've used the language of 'associations' and done <a href="https://twitter.com/hashtag/schrodingersinference?src=hash&ref_src=twsrc%5Etfw">#schrodingersinference</a><br><br>Now I report each step in the estimand > estimator > estimate process. That means I am generally reporting an 'estimated total causal effect'. Even if the estimate is poor! <a href="https://t.co/kVjEFNEFNS">pic.twitter.com/kVjEFNEFNS</a></p>— Peter Tennant (@PWGTennant) <a href="https://twitter.com/PWGTennant/status/1164084443742691328?ref_src=twsrc%5Etfw">August 21, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
]

---

# 10.1 Identification versus estimation

- An estimator is consistent for an estimand if its estimates get (arbitrarily) closer to the parameter as the sample size increases
- If the sample size is small, consistent estimators may produce estimates that are far from the super population value (the estimand)

---

# 10.1 Identification versus estimation

- A 95% confidence interval is **calibrated** if it contains the estimand in 95% of random samples
- A 95% confidence interval is **conservative** if it contains the estimand in more than 95% of random samples
- A 95% confidence interval is **anticonservative** if it contains the estimand in less than 95% of random samples
- A 95% confidence interval is **valid** if, for any value of the true parameter, the confidence
interval is either calibrated or conservative, i.e., it covers the true parameter at least 95% of the time
- A 95% confidence interval is a **frequentist** concept: the 95% refers to coverage over repeated random samples, not to the probability that the estimand lies in one particular computed interval

---

# 10.2 Estimation of causal effects

- Due to random variability, we cannot expect exchangeability to hold exactly in the sample
- "Because of the presence of random sampling variability, we do not expect that exchangeability will exactly hold in our sample"

---

# 10.3 The myth of the super population

- Scenario 1: The study individuals are sampled from an (essentially) infinite super population (the source or target population)
- The super population is a convenient fiction: it allows simpler statistical methods and eases generalization
- Scenario 2: Each individual has a non-deterministic (stochastic) potential outcome, i.e., only a probability of the outcome rather than a fixed value

---

# 10.4 The conditionality “principle”

- Random non-exchangeability: by chance, treatment A and covariate L can be associated in a given sample
- The observed A-L associations that arise by chance are ancillary statistics for the causal risk difference (their distribution does not depend on the causal risk difference)
- "The conditionality principle states that inference on a parameter should be performed conditional on ancillary statistics"

---

# 10.5 The curse of dimensionality

- 100 pre-treatment binary variables produce `\(2^{100}\)` strata
- `\(2^{100} \approx 1.27 \times 10^{30}\)`, so in any realistic sample almost all strata are empty and stratum-specific estimation is infeasible

---

# References

Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC (v. 31jan21)
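
---

# Appendix: consistency and coverage by simulation

The definitions in section 10.1 (consistent estimator, calibrated vs. anticonservative 95% interval) can be checked with a small simulation. This is a minimal sketch under stated assumptions, not the book's code: the super population event probability `p_true = 0.2` is hypothetical, the estimator is the sample proportion, and the interval is the textbook Wald interval (my choice; the chapter does not prescribe it). The helper names `wald_ci` and `simulate` are hypothetical.

```python
import math
import random


def wald_ci(events, n, z=1.96):
    """Wald 95% CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = events / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def simulate(p_true, n, n_samples, rng):
    """Draw n_samples random samples of size n from a super population with
    event probability p_true (the estimand). Return the mean absolute error of
    the sample proportion (the estimator) and the share of Wald CIs covering p_true."""
    total_abs_error = 0.0
    covered = 0
    for _ in range(n_samples):
        # Number of events in one random sample of size n
        events = sum(rng.random() < p_true for _ in range(n))
        p_hat = events / n  # the estimate for this sample
        total_abs_error += abs(p_hat - p_true)
        lo, hi = wald_ci(events, n)
        covered += lo <= p_true <= hi
    return total_abs_error / n_samples, covered / n_samples


if __name__ == "__main__":
    rng = random.Random(2021)  # fixed seed for reproducibility
    p_true = 0.2  # hypothetical estimand, not taken from the book
    for n in (10, 100, 1000):
        mae, coverage = simulate(p_true, n, n_samples=1000, rng=rng)
        print(f"n={n:5d}  mean |estimate - estimand| = {mae:.3f}  coverage = {coverage:.3f}")
```

The mean distance between estimate and estimand shrinks as `n` grows, which is what consistency predicts. At `n = 10` the Wald interval is anticonservative (its exact coverage is about 0.89, largely because a sample with zero events yields the degenerate interval [0, 0]); at `n = 1000` its coverage is close to the nominal 95%.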