Date: 17 December 2020, from 16:00 to 18:00
Event location: Teams seminar - link available ASAP
Type: Statistics Seminars
Many statistics teachers tell their students something like "In order to apply the t-test, we have to assume that the data are i.i.d. normally distributed, and therefore these model assumptions need to be checked before applying the t-test." This statement is highly problematic in several respects. There is no good reason to believe that any real data truly are drawn i.i.d. from a normal distribution. Furthermore, some quite relevant aspects of these model assumptions cannot be checked from the data.
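For illustration only, here is a minimal Python/scipy sketch of the workflow that the quoted advice typically suggests in practice (the data, sample size and thresholds are assumed here, not taken from the talk): a normality pre-test followed by the t-test.

# Minimal sketch of the "check the assumptions, then run the t-test" workflow.
# All parameters below are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.2, scale=1.0, size=30)   # hypothetical data set

# Step 1: check the normality assumption with a misspecification test
# (Shapiro-Wilk is a common choice; the 0.05 threshold is conventional).
w_stat, p_shapiro = stats.shapiro(x)

# Step 2: apply the t-test only if the model check was "passed".
if p_shapiro > 0.05:
    t_res = stats.ttest_1samp(x, popmean=0.0)
    print(f"normality not rejected (p={p_shapiro:.3f}); t-test p-value {t_res.pvalue:.3f}")
else:
    print(f"normality rejected (p={p_shapiro:.3f}); t-test not applied")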
I will highlight some problems with model assumption checking. Firstly, I will discuss tests for checking model assumptions and their effect on the subsequent inference. Testing model assumptions is a widespread and often recommended practice; however, a model that passes a misspecification test is thereby automatically invalidated (the "misspecification paradox"), and much of the literature that investigates the performance of procedures running model-based tests conditionally on passing a misspecification test is very critical of this practice. I will present a new result that shows model checking in a more positive light, but I will also argue that helpful model checking should address a different problem from "making sure that the model holds", namely distinguishing between harmful and harmless violations of the model assumptions.
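For concreteness, a small Monte Carlo sketch (with assumed parameters, not taken from the talk) of the kind of conditional procedure this literature studies: generate non-normal data for which the null hypothesis of the t-test holds, keep only the samples that pass a Shapiro-Wilk pre-test, and record how often the subsequent t-test rejects.

# Sketch of a two-stage procedure studied conditionally on passing the pre-test.
# Sample size, simulation size and the data-generating model are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_sim, alpha = 20, 10000, 0.05
passed, rejections_if_passed = 0, 0

for _ in range(n_sim):
    # Skewed data with true mean 0 (shifted exponential), so the t-test's H0
    # is true but the normality assumption is not.
    x = rng.exponential(scale=1.0, size=n) - 1.0
    _, p_shapiro = stats.shapiro(x)
    if p_shapiro > alpha:                     # model check "passed"
        passed += 1
        res = stats.ttest_1samp(x, popmean=0.0)
        rejections_if_passed += (res.pvalue < alpha)

print(f"share of samples passing the normality check: {passed / n_sim:.2%}")
print(f"type I error of the t-test conditional on passing: {rejections_if_passed / passed:.3f}")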
Furthermore, I will show that data generated from a normal distribution with a constant nonzero correlation between any two observations (which strongly invalidates inference based on the mean) cannot be distinguished from i.i.d. normal data, and I will discuss the concept of "identifiability from data" of certain parameters and model assumptions, which is different from classical statistical identifiability.
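As a hedged illustration of such a construction (a simulation sketch with assumed parameters such as rho = 0.3, not taken from the talk): each observation shares a common normal component; conditionally on that component the sample is exactly i.i.d. normal, so no check applied to a single sample can detect the dependence, while the variance of the sample mean no longer shrinks like 1/n.

# Sketch of equi-correlated normal data that looks i.i.d. normal sample by sample.
# n, rho and the number of simulations are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, rho, n_sim, alpha = 30, 0.3, 5000, 0.05
normality_rejections, false_t_rejections = 0, 0

for _ in range(n_sim):
    z0 = rng.normal()                          # component shared by all observations
    x = np.sqrt(rho) * z0 + np.sqrt(1.0 - rho) * rng.normal(size=n)
    # Marginally each x_i is N(0, 1) and any two observations have correlation rho;
    # conditionally on z0 the sample is exactly i.i.d. normal, so a normality test
    # rejects only at its nominal rate.
    _, p_shapiro = stats.shapiro(x)
    normality_rejections += (p_shapiro < alpha)
    # The true mean is 0, yet Var(mean(x)) = rho + (1 - rho)/n, so the t-test
    # rejects H0: mu = 0 far more often than its nominal level.
    res = stats.ttest_1samp(x, popmean=0.0)
    false_t_rejections += (res.pvalue < alpha)

print(f"normality check rejects in {normality_rejections / n_sim:.1%} of samples")
print(f"t-test falsely rejects H0: mu = 0 in {false_t_rejections / n_sim:.1%} of samples")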