A robust Hotelling test statistic for one sample case in high dimensional data


Bulut H.

COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, vol.52, no.13, pp.4590-4604, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 52, Issue: 13
  • Publication Date: 2023
  • DOI: 10.1080/03610926.2021.1996606
  • Journal Name: COMMUNICATIONS IN STATISTICS-THEORY AND METHODS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Business Source Elite, Business Source Premier, CAB Abstracts, Compendex, Veterinary Science Database, zbMATH, Civil Engineering Abstracts
  • Pages: pp.4590-4604
  • Keywords: Robust Hotelling test statistic, one-sample Hotelling test, high-dimensional data, minimum regularized covariance estimators, shrinkage-based diagonal Hotelling test, MVTests, MEAN VECTOR, FEWER OBSERVATIONS, SIZE
  • Ondokuz Mayıs University Affiliated: Yes

Abstract

The Hotelling T² statistic is used to test hypotheses about the location parameter of a multivariate Gaussian distribution, and it is highly sensitive to outliers. Moreover, it cannot be calculated when the sample size is less than the number of variables, because the statistic requires the inverse of the covariance matrix and the sample covariance matrix is singular in high-dimensional data. Although an approach based on shrinkage estimation was proposed to solve this singularity problem, that estimator is still sensitive to outliers. On the other hand, a robust one-sample Hotelling T² statistic was proposed that uses the minimum covariance determinant (MCD) estimates instead of the classical ones. Since the MCD estimates cannot be calculated when n < p, this statistic cannot be used in high-dimensional data. This study proposes using the minimum regularized covariance determinant (MRCD) estimator instead of the classical or MCD estimators. The MRCD estimator is a robust location and scatter estimator that can be calculated in high-dimensional data. We obtain the asymptotic distribution of the proposed test statistic using Monte Carlo simulations and examine the power and robustness properties of the test statistic with simulated datasets. As a result, we show that the approximate distribution of the test statistic is appropriate and that the proposed robust test statistic can be used to test hypotheses about the location parameter of contaminated high-dimensional data. Finally, we provide an R function in the MVTests package to carry out the proposed test.
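
The singularity problem mentioned in the abstract can be illustrated with a short sketch. This is not the paper's method and not the MVTests implementation; it only shows, under assumed toy data, why the classical statistic breaks down when n < p. The function names `sample_covariance` and `matrix_rank`, the sizes n = 5 and p = 8, and the random integer data are all hypothetical choices made for illustration. Because the centered observations sum to zero, the sample covariance matrix has rank at most n − 1, so it cannot be inverted when n ≤ p.

```python
from fractions import Fraction
import random


def sample_covariance(rows):
    """Unbiased sample covariance matrix (p x p) of an n x p data set."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def matrix_rank(matrix):
    """Rank via Gaussian elimination; exact when entries are Fractions."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical toy data: fewer observations (n) than variables (p).
random.seed(0)
n, p = 5, 8
data = [[Fraction(random.randint(-9, 9)) for _ in range(p)] for _ in range(n)]

S = sample_covariance(data)
rank_S = matrix_rank(S)

# rank(S) can be at most n - 1, which is below p here, so S has no inverse.
print(f"n={n}, p={p}, rank(S)={rank_S}, invertible={rank_S == p}")
```

Exact rational arithmetic is used here so that the rank is not affected by floating-point rounding; a numerical library would normally be used in practice.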