Confronting quasi-separation in logistic mixed effects for linguistic data: A Bayesian approach


Mixed effects regression models are widely used by language researchers. However, these models are fitted with iterative algorithms that may fail to converge on a solution. While convergence issues in linear mixed effects models can often be addressed through careful experimental design and model building, logistic mixed effects models introduce the possibility of separation or quasi-separation, which can disrupt model estimation and lead to convergence errors or unreasonable estimates. These problems cannot be solved by experimental or model design. In this paper, we discuss (quasi-)separation with the language researcher in mind, explaining what it is, how it causes problems for model estimation, and why it can be expected in linguistic datasets. Using real linguistic datasets, we then show how Bayesian models can overcome the convergence issues introduced by quasi-separation where frequentist approaches fail. On the basis of these demonstrations, we advocate the adoption of Bayesian models as a practical solution to convergence issues when modeling binary linguistic data.

In Press at Journal of Quantitative Linguistics