Brain-to-Text Benchmark '24: Lessons Learned
Journal: arXiv
Published Date: Dec 23, 2024
Abstract
Speech brain-computer interfaces aim to decipher what a person is trying to
say from neural activity alone, restoring communication to people with
paralysis who have lost the ability to speak intelligibly. The Brain-to-Text
Benchmark '24 and its associated competition were created to foster the advancement
of decoding algorithms that convert neural activity to text. Here, we summarize
the lessons learned from the competition ending on June 1, 2024 (the top 4
entrants also presented their experiences in a recorded webinar). The largest
improvements in accuracy were achieved using an ensembling approach, where the
output of multiple independent decoders was merged using a fine-tuned large
language model (an approach used by all 3 top entrants). Performance gains were
also found by improving how the baseline recurrent neural network (RNN) model
was trained, including by optimizing learning rate scheduling and by using a
diphone training objective. Improving upon the model architecture itself proved
more difficult, however, with attempts to use deep state space models or
transformers not yet appearing to offer a benefit over the RNN baseline. The
benchmark will remain open indefinitely to support further work towards
increasing the accuracy of brain-to-text algorithms.
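
One plausible reading of the diphone training objective mentioned above is that the decoder is trained to predict pairs of adjacent phonemes rather than single phonemes, so each target also carries its left context. The following is a minimal sketch under that assumption only. The function names, the `SIL` boundary token, and the example phoneme labels are hypothetical illustrations, not taken from the benchmark or from any entrant's code, and the label construction the entrants actually used may differ.

```python
from typing import List, Tuple

# Hypothetical boundary token marking "no preceding phoneme";
# an assumption for illustration, not a benchmark-defined symbol.
SIL = "SIL"


def phonemes_to_diphones(
    phonemes: List[str], boundary: str = SIL
) -> List[Tuple[str, str]]:
    """Pair each phoneme with the phoneme that precedes it.

    The output has the same length as the input, so it can stand in
    for a phoneme-level target sequence of the same length.
    """
    padded = [boundary] + list(phonemes)
    return [(padded[i], padded[i + 1]) for i in range(len(phonemes))]


def diphones_to_phonemes(diphones: List[Tuple[str, str]]) -> List[str]:
    """Recover the phoneme sequence by keeping the second element of each pair."""
    return [current for _, current in diphones]


if __name__ == "__main__":
    example = ["HH", "AH", "L", "OW"]  # illustrative phoneme labels
    pairs = phonemes_to_diphones(example)
    print(pairs)
    print(diphones_to_phonemes(pairs))
```

Because the second element of each pair is the original phoneme, a model trained on such targets can still be collapsed back to a phoneme sequence for downstream language-model decoding.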