Simulation of Natural Language from Brain Activity Using Wearable EEG and Deep Learning
Journal: medRxiv
Published Date: Jan 15, 2026
Abstract
Severe motor impairments such as amyotrophic lateral sclerosis and locked-in syndrome lead to partial or complete loss of speech, severely restricting communication as voluntary motor control deteriorates. In this study, we developed a non-invasive, wearable EEG-based brain-computer interface that reconstructs coherent natural language sentences by decoding linguistic components directly from neural activity. EEG data were collected from 20 healthy volunteers using two bilateral temporal electrodes (T7/T8) sampled at 128 Hz, with signals segmented into 4-second windows (512 samples) and normalized to a [-1, 1] range. Participants silently generated 6 pronouns, 80 verbs, and 217 object nouns across six languages under controlled cognitive tasks. Separate 1D convolutional neural networks were trained to classify pronouns and to regress verbs and nouns into 300-dimensional FastText semantic embeddings using cosine similarity loss, with data splits of 70/30 or 80/20 for training and validation and augmentation applied exclusively to training data. Pronoun decoding achieved over 60% training accuracy but showed reduced generalization (validation AUC ≈ 0.55), reflecting the context-dependent nature of indexical language, while verb and noun models demonstrated stable convergence over 200-300 epochs and successfully mapped EEG features into semantic space. A Siamese network integrated decoded components into a shared embedding space to ensure semantic coherence prior to sentence generation, enabling the production of grammatically correct and contextually appropriate sentences aligned with experimental conditions such as hunger and thirst. In a validation cohort of 10 fasting individuals, hunger-related thoughts were predicted with 80% accuracy.
These findings demonstrate that bilateral temporal EEG signals alone are sufficient to recover structured linguistic intent when combined with similarity-based semantic modeling, advancing a scalable, non-invasive communication framework for clinical translation.
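As an illustration of the preprocessing and loss described in the abstract, the following is a minimal sketch only, not the authors' implementation. The sampling rate, window length, and [-1, 1] target range come from the abstract. The use of non-overlapping windows, per-window min-max scaling, and a loss defined as one minus cosine similarity are assumptions, and the function names `segment`, `normalize`, and `cosine_loss` are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 128   # stated in the abstract
WINDOW_SECONDS = 4     # stated in the abstract
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 512 samples per window


def segment(signal, window=WINDOW_SAMPLES):
    """Split a 1-D sequence into non-overlapping windows (assumption),
    dropping any trailing samples that do not fill a whole window."""
    count = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(count)]


def normalize(window):
    """Min-max scale one window to [-1, 1] (per-window scaling is an
    assumption). A constant window is mapped to all zeros."""
    lo, hi = min(window), max(window)
    if hi == lo:
        return [0.0] * len(window)
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in window]


def cosine_loss(pred, target):
    """One minus cosine similarity (assumed form of the loss). Returns 1.0
    if either vector has zero norm, where similarity is undefined."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm


# Synthetic single-channel signal: 10 seconds of a 10 Hz sine wave.
signal = [math.sin(2 * math.pi * 10 * n / SAMPLE_RATE_HZ) for n in range(10 * SAMPLE_RATE_HZ)]
windows = [normalize(w) for w in segment(signal)]
```

With 10 seconds of input, `segment` yields two full 512-sample windows and discards the remaining 256 samples. In the study, the target passed to the loss would be a 300-dimensional FastText vector. The sketch works for vectors of any matching length.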