A CNN-transformer hybrid approach for decoding visual neural activity into text.
Journal:
Computer methods and programs in biomedicine
Published Date:
Dec 14, 2021
Abstract
BACKGROUND AND OBJECTIVE: Most studies have used neural activity evoked by linguistic stimuli, such as phrases or sentences, to decode language structure. However, the human brain more commonly perceives the outside world through non-linguistic stimuli such as natural images, so relying only on linguistic stimuli cannot fully capture the information the brain perceives. To address this, an end-to-end model is needed that maps visual neural activity evoked by non-linguistic stimuli to the corresponding visual content.