Encoder/Decoder Transformer Model ASR

Encoder-decoder attention extraction in ASR transcribe

Is your feature request related to a problem? Please describe. There are many studies showing that the encoder-decoder can be used for auxiliary tasks (e.g. with DTW to get word-level timestamps, or ...

Searchenginejournal.com

Google Announces A New Era For Voice Search

Google announced a major update to voice search that uses AI to make it faster and more accurate, calling it a new era. Google announced an update to its voice search, which changes how voice search ...

GitHub

Instruction improvements on Whisper ASR README

python run_whisper.py \ --encoder exported_model_directory/encoder_model.onnx \ --decoder exported_model_directory/decoder_model.onnx \ --model-type whisper-base ...

blockchain

NVIDIA Riva TTS Enhances Multilingual Speech and Voice Cloning

NVIDIA introduces Riva TTS models enhancing multilingual speech synthesis and voice cloning, with applications in AI agents, digital humans, and more, featuring advanced architecture and preference ...

IEEE

Evaluation of Encoder-Only Transformer for Multi-Step Traffic Flow Prediction

Abstract: Traffic flow prediction is critical for Intelligent Transportation Systems to alleviate congestion and optimize traffic management. The existing basic Encoder-Decoder Transformer model for ...

C&EN

Pairwise Attention: Leveraging Mass Differences to Enhance De Novo Sequencing of Mass Spectra

Creative Commons (CC): This is a Creative Commons license. Attribution (BY): Credit must be given to the creator. A fundamental challenge in mass spectrometry-based proteomics is determining which ...

VentureBeat

Nvidia launches fully open source transcription AI model Parakeet-TDT-0.6B-V2 on Hugging Face

Nvidia has become one of the most valuable companies in the world in recent years thanks to the stock market noticing how much demand there is for graphics processing units (GPUs), the powerful chips ...

marktechpost

Transformer Meets Diffusion: How the Transfusion Architecture Empowers GPT-4o’s Creativity

OpenAI’s GPT-4o represents a new milestone in multimodal AI: a single model capable of generating fluent text and high-quality images in the same output sequence. Unlike previous systems (e.g., ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results