In today's fast-paced world, speech recognition has become an indispensable part of daily life. With the spread of smartphones and voice assistants, more and more people want to explore this technology. Learning speech recognition is not easy, however: it takes sustained study and repeated hands-on practice. This article collects resources for learning speech recognition to help you get up to speed quickly.
A Guide to Speech Recognition Learning Resources
Open-Source Projects
1. Kaldi
Link: https://github.com/kaldi-asr/kaldi
Description: The most popular speech recognition toolkit, though it has fallen somewhat behind in the neural-network era. Its author, Daniel Povey, is working on a major update at Xiaomi; the next-generation Kaldi is worth looking forward to.
2. ESPnet
Link: https://github.com/espnet/espnet
Description: ESPnet is a PyTorch-based end-to-end speech toolkit covering not only ASR but also TTS. The original ESPnet relies on Kaldi for feature extraction and the like; ESPnet2 removes the Kaldi dependency, though it does not yet have as many recipes. Like Kaldi, ESPnet ships many example recipes (egs) and implements the mainstream end-to-end methods, such as CTC, RNN-T, and Transformer.
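To give a feel for the simplest of the end-to-end methods these toolkits implement, here is a minimal sketch of greedy CTC decoding in plain Python: take the best label per frame, collapse consecutive repeats, then drop the blank symbol. The label indices and frame outputs below are made up for illustration; real toolkits add beam search, prefix scoring, and language-model fusion on top of this idea.

```python
BLANK = 0  # CTC blank label, conventionally index 0

def ctc_greedy_decode(frame_labels):
    """Greedy CTC decoding: collapse consecutive repeats, then remove blanks."""
    decoded = []
    prev = None
    for lab in frame_labels:
        if lab != prev:          # a new label starts here
            if lab != BLANK:     # drop the blank symbol
                decoded.append(lab)
            prev = lab
        # labels equal to the previous frame are repeats and are skipped
    return decoded

# Per-frame argmax output, e.g. from an acoustic model (hypothetical):
frames = [0, 3, 3, 0, 0, 5, 5, 5, 0, 3]
print(ctc_greedy_decode(frames))  # -> [3, 5, 3]
```

Note that the trailing `3` survives as a separate token because a blank separates it from the earlier run of `3`s; this is exactly how CTC distinguishes repeated characters from held ones.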
3. WeNet
Link: https://github.com/mobvoi/wenet
Description: An ASR toolkit open-sourced by Mobvoi. It implements the Unified Two-Pass (U2) model for both streaming and non-streaming recognition, is built on PyTorch, and can be deployed on servers and on-device alike. It includes an AISHELL-1 recipe, a good reference for Chinese speech recognition.
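The two-pass idea behind U2 can be sketched in a few lines: a streaming first pass consumes audio chunk by chunk and can emit partial results early, and a second pass rescores the complete utterance once all audio has arrived. The functions below are stand-ins for illustration only, not WeNet's actual API (in the real model the first pass is a CTC decoder over a chunk-wise encoder and the second pass is an attention rescorer).

```python
def first_pass(chunk, state):
    """Hypothetical streaming pass: consume one chunk, update the
    running state, and return the partial hypothesis so far."""
    state.append(chunk)
    return "".join(state), state

def second_pass(full_hypothesis):
    """Hypothetical rescoring pass over the complete utterance."""
    return full_hypothesis.upper()

def recognize(chunks):
    state = []
    partial = ""
    for chunk in chunks:          # streaming: partial results available early
        partial, state = first_pass(chunk, state)
    return second_pass(partial)   # non-streaming: final rescored result

print(recognize(["ni", "hao"]))  # -> NIHAO
```

The point of the unified design is that both modes share one model: streaming applications read the partial results inside the loop, while offline ones only use the second-pass output.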
4. wav2letter
Link: https://github.com/facebookresearch/wav2letter
Description: A speech recognition framework implemented in C++, with high runtime efficiency.
5. Speech-Transformer
Link: https://github.com/kaituoxu/Speech-Transformer
Description: A Transformer-based Chinese speech recognition system (AISHELL) implemented in PyTorch.
6. ARM-KWS
Link: https://github.com/ARM-software/ML-KWS-for-MCU
Description: Keyword spotting (KWS) on Arm MCUs, open-sourced by Arm. It targets English, with whole English words as the output units.
7. kws
Link: https://github.com/robin1001/kws_on_android
Description: Chinese wake-word detection open-sourced by Binbin Zhang of Northwestern Polytechnical University, a useful reference for Chinese voice wake-up. It uses an fbank + DNN + FST approach.
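In a DNN-based keyword spotter like the one above, the decision logic downstream of the network is often simple: smooth the per-frame keyword posteriors with a moving average, then fire when the smoothed score crosses a threshold. A toy sketch in plain Python (the posterior values are made-up numbers, not real model output):

```python
def smooth(posteriors, win=3):
    """Moving-average smoothing of per-frame keyword posteriors."""
    out = []
    for i in range(len(posteriors)):
        lo = max(0, i - win + 1)
        window = posteriors[lo:i + 1]
        out.append(sum(window) / len(window))
    return out

def detect(posteriors, threshold=0.6, win=3):
    """Fire the wake word if any smoothed frame score crosses the threshold."""
    return any(s >= threshold for s in smooth(posteriors, win))

scores = [0.1, 0.2, 0.8, 0.9, 0.85, 0.2]  # keyword appears mid-utterance
print(detect(scores))  # -> True
```

Smoothing suppresses single-frame spikes, which is what keeps the false-alarm rate manageable; production systems combine per-state posteriors for multi-syllable keywords rather than a single score, but the thresholding idea is the same.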
Courses
1. Yun Wang's Zhihu Live: The Past and Present of Speech Recognition Technology
Link: https://www.zhihu.com/lives/843853238078963712
2. Speech Recognition: From Beginner to Master
Link: https://www.shenlanxueyuan.com/course/380
(A shameless plug: this second course is our own.)
Books
1. Kaldi 语音识别实战 (Kaldi Speech Recognition in Practice)
Description: Using the popular open-source toolkit Kaldi as its entry point, this book gives an accessible yet in-depth treatment of state-of-the-art speech recognition techniques and their practical application. Suitable for speech technology researchers and industry practitioners.
2. 解析深度学习:语音识别实践 (Automatic Speech Recognition: A Deep Learning Approach)
Douban rating: 8.0
Description: The first monograph devoted to the details of deep learning techniques in speech recognition. It opens with an overview of classical speech recognition theory and the core deep neural network algorithms, then gives a thorough treatment of deep learning in speech recognition, including training and optimizing DNN-HMM hybrid models, feature representation learning, system combination, adaptation, and advanced techniques such as recurrent neural networks. Suitable for students, researchers, and practitioners with some background in machine learning or speech recognition.
3. Speech and Language Processing, 2nd Edition
Douban rating: 9.6
Description: The book combines deep linguistic analysis with robust statistical methods, and the new edition covers a wealth of modern techniques, uniting natural language processing, computational linguistics, and speech recognition in a single volume. It ties the techniques together so that readers understand how best to apply each one and how to combine them. The writing is engaging, going deep into technical detail without ever becoming dry.
2021 Papers
1. Sequence-to-Sequence Piano Transcription with Transformers, Curtis Hawthorne et al.
Link: https://arxiv.org/pdf/2107.09142.pdf
2. PeriodNet: A Non-Autoregressive Waveform Generation Model with a Structure Separating Periodic and Aperiodic Components, Yukiya Hono et al.
Link: https://arxiv.org/pdf/2102.07786
3. N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement, Gyeong-Hoon Lee et al.
Link: https://arxiv.org/pdf/2106.15205.pdf
4. MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis, Jaesung Tae et al.
Link: https://arxiv.org/pdf/2106.07886.pdf
5. MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training, Mingliang Zeng et al.
Link: https://arxiv.org/pdf/2106.05630.pdf
6. Residual Energy-Based Models for End-to-End Speech Recognition, Qiujia Li et al.
Link: https://arxiv.org/pdf/2103.14152v1.pdf
7. Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction, David Qiu et al.
Link: https://arxiv.org/pdf/2104.12870.pdf
8. AdaSpeech: Adaptive Text to Speech for Custom Voice, Mingjian Chen et al.
Link: https://arxiv.org/pdf/2103.00993
9. A Survey on Neural Speech Synthesis, Xu Tan et al.
Link: https://arxiv.org/pdf/2106.15561.pdf
10. DiffWave: A Versatile Diffusion Model for Audio Synthesis, Zhifeng Kong et al.
Link: https://arxiv.org/pdf/2009.09761
11. Diff-TTS: A Denoising Diffusion Model for Text-to-Speech, Myeonghun Jeong et al.
Link: https://arxiv.org/pdf/2104.01409.pdf
12. Fre-GAN: Adversarial Frequency-Consistent Audio Synthesis, Ji-Hoon Kim et al.
Link: https://arxiv.org/pdf/2106.02297.pdf
13. Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio with a CPU, Keisuke Matsubara et al.
Link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9455356
14. Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech, Vadim Popov et al.
Link: https://arxiv.org/pdf/2105.06337.pdf
15. High-Fidelity and Low-Latency Universal Neural Vocoder Based on Multiband WaveRNN with Data-Driven Linear Prediction for Discrete Waveform Modeling, Patrick Lumban Tobing et al.
Link: https://arxiv.org/pdf/2105.09856.pdf
16. Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis, Chung-Ming Chien et al.
Link: https://arxiv.org/pdf/2011.06465.pdf
17. ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need for Audio Generation, Shoule Wu et al.
Link: https://arxiv.org/pdf/2105.07583.pdf
18. PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS, Ye Jia et al.
Link: https://arxiv.org/pdf/2103.15060.pdf
19. Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling, Isaac Elias et al.
Link: https://arxiv.org/pdf/2103.14574
20. Transformer-Based Acoustic Modeling for Streaming Speech Synthesis, Chunyang Wu et al.
Link: https://research.fb.com/wp-content/uploads/2021/06/Transformer-based-Acoustic-Modeling-for-Streaming-Speech-Synthesis.pdf
21. Triple M: A Practical Neural Text-to-Speech System with Multi-Guidance Attention and Multi-Band Multi-Time LPCNet, Shilun Lin et al.
Link: https://arxiv.org/pdf/2102.00247
22. TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction, Stanislav Beliaev et al.
Link: https://arxiv.org/pdf/2104.08189
23. Towards Multi-Scale Style Control for Expressive Speech Synthesis, Xiang Li et al.
Link: https://arxiv.org/pdf/2104.03521
24. An Improved StarGAN for Emotional Voice Conversion: Enhancing Voice Quality and Data Augmentation, Xiangheng He et al.
Link: https://arxiv.org/pdf/2107.08361
25. crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder, Kazuhiro Kobayashi et al.
Link: https://arxiv.org/pdf/2103.02858
26. CVC: Contrastive Learning for Non-Parallel Voice Conversion, Tingle Li et al.
Link: https://arxiv.org/pdf/2011.00782.pdf
27. NoiseVC: Towards High Quality Zero-Shot Voice Conversion, Shijun Wang et al.
Link: https://arxiv.org/pdf/2104.06074.pdf
28. On Prosody Modeling for ASR+TTS Based Voice Conversion, Wen-Chin Huang et al.
Link: https://arxiv.org/pdf/2107.09477.pdf
29. StarGANv2-VC: A Diverse, Unsupervised, Non-Parallel Framework for Natural-Sounding Voice Conversion, Yinghao Aaron Li et al.
Link: https://arxiv.org/pdf/2107.10394.pdf
30. MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition, Linghui Meng et al.
Link: https://arxiv.org/pdf/2102.12664
*Some of the material above was compiled from CSDN and GitHub; it will be removed upon request in case of infringement.
Learning speech recognition takes patience and persistence: practice often, experiment often, and think things through. With sound learning methods and the resources above, you can master the technology and put it to work in your daily life. Keep learning and keep improving, and you will find plenty of room to grow in this field.