microsoft/wavlm-base

WavLM-Base

Microsoft’s WavLM
The base model was pre-trained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
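As a rough illustration of the 16 kHz requirement, the sketch below resamples a mono waveform with plain linear interpolation. The helper name `resample_linear` is hypothetical and not part of any library. Linear interpolation applies no anti-aliasing filter, so a dedicated audio resampler is a better choice for real data, especially when downsampling.

```python
TARGET_SR = 16_000  # sampling rate the model expects, per the note above


def resample_linear(samples, orig_sr, target_sr=TARGET_SR):
    """Resample a mono waveform (sequence of floats) by linear interpolation.

    Hypothetical helper for illustration only: no anti-aliasing is applied.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if orig_sr == target_sr or not samples:
        return list(samples)

    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # left neighbour in the input
        hi = min(lo + 1, last)     # right neighbour, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    # Upsample four samples recorded at 8 kHz to 16 kHz.
    print(resample_linear([0.0, 1.0, 2.0, 3.0], 8_000))
```

Audio that is already at 16 kHz is returned unchanged, so the helper can be applied unconditionally before feature extraction.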
Note: This model does not have a tokenizer because it was pre-trained on audio alone. To use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out this blog for a more detailed explanation of how to fine-tune the model.
The model was pre-trained on 960h of Librispeech.
Paper: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Authors: Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei
Abstract
Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. As speech signal contains multi-faceted information including speaker identity, paralinguistics, spoken content, etc., learning universal representations for all speech tasks is challenging. In this paper, we propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks. WavLM is built based on the HuBERT framework, with an emphasis on both spoken content modeling and speaker identity preservation. We first equip the Transformer structure with gated relative position bias to improve its capability on recognition tasks. For better speaker discrimination, we propose an utterance mixing training strategy, where additional overlapped utterances are created unsupervisely and incorporated during model training. Lastly, we scale up the training dataset from 60k hours to 94k hours. WavLM Large achieves state-of-the-art performance on the SUPERB benchmark, and brings significant improvements for various speech processing tasks on their representative benchmarks.
The original model can be found under https://github.com/microsoft/unilm/tree/master/wavlm.

Usage

This is an English pre-trained speech model that has to be fine-tuned on a downstream task, such as speech recognition or audio classification, before it can be used for inference. The model was pre-trained on English speech and should therefore perform well only in English. The model has been shown to work well on the SUPERB benchmark.
Note: The model was pre-trained on phonemes rather than characters. This means that one should make sure that the input text is converted to a sequence of phonemes before fine-tuning.
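Before fine-tuning, it can help to check what the pre-trained encoder produces for an utterance. The following is a minimal sketch, assuming the `transformers` and `torch` packages are installed and that the checkpoint can be downloaded; the function name `extract_hidden_states` is hypothetical. It returns frame-level hidden states only, which a downstream head would still need to be trained on.

```python
MODEL_ID = "microsoft/wavlm-base"
EXPECTED_SR = 16_000  # the model expects 16 kHz input


def extract_hidden_states(waveform, sampling_rate, model_id=MODEL_ID):
    """Return the encoder's last hidden states for one mono utterance.

    `waveform` is a 1-D sequence of floats. Hypothetical helper for
    illustration; it reloads the model on every call.
    """
    if sampling_rate != EXPECTED_SR:
        raise ValueError(
            f"expected {EXPECTED_SR} Hz audio, got {sampling_rate} Hz"
        )

    # Imported lazily so the check above works without the heavy dependencies.
    import torch
    from transformers import AutoFeatureExtractor, WavLMModel

    feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = WavLMModel.from_pretrained(model_id)
    model.eval()

    inputs = feature_extractor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # Shape: (batch, frames, hidden_size)
    return outputs.last_hidden_state
```

A call such as `extract_hidden_states(audio, 16_000)` yields one vector per audio frame; audio at any other sampling rate is rejected up front rather than silently producing degraded features.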

Speech Recognition

To fine-tune the model for speech recognition, see the official speech recognition example.
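Because the checkpoint ships without a tokenizer, fine-tuning for speech recognition typically starts by building a vocabulary from the training transcripts. The sketch below shows one way to do that at the character level. The helper name `build_char_vocab` and the special tokens (`|`, `[UNK]`, `[PAD]`) are assumptions that follow a common convention, not something this model prescribes.

```python
import json


def build_char_vocab(transcripts, word_delimiter="|",
                     unk_token="[UNK]", pad_token="[PAD]"):
    """Map every character seen in the transcripts to an integer id.

    Hypothetical helper: spaces are replaced by `word_delimiter`, and the
    unknown and padding tokens are appended at the end.
    """
    chars = sorted({ch for text in transcripts for ch in text.lower()})
    vocab = {}
    for ch in chars:
        token = word_delimiter if ch == " " else ch
        vocab.setdefault(token, len(vocab))
    vocab.setdefault(unk_token, len(vocab))
    vocab.setdefault(pad_token, len(vocab))
    return vocab


if __name__ == "__main__":
    vocab = build_char_vocab(["hello world", "speech model"])
    # The resulting mapping can be saved as JSON and loaded by a tokenizer.
    print(json.dumps(vocab, ensure_ascii=False, indent=2))
```

The size of this vocabulary determines the output dimension of the recognition head added during fine-tuning.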

Speech Classification

To fine-tune the model for speech classification, see the official audio classification example.

Speaker Verification

TODO

Speaker Diarization

TODO

Contribution

The model was contributed by cywang and patrickvonplaten.

License

The official license can be found here
