Bayesian Models for Unit Discovery on a Very Low Resource Language (1802.06053v2)

Published 16 Feb 2018 in cs.CL

Abstract: Developing speech technologies for low-resource languages has become a very active research field over the last decade. Among others, Bayesian models have shown some promising results on artificial examples but still lack in situ experiments. Our work applies state-of-the-art Bayesian models to unsupervised Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also show that Bayesian models can naturally integrate information from other, resource-rich languages by means of informative priors, leading to more consistent discovered units. Finally, the discovered acoustic units are used, either as the 1-best sequence or as a lattice, to perform word segmentation. Word segmentation results show that this Bayesian approach clearly outperforms a Segmental-DTW baseline on the same corpus.
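The idea of an informative prior transferring knowledge from a resource-rich language can be illustrated with a minimal sketch. This is not the paper's actual model (which is a nonparametric Bayesian AUD system); it is a hypothetical Dirichlet-multinomial example showing how a prior biased by another language's statistics shapes the posterior over unit proportions when target-language data is scarce:

```python
import numpy as np

def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts
    -> Dirichlet(alpha + counts) posterior."""
    return alpha + counts

def posterior_mean(alpha):
    """Mean of a Dirichlet distribution: normalized concentration."""
    return alpha / alpha.sum()

# Uninformative prior: all 5 hypothetical acoustic units equally likely a priori.
uniform_prior = np.ones(5)

# Informative prior: concentrations derived (hypothetically) from a
# resource-rich language, favoring units 0 and 1.
informative_prior = np.array([8.0, 4.0, 2.0, 1.0, 1.0])

# Very few observations in the low-resource target language.
counts = np.array([3, 1, 0, 0, 0])

print(posterior_mean(dirichlet_posterior(uniform_prior, counts)))
print(posterior_mean(dirichlet_posterior(informative_prior, counts)))
```

With so little data, the informative prior pulls the posterior toward units the resource-rich language considers plausible, while the uniform prior leaves the estimate driven almost entirely by the sparse counts.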

Authors (10)
  1. Lucas Ondel (13 papers)
  2. Pierre Godard (10 papers)
  3. Laurent Besacier (76 papers)
  4. Elin Larsen (2 papers)
  5. Mark Hasegawa-Johnson (62 papers)
  6. Odette Scharenborg (34 papers)
  7. Emmanuel Dupoux (81 papers)
  8. Lukas Burget (164 papers)
  9. François Yvon (49 papers)
  10. Sanjeev Khudanpur (74 papers)
Citations (18)
