隨住人工智能技術嘅急速發展,語言學家開始關注粵語喺數碼領域嘅存亡問題。目前大部分語音辨識同機器翻譯系統都以普通話為主,粵語嘅支援明顯不足。
ceoi4 zyu6 jan4 gung1 zi3 nang4 gei6 seot6 ge3 gap1 cuk1 faat3 zin2, jyu5 jin4 hok6 gaa1 hoi1 ci2 gwaan1 zyu3 jyut6 jyu5 hai2 sou3 maa5 ling5 wik6 ge3 cyun4 mong4 man6 tai4.
With the rapid development of AI technology, linguists have begun to turn their attention to whether Cantonese can survive in the digital domain. At present, most speech recognition and machine translation systems are built primarily around Mandarin, and support for Cantonese is clearly inadequate.
香港中文大學語言學教授李敏華指出:「粵語有超過八千萬使用者,但係喺數碼基礎設施方面嘅投入遠遠落後於其他主要語言。如果唔盡快建立完善嘅粵語語料庫同訓練數據,粵語將會喺AI時代被邊緣化。」
hoeng1 gong2 zung1 man4 daai6 hok6 jyu5 jin4 hok6 gaau3 sau6 lei5 man5 waa4 zi2 ceot1...
Li Man-wah, a professor of linguistics at the Chinese University of Hong Kong, points out: "Cantonese has more than 80 million speakers, yet investment in its digital infrastructure lags far behind that of other major languages. Unless comprehensive Cantonese corpora and training data are built as soon as possible, Cantonese will be marginalised in the AI era."
近年有唔少民間團體同科技公司開始投入粵語AI研發。其中一個名為「粵語AI聯盟」嘅組織已經收集咗超過一萬小時嘅粵語語音數據,用嚟訓練語音辨識模型。
gan6 nin4 jau5 m4 siu2 man4 gaan1 tyun4 tai2 tung4 fo1 gei6 gung1 si1 hoi1 ci2 tau4 jap6 jyut6 jyu5 AI jin4 faat3.
In recent years, quite a few community groups and technology companies have begun investing in Cantonese AI research and development. One of them, an organisation called the "Cantonese AI Alliance", has already collected more than 10,000 hours of Cantonese speech data for training speech recognition models.
有學者認為,保護粵語唔單止係技術問題,更加係文化認同嘅議題。「語言承載住一個民族嘅集體記憶同文化基因。當一種語言喺數碼世界消失,佢所承載嘅文化亦都會跟住褪色。」
jau5 hok6 ze2 jing6 wai4, bou2 wu6 jyut6 jyu5 m4 daan1 zi2 hai6 gei6 seot6 man6 tai4, gang3 gaa1 hai6 man4 faa3 jing6 tung4 ge3 ji5 tai4.
Some scholars argue that protecting Cantonese is not only a technical problem but, even more, a question of cultural identity. "A language carries a people's collective memory and cultural DNA. When a language disappears from the digital world, the culture it carries fades along with it."