
LangSegment's Issues

[Bug] list indices must be integers or slices, not str

Error log

File "c:\Users\14404\Project\GPT-SoVITS-beta0306fix2\runtime\lib\site-packages\LangSegment\LangSegment.py", line 676, in getTexts
return LangSegment.getTexts(text)
File "c:\Users\14404\Project\GPT-SoVITS-beta0306fix2\runtime\lib\site-packages\LangSegment\LangSegment.py", line 572, in getTexts
text = LangSegment._parse_symbols(text)
File "c:\Users\14404\Project\GPT-SoVITS-beta0306fix2\runtime\lib\site-packages\LangSegment\LangSegment.py", line 508, in _parse_symbols
cur_word = LangSegment._process_tags([] , text , True)
File "c:\Users\14404\Project\GPT-SoVITS-beta0306fix2\runtime\lib\site-packages\LangSegment\LangSegment.py", line 461, in _process_tags
LangSegment._parse_language(words , text)
File "c:\Users\14404\Project\GPT-SoVITS-beta0306fix2\runtime\lib\site-packages\LangSegment\LangSegment.py", line 330, in _parse_language
LangSegment._addwords(words,language,text,score)
File "c:\Users\14404\Project\GPT-SoVITS-beta0306fix2\runtime\lib\site-packages\LangSegment\LangSegment.py", line 250, in _addwords
else:LangSegment._saveData(words,language,text,score)
File "c:\Users\14404\Project\GPT-SoVITS-beta0306fix2\runtime\lib\site-packages\LangSegment\LangSegment.py", line 228, in _saveData
LangSegment._statistics(data["lang"],data["text"])
File "c:\Users\14404\Project\GPT-SoVITS-beta0306fix2\runtime\lib\site-packages\LangSegment\LangSegment.py", line 191, in _statistics
if not "|" in language:lang_count[language] += int(len(text)*2) if language == "zh" else len(text)
TypeError: list indices must be integers or slices, not str

TypeError: list indices must be integers or slices, not str

Traceback (most recent call last):
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\werkzeug\serving.py", line 362, in run_wsgi
    execute(self.server.app)
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\werkzeug\serving.py", line 325, in execute
    for data in application_iter:
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\werkzeug\wsgi.py", line 256, in __next__
    return self._next()
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\werkzeug\wrappers\response.py", line 32, in _iter_encoded
    for item in iterable:
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\flask\helpers.py", line 113, in generator
    yield from gen
  File "E:\AItools\GPT-SoVITS-Inference\Inference\src\inference_core.py", line 140, in get_streaming_tts_wav
    for sr, chunk in chunks:
  File "E:\AItools\GPT-SoVITS-Inference\Inference\src\inference_core.py", line 112, in inference
    yield next(tts_pipline.run(inputs))
  File "E:\AItools\GPT-SoVITS-Inference\GPT_SoVITS\TTS_infer_pack\TTS.py", line 531, in run
    data = self.text_preprocessor.preprocess(text, text_lang, text_split_method)
  File "E:\AItools\GPT-SoVITS-Inference\GPT_SoVITS\TTS_infer_pack\TextPreprocessor.py", line 53, in preprocess
    phones, bert_features, norm_text = self.segment_and_extract_feature_for_text(text, lang)
  File "E:\AItools\GPT-SoVITS-Inference\GPT_SoVITS\TTS_infer_pack\TextPreprocessor.py", line 87, in segment_and_extract_feature_for_text
    textlist, langlist = self.seg_text(texts, language)
  File "E:\AItools\GPT-SoVITS-Inference\GPT_SoVITS\TTS_infer_pack\TextPreprocessor.py", line 99, in seg_text
    for tmp in LangSegment.getTexts(text):
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\LangSegment\LangSegment.py", line 676, in getTexts
    return LangSegment.getTexts(text)
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\LangSegment\LangSegment.py", line 572, in getTexts
    text = LangSegment._parse_symbols(text)
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\LangSegment\LangSegment.py", line 508, in _parse_symbols
    cur_word = LangSegment._process_tags([] , text , True)
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\LangSegment\LangSegment.py", line 461, in _process_tags
    LangSegment._parse_language(words , text)
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\LangSegment\LangSegment.py", line 330, in _parse_language
    LangSegment._addwords(words,language,text,score)
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\LangSegment\LangSegment.py", line 250, in _addwords
    else:LangSegment._saveData(words,language,text,score)
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\LangSegment\LangSegment.py", line 212, in _saveData
    LangSegment._statistics(preData["lang"],text)
  File "e:\AItools\GPT-SoVITS-Inference\runtime\lib\site-packages\LangSegment\LangSegment.py", line 191, in _statistics
    if not "|" in language:lang_count[language] += int(len(text)*2) if language == "zh" else len(text)
TypeError: list indices must be integers or slices, not str

LangSegment.py, line 186
The error above is raised intermittently.

After adding some robustness checks, it no longer occurs:

Fixed code

@staticmethod
def _statistics(language, text):
    # Guard: the tracebacks above show _lang_count can end up as a list,
    # so indexing it with a string key raises TypeError. Reset it to a
    # defaultdict(int) whenever it is missing or of the wrong type.
    if LangSegment._lang_count is None or not isinstance(LangSegment._lang_count, defaultdict):
        LangSegment._lang_count = defaultdict(int)
    lang_count = LangSegment._lang_count
    if not "|" in language:
        # Chinese text is weighted double in the character count.
        lang_count[language] += int(len(text)*2) if language == "zh" else len(text)
    LangSegment._lang_count = lang_count
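For context, a minimal sketch (plain Python, independent of LangSegment) of the failure mode this guard protects against: indexing a list with a string key raises exactly the TypeError from the tracebacks above, while a defaultdict(int) accepts new string keys.

from collections import defaultdict

counts = []            # what _lang_count had apparently become upstream
try:
    counts["zh"] += 1  # TypeError: list indices must be integers or slices, not str
except TypeError as e:
    print(e)

counts = defaultdict(int)  # what the guard restores
counts["zh"] += 1          # fine: missing keys default to 0
print(counts["zh"])        # 1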

Classification logic for numeric segments

textlist = ["【齐鲁艺票通】恭喜您购票成功!","订单编号:","669775550013131,","取票码:","93342253;","请凭取票码在演出开始前30分钟到指定地点换取纸质票。"]
for text in textlist:
    print(LangSegment.getTexts(text))
# [{'lang': 'zh', 'text': '【齐鲁艺票通】恭喜您购票成功!'}]
# [{'lang': 'zh', 'text': '订单编号:'}]
# [{'lang': 'en', 'text': '669775550013131, '}]
# [{'lang': 'zh', 'text': '取票码:'}]
# [{'lang': 'en', 'text': '93342253; '}]
# [{'lang': 'zh', 'text': '请凭取票码在演出开始前30分钟到指定地点换取纸质票。'}]

When the whole sentence is passed in, detection is correct, but when the user chooses to split on punctuation, pure-digit fragments are classified as English. Could this be changed to follow the surrounding-context logic shown above?
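One possible caller-side workaround, not part of LangSegment itself (the helper below and its neighbor-voting rule are my own sketch): after collecting the per-fragment results, relabel pure-digit segments with the language of the nearest non-digit neighbor instead of the default 'en'.

import LangSegment

def relabel_digit_segments(segments):
    # Hypothetical post-pass: a segment containing digits but no letters
    # inherits the language of its nearest non-digit neighbor
    # (preferring the previous segment, else the next one).
    def is_digitish(s):
        s = s.strip()
        return bool(s) and any(c.isdigit() for c in s) and not any(c.isalpha() for c in s)
    out = []
    for i, seg in enumerate(segments):
        if is_digitish(seg["text"]):
            lang = out[-1]["lang"] if out else next(
                (n["lang"] for n in segments[i+1:] if not is_digitish(n["text"])), None)
            if lang:
                seg = dict(seg, lang=lang)
        out.append(seg)
    return out

segments = []
for text in textlist:  # the punctuation-split fragments from above
    segments += LangSegment.getTexts(text)
print(relabel_digit_segments(segments))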

Detection errors

Detection with the full ALL model. Input:
日本 chatgpt 萬 chatgpt
{'lang': 'ja', 'text': '日本', 'score': 0.72695434}
{'lang': 'en', 'text': 'chatgpt ', 'score': 0.75385803}
{'lang': 'ja', 'text': '萬 ', 'score': 0.84992695}
{'lang': 'en', 'text': 'chatgpt ', 'score': 0.75385803}

Problem cases: 日本, 日本語, and numerals (一~十, 百, 千, 萬, etc.):
日本 or 日本語 entered on its own gives a wrong result.
A single numeral sandwiched between English words gives a wrong result (switching to the zh/en/ja model avoids the error).
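The single-token cases are easy to reproduce with getTexts alone; a minimal check of the words the reporter lists:

import LangSegment

# Single CJK tokens reported as misdetected under the full ALL model:
# print the language and score each one gets when passed on its own.
for text in ["日本", "日本語", "萬"]:
    print(text, LangSegment.getTexts(text))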

Language-filter module logic

text = "English中文"
LangSegment.setLangfilters(["en"])
print(LangSegment.getTexts(text))
LangSegment.setLangfilters(["zh"])
print(LangSegment.getTexts(text))

Output:

[{'lang': 'en', 'text': 'English '}]
[{'lang': 'en', 'text': 'English '}]
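Note that the second call, with the filter set to ["zh"], should return the Chinese segment but still returns the English one. Until that is fixed, filtering can be done on the caller side; a minimal sketch, assuming getTexts returns all segments when no filter has been set:

import LangSegment

text = "English中文"
wanted = {"zh"}
# Caller-side filtering: detect everything, then keep only the wanted
# languages, instead of relying on setLangfilters.
segments = [d for d in LangSegment.getTexts(text) if d["lang"] in wanted]
print(segments)  # expected: [{'lang': 'zh', 'text': '中文'}]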

Looking to get in touch

Hi juntaosun, I'm a product manager; my newsletter is here: https://produck.zhubai.love/

I'd like to get in touch with you to discuss implementing an audio/video project. Any advice would be much appreciated.

My contact info: de_base64("d2VjaGF0OiBtYWRsaWZlcjEzMzcgLyBtYWlsOiBtYWRsaWZlckBsaXZlLmNvbQ==")

Wrong language detection

Input:
那我们拆分来看一下:Part A(传统阅读理解)

"Part A" is detected as Japanese.

A strange bug

Whenever the input contains two consecutive "加分!加分!", the result comes back empty, which then causes an error; a single "加分!" on its own does not. So when using this library, if an empty result comes back, add an extra check to make sure execution continues correctly; see the sketch below.
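A minimal sketch of the extra check the reporter suggests (the wrapper name and the fallback language are my own assumptions):

import LangSegment

def safe_get_texts(text, fallback_lang="zh"):
    # Guard against the empty-result case described above: drop segments
    # with missing or empty text, and if nothing usable remains, treat
    # the whole input as one segment in a caller-chosen fallback language.
    segments = [d for d in LangSegment.getTexts(text) if d.get("text")]
    if not segments:
        segments = [{"lang": fallback_lang, "text": text}]
    return segments

print(safe_get_texts("加分!加分!"))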

Characters are dropped when processing Chinese

Input:
每题0.5分,共10分,第二部分

Output:
每题0.5第二部分

The characters "分共10分" are lost. If I replace 分 with another character, such as 元, no characters are dropped.
