Comments (1)
System configuration:
- Server: 24 cores
- RAM: 80 GB
- Processes: 10
I tried two approaches:
- A Queue to fetch text lines
- Pool.map
Problem: after processing about 100 MB of text, RAM fills up.
Below is the code that uses Pool.map; I have already deleted the Queue version.
import multiprocessing
from gensim.models import Phrases
from gensim.models.phrases import Phraser
import pickle
from underthesea import word_sent
import time
from multiprocessing import Queue, Pool

# Load the whole corpus into memory, one stripped sentence per list entry.
sentences = []
with open('corpus.txt', 'r') as fs:
    for count, sentence in enumerate(fs):
        if count % 10000 == 0:
            print(count)
        sentence = sentence.strip()
        sentences.append(sentence)


def run(sent):
    # Each worker writes its segmented sentence to the file named after the worker.
    p_name = multiprocessing.current_process().name
    files_fs[p_name].write("%s\n" % "".join(word_sent(sent, format="text")))


if __name__ == '__main__':
    num_process = 20
    # One output file per worker; the workers inherit the open handles when forked.
    files_fs = {}
    for i in range(1, num_process + 1):
        file_name = './corpus/ForkPoolWorker-%s' % i
        files_fs['ForkPoolWorker-%s' % i] = open(file_name, 'w')
    p = Pool(num_process)
    print(p.map(run, sentences))
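One possible direction, sketched here under assumptions rather than as a confirmed fix: the script above holds every sentence in a list and `Pool.map` collects the whole result list before returning, so streaming the file with `Pool.imap` and writing from the parent process should keep far less in memory at once. The names `tokenize`, `read_lines`, and `segment_corpus` are hypothetical, and `tokenize` is only a stand-in so the sketch runs without underthesea; replace its body with `word_sent(sentence, format="text")` for real segmentation.

```python
import multiprocessing


def tokenize(sentence):
    # Placeholder tokenizer (assumption): normalizes whitespace only.
    # Swap in underthesea's word_sent(sentence, format="text") here.
    return " ".join(sentence.split())


def read_lines(path):
    # Yield stripped lines one at a time instead of building a list.
    with open(path, "r", encoding="utf-8") as fs:
        for line in fs:
            yield line.strip()


def segment_corpus(src_path, dst_path, num_process=4, chunksize=1000,
                   tokenizer=tokenize):
    # imap pulls work from the generator in chunks and yields results in
    # input order, so the parent can write each one as soon as it arrives.
    count = 0
    with multiprocessing.Pool(num_process) as pool, \
            open(dst_path, "w", encoding="utf-8") as out:
        for segmented in pool.imap(tokenizer, read_lines(src_path),
                                   chunksize=chunksize):
            out.write(segmented + "\n")
            count += 1
    return count
```

A call such as `segment_corpus('corpus.txt', './corpus/segmented.txt', num_process=10)` (the output path is made up for illustration) would write a single output file in input order. Writing only from the parent also avoids depending on worker names like `ForkPoolWorker-1`, which are not guaranteed to stay within `1..num_process` if the pool replaces a worker.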