Comments (2)
I would like an optional encoding flag that defaults to "utf-8" but lets you specify other encodings; in some cases I have to use "latin-1".
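A minimal sketch of the requested flag, assuming a dataset-reading step that opens plain text files. The `--encoding` option, the positional `in_text` argument, and the `load_dataset_text` and `parse_args` helpers are hypothetical and are not taken from gpt-2's actual encode.py.

```python
import argparse
import glob
import os
from typing import List, Optional


def load_dataset_text(path: str, encoding: str = "utf-8") -> List[str]:
    """Read one file, or every file under a directory, with the given encoding.

    Hypothetical helper for illustration; not gpt-2's encode.py.
    """
    if os.path.isdir(path):
        # "**" with recursive=True also matches files directly inside `path`
        candidates = glob.glob(os.path.join(path, "**", "*"), recursive=True)
        paths = sorted(p for p in candidates if os.path.isfile(p))
    else:
        paths = [path]
    texts = []
    for p in paths:
        # errors="strict" is the default, so bytes invalid in `encoding` raise UnicodeDecodeError
        with open(p, "r", encoding=encoding) as f:
            texts.append(f.read())
    return texts


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse a hypothetical command line with an optional --encoding flag."""
    parser = argparse.ArgumentParser(description="Read a text dataset (sketch).")
    parser.add_argument("in_text", help="input file or directory")
    parser.add_argument(
        "--encoding",
        default="utf-8",
        help='encoding of the input files, e.g. "latin-1" (default: utf-8)',
    )
    return parser.parse_args(argv)
```

Usage: `args = parse_args(["data/", "--encoding", "latin-1"])`, then `load_dataset_text(args.in_text, encoding=args.encoding)`. Passing the encoding straight to `open()` keeps strict error handling, so a file that is not valid in the chosen encoding fails with `UnicodeDecodeError`. Note that "latin-1" maps every byte to a character and therefore never fails, so a wrong "latin-1" guess silently produces mojibake instead of an error.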
from gpt-2.
Yeah, I'm not yet 100% sure myself whether it should be UTF-8, or whether one should save the dataset in the system-default encoding instead of UTF-8 and open it as such. I'm trying to train it on Polish text to see the results. Unfortunately, it doesn't want to use Polish accented letters; for example, it replaces ł with a plain l in the samples. Maybe I'm missing something, or does it still need more training? (Although it does use ó, which usually exists in 1-byte encodings.)
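The remark about 1-byte encodings can be checked directly. The sketch below encodes a single letter with a few standard Python codecs; the `encoding_report` helper is hypothetical. Windows-1250 ("cp1250") and ISO-8859-2 are the usual single-byte encodings for Polish.

```python
from typing import Dict, Optional, Tuple

ENCODINGS: Tuple[str, ...] = ("utf-8", "cp1250", "iso-8859-2", "latin-1")


def encoding_report(text: str, encodings: Tuple[str, ...] = ENCODINGS) -> Dict[str, Optional[bytes]]:
    """Map each encoding name to the bytes of `text`, or None if it cannot be encoded."""
    report: Dict[str, Optional[bytes]] = {}
    for enc in encodings:
        try:
            report[enc] = text.encode(enc)
        except UnicodeEncodeError:
            report[enc] = None  # some character has no byte value in this encoding
    return report


for letter in ("\u00f3", "\u0142"):  # ó and ł
    for enc, data in encoding_report(letter).items():
        # print the code point instead of the letter so the output is plain ASCII
        print(f"U+{ord(letter):04X}", enc, data)
```

ó (U+00F3) is the single byte 0xF3 in Latin-1, Windows-1250 and ISO-8859-2, whereas ł (U+0142) has no Latin-1 byte at all and is 0xB3 in the two Central European encodings; in UTF-8 both letters take two bytes. So a dataset forced through "latin-1" can keep ó but cannot represent ł.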
EDIT: Never mind the above. The problem is the console: the output is UTF-8, which my CMD window simply doesn't display correctly; it would need to be converted to ANSI using the Polish code page before being printed. So in my case, UTF-8 (without BOM!) is the right way to read datasets. The sample files look OK.
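The "without BOM" caveat is easy to check in Python, shown here as a small sketch (the `read_text` and `save_sample` helpers are hypothetical). A file saved as "UTF-8 with BOM" still decodes under plain "utf-8", but the BOM survives as U+FEFF at the start of the text and would end up in the training data, while Python's "utf-8-sig" codec strips it. Writing samples to a UTF-8 file, which is what already looks OK here, avoids depending on the console's code page.

```python
def read_text(path: str, encoding: str = "utf-8-sig") -> str:
    """Read a dataset file; "utf-8-sig" drops a leading BOM if one is present."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def save_sample(text: str, path: str) -> None:
    """Write a generated sample as UTF-8 without a BOM, bypassing the console encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


raw = "\ufeffza\u017c\u00f3\u0142\u0107".encode("utf-8")  # "zażółć" preceded by a BOM (EF BB BF)
print(ascii(raw.decode("utf-8")))      # plain "utf-8" keeps the BOM as '\ufeff'
print(ascii(raw.decode("utf-8-sig")))  # "utf-8-sig" strips it
```

"utf-8-sig" also decodes files that have no BOM, so using it for reading costs nothing; for writing, plain "utf-8" is the choice that produces files without a BOM.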
Related Issues (20)
- gpt2 translation task
- How to train in multiple gpu
- Early Stopping
- OOM on 345M with GPU
- "ModelNotFoundError": No model named "encoder"
- ModuleNotFoundError: No module named 'encoder'
- Where to enter model name as parameter
- Cannot download the models
- File "encode.py", line 23
- GPT2 Fine Tuning does not support Ampere (RTX 3000s) Cards
- Dockerfile no longer accurate?
- ModuleNotFoundError: No module named 'Sampler'
- I can't train my dataset
- Need some help getting Tensor Rematerialization to work
- this parameter cannot be zero when i using my file history.npz
- Encoder doesn't work
- avg stays in 2.6-2.9 range
- how to use trained data to generate text?
- GPT-2 doesn't have Encode.py or Train.py.
- Training on TPU