Comments (3)
I'd like to make a couple of notes about what's going on under the hood here.
First, I generate a character set.
Each character is a series of strokes separated by angle changes. This logic was originally written for pyturtle's pen-based system, which made a lot of sense for simulated handwriting. So, each stroke feeds into the next -- every character can be drawn without lifting the pen, with the exception of accents. (A character can have one or two dots or grave/acute accents -- two dots make an umlaut, and an acute plus a grave make a caret.)
Every element of the character except the accents is actually phonetic: each stroke type is a consonant sound and each angle change is a vowel sound. (This is inspired by Hangul, where what appears to be a logogram is actually a cluster of up to three phonetic characters.) In this case we have up to five stroke-angle pairs. These phonetic readings aren't used, but the original version of the script included them in the debug output.
Strokes can be either full length or half length, and they can be either lines or semicircles. Angle changes are limited to 45 degree intervals (i.e., 45, 90, 180, -45, and -90). These limitations are intended to mimic the kinds of differences that might actually work in a hand-written language -- there needs to be a big threshold between distinct characters or else it's easy to misread.
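A minimal sketch of that character-generation step, assuming the structure described above (the names and exact representation are my own, not the script's):

```python
import random

# Allowed building blocks, per the constraints above: strokes are lines
# or semicircles at full or half length, and the angle changes between
# consecutive strokes come from a fixed 45-degree set.
SHAPES = ("line", "semicircle")
LENGTHS = ("full", "half")
ANGLES = (45, 90, 180, -45, -90)

def random_character(max_strokes=5):
    """Return one character as a list of stroke dicts; each stroke
    except the last carries the angle change into the next stroke."""
    n = random.randint(1, max_strokes)
    strokes = []
    for i in range(n):
        stroke = {"shape": random.choice(SHAPES),
                  "length": random.choice(LENGTHS)}
        if i < n - 1:  # no angle change after the final stroke
            stroke["angle"] = random.choice(ANGLES)
        strokes.append(stroke)
    return strokes
```

Since every stroke connects to the next through an angle change, drawing the list in order with a pen-based API (like turtle) never requires lifting the pen, which matches the behavior described above.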
A character set is between 20 and 36 characters -- about the same range you see in real phonetic writing systems with one- or two-sound characters. Since ours actually has up to five syllables per character, we really ought to have many more, but that's a pain.
Then, I create a vocabulary by combining random characters. Originally, I had a bias toward short words and tied this bias to word frequency, but I dropped that because it was causing problems with the output. The vocabulary is supposed to be about 300 words, each between one and five characters long.
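The vocabulary step might look roughly like this (a hypothetical sketch, with no length bias, matching the current behavior described above):

```python
import random

def make_vocabulary(charset, size=300, max_len=5):
    """Build `size` distinct words, each 1..max_len characters long,
    by sampling characters uniformly from the character set."""
    vocab = set()
    while len(vocab) < size:
        length = random.randint(1, max_len)
        vocab.add(tuple(random.choices(charset, k=length)))
    return list(vocab)
```

With a 20-36 character set and words up to five characters long, the space of possible words is large enough that collecting 300 distinct ones terminates quickly.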
Once I have a vocabulary, I make something resembling a grammar by creating a bunch of sentences whose Markov model will resemble a Markov model of a real language. Basically, I create a sentence pool and append randomly chosen words from the vocabulary to randomly chosen sentences in the pool while growing the pool. The result is that some words end up with significantly stronger associations, so once we build a Markov model, the distribution of text produced by chaining from that model should be Zipfian. I think. I haven't actually calculated it out properly, so I might be completely wrong.
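One way the pool-growing step could be sketched (again hypothetical names; the growth probability and step counts are assumptions, not values from the script):

```python
import random

def build_sentence_pool(vocab, steps=2000, max_sentences=200, p_new=0.1):
    """Grow a pool of sentences by repeatedly appending a random word
    to a randomly chosen sentence, occasionally starting a new one.
    Word adjacencies that land in frequently-extended sentences get
    reinforced, so some transitions end up much stronger than others."""
    pool = [[random.choice(vocab)]]
    for _ in range(steps):
        if len(pool) < max_sentences and random.random() < p_new:
            pool.append([random.choice(vocab)])  # grow the pool
        random.choice(pool).append(random.choice(vocab))
    return pool

def markov_model(pool):
    """Bigram transition table over the sentence pool: for each word,
    the list of words observed to follow it (with repetition, so
    sampling from the list respects the observed frequencies)."""
    model = {}
    for sentence in pool:
        for a, b in zip(sentence, sentence[1:]):
            model.setdefault(a, []).append(b)
    return model
```

Chaining then just means repeatedly sampling `random.choice(model[current_word])`. Whether the resulting word-frequency distribution is actually Zipfian is, as noted above, unverified.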
I create an image for every word in the vocabulary, then chain & render the result onto pages. I was getting a lot of single-word lines, so I added a filter that merges lines 98% of the time, which brought the page count down to something more reasonable.
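The line-merging filter reduces to something like this (a sketch of the idea, not the actual rendering code -- here lines are just strings for illustration):

```python
import random

def merge_short_lines(lines, p_merge=0.98):
    """Post-process a list of lines: with probability p_merge, fold
    each line into the previous one instead of starting a new line.
    With p_merge=0.98, only ~2% of line breaks survive."""
    if not lines:
        return []
    merged = [lines[0]]
    for line in lines[1:]:
        if random.random() < p_merge:
            merged[-1] = merged[-1] + " " + line  # join onto previous line
        else:
            merged.append(line)
    return merged
```

Since content is only concatenated and never dropped, the filter shortens the document (fewer, longer lines) without losing any words.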
In my first PDF the characters are a little hard to see, since the base stroke unit is so small (5 pixels). So, I created a second one with a 10-pixel base stroke length: https://github.com/enkiv2/misc/blob/master/nanogenmo-2017/asemic-10pt.pdf
Since getting kerning right is really hard, I turned on cursive mode & created another version with a connected script: https://github.com/enkiv2/misc/blob/master/nanogenmo-2017/asemic-10pt-cursive.pdf
All of these have 50k or more 'words'.
from 2017.
This is fantastic -- I never thought of using Python for something like this!