
Comments (3)

enkiv2 commented on September 24, 2024

I'd like to make a couple notes about what's going on under the hood here.

First, I generate a character set.

Each character is a series of strokes separated by angle changes. Originally this logic was written for pyturtle's pen-based system, which made a lot of sense for simulated handwriting. So each stroke feeds into the next one -- every character can be drawn without lifting the pen, with the exception of accents. (A character can have one or two dots or grave/acute accents -- if a character has two dots it's an umlaut, and if it has both an acute and a grave accent it has a caret.)
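Roughly, you can think of a character as a list of stroke-angle pairs plus an accent count, drawn with Python's standard turtle module. This is just a sketch to show the shape of the idea (not the actual script; all the names in it are made up):

```python
# Sketch only: a character is a list of (stroke_length, angle_change) pairs
# drawn without lifting the pen, plus 0-2 accent dots drawn afterwards.
import turtle

def draw_character(t, pairs, dots=0, unit=10):
    """pairs: list of (length_in_units, angle_change_in_degrees)."""
    t.pendown()
    for length, angle in pairs:
        t.forward(length * unit)   # one stroke...
        t.left(angle)              # ...then the angle change into the next stroke
    t.penup()                      # accents are the only pen-up part
    for _ in range(dots):
        t.goto(t.xcor() + 4, t.ycor() + unit)
        t.dot(3)

if __name__ == "__main__":
    t = turtle.Turtle()
    draw_character(t, [(1, 45), (0.5, -90), (1, 90)], dots=2)
    turtle.done()
```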

Every element of the character, with the exception of the accents, is actually phonetic: each stroke type is a consonant sound and each angle change is a vowel sound. (This is inspired by Hangul, where what appears to be a logogram is actually a cluster of up to three phonetic characters.) In this case we have up to five stroke-angle pairs per character. These phonetic readings aren't used, but in the original version of the script they appeared in the debug output.
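For illustration, the reading could come from two lookup tables along these lines (not the original mapping; the letters here are arbitrary):

```python
# Illustrative only: stroke types map to consonants, angle changes to vowels.
CONSONANTS = {("line", 1.0): "k", ("line", 0.5): "t",
              ("arc", 1.0): "m", ("arc", 0.5): "s"}
VOWELS = {45: "a", 90: "e", 180: "i", -45: "o", -90: "u"}

def read_character(pairs):
    """pairs: list of (stroke, angle) where stroke is (shape, relative_length)."""
    return "".join(CONSONANTS[stroke] + VOWELS[angle] for stroke, angle in pairs)

print(read_character([(("line", 1.0), 45), (("arc", 0.5), -90)]))  # -> "kasu"
```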

Strokes can be either full length or half length, and they can be either lines or semicircles. Angle changes are limited to 45-degree intervals (i.e., 45, 90, 180, -45, and -90). These limitations are intended to mimic the kinds of differences that might actually work in a handwritten language -- there needs to be a big threshold between distinct characters or else it's easy to misread.

A character set is between 20 and 36 characters -- about the same range as in reality for one- or two-sound characters in phonetic writing systems. Since ours actually has up to five syllables per character, we really should have many more, but that's a pain.
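Concretely, the stroke/angle inventory plus the character-set sampling might look something like this (again just a sketch with made-up names, not the actual script):

```python
import random

# Sketch: the whole stroke/angle inventory. Strokes are (shape, relative_length);
# angle changes are limited to a few coarse values so characters stay distinct.
STROKES = [("line", 1.0), ("line", 0.5), ("arc", 1.0), ("arc", 0.5)]
ANGLES = [45, 90, 180, -45, -90]

def random_character(rng, max_pairs=5):
    """One character: 1-5 stroke-angle pairs plus 0-2 accent dots."""
    pairs = tuple((rng.choice(STROKES), rng.choice(ANGLES))
                  for _ in range(rng.randint(1, max_pairs)))
    return pairs, rng.randint(0, 2)

def generate_charset(seed=None):
    """Sample a set of 20-36 distinct characters."""
    rng = random.Random(seed)
    size = rng.randint(20, 36)
    charset = set()
    while len(charset) < size:
        charset.add(random_character(rng))
    return sorted(charset)
```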

Then, I create a vocabulary by combining random characters. Originally I had a bias toward short words and tied this bias to word frequency, but I don't do that anymore because it was causing problems with the output. The vocabulary is supposed to be about 300 words, each between one and five characters long.
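In sketch form (same caveats as above):

```python
import random

def generate_vocabulary(charset, size=300, max_len=5, seed=None):
    """Sketch: ~300 distinct words, each 1-5 random characters from the charset."""
    rng = random.Random(seed)
    vocab = set()
    while len(vocab) < size:
        vocab.add(tuple(rng.choice(charset) for _ in range(rng.randint(1, max_len))))
    return list(vocab)
```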

Once I have a vocabulary, I make something resembling a grammar by creating a bunch of sentences whose Markov model will resemble a Markov model of a real language. Basically, I create a sentence pool and append randomly chosen words from the vocabulary to randomly chosen sentences in the pool while the pool grows. The result is that some words will have significantly stronger associations, so once we build a Markov model, the distribution of output produced by chaining from that model should be roughly Zipfian. I think. I didn't actually calculate it out properly, so I might be completely wrong.
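A sketch of that pool-growing step and the resulting bigram model (illustrative parameters and names, not the actual script):

```python
import random
from collections import defaultdict

def build_sentence_pool(vocab, rounds=5000, seed=None):
    """Grow a pool of 'sentences' by appending random words to random sentences."""
    rng = random.Random(seed)
    pool = [[rng.choice(vocab)]]
    for _ in range(rounds):
        if rng.random() < 0.1:                 # occasionally start a new sentence
            pool.append([rng.choice(vocab)])
        else:                                  # usually extend an existing one
            rng.choice(pool).append(rng.choice(vocab))
    return pool

def markov_model(pool):
    """Bigram counts: word -> {following word: count}."""
    model = defaultdict(lambda: defaultdict(int))
    for sentence in pool:
        for a, b in zip(sentence, sentence[1:]):
            model[a][b] += 1
    return model

def chain(model, length, seed=None):
    """Generate a stream of words by walking the bigram model, weighted by count."""
    rng = random.Random(seed)
    word = rng.choice(list(model))
    out = [word]
    while len(out) < length:
        followers = model.get(word)
        if not followers:
            word = rng.choice(list(model))     # dead end: restart somewhere random
        else:
            word = rng.choices(list(followers), weights=list(followers.values()))[0]
        out.append(word)
    return out
```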

I create an image for every word in the vocabulary, and then chain & render the result onto pages. I was getting a lot of single-word lines so I created a filter that merged lines 98% of the time, which brought the page count down to something more reasonable.
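The merge filter itself is just a weighted coin flip per line break, something like:

```python
import random

def merge_lines(lines, merge_prob=0.98, seed=None):
    """Sketch: join each line onto the previous one 98% of the time."""
    rng = random.Random(seed)
    merged = [list(lines[0])] if lines else []
    for line in lines[1:]:
        if rng.random() < merge_prob:
            merged[-1].extend(line)      # merge into the previous line
        else:
            merged.append(list(line))    # keep the break
    return merged
```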

In my first PDF the characters are a little hard to see, since the base stroke unit is so small (5 pixels). So I created a second one with a 10-pixel base stroke length: https://github.com/enkiv2/misc/blob/master/nanogenmo-2017/asemic-10pt.pdf

Since getting kerning right is really hard, I turned on cursive mode & created another version with a connected script: https://github.com/enkiv2/misc/blob/master/nanogenmo-2017/asemic-10pt-cursive.pdf

All of these have 50k or more 'words'.


eoinnoble commented on September 24, 2024

This is fantastic; I never thought of using Python for something like this!

