For some reason, I decided one night that I wanted to get a lot of fonts. Really a lot of them. An hour later I had some scripts pulling down fonts, and a few days later I had over 50,000 fonts on my computer.

Then I decided to convert them to bitmaps. This requires a little more care than it seems: you need to crop each character in the font so that the characters align vertically, and then scale everything to fit the bitmap. I started with a 512 * 512 bitmap of all the characters. For every font, we find the maximum and minimum y of the bounding box, and we find the same for each individual character. After some more juggling of numbers, I was able to scale all the characters down to 64 * 64.
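The bookkeeping could look roughly like the sketch below. This is not the code used for the dataset, just one way to do it under stated assumptions: font coordinates have y pointing up, `max_glyph_width` is the widest bounding box in the font, and the function name and arguments are made up for illustration. The key point is that the vertical position is measured from the font-wide top, so every character in a font shares the same vertical frame.

```python
def glyph_placement(font_y_min, font_y_max, max_glyph_width, bbox, size=64):
    """Map a glyph bounding box (font units, y up) into a size x size bitmap.

    Returns (scale, left, top, right, bottom) in pixel coordinates, y down.
    Hypothetical helper: the names and the centering rule are assumptions.
    """
    x_min, y_min, x_max, y_max = bbox
    # One scale per font, chosen so the tallest and widest glyphs both fit.
    scale = size / max(font_y_max - font_y_min, max_glyph_width)
    # Vertical position is relative to the font-wide top, not the glyph's own
    # top, so all glyphs of a font stay aligned with each other.
    top = (font_y_max - y_max) * scale
    bottom = (font_y_max - y_min) * scale
    # Horizontal position: simply center the glyph.
    width = (x_max - x_min) * scale
    left = (size - width) / 2
    return scale, left, top, left + width, bottom


# Toy numbers: a font spanning y in [-200, 800] with glyphs up to 900 units wide.
x_box = glyph_placement(-200, 800, 900, (50, 0, 450, 500))     # like an "x"
g_box = glyph_placement(-200, 800, 900, (40, -200, 460, 500))  # like a "g"
```

With these toy numbers both glyphs start at the same row, and only the "g" reaches the bottom edge of the bitmap, which is the vertical alignment described above.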

The result is a tensor of size 56443 * 62 * 64 * 64. Exercise for the reader: where does the number 62 come from? I saved it as a small (13GB) HDF5 file that you can download here: fonts.hdf5.
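As a quick sanity check on the file size, here is the arithmetic in Python. The one-byte-per-pixel storage is an assumption (the post does not state the dtype), but it is the one that matches the 13GB figure.

```python
n_fonts, n_chars, height, width = 56443, 62, 64, 64

n_pixels = n_fonts * n_chars * height * width
bytes_per_pixel = 1  # assumption: pixels stored as 8-bit integers
n_bytes = n_pixels * bytes_per_pixel
size_gib = n_bytes / 2**30

print(f"{n_pixels:,} pixels, about {size_gib:.1f} GiB")
# prints: 14,333,812,736 pixels, about 13.3 GiB
```

Stored as 32-bit floats instead, the same tensor would be four times larger.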

Taking the average of all fonts, we get:

Hopefully, it should now be clear where the number 62 came from.

The median is much less blurry than the average.
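A toy example shows why. The sketch below uses three made-up 1 * 4 "images" of a stroke that does not sit in exactly the same place in every font. It is plain Python for illustration, not the code used to make the pictures.

```python
from statistics import mean, median


def pixelwise(stat, images):
    """Apply stat (e.g. mean or median) to each pixel position across images."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[stat([img[r][c] for img in images]) for c in range(cols)]
            for r in range(rows)]


# Three toy "fonts": the stroke is shifted by one pixel in the last one.
images = [
    [[0, 1, 1, 0]],
    [[0, 1, 1, 0]],
    [[0, 0, 1, 1]],
]

mean_img = pixelwise(mean, images)      # fractional values where fonts disagree
median_img = pixelwise(median, images)  # every pixel stays either 0 or 1
```

The mean ends up with in-between gray values wherever the fonts disagree, while the median of an odd number of black-and-white pixels is always black or white.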

Both the mean and the median are well-formed and easy to read! However, individual fonts are all over the place.

I guess I was asking for it by stealing fonts from various sketchy places on the web. In particular, many fonts don’t even have lowercase versions of the characters. A few fonts are just missing particular characters and output a rectangle instead. And look at the ridiculous Power Rangers figure for the lowercase “c”!

**Neural network training**

Now let’s train a neural network that generates characters! Specifically, I wanted to create a “font vector”, a vector in a latent space that “defines” a particular font. That way, we embed all fonts in a space where similar fonts have similar vectors.

I built a simple neural network using Lasagne / Theano (check out the code here). It took a very long time to converge, probably because of the large amount of data and parameters. After running for *several weeks*, the model converged to something that looks decent.

Some notes about the model

- Four fully connected hidden layers, each 1024 wide.
- The final layer has 4096 units (64 * 64) with a sigmoid nonlinearity, so that the output is between 0 (white) and 1 (black).
- L1 loss between prediction and target. This works much better than L2, which produces very “gray” images. You can see this qualitatively in the pictures above.
- A fairly strong L2 regularization of all parameters.
- Leaky rectified linear unit nonlinearity on each layer (alpha = 0.01).
- The first layer is 102D: each font’s 40D vector concatenated with a 62D binary one-hot vector of which character it is.
- The learning rate is 1.0, which is surprisingly high, but it seemed to work well. It is decreased by a factor of three whenever an epoch brings no improvement on the 10% test set.
- The size of the mini-batch is 512 – for some strange reason, it seems that the larger the mini-batch, the faster the convergence.
- No dropout; it didn’t seem to help. I did add some Gaussian noise (sigma 0.03) to the font vector, which qualitatively seemed to help a bit.
- Very simple data augmentation by randomly blurring the input with a sigma sampled from [0, 1]. My theory was that this would help with fitting characters that have thin lines.
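To make the shapes in these notes concrete, here is a small sketch in plain Python. It is not the Lasagne model itself. The character ordering in `CHARS`, the helper names, and the presence of a bias term on every layer are all assumptions made for illustration.

```python
import math
import string

# Assumed character set and ordering: 26 uppercase + 26 lowercase + 10 digits.
CHARS = string.ascii_uppercase + string.ascii_lowercase + string.digits
FONT_DIM = 40

# Input, four hidden layers of width 1024, and a 64 * 64 output layer.
LAYER_SIZES = [FONT_DIM + len(CHARS), 1024, 1024, 1024, 1024, 64 * 64]


def make_input(font_vector, char):
    """Concatenate a 40D font vector with a 62D one-hot character vector."""
    one_hot = [0.0] * len(CHARS)
    one_hot[CHARS.index(char)] = 1.0
    return list(font_vector) + one_hot


def leaky_relu(x, alpha=0.01):
    """Hidden-layer nonlinearity: identity for x > 0, slope alpha otherwise."""
    return x if x > 0 else alpha * x


def sigmoid(x):
    """Output nonlinearity, squashing each pixel into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def count_parameters(sizes):
    """Weights plus biases of a fully connected stack (biases are assumed)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


network_input = make_input([0.0] * FONT_DIM, "g")
n_parameters = count_parameters(LAYER_SIZES)  # excludes the font vectors
```

Under these assumptions the dense layers hold about 7.5 million parameters, and the 40D vectors for roughly 56k fonts add a bit over 2 million more.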

All the code is in the programmingshots / deep-fonts GitHub repository.

After convergence, we end up with a 40D embedding for each of the 50k fonts. The embeddings end up looking roughly like a multivariate normal distribution. The distribution of each of the 40 dimensions is as follows.

**Playing with the model**

First, let’s try to recreate real font characters with characters generated by the network. Let’s plot the real characters together with the model outputs. For each pair below, the real character is on the left and the model output is on the right.

These are all characters drawn from the *test set*, so the network has not seen any of them during training. All we are telling the network is (a) which font it is and (b) which character it is. The model saw other characters of the same font during training, so it has to infer the unseen test examples from those training examples.

The network does a decent job on most characters, but gives up on some of the more difficult ones. For example, characters with thin black lines are very hard for the model to predict. This is because rendering the line just a few pixels to the side costs twice as much as rendering blank space.
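The factor of two is easy to check on a toy image. The sketch below uses a made-up target (a one-pixel-wide vertical line on a 64 * 64 canvas) and sums the absolute pixel differences. The helper names are for illustration only.

```python
SIZE = 64


def blank():
    """An all-white 64 * 64 image (0 = white, 1 = black)."""
    return [[0.0] * SIZE for _ in range(SIZE)]


def vertical_line(column):
    """A one-pixel-wide black vertical line at the given column."""
    return [[1.0 if c == column else 0.0 for c in range(SIZE)]
            for _ in range(SIZE)]


def l1_loss(prediction, target):
    """Sum of absolute pixel differences."""
    return sum(abs(p - t)
               for pred_row, target_row in zip(prediction, target)
               for p, t in zip(pred_row, target_row))


target = vertical_line(30)
loss_blank = l1_loss(blank(), target)                # misses 64 black pixels
loss_shifted = l1_loss(vertical_line(33), target)    # 64 missed + 64 misplaced
```

The shifted line pays once for the pixels it failed to draw and once more for the pixels it drew in the wrong place, so predicting nothing at all is the cheaper mistake.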

We can also interpolate between different fonts in the continuous space. Since every font is a vector, we can create arbitrary font vectors and generate output from them. Let’s sample four fonts, place them at the corners of a square, and interpolate between them.
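One straightforward way to fill in such a square is bilinear interpolation of the corner vectors. The sketch below assumes that scheme (the post does not say exactly how the square was filled in) and uses toy 2D vectors instead of real 40D font vectors. Each resulting vector would then be fed to the network to render a character.

```python
def lerp(a, b, t):
    """Linear interpolation between vectors a and b, with t in [0, 1]."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def bilinear(top_left, top_right, bottom_left, bottom_right, u, v):
    """Blend four corner vectors; u runs left to right, v top to bottom."""
    top = lerp(top_left, top_right, u)
    bottom = lerp(bottom_left, bottom_right, u)
    return lerp(top, bottom, v)


def interpolation_grid(corners, steps):
    """Return a steps x steps grid of interpolated vectors (steps >= 2)."""
    tl, tr, bl, br = corners
    return [[bilinear(tl, tr, bl, br, col / (steps - 1), row / (steps - 1))
             for col in range(steps)]
            for row in range(steps)]


# Toy stand-ins for four font vectors.
corners = ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
grid = interpolation_grid(corners, steps=3)
```

The four corners of the grid reproduce the sampled fonts exactly, and the center cell is the plain average of all four.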

Certain characters have multiple forms that we can interpolate between, for example lowercase *g*:

We can also pick a font vector and generate new fonts from random perturbations of it.

(By the way, gods of the Internet, please forgive me for wasting bandwidth on all the animated GIFs in this blog post!)

We can also generate completely new fonts. If we model the distribution of font vectors as a multivariate normal distribution, we can sample random vectors from it and look at the fonts they produce. The figure below interpolates between a few of these vectors.
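Sampling from a fitted multivariate normal comes down to estimating a mean and covariance, factoring the covariance, and transforming standard normal noise. The sketch below does this in plain Python on random toy vectors. In practice a numerical library would be the natural choice, and the function names and the small `jitter` term are assumptions made for illustration.

```python
import math
import random


def mean_and_cov(vectors):
    """Sample mean and (unbiased) covariance matrix of a list of vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    cov = [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / (n - 1)
            for j in range(d)]
           for i in range(d)]
    return mean, cov


def cholesky(matrix, jitter=1e-9):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                # jitter is a tiny diagonal boost for numerical safety
                lower[i][j] = math.sqrt(matrix[i][i] + jitter - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_font_vector(mean, lower, rng):
    """Draw mean + lower @ z, where z is standard normal noise."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


rng = random.Random(0)
# Toy stand-in for the learned embeddings: 200 random 3D vectors.
toy_vectors = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
mean, cov = mean_and_cov(toy_vectors)
lower = cholesky(cov)
new_vector = sample_font_vector(mean, lower, rng)
```

Feeding `new_vector` (40D in the real setting) to the network together with each one-hot character vector would render a font that exists nowhere in the training data.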

What’s interesting here is that the model has learned that many fonts use uppercase letters in the lowercase range. The network interpolates seamlessly between Q and q. Here is an example of the network interpolating very slowly between two fonts where this is the main difference.

With all the fonts in a continuous space, another cool thing we can do is run t-SNE on them and embed all the fonts in the 2D plane. Here is a small excerpt of such an embedding:

**Final remarks**

There are many other fun things you can do, and clearly there is room for improvement here. In particular, if I had more time, I would definitely explore generative adversarial models, which seem to be good at generating images. A few other things, such as batch normalization and parametric leaky rectifiers, should be relatively easy to implement. And finally, the network architecture itself could probably benefit from deconvolutions instead of fully connected layers.

Feel free to download the data and play around with it if you’re interested!
