A Simple Program Generator for Benchmarking Generative Models

Tuesday, November 28, 2017

For my master's thesis I created huzzer, a simple generator of Haskell programs, to produce data for my experiments. You can try it out to see what sort of programs it makes:

>>>>> pip3 install huzzer && huzz -a 3 -e 4
module Generated (function0,function1,function2,function3) where

function0 :: Int -> Bool
function0 a = ((a * a) < ((mod a a) * 1))

function1 :: Bool -> Int -> Bool
function1 a b = (b /= (div 5 b))

function2 :: Bool -> Bool
function2 a = ((mod (1 * 2) (8 - 4)) > 1)

function3 :: Bool -> Bool -> Int -> Bool
function3 a b c = (c == (c * (c + 5)))
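To give a sense of how a generator like this can work, here is a minimal type-directed sketch in Python. It is not huzzer's implementation. The operator set is inferred from the sample output above. The functions `gen_int`, `gen_bool`, `gen_function` and `gen_module`, along with their `max_args` and `depth` parameters, are hypothetical and are not meant to correspond to huzzer's `-a` and `-e` flags. Each expression is generated for a requested type (Int or Bool), so every function is well-typed by construction.

```python
import random

# Hypothetical operator set, inferred from the sample output (not huzzer's source).
INT_INFIX = ["+", "-", "*"]          # Int -> Int -> Int, written infix
INT_PREFIX = ["div", "mod"]          # Int -> Int -> Int, written prefix
COMPARISONS = ["<", ">", "==", "/="] # Int -> Int -> Bool


def gen_int(args, depth, rng):
    """Random Int expression over the Int-typed arguments and digits 1-9."""
    int_args = [name for name, ty in args if ty == "Int"]
    if depth == 0 or rng.random() < 0.3:
        # Leaf: an Int argument or a single-digit literal.
        return rng.choice(int_args + [str(rng.randint(1, 9))])
    left = gen_int(args, depth - 1, rng)
    right = gen_int(args, depth - 1, rng)
    if rng.random() < 0.5:
        return f"({left} {rng.choice(INT_INFIX)} {right})"
    return f"({rng.choice(INT_PREFIX)} {left} {right})"


def gen_bool(args, depth, rng):
    """Random Bool expression: a comparison between two Int expressions."""
    left = gen_int(args, depth, rng)
    right = gen_int(args, depth, rng)
    return f"({left} {rng.choice(COMPARISONS)} {right})"


def gen_function(index, max_args, depth, rng):
    """One top-level function: a type signature plus a random body."""
    n_args = rng.randint(1, max_args)
    names = "abcdefgh"[:n_args]  # argument names drawn from a-h
    arg_types = [rng.choice(["Int", "Bool"]) for _ in names]
    ret_type = rng.choice(["Int", "Bool"])
    args = list(zip(names, arg_types))
    if ret_type == "Bool":
        body = gen_bool(args, depth, rng)
    else:
        body = gen_int(args, depth, rng)
    name = f"function{index}"
    signature = " -> ".join(arg_types + [ret_type])
    return f"{name} :: {signature}\n{name} {' '.join(names)} = {body}"


def gen_module(n_functions, max_args=3, depth=2, seed=None):
    """A complete module exporting function0 .. function{n_functions - 1}."""
    if not 1 <= max_args <= 8:
        raise ValueError("max_args must be between 1 and 8 (argument names a-h)")
    rng = random.Random(seed)  # seeded for reproducible datasets
    exports = ",".join(f"function{i}" for i in range(n_functions))
    header = f"module Generated ({exports}) where"
    functions = [gen_function(i, max_args, depth, rng) for i in range(n_functions)]
    return "\n\n".join([header] + functions)
```

Calling `print(gen_module(4, seed=1))` prints a module with the same shape as the huzz output above. Like that output, the sketch does not guard against division by zero (`mod a a` above fails at runtime when `a` is 0), which is acceptable when the programs are only used as data.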

huzzer produces very simple Haskell code, and it can be parameterized to create more complex (or simpler) functions. The generated code is very limited: only the numbers 1 to 9, the letters a to h, and so on. As a result, the entire subset of Haskell that huzzer can generate uses a vocabulary of only 54 distinct tokens (excluding whitespace). A vocabulary this small makes it tractable to train generative models such as VAEs and GANs on these programs.
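One way to turn such programs into model inputs is sketched below in Python, assuming the code has already been split into tokens. The special `<pad>` and `<eos>` tokens and the helper names `build_vocab`, `encode`, `one_hot` and `decode` are hypothetical additions for illustration, not part of huzzer. Every program is mapped to a fixed-length sequence of integer ids, which can then be one-hot encoded.

```python
from typing import Dict, List, Sequence

PAD, EOS = "<pad>", "<eos>"  # hypothetical special tokens, not part of huzzer


def build_vocab(corpus: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign a stable integer id to every distinct token in the corpus."""
    vocab = {PAD: 0, EOS: 1}
    for tokens in corpus:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(tokens: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Token ids followed by EOS, right-padded with PAD to exactly max_len."""
    ids = [vocab[tok] for tok in tokens] + [vocab[EOS]]
    if len(ids) > max_len:
        raise ValueError(f"{len(ids)} ids do not fit in max_len={max_len}")
    return ids + [vocab[PAD]] * (max_len - len(ids))


def one_hot(ids: Sequence[int], vocab_size: int) -> List[List[int]]:
    """A len(ids) x vocab_size matrix with a single 1 in each row."""
    return [[1 if j == i else 0 for j in range(vocab_size)] for i in ids]


def decode(ids: Sequence[int], vocab: Dict[str, int]) -> List[str]:
    """Inverse of encode: map ids back to tokens, stopping at the first EOS."""
    inverse = {i: tok for tok, i in vocab.items()}
    tokens = []
    for i in ids:
        if i == vocab[EOS]:
            break
        tokens.append(inverse[i])
    return tokens
```

For example, with the 54 tokens plus `<pad>` and `<eos>`, every row produced by `one_hot` would have 56 entries, so a whole batch of programs fits in a modest dense array.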

I also built an ANTLR grammar for these Haskell programs, along with some functions for tokenizing the code.
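As a lightweight stand-in for an ANTLR lexer, here is a minimal regex-based tokenizer in Python. It is not the author's tokenizer. The token classes (names, digit literals, the operators `::`, `->`, `==` and `/=`, and single-character symbols) are inferred from the sample output above, and `tokenize` is a hypothetical name. Whitespace is dropped, matching the token count given above.

```python
import re
from typing import List

# Token classes inferred from the sample output; the named groups are tried in order.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<NUM>[0-9]+)
  | (?P<OP>::|->|==|/=|[-+*<>=(),])
""", re.VERBOSE)


def tokenize(source: str) -> List[str]:
    """Split generated Haskell source into tokens, discarding whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append(m.group())
        pos = m.end()
    return tokens
```

The multi-character operators come before the single-character class because Python's regex alternation takes the first alternative that matches, so `->` is never split into `-` and `>`, and `==` is never split into two `=` tokens.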