9. Toy Example: Generating ASTs for a DSL

In previous tutorials, we learned how to create generators for simple data types and how to combine manual and automatic derivation. Now, let’s apply these skills to a real-world scenario: generating Abstract Syntax Trees (ASTs) for a simple imperative programming language.

This tutorial is based on PIL (Primitive Imperative Language), a real example from the DepTyCheck repository.

Our Goal

In this tutorial, you will build a complete AST generator for a simple imperative language. By the end, you will have:

  1. Defined expression and statement types for the language

  2. Created both automatic (deriveGen) and hand-written generators

  3. Generated valid random programs with control flow structures

You will see output like:

Seq (Assign "x" (Lit 5)) (If (Add (Var "x") (Lit 3)) (Assign "y" (Lit 10)) Skip)

Expected time: 30-40 minutes


Step 1: Define the Expression Language

Let’s start with the foundation: arithmetic and logical expressions.

Create a new file named PILTutorial.idr

Add the basic setup and expression type

import Data.Fuel
import Data.List
import Deriving.DepTyCheck.Gen
import Test.DepTyCheck.Gen
import System.Random.Pure.StdGen

-- Arithmetic and logical expressions
data Expr : Type where
  Lit   : Nat -> Expr
  Var   : String -> Expr
  Add   : Expr -> Expr -> Expr
  Mul   : Expr -> Expr -> Expr
  And   : Expr -> Expr -> Expr
  Lt    : Expr -> Expr -> Expr

Show Expr where
  show (Lit l) = "Lit " ++ show l
  show (Var s) = "Var " ++ show s
  show (Add e1 e2) = "Add (" ++ show e1 ++ ") (" ++ show e2 ++ ")"
  show (Mul e1 e2) = "Mul (" ++ show e1 ++ ") (" ++ show e2 ++ ")"
  show (And e1 e2) = "And (" ++ show e1 ++ ") (" ++ show e2 ++ ")"
  show (Lt e1 e2) = "Lt (" ++ show e1 ++ ") (" ++ show e2 ++ ")"

Create a simple derived generator with a generator-helper for strings

genVarName : Fuel -> Gen MaybeEmpty String
genVarName _ = elements ["x", "y", "z", "temp", "result", "counter"]

genExpr : Fuel -> (Fuel -> Gen MaybeEmpty String) => Gen MaybeEmpty Expr
genExpr = deriveGen

Test it

main : IO ()
main = do
  putStrLn "--- Generating random expressions ---"
  for_ (the (List Int) [1..5]) $ \_ => do
    Just e <- pick (genExpr @{genVarName} (limit 3))
      | Nothing => putStrLn "Generation failed"
    printLn e

Build and run

echo -e ':exec main' | rlwrap pack repl ./PILTutorial.idr

Expected output (will vary):

--- Generating random expressions ---
Lt (Mul (Lt (Var "x") (Var "temp")) (Add (Lit 3) (Lit 1))) (Add (Var "temp") (Lt (Var "result") (Lit 0)))
Lt (And (Var "counter") (And (Lit 2) (Var "y"))) (And (Lit 1) (Add (Lit 0) (Var "counter")))
And (Mul (And (Lit 0) (Lit 0)) (Lit 3)) (Lit 3)
Mul (Lit 2) (Var "z")
Add (Lit 0) (Add (And (Var "y") (Lit 2)) (And (Lit 2) (Var "counter")))

Note

deriveGen automatically handles the recursive structure of expressions. The limit 3 fuel controls the maximum depth of nested expressions.


Step 2: Add Statements and Control Flow

Now let’s add statements that form complete programs.

Add the statement type to your file

-- Statements with control flow
data Stmt : Type where
  Skip   : Stmt
  Assign : String -> Expr -> Stmt
  Seq    : Stmt -> Stmt -> Stmt
  If     : Expr -> Stmt -> Stmt -> Stmt
  While  : Expr -> Stmt -> Stmt

Show Stmt where
  show Skip = "Skip"
  show (Assign s e) = "Assign " ++ show s ++ " (" ++ show e ++ ")"
  show (Seq s1 s2) = "Seq (" ++ show s1 ++ ") (" ++ show s2 ++ ")"
  show (If e1 s1 s2) = "If (" ++ show e1 ++ ") (" ++ show s1 ++ ") (" ++ show s2 ++ ")"
  show (While e1 s1) = "While (" ++ show e1 ++ ") (" ++ show s1 ++ ")"

Create a hand-written generator with explicit fuel control

genStmt : Fuel -> Gen MaybeEmpty Stmt
genStmt Dry = pure Skip  -- Base case: only non-recursive constructors
genStmt (More f) = frequency
  [ (1, pure Skip)
  , (5, Assign <$> genVarName (More f) <*> genExpr @{genVarName} (More f))
  , (3, Seq <$> genStmt f <*> genStmt f)
  , (2, If  <$> genExpr @{genVarName} (More f) <*> genStmt f <*> genStmt f)
  , (2, While <$> genExpr @{genVarName} (More f) <*> genStmt f)
  ]

Note

  • genStmt Dry = pure Skip ensures termination when fuel is exhausted

  • genStmt f (less fuel) in recursive calls controls program size

  • frequency weights make simple statements more common than complex ones

Test the statement generator

main_stmt : IO ()
main_stmt = do
  putStrLn "--- Generating random statements ---"
  for_ (the (List Int) [1..5]) $ \_ => do
    Just s <- pick (genStmt (limit 4))
      | Nothing => putStrLn "Generation failed"
    printLn s

Try to run main_stmt

echo -e ':exec main_stmt' | rlwrap pack repl ./PILTutorial.idr

Expected output:

--- Generating random statements ---
Assign "counter" (Lt (Lit 0) (Lit 4))
Assign "x" (Lt (Mul (Lt (Var "temp") (Lit 1)) (Lit 2)) (Lt (Var "counter") (And (Add (Var "y") (Lit 2)) (And (Var "counter") (Lit 2)))))
Assign "y" (And (Add (Mul (Var "x") (Var "temp")) (Var "temp")) (And (Mul (Mul (Var "y") (Lit 3)) (Lt (Var "counter") (Lit 2))) (Var "result")))
If (Lt (Mul (And (Lt (Lit 4) (Var "z")) (Add (Lit 1) (Var "x")))
   (Mul (Mul (Var "x") (Var "y")) (Add (Lit 0) (Lit 0))))
   (And (Lt (Add (Lit 3) (Lit 0)) (Mul (Lit 0) (Var "counter")))
   (Add (Var "z") (Var "temp")))
   (Assign "counter" (And (Mul (And (Var "z") (Lit 1)) (Var "result")) (Var "result")))
   (If (Lt (Mul (And (Var "z") (Var "counter")) (Lit 0))
   (Add (Lit 0) (Mul (Lit 3) (Lit 3))))
   (If (Mul (Mul (Lit 0) (Var "result")) (And (Lit 1) (Var "temp")))
   (While (Lt (Var "x") (Var "y")) (Skip))
   (While (Lt (Var "result") (Var "temp")) (Skip)))
   (Assign "temp" (Lt (Mul (Lit 2) (Lit 1)) (Mul (Lit 0) (Var "counter")))))
Seq (Assign "result" (Var "z"))
   (While (Add (Add (Var "counter") (And (Lit 3) (Lit 2)))
   (Lt (And (Var "temp") (Lit 1)) (Var "temp")))
   (Assign "y" (And (Lt (Var "counter") (Lit 1))
   (Lt (Var "counter") (Var "temp")))))

Step 3: Generate Complete Programs

Now let’s generate complete programs with multiple statements.

Define a Program type and generator

-- A program is a list of statements
data Program = MkProgram (List Stmt)

Show Program where
  show (MkProgram lst) = "MkProgram " ++ show lst

genProgram : (n : Nat) -> Fuel -> Gen MaybeEmpty Program
genProgram n fuel = MkProgram <$> listOf {length=pure n} (genStmt fuel)

Test program generation

main_program : IO ()
main_program = do
  putStrLn "--- Generating a random program (5 statements) ---"
  Just prog <- pick (genProgram 5 (limit 3))
    | Nothing => putStrLn "Generation failed"
  printLn prog

Try to run

echo -e ':exec main_program' | rlwrap pack repl ./PILTutorial.idr

Expected output:

--- Generating a random program (5 statements) ---
MkProgram [Seq (While (Add (Lit 1) (And (Lit 1) (Var "x"))) (While (Var "x") (Skip)))
   (Assign "temp" (And (And (Var "temp") (Var "y")) (Lit 1))),
   If (And (Add (Add (Var "x") (Lit 0)) (Lit 1)) (Lit 2))
   (Assign "result" (Lt (Lt (Lit 2) (Var "temp")) (Mul (Var "y") (Var "y"))))
   (While (Var "temp") (Seq (Skip) (Skip))),
   While (Mul (Var "result") (Mul (Lit 1) (Add (Var "y") (Lit 1))))
   (Assign "counter" (Lt (And (Lit 1) (Var "result")) (Lit 0))),
   Seq (If (And (And (Lit 1) (Var "result")) (Mul (Lit 0) (Lit 1)))
   (If (Lit 1) (Skip) (Skip)) (While (Lit 0) (Skip)))
   (If (Mul (Add (Var "x") (Lit 1)) (And (Lit 1) (Var "counter")))
   (While (Add (Lit 1) (Lit 1)) (Skip))
   (Assign "y" (Add (Lit 1) (Lit 0)))),
   Assign "temp" (Add (Add (Lit 1) (And (Var "y") (Var "z"))) (And (Lit 1) (Lit 3)))]

Note

listOf {length=pure n} gen runs the generator n times and collects results into a list.


Step 4: Test Properties of Generated Programs

Let’s verify that our generated programs have certain properties.

Add property checking functions

-- Check if a statement contains an assignment
hasAssign : Stmt -> Bool
hasAssign Skip = False
hasAssign (Assign _ _) = True
hasAssign (Seq s1 s2) = hasAssign s1 || hasAssign s2
hasAssign (If _ s1 s2) = hasAssign s1 || hasAssign s2
hasAssign (While _ s) = hasAssign s

-- Count statements in a program
stmtCount : Program -> Nat
stmtCount (MkProgram stmts) = length stmts

-- Check if program has a loop
hasLoop : Stmt -> Bool
hasLoop Skip = False
hasLoop (Assign _ _) = False
hasLoop (Seq s1 s2) = hasLoop s1 || hasLoop s2
hasLoop (If _ s1 s2) = hasLoop s1 || hasLoop s2
hasLoop (While _ _) = True

Test properties

main_properties : IO ()
main_properties = do
  putStrLn "--- Checking properties of generated programs ---"
  for_ (the (List Int) [1..10]) $ \_ => do
    Just prog <- pick (genProgram 3 (limit 3))
      | Nothing => putStrLn "Generation failed"

    let hasAnyAssign = Prelude.any {t=List} hasAssign (case prog of MkProgram stmts => stmts)
    let hasAnyLoop = Prelude.any {t=List} hasLoop (case prog of MkProgram stmts => stmts)
    let count = stmtCount prog

    putStrLn $ "Program with " ++ show count ++ " statements"
    putStrLn $ "  Has assignment: " ++ show hasAnyAssign
    putStrLn $ "  Has loop: " ++ show hasAnyLoop
    putStrLn ""

Try to run

echo -e ':exec main_properties' | rlwrap pack repl ./PILTutorial.idr

A sample of the output:

--- Checking properties of generated programs ---
Program with 3 statements
  Has assignment: True
  Has loop: False

Program with 3 statements
  Has assignment: True
  Has loop: True

Program with 3 statements
  Has assignment: True
  Has loop: False
...

Note

You can use these property checks to ensure your generated programs meet certain criteria for testing compilers or interpreters.


Next Steps

Now that you can generate complex ASTs, you’re ready for many other applications:

  • Test an interpreter: Use your generated PIL programs to test a language interpreter with property-based testing.

  • Add more language features: Extend the language with functions, arrays, or I/O operations.

  • Integrate custom generators: Continue to Mixing Manual and Automatic to see how deriveGen discovers and uses your custom generators.

  • Control distribution: Continue to Derivation Tuning to learn how to fine-tune constructor probabilities for more realistic program distributions.

The complete PILTutorial.idr file is available for reference. You can find it in the DepTyCheck examples or build it step-by-step following this tutorial.