Brainfuck Interpreter in C++17——A Modern Approach to Kill Your Brain

It has been a long time that I wrote my last C++ code. And obviously, C++ has released the new standard, C++17, for a few months. There are a lot features in which has been introduced or changed.

One out many is the pattern matching using std::variant, std::monostate and std::visit. This feature could lead the data-oriented design easier in modern C++. And it's not that hard to write a (possibly buggy) brainfuck interpreter with this feature.

Although with that said, we still need some extra 'hacking' to achieve a more elegant code style. There is a blog post by Marius Elvert explaining a two-line visitor in C++, with which you could write a few lambda expressions in one visitor and then the std::variant could be dispatched by the receiving type of those lambda.

auto my_visitor = visitor{
    [&](int value) { /* ... */ }
    [&](std::string const & value) { /* ... */ }
};

To accomplish the goal, Marius found that lager just uses two lines of code. The details are explained in aforementioned blog post. And for the readability of the code today, I just substituted the name of the struct.

And if you're in a hurry, the full code is on my GitHub, https://github.com/BlueCocoa/brainfuck-cpp17

template<class... Ts> struct brainfuck_vm : Ts... { using Ts::operator()...; };
template<class... Ts> brainfuck_vm(Ts...) -> brainfuck_vm<Ts...>;

And the next step is to define all the 8 valid ops of brainfuck.

#pragma mark - brainfuck ops

struct increment_value_op {}; // +
struct decrement_value_op {}; // -

struct increment_ptr_op {};   // >
struct decrement_ptr_op {};   // <

struct print_op {};           // ,
struct read_op {};            // .

struct loop_start_op {};      // [
struct loop_end_op {};        // ]

To keep the code clean and simple as much as possible, the using keyword is very helpful. And to be able to write some comments (with no brainfuck keyword inside of course) in brainfuck code, std::monostate would be useful too. So the std::variant to be defined will have 8+1 valid types.

/// brainfuck_op allowed ops in C++17 std::variant
using brainfuck_op = std::variant<
    increment_value_op,
    decrement_value_op,
    increment_ptr_op,
    decrement_ptr_op,
    print_op,
    read_op,
    loop_start_op,
    loop_end_op,
    std::monostate
>;

Furthermore, since we need to read from the stdin or from any file, which will using character form ops (+-><,.[]). A map is used for convenience.

/// a map from char to brainfuck_op
const std::map<char, brainfuck_op> bf_op_map {
    {'+', increment_value_op{}},
    {'-', decrement_value_op{}},
    {'>', increment_ptr_op{}},
    {'<', decrement_ptr_op{}},
    {'.', print_op{}},
    {',', read_op{}},
    {'[', loop_start_op{}},
    {']', loop_end_op{}}
};

Well, the last definition would surely be the brainfuck virtual machine! It's highly possibly not the best design, maybe even not a good one, but it would work as we intended.

I'm using std::map to simulate an infinite length tape of brainfuck vm, typically, the tape of brainfuck vm is 30000. But, hey, why not:) Given the existence of the operator[], it's easy for you to substitute the std::map with std::array.

#pragma mark - brainfuck vm

/// brainfuck virtual machine status
struct brainfuck_vm_status {
    /// virtual infinity length tape
    std::map<int, int> tape;
    /// current cell of the tape
    int tape_ptr = 0;

    /// used for keeping track of all valid brainfuck_op
    std::vector<char> instruction;
    /// current brainfuck_op index
    int instruction_ptr_current = -1;
    /// keeping track of loops
    std::stack<int> instruction_loop_ptr;

    /// flag of skipping loop, e.g
    /// +-[[[------------++++++++++-.>>[>]>>>--<<<<<<--]]]++++
    ///   ^skipping from, but we need all                ^end of skipping
    ///      instructions inside.
    int jump_loop = 0;
};

With all these definitions, the prototype of our interpreter would be like below.

int main(int argc, const char * argv[]) {
    // the brainfuck vm
    brainfuck_vm_status status;
    // temp variable for receving the character form op
    auto char_op = char{};
    // read char op forever until EOF
    while (std::cin >> char_op) {
        // interpret
        run_vm(status, char_op);
    }
}

That's pretty straightforward! Let's focus on the run_vm function then.

#pragma mark - brainfuck vm interpreter

/**
 run brainfuck vm

 @param status   run brainfuck vm from the given state
 @param char_op  character form op
 @param via_loop due to the way I wrote, a flag is needed to avoid re-adding ops
 */
void run_vm(brainfuck_vm_status & status, char char_op, bool via_loop = false) {
    // get the op from char_op
    brainfuck_op op = next_op(status, char_op, via_loop);

    // parttern matching
    std::visit(brainfuck_vm {
        [&](increment_value_op) {
        },
        [&](decrement_value_op) {
        },
        [&](increment_ptr_op) {
        },
        [&](decrement_ptr_op) {
        },
        [&](print_op) {
        },
        [&](read_op) {
        },
        [&](loop_start_op) {
        },
        [&](loop_end_op) {
        },
        [&](std::monostate) {
        }
    }, op);
}

The skeleton shares much similarity with some functional programming language. The things haven't done are the corresponding code of each op and the next_op function.

Let's start with the next_op function, without which, no tests could perform to verify our code blocks for these 8 brainfuck ops.

#pragma mark - helper function

/**
 next brainfuck op

 @param status   the brainfuck vm status
 @param char_op  character form op
 @param via_loop due to the way I wrote, a flag is needed to avoid re-adding ops
 @return brainfuck_op
 */
brainfuck_op next_op(brainfuck_vm_status & status, char char_op, bool via_loop = false) {
    // find the brainfuck_op from bf_op_map
    if (auto op = bf_op_map.find(char_op); op != bf_op_map.end()) {
        // do not append the char_op if we're retriving the next op inside a loop_op
        if (!via_loop) {
            // save char_op to instruction
            status.instruction.emplace_back(char_op);
            // increse the ptr of current instruction
            status.instruction_ptr_current++;
        }

        // return next op
        return op->second;
    } else {
        // invaild char for brainfuck
        // monostate is returned
        return std::monostate();
    }
}

And finally, to fill the blanks we left in run_vm:) Most of which are deadly simple but the loop ones. It takes some time to check whether the boundaries and recursions are correct in the code.

#pragma mark - brainfuck vm interpreter

/**
 run brainfuck vm

 @param status   run brainfuck vm from the given state
 @param char_op  character form op
 @param via_loop due to the way I wrote, a flag is needed to avoid re-adding ops
 */
void run_vm(brainfuck_vm_status & status, char char_op, bool via_loop = false) {
    // get the op from char_op
    brainfuck_op op = next_op(status, char_op, via_loop);

    // parttern matching
    std::visit(brainfuck_vm {
        [&](increment_value_op) {
            printf("increment_value_op\n");
            // skip actual action if we're skipping loop
            if (status.jump_loop == 0) {
                status.tape[status.tape_ptr]++;
            }
        },
        [&](decrement_value_op) {
            printf("decrement_value_op\n");
            // skip actual action if we're skipping loop
            if (status.jump_loop == 0) {
                status.tape[status.tape_ptr]--;
            }
        },
        [&](increment_ptr_op) {
            printf("increment_ptr_op\n");
            // skip actual action if we're skipping loop
            if (status.jump_loop == 0) {
                status.tape_ptr++;
            }
        },
        [&](decrement_ptr_op) {
            printf("decrement_ptr_op\n");
            // skip actual action if we're skipping loop
            if (status.jump_loop == 0) {
                status.tape_ptr--;
            }
        },
        [&](print_op) {
            printf("print_op\n");
            // skip actual action if we're skipping loop
            if (status.jump_loop == 0) {
                putchar(status.tape[status.tape_ptr]);
                printf("%c - %d\n", status.tape[status.tape_ptr], status.tape[status.tape_ptr]);
            }
        },
        [&](read_op) {
            // skip actual action if we're skipping loop
            if (status.jump_loop == 0) {
                status.tape[status.tape_ptr] = getchar();
            }
        },
        [&](loop_start_op) {
            printf("loop from ins ptr: %d cond[%d]\n", status.instruction_ptr_current, status.tape[status.tape_ptr]);
            // if and only if 1) `current_cell_value != 0`
            //                2) and we're not do the skipping
            // we can record the starting index of the if instruction
            // besides, if we're in condition 1)
            // the if statement should be also skipped
            if (status.tape[status.tape_ptr] != 0 && status.jump_loop == 0) {
                // push the starting instruction index of loop
                status.instruction_loop_ptr.emplace(status.instruction_ptr_current);
            } else {
                status.jump_loop++;
            }
        },
        [&](loop_end_op) {
            printf("loop end ins ptr: %d cond[%d]\n", status.instruction_ptr_current, status.tape[status.tape_ptr]);
            // decrease the jump_loop value if we encounter the `]`
            // and we were previously doing the skip
            if (status.jump_loop != 0) {
                status.jump_loop--;
            } else {
                // if we were not in skipping
                // then we need to check the loop condition, `current_cell_value != 0`
                if (status.tape[status.tape_ptr] != 0) {
                    // the instruction range of current loop
                    printf("loop [%d, %d]:\n", status.instruction_loop_ptr.top(), status.instruction_ptr_current);
#ifdef DEBUG
                    // dump all instructions inside the loop
                    for (int debug = status.instruction_loop_ptr.top(); debug <= status.instruction_ptr_current; debug++) {
                        putchar(status.instruction[debug]);
                    }
                    putchar('\n');
#endif

                    // loop the instruction until condition satisfies no more
                    while (status.tape[status.tape_ptr] != 0) {
                        // save current instruction pointer
                        int current = status.instruction_ptr_current;
                        // start the loop right after the index of `[`
                        status.instruction_ptr_current = status.instruction_loop_ptr.top() + 1;
                        // run one op at a time
                        // until the next op is the corresponding `]`
                        while (status.instruction_ptr_current < current) {
                            run_vm(status, status.instruction[status.instruction_ptr_current], true);
                            status.instruction_ptr_current++;
                        }
                        // restore the current instruction pointer
                        status.instruction_ptr_current = current;
                    }

                    // pop current loop starting index
                    status.instruction_loop_ptr.pop();
                }
            }
        },
        [&](std::monostate) {
        }
    }, op);
}

You may try this brainfuck code which should output "Hello World!":)

++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
声明: 本文为0xBBC原创, 转载注明出处喵~

One thought on “Brainfuck Interpreter in C++17——A Modern Approach to Kill Your Brain”

Leave a Reply

Your email address will not be published. Required fields are marked *