Brainfuck JIT 简易实现

本文的灵感来源是, 某次 C++ 课程的 OJ 题出了一道题是写一个 Brainfuck 解释器.

群里某位同学开始吐槽 OJ 样例太弱, 代码不需要优化就能过.

又恰好前几天在 GitHub 上看到 BF JIT 的一个教程实现, 于是决定自己也用 C++ 来实践一下.

~~我们的原题只是想用 C++ 写个很朴素的 BF 解释器, 现在纯属是杀鸡用屠龙宝刀了.~~

Brainfuck 语言简介 (From wikipedia)

The language consists of eight commands, listed below. A brainfuck program is a sequence of these commands, possibly interspersed with other characters (which are ignored). The commands are executed sequentially, with some exceptions: an instruction pointer begins at the first command, and each command it points to is executed, after which it normally moves forward to the next command. The program terminates when the instruction pointer moves past the last command.

The brainfuck language uses a simple machine model consisting of the program and instruction pointer, as well as a one-dimensional array of at least 30,000 byte cells initialized to zero; a movable data pointer (initialized to point to the leftmost byte of the array); and two streams of bytes for input and output (most often connected to a keyboard and a monitor respectively, and using the ASCII character encoding).

The eight language commands each consist of a single character:

Character	Meaning
`>`	Increment the data pointer (to point to the next cell to the right).
`<`	Decrement the data pointer (to point to the next cell to the left).
`+`	Increment (increase by one) the byte at the data pointer.
`-`	Decrement (decrease by one) the byte at the data pointer.
`.`	Output the byte at the data pointer.
`,`	Accept one byte of input, storing its value in the byte at the data pointer.
`[`	If the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it forward to the command after the matching `]` command.
`]`	If the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching `[` command.

As the name suggests, Brainfuck programs tend to be difficult to comprehend. This is partly because any mildly complex task requires a long sequence of commands and partly because the program’s text gives no direct indications of the program’s state. These, as well as Brainfuck’s inefficiency and its limited input/output capabilities, are some of the reasons it is not used for serious programming. Nonetheless, like any Turing complete language, Brainfuck is theoretically capable of computing any computable function or simulating any other computational model, if given access to an unlimited amount of memory. A variety of Brainfuck programs have been written. Although Brainfuck programs, especially complicated ones, are difficult to write, it is quite trivial to write an interpreter for Brainfuck in a more typical language such as C due to its simplicity. There even exist Brainfuck interpreters written in the Brainfuck language itself.

Brainfuck is an example of a so-called Turing tarpit: It can be used to write any program, but it is not practical to do so, because Brainfuck provides so little abstraction that the programs get very long or complicated.

翻译指令序列

Just-in-time compiler, 顾名思义(?), 就是要在运行时将代码翻译成机器码序列执行.

JIT 编译对于我们常用的抽象程度较高的高级语言都十分复杂(如 Hotspot, V8 等), 但对于 Brainfuck 这种 esolang 则十分简单.

Brainfuck 中的每个指令都能被简单地翻译成对应的 C 语言代码, 也能被直接翻译成机器码.

比如 > 可以翻译为 data_pointer++;, + 可以被翻译为 data[data_pointer]++;, …

我们现在可以将输入的程序字符串直接替换成机器码, 就完成了程序的翻译.

但~~你先别急~~.

比如连续的十个 > 会被直接翻译为十句 data_pointer++;, 而不是更优化的 data_pointer += 10;.

我们可以先对 BF 程序做一些简单的优化再将其进行翻译.

具体的实现主要就是将连续的 +->< 指令进行了合并: https://github.com/lyc8503/BrainfuckJIT/blob/master/main.cpp#L64

得到了我们自定义的指令序列之后就可以开始将它们翻译成汇编代码了.

我们使用 rdx 寄存器作为上面提到的数据指针, 那么指令就可以进行如下的对应翻译.

> 和 <
1
2
add rdx, 1 sub rdx, 1
+ 和 -
1
2
add byte [rdx], 1 sub byte [rdx], 1

. 和 ,

使用了 Linux 的系统调用 read 和 write

mov rax, 0      ; `read` syscall no 0
mov rdi, 0      ; 1st arg: int fd, value 0 (stdin)
mov rsi, rdx    ; 2nd arg: void* buf, addr rdx
push rdx        ; save rdx to stack
mov rdx, 1      ; 3rd arg: size_t count, 1 character
syscall         ; do the syscall
pop rdx         ; restore rdx

mov rax, 1      ; `write` syscall no 1
mov rdi, 1      ; 1st arg: int fd, value 1 (stdout)
mov rsi, rdx    ; 2nd arg: void* buf, addr rdx
push rdx        ; save rdx to stack
mov rdx, 1      ; 3rd arg: size_t count, 1 character
syscall         ; do the syscall
pop rdx         ; restore rdx

[ 和 ]
这两个是条件跳转指令, 具体的汇编可以表示如下
1
2
3
4
5
cmp byte [rdx], 0 je 0x1 cmp byte [rdx], 0 jne 0x1
但在刚读取到前中括号时我们时不知道后括号的位置的, 没法填充后跳转的位置.
不过这个经典问题显然可以使用栈来解决.
在遇到前括号时将当前的代码位置压栈, 遇到对应的后括号的就可以计算出对应的跳转位置.

具体实现: https://github.com/lyc8503/BrainfuckJIT/blob/master/main.cpp#L108

执行机器码

在开始执行整段代码之前我们首先要设置 rdx，让其指向我们预先准备好的一段内存区域.

// initialize the rdx as the data pointer
// machine code generated by nasm `mov rdx, [the value of the pointer to data]`
code.push_back(0x48);
code.push_back(0xba);

// little-endian byte order
for (int i = 0; i < 8; i++) {
	code.push_back(((unsigned long long) data & ((unsigned long long) 0xff << (8 * i))) >> (8 * i));
}

接下来理论上我们只需要使用内联汇编将我们将我们当前程序的 rip 寄存器指向代码的开头, 就可以正确执行代码了.

但由于现在操作系统的安全保护措施, 生成的机器码是不能直接执行的. (否则会直接喜提一个内存违例 SIGSEGV.)

我们需要先向操作系统特别申请一块可以执行的内存空间, 在 Linux 下可以使用 mmap 来实现. 具体参数含义请参考手册.

1 2	`// allocate rwx memory virtualCodeAddress = mmap(nullptr, len, PROT_READ \| PROT_WRITE \| PROT_EXEC, MAP_ANONYMOUS \| MAP_PRIVATE, 0, 0);`

之后我们就将代码复制到这块内存区域, 直接使用函数指针的方式修改 rip, 当然 C++ 编译器会为我们的函数调用创建一个栈帧, 所以不要忘了在机器码最后加上一个 ret 来正确地返回.

1	code.push_back(0xc3); // machine code `ret` to exit the func

具体实现: https://github.com/lyc8503/BrainfuckJIT/blob/master/main.cpp#L21

正确性测试

完成之后能够通过原本 OJ 的所有测试用例.

也可以尝试跑个自解释器~

>>>+[[-]>>[-]++>+>+++++++[<++++>>++<-]++>>+>+>+++++[>++>++++++<<-]+>>>,<++[[>[->>]<[>>]<<-]<[<]<+>>[>]>[<+>-[[<+>-]>]<[[[-]<]++<-[<+++++++++>[<->-]>>]>>]]<<]<]<[[<]>[[>]>>[>>]+[<<]<[<]<+>>-]>[>]+[->>]<<<<[[<<]<[<]+<<[+>+<<-[>-->+<<-[>+<[>>+<<-]]]>[<+>-]<]++>>-->[>]>>[>>]]<<[>>+<[[<]<]>[[<<]<[<]+[-<+>>-[<<+>++>-[<->[<<+>>-]]]<[>+<-]>]>[>]>]>[>>]>>]<<[>>+>>+>>]<<[->>>>>>>>]<<[>.>>>>>>>]<<[>->>>>>]<<[>,>>>]<<[>+>]<<[+<<]<]

速度测试

尝试了双层自解释器运行 Hello World 程序, 原本的解释器花了 2m18s, 使用 JIT 后花了 2.5s.

1
2

>>>+[[-]>>[-]++>+>+++++++[<++++>>++<-]++>>+>+>+++++[>++>++++++<<-]+>>>,<++[[>[->>]<[>>]<<-]<[<]<+>>[>]>[<+>-[[<+>-]>]<[[[-]<]++<-[<+++++++++>[<->-]>>]>>]]<<]<]<[[<]>[[>]>>[>>]+[<<]<[<]<+>>-]>[>]+[->>]<<<<[[<<]<[<]+<<[+>+<<-[>-->+<<-[>+<[>>+<<-]]]>[<+>-]<]++>>-->[>]>>[>>]]<<[>>+<[[<]<]>[[<<]<[<]+[-<+>>-[<<+>++>-[<->[<<+>>-]]]<[>+<-]>]>[>]>]>[>>]>>]<<[>>+>>+>>]<<[->>>>>>>>]<<[>.>>>>>>>]<<[>->>>>>]<<[>,>>>]<<[>+>]<<[+<<]<]
>>>+[[-]>>[-]++>+>+++++++[<++++>>++<-]++>>+>+>+++++[>++>++++++<<-]+>>>,<++[[>[->>]<[>>]<<-]<[<]<+>>[>]>[<+>-[[<+>-]>]<[[[-]<]++<-[<+++++++++>[<->-]>>]>>]]<<]<]<[[<]>[[>]>>[>>]+[<<]<[<]<+>>-]>[>]+[->>]<<<<[[<<]<[<]+<<[+>+<<-[>-->+<<-[>+<[>>+<<-]]]>[<+>-]<]++>>-->[>]>>[>>]]<<[>>+<[[<]<]>[[<<]<[<]+[-<+>>-[<<+>++>-[<->[<<+>>-]]]<[>+<-]>]>[>]>]>[>>]>>]<<[>>+>>+>>]<<[->>>>>>>>]<<[>.>>>>>>>]<<[>->>>>>]<<[>,>>>]<<[>+>]<<[+<<]<]!>++++++++[<+++++++++>-]<.>++++[<+++++++>-]<+.+++++++..+++.>>++++++[<+++++++>-]<++.------------.>++++++[<+++++++++>-]<+.<.+++.------.--------.>>>++++[<++++++++>-]<+.!

# 原本
real    2m18.136s
user    2m18.063s
sys     0m0.012s

# JIT
real	0m2.489s
user	0m0.010s
sys	    0m0.000s

在一个绘制曼德博分型的程序上我原来的解释器花了 3m2s, JIT 优化后只花了 1s.

# 原本
real    3m2.546s
user    3m2.396s
sys     0m0.014s

# JIT
real	0m1.157s
user	0m0.022s
sys	    0m0.151s

优化空间

目前的输入输出指令是直接使用 Linux 系统调用实现, 而且每次只读取一个字符, 系统调用的开销较大, 对于输入输出较多的程序可以事先读取并实现缓存.

软件研究=) > 其他

打赏支持

“请我吃糖果~o(*^ω^*)o”

Brainfuck JIT 简易实现

https://blog.lyc8503.site/post/bf-jit/

作者

lyc8503

发布于

2022年10月10日

许可协议

TCP/IP 协议栈实现 Dnet - Ep.0 前言&背景介绍上一篇

Spring Boot + React 写聊天软件 Ep.2 - 一些细节与技巧下一篇