PythonPlusPlus: Bridging Worlds with Polyglot Code

Picture this: you find yourself immersed in a new job, knee-deep in a C++ codebase, yet your heart yearns for the simplicity and elegance of Python syntax. What do you do? You don't just conform – you innovate, and boldly submit this to code review:

#include "pythonstart.h"

def greet(name):
    print("hello, " + name + "!")
    return

def greet2(name):
    print("how are you, " + name + "?")
    return

def bye():
    print("ok bye!")
    return

#include "pythonmid.h"

username = "Mat"

print("Hello from \"Python\"!")

greet(username)
greet2(username)
print("getting ready to say bye...")
bye()

#include "pythonend.h"

The first code review comes in, and it seems your contribution may be in jeopardy:

That's just Python code! It won't work with the rest of our C++ codebase!

Before they can reject your code, you sharply interject:

Hey! Did you even test my code?

With skepticism in the air, one of your brave teammates steps up and runs the code through a C++ compiler. To everyone's amazement, the result is identical to that of running it with Python! The code not only speaks Python, but fluently converses in C++:

$ g++ Python.cpp && ./a.out
Hello from "Python"!
hello, Mat!
how are you, Mat?
getting ready to say bye...
ok bye!

The commit is eventually merged, and your unconventional approach not only saves your job but earns you a place in the annals of the team's most memorable code submissions. You are also banned from touching that codebase again.

How it works: unraveling the enigma

This kind of program is called a “polyglot,” which literally means “written in multiple languages.” The entire idea of writing one relies on finding intersections between the two (or more!) languages' syntax. In Python, a # signifies a comment, while in C++ (and C), a # at the start of a line denotes a preprocessor directive. These lines are the key to the program. We'll see how the preprocessor works (and how we can abuse it!) in a little bit.

You'll notice the only preprocessor directives in our code are #include statements. Unlike many other languages, C and C++ opt for a simple but effective solution to calling external library functions: copy-paste. Seriously.

When I write #include <iostream> at the top of a C++ file, what actually happens is the entire file called “iostream” (installed system-wide as part of the C++ standard library) gets pasted by the preprocessor, residing now where that #include statement once was. You don't technically have to use the #include directive to get a C++ program calling library functions: you can get the same behavior by just copying the file's contents manually at the top of your code (but that's a terrible idea!).

For example, here are two C++ header files:

preamble.h:

int main()
{
    int retVal = 0;

postlude.h:

}

And our beautifully readable code:

#include "preamble.h"

for(int i = 0; i < 4; i++)
{
    retVal += 1;
}
return retVal;

#include "postlude.h"

Let's run our code through cpp, the standalone C/C++ preprocessor:

$ cpp code.cpp
# 1 "code.cpp"
# 1 "preamble.h" 1
int main()
{
    int retVal = 0;
# 2 "code.cpp" 2

for(int i = 0; i < 4; i++)
{
    retVal += 1;
}
return retVal;

# 1 "postlude.h" 1
}
# 10 "code.cpp" 2

You can see the preprocessor outputs a bunch of lines starting with #. These are linemarkers: they record which file and line each piece of text came from, so the compiler can point its error messages at the right place (and so us puny humans can follow along). The first number is the line number of the line that follows, and the string in quotes is the filename. The optional number at the end of the line is a flag, where 1 means a new file is being included, and 2 means we are returning to a file after an include is done. You can find the full docs in the GCC preprocessor manual. Here, we start at line 1 of code.cpp, which includes preamble.h. The contents of preamble.h follow, and afterwards we return to code.cpp. And so on and so forth, until everything is copy-pasted together into an amalgamated program: a simple main function that returns 4.

The preprocessor is a very powerful tool, and as long as the final text that is passed to the compiler is valid, anything goes!

Let's break down the polyglot program from the start of the post:

Functions

In Python, functions are defined as follows:

def greet(name):
    print("hello, " + name + "!")

Somehow, we need to translate this into working C++ just through the preprocessor. Python allows defining functions anywhere, even inside other code, but C++ does not let us define a named function inside another function, so we'll store a function in a variable instead. C++ has a neat trick here called a lambda, which allows us to define unnamed functions inline, and it's perfect for assigning to such a variable.

Armed with this knowledge, we can use #define to create a macro that will turn def into auto (a special C++ keyword that deduces the type of a variable based on what's assigned to it), and another macro that turns greet(name) into our lambda definition:

#define def auto 
#define greet(arg) greet = [](std::string arg) {

Applying this to our Python function from above gets us part of the way there:

auto greet = [](std::string name) {:
    print("hello, " + name + "!")

We still have to handle that pesky : that Python requires at the end of function declarations. Now, where does C++ have a :... aha! The revered ternary operator, that everybody totally loves! Its syntax is as follows: condition ? truthy : falsy. We don't care about the logic here, we just want that sweet : character, so we can add the most cursed ternary expression I've ever written to the end of the greet macro:

#define greet(arg) greet = [](std::string arg) { false?false

Running our function through the preprocessor, we get the following:

auto greet = [](std::string name) { false?false:
    print("hello, " + name + "!")

That's some good progress! There are three main issues left:
– the ternary operator is left hanging there. We need a “falsy” value for this thrilling and definitely-very-useful expression to compile.
– there's no print function in C++ (this project was conceived before std::print was added to C++23).
– we need to close that dangling curly bracket and add a semicolon at the end of our lambda, somehow.

Implementing the print function can be done with a simple function-like macro that just plops the argument into std::cout. This only works for simple prints, but I'm not going for anything more here :)

Additionally, we can knock out the unfinished ternary issue by adding a stray false; at the beginning of the print macro. Usually this does nothing, since the value is simply discarded, but when a print comes right after a function definition, it completes the ternary operator. Hooray!

#define print(a) false;std::cout << (a) << std::endl; 

Now for closing the function... there are no keywords left we can use here. I haven't found a way to make this work consistently without polluting the print macro with closing brackets that would cause it to break if used more than once or outside of a function. Thankfully, Python has a return keyword we can add without changing the behavior of the function:

def greet(name):
    print("hello, " + name + "!")
    return

Then on the C++ side, we can redefine it to close our lambda!

#define return return; };

Finally, our simple function now preprocesses to this valid albeit cursed C++ code:

auto greet = [](std::string name) { false?false:
    false;std::cout << ("hello, " + name + "!") << std::endl;
    return; };

“int main”

Here's the next bit we have to tackle, after the function definitions:

username = "Mat"

print("Hello from \"Python\"!")

greet(username)
greet2(username)
print("getting ready to say bye...")
bye()

This code was given in one of my university courses to showcase the basics of Python. In it, we create a variable, call a couple functions, and call it a day.

Python allows writing code willy-nilly outside of any function, but C++ only allows declarations at the top level: statements such as our print calls must live inside a function, so they will go inside main. We can have our initial header (the one with all the function macros) also open the main function by adding a lone int main() { at the end of it. We also need a header at the end whose sole purpose is closing that brace:

pythonstart.h:

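#include <iostream>
#include <string>

#define def auto
#define greet(arg) greet = [](std::string arg) { false?false
#define greet2(arg) greet2 = [](std::string arg) { false?false
#define bye() bye = []() { false?false
#define print(a) false;std::cout << (a) << std::endl;
#define return return; };

// start the main function (will be closed by pythonend.h)
int main() {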

pythonend.h (thrilling):

}

The code

Looking at the first lines of the actual code, a lot of stuff is missing for it to work in C++:

username = "Mat"
print("Hello from \"Python\"!")

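The first obvious issue is that C++ requires types, while Python does not. We will need a pythonmid.h header to plop a std::string in there to tell username its type. Since #include pastes the header right where it appears, pythonmid.h can simply end with a bare type name, which then lands right in front of username = "Mat". (A #define username macro would not work, since it would also rewrite every later use of username, such as the greet(username) call.)

pythonmid.h (last line):

std::string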

Then, oh no! Our print macro inserts a stray false right after the "Mat" string literal, causing a compile error! We must redefine print to remove the false prefix, but keep the lone semicolon, as it can serve to punctuate the username declaration:

#undef print
#define print(a) ;std::cout << (a) << std::endl;

Finally, the function calls:

greet(username)
greet2(username)
print("getting ready to say bye...")
bye()

In short, the macro for every function must be redefined to expand into a call (plus the semicolon C++ needs) rather than a definition, like so:

#undef greet
#define greet(name) greet(name);

That's it!

And there we go! Here's the full “Python” file from the start of this post, put through the C++ preprocessor:

// -snip- the entire contents of the <iostream> and <string> C++ headers
# 2 "pythonstart.h" 2
# 15 "pythonstart.h"

# 15 "pythonstart.h"
int main() {
# 4 "Python.cpp" 2


auto greet = [](std::string name) { false?false:
    false;std::cout << ("hello, " + name + "!") << std::endl;
    return; };

auto greet2 = [](std::string name) { false?false:
    false;std::cout << ("how are you, " + name + "?") << std::endl;
    return; };

auto bye = []() { false?false:
    false;std::cout << ("ok bye!") << std::endl;
    return; };

# 1 "pythonmid.h" 1
# 25 "pythonmid.h"
std::string
# 20 "Python.cpp" 2

username = "Mat"

;std::cout << ("Hello from \"Python\"!") << std::endl;

greet(username);
greet2(username);
;std::cout << ("getting ready to say bye...") << std::endl;
bye();

# 1 "pythonend.h" 1
}
# 31 "Python.cpp" 2

You can find the full sources on my Gitea.

I hope this was a fun introduction to polyglot programming! It's usually filled with crazy hacks like these, and thus can be very fun whilst being immensely impractical, but believe me: it has its uses!

While researching for a different project in 2020, I came across this perfect example: Cosmopolitan is a project that lets C programs be built as an “actually portable executable”: a single file that runs on Linux, macOS, Windows, FreeBSD, OpenBSD, and NetBSD, and can even boot directly from the BIOS. I recommend Justine's blog post for a fascinating read!


Thanks for reading! Feel free to contact me if you have any suggestions or comments. Find me on Mastodon and Matrix.

You can follow the blog through:
– ActivityPub, by entering @mat@blog.allpurposem.at
– RSS/Atom: copy this link into your reader: https://blog.allpurposem.at

My website: https://allpurposem.at