<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>include &amp;mdash; All-Purpose Mat&#39;s Blog</title>
    <link>https://blog.allpurposem.at/tag:include</link>
    <description>Monthly-ish projects pushing the boundary of what&#39;s possible</description>
    <pubDate>Thu, 30 Apr 2026 20:12:52 +0200</pubDate>
    <item>
      <title>PythonPlusPlus: Bridging Worlds with Polyglot Code</title>
      <link>https://blog.allpurposem.at/pythonplusplus-bridging-worlds-with-polyglot-code</link>
      <description>&lt;![CDATA[Picture this: you find yourself immersed in a new job, knee-deep in a C++ codebase, yet your heart yearns for the simplicity and elegance of Python syntax. What do you do? You don&#39;t just conform – you innovate, and boldly submit this to code review:&#xA;&#xA;#include &#34;pythonstart.h&#34;&#xA;&#xA;def greet(name):&#xA;    print(&#34;hello, &#34; + name + &#34;!&#34;)&#xA;    return&#xA;&#xA;def greet2(name):&#xA;    print(&#34;how are you, &#34; + name + &#34;?&#34;)&#xA;    return&#xA;&#xA;def bye():&#xA;    print(&#34;ok bye!&#34;)&#xA;    return&#xA;&#xA;#include &#34;pythonmid.h&#34;&#xA;&#xA;username = &#34;Mat&#34;&#xA;&#xA;print(&#34;Hello from \&#34;Python\&#34;!&#34;)&#xA;&#xA;greet(username)&#xA;greet2(username)&#xA;print(&#34;getting ready to say bye...&#34;)&#xA;bye()&#xA;&#xA;#include &#34;pythonend.h&#34;]]&gt;</description>
      <content:encoded><![CDATA[<p>Picture this: you find yourself immersed in a new job, knee-deep in a C++ codebase, yet your heart yearns for the simplicity and elegance of Python syntax. What do you do? You don&#39;t just conform – you innovate, and boldly submit this to code review:</p>

<pre><code class="language-python">#include &#34;pythonstart.h&#34;

def greet(name):
    print(&#34;hello, &#34; + name + &#34;!&#34;)
    return

def greet2(name):
    print(&#34;how are you, &#34; + name + &#34;?&#34;)
    return

def bye():
    print(&#34;ok bye!&#34;)
    return

#include &#34;pythonmid.h&#34;

username = &#34;Mat&#34;

print(&#34;Hello from \&#34;Python\&#34;!&#34;)

greet(username)
greet2(username)
print(&#34;getting ready to say bye...&#34;)
bye()

#include &#34;pythonend.h&#34;
</code></pre>



<p>The first code review comes in, and it seems your contribution may be in jeopardy:</p>

<blockquote><p>That&#39;s just Python code! It won&#39;t work with the rest of our C++ codebase!</p></blockquote>

<p>Before they can reject your code, you sharply interject:</p>

<blockquote><p>Hey! Did you even test my code?</p></blockquote>

<p>With skepticism in the air, one of your brave teammates steps up and runs the code through a C++ compiler. To everyone&#39;s amazement, the result is identical to that of running it with Python! The code not only speaks Python, but fluently converses in C++:</p>

<pre><code>$ g++ Python.cpp &amp;&amp; ./a.out
Hello from &#34;Python&#34;!
hello, Mat!
how are you, Mat?
getting ready to say bye...
ok bye!
</code></pre>

<p>The commit is eventually merged, and your unconventional approach not only saves your job but earns you a place in the annals of the team&#39;s most memorable code submissions. You are also banned from touching that codebase again.</p>

<h2 id="how-it-works-unraveling-the-enigma">How it works: unraveling the enigma</h2>

<p>This kind of program is called a “polyglot”: a single source file that is valid in multiple languages at once (the word literally means “many tongues”). Writing one relies on finding intersections between the two (or more!) languages&#39; syntax. In Python, a <code>#</code> starts a comment, while in C++ (and C), it introduces a <em>preprocessor directive</em>. These lines are the key to the program. We&#39;ll see how the preprocessor works (and how we can abuse this!) in a little bit.</p>

<p>You&#39;ll notice the only preprocessor directives in our code are <code>#include</code> lines. Unlike most other languages, C and C++ opt for a simple but effective solution to calling external library functions: copy-paste. Seriously.</p>

<p>When I write <code>#include &lt;iostream&gt;</code> at the top of a C++ file, what actually happens is the entire file called “iostream” (installed system-wide as part of the C++ standard library) gets pasted by the preprocessor, residing now where that <code>#include</code> statement once was. You don&#39;t technically have to use the <code>#include</code> directive to get a C++ program calling library functions: you can get the same behavior by just copying the file&#39;s contents manually at the top of your code (but that&#39;s a terrible idea!).</p>

<p>For example, here are two C++ header files:</p>

<p><code>preamble.h</code>:</p>

<pre><code class="language-cpp">int main()
{
    int retVal = 0;
</code></pre>

<p><code>postlude.h</code>:</p>

<pre><code class="language-cpp">}
</code></pre>

<p>And our beautifully readable code:</p>

<pre><code class="language-cpp">#include &#34;preamble.h&#34;

for(int i = 0; i &lt; 4; i++)
{
    retVal += 1;
}
return retVal;

#include &#34;postlude.h&#34;
</code></pre>

<p>Let&#39;s run our code through the standalone C/C++ Preprocessor <code>cpp</code>:</p>

<pre><code class="language-cpp">$ cpp code.cpp
# 1 &#34;code.cpp&#34;
# 1 &#34;preamble.h&#34; 1
int main()
{
    int retVal = 0;
# 2 &#34;code.cpp&#34; 2

for(int i = 0; i &lt; 4; i++)
{
    retVal += 1;
}
return retVal;

# 1 &#34;postlude.h&#34; 1
}
# 10 &#34;code.cpp&#34; 2
</code></pre>

<p>You can see the preprocessor outputs a bunch of lines starting with <code>#</code>. These are linemarkers: they tell the compiler which file and line each piece of text originally came from, so that error messages point to the right place, and they conveniently let us puny humans see exactly what the preprocessor did. The first number indicates the line number, and the string in quotes is the filename. The optional number at the end of the line represents a flag, where <code>1</code> means it&#39;s the start of an include, and <code>2</code> means we are returning to a file after an include is done. You can find the full docs <a href="https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html">here</a>.
You can see we start at line 1 in <code>code.cpp</code>, which then includes <code>preamble.h</code>. The contents of <code>preamble.h</code> follow, and afterwards we return to <code>code.cpp</code>. And so on, until everything has been copy-pasted together into one amalgamated program that consists of a simple <code>main</code> function that returns 4.</p>
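
<p>To get a feel for what those <code>#</code> lines do, here is a minimal sketch using the standard <code>#line</code> directive, which has the same effect from inside a source file: it overrides the file name and line number the compiler believes it is reading. The function name <code>where_am_i</code> and the file name <code>pretend.py</code> are made up for illustration.</p>

```cpp
#include <string>

// Returns the location the compiler *thinks* this code lives at.
std::string where_am_i()
{
// From here on, the compiler numbers lines as if the next line
// were line 42 of a file called "pretend.py".
#line 42 "pretend.py"
    return std::string(__FILE__) + ":" + std::to_string(__LINE__);
}
```

<p>Calling <code>where_am_i()</code> yields <code>pretend.py:42</code>, and any compiler error after that point would be reported against the pretend file as well.</p>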

<p>The preprocessor is a very powerful tool, and as long as the final text that is passed to the compiler is valid, anything goes!</p>

<p>Let&#39;s break down the polyglot program from the start of the post:</p>

<h3 id="functions">Functions</h3>

<p>In Python, functions are defined as follows:</p>

<pre><code class="language-py">def greet(name):
    print(&#34;hello, &#34; + name + &#34;!&#34;)
</code></pre>

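<p>Somehow, we need to translate this into working C++ just through the preprocessor. Python allows defining functions anywhere, but C++ does not allow defining a named function inside another function (and, as we&#39;ll see, all of our code ends up inside <code>main</code>), so we can use a function <em>pointer</em> instead. C++ has a neat trick here called a lambda, which allows us to define unnamed functions inline, and it&#39;s perfect to have our pointer point to (strictly speaking, the variable holds the lambda object itself, which we can call just like a function).</p>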
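
<p>As a minimal sketch of that idea (the names <code>greet_inside_function</code> and <code>greet_ptr</code> are hypothetical, not part of the project): a lambda can be stored in a local variable where a nested function definition would be rejected, and a lambda without captures even converts to a plain function pointer.</p>

```cpp
#include <string>

std::string greet_inside_function(const std::string& who)
{
    // A named function cannot be defined here, but a lambda can be
    // stored in a local variable whose type `auto` deduces for us.
    auto greet = [](std::string name) { return "hello, " + name + "!"; };

    // A lambda with an empty capture list converts to a function pointer.
    std::string (*greet_ptr)(std::string) = greet;

    return greet_ptr(who);
}
```

<p>Here <code>greet_inside_function("Mat")</code> returns <code>hello, Mat!</code>.</p>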

<p>Armed with this knowledge, we can use <code>#define</code> to create a macro that will turn <code>def</code> into <code>auto</code> (a special C++ keyword that deduces the type of a variable based on what&#39;s assigned to it), and another macro that turns <code>greet(name)</code> into our lambda definition:</p>

<pre><code class="language-cpp">#define def auto 
#define greet(arg) greet = [](std::string arg) {
</code></pre>

<p>Applying this to our Python function from above gets us some of the way there:</p>

<pre><code class="language-cpp">auto greet = [](std::string name) {:
    print(&#34;hello, &#34; + name + &#34;!&#34;)
</code></pre>

<p>We still have to handle that pesky <code>:</code> that Python requires at the end of function declarations. Now, where does C++ have a <code>:</code>... aha! The revered ternary operator, that everybody totally loves! Its syntax is as follows: <code>condition ? truthy : falsy</code>. We don&#39;t care about the logic here, we just want that sweet <code>:</code> character, so we can add the most cursed ternary expression I&#39;ve ever written to the end of the <code>greet</code> macro:</p>

<pre><code class="language-cpp">#define greet(arg) greet = [](std::string arg) { false?false
</code></pre>

<p>Running the preprocessor through our function, we get the following:</p>

<pre><code class="language-cpp">auto greet = [](std::string name) { false?false:
    print(&#34;hello, &#34; + name + &#34;!&#34;)
</code></pre>

<p>That&#39;s some good progress! There are three main issues left:
– the ternary operator is left hanging there. We need a “falsy” value for this thrilling and definitely-very-useful expression to compile.
– there&#39;s no <code>print</code> function in C++ (this project was conceived before <code>std::print</code> was added in C++23).
– we need to close that dangling curly bracket and add a semicolon at the end of our lambda, somehow.</p>

<p>Implementing the <code>print</code> function can be done with a simple function-style macro that just plops the argument into <code>std::cout</code>. This only works for simple prints, but I&#39;m not going for anything more here :)</p>

<p>Additionally, we can knock the unfinished ternary issue out by adding a stray <code>false;</code> at the beginning of the <code>print</code> macro. Usually it is just an expression statement whose value gets discarded, but when a print occurs right after a function definition, it completes the ternary operator. Hooray!</p>

<pre><code class="language-cpp">#define print(a) false;std::cout &lt;&lt; (a) &lt;&lt; std::endl; 
</code></pre>
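
<p>Here is a small sketch of both cases side by side. It assumes a hypothetical <code>sink</code> string stream in place of <code>std::cout</code> so the result can be inspected, and a made-up function name <code>print_demo</code>; the macro itself has the same shape as the one above.</p>

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sink standing in for std::cout, so the output can be checked.
static std::ostringstream sink;

#define print(a) false;sink << (a) << std::endl;

std::string print_demo()
{
    sink.str("");
    // Same shape as the tail of the expanded greet macro: a ternary
    // that still lacks its final operand.
    false?false:
    print("first")   // the leading `false;` completes the ternary
    print("second")  // here the leading `false;` is a discarded no-op
    return sink.str();
}

#undef print
```

<p>Both prints go through, one per line; the compiler may warn about unused expression results, which is about as polite as it gets here.</p>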

<p>Now for closing the function... there are no keywords left we can use here. I haven&#39;t found a way to make this work consistently without polluting the <code>print</code> macro with closing brackets that would cause it to break if used more than once or outside of a function. Thankfully, Python has a <code>return</code> keyword we can add without changing the behavior of the function:</p>

<pre><code class="language-py">def greet(name):
    print(&#34;hello, &#34; + name + &#34;!&#34;)
    return
</code></pre>

<p>Then on the C++ side, we can redefine it to close our lambda!</p>

<pre><code class="language-cpp">#define return return; };
</code></pre>

<p>Finally, our simple function now preprocesses to this valid albeit cursed C++ code:</p>

<pre><code class="language-cpp">auto greet = [](std::string name) { false?false:
    false;std::cout &lt;&lt; (&#34;hello, &#34; + name + &#34;!&#34;) &lt;&lt; std::endl;
    return; };
</code></pre>

<h2 id="int-main">“int main”</h2>

<p>Here&#39;s the next bit we have to tackle, after the function definitions:</p>

<pre><code class="language-py">username = &#34;Mat&#34;

print(&#34;Hello from \&#34;Python\&#34;!&#34;)

greet(username)
greet2(username)
print(&#34;getting ready to say bye...&#34;)
bye()
</code></pre>

<blockquote><p>This code was given in one of my university courses to showcase the basics of Python. In it, we create a variable, call a couple functions, and call it a day.</p></blockquote>

<p>Python allows writing code willy-nilly outside of any function, but in C++ this is not exactly the case, especially if we need to call library functions. Our print statements <em>must</em> reside inside the <code>main</code> function. We can have our initial header (the one with all the function macros) also start the <code>main</code> function by adding a lone <code>int main() {</code> at the end of it. We also need a header at the end with the sole purpose of closing that opening bracket:</p>

<p><code>pythonstart.h</code>:</p>

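<pre><code class="language-cpp">#include &lt;iostream&gt;
#include &lt;string&gt;

#define def auto
#define greet(arg) greet = [](std::string arg) { false?false
#define greet2(arg) greet2 = [](std::string arg) { false?false
#define bye() bye = []() { false?false
#define print(a) false;std::cout &lt;&lt; (a) &lt;&lt; std::endl;
#define return return; };

// start the main function (will be closed by pythonend.h)
int main() {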
</code></pre>

<p><code>pythonend.h</code> (thrilling):</p>

<pre><code class="language-cpp">}
</code></pre>

<h2 id="the-code">The code</h2>

<p>Looking at the first lines of the actual code, a lot of stuff is missing for it to work in C++:</p>

<pre><code class="language-py">username = &#34;Mat&#34;
print(&#34;Hello from \&#34;Python\&#34;!&#34;)
</code></pre>

<p>The first obvious issue is that C++ requires types, while Python does not. We will need a <code>pythonmid.h</code> header to plop a <code>std::string</code> in there and so tell <code>username</code> its type:</p>

<p><code>pythonmid.h</code>:</p>

<pre><code class="language-cpp">// bare type name at the very end; the next line of Python.cpp (username = ...) completes the declaration
std::string
</code></pre>

<p>Then, oh no! Our <code>print</code> macro-function inserts a stray <code>false</code> right after my string literal, causing a compile error! We must redefine <code>print</code> to remove the <code>false</code> prefix, but keep the lone semicolon as it can serve to punctuate the <code>username</code> declaration:</p>

<pre><code class="language-cpp">#undef print
#define print(a) ;std::cout &lt;&lt; (a) &lt;&lt; std::endl;
</code></pre>

<p>Finally, the function calls:</p>

<pre><code class="language-cpp">greet(username)
greet2(username)
print(&#34;getting ready to say bye...&#34;)
bye()
</code></pre>

<p>In short, every function macro must be redefined (again with <code>#undef</code> followed by <code>#define</code>) to expand into a call rather than a definition, like so:</p>

<pre><code class="language-cpp">#undef greet
#define greet(name) greet(name);
</code></pre>
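
<p>To see the two phases working together, here is a single-file sketch of the same trick. It is not the project&#39;s actual header layout: the macros are defined inline instead of in <code>pythonstart.h</code> and <code>pythonmid.h</code>, output goes to a hypothetical <code>captured</code> stream instead of <code>std::cout</code>, the body lives in a made-up <code>polyglot_body</code> function rather than <code>main</code>, and the macros are undefined again at the end so they do not leak.</p>

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for std::cout so the output can be inspected.
static std::ostringstream captured;

// Phase 1 (the role of pythonstart.h): names expand to definitions.
#define def auto
#define greet(arg) greet = [](std::string arg) { false?false
#define print(a) false;captured << (a) << '\n';
#define return return; };

static void polyglot_body()
{
def greet(name):
    print("hello, " + name + "!")
    return

// Phase 2 (the role of pythonmid.h): the same names now expand to calls.
#undef greet
#undef print
#undef return
#define greet(name) greet(name);
#define print(a) ;captured << (a) << '\n';
std::string
username = "Mat"
print("Hello from \"Python\"!")
greet(username)
}

// Clean up so the macros do not leak into the rest of the file.
#undef def
#undef greet
#undef print

std::string polyglot_output()
{
    captured.str("");
    polyglot_body();
    return captured.str();
}
```

<p>Calling <code>polyglot_output()</code> returns the greeting lines in the same order Python would print them. Note that redefining a keyword such as <code>return</code> is not something the C++ standard blesses once standard headers are involved; it merely happens to be accepted by common compilers.</p>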

<h2 id="that-s-it">That&#39;s it!</h2>

<p>And there we go! Here&#39;s the full “Python” file from the start of this post, put through the C++ preprocessor:</p>

<pre><code class="language-cpp">// -snip- the entire contents of the &lt;iostream&gt; and &lt;string&gt; C++ headers
# 2 &#34;pythonstart.h&#34; 2
# 15 &#34;pythonstart.h&#34;

# 15 &#34;pythonstart.h&#34;
int main() {
# 4 &#34;Python.cpp&#34; 2


auto greet = [](std::string name) { false?false:
    false;std::cout &lt;&lt; (&#34;hello, &#34; + name + &#34;!&#34;) &lt;&lt; std::endl;
    return; };

auto greet2 = [](std::string name) { false?false:
    false;std::cout &lt;&lt; (&#34;how are you, &#34; + name + &#34;?&#34;) &lt;&lt; std::endl;
    return; };

auto bye = []() { false?false:
    false;std::cout &lt;&lt; (&#34;ok bye!&#34;) &lt;&lt; std::endl;
    return; };

# 1 &#34;pythonmid.h&#34; 1
# 25 &#34;pythonmid.h&#34;
std::string
# 20 &#34;Python.cpp&#34; 2

username = &#34;Mat&#34;

;std::cout &lt;&lt; (&#34;Hello from \&#34;Python\&#34;!&#34;) &lt;&lt; std::endl;

greet(username);
greet2(username);
;std::cout &lt;&lt; (&#34;getting ready to say bye...&#34;) &lt;&lt; std::endl;
bye();

# 1 &#34;pythonend.h&#34; 1
}
# 31 &#34;Python.cpp&#34; 2
</code></pre>

<p>You can find the full sources on <a href="https://git.allpurposem.at/mat/PythonPlusPlus">my Gitea</a>.</p>

<p>I hope this was a fun introduction to polyglot programming! It&#39;s usually filled with crazy hacks like these, and thus can be very fun whilst being immensely impractical, but believe me: it has its uses!</p>

<p>While researching for a different project in 2020, I came across this perfect example: <a href="https://justine.lol/ape.html">Cosmopolitan</a> is a project that allows C programs to build to an “actually portable executable”: a single file that runs natively on Linux, macOS, Windows, FreeBSD, OpenBSD, and NetBSD, and can also boot directly from the BIOS. I recommend Justine&#39;s blog post for a fascinating read!</p>

<hr>

<p>Thanks for reading! Feel free to contact me if you have any suggestions or comments.
Find me on <a href="https://allpurposem.at/link/mastodon">Mastodon</a> and <a href="https://allpurposem.at/link/matrix">Matrix</a>.</p>

<p>You can follow the blog through:
– ActivityPub by inputting <code><a href="https://blog.allpurposem.at/@/mat@blog.allpurposem.at" class="u-url mention">@<span>mat@blog.allpurposem.at</span></a></code>
– RSS/Atom: Copy this link into your reader: <code>https://blog.allpurposem.at</code></p>

<p>My website: <a href="https://allpurposem.at">https://allpurposem.at</a></p>

]]></content:encoded>
      <guid>https://blog.allpurposem.at/pythonplusplus-bridging-worlds-with-polyglot-code</guid>
      <pubDate>Sun, 28 Jan 2024 22:14:49 +0000</pubDate>
    </item>
    <item>
      <title>The vector::reserve fallacy</title>
      <link>https://blog.allpurposem.at/the-vector-reserve-fallacy</link>
      <description>&lt;![CDATA[While reading through some code I wrote for a raytracing assignment, I noticed a peculiar function that had never caused any issues, but really looked like it should. After asking a bunch of people, I present this blog post to you!&#xA;&#xA;Ah, C++ standard containers. So delightfully intuitive to work with. The most versatile has to be std::vector, whose job is to wrap a dynamic &#34;C-style&#34; array and manage its capacity for us as we grow and shrink the vector&#39;s size. We can simply call push_back on the vector to add as many elements as we want, and the vector will grow its capacity when needed to fit our new elements.&#xA;&#xA;  If you understand how a std::vector works, feel free to skip to the code.&#xA;&#xA;But is it that simple?&#xA;&#xA;Resizing the vector&#39;s internal array is not cheap! It involves allocating a whole new (bigger) block of memory, copying all the elements to it, and finally freeing the old block (note that this copy may be a move, see here). Because we add elements one by one, this would trigger a lot of resizes, as the vector has to guess how many elements we plan to add, reallocating a bigger and bigger internal array every time we push_back past its capacity! So, a conforming std::vector implementation will usually try to get ahead of us and secretly allocate a bigger block when it sees we start pushing to it, and then it can just keep track of the size of the vector (how many elements we&#39;ve pushed to it) separately from its capacity (how many elements it can grow to before it needs to resize the internal array again).&#xA;&#xA;std::vector kindly exposes this internal functionality to us through some functions. For example, the capacity() function returns the current capacity of the vector&#39;s internal array. If we know the size it will grow up to ahead of time, we can use the reserve(size_type capacity) function to have it pre-allocate this capacity for us. 
This avoids reallocating a lot when doing a bunch of push_back calls, which can let us gain a precious bit of performance (see the example here for some actual numbers).&#xA;&#xA;The code&#xA;&#xA;Now that we understand std::vector::reserve, let&#39;s take a look at some C++:&#xA;std::vector&lt;int&gt; myVec{}; // create a vector of size 0&#xA;myVec.reserve(1); // reserve a capacity of 1&#xA;myVec[0] = 42; // write 42 to the first element of our empty(!!) vector&#xA;std::cout &lt;&lt; myVec[0];&#xA;&#xA;When run, the above prints 42. I hope I&#39;m not the only one who&#39;s surprised this works! I&#39;m overwriting the value of the first element in a vector... which has no elements. This is an out-of-bounds write, and should definitely not work.&#xA;Not only that, but on my machine I can replace index 0 with up to index 15187 and it still works fine! Index 15188 segfaults, though, so at least that&#39;s sane behavior (so long as I get far enough away from the start of the vector...).&#xA;So what the peck is going on??&#xA;&#xA;The peck (it&#39;s going on)&#xA;&#xA;Okay, okay, I&#39;ll say the thing. We&#39;ve found what in C++ is called &#34;undefined behavior&#34; (UB). This is a magical realm where anything could happen. Your computer might replace every window title with your username, or your program might send an order to all pizza restaurants in a 5km radius. If you&#39;re lucky, your program will just crash. More likely though, your code will do exactly what you intended it to do, and either subtly break something later on, or never signal anything on your machine... and break on someone else&#39;s.&#xA;&#xA;Why is this undefined behavior, you ask? We told our vector to reserve a capacity of 1, so 0 is a perfectly valid index in its internal array. However, the C++ standard only defines operator[] for indices smaller than size(), and never promises that reserved-but-unused capacity holds usable elements! 
It only asks for vector implementations to be able to grow and shrink, and for reserve() to &#34;ensure a capacity&#34; up to which no reallocations need to happen.&#xA;&#xA;  NOTE: after lots of research (and asking the smart folks of the #include C++ community), I&#39;ve been unable to find an implementation where this does break. That doesn&#39;t mean it&#39;s okay to rely on this behavior! It&#39;s still UB!&#xA;&#xA;Why it works for us&#xA;&#xA;Despite this being undefined behavior, it works consistently in my program. Why is this?&#xA;When we run the line myVec0] = 42, the std::vector::operator[] function is called with an argument of 0, to return a reference to the location in memory at index 0 for this vector. Let&#39;s look at the [source code for this function in GCC&#39;s libstdc++ (which I used for my testing, though the same issue applies on clang and MSVC):&#xA;&#xA;/*&#xA; @brief  Subscript access to the data contained in the %vector.&#xA; @param _n The index of the element for which data should be&#xA; accessed.&#xA; @return  Read/write reference to data.&#xA;   This operator allows for easy, array-style, data access.&#xA; Note that data access with this operator is unchecked and&#xA; outofrange lookups are not defined. (For checked lookups&#xA; see at().)&#xA; /&#xA;GLIBCXXNODISCARD GLIBCXX20CONSTEXPR&#xA;reference&#xA;operator GLIBCXXNOEXCEPT&#xA;{&#xA;    glibcxxrequiressubscript(n);&#xA;    return (this-  Mimpl.Mstart + n);&#xA;}&#xA;&#xA;Looking past all the macros (the subscript thing expands to an empty line by default, we&#39;ll look into it later), this simply takes the pointer to the start of the internal array (Mimpl.Mstart), adds our argument n, and returns it as a reference. As long as Mstart points to some valid allocated address, we should be fine accessing it within bounds of the array (note, of course, that this is only true for this implementation of libstdc++! 
Other implementations may do different things; we&#39;re in UB-land here). This explains why our index outside of the vector&#39;s size worked: we&#39;re indexing the internal array, not the vector! As long as we call reserve on the vector first, and our index is within that reserved array&#39;s size the data should be perfectly okay being written to and read from an out-of-bounds-but-within-capacity index of a vector (on this specific version of GCC&#39;s libstdc++). If we remove the myVec.reserve(1) line, the program does crash as expected, since Mimpl.Mstart is not initialized and thus points to invalid memory.&#xA;&#xA;Array out of bounds&#xA;&#xA;The reason why accessing an index higher than the array&#39;s size works is covered here, but a tl;dr is that you are indeed overwriting memory you shouldn&#39;t be, and by chance nothing bad is happening. If we run it through the valgrind memory error detector, it indeed detects our error for any index outside the array. Here&#39;s the log for a write at index 1, after a call to reserve(1):&#xA;&#xA;Invalid write of size 4&#xA;   at 0x1091FC: main (ub.cpp:8)&#xA; Address 0x4e21084 is 0 bytes after a block of size 4 alloc&#39;d&#xA;   at 0x4841F11: operator new(unsigned long) (vgreplacemalloc.c:434)&#xA;   by 0x109825: std::newallocatorint::allocate(unsigned long, void const) (newallocator.h:147)&#xA;   by 0x109604: allocate (alloctraits.h:482)&#xA;   by 0x109604: std::Vectorbaseint, std::allocator&lt;int   ::Mallocate(unsigned long) (stlvector.h:378)&#xA;   by 0x1093FF: std::vectorint, std::allocator&lt;int   ::reserve(unsigned long) (vector.tcc:79)&#xA;   by 0x1091EA: main (ub.cpp:6)&#xA;&#xA;Let&#39;s dissect this output:&#xA;The first line indicates that we wrote 4 bytes somewhere that&#39;s &#34;invalid.&#34; That&#39;s the size of a 64-bit int, which is the type we&#39;re writing into index 1.&#xA;The big call stack tells us where the array that we&#39;re accessing out of bounds was allocated. 
The penultimate line points us to that std::vector::reserve call we made, which creates a &#34;block of size 4&#34; (the vector&#39;s internal array, with the capacity for a single 4-byte int).&#xA;&#xA;This indicates that we are indeed accessing the internal array out of bounds, and that it is a memory error that will cause UB even on this implementation of std::vector. So that answers that!&#xA;&#xA;Speed at the cost of safety&#xA;&#xA;Although on my GCC install, using this as actual storage &#34;works&#34; &#34;fine,&#34; it has... issues. When we try to do a range-based loop, it will never get the elements we wrote out of bounds. If the vector gets copied, it will only bring over the data within its size, and leave behind everything else. These kinds of issues would be super hard to diagnose had I not spotted the UB here!&#xA;&#xA;Shouldn&#39;t std::vector::operator[] warn us that we&#39;re accessing an element outside of the vector&#39;s size? Let&#39;s check the C++ standard on vector functions.&#xA;&#xA;  Only at() performs range checking. If the index is out of range, at() throws an outofrange exception. All other functions do not check.&#xA;&#xA;\- The C++ Standard Library: A Tutorial and Reference by Nicolai M. Josuttis (2012), pages 274-275&#xA;&#xA;Well, darn. I can understand why, though. When writing code in C++, we expect to have the lowest possible performance overhead, yet still get to use all these nice abstractions. Performing bounds checks, even if cheap, can really add up if we have to do it for every vector access. 
Changing it to at(0) does indeed print a (relatively) helpful crash message: &#xA;terminate called after throwing an instance of &#39;std::outofrange&#39;&#xA;  what():  vector::Mrangecheck: n (which is 1)   = this-  size() (which is 0)&#xA;&#xA;As I was writing this, an excellent relevant post by @saagar@saagarjha.com graced my Mastodon timeline:&#xA;&#xA;video controls src=&#34;https://federated.saagarjha.com/media/e0fb0d82-7cfe-4d6b-ba5b-a89c7c8d97d6/out.mov&#34;&#xA;Download the&#xA;  a href=&#34;https://federated.saagarjha.com/media/e0fb0d82-7cfe-4d6b-ba5b-a89c7c8d97d6/out.mov&#34;video./a&#xA;/video&#xA;Original source.&#xA;&#xA;That&#39;s not all, though! Remember that curious glibcxxrequiressubscript(_n); macro in the GCC implementation of operator[], which I said we&#39;d look at later? Now is before&#39;s later, so let&#39;s take a look at the definition:&#xA;ifndef GLIBCXXASSERTIONS&#xA;  # define glibcxxrequiressubscript(N)&#xA;else&#xA;  # define _glibcxxrequiressubscript(N)&#x9;\&#xA;  _glibcxxassert(N  this-size())&#xA;endif&#xA;&#xA;So it does* do something! You just have to have GLIBCXXASSERTIONS defined. Indeed, if we define that macro with the -DGLIBCXXASSERTIONS compiler flag, we get this wonderful totally-readable error when the code tries to index out of bounds:&#xA;/usr/include/c++/13.2.1/bits/stlvector.h:1125: std::vectorTp, Alloc::reference std::vectorTp, Alloc::operator [with Tp = int; Alloc = std::allocatorint; reference = int&amp;; sizetype = long unsigned int]: Assertion &#39;__n  this-size()&#39; failed.&#xA;Okay, it&#39;s no &#34;you&#39;re accessing this vector out of bounds, please stop,&#34; but it certainly is better than dealing with the potential mess of undefined behavior that awaits otherwise. I guess I&#39;ll be adding this flag to all my debug builds from now on!&#xA;&#xA;If you&#39;re curious, this is my original code where I found the issue.&#xA;&#xA;---&#xD;&#xA;&#xD;&#xA;Thanks for reading! 
Feel free to contact me if you have any suggestions or comments.&#xD;&#xA;Find me on Mastodon and Matrix.&#xD;&#xA;&#xD;&#xA;You can follow the blog through:&#xD;&#xA;ActivityPub by inputting @mat@blog.allpurposem.at&#xD;&#xA;RSS/Atom: Copy this link into your reader: https://blog.allpurposem.at&#xD;&#xA;&#xD;&#xA;My website: https://allpurposem.at&#xD;&#xA;&#xD;&#xA;link rel=&#34;preload&#34; href=&#34;https://blog.allpurposem.at/lexend.woff2&#34; as=&#34;font&#34; type=&#34;font/woff2&#34; crossorigin=&#34;&#34;]]&gt;</description>
      <content:encoded><![CDATA[<p>While reading through some code I wrote for a raytracing assignment, I noticed a peculiar function that had never caused any issues, but <em>really</em> looked like it should. After asking a bunch of people, I present this blog post to you!
</p>

<p>Ah, C++ standard containers. So delightfully intuitive to work with. The most versatile has to be <code>std::vector</code>, whose job is to wrap a dynamic “C-style” array and manage its <em>capacity</em> for us as we grow and shrink the vector&#39;s <em>size</em>. We can simply call <code>push_back</code> on the vector to add as many elements as we want, and the vector will grow its capacity when needed to fit our new elements.</p>

<blockquote><p>If you understand how a <code>std::vector</code> works, feel free to skip to <a href="#the-code">the code.</a></p></blockquote>

<h2 id="but-is-it-that-simple" id="but-is-it-that-simple">But is it that simple?</h2>

<p>Resizing the vector&#39;s internal array is not cheap! It involves allocating a whole new (bigger) block of memory, copying all the elements to it, and finally freeing the old block (note that this copy may be a move, see <a href="http://stackoverflow.com/questions/10127603/why-does-reallocating-a-vector-copy-instead-of-moving-the-elements">here</a>). Because we add elements one by one, growing the array by a single slot each time would trigger a reallocation on every <code>push_back</code> past its capacity, since the vector has no idea how many elements we plan to add! So, a conforming <code>std::vector</code> implementation will usually try to get ahead of us and secretly allocate a bigger block than we asked for when it sees we start pushing to it. It can then keep track of the <em>size</em> of the vector (how many elements we&#39;ve pushed to it) separately from its <em>capacity</em> (how many elements it can grow to before it needs to resize the internal array again).</p>

<p><code>std::vector</code> kindly exposes this internal functionality to us through some functions. For example, the <code>capacity()</code> function returns the current capacity of the vector&#39;s internal array. If we know the size it will grow up to ahead of time, we can use the <code>reserve(size_type capacity)</code> function to have it pre-allocate this capacity for us. This avoids reallocating a lot when doing a bunch of <code>push_back</code>s, which can let us gain a precious bit of performance (see the example <a href="https://www.codeproject.com/Articles/5425/An-In-Depth-Study-of-the-STL-Deque-Container#_Experiment2">here</a> for some actual numbers).</p>
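
<p>To see the difference <code>reserve</code> makes, here&#39;s a minimal sketch that counts how often the capacity changes while pushing elements. The <code>countReallocations</code> helper and its <code>reserveFirst</code> flag are names I made up for this example, and the count without <code>reserve</code> depends on your standard library&#39;s growth strategy:</p>

<pre><code class="language-cpp">#include &lt;cstddef&gt;
#include &lt;iostream&gt;
#include &lt;vector&gt;

// Counts how often the capacity changes while pushing `count` ints.
// `reserveFirst` is a made-up flag for this sketch, not a std::vector feature.
std::size_t countReallocations(std::size_t count, bool reserveFirst)
{
    std::vector&lt;int&gt; v;
    if (reserveFirst)
        v.reserve(count); // one allocation up front

    std::size_t reallocations = 0;
    std::size_t lastCapacity = v.capacity();
    for (std::size_t i = 0; i &lt; count; ++i)
    {
        v.push_back(static_cast&lt;int&gt;(i));
        if (v.capacity() != lastCapacity) // the internal array was replaced
        {
            ++reallocations;
            lastCapacity = v.capacity();
        }
    }
    return reallocations;
}

int main()
{
    std::cout &lt;&lt; "without reserve: " &lt;&lt; countReallocations(1000, false) &lt;&lt; " reallocations\n";
    std::cout &lt;&lt; "with reserve:    " &lt;&lt; countReallocations(1000, true) &lt;&lt; " reallocations\n";
}
</code></pre>

<p>With <code>reserve</code>, the count is guaranteed to be zero, since the standard promises no reallocation until the size would exceed the capacity.</p>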

<h2 id="the-code" id="the-code">The code</h2>

<p>Now that we understand <code>std::vector::reserve</code>, let&#39;s take a look at some C++:</p>

<pre><code class="language-cpp">std::vector&lt;int&gt; myVec{}; // create a vector of size 0
myVec.reserve(1); // reserve a capacity of 1
myVec[0] = 42; // write 42 to the first element of our empty(!!) vector
std::cout &lt;&lt; myVec[0];
</code></pre>

<p>When run, the above prints <code>42</code>. I hope I&#39;m not the only one who&#39;s surprised this works! I&#39;m overwriting the value of the first element in a vector... which has no elements. This is an out-of-bounds write, and should definitely not work.
Not only that, but on my machine I can replace index <code>0</code> with up to index <code>15187</code> and it still works fine! Index <code>15188</code> segfaults, though, so at least that&#39;s sane behavior (so long as I get far enough away from the start of the vector...).
So what the peck is going on??</p>

<h2 id="the-peck-it-s-going-on" id="the-peck-it-s-going-on">The peck (it&#39;s going on)</h2>

<p>Okay, okay, I&#39;ll say the thing. We&#39;ve found what in C++ is called “undefined behavior” (UB). This is a magical realm where anything could happen. Your computer might replace every window title with your username, or your program might send an order to all pizza restaurants in a 5km radius. If you&#39;re lucky, your program will just crash. More likely though, your code will do exactly what you intended it to do, and either subtly break something later on, or never signal anything on your machine... and break on someone else&#39;s.</p>

<p>Why is this undefined behavior, you ask? We told our vector to reserve a capacity of 1, so 0 is a perfectly valid index in its internal array. However, the C++ standard never promises that reserved-but-unused capacity holds any elements! It only defines <code>operator[]</code> for indices smaller than the vector&#39;s <em>size</em> (still 0 here), and asks <code>reserve()</code> to “ensure a capacity” up to which no reallocations need to happen.</p>

<blockquote><p>NOTE: after lots of research (and asking the smart folks of the <a href="https://www.includecpp.org/"><a href="https://blog.allpurposem.at/tag:include" class="hashtag"><span>#</span><span class="p-category">include</span></a> C++ community</a>), I&#39;ve been unable to find an implementation where this does break. That doesn&#39;t mean it&#39;s okay to rely on this behavior! It&#39;s still UB!</p></blockquote>
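
<p>For contrast, here are two well-defined ways to get a <code>42</code> into index 0. This is just a sketch; the <code>makeWithResize</code> and <code>makeWithReserve</code> helpers are names I invented for the example:</p>

<pre><code class="language-cpp">#include &lt;iostream&gt;
#include &lt;vector&gt;

// Option 1: resize() actually creates the element, so index 0 exists.
std::vector&lt;int&gt; makeWithResize()
{
    std::vector&lt;int&gt; v;
    v.resize(1); // size 1, element value-initialized to 0
    v[0] = 42;   // fine: 0 &lt; size()
    return v;
}

// Option 2: reserve() for the capacity, push_back() to create the element.
std::vector&lt;int&gt; makeWithReserve()
{
    std::vector&lt;int&gt; v;
    v.reserve(1);    // capacity &gt;= 1, size still 0
    v.push_back(42); // size 1, no reallocation needed
    return v;
}

int main()
{
    std::cout &lt;&lt; makeWithResize()[0] &lt;&lt; &#39; &#39; &lt;&lt; makeWithReserve()[0] &lt;&lt; &#39;\n&#39;; // prints "42 42"
}
</code></pre>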

<h3 id="why-it-works-for-us" id="why-it-works-for-us">Why it works for us</h3>

<p>Despite this being undefined behavior, it works consistently in my program. Why is this?
When we run the line <code>myVec[0] = 42</code>, the <code>std::vector::operator[]</code> function is called with an argument of 0, to return a reference to the location in memory at index 0 for this vector. Let&#39;s look at the <a href="https://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a01069_source.html#l00695">source code</a> for this function in GCC&#39;s libstdc++ (which I used for my testing, though the same issue applies on clang and MSVC):</p>

<pre><code class="language-cpp">/**
 *  @brief  Subscript access to the data contained in the %vector.
 *  @param __n The index of the element for which data should be
 *  accessed.
 *  @return  Read/write reference to data.
 *
 *  This operator allows for easy, array-style, data access.
 *  Note that data access with this operator is unchecked and
 *  out_of_range lookups are not defined. (For checked lookups
 *  see at().)
 */
_GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
reference
operator[](size_type __n) _GLIBCXX_NOEXCEPT
{
    __glibcxx_requires_subscript(__n);
    return *(this-&gt;_M_impl._M_start + __n);
}
</code></pre>

<p>Looking past all the macros (the subscript thing expands to an empty line by default, we&#39;ll look into it later), this simply takes the pointer to the start of the internal array (<code>_M_impl._M_start</code>), adds our argument <code>__n</code>, and dereferences the result to return a reference. As long as <code>_M_start</code> points to some valid allocated address, we should be fine accessing it within bounds of the array (note, of course, that this is only true for <strong>this</strong> implementation of libstdc++! Other implementations may do different things; we&#39;re in UB-land here). This explains why our index outside of the vector&#39;s size worked: we&#39;re indexing the internal array, not the vector! As long as we call <code>reserve</code> on the vector first, and our index is within that reserved array&#39;s size, data can be written to and read from an out-of-bounds-but-within-capacity index of a vector (on this specific version of GCC&#39;s libstdc++). If we remove the <code>myVec.reserve(1)</code> line, the program does crash as expected, since an empty vector never allocated anything and <code>_M_impl._M_start</code> is still a null pointer.</p>
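
<p>To make the pointer arithmetic concrete, here&#39;s a toy stand-in for the three pointers libstdc++ keeps around. <code>TinyIntVec</code> and its member names are made up for this sketch and it is in no way the real implementation. One important difference: my toy allocates real <code>int</code>s with <code>new int[]</code>, so indexing within its capacity is legal here, whereas a real <code>reserve</code> only hands out raw storage.</p>

<pre><code class="language-cpp">#include &lt;cstddef&gt;
#include &lt;iostream&gt;

// A toy stand-in for the three pointers libstdc++ keeps in _M_impl.
// The names are made up for this sketch; this is not the real implementation.
struct TinyIntVec
{
    int* start = nullptr;        // like _M_start
    int* finish = nullptr;       // like _M_finish: start + size
    int* endOfStorage = nullptr; // like _M_end_of_storage: start + capacity

    explicit TinyIntVec(std::size_t capacity)
        : start(new int[capacity]()), finish(start), endOfStorage(start + capacity) {}
    ~TinyIntVec() { delete[] start; }
    TinyIntVec(const TinyIntVec&amp;) = delete;
    TinyIntVec&amp; operator=(const TinyIntVec&amp;) = delete;

    std::size_t size() const { return static_cast&lt;std::size_t&gt;(finish - start); }
    std::size_t capacity() const { return static_cast&lt;std::size_t&gt;(endOfStorage - start); }

    // Unchecked, just like the libstdc++ version: plain pointer arithmetic.
    int&amp; operator[](std::size_t n) { return *(start + n); }
};

int main()
{
    TinyIntVec v(1); // capacity 1, size 0
    v[0] = 42;       // the array has room, but size() never hears about it
    std::cout &lt;&lt; v[0] &lt;&lt; " size=" &lt;&lt; v.size() &lt;&lt; " capacity=" &lt;&lt; v.capacity() &lt;&lt; &#39;\n&#39;;
    // prints "42 size=0 capacity=1"
}
</code></pre>

<p>Nothing in <code>operator[]</code> ever looks at <code>finish</code>, which is exactly why the write goes through while the size stays at 0.</p>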

<h4 id="array-out-of-bounds" id="array-out-of-bounds">Array out of bounds</h4>

<p>The reason why accessing an index <em>higher</em> than the array&#39;s size works is covered <a href="https://stackoverflow.com/questions/1239938/accessing-an-array-out-of-bounds-gives-no-error-why">here</a>, but a tl;dr is that you are indeed overwriting memory you shouldn&#39;t be, and by chance nothing bad is happening. If we run it through the <code>valgrind</code> memory error detector, it indeed detects our error for any index outside the array. Here&#39;s the log for a write at index <code>1</code>, after a call to <code>reserve(1)</code>:</p>

<pre><code>Invalid write of size 4
   at 0x1091FC: main (ub.cpp:8)
 Address 0x4e21084 is 0 bytes after a block of size 4 alloc&#39;d
   at 0x4841F11: operator new(unsigned long) (vg_replace_malloc.c:434)
   by 0x109825: std::__new_allocator&lt;int&gt;::allocate(unsigned long, void const*) (new_allocator.h:147)
   by 0x109604: allocate (alloc_traits.h:482)
   by 0x109604: std::_Vector_base&lt;int, std::allocator&lt;int&gt; &gt;::_M_allocate(unsigned long) (stl_vector.h:378)
   by 0x1093FF: std::vector&lt;int, std::allocator&lt;int&gt; &gt;::reserve(unsigned long) (vector.tcc:79)
   by 0x1091EA: main (ub.cpp:6)
</code></pre>

<p>Let&#39;s dissect this output:</p>

<ol><li>The first line indicates that we wrote 4 bytes somewhere that&#39;s “invalid.” That&#39;s the size of a 32-bit <code>int</code>, which is the type we&#39;re writing into index <code>1</code>.</li>
<li>The big call stack tells us where the array that we&#39;re accessing out of bounds was allocated. The penultimate line points us to that <code>std::vector::reserve</code> call we made, which creates a “block of size 4” (the vector&#39;s internal array, with the capacity for a single 4-byte <code>int</code>).</li></ol>

<p>This indicates that we are indeed accessing the internal array out of bounds, and that it is a memory error that will cause UB even on this implementation of <code>std::vector</code>. So that answers that!</p>

<h2 id="speed-at-the-cost-of-safety" id="speed-at-the-cost-of-safety">Speed at the cost of safety</h2>

<p>Although on my GCC install, using this as actual storage “works” “fine,” it has... issues. When we try to do a range-based loop, it will never get the elements we wrote out of bounds. If the vector gets copied, it will only bring over the data within its size, and leave behind everything else. These kinds of issues would be super hard to diagnose had I not spotted the UB here!</p>
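
<p>Here&#39;s a small sketch (staying within defined behavior this time) showing that both iteration and copying only ever look at the size, never the capacity. The <code>countVisited</code> helper is a name I made up for the example:</p>

<pre><code class="language-cpp">#include &lt;cstddef&gt;
#include &lt;iostream&gt;
#include &lt;vector&gt;

// Counts the elements a range-based for loop visits.
std::size_t countVisited(const std::vector&lt;int&gt;&amp; v)
{
    std::size_t visited = 0;
    for (int value : v)
    {
        (void)value; // we only care about how many there are
        ++visited;
    }
    return visited;
}

int main()
{
    std::vector&lt;int&gt; v;
    v.reserve(8);    // room for 8 ints...
    v.push_back(42); // ...but only one element exists

    std::vector&lt;int&gt; copy = v; // copies size() elements, not capacity()

    std::cout &lt;&lt; "visited by loop: " &lt;&lt; countVisited(v) &lt;&lt; &#39;\n&#39;; // 1
    std::cout &lt;&lt; "copy size:       " &lt;&lt; copy.size() &lt;&lt; &#39;\n&#39;;    // 1
}
</code></pre>

<p>Anything written past the size through <code>operator[]</code> would be invisible to both.</p>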

<p>Shouldn&#39;t <code>std::vector::operator[]</code> warn us that we&#39;re accessing an element outside of the vector&#39;s size? Let&#39;s check what a reference book on the C++ standard library says about vector functions.</p>

<blockquote><p>Only <code>at()</code> performs range checking. If the index is out of range, <code>at()</code> throws an <code>out_of_range</code> exception. All other functions do <em>not</em> check.</p></blockquote>

<p>- <em>The C++ Standard Library: A Tutorial and Reference</em> by Nicolai M. Josuttis (2012), pages 274-275</p>

<p>Well, darn. I can understand why, though. When writing code in C++, we expect the lowest possible performance overhead, yet still get to use all these nice abstractions. Performing bounds checks, even if cheap, can really add up if we have to do it for every vector access. Swapping <code>operator[]</code> for <code>at()</code> (with index <code>1</code> in the run below) does indeed print a (relatively) helpful crash message:</p>

<pre><code class="language-cpp">terminate called after throwing an instance of &#39;std::out_of_range&#39;
  what():  vector::_M_range_check: __n (which is 1) &gt;= this-&gt;size() (which is 0)
</code></pre>
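
<p>If you&#39;d rather handle the failure than have the program terminate, the exception can be caught like any other. A minimal sketch, where <code>canRead</code> is a helper I invented for this example:</p>

<pre><code class="language-cpp">#include &lt;cstddef&gt;
#include &lt;iostream&gt;
#include &lt;stdexcept&gt;
#include &lt;vector&gt;

// Returns true if index `n` can be read through at().
// A helper made up for this sketch, not part of the standard library.
bool canRead(const std::vector&lt;int&gt;&amp; v, std::size_t n)
{
    try
    {
        (void)v.at(n); // throws std::out_of_range when n &gt;= size()
        return true;
    }
    catch (const std::out_of_range&amp; e)
    {
        std::cerr &lt;&lt; "caught: " &lt;&lt; e.what() &lt;&lt; &#39;\n&#39;;
        return false;
    }
}

int main()
{
    std::vector&lt;int&gt; v;
    v.reserve(1); // capacity only, size stays 0
    std::cout &lt;&lt; std::boolalpha &lt;&lt; canRead(v, 0) &lt;&lt; &#39;\n&#39;; // false
    v.push_back(42);
    std::cout &lt;&lt; canRead(v, 0) &lt;&lt; &#39;\n&#39;; // true
}
</code></pre>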

<p>As I was writing this, an excellent relevant post by <a href="https://blog.allpurposem.at/@/saagar@saagarjha.com" class="u-url mention">@<span>saagar@saagarjha.com</span></a> graced my Mastodon timeline:</p>

<p><video controls="" src="https://federated.saagarjha.com/media/e0fb0d82-7cfe-4d6b-ba5b-a89c7c8d97d6/out.mov">
Download the
  <a href="https://federated.saagarjha.com/media/e0fb0d82-7cfe-4d6b-ba5b-a89c7c8d97d6/out.mov">video.</a>
</video>
<a href="https://federated.saagarjha.com/notice/AbFvHsSx5mhPlyqABk">Original source.</a></p>

<p>That&#39;s not all, though! Remember that curious <code>__glibcxx_requires_subscript(__n);</code> macro in the GCC implementation of <code>operator[]</code>, which I said we&#39;d look at later? Now is before&#39;s later, so let&#39;s take a look at the definition:</p>

<pre><code class="language-cpp">#ifndef _GLIBCXX_ASSERTIONS
  # define __glibcxx_requires_subscript(_N)
#else
  # define __glibcxx_requires_subscript(_N)	\
  __glibcxx_assert(_N &lt; this-&gt;size())
#endif
</code></pre>

<p>So it <em>does</em> do something! You just have to have <code>_GLIBCXX_ASSERTIONS</code> defined. Indeed, if we define that macro with the <code>-D_GLIBCXX_ASSERTIONS</code> compiler flag, we get this wonderful totally-readable error when the code tries to index out of bounds:</p>

<pre><code class="language-cpp">/usr/include/c++/13.2.1/bits/stl_vector.h:1125: std::vector&lt;_Tp, _Alloc&gt;::reference std::vector&lt;_Tp, _Alloc&gt;::operator[](size_type) [with _Tp = int; _Alloc = std::allocator&lt;int&gt;; reference = int&amp;; size_type = long unsigned int]: Assertion &#39;__n &lt; this-&gt;size()&#39; failed.
</code></pre>

<p>Okay, it&#39;s no “you&#39;re accessing this vector out of bounds, please stop,” but it certainly is better than dealing with the potential mess of undefined behavior that awaits otherwise. I guess I&#39;ll be adding this flag to all my debug builds from now on!</p>
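
<p>Since a mistyped flag fails silently, it can be handy to check that the macro really reached the build. Here&#39;s a tiny sketch for that; <code>assertionsEnabled</code> is a name I made up, and the check only means something when building against libstdc++:</p>

<pre><code class="language-cpp">#include &lt;iostream&gt;
#include &lt;vector&gt; // any standard header, so libstdc++&#39;s own configuration is pulled in

// Reports whether this translation unit sees _GLIBCXX_ASSERTIONS defined.
constexpr bool assertionsEnabled()
{
#ifdef _GLIBCXX_ASSERTIONS
    return true;
#else
    return false;
#endif
}

int main()
{
    std::cout &lt;&lt; "_GLIBCXX_ASSERTIONS is "
              &lt;&lt; (assertionsEnabled() ? "on" : "off") &lt;&lt; &#39;\n&#39;;
}
</code></pre>

<p>Building it once with <code>-D_GLIBCXX_ASSERTIONS</code> and once without should flip the output.</p>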

<p>If you&#39;re curious, <a href="https://git.allpurposem.at/mat/GraphicsProg1/src/commit/b3ef88189ee7d2bec2d1da08edbd6e2e84928496/source/DataTypes.h#L178">this</a> is my original code where I found the issue.</p>

<hr>

<p>Thanks for reading! Feel free to contact me if you have any suggestions or comments.
Find me on <a href="https://allpurposem.at/link/mastodon">Mastodon</a> and <a href="https://allpurposem.at/link/matrix">Matrix</a>.</p>

<p>You can follow the blog through:
– ActivityPub by inputting <code><a href="https://blog.allpurposem.at/@/mat@blog.allpurposem.at" class="u-url mention">@<span>mat@blog.allpurposem.at</span></a></code>
– RSS/Atom: Copy this link into your reader: <code>https://blog.allpurposem.at</code></p>

<p>My website: <a href="https://allpurposem.at">https://allpurposem.at</a></p>

]]></content:encoded>
      <guid>https://blog.allpurposem.at/the-vector-reserve-fallacy</guid>
      <pubDate>Fri, 27 Oct 2023 21:53:41 +0000</pubDate>
    </item>
    <item>
      <title>Adventures cross-compiling a Windows game engine</title>
      <link>https://blog.allpurposem.at/adventures-cross-compiling-a-windows-game-engine</link>
      <description>&lt;![CDATA[As part of my game development major at DAE, I have to work on several projects which were not made with support for my platform of choice (Linux). Thankfully, most of these have been simple frameworks wrapping around SDL and OpenGL, so my job was limited to rewriting the build system from Visual Studio&#39;s .sln project file to a cross-platform CMake project (and fixing some bugs along the way). Not too bad. I&#39;d miss the beginning of the first class, but was up and going shortly after. Among these were the first two semesters of Programming. Here&#39;s a list of school engines I have ported so far:&#xA;&#xA;Programming 1 &#34;SDL Framework&#34;: https://git.allpurposem.at/mat/SDL-Framework&#xA;Programming 2 &#34;GameDevEngine2&#34;: https://git.allpurposem.at/mat/GameDevEngine2&#xA;Graphics Programming &#34;RayTracer&#34;: https://git.allpurposem.at/mat/GraphicsProg1&#xA;Gameplay Programming &#34;FRAMEWORK&#34;: https://git.allpurposem.at/mat/GameplayProg&#xA;&#xA;The versatility of having a cross-platform project allowed me to add tons of niceties for some of these. The one I&#39;m most happy with is the &#34;GameDevEngine2&#34; framework from Programming 2, to which I added web support and ended up using it for my and 2FoamBoards&#39;s entry in the 2023 GMTK game jam.&#xA;&#xA;Programming 3&#xA;&#xA;I&#39;d been having it easy. A couple nonstandard Microsoft Visual C++ (MSVC) bits of syntax here, a couple win32 API calls (functions that are specific to Windows) there... I wasn&#39;t expecting what arrived in my downloads folder today. I applied my usual CMake boilerplate, with SDL support, hit run to see the perhaps 50-100 errors... 
and instead was greeted with a simple but effective singular error.&#xA;&#xA;apm@apg ~/S/Prog3 (main)  clang++ source/GameWinMain.cpp &#xA;In file included from source/GameWinMain.cpp:9:&#xA;source/GameWinMain.h:12:10: fatal error: &#39;windows.h&#39; file not found&#xA;include windows.h&#xA;         ^&#xA;1 error generated.&#xA;&#xA;Oh, no&#xA;There&#39;s no SDL. There&#39;s no OpenGL. No GLFW, Qt, or GTK. It&#39;s all bare Windows API calls. I think I was in some form of state of disbelief, as I spent the next 30 minutes slowly creating #defines and typedefs to patch in all the types. Maybe, just maybe, I could patch around the types and it would magically open a window and I could get started with my classwork. No such thing happened.&#xA;&#xA;!--more--&#xA;Options&#xA;&#xA;So: what are my options? Is this salvageable, without having to boot the dreaded virtual machine? Let&#39;s see... I could:&#xA;&#xA;continue patching around the 3-4k lines of win32 API calls like I was ineffectively doing before&#xA;rewrite the engine from scratch to support SDL&#xA;build the native .sln file by somehow running MSVC on WINE (a Windows compatibility layer for Linux)&#xA;cross-compile from Linux to Windows and run the .exe file with WINE&#xA;&#xA;Obviously the first two options would be preferable, as they don&#39;t come with a hard dependency on the unfamiliar world of WINE. However, they sadly also take the most time. I have not yet discarded the second option (the author of the engine gave me the green light to rewrite it for native Linux, and even use it in exams (that&#39;s a first!!)), but as I have to follow the class from the start, I think I&#39;ll be going with WINE.&#xA;&#xA;aur/msvc-wine-git&#xA;&#xA;Of course, I&#39;m not the first person to want to build a .sln project from Linux. This appears to be a solved problem, with the polished-looking msvc-wine toolchain available as a native package for my distro. 
So I went ahead and installed it:&#xA;&#xA;apm@apg ~/S/Prog3 (main)  gimme msvc-wine-git&#xA;[sudo] password for apm: &#xA;:: Resolving dependencies...&#xA;:: Calculating conflicts...&#xA;:: Calculating inner conflicts...&#xA;&#xA;Aur (1) msvc-wine-git-17.7.r4-2&#xA;&#xA;:: Proceed to review? [Y/n]: &#xA;&#xA;It diligently fetched MSVC, the Windows 11 SDK, and all the necessary components from Microsoft&#39;s servers, while I had time to read the documentation. I happened upon the CMake instructions, which is how I&#39;ve managed all my school-related projects so far, and it didn&#39;t stick in my brain. I don&#39;t intend to criticize the writing, but something about it being all the way in the bottom in a FAQ, with no code blocks or example commands, or having a class going on around me while I was doing this prevented me from understanding how I&#39;m supposed to use it. The only time I&#39;ve ever used a separate toolchain was Emscripten; it provides a nice little emcmake wrapper for CMake which takes care of a lot of the details for you. I gave it a few tries, but seeing I was getting nowhere, and every second was lost class time, I decided to move on to my last option.&#xA;&#xA;LLVM&#xA;&#xA;I knew a little about LLVM before this, from having used clangd as my language server for C++ projects. As I understand it, it&#39;s a group of compilers designed in such a way that the &#34;frontends&#34; (which read the text code and output an intermediate language) and &#34;backends&#34; (read intermediate language and output the final binary) are swappable and interchangeable. This means you can use the same backend to compile both C++ and Rust code, while still getting equally well-optimized machine code out the other side. I enlisted the help of @JohnyTheCarrot@toot.community, who I knew has worked with clang before. 
He told me about the concept of an &#34;LLVM triple&#34;, which is a setting for LLVM compilers that tells it what sort of machine you want it to output code for. Crucially, you can specify a triplet for a completely different system than your own, and it should still work. I tried the following command:&#xA;clang++ -target x8664-w64-mingw32 source/.cpp -o game&#xA;&#xA;This currently outputs 227 linker errors. I know there were many syntax-related compiler errors which I&#39;ve since fixed, but it does get us past the dreaded #include windows.h! All of the linker errors take the following form:&#xA;/usr/bin/x8664-w64-mingw32-ld: /tmp/GameEngine-ac27d8.o:GameEngine.cpp:(.text+0xc95f): undefined reference to `impDeleteObject&#xA;&#xA;Fun with the linker &#xA;&#xA;Each of these is related to a call of a Windows-related function. It looks like we&#39;re missing the libraries! Adding the -mwindows flag tells Clang it&#39;s compiling &amp; linking a GUI Windows app, instead of a command line one. This causes linking against a lot of win32 GUI-related functions, reducing the linker errors to a mere 9. There&#39;s two kinds:&#xA;&#xA;_impAlphaBlend and _impTransparentBlt&#xA;According to the code, these are used for transparency. I have yet to use this engine, but from the names I&#39;m guessing they allow for drawing semi-opaque images on top of each other and blend the colors together. According to Microsoft&#39;s documentation, these are located in Msimg32.dll.&#xA;&#xA;_impmciSendStringA&#xA;These are functions from the defunct Multimedia Control Interface (that&#39;s the mci at the start of the name!), which this engine uses to play audio. Microsoft helpfully kept the legacy documentation online, informing me that these belong to Winmm.dll.&#xA;&#xA;At first, I assumed I&#39;d have to get these from a copy of Windows. 
However, I remembered WINE has a lot of open source reimplementations of these DLLs (Windows&#39;s version of .so shared libraries), and sure enough locate msimg32.dll (note the lowercase: I wasted some time with this because Linux is case sensitive, while Windows is not!) pointed me straight to a DLL I could yoink. I added it to the list of files to compile, and the msimg32-related linker errors were gone. Hooray!&#xA;&#xA;...or so I thought. I excitedly copied in winmm.dll and tried to compile...&#xA;clang-16: error: unable to execute command: Segmentation fault (core dumped)&#xA;clang-16: error: linker command failed due to signal (use -v to see invocation)&#xA;&#xA;Excuse me?? The linker is segfaulting?? To be honest, I have no idea whether this is an actual bug in LLVM&#39;s linker, but it sure did stump me for a while. I thought maybe my copy of winmm.dll was corrupt, or WINE did something weird with it. I went as far as downloading Microsoft&#39;s version of the DLL, but was met with the same sad message. What could I be possibly doing wrong?&#xA;&#xA;Oh. I&#39;m not supposed to be copying the DLLs into here, am I? The last time I used a linker without going through CMake, I was passing libraries to it was -llibname. But it can&#39;t be that easy for this... can it? It&#39;d have to go to my default WINE prefix to fetch them, which sounds plain weird. Libraries come from system paths, not user-specific folders. 
Well, might be worth a try anyways...&#xA;&#xA;apm@apg ~/S/P/build (main)  clang++ -mwindows -target x8664-w64-mingw32 ../source/.cpp -o game -lmsimg32 -lwinmm&#xA;In file included from ../source/GameWinMain.cpp:10:&#xA;../source/GameEngine.h:19:9: warning: &#39;WIN32WINNT&#39; macro redefined [-Wmacro-redefined]&#xA;define WIN32WINNT 0x0A00                             // Windows 10&#xA;        ^&#xA;/usr/x8664-w64-mingw32/include/mingw.h:239:9: note: previous definition is here&#xA;define WIN32WINNT 0xa00&#xA;        ^&#xA;1 warning generated.&#xA;Warning: corrupt .drectve at end of def file&#xA;Warning: corrupt .drectve at end of def file&#xA;Warning: corrupt .drectve at end of def file&#xA;apm@apg ~/S/P/build (main)  ls&#xA;game.exe&#xA;&#xA;wait*. That built?? HUH???? There&#39;s no way it--&#xA;apm@apg ~/S/P/build (main)  ./game.exe&#xA;-snip-&#xA;0130:err:module:importdll Library libgccsseh-1.dll (which is needed by L&#34;Z:\\home\\apm\\School\\Prog3\\build\\game.exe&#34;) not found&#xA;0130:err:module:importdll Library libstdc++-6.dll (which is needed by L&#34;Z:\\home\\apm\\School\\Prog3\\build\\game.exe&#34;) not found&#xA;0130:err:module:LdrInitializeThunk Importing dlls for L&#34;Z:\\home\\apm\\School\\Prog3\\build\\game.exe&#34; failed, status c0000135&#xA;&#xA;Right. Not so fast, heh. Still, this is great news! I don&#39;t know how or why this works, but we&#39;re linking to the DLLs somehow somewhere. WINE can&#39;t find some mingw32 libraries which were pulled in by -mwindows, but we can easily point it to them with export WINEPATH=&#34;/usr/x8664-w64-mingw32/bin&#34;&#xA;&#xA;And that&#39;s it! Here&#39;s the engine in all its glory, with audio support and all! It&#39;s beautiful...&#xA;&#xA;A screenshot of a completely black window with many lines of warnings from WINE behind it&#xA;&#xA;Right, there&#39;s nothing built on it yet. It&#39;s just a blank canvas. 
But hey, it doesn&#39;t crash!&#xA;&#xA;What&#39;s next?&#xA;&#xA;Having this run through WINE does come with a few limitations:&#xA;&#xA;All WINE apps take a long while to launch, though you can vastly improve this by running wineserver --persistent beforehand.&#xA;Usually, I attach gdb (the GNU debugger) to my code from my IDE, neovim. However, with this program running under WINE, I don&#39;t know how I would do that. Debugging remains an unsolved mystery (EDIT: see Addendum, I figured it out!).&#xA;WINE is slowly merging Wayland support, but at the moment it runs under X11, meaning I&#39;m sacrificing some performance and convenience.&#xA;Finally, of course, this will never have Linux support. I don&#39;t like that.&#xA;&#xA;Long-term, depending on the course workload and how complex the engine functions end up being, I think I will rewrite it in SDL. This will have the added bonus of enabling, like with my other engine ports, web support (see my Programming 2 end project here and a game jam game made in the same engine here). However, I think this will take longer than I think is reasonable to spend while procrastinating on other classes, so I&#39;m leaving it here. I wrote down my process while it was still fresh in my mind, so I hope this was an interesting read! As always, any and all constructive feedback is welcome directed to me: @mat@mastodon.gamedev.place .&#xA;&#xA;I am considering writing up my general porting process in a separate blog post, so perhaps expect that next!&#xA;&#xA;---&#xA;&#xA;Addendum&#xA;&#xA;After doing some additional research, and asking around in the very helpful WineHQ IRC room, I found a way to get debugging working! The first step is adding the -g flag to the clang++ invocation, which tells clang we want it to generate debug information (namely source maps, so the debugger can show which line of code we&#39;re at). 
Then I simply have to run winedbg --gdb game.exe, and I am presented with a (nearly) full-featured gdb prompt!&#xA;&#xA;A screenshot of a gdb interface showing source code of a WinMain function which runs the game engine&#xA;&#xA;I&#39;m unsure how to hook this up to neovim (maybe I can look into the Debug Adapter Protocol for this?), but for now just having a gdb environment is awesome enough. Onto more adventures!&#xA;&#xA;---&#xD;&#xA;&#xD;&#xA;Thanks for reading! Feel free to contact me if you have any suggestions or comments.&#xD;&#xA;Find me on Mastodon and Matrix.&#xD;&#xA;&#xD;&#xA;You can follow the blog through:&#xD;&#xA;ActivityPub by inputting @mat@blog.allpurposem.at&#xD;&#xA;RSS/Atom: Copy this link into your reader: https://blog.allpurposem.at&#xD;&#xA;&#xD;&#xA;My website: https://allpurposem.at]]&gt;</description>
      <content:encoded><![CDATA[<p>As part of my game development major at DAE, I have to work on several projects which were not made with support for my platform of choice (Linux). Thankfully, most of these have been simple frameworks wrapping around SDL and OpenGL, so my job was limited to rewriting the build system from Visual Studio&#39;s <code>.sln</code> project file to a cross-platform CMake project (and fixing some bugs along the way). Not too bad. I&#39;d miss the beginning of the first class, but was up and going shortly after. Among these were the first two semesters of Programming. Here&#39;s a list of school engines I have ported so far:</p>
<ol><li>Programming 1 “SDL Framework”: <a href="https://git.allpurposem.at/mat/SDL-Framework">https://git.allpurposem.at/mat/SDL-Framework</a></li>
<li>Programming 2 “GameDevEngine2”: <a href="https://git.allpurposem.at/mat/GameDevEngine2">https://git.allpurposem.at/mat/GameDevEngine2</a></li>
<li>Graphics Programming “RayTracer”: <a href="https://git.allpurposem.at/mat/GraphicsProg1">https://git.allpurposem.at/mat/GraphicsProg1</a></li>
<li>Gameplay Programming “_FRAMEWORK”: <a href="https://git.allpurposem.at/mat/GameplayProg">https://git.allpurposem.at/mat/GameplayProg</a></li></ol>

<p>The versatility of having a cross-platform project allowed me to add tons of niceties for some of these. The one I&#39;m most happy with is the “GameDevEngine2” framework from Programming 2, to which I added web support, and which I ended up using for my and 2FoamBoards&#39;s <a href="https://2foamboards.itch.io/murder">entry in the 2023 GMTK game jam</a>.</p>

<h2 id="programming-3" id="programming-3">Programming 3</h2>

<p>I&#39;d been having it easy. A couple nonstandard Microsoft Visual C++ (MSVC) bits of syntax here, a couple win32 API calls (functions that are specific to Windows) there... I wasn&#39;t expecting what arrived in my downloads folder today. I applied my usual CMake boilerplate, with SDL support, hit run to see the perhaps 50-100 errors... and instead was greeted with a simple but effective singular error.</p>

<pre><code class="language-cpp">apm@apg ~/S/Prog3 (main)&gt; clang++ source/GameWinMain.cpp 
In file included from source/GameWinMain.cpp:9:
source/GameWinMain.h:12:10: fatal error: &#39;windows.h&#39; file not found
#include &lt;windows.h&gt;
         ^~~~~~~~~~~
1 error generated.
</code></pre>

<h3 id="oh-no" id="oh-no">Oh, no</h3>

<p>There&#39;s no SDL. There&#39;s no OpenGL. No GLFW, Qt, or GTK. It&#39;s <em>all</em> bare Windows API calls. I think I was in a state of disbelief, as I spent the next 30 minutes slowly creating <code>#define</code>s and <code>typedef</code>s to patch in all the types. Maybe, just maybe, I could patch around the types and it would magically open a window and I could get started with my classwork. No such thing happened.</p>
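
<p>For a sense of what that patching looked like, here&#39;s a minimal sketch. It is not the code I actually wrote: the handful of types below is a small sample picked for illustration, <code>MakeRect</code> is a made-up helper, and the fixed widths assume the usual win32 sizes.</p>

<pre><code class="language-cpp">#include &lt;cstdint&gt;

#ifdef _WIN32
#include &lt;windows.h&gt;   // the real definitions, when they exist
#else
// Stand-ins so that code mentioning these names at least compiles elsewhere.
typedef std::uint32_t DWORD;   // 32-bit unsigned integer
typedef std::int32_t  LONG;    // LONG is 32 bits on Windows, even on x64
typedef int           BOOL;
typedef void*         HANDLE;  // opaque handle
typedef HANDLE        HWND;    // window handle
#define TRUE  1
#define FALSE 0

struct RECT { LONG left, top, right, bottom; };
#endif

// Hypothetical helper: a rectangle anchored at the origin.
constexpr RECT MakeRect(LONG width, LONG height) {
    return RECT{0, 0, width, height};
}
</code></pre>

<p>The catch is that types are only half the story. Every function that takes them (creating a window, pumping messages, drawing) still has no implementation on Linux, so the best case is trading compiler errors for linker errors.</p>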



<h2 id="options" id="options">Options</h2>

<p>So: what are my options? Is this salvageable, without having to boot the dreaded virtual machine? Let&#39;s see... I could:</p>
<ul><li>continue patching around the 3-4k lines of win32 API calls like I was ineffectively doing before</li>
<li>rewrite the engine from scratch to support SDL</li>
<li>build the native <code>.sln</code> file by somehow running MSVC on WINE (a Windows compatibility layer for Linux)</li>
<li>cross-compile from Linux to Windows and run the <code>.exe</code> file with WINE</li></ul>

<p>Obviously the first two options would be preferable, as they don&#39;t come with a hard dependency on the unfamiliar world of WINE. However, they sadly also take the most time. I have not yet discarded the second option (the author of the engine gave me the green light to rewrite it for native Linux, and even use it in exams (that&#39;s a first!!)), but as I have to follow the class from the start, I think I&#39;ll be going with WINE.</p>

<h3 id="aur-msvc-wine-git" id="aur-msvc-wine-git"><code>aur/msvc-wine-git</code></h3>

<p>Of course, I&#39;m not the first person to want to build a <code>.sln</code> project from Linux. This appears to be a solved problem, with the polished-looking <a href="https://github.com/mstorsjo/msvc-wine">msvc-wine</a> toolchain available as a native package for my distro. So I went ahead and installed it:</p>

<pre><code class="language-ini">apm@apg ~/S/Prog3 (main)&gt; gimme msvc-wine-git
[sudo] password for apm: 
:: Resolving dependencies...
:: Calculating conflicts...
:: Calculating inner conflicts...

Aur (1) msvc-wine-git-17.7.r4-2

:: Proceed to review? [Y/n]: 
</code></pre>

<p>It diligently fetched MSVC, the Windows 11 SDK, and all the necessary components from Microsoft&#39;s servers, which gave me time to read the documentation. I happened upon the instructions for CMake, which is how I&#39;ve managed all my school-related projects so far, but they didn&#39;t stick in my brain. I don&#39;t intend to criticize the writing, but something about them being all the way at the bottom in a FAQ, with no code blocks or example commands, or perhaps having a class going on around me while I was doing this, prevented me from understanding how I&#39;m supposed to use the toolchain. The only time I&#39;ve ever used a separate toolchain was <a href="https://emscripten.org">Emscripten</a>; it provides a nice little <code>emcmake</code> wrapper for CMake which takes care of a lot of the details for you. I gave it a few tries, but seeing I was getting nowhere, and every second was lost class time, I decided to move on to my last option.</p>

<h2 id="llvm" id="llvm">LLVM</h2>

<p>I knew a little about <a href="https://llvm.org">LLVM</a> before this, from having used <code>clangd</code> as my language server for C++ projects. As I understand it, it&#39;s a group of compilers designed in such a way that the “frontends” (which read the text code and output an intermediate language) and “backends” (which read the intermediate language and output the final binary) are swappable and interchangeable. This means you can use the same backend to compile both C++ and Rust code, while still getting equally well-optimized machine code out the other side. I enlisted the help of <a href="https://blog.allpurposem.at/@/JohnyTheCarrot@toot.community" class="u-url mention">@<span>JohnyTheCarrot@toot.community</span></a>, who I knew had worked with <code>clang</code> before. He told me about the concept of an “LLVM triple”, which is a setting for LLVM compilers that tells them what sort of machine you want them to output code for. Crucially, you can specify a triple for a completely different system than your own, and it <em>should</em> still work. I tried the following command:</p>

<pre><code class="language-bash">clang++ -target x86_64-w64-mingw32 source/*.cpp -o game
</code></pre>

<p>This currently outputs 227 linker errors. There were also many syntax-related compiler errors, which I&#39;ve since fixed, but the important part is that it gets us past the dreaded <code>#include &lt;windows.h&gt;</code>! All of the linker errors take the following form:</p>

<pre><code class="language-cpp">/usr/bin/x86_64-w64-mingw32-ld: /tmp/GameEngine-ac27d8.o:GameEngine.cpp:(.text+0xc95f): undefined reference to `__imp_DeleteObject
</code></pre>
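
<p>As an aside, a quick way to convince yourself that the triple really changes what the compiler targets is to look at the macros it predefines. Here&#39;s a small sketch (the function name <code>TargetName</code> is my own; the macros are the usual ones compilers set for each target):</p>

<pre><code class="language-cpp">// Reports which platform this translation unit is being compiled for,
// based on macros the compiler predefines for the selected target.
constexpr const char* TargetName() {
#if defined(__MINGW32__)
    return &#34;windows (mingw)&#34;;   // set for both 32-bit and 64-bit MinGW targets
#elif defined(_WIN32)
    return &#34;windows&#34;;
#elif defined(__linux__)
    return &#34;linux&#34;;
#else
    return &#34;something else&#34;;
#endif
}
</code></pre>

<p>Built with a plain <code>clang++</code> on Linux this should pick the <code>linux</code> branch; with <code>-target x86_64-w64-mingw32</code> the very same file should take the MinGW branch instead, without touching the source.</p>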

<h3 id="fun-with-the-linker" id="fun-with-the-linker">Fun with the linker</h3>

<p>Each of these is related to a call of a Windows-related function. It looks like we&#39;re missing the libraries! Adding the <code>-mwindows</code> flag tells Clang it&#39;s compiling &amp; linking a GUI Windows app, instead of a command line one. This causes linking against a lot of win32 GUI-related functions, reducing the linker errors to a mere 9. There are two kinds:</p>
<ul><li><p><code>__imp_AlphaBlend</code> and <code>__imp_TransparentBlt</code>
According to the code, these are used for transparency. I have yet to use this engine, but from the names I&#39;m guessing they allow for drawing semi-opaque images on top of each other and blending the colors together. According to Microsoft&#39;s documentation, these are located in <code>Msimg32.dll</code>.</p></li>

<li><p><code>__imp_mciSendStringA</code>
This is a function from the defunct <a href="https://en.wikipedia.org/wiki/Media_Control_Interface">Media Control Interface</a> (that&#39;s the <code>mci</code> at the start of the name!), which this engine uses to play audio. Microsoft helpfully kept the legacy documentation online, informing me that it belongs to <code>Winmm.dll</code>.</p></li></ul>
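
<p>To illustrate my guess about what the transparency functions do: the textbook way of drawing one semi-opaque pixel over another is a per-channel weighted average. The sketch below is that generic formula, not Microsoft&#39;s implementation, and <code>BlendChannel</code> is a name I made up.</p>

<pre><code class="language-cpp">#include &lt;cstdint&gt;

// Blends one 8-bit colour channel of a source pixel over a destination pixel.
// alpha = 0 keeps the destination, alpha = 255 replaces it with the source.
// The +127 rounds to the nearest integer instead of always rounding down.
constexpr std::uint8_t BlendChannel(std::uint8_t src, std::uint8_t dst, std::uint8_t alpha) {
    return static_cast&lt;std::uint8_t&gt;((src * alpha + dst * (255 - alpha) + 127) / 255);
}
</code></pre>

<p>Doing this for the red, green and blue channels of every overlapping pixel gives the see-through effect. With <code>alpha</code> at 0 the destination is untouched, at 255 the source replaces it, and anything in between mixes the two.</p>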

<p>At first, I assumed I&#39;d have to get these DLLs from a copy of Windows. However, I remembered WINE has a lot of open source reimplementations of these DLLs (Windows&#39;s version of <code>.so</code> shared libraries), and sure enough <code>locate msimg32.dll</code> (note the lowercase: I wasted some time with this because Linux is case sensitive, while Windows is not!) pointed me straight to a DLL I could yoink. I added it to the list of files to compile, and the <code>msimg32</code>-related linker errors were gone. Hooray!</p>

<p>...or so I thought. I excitedly copied in <code>winmm.dll</code> and tried to compile...</p>

<pre><code class="language-toml">clang-16: error: unable to execute command: Segmentation fault (core dumped)
clang-16: error: linker command failed due to signal (use -v to see invocation)
</code></pre>

<p>Excuse me?? The <em>linker</em> is segfaulting?? To be honest, I have no idea whether this is an actual bug in the linker (going by the earlier error messages, that&#39;s MinGW&#39;s <code>ld</code> rather than LLVM&#39;s own), but it sure did stump me for a while. I thought maybe my copy of <code>winmm.dll</code> was corrupt, or WINE did something weird with it. I went as far as downloading Microsoft&#39;s version of the DLL, but was met with the same sad message. What could I possibly be doing wrong?</p>

<p><strong><em>Oh.</em></strong> I&#39;m not supposed to be copying the DLLs into here, am I? The last time I used a linker without going through CMake, I was passing libraries to it with <code>-l&lt;libname&gt;</code>. But it can&#39;t be that easy for this... can it? It&#39;d have to go to my default WINE prefix to fetch them, which sounds plain weird. Libraries come from system paths, not user-specific folders. Well, might be worth a try anyways...</p>

<pre><code class="language-ini">apm@apg ~/S/P/build (main)&gt; clang++ -mwindows -target x86_64-w64-mingw32 ../source/*.cpp -o game -lmsimg32 -lwinmm
In file included from ../source/GameWinMain.cpp:10:
../source/GameEngine.h:19:9: warning: &#39;_WIN32_WINNT&#39; macro redefined [-Wmacro-redefined]
#define _WIN32_WINNT 0x0A00                             // Windows 10
        ^
/usr/x86_64-w64-mingw32/include/_mingw.h:239:9: note: previous definition is here
#define _WIN32_WINNT 0xa00
        ^
1 warning generated.
Warning: corrupt .drectve at end of def file
Warning: corrupt .drectve at end of def file
Warning: corrupt .drectve at end of def file
apm@apg ~/S/P/build (main)&gt; ls
game.exe*
</code></pre>

<p><em>wait</em>. That built?? HUH???? There&#39;s no way it—</p>

<pre><code class="language-ini">apm@apg ~/S/P/build (main)&gt; ./game.exe
-snip-
0130:err:module:import_dll Library libgcc_s_seh-1.dll (which is needed by L&#34;Z:\\home\\apm\\School\\Prog3\\build\\game.exe&#34;) not found
0130:err:module:import_dll Library libstdc++-6.dll (which is needed by L&#34;Z:\\home\\apm\\School\\Prog3\\build\\game.exe&#34;) not found
0130:err:module:LdrInitializeThunk Importing dlls for L&#34;Z:\\home\\apm\\School\\Prog3\\build\\game.exe&#34; failed, status c0000135
</code></pre>

<p>Right. Not so fast, heh. Still, this is great news! I don&#39;t know how or why this works, but we&#39;re linking to the DLLs somehow somewhere. WINE can&#39;t find two MinGW runtime libraries the executable depends on (GCC&#39;s support library and the C++ standard library), but we can easily point it to them with <code>export WINEPATH=&#34;/usr/x86_64-w64-mingw32/bin&#34;</code>.</p>

<p>And that&#39;s it! Here&#39;s the engine in all its glory, with audio support and all! It&#39;s beautiful...</p>

<p><img src="https://allpurposem.at/blog/prog3-engine.png" alt="A screenshot of a completely black window with many lines of warnings from WINE behind it"></p>

<p>Right, there&#39;s nothing built on it yet. It&#39;s just a blank canvas. But hey, it doesn&#39;t crash!</p>

<h2 id="what-s-next" id="what-s-next">What&#39;s next?</h2>

<p>Having this run through WINE does come with a few limitations:</p>
<ul><li>All WINE apps take a long while to launch, though you can vastly improve this by running <code>wineserver --persistent</code> beforehand.</li>
<li>Usually, I attach <code>gdb</code> (the GNU debugger) to my code from my IDE, neovim. However, with this program running under WINE, I don&#39;t know how I would do that. Debugging remains an unsolved mystery (EDIT: see Addendum, I figured it out!).</li>
<li>WINE is slowly merging Wayland support, but at the moment it runs under X11, meaning I&#39;m sacrificing some performance and convenience.</li>
<li>Finally, of course, this will never have Linux support. I don&#39;t like that.</li></ul>

<p>Long-term, depending on the course workload and how complex the engine functions end up being, I think I will rewrite it in SDL. This will have the added bonus of enabling, like with my other engine ports, web support (see my Programming 2 end project <a href="https://allpurposem.at/tdyd.html">here</a> and a game jam game made in the same engine <a href="https://2foamboards.itch.io/murder">here</a>). However, I expect this to take longer than is reasonable to spend while procrastinating on other classes, so I&#39;m leaving it here. I wrote down my process while it was still fresh in my mind, so I hope this was an interesting read! As always, any and all constructive feedback is welcome; send it my way at <a href="https://blog.allpurposem.at/@/mat@mastodon.gamedev.place" class="u-url mention">@<span>mat@mastodon.gamedev.place</span></a>.</p>

<p>I am considering writing up my general porting process in a separate blog post, so perhaps expect that next!</p>

<hr>

<h2 id="addendum" id="addendum">Addendum</h2>

<p>After doing some additional research, and asking around in the very helpful <a href="https://www.winehq.org/irc">WineHQ IRC</a> room, I found a way to get debugging working! The first step is adding the <code>-g</code> flag to the <code>clang++</code> invocation, which tells clang we want it to generate debug information (namely source maps, so the debugger can show which line of code we&#39;re at). Then I simply have to run <code>winedbg --gdb game.exe</code>, and I am presented with a (nearly) full-featured gdb prompt!</p>

<p><img src="https://allpurposem.at/blog/prog3-windbg.png" alt="A screenshot of a gdb interface showing source code of a WinMain function which runs the game engine"></p>

<p>I&#39;m unsure how to hook this up to neovim (maybe I can look into the Debug Adapter Protocol for this?), but for now just having a gdb environment is awesome enough. Onto more adventures!</p>

<hr>

<p>Thanks for reading! Feel free to contact me if you have any suggestions or comments.
Find me on <a href="https://allpurposem.at/link/mastodon">Mastodon</a> and <a href="https://allpurposem.at/link/matrix">Matrix</a>.</p>

<p>You can follow the blog through:
– ActivityPub by inputting <code><a href="https://blog.allpurposem.at/@/mat@blog.allpurposem.at" class="u-url mention">@<span>mat@blog.allpurposem.at</span></a></code>
– RSS/Atom: Copy this link into your reader: <code>https://blog.allpurposem.at</code></p>

<p>My website: <a href="https://allpurposem.at">https://allpurposem.at</a></p>

]]></content:encoded>
      <guid>https://blog.allpurposem.at/adventures-cross-compiling-a-windows-game-engine</guid>
      <pubDate>Thu, 21 Sep 2023 20:20:48 +0000</pubDate>
    </item>
  </channel>
</rss>