<p><em>Rebuilding ESP32-fluid-simulation: the advection and force steps of the sim task (Part 4), posted 2024-01-20 by Kenny Peng on the "just for context" blog.</em></p>
<p>If you’ve read <a href="/2023/07/30/esp32_fluid_sim_2.html">Part 2</a> and <a href="/2023/09/22/esp32_fluid_sim_3.html">Part 3</a> already, then you’re as equipped to read this part as I can make you. You’ve already heard me mention that we should be passing in touch inputs, consisting of locations and velocities. You’ve also already heard that we’re getting out color arrays. Some mechanism should be turning the former into the latter, and it should be broadly inspired by the physics, which we had written out as partial differential equations. This post and the next post—the final ones—are about that mechanism. To be precise, this post covers everything but the pressure step, and the next will give it its own airtime.</p>
<p>With that said, if I miss anything, the references I used might be more helpful. That’s primarily the <a href="https://developer.nvidia.com/gpugems/gpugems/part-vi-beyond-triangles/chapter-38-fast-fluid-dynamics-simulation-gpu">GPU Gems chapter</a> and <a href="https://jamie-wong.com/2016/08/05/webgl-fluid-simulation/">Jamie Wong’s blog post</a>, but there’s also Stam’s <a href="https://damassets.autodesk.net/content/dam/autodesk/www/autodesk-reasearch/Publications/pdf/realtime-fluid-dynamics-for.pdf">“Realtime Fluid Dynamics for Games”</a> and <a href="https://dl.acm.org/doi/pdf/10.1145/311535.311548">“Stable Fluids”</a>.</p>
<p>Now, to tell you what I’m gonna tell you, a high-level overview is this:</p>
<ol>
<li>apply “semi-Lagrangian advection” to the velocities,</li>
<li>apply the user’s input to the velocities,</li>
<li>calculate the “divergence-free projection” of the velocities—here making use of the so-called pressure term to do it—and finally,</li>
<li>apply “semi-Lagrangian advection” to the density array with the updated velocities.</li>
</ol>
<p>The process has four parts, and each part corresponds to a part of the physics. Let’s recall the partial differential equations that we ended up with in Part 3, that is:</p>
\[\frac{\partial \rho}{\partial t} = - (\bold v \cdot \nabla) \rho\]
\[\frac{\partial \bold v}{\partial t} = - (\bold v \cdot \nabla) \bold v - \frac{1}{\rho} \nabla p + \bold f\]
\[\nabla \cdot \bold v = 0\]
<p>Besides the incompressibility constraint, $\nabla \cdot \bold v = 0$, the equations can be split into four terms. That’s one term for each part of the process. To list them in the order of their corresponding steps, there’s the advection of the velocity $-(\bold v \cdot \nabla) \bold v$, the applied force $\bold f$, the so-called pressure $- \frac{1}{\rho} \nabla p$, and the advection of the density $-(\bold v \cdot \nabla) \rho$.</p>
<p>Before we get into each term and its corresponding part of the process, there’s a key piece of context to keep in mind. We’re faced with the definitions of $\frac{\partial \rho}{\partial t}$ and $\frac{\partial \bold v}{\partial t}$ here, and they have solutions which are density and velocity fields that evolve over time. That’s not computable. Computers can’t do operations on fields—the functions of continuous space that they are—much less ones that continuously vary over time. Instead, time and space need to be “discretized”.</p>
<p>Let’s tackle the discretization of time first. Continuous time can be approximated by a <em>sequence</em> of points in time. In the simplest case, those points in time are regularly spaced apart by a single timestep $\Delta t$, in other words being the sequence $0$, $\Delta t$, $2 \Delta t$, $3 \Delta t$, and so on. That’s the structure we’ll take. (In other cases, the spacing can be <em>irregular</em>, being dynamically optimized for faster overall execution, but that’s out-of-scope.) The result is that the field at some time $t_0$ can be approximately expressed in terms of the field at the <em>previous</em> time $t_0 - \Delta t$. That is, we could calculate an <em>update</em> to the fields. You may see how this is useful for running simulations. This general idea is called “numerical integration”, the simplest case being <a href="https://en.wikipedia.org/wiki/Euler_method">Euler’s method</a>—yes, that Euler’s method, if you still remember it. (In other cases, methods like <a href="https://en.wikipedia.org/wiki/Backward_Euler_method">implicit Euler</a>, <a href="https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods">Runge-Kutta</a>, and <a href="https://en.wikipedia.org/wiki/Leapfrog_integration">leapfrog integration</a> offer better accuracy and/or stability, but that’s again out-of-scope.)</p>
<p>Now, let’s tackle the discretization of space. Continuous space can be approximated by a mesh of points, each point taking on the value of the field there. In the simplest case, that mesh is a regular grid. Remember that fields are functions of location, and so the value of a field at a single point is a single scalar or vector. Combining this with the use of a grid, we get the incredibly convenient fact that discretized fields can be expressed as an <em>array of values</em>. For every value in some array <code class="language-plaintext highlighter-rouge">f[i, j]</code>, there is a corresponding point on the grid $(x_i, y_j)$. This discretization is the one Stam went with, and for that reason, it’s the one used here.</p>
<figure>
<img src="/images/2024-1-20-figure1.png" alt="On the left, a surface plot of x squared plus y squared, and on the right a grid filled with numbers, namely the values of x squared plus y squared at integer values of x and y" />
<figcaption>Using discretization with a grid, an array of the field's values can stand for the field itself. Grids are defined by their grid lengths $\Delta x$ and $\Delta y$. In this example, $\Delta x = \Delta y = 1$ is a special case where the field is evaluated at integer values of $x$ and $y$.</figcaption>
</figure>
<div class="note-panel">
<p>Side note: it’s a fair question to ask here why <code class="language-plaintext highlighter-rouge">f[i, j]</code> doesn’t correspond to—say—$(x_j, y_i)$ instead. Why does <code class="language-plaintext highlighter-rouge">i</code> select the horizontal component and not the vertical one? This is continuing from my discussion on the correspondence in my last post. The answer is that you <em>could</em> go about it that way and then derive a different but entirely consistent discretization. In fact, I originally had it that way. However, I switched out of that to keep all the expressions looking like how they do in the literature. So, in short, it’s convention.</p>
<p>Second side note: this is not to say that the array is a <em>matrix</em>. The array is only two-dimensional because the space is two-dimensional. If the space was three-dimensional, then so would be the array. And forget about arrays if the mesh isn’t a grid! So, most matrix operations wouldn’t mean anything either. It’d be more correct to think of discretized fields as very long vectors, but we’re encroaching on a next-post matter now.</p>
</div>
<p>Anyway, a key result of discretizing space is that the differential operators can be approximated by differences (i.e. subtraction) between the values of the field at a point and its neighbors. Furthermore, using a grid makes these differences incredibly simple by turning $\frac{\partial}{\partial x}$ into the value of the right neighbor minus the value of the left neighbor and $\frac{\partial}{\partial y}$ into the top minus the bottom. The methods that use this fact are “finite difference methods”, and the pressure step is one such method, but we’ll go into more detail on that in the next post.</p>
<p>So, to sum up this “just for context” moment, to compute an (approximate) solution to the presented partial differential equations, we need two levels of discretization. First, we need to discretize time, turning it into a scheme of updating the density and velocity fields repeatedly. Then, we need to discretize space to make the update computable. All this is because computers cannot handle functions of continuous time nor functions of continuous space, let alone functions of both like an evolving field. Now, all this is quite abstract, and that’s because each part invokes the discretization of time and space <em>slightly differently</em>, and we’ll go into the details of each.</p>
<p>With all that said, in the face of our definitions of $\frac{\partial \rho}{\partial t}$ and $\frac{\partial \bold v}{\partial t}$, this generally means that the density/concentration field (which I’m currently just calling the density field out of expediency) and the velocity field become just density and velocity arrays, and we must calculate their updates. In this situation, we update the arrays <em>term by term</em>, hence why each step of the overall process corresponds to a single term. (Though, I’m not sure if the implicit assumption of independence between the terms that underlies going term by term is just an expedient approximation or our math-given right. Anyway…) Let’s go over the four parts, step by step.</p>
<p>The first step is the “semi-Lagrangian advection” of the velocities, implementing the $-(\bold v \cdot \nabla) \bold v$ term. A key highlight here: Stam’s treatment of the advection term is <em>not</em> a finite differences method, yet it still uses discretization with a grid! I’d also like to highlight a bit of how Stam arrived at this method, though the GPU Gems chapter and Wong’s blog post would more succinctly jump to the end result. Now, I can’t do justice to the entire derivation. With that said, if you ever move on to reading Stam’s “Stable Fluids”, you’d find that Stam’s formal analysis involves a technique called a “method of characteristics”. It’s got a whole proof, but I’d just say that it looks like this: at every point $\bold x$ (that’s the coordinate vector $\bold x$, if you remember from <a href="/2023/07/30/esp32_fluid_sim_2.html">Part 2</a>), there is a particle that arrived there from somewhere. Letting $\bold{p}(\bold x, t)$ be its path—where the current location is given as $\bold{p}(\bold x, t_0) = \bold x$—then $\bold{p}(\bold x, t_0 - \Delta t)$ is where the particle was in the previous time.</p>
<figure>
<img src="/images/2024-1-20-figure2.png" alt="In orange, a velocity field as a vector plot. In blue, a path of a particle that follows the velocity field. In black, a point on the path that represents the position of the particle at time t_0. In grey, a point on the path that represents where the particle was previously at time t_0 minus Delta t." />
<figcaption>Given some velocity field, the path of a particle and its locations at time $t_0$ and $t_0 - \Delta t$</figcaption>
</figure>
<p>As a result, the particle must have carried its properties along the way, and one of them is said to be momentum, in other words, velocity. Therefore, an advection update looks like the assignment of the field value at $\bold{p}(\bold x, t_0 - \Delta t)$ to the field at $\bold x$. This result directly falls out of the assumptions that Stam presents (and for which I boorishly presented a picture instead), and it can be written as the following:</p>
\[\bold{v}_\text{advect}(\bold x) = \bold{v}(\bold{p}(\bold x, t_0 - \Delta t))\]
<p>You may notice that Stam is presenting a unique time discretization here. You may also notice that it’s not computable yet because we’re missing a discretization of space. Of course, Stam presented one in “Stable Fluids” too. For starters, the calculation of $\bold{v}_\text{advect}$ can be done at just the points on the grid. From there, reading would show that Stam used a Runge-Kutta back-tracing on the velocity field to find $\bold{p}(\bold x, t_0 - \Delta t)$. I won’t get into how that works, and I won’t have to in a moment. Anyway, the found point almost certainly doesn’t coincide with a point on the grid, so Stam approximated the velocity there, $\bold{v}(\bold{p}(\bold x, t_0 - \Delta t))$, by <a href="https://en.wikipedia.org/wiki/Bilinear_interpolation#Application_in_image_processing">“bilinearly interpolating”</a> between the four closest velocity values.</p>
<!-- Give a mathematical description of linear interpolation actually? -->
<figure>
<img src="/images/2024-1-20-figure3.png" alt="Four arrows on the corners of a square on the grid, each pointed in different directions. Dashed lines connect the top two arrows and the bottom two arrows. Points dot halfway on the dashed lines. On the top point, an arrow points in the direction of the top two arrows' average. On the bottom point, an arrow points the direction of the bottom two arrows' average. Another dashed line connects the points on the dashed line. Less than half-way on the dashed line, there is a point and an arrow that points in the weighted average of the arrows on the dashed lines, with more weight given to the top arrow." />
</figure>
<p>For more information on that, see the above link to Wikipedia. It’s got a better explanation of bilinear interpolation than one I can make—diagrams included. With that said, bilinear interpolation also amounts to very little code.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="n">T</span> <span class="nf">billinear_interpolate</span><span class="p">(</span><span class="kt">float</span> <span class="n">di</span><span class="p">,</span> <span class="kt">float</span> <span class="n">dj</span><span class="p">,</span> <span class="n">T</span> <span class="n">p11</span><span class="p">,</span> <span class="n">T</span> <span class="n">p12</span><span class="p">,</span> <span class="n">T</span> <span class="n">p21</span><span class="p">,</span> <span class="n">T</span> <span class="n">p22</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">T</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">,</span> <span class="n">interpolated</span><span class="p">;</span>
<span class="n">x1</span> <span class="o">=</span> <span class="n">p11</span><span class="o">*</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">dj</span><span class="p">)</span><span class="o">+</span><span class="n">p12</span><span class="o">*</span><span class="n">dj</span><span class="p">;</span> <span class="c1">// interp between lower-left and upper-left</span>
<span class="n">x2</span> <span class="o">=</span> <span class="n">p21</span><span class="o">*</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">dj</span><span class="p">)</span><span class="o">+</span><span class="n">p22</span><span class="o">*</span><span class="n">dj</span><span class="p">;</span> <span class="c1">// interp between lower-right and upper-right</span>
<span class="n">interpolated</span> <span class="o">=</span> <span class="n">x1</span><span class="o">*</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">di</span><span class="p">)</span><span class="o">+</span><span class="n">x2</span><span class="o">*</span><span class="n">di</span><span class="p">;</span> <span class="c1">// interp between left and right</span>
<span class="k">return</span> <span class="n">interpolated</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Though, a fair question to ask here: what should we do if the backtrace sends us to the boundary of the domain, or even beyond it? This is an important question because what we should do here directly depends on what the boundary is <em>physically</em>. In our case, the boundary is a solid wall. Here, I turn to the GPU Gems article, where it’s written that the “no-slip” condition hence applies, which just means the velocity there must be zero.</p>
<p>The no-slip condition can be implemented inside the bilinear interpolation scheme.</p>
<p>For now, let’s focus on the bottom row. Below it, we can construct a phantom row that always takes the <em>negative</em> of the bottom row’s values. Therefore, any linear interpolation at the halfway point between the bottom row and the phantom row must be equal to zero. That is, the halfway <em>line</em> between them achieves the no-slip condition, thereby simulating the solid wall. From there, if the backtrace gives a position that is beyond the halfway line, it should be clamped to it. This approach with the phantom row extends to all sides of the domain.</p>
<figure>
<img src="/images/2024-1-20-figure4.png" alt="In the background, the bottom half of the image is hashed out, signifying a solid wall. Four arrows are on the corners of a square on the grid, the top two pointed in different directions but the bottom two pointing in the opposite direction of the top two. Dashed lines connect the top two arrows and the bottom two arrows. Points dot halfway on the dashed lines. On the top point, an arrow points in the direction of the top two arrows' average. On the bottom point, an arrow points in the direction of the bottom twos' average, and notably this is also in the opposite direction of the arrow on the top point. Another dashed line connects the points on the dashed lines. A point dots halfway on this new line. On this new point, a smaller point dots on top of it, signifying that the average of the two averages, which are opposites of each other, is zero." />
<figcaption>The phantom row exists <i>inside</i> the wall, and the value of the bilinear interpolation on the wall's surface must be zero</figcaption>
</figure>
<p>We also need to define the value of the phantom corner formed by a phantom row and phantom column. I didn’t see a rigorous treatment of them in my references, and I’ve seen that the corners might not matter much in practice. Still, the “no-slip” condition has a nice internal consistency that just gives us this definition. At the intersection of the halfway lines, the velocity there must also be zero. From this, we can form <em>an equation involving the value of the phantom corner</em>, and its solution is that the phantom corner should take on the value of the real corner—<em>not</em> its negative! Rather, it can be thought of as the phantom row taking the negative of the value at the end of the phantom column, which is itself a negative, and this makes a double negative.</p>
<figure>
<img src="/images/2024-1-20-figure5.png" alt="In the background, the left half and bottom half of the image is hashed out, signifying intersecting solid walls. Four arrows are on the corners of a square on the grid. The upper right arrow is pointed in some direction, the upper left and bottom right is pointed in the opposite direction, and the bottom left arrow is pointed in the same direction. Dashed lines connect the top two arrows and the bottom two arrows. Points dot halfway on the dashed lines. On each of the two points is a smaller point, signifying that the average of the top two arrows and the average of the bottom two arrows is both zero. Another dashed line connects the points on the dashed lines. A point dots halfway on this new line. On this point, a smaller point dots on top of it, signifying that the average of the two averages, which are themselves zero, is zero." />
</figure>
<p>This completes what Stam showed in “Stable Fluids”, though I pulled in the no-slip condition and its implementation from the GPU Gems article.</p>
<p>Regarding what we have so far: according to Stam, the “method of characteristics” update before discretization is “unconditionally stable” because no value in $\bold{v}_\text{advect}$ can be larger than the largest value in $\bold v$ (obviously because $\bold{v}_\text{advect}$ always <em>is</em> some value in $\bold v$), and his discretization with linear interpolation preserved the stability (because $\bold{v}_\text{advect}$ is always <em>between</em> some values in $\bold v$ or zero). This is especially important; in the past, I had written fluid simulations that didn’t have unconditional stability, and they blew up unless I took small timesteps. Getting to take large timesteps here is critical to running this sim on an ESP32.</p>
<p>However, we’re one further approximation away from the method that appears in “Realtime Fluid Dynamics for Games” (and also the GPU Gems article and Wong’s post). Quite simply, if finding the path from $\bold{p}(\bold x, t_0 - \Delta t)$ to $\bold x$ can be called a nonlinear backtrace, then it’s replaced with a <em>linear</em> backtrace. The path is approximated with a straight line that extends from $\bold x$ in the direction of the velocity there:</p>
\[\bold{v}_\text{advect}(\bold x) = \bold{v}(\bold x - \bold{v}(\bold x) \Delta t)\]
<p>or in other words, $\bold x - \bold{v}(\bold x) \Delta t$ replaces $\bold{p}(\bold x, t_0 - \Delta t)$.</p>
<figure>
<img src="/images/2024-1-20-figure6.png" alt="In orange, a velocity field as a vector plot. In blue, a straight line that approximates a path of a particle that follows the vector field and extends in the direction of the velocity. In black, a point on the path that represents the position of the particle at time t_0, specifically a particle at a point that coincides on the grid of arrows i.e. the vector plot. In grey, an approximation of the point where the particle was previously at time t_0 - Delta t." />
</figure>
<p>This expression is shown as the principal discretization in the references I’ve mentioned—and it’s not hard to take it as so—but it’s really three parts: a “method of characteristics” analysis that comprises a time discretization, a space discretization using a grid and linear interpolation, and a further approximation using a linear backtrace. With these essential components in mind, we can draw a couple of conclusions:</p>
<ol>
<li>In “Realtime Fluid Dynamics for Games”, Stam goes on to state that “the idea of tracing back and interpolating” is a kind of “semi-Lagrangian method”, and so the linear backtrace isn’t quintessential to that classification. It remains a useful approximation, though.</li>
<li>The key feature of this method is the unconditional stability that comes from the interpolation not exceeding the original values, and that’s a useful constraint to carry forward. For example, if you find yourself wasting compute on clipping values, like I once did, then something wasn’t done correctly.</li>
<li>Generally speaking, this advection method isn’t the be-all and end-all of advection methods; the field of fluid simulation is much larger than that, and most of it escapes me. Go look to other sources for those.</li>
</ol>
<p>In any case, this perspective doesn’t change the fact that the advection update fortunately manifests as only a couple of lines of C or C++. Here’s how I wrote it in ESP32-fluid-simulation.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span><span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">,</span> <span class="k">class</span> <span class="nc">VECTOR_T</span><span class="p">></span>
<span class="kt">void</span> <span class="nf">semilagrangian_advect</span><span class="p">(</span><span class="n">Field</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="o">*</span><span class="n">new_property</span><span class="p">,</span> <span class="k">const</span> <span class="n">Field</span><span class="o"><</span><span class="n">T</span><span class="o">></span> <span class="o">*</span><span class="n">property</span><span class="p">,</span> <span class="k">const</span> <span class="n">Field</span><span class="o"><</span><span class="n">VECTOR_T</span><span class="o">></span> <span class="o">*</span><span class="n">velocity</span><span class="p">,</span> <span class="kt">float</span> <span class="n">dt</span><span class="p">){</span>
<span class="kt">int</span> <span class="n">N_i</span> <span class="o">=</span> <span class="n">new_property</span><span class="o">-></span><span class="n">N_i</span><span class="p">,</span> <span class="n">N_j</span> <span class="o">=</span> <span class="n">new_property</span><span class="o">-></span><span class="n">N_j</span><span class="p">;</span>
<span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">N_i</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">){</span>
<span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">N_j</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">){</span>
<span class="n">VECTOR_T</span> <span class="n">displacement</span> <span class="o">=</span> <span class="n">dt</span><span class="o">*</span><span class="n">velocity</span><span class="o">-></span><span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">);</span>
<span class="n">VECTOR_T</span> <span class="n">source</span> <span class="o">=</span> <span class="p">{</span><span class="n">i</span><span class="o">-</span><span class="n">displacement</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">j</span><span class="o">-</span><span class="n">displacement</span><span class="p">.</span><span class="n">y</span><span class="p">};</span>
<span class="c1">// Clamp the source location within the boundaries</span>
<span class="k">if</span><span class="p">(</span><span class="n">source</span><span class="p">.</span><span class="n">x</span> <span class="o"><</span> <span class="o">-</span><span class="mf">0.5</span><span class="n">f</span><span class="p">)</span> <span class="n">source</span><span class="p">.</span><span class="n">x</span> <span class="o">=</span> <span class="o">-</span><span class="mf">0.5</span><span class="n">f</span><span class="p">;</span>
<span class="k">if</span><span class="p">(</span><span class="n">source</span><span class="p">.</span><span class="n">x</span> <span class="o">></span> <span class="n">N_i</span><span class="o">-</span><span class="mf">0.5</span><span class="n">f</span><span class="p">)</span> <span class="n">source</span><span class="p">.</span><span class="n">x</span> <span class="o">=</span> <span class="n">N_i</span><span class="o">-</span><span class="mf">0.5</span><span class="n">f</span><span class="p">;</span>
<span class="k">if</span><span class="p">(</span><span class="n">source</span><span class="p">.</span><span class="n">y</span> <span class="o"><</span> <span class="o">-</span><span class="mf">0.5</span><span class="n">f</span><span class="p">)</span> <span class="n">source</span><span class="p">.</span><span class="n">y</span> <span class="o">=</span> <span class="o">-</span><span class="mf">0.5</span><span class="n">f</span><span class="p">;</span>
<span class="k">if</span><span class="p">(</span><span class="n">source</span><span class="p">.</span><span class="n">y</span> <span class="o">></span> <span class="n">N_j</span><span class="o">-</span><span class="mf">0.5</span><span class="n">f</span><span class="p">)</span> <span class="n">source</span><span class="p">.</span><span class="n">y</span> <span class="o">=</span> <span class="n">N_j</span><span class="o">-</span><span class="mf">0.5</span><span class="n">f</span><span class="p">;</span>
<span class="c1">// Get the source value with billinear interpolation</span>
<span class="kt">int</span> <span class="n">i11</span> <span class="o">=</span> <span class="n">FLOOR</span><span class="p">(</span><span class="n">source</span><span class="p">.</span><span class="n">x</span><span class="p">),</span> <span class="n">j11</span> <span class="o">=</span> <span class="n">FLOOR</span><span class="p">(</span><span class="n">source</span><span class="p">.</span><span class="n">y</span><span class="p">),</span>
<span class="n">i12</span> <span class="o">=</span> <span class="n">i11</span><span class="p">,</span> <span class="n">j12</span> <span class="o">=</span> <span class="n">j11</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span>
<span class="n">i21</span> <span class="o">=</span> <span class="n">i11</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">j21</span> <span class="o">=</span> <span class="n">j11</span><span class="p">,</span>
<span class="n">i22</span> <span class="o">=</span> <span class="n">i11</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">j22</span> <span class="o">=</span> <span class="n">j11</span><span class="o">+</span><span class="mi">1</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">di</span> <span class="o">=</span> <span class="n">source</span><span class="p">.</span><span class="n">x</span><span class="o">-</span><span class="n">i11</span><span class="p">,</span> <span class="n">dj</span> <span class="o">=</span> <span class="n">source</span><span class="p">.</span><span class="n">y</span><span class="o">-</span><span class="n">j11</span><span class="p">;</span>
<span class="n">T</span> <span class="n">p11</span> <span class="o">=</span> <span class="n">property</span><span class="o">-></span><span class="n">index</span><span class="p">(</span><span class="n">i11</span><span class="p">,</span> <span class="n">j11</span><span class="p">),</span> <span class="n">p12</span> <span class="o">=</span> <span class="n">property</span><span class="o">-></span><span class="n">index</span><span class="p">(</span><span class="n">i12</span><span class="p">,</span> <span class="n">j12</span><span class="p">),</span>
<span class="n">p21</span> <span class="o">=</span> <span class="n">property</span><span class="o">-></span><span class="n">index</span><span class="p">(</span><span class="n">i21</span><span class="p">,</span> <span class="n">j21</span><span class="p">),</span> <span class="n">p22</span> <span class="o">=</span> <span class="n">property</span><span class="o">-></span><span class="n">index</span><span class="p">(</span><span class="n">i22</span><span class="p">,</span> <span class="n">j22</span><span class="p">);</span>
<span class="n">T</span> <span class="n">interpolated</span> <span class="o">=</span> <span class="n">billinear_interpolate</span><span class="p">(</span><span class="n">di</span><span class="p">,</span> <span class="n">dj</span><span class="p">,</span> <span class="n">p11</span><span class="p">,</span> <span class="n">p12</span><span class="p">,</span> <span class="n">p21</span><span class="p">,</span> <span class="n">p22</span><span class="p">);</span>
<span class="n">new_property</span><span class="o">-></span><span class="n">index</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span> <span class="o">=</span> <span class="n">interpolated</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">new_property</span><span class="o">-></span><span class="n">update_boundary</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>We can see the linear backtrace in the calculation of the <code class="language-plaintext highlighter-rouge">source</code> vector. The <code class="language-plaintext highlighter-rouge">source</code> vector is floating-point, but the arrays are indexed with integers. So, I used the <code class="language-plaintext highlighter-rouge">FLOOR</code> macro to find the upper-left point, and then I found the rest by adding one. I wrote <code class="language-plaintext highlighter-rouge">FLOOR</code> to calculate the <a href="https://en.wikipedia.org/wiki/Floor_and_ceiling_functions">floor function</a>, and—no—it’s not the same as <code class="language-plaintext highlighter-rouge">(int)x</code>! <code class="language-plaintext highlighter-rouge">(int)x</code> rounds toward zero, and the floor function strictly rounds down. Finally, there’s also clamping of the <code class="language-plaintext highlighter-rouge">source</code> vector and <code class="language-plaintext highlighter-rouge">new_property->update_boundary()</code> which calculates the phantom rows and columns.</p>
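<p>To make the floor-versus-cast distinction and the interpolation concrete, here is a small sketch in Python rather than the project's C++ (the corner naming follows the code above, but this <code class="language-plaintext highlighter-rouge">bilinear_interpolate</code> is my own illustrative version, not the project's):</p>

```python
import math

def bilinear_interpolate(di, dj, p11, p12, p21, p22):
    # di and dj are the fractional offsets from the upper-left corner p11;
    # p12 and p21 are taken as the neighbors in the j- and i-directions
    top = p11 * (1 - dj) + p12 * dj
    bottom = p21 * (1 - dj) + p22 * dj
    return top * (1 - di) + bottom * di

# Casting truncates toward zero; the floor function strictly rounds down.
# The two agree for positive values but differ for negative ones.
assert int(-0.5) == 0 and math.floor(-0.5) == -1
assert int(1.5) == 1 and math.floor(1.5) == 1

# At the corners, the interpolation returns the corner values exactly
assert bilinear_interpolate(0.0, 0.0, 1, 2, 3, 4) == 1
assert bilinear_interpolate(1.0, 1.0, 1, 2, 3, 4) == 4
```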
<p>It’s worth noting that, because semi-Lagrangian advection is also applied to the density later, that step can be implemented using the same code that does velocity advection. If you look at the terms $-(\bold v \cdot \nabla) \bold v$ and $-(\bold v \cdot \nabla) \rho$, you can see that the operator doesn’t change—only <em>what’s being operated on</em> does. The only difference is that the “no-slip” condition doesn’t apply to advecting density, so the phantom rows and columns should just copy instead of taking the negative.</p>
<p>Personally, I took the natural approach for C++ and wrote a function template, and that meant it could take either the density array or the velocity array. Then, I had the <code class="language-plaintext highlighter-rouge">new_property->update_boundary()</code> method either do the copy or the negative, depending on a private variable of <code class="language-plaintext highlighter-rouge">new_property</code>. You can see how that works in the <a href="https://github.com/colonelwatch/ESP32-fluid-simulation/blob/0a4906ab6106901e7790403f01d6db964ebfd569/ESP32-fluid-simulation/Field.h#L57-L94"><code class="language-plaintext highlighter-rouge">Field.h</code> file</a> of ESP32-fluid-simulation. That said, an approach that also works in C is to recognize $x$-velocity and $y$-velocity as <em>independent properties</em>. Then, they can be stored in separate, scalar arrays—say <code class="language-plaintext highlighter-rouge">u</code> and <code class="language-plaintext highlighter-rouge">v</code>—and then the same code that operates on density arrays can operate on each component. You can see how exactly that would be done in “Realtime Fluid Dynamics for Games”.</p>
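<p>The copy-versus-negate idea can be sketched like this (in Python for brevity; this is not the project's actual <code class="language-plaintext highlighter-rouge">Field.h</code> code, and the corner cells are skipped):</p>

```python
def update_boundary(field, negate):
    # Fill the phantom rows and columns of a 2-D grid: copy the nearest
    # interior cell for density, or copy its negation for a velocity
    # component under the no-slip condition. Corners omitted for brevity.
    n_rows, n_cols = len(field), len(field[0])
    s = -1 if negate else 1
    for j in range(1, n_cols - 1):
        field[0][j] = s * field[1][j]                    # top phantom row
        field[n_rows - 1][j] = s * field[n_rows - 2][j]  # bottom phantom row
    for i in range(1, n_rows - 1):
        field[i][0] = s * field[i][1]                    # left phantom column
        field[i][n_cols - 1] = s * field[i][n_cols - 2]  # right phantom column

density = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
update_boundary(density, negate=False)
assert density[0][1] == 2      # density: phantom cells copy

velocity_u = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
update_boundary(velocity_u, negate=True)
assert velocity_u[0][1] == -2  # velocity: phantom cells negate (no-slip)
```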
<p>Moving on from the semi-Lagrangian advection of velocity (and density), the second step is to apply the user’s input to the velocity array. This corresponds to the $\bold f$ term, the external forces term. This isn’t something Stam had set in stone, since what makes up the external forces really depends on the physical situation being simulated. In our case, we want someone swirling their arm in the water, and so external forces must be derived from the touch data. That’s the touch data we had the touch task generate in <a href="/2023/07/30/esp32_fluid_sim_2.html">Part 2</a>, and here’s where it comes into play.</p>
<p>Recall that a touch input consists of a position and a velocity. Let $\bold{x}_i$ and $\bold{v}_i$ be the position and velocity of the $i$-th input in the queue. Naturally, we should want to influence the velocities around $\bold{x}_i$ in the direction of $\bold{v}_i$. Under this general guidance, I <em>could</em> have gone about it in the way that was done in the GPU Gems article. That was to add a “Gaussian splat” to the velocity array, and that “splat” was formally expressed as something like this</p>
\[\bold{f}_i \, \Delta t \, e^{-\left\Vert \bold{x} - \bold{x}_i \right\Vert^2 / r^2}\]
<p>where $\bold{f}_i$ is a vector with some reasonably predetermined magnitude but a direction equal to that of $\bold{v}_i$. From the multiplication $\bold{f}_i \Delta t$, you may notice that the time discretization in play is just Euler’s method and that the space discretization in play is to just evaluate it at the points of the grid. Across all the inputs in the queue, the update would have been</p>
\[\bold{v}_\text{force}(\bold{x}) = \bold{v}(\bold{x}) + \sum_{i = 0}^n \bold{f}_i \, \Delta t \, e^{-\left\Vert \bold{x} - \bold{x}_i \right\Vert^2 / r^2}\]
<p>where $n$ is the number of items in the queue. I had two issues with it. First, I specifically wanted to capture how you can’t push the fluid faster than the speed of your arm in the water. This was especially important when someone was moving the stylus very gently. Second, evaluating the splat at every single point would’ve been expensive. My crude solution to this was to just set $\bold{v}(\bold{x}_i)$ to be <em>equal</em> to $\bold{v}_i$. In code, that turns out to merely be the following</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">touch</span> <span class="n">current_touch</span><span class="p">;</span>
<span class="k">while</span><span class="p">(</span><span class="n">xQueueReceive</span><span class="p">(</span><span class="n">touch_queue</span><span class="p">,</span> <span class="o">&</span><span class="n">current_touch</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="n">pdTRUE</span><span class="p">){</span> <span class="c1">// empty the queue</span>
<span class="n">velocity_field</span><span class="o">-></span><span class="n">index</span><span class="p">(</span><span class="n">current_touch</span><span class="p">.</span><span class="n">coords</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="n">current_touch</span><span class="p">.</span><span class="n">coords</span><span class="p">.</span><span class="n">x</span><span class="p">)</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">.</span><span class="n">x</span> <span class="o">=</span> <span class="n">current_touch</span><span class="p">.</span><span class="n">velocity</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="p">.</span><span class="n">y</span> <span class="o">=</span> <span class="n">current_touch</span><span class="p">.</span><span class="n">velocity</span><span class="p">.</span><span class="n">x</span><span class="p">};</span>
<span class="p">}</span>
<span class="n">velocity_field</span><span class="o">-></span><span class="n">update_boundary</span><span class="p">();</span> <span class="c1">// in case the dragging went near the boundary, we need to update it</span>
</code></pre></div></div>
<p>where, if you’re confused about the apparent “axes swap”, see the section in <a href="/2023/07/30/esp32_fluid_sim_2.html">Part 2</a> about the AdafruitGFX coordinate system. Formally, I can write this code as</p>
\[\bold{v}_\text{force}(\bold{x}_i) = \bold{v}_i\]
\[\bold{v}_\text{force}(\bold{x}) = \bold{v}(\bold{x}) \text{ for } \bold{x} \not= \bold{x}_i \text{ for all } i\]
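<p>For comparison, the “Gaussian splat” update I passed on could be sketched like this (a Python illustration with hypothetical grid and parameter names, not code from any of the references). Note how every touch visits every point of the grid, which is exactly the cost I wanted to avoid:</p>

```python
import math

def apply_splats(u, v, touches, dt, f_mag, r):
    # Add a Gaussian "splat" of force around each touch: every grid point
    # receives a contribution that decays with its distance from the touch.
    # f_mag is the predetermined force magnitude; each touch carries a
    # position (ti, tj) and a velocity (vi, vj) giving the direction.
    n_rows, n_cols = len(u), len(u[0])
    for (ti, tj), (vi, vj) in touches:
        speed = math.hypot(vi, vj)
        if speed == 0:
            continue
        fi, fj = f_mag * vi / speed, f_mag * vj / speed
        for i in range(n_rows):
            for j in range(n_cols):
                w = math.exp(-((i - ti) ** 2 + (j - tj) ** 2) / r ** 2)
                u[i][j] += fi * dt * w
                v[i][j] += fj * dt * w

u = [[0.0] * 3 for _ in range(3)]
v = [[0.0] * 3 for _ in range(3)]
apply_splats(u, v, [((1, 1), (1.0, 0.0))], dt=1.0, f_mag=1.0, r=1.0)
assert abs(u[1][1] - 1.0) < 1e-9  # strongest right at the touch
assert 0 < u[0][1] < u[1][1]      # decaying away from it
```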
<p>The third step is the pressure step, corresponding to the $- \frac{1}{\rho} \nabla p$ term. Out of all the terms in the definition of $\frac{\partial \bold v}{\partial t}$, it must be calculated <em>last</em>, capping off the velocity update before we can proceed to the density update. I already discussed this in <a href="/2023/07/30/esp32_fluid_sim_2.html">Part 2</a>, but in short, the so-called pressure <em>does not represent a real process</em>. Rather, it is a correction term that eliminates divergence in the velocity field. This ensures the incompressibility constraint, $\nabla \cdot \bold v = 0$. (Technically, the specific formulation that Stam presents doesn’t eliminate it entirely, but it does eliminate most of it. We can get into that in the next part.) Since it’s not a real process, no time discretization is in play. Rather, the updated velocity field is straight-up not valid until the correction is calculated.</p>
<p>It would be more correct to state that Stam’s fluid simulation follows the modified definition that he presents in “Stable Fluids”, that is</p>
\[\frac{\partial \bold v}{\partial t} = \mathbb{P} \big( - (\bold v \cdot \nabla) \bold v + \nu \nabla^2 \bold v + \bold f \big)\]
<p>where $\mathbb{P}$ is a linear projection onto the space of velocity fields with zero divergence. This definition clearly shows that $\mathbb{P}$ must be calculated last, though it hides the fact that calculating it does involve a gradient. Anyway, applying the reductions that we’ve been running with so far, that would just be</p>
\[\frac{\partial \bold v}{\partial t} = \mathbb{P} \big( - (\bold v \cdot \nabla) \bold v + \bold f \big)\]
<p>where we’ve again set $\nu$ to zero.</p>
<p>On the matter of actually calculating $\mathbb{P}$, there’s so much to say in the next part. I’ll provide the code then as well.</p>
<p>That just leaves the fourth and final step, the semi-Lagrangian advection of the density, corresponding to the term $-(\bold v \cdot \nabla) \rho$. Well, we’ve made it the only term in the definition of $\frac{\partial \rho}{\partial t}$, and we’ve already implemented it. There are no more obstacles here. The only thing I’d mention is that extending the fluid sim to full color is quite trivial. Instead of advecting a single density field, we can advect <em>three</em> density fields—one for red dye, one for blue dye, and one for green dye.</p>
<p>That fills most of the outline, implementing every part of the reduced Navier-Stokes equations except for the pressure step. That’s the applied force and the semi-Lagrangian advection of the velocity and density. There, we paid special attention to the derivation and the no-slip boundary condition, since that comes from the physical situation being simulated. We also went a bit into the general idea of discretizing time (i.e. numerical integration) and discretizing space in order to give context. That’s everything I know about those steps that I think could help their implementation. In the next and final post, we’ll go over what, exactly, the pressure step is, including the relevant linear algebra. Stay tuned!</p>Kenny Pengknypng44@gmail.comIf you’ve read Part 2 and Part 3 already, then you’re as equipped to read this part as I can make you. You’ve already heard me mention that we should be passing in touch inputs, consisting of locations of velocities. You’ve also already heard that we’re getting out color arrays. Some mechanism should be turning the former into the latter, and it should be broadly inspired by the physics, which we had written out as partial differential equations. This post and the next post—the final ones—are about that mechanism. To be precise, this post covers everything but the pressure step, and the next will give it its own airtime.revRSS: The basic infrastructure behind finding reverse split press releases and trading on them2023-11-04T00:00:00+00:002023-11-04T00:00:00+00:00http://kenny-peng.com/2023/11/04/revrss_infrastructure<p><em>Note: Though this article mentions the idea of trading on reverse splits, the idea is given not for any compensation and not as personal financial advice for the reader’s specific financial situation.</em></p>
<p>A couple of years ago, I used to be subscribed to a mailing list called “Reverse Split Arbitrage”, and I remember being surprised that the trading tips that landed in my mailbox did make me a bit of money. The central idea of it was based on a kind of stock market technicality.</p>
<p>When a company executes a “reverse split”, it takes every X shares and merges them into a single share, thereby raising the price because the value of the company is divided among fewer shares. “X” is a number that comes from the announced ratio “1-for-X”. (This language is similar to the “X-for-1” ratio of stock splits, or in other words <em>forward</em> splits, though in that case the value is divided among <em>more</em> shares to <em>lower</em> the price.) Reverse splits typically happen because the price has fallen under $1, the minimum price set by the NYSE and Nasdaq to stay listed.</p>
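<p>As a quick numeric illustration (with made-up numbers), a 1-for-10 reverse split turns 100 shares at $0.50 into 10 shares at $5.00, and the total value of the position is untouched:</p>

```python
def reverse_split(shares, price, x):
    # A 1-for-X reverse split: every X shares merge into one, and the
    # price rises by the same factor X.
    return shares / x, price * x

shares, price = reverse_split(100, 0.50, 10)
assert shares == 10 and price == 5.0
assert shares * price == 100 * 0.50  # position value is conserved
```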
<p>But here’s the big-money question: what if an investor has less than X shares left over? Under the given ratio, that would have to become a so-called “fractional share”. Companies typically take one of four approaches to this fraction:</p>
<ol>
<li>pay cash for this fraction,</li>
<li>round it to zero or one, whichever is nearer,</li>
<li>round it down to zero unconditionally, but most commonly,</li>
<li>round it up to one unconditionally.</li>
</ol>
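<p>The four approaches can be written as a small function (my own illustration; the policy names are made up). The round-up case is the interesting one: a position of fewer than X shares still becomes one whole share.</p>

```python
def post_split_shares(shares, x, policy):
    # Whole shares held after a 1-for-X reverse split, under the four
    # fractional-share policies listed above. "cash" pays out the
    # fraction, so it also drops it from the share count.
    whole, fraction = divmod(shares, x)
    if fraction == 0:
        return whole
    if policy == "cash":     # 1. pay cash for the fraction
        return whole
    if policy == "nearest":  # 2. round to zero or one, whichever is nearer
        return whole + (1 if fraction * 2 >= x else 0)
    if policy == "down":     # 3. round down unconditionally
        return whole
    if policy == "up":       # 4. round up unconditionally
        return whole + 1
    raise ValueError(policy)

# e.g. 7 shares in a 1-for-10 split: a round-up turns 0.7 shares into 1
assert post_split_shares(7, 10, "up") == 1
assert post_split_shares(7, 10, "down") == 0
```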
<p>Which option the company takes can almost always be found in the press release or SEC filing that is published shortly before the reverse split happens. These emails I had gotten from Reverse Split Arbitrage would alert me to these reverse splits that would round up, but after some time, I wasn’t getting them anymore.</p>
<p>Still, it turns out that plenty of reverse splits are still happening, and many of them are still rounding up. I wanted to get back into trading on them, but I didn’t have the mailing list to help me any more. I had to rig up something myself, of course! This was also something I wanted to share with others—for zero compensation especially. For now, I’m doing a soft launch of this at <a href="https://www.revrss.com">www.revrss.com</a>, and it’s in a limited form that focuses only on press releases (not SEC filings) and requires the reader to read them themselves. The intention is to make it more public after overcoming these limitations, but I’ve been able to use it myself just fine.</p>
<p>With that said, even getting this far required quite a bit of infrastructure! If I were to describe what I’m doing in one phrase, referring to the technologies involved just by their name, it would be “a WebSub-enabled RSS news aggregator, served via nginx over Cloudflare Tunnels”.</p>
<figure>
<img src="/images/2023-11-4-figure1.png" alt="diagram showing infrastructure of revRSS project as of Nov 4th, 2023, consisting of a primary server interacting with newswires and using Cloudflares Tunnels as its public face, while at the same time a user can be notified by their online RSS reader via a WebSub broker. primary server circled in red to show that it is within my home network" />
<figcaption>The infrastructure of revRSS as of Nov 4th, 2023, with the primary server being in my home network. It's worth noting here that, if a powerful enough server was rented from a cloud provider, the primary server, the WebSub broker, and Cloudflare Tunnels could be replaced by that single server.</figcaption>
</figure>
<p>And now, I’ll say it again in longform.</p>
<p>Press releases about reverse splits (and whether they’ll round up) happen to be distributed by one of four newswires, <a href="https://www.businesswire.com">Business Wire</a>, <a href="https://www.prnewswire.com">PR Newswire</a>, <a href="https://www.accesswire.com">ACCESSWIRE</a>, and <a href="https://www.globenewswire.com">Globe Newswire</a>, though they may also be sent out via smaller newswires like <a href="https://www.newsfilecorp.com">Newsfile Corp</a>, <a href="https://www.einpresswire.com">EIN Presswire</a>, or <a href="https://www.dowjones.com/professional/newswires/">Dow Jones Newswire</a>. The first four are used dramatically more than the latter three.</p>
<p>Though newswires usually forward their news directly to their journalist clients, they also share it directly with the public via one channel or another. If a newswire made their news available in <a href="https://en.wikipedia.org/wiki/RSS">RSS</a>, the standard format for distributing news from machine to machine, then I wrote a program to interpret it. (In the case of PR Newswire, I managed to talk to someone there about getting an RSS feed!) If they instead made it available via their website, then I had to resort to “web scraping”, in other words parsing the HTML code meant for web browsers.</p>
<p>Naturally, if a newswire offered RSS, I went for it over going to their website. In either case, though, I could use a Python library for parsing XML and HTML data, called <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/">Beautiful Soup</a>, to do the heavy lifting. In general, both XML and HTML organize data into “tags”—“tags” being containers of many blocks of text, <em>sub</em>-tags, or even both at the same time. In the case of RSS, which is a kind of XML, the way a news article and its associated metadata is encoded with these tags is exactly defined in the <a href="https://www.rssboard.org/rss-specification/">RSS specification</a>, and so the specification was a good reference in instructing Beautiful Soup to find the tags associated with said article. In the case of HTML though, an article ends up being encoded in ways varying from website to website, and so I ended up needing to pick through each website by hand to find the tags to give to Beautiful Soup.</p>
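<p>A self-contained sketch of the RSS side (using the standard library’s ElementTree instead of Beautiful Soup, just so the example carries no dependencies; the feed content is made up):</p>

```python
import xml.etree.ElementTree as ET

RSS_SAMPLE = """<rss version="2.0"><channel>
  <title>Example Newswire</title>
  <item>
    <title>Acme Corp Announces 1-for-20 Reverse Stock Split</title>
    <link>https://example.com/acme-reverse-split</link>
    <pubDate>Sat, 04 Nov 2023 12:00:00 GMT</pubDate>
    <description>Fractional shares will be rounded up...</description>
  </item>
</channel></rss>"""

def parse_items(rss_text):
    # Pull each <item>'s title, link, and pubDate, following the RSS 2.0
    # spec's layout of channel/item tags. (The project itself uses
    # Beautiful Soup; ElementTree is shown here so the sketch is
    # standard-library only.)
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "pubDate": item.findtext("pubDate"),
        }
        for item in root.iter("item")
    ]

items = parse_items(RSS_SAMPLE)
assert items[0]["title"].startswith("Acme Corp")
```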
<p>Still, with effort, I could have myself a list of articles from all of the relevant newswires—each article with its title, link, published date, and excerpt. I just needed to filter it for press releases about reverse splits and then sort by latest. With that said, a dire wish of mine is to achieve a better filter here. Because the language that declares a reverse split with round-ups varies, identifying such without making many false positives or false negatives would require good natural language processing. For now, I’m leaning toward more false positives, selecting only for reverse splits but not whether they’ll round up. That can be done with a simple keyword search. With this (admittedly faultily) filtered list, then sorted by latest, I could even begin to report something to the public.</p>
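<p>The simple keyword search might look like this (a sketch; the keyword list and article fields are hypothetical):</p>

```python
KEYWORDS = ("reverse split", "reverse stock split")

def is_candidate(article):
    # Crude keyword filter: flag any article whose title or excerpt
    # mentions a reverse split, accepting false positives (it cannot
    # tell whether fractional shares will round up).
    text = (article["title"] + " " + article.get("excerpt", "")).lower()
    return any(keyword in text for keyword in KEYWORDS)

assert is_candidate({"title": "Acme Corp Announces 1-for-20 Reverse Stock Split"})
assert not is_candidate({"title": "Acme Corp Reports Q3 Earnings"})
```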
<p>My choice for how I did this was RSS again, not a full website nor a mailing list. With that said, serving RSS is nothing like running a mailing list and much like serving a website. To be exact here, the relationship is identical to serving a <a href="https://en.wikipedia.org/wiki/Static_web_page">“static website”</a>, or in other words a website built on a set of fixed assets, including HTML, CSS, images, or even Javascript but <em>not</em> including responses from a database. As mentioned here, RSS is just a format, and so an RSS service is just a <em>single file</em>, served as if it were a logo on some corporate website. Consequently, I could construct this file using Beautiful Soup and then serve it using a configuration of the <a href="https://en.wikipedia.org/wiki/Nginx">nginx</a> program, which was designed for such static assets.</p>
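<p>For a sense of how little serving that file takes, a minimal nginx configuration might look like the following (hypothetical paths and domain; the <code class="language-plaintext highlighter-rouge">types</code> and <code class="language-plaintext highlighter-rouge">default_type</code> lines just label the feed with its proper MIME type):</p>

```nginx
server {
    listen 80;
    server_name www.revrss.com;
    root /var/www/revrss;  # static assets: index.html, css/, feed.xml, ...

    location = /feed.xml {
        # served like any other static file, but with the RSS MIME type
        types { }
        default_type application/rss+xml;
    }
}
```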
<p>Speaking of static websites, I configured nginx to also serve the revRSS website (just a for-your-information site) which was a static website. For that, “static site generator” programs like <a href="https://jekyllrb.com">Jekyll</a> can autogenerate all the assets of a static website from plaintext files and configuration files (which can come from publicly available templates like <a href="https://beautifuljekyll.com">Beautiful Jekyll</a>). I think detailing how using Jekyll went for me is outside the scope of this article, but I mention this because I want to highlight how serving the site and serving the reverse splits feed are completely equivalent. In fact, the <em>same</em> nginx configuration serves both.</p>
<p>Anyway, a key disadvantage of RSS compared to mailing lists is that notifications are impossible because there is no list of subscribers to contact. This wouldn’t be a problem if—say—one made a habit of checking the feed every morning, but I don’t think that should be necessary. So, since I wanted to serve RSS but also deliver notifications, what was I to do? The answer was to use another program on the side that follows the <a href="https://en.wikipedia.org/wiki/WebSub">WebSub (formerly PubSubHubbub) protocol</a>. This other program maintains the list, and some apps like <a href="https://en.wikipedia.org/wiki/NewsBlur">NewsBlur</a> are <a href="https://blog.newsblur.com/2012/04/02/building-real-time-feed-updates-for-newsblur/">capable of joining that list</a>. It could be run on the same server that runs nginx, but I used a public “broker”. In particular, I used the one run by Google at <a href="https://pubsubhubbub.appspot.com/">pubsubhubbub.appspot.com</a>.</p>
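<p>Mechanically, WebSub only asks for two extra links in the feed, plus a ping to the hub whenever the feed changes (the URLs here are illustrative, aside from Google’s public hub):</p>

```xml
<!-- inside the feed's <channel>: advertise the hub and the feed's own URL,
     so subscribers know where to register for pushed updates -->
<atom:link rel="hub" href="https://pubsubhubbub.appspot.com/"
           xmlns:atom="http://www.w3.org/2005/Atom"/>
<atom:link rel="self" href="https://www.revrss.com/feed.xml"
           xmlns:atom="http://www.w3.org/2005/Atom"/>
```

After updating the feed file, the publisher notifies the hub with an HTTP POST carrying <code class="language-plaintext highlighter-rouge">hub.mode=publish</code> and <code class="language-plaintext highlighter-rouge">hub.url</code> set to the feed’s URL; the hub then re-fetches the feed and pushes it to subscribers.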
<p>Finally, I wanted to host everything on a powerful server at home, but my internet provider doesn’t allow me to open the standard ports for HTTP and HTTPS, 80 and 443. By “opening” ports, I mean accepting incoming connections there. Though opening other ports and manually punching in the port numbers may technically work for me, that wouldn’t work for the public. One solution for this I’ve done before is a <a href="https://unix.stackexchange.com/questions/46235/how-does-reverse-ssh-tunneling-work">reverse SSH tunnel</a>, a type of SSH connection that one server makes to another server in order for the latter to act as a face of the former, accepting connections at its <em>own</em> ports <em>for</em> the former. In this scenario, a connection would be <em>issued</em> by my server (not <em>accepted</em>) and from there traffic is forwarded back, and this would get around my internet provider’s restriction. To do this, the other server could just be rented from a cloud provider like Google Cloud—possibly while staying within the limits of their free tier.</p>
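<p>For reference, such a tunnel is a one-liner (hypothetical hosts and ports; binding the remote end to port 80 additionally requires root privileges and <code class="language-plaintext highlighter-rouge">GatewayPorts</code> enabled in the cloud server’s sshd configuration):</p>

```shell
# from the home server: let cloud-host accept connections on its port 80
# and forward them back to the local nginx listening on port 8080
ssh -N -R 80:localhost:8080 user@cloud-host.example.com
```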
<p>However, I went for something similar using Cloudflare Tunnels instead. The tradeoff: I don’t have to manage two servers, but I lose control of the other end to Cloudflare. With that said, I planned to proxy my traffic through them anyway because I wanted to use their content delivery network to serve the heaviest parts of the revRSS site for me, including fonts and images. To me, their Tunnels feature was icing on the cake.</p>
<p>So, that’s how I’m getting and trading on the latest press releases about potential reverse split round-ups as they happen. With this infrastructure, it’s also how—technically—you can too. It’s a basic infrastructure that still needs to become more complex before I can really count on it, and yet it already invokes a wide range of concepts. From file formats to servers to tunnels, each has a different role in transporting the news of a reverse split from the company to my phone.</p>
<p>I could end up adding more to this pipeline, and if I write a piece on it, you can click to it here.</p>Kenny Pengknypng44@gmail.comA couple years ago, I used to be subscribed to a mailing list called "Reverse Split Arbitrage", and I remember being surprised that the trading tips that landed in my inbox really did make me a bit of money. The central idea of it was based on a stock market technicality.Rebuilding ESP32-fluid-simulation: an outline of the sim task (Part 3)2023-09-22T00:00:00+00:002023-09-22T00:00:00+00:00http://kenny-peng.com/2023/09/22/esp32_fluid_sim_3<p>Okay, I wondered if I should have led this series with the physics, but I think saving it for last was the right call. As I was writing about <a href="/2023/07/21/esp32_fluid_sim_1.html">the FreeRTOS tasks involved and their communication</a> and the <a href="/2023/07/30/esp32_fluid_sim_2.html">touch and render tasks specifically</a>, I started to think about how I could write about this with the detail and approachability it deserves.</p>
<p>To start, I’ll be honest: I’m not presenting anything groundbreaking here. In 1999, Jos Stam introduced a simple and fast form of fluid simulation in his conference paper called “Stable Fluids”, and in 2003, he published a straightforward version of it in “Realtime Fluid Dynamics for Games”. Many people have written guides to “fluid simulation” that have been specifically based on these two papers since. Two key examples to me: <a href="https://developer.nvidia.com/gpugems/gpugems/part-vi-beyond-triangles/chapter-38-fast-fluid-dynamics-simulation-gpu">a chapter of NVIDIA’s <em>GPU Gems</em></a> and a <a href="https://jamie-wong.com/2016/08/05/webgl-fluid-simulation/">blog post by Jamie Wong</a>. To be pedantic, I find now that the current field of fluid simulation is much, <em>much</em> larger than what any of these references imply. Still, these were the guides I followed when I first wrote ESP32-fluid-simulation. In both was Stam’s technique, and between everything I just linked to, you could probably write your own implementation of it eventually.</p>
<figure>
<div style="max-height: 400px; display: block; margin: auto; aspect-ratio: 4/3;"><iframe height="100%" width="100%" src="https://www.youtube-nocookie.com/embed/t-erFRTMIWA" title="Jos Stam's 1999 Interactive Fluid Dynamics Demo" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen=""></iframe></div>
<figcaption>A tape about Stam's technique from circa 1999, available on Youtube since 2011!</figcaption>
</figure>
<p>That said, between then and when I rewrote it recently, I picked up some background knowledge that proved incredibly useful. I’m not saying here that I became an expert on fluid sims—I can’t advise you on designing a new technique from scratch. Rather, if I had known it back then, I wouldn’t have made nearly as many wrong turns. It turns out that implementing Stam’s technique gets easier when you understand the <em>whats</em> of the operations he would have you write, if not the whys.</p>
<div class="info-panel">
<h4 id="review-vector-fields-and-scalar-fields">Review: Vector fields and scalar fields</h4>
<p>If you recall any vector calculus, then <a href="https://en.wikipedia.org/wiki/Vector_field">“vector fields”</a> and <a href="https://en.wikipedia.org/wiki/Scalar_field">“scalar fields”</a> may be an obvious concept to you already, but if not, we can start with the fact that they’re a part of the foundation of fluid dynamics. For now, I’ll review what they are. However, I highly recommend picking up a total understanding of vector calculus somewhere else before looking at any fluid sim techniques besides Stam’s. In fact, perhaps fluid simulations make just the right concrete example to keep in mind while learning!</p>
<p>Anyway, let’s sketch out what vector fields and scalar fields are, and hopefully, the picture is filled in as you keep reading this article. The ordinary idea of a mathematical function is a thing that outputs a number when given a number input. Vector fields and scalar fields are functions—though of different kinds.</p>
<p>Consider a flat, two-dimensional space, and then consider a function that outputs a number when given a <em>location in this space</em> as the input. Furthermore, this location can be expressed as a pair of numbers if we used a coordinate system (three if we worked in three dimensions, and we could, but we won’t here). A concrete example of this would be a function of a location that gives the temperature there—the location being written as the latitude and longitude on the map. It’s 48 degrees Fahrenheit in Arkhangelsk and 84 degrees in Singapore. Considering that Arkhangelsk can be found at 64.5°N, 40.5°E and Singapore at 1.2°N, 103.8°E, we can define a temperature function that gives $T(64.5, 40.5) = 48$ and $T(1.2, 103.8) = 84$. We can call it a temperature field, but more generally, it’s a “scalar field”. It’s a scalar-valued function of the location, possibly written as $f(x, y)$.</p>
<figure>
<img src="/images/2023-9-22-figure1.png" alt="weather forecast graphic, showing temperature across the United States" />
<figcaption>A weather forecast graphic, showing temperature across the United States. This can be thought of as a temperature field. Source: <a href="https://graphical.weather.gov/sectors/conus.php">NOAA</a></figcaption>
</figure>
<p>Now on the other hand, a “vector-valued function” is any function that outputs a vector, and a vector-valued function of a location is a “vector field”! A concrete example of this would be a wind velocity field. For any location, such a field would give how fast the wind there blows and the direction in which it goes, and it would be given as the magnitude and direction of a single vector. Like for scalar fields, we could possibly write them as $\bold{f}(x, y)$, the boldface font meaning that we have a vector output.</p>
<figure>
<img src="/images/2023-9-22-figure2.png" alt="weather forecast graphic, showing wind speed and direction in the Southeastern US and in particular of Tropical Storm Ophelia" />
<figcaption>A weather forecast graphic, showing wind speed and direction in the Southeastern US during Tropical Storm Ophelia, using color for magnitude and arrows for direction. Vector fields are typically shown using arrows of varying lengths. Source: <a href="https://graphical.weather.gov/sectors/conus.php">NOAA</a></figcaption>
</figure>
<p>That said, though functions of location they are, written like one they are really not. Rather, the dependence on location is assumed, and then $f(x, y)$ and $\bold{f}(x, y)$ are just written as $f$ and $\bold{f}$ instead. Another thing to keep in mind: coordinates are just a pair of numbers, but we can also think of them as a single coordinate <em>vector</em>. Though we may never actually draw that arrow, the interchangeability is relevant. For example, I briefly talked in the previous post about the similarity between a velocity vector and a <em>change</em> in the coordinate vector over a finite period of time.</p>
</div>
<p><!-- div class="info-panel" --></p>
<p>First, it would be helpful to picture what we want to simulate. The input and output are the <em>touch</em> and <em>screen</em> of a touchscreen, and the user dragging around the stylus on it should stir around the fluid on display. The physical scenario this should match is if someone stuck their arm into a bed of dyed water and then stirred it around. In such a scenario, the color would be determined by the concentration of the dye, but the dye itself moves! To capture this physical behavior with a computer simulation, we can start by describing it with a mathematical model.</p>
<p>In Stam’s “Realtime Fluid Dynamics for Games”, he wanted to capture smoke rising from a cigarette and being blown around by air currents. To do so, he ascribed a velocity field (a vector field) and a smoke density field (a scalar field) to the air. But that was it for his model: everything else about it he threw out. In the same way, we can reduce the bed of water to just a velocity field and a dye concentration field.</p>
<figure>
<img src="/images/2023-9-22-figure3.gif" alt="" />
<figcaption></figcaption>
</figure>
<p>Now, what was the relationship between these two fields? Stam wrote that the density field undergoes <a href="https://en.wikipedia.org/wiki/Advection">“advection”</a> by the velocity field. That’s the process of fluid carrying around (smoke particles, dye, or anything in general), and this happens everywhere. He also wrote that it undergoes <a href="https://en.wikipedia.org/wiki/Diffusion_equation">“diffusion”</a>, which is the spontaneous spreading of a thing in a fluid from areas of higher density <em>without</em> being carried by the velocity. He provided an “advection-diffusion” equation that captures both, and it’s a “partial differential equation”.</p>
<!-- TODO: add animations of convection and diffusion, one for each and then one jointly, using the sim -->
<div class="info-panel">
<h4 id="review-partial-derivatives-and-the-differential-operators">Review: Partial derivatives and the differential operators</h4>
<p>Just like how we can take the derivative of your ordinary function, we can take a differential operator of a field. However, these differential operators don’t just mean the slope of a tangent line, but rather they each represent a different way the field changes over a change in location. The critical ones to understand here are the “divergence” and the “gradient”, but the “Laplacian” is also worth touching on. (A formal vector calculus course would also cover the “curl”, the identities, and the associated theorems.)</p>
<p>First of all, differential operators are constructed from the “partial derivatives”. These are the derivatives you already know, but we strictly take them with respect to <em>one</em> of the components while holding the others constant. The reason? Formally, your ordinary derivative is the limit of the change in your ordinary function $f(x)$ over the change in the input $x$ as that change in the input approaches zero.</p>
\[\frac{df}{dx} = \lim_{\Delta x \to 0} \frac{f(x+\Delta x) - f(x)}{\Delta x}\]
<p>However, in the case of fields, by doing this to only <em>one</em> of the components of the location coordinate, the partial derivative just formally means the change in the field $f(x, y)$ over the change in <em>that component</em>. Keeping the other components constant is naturally a part of measuring this change. In two dimensions, fields can have a partial derivative with respect to $x$ or one with respect to $y$. Then, $y$ or $x$ respectively is held constant.</p>
\[\frac{\partial f}{\partial x} = \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}\]
\[\frac{\partial f}{\partial y} = \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x, y)}{\Delta y}\]
<p>A good example would actually be to perform a derivation. Given the function $f(x, y) = x^2 + 2xy + y^2$ as a field, let’s find the partial derivative with respect to $x$.</p>
\[\begin{align*} \frac{\partial}{\partial x}(x^2 + 2xy + y^2) & = 2x + 2y + 0 \\ & \boxed{ = 2x + 2y } \end{align*}\]
<p>Notice that—because $y$ is taken as a constant—$y^2$ drops out and $2xy$ is treated as an $x$-term with a coefficient of $2y$. And finally, to expand on this a bit with a geometric picture, we know that the derivative is the slope of the tangent line, but to be exact, it’s the line tangent to the curve of $f(x)$ at the point $(x, f(x))$. The partial derivative is still the slope of <em>a</em> line that <em>is tangent</em> to the surface of the field at the point $(x, y, f(x, y))$, but it is also strictly running in the $x$-direction for $\partial/\partial x$ or in the $y$-direction for $\partial/\partial y$. Technically, infinitely many lines satisfy the conditions of being tangent to the surface at that point, and these lines form a tangent plane, but we only concern ourselves with the two.</p>
<figure>
<img src="/images/2023-9-22-figure4.png" alt="Diagram of the two lines tangent to the field with slopes equal to the partial derivatives" />
<figcaption>The surface plot of another scalar field $f(x, y) = x^2 + y^2$, which is like the plot of the curve of your ordinary function, along with the two lines tangent to it that have slopes equal to the partial derivatives.</figcaption>
</figure>
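<p>If you’d like to check a derivation like that numerically, the limit definition itself is enough: nudge only $x$ while holding $y$ fixed. Here’s a minimal sketch in plain C++ (illustration only, not code from the project) that estimates $\partial f / \partial x$ with a central difference:</p>

```cpp
#include <cassert>
#include <cmath>

// The example scalar field f(x, y) = x^2 + 2xy + y^2
double f(double x, double y) { return x * x + 2 * x * y + y * y; }

// Central-difference estimate of df/dx: nudge only x, hold y constant
double partial_x(double (*field)(double, double), double x, double y,
                 double h = 1e-5) {
    return (field(x + h, y) - field(x - h, y)) / (2 * h);
}
```

<p>Evaluating at $(1, 2)$ returns about $6$, matching $2x + 2y$.</p>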
<p>That aside, taking a partial derivative with respect to some single component is not as useful as taking <em>every</em> partial derivative with respect to <em>each</em> component. This set is written like a vector of sorts (though a vector it is not) called the “del operator”. For two dimensions, that is</p>
\[\nabla \equiv \begin{bmatrix} \displaystyle \frac{\partial}{\partial x} \\[1em] \displaystyle \frac{\partial}{\partial y} \end{bmatrix}\]
<p>The constructions out of this set that we call the differential operators can absolutely be written without the del operator, but you’ll usually see them written with it.</p>
<p>The <a href="https://en.wikipedia.org/wiki/Gradient">“gradient”</a> is the simplest construction: line up each and every partial derivative of a scalar field into a vector. Keep in mind here that the partial derivative of a field (like $x^2+2xy+y^2$) is itself another function of the location (like $2x + 2y$), so a vector composed of these will itself vary by the location. The gradient of a scalar field is a vector field! We’ll get to exactly how the gradient applies to our fluid sim later, but one useful fact to picture now is that the gradient can be shown to always point in the direction of steepest ascent in the scalar field. Walking in the direction of the gradient of a temperature field, for example, would warm you up the fastest!</p>
\[\nabla f = \begin{bmatrix} \displaystyle \frac{\partial f}{\partial x} \\[1em] \displaystyle \frac{\partial f}{\partial y} \end{bmatrix}\]
<p>Using the del operator, it looks kind of like scalar multiplication from the right.</p>
<figure>
<img src="/images/2023-9-22-figure5.png" alt="Surface plot of a scalar field and the plot of its gradient" />
<figcaption>In orange, the surface plot of a scalar field. Beneath it and in blue, the plot of the gradient, showing that it points in the direction of steepest ascent. Source: <a href="https://commons.wikimedia.org/wiki/File:3d-gradient-cos.svg">MartinThoma via Wikimedia Commons</a>, <a href="https://creativecommons.org/publicdomain/zero/1.0/">CC0 1.0</a></figcaption>
</figure>
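<p>To make the gradient concrete, we can stack two of those central differences into a vector. This is a plain-C++ sketch for illustration (not project code), using the bowl-shaped field $f(x, y) = x^2 + y^2$ from the figure:</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>

// The bowl-shaped scalar field from the figure: f(x, y) = x^2 + y^2
double f(double x, double y) { return x * x + y * y; }

// Numerical gradient: one central difference per component, stacked into a vector
std::array<double, 2> gradient(double (*field)(double, double), double x,
                               double y, double h = 1e-5) {
    return {{(field(x + h, y) - field(x - h, y)) / (2 * h),
             (field(x, y + h) - field(x, y - h)) / (2 * h)}};
}
```

<p>At $(1, 2)$ this returns roughly $(2, 4)$, which indeed points straight away from the origin, i.e. up the steepest side of the bowl.</p>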
<p>And remember, the gradient is just one shockingly meaningful operator that we can construct from the partial derivatives, which were just slopes of tangent lines! The <a href="https://en.wikipedia.org/wiki/Divergence">“divergence”</a> is a slightly more complicated construction: if we write out a vector field using its components</p>
\[\bold{f}(x, y) = \begin{bmatrix} f_x(x, y) \\ f_y(x, y) \end{bmatrix}\]
<p>then we can take the partial derivative of each component with respect to its <em>associated component</em> of the coordinates (that’s $f_x$ to $\partial/\partial x$ and $f_y$ to $\partial/\partial y$) and then add them up. We should be able to recognize here that the divergence of a vector field is a scalar field. And what is the meaning of this scalar field? For now, it can be imagined as the degree to which the vectors surrounding an input location are pointing away from it, though Gauss’s theorem expresses this more formally (a bit out-of-scope for now).</p>
\[\nabla \cdot \bold{f} = \frac{\partial f_x}{\partial x} + \frac{\partial f_y}{\partial y}\]
<p>Using the del operator, it looks kind of like a dot product.</p>
<figure>
<img src="/images/2023-9-22-figure6.png" alt="Three diagrams, the left showing outward-pointing arrows, the middle showing inward-pointing arrows, and the right showing a balance between the two." />
<figcaption>Three diagrams: the left showing positive divergence with predominantly outward-facing arrows, the middle showing negative divergence with predominantly inward-facing arrows, and the right showing zero divergence with a balance between the two. But again, Gauss's theorem gives the exact picture.</figcaption>
</figure>
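<p>As a sketch (plain C++, illustration only), here’s the divergence computed the same numerical way, applied to the field $\bold{f}(x, y) = (x, y)$, whose arrows all point away from the origin:</p>

```cpp
#include <cassert>
#include <cmath>

// A vector field f(x, y) = (x, y): every arrow points away from the origin
double field_x(double x, double y) { (void)y; return x; }
double field_y(double x, double y) { (void)x; return y; }

// Numerical divergence: d(f_x)/dx + d(f_y)/dy, each by central difference
double divergence(double (*fx)(double, double), double (*fy)(double, double),
                  double x, double y, double h = 1e-5) {
    return (fx(x + h, y) - fx(x - h, y)) / (2 * h) +
           (fy(x, y + h) - fy(x, y - h)) / (2 * h);
}
```

<p>Its divergence works out to $2$ no matter where you sample, which squares with the outward-pointing-arrows picture.</p>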
<p>Finally, the <a href="https://en.wikipedia.org/wiki/Laplace_operator">“Laplacian”</a> is actually the divergence of the gradient of a scalar field, and this ultimately means that it’s also a scalar field! It is also the sum of the second-order partial derivatives (besides the mixed ones, but that’s totally out-of-scope).</p>
\[\nabla^2 f \equiv \nabla \cdot (\nabla f) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\]
<p>Using the del operator, some take the liberty of expressing this composition as a single $\nabla^2$ operator.</p>
<p>There is also the extension of the Laplacian onto vector fields, but it really is just the Laplacian on each component.</p>
\[\nabla^2 \bold{f} = \begin{bmatrix} \nabla^2 f_x \\ \nabla^2 f_y \end{bmatrix}\]
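<p>One more numerical sketch (plain C++, illustration only): summing the second differences gives the Laplacian, and for $f(x, y) = x^2 + y^2$ it comes out to $4$ everywhere, since each second partial is $2$:</p>

```cpp
#include <cassert>
#include <cmath>

// f(x, y) = x^2 + y^2 again
double f(double x, double y) { return x * x + y * y; }

// Numerical Laplacian: the second central difference in x plus the one in y
double laplacian(double (*field)(double, double), double x, double y,
                 double h = 1e-3) {
    return (field(x + h, y) - 2 * field(x, y) + field(x - h, y)) / (h * h) +
           (field(x, y + h) - 2 * field(x, y) + field(x, y - h)) / (h * h);
}
```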
<p>The gradient, divergence, and Laplacian are all the differential operators that are relevant here, and hopefully these will become more concrete to you as we use them from here on to describe Stam’s fluid sim technique. However, I’d again recommend formally learning vector calculus if you’d like to look at other techniques.</p>
</div>
<p>A “partial differential equation” is kind of like a system of linear equations in this context. Here, they still relate known and unknown variables, and they still have a solution which is the value of the unknowns. However, these “values” are entire fields, not just numbers! Given this, partial differential equations also involve the differential operators of these field-valued variables.</p>
<p>The advection-diffusion equation that Stam provides is a simple example of one: advection and diffusion are <em>independent terms</em>, and their sum is exactly how the density field evolves over time. It is</p>
\[\frac{\partial \rho}{\partial t} = -(\bold{v} \cdot \nabla) \rho + \kappa \nabla^2 \rho + S\]
<p>where $\rho$ is the density field and $\bold{v}$ is the velocity field. $-(\bold{v} \cdot \nabla) \rho$ is the advection term, and $\kappa \nabla^2 \rho$ is the diffusion term—$\kappa$ being a constant for us to control the strength of the diffusion. $S$ is just a term that lets us add density (of smoke, or concentration of dye in our case) to the scene. Notice how this equation is a definition of $\partial \rho / \partial t$. It’s the partial derivative of the density field with respect to time, and it means that $\rho$ is a variable whose value is a function of location and time. However, it’s more useful for us to think of it equivalently as a field that evolves over time. An evolving density field is exactly what we want to show on the screen!</p>
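<p>To see the two terms at work, here’s the crudest possible discretization of this equation on a 1D periodic grid: forward Euler in time, central differences in space, with $S$ dropped. To be loud about it, this is <em>not</em> the scheme Stam uses (his stable approach is the subject of the later posts); it’s only here to make the terms concrete:</p>

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One naive forward-Euler step of 1D advection-diffusion on a periodic grid
// of spacing dx. This is NOT the stable scheme Stam uses; it only makes the
// advection and diffusion terms concrete. S is dropped, as in the text.
std::vector<double> step(const std::vector<double>& rho, double v,
                         double kappa, double dt, double dx) {
    const int n = static_cast<int>(rho.size());
    std::vector<double> out(n);
    for (int i = 0; i < n; i++) {
        const double left = rho[(i + n - 1) % n], right = rho[(i + 1) % n];
        const double advect = -v * (right - left) / (2 * dx);  // -(v . grad) rho
        const double diffuse =
            kappa * (right - 2 * rho[i] + left) / (dx * dx);   // kappa * lap(rho)
        out[i] = rho[i] + dt * (advect + diffuse);
    }
    return out;
}
```

<p>One property worth noticing: on a periodic grid, a step neither creates nor destroys density. Advection only moves it, and diffusion only spreads it.</p>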
<p>You may also notice that $(\bold{v} \cdot \nabla)$ is clearly some kind of construction from the partial derivatives because it uses the del operator $\nabla$. That is the “advection” operator. I’ve only seen it in fluid dynamics papers and yet still don’t totally understand it. Still, we’ll see how Stam treats it, but that’ll have to be in the next post.</p>
<p>All said though, where is the room in this model for the user’s input? Is the velocity field just a thing we get to set? (Right now, we have two unknowns, $\rho$ and $\bold{v}$, but one equation!) No, it’s more complicated than that: the way water and air move continues to change even after we stop stirring them. That points to the missing piece for stirring digital water: we need a physical definition of $\partial \bold{v} / \partial t$ (a.k.a. the acceleration!) just like how $\partial \rho / \partial t$ was defined. That missing piece is the famous “Navier-Stokes equations”.</p>
<p>The <a href="https://en.wikipedia.org/wiki/Navier%E2%80%93Stokes_equations">“Navier-Stokes equations”</a> are also partial differential equations. A definition of Navier-Stokes can be found in any fluid dynamics article, but the one Stam provided in “Stable Fluids” is the most directly relevant one.</p>
\[\frac{\partial \bold{v}}{\partial t} = -(\bold{v} \cdot \nabla) \bold{v} - \frac{1}{\rho} \nabla p + \nu \nabla^2 \bold{v} + \bold{f}\]
\[\nabla \cdot \bold{v} = 0\]
<p>The first one is a definition of $\partial \bold{v} / \partial t$. Here, $-(\bold{v} \cdot \nabla) \bold{v}$ and $\nu \nabla^2 \bold{v}$ represent advection and diffusion again, though these are also known as “convection” and acceleration due to “viscosity”, respectively. That is to say, the velocity is carried around and diffused just like how the smoke density was. The only difference is that the constant $\nu$ here is the <a href="https://en.wikipedia.org/wiki/Viscosity">“kinematic viscosity”</a>, and it’s higher for fluids like honey and lower for fluids like water. That aside, $\bold{f}$ represents the acceleration due to external forces, and <em>that</em> is the place in our mathematical model where the user input goes!</p>
<p>On the other hand, $-\frac{1}{\rho} \nabla p$ is an interesting term—it’s <em>not</em> independent here. Let me try to explain. Nominally, it’s the acceleration due to a difference in pressure, and the negative of the gradient represents the tendency for fluids to flow from regions of high pressure to regions of low pressure. (Since the gradient points in the direction of steepest ascent, then going in the opposite direction gives the steepest <em>descent</em>.) But all this is only in name! As Stam had put it, “[t]he pressure and the velocity fields which appear in the Navier-Stokes equations are in fact related”. Ultimately, he used it as a correction term in order to guarantee that the second equation holds. It’s a constraint on the velocity field that is critical to looking like water, called “incompressibility”.</p>
<p>Simply, it states that the divergence of the velocity field is zero. Unfortunately, knowing <em>what</em> the incompressibility constraint is turns out not to be the same as knowing <em>why</em> it holds or why incompressibility matters. Those are things beyond what I can comfortably explain in the first place (which I’d say is worse than them merely being out-of-scope). On the other hand, I definitely have my understanding of how the so-called pressure term is used to ensure zero divergence! However, that’ll again have to be a matter for the next posts.</p>
<p>That aside, I’m going to adjust the equations while we’re here. Because the overall project was about simulating dye in water on an ESP32 and not smoke in air on a GPU, I didn’t use the equations in their entirety. This can be thought of as an exercise in finding what part of the physics can be ignored while still looking sorta-physical, I suppose. I really have to wash my hands of any assertions I’m making at this moment, for I am no expert in this field. With that said, I can confirm that deleting the diffusion term by letting $\kappa = 0$ doesn’t look so egregious. We also don’t have to add more dye, so we can delete $S$ while we’re at it. That leaves just the advection term.</p>
\[\frac{\partial \rho}{\partial t} = -(\bold{v} \cdot \nabla) \rho\]
<p>Furthermore, I also got away with letting $\nu = 0$, deleting that term and reducing the Navier-Stokes equations to the following.</p>
\[\frac{\partial \bold{v}}{\partial t} = -(\bold{v} \cdot \nabla) \bold{v} - \frac{1}{\rho} \nabla p + \bold{f}\]
\[\nabla \cdot \bold{v} = 0\]
<p>So ends this post. With the governing equations (advection-diffusion and Navier-Stokes), we’ve laid out the fundamental outline of Stam’s technique. We’ve also reviewed the relevant vector calculus, though no more than that. Though I didn’t have all the authority I needed to get the whys, we should be equipped to understand the whats. In the last parts, we’ll fill in the outline to get an end-to-end fluid simulation. If you’re still here before the <a href="/2024/01/20/esp32_fluid_sim_4.html">next post</a> comes out though, there’s always the <a href="https://github.com/colonelwatch/ESP32-fluid-simulation">ESP32-fluid-simulation source code</a> on GitHub.</p>
<h2 id="esp32-fluid-sim-part-2">Rebuilding ESP32-fluid-simulation: the touch and render tasks (Part 2)</h2>
<p><em>2023-07-30 · <a href="http://kenny-peng.com/2023/07/30/esp32_fluid_sim_2">permalink</a></em></p>
<p>So, how exactly did my rebuild of <a href="https://github.com/colonelwatch/ESP32-fluid-simulation">ESP32-fluid-simulation</a> do the touch and render tasks? This post is the second in a series of posts about it, and the first was a task-level overview of the whole project. But while it’s nice and all to know the general parts of the project and how they communicate in a high-level sense, the meat of it is the implementation, and I’m here to serve it. The next parts are dedicated to the sim physics, but we’ll talk here about the input and output: the <em>touch</em> and <em>screen</em> of a touchscreen.</p>
<p>The implementation starts from the hardware, naturally. As I established, I went with the ESP32-2432S032 development board that I heard about on <a href="https://www.youtube.com/c/brianlough">Brian Lough’s</a> Discord channel, where it was dubbed the “Cheap Yellow Display” (CYD). That choice guided the libraries that I was going to build on, and that defined the coding problems I had to solve.</p>
<figure>
<img src="/images/2023-7-30-figure1.jpeg" alt="image of the 'Cheap Yellow Display'" />
<figcaption>The ESP32-2432S032, a.k.a. the Cheap Yellow Display</figcaption>
</figure>
<p>Materially, the only component of it that I used was the touchscreen, which pairs an ILI9341 LCD driver with an XPT2046 resistive touchscreen controller. In some demonstrative examples, Lough used the <a href="https://github.com/Bodmer/TFT_eSPI">TFT_eSPI</a> library to interact with the former chip and the <a href="https://github.com/PaulStoffregen/XPT2046_Touchscreen">XPT2046_Touchscreen</a> library for the latter chip, and these examples included what pins and configuration to associate with each. I didn’t mess with any of this setup.</p>
<p>We can cover the touch task first. To begin with, I already had a general idea for what it should do: a user had to be able to drag their stylus across the screen and then see water stirring as if they had stuck their arm into it and whirled it around in reality. With that in mind, what should we want to capture, exactly?</p>
<p>The objective can be split into three parts. First, we should obviously check if the user is touching the screen in the first place! Second, assuming that the user is touching the screen, we should obviously get <em>where</em> they touched the screen. Finally, if we keep track of the previous touch location, we can use it later to estimate how fast they were dragging the stylus across the screen—assuming they were, that is. We’ll get to that in a bit.</p>
<p>To deal with the first two matters, reading the <a href="https://github.com/PaulStoffregen/XPT2046_Touchscreen#reading-touch-info">documentation for the XPT2046_Touchscreen library</a> takes us most of the way. A call to the <code class="language-plaintext highlighter-rouge">.touched()</code> method tells us whether the user touched the screen. Assuming this returns true, getting the where is just a call to the <code class="language-plaintext highlighter-rouge">.getPoint()</code> method. It returns an object that contains the coordinates of the touch—coordinates that we’ll need to further process.</p>
<p>First, we should quickly note that the XPT2046 always reports coordinates as if the screen were 4096x4096, regardless of what the dimensions actually are. That can just be fixed by rescaling. To be exact, the <code class="language-plaintext highlighter-rouge">getPoint()</code> method returns a <code class="language-plaintext highlighter-rouge">TS_Point</code> object with members <code class="language-plaintext highlighter-rouge">.x</code>, <code class="language-plaintext highlighter-rouge">.y</code>, and <code class="language-plaintext highlighter-rouge">.z</code>. Ignoring <code class="language-plaintext highlighter-rouge">.z</code>, we first multiply <code class="language-plaintext highlighter-rouge">.x</code> by the screen width and <code class="language-plaintext highlighter-rouge">.y</code> by the height. (In fact, I multiplied them by a fourth of that because I had to run the sim at sixteenth-resolution, but that’s beside the point.) Then, we divide <code class="language-plaintext highlighter-rouge">.x</code> and <code class="language-plaintext highlighter-rouge">.y</code> by 4096. Rescaling in this way, multiplying before dividing, preserves the most precision.</p>
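<p>In code, that rescale is one line of integer math. Here’s a sketch with a hypothetical helper name (not the project’s exact code):</p>

```cpp
#include <cassert>
#include <cstdint>

// Map a raw 12-bit XPT2046 coordinate (0..4095) onto a screen dimension.
// Multiplying before dividing preserves the most precision in integer math.
// (Hypothetical helper for illustration, not the project's exact code.)
int16_t rescaleTouch(int16_t raw, int16_t dim) {
    return static_cast<int16_t>(static_cast<int32_t>(raw) * dim / 4096);
}
```

<p>In the touch task, this would be fed by the library, something like <code class="language-plaintext highlighter-rouge">auto p = ts.getPoint();</code> followed by <code class="language-plaintext highlighter-rouge">rescaleTouch(p.x, width)</code> and <code class="language-plaintext highlighter-rouge">rescaleTouch(p.y, height)</code>.</p>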
<p>With that said, you’re free to ask here: <em>why</em> should <code class="language-plaintext highlighter-rouge">.x</code> be multiplied by the width, and <code class="language-plaintext highlighter-rouge">.y</code> by the height? That would imply that <code class="language-plaintext highlighter-rouge">.x</code> is a horizontal component and <code class="language-plaintext highlighter-rouge">.y</code> is a vertical component, right? That’s correct, but a surprising complication comes from the fact that we’re feeding a fluid sim.</p>
<p>The second thing we need to do is recognize that the XPT2046_Touchscreen library is written to yield coordinates in the coordinate system established by the <a href="https://learn.adafruit.com/adafruit-gfx-graphics-library/overview">Adafruit_GFX</a> library. It’s a somewhat niche convention that has tripped me up multiple times despite how simple it is, so I’ll cover it here.</p>
<p>The Adafruit_GFX library has set conventions that are now widespread across the Arduino ecosystem. Even up to function signatures (name, input types, output types, etc), the way to interact with adhering display libraries <em>doesn’t change</em> from library to library—save a couple of lines or so. For example, my transition of this project from an RGB LED matrix to the CYD was <em>trivial</em>, yet there couldn’t be more of a gap between their technologies. This is because the libraries I used for them, <a href="https://github.com/adafruit/Adafruit_Protomatter">Adafruit_Protomatter</a> and TFT_eSPI respectively, adhered to the conventions.</p>
<p>One of these conventions is their coordinate system. When I say coordinate system, “Cartesian” might be the word that pops into your mind, but the Cartesian coordinate system was <em>not</em> what Adafruit_GFX used, even though they do refer to pixels by “(x, y)” coordinates. In the ordinary Cartesian system, the positive-x direction is rightwards, and the positive-y direction is upwards. They had them be rightwards and <em>downwards</em> respectively.</p>
<figure>
<img src="/images/2023-7-30-figure2.png" alt="diagram showing Adafruit_GFX coordinates" />
</figure>
<p>This should be compared to the way a 2D array is indexed in C. Given the array <code class="language-plaintext highlighter-rouge">float A[N][M]</code>, <code class="language-plaintext highlighter-rouge">A[i][j]</code> refers to the element <code class="language-plaintext highlighter-rouge">i</code> rows down and <code class="language-plaintext highlighter-rouge">j</code> columns to the right. This notation is just a fact of C, but to keep things clear in a moment, I’ll refer to it as “matrix indexing”.</p>
<figure>
<img src="/images/2023-7-30-figure3.png" alt="diagram showing matrix indexing" />
<figcaption>Note: <code>i</code> and <code>j</code> are represented in this diagram as "i, j"</figcaption>
</figure>
<p>In a way, we can think of <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">j</code> as coordinates. In fact, if we equate <code class="language-plaintext highlighter-rouge">i</code> to “y” (the downward-pointing one used by Adafruit_GFX, which I’m writing here in quotes) and <code class="language-plaintext highlighter-rouge">j</code> to “x”, then I’d argue that matrix indexing and the Adafruit_GFX coordinates are <em>wholly equivalent</em>—as long as we adhere to this rename. Well, we don’t end up sticking with it, actually.</p>
<p>We’ll cover this in more depth in the next posts, but it turns out that the type of fluid simulation I’m using is constructed on a Cartesian grid which <em>doesn’t</em> use matrix indexing. Points on the grid are referred to by their Cartesian coordinates (x, y), exactly as you’d expect. It’s also starkly different from the Adafruit_GFX coordinates “(x, y)”. (In this article, I’ll write (x, y) when I mean the Cartesian coordinates and “(x, y)” when I mean the Adafruit_GFX coordinates.) At the same time, <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">j</code> can be used to refer to the point on the grid at the <code class="language-plaintext highlighter-rouge">i</code>-th column from the left and the <code class="language-plaintext highlighter-rouge">j</code>-th row from the bottom. I’ll refer to it as “Cartesian indexing”.</p>
<figure>
<img src="/images/2023-7-30-figure4.png" alt="diagram showing Cartesian indexing" />
<figcaption>Note: <code>i</code> and <code>j</code> are represented in this diagram as "i, j"</figcaption>
</figure>
<p>Increasing <code class="language-plaintext highlighter-rouge">i</code> moves you rightward, and increasing <code class="language-plaintext highlighter-rouge">j</code> moves you upward. In other words, <code class="language-plaintext highlighter-rouge">i</code> specifies the x-coordinate, and <code class="language-plaintext highlighter-rouge">j</code> specifies the y-coordinate. This correspondence between Cartesian coordinates and Cartesian indexing is <em>flipped</em>, roughly, from the correspondence between Adafruit_GFX coordinates and matrix indexing. It’s not an exact reversal because positive-y means up while positive-“y” means down.</p>
<figure>
<img src="/images/2023-7-30-figure5.png" alt="diagram showing the axes of matrix and Cartesian indexing" />
<figcaption>The axes of matrix and Cartesian indexing</figcaption>
</figure>
<p>What does this mean for us? What’s the consequence? We need to change coordinate systems, i.e. transform the touch inputs. Fortunately, I’ve found a cheap trick for this. If you look at the <code class="language-plaintext highlighter-rouge">i</code>’s and <code class="language-plaintext highlighter-rouge">j</code>’s in the above diagram (and set aside the conflicting x’s and y’s), you may suspect that the transform we need is a rotation. I did try this, and it did work. That said, the trick is to know that the physics doesn’t change if we run the simulation <em>on a space that is itself rotated</em>.</p>
<figure>
<img src="/images/2023-7-30-figure6.png" alt="diagram showing the trick, running the sim in that is space rotated relative to the screen, demonstrating that Cartesian indexing and matrix indexing on the same space gives points in that space two different coordinates, whereas the trick forces the points in both indexing schemes to have the same coordinates" />
<figcaption>With the trick, the screen and sim no longer operate on the same space, but corresponding points have the same coordinates/indices. Without the trick, points in the shared space have different coordinates/indices for the sim and screen.</figcaption>
</figure>
<p>Going about it this way, the <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">j</code> of a pixel on the screen, using matrix indexing, and the <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">j</code> of a point in the simulation space, using Cartesian indexing but also being rotated relative to the screen, are identical. With this trick, the transform is to do nothing! (If we speak in x, “x”, y, and “y”, instead, then that’s a swap of the axes, but it’s more like swapping labels.)</p>
<p>It also happens here that the actual arrays used for sim operations are now the same shape as the arrays used for screen operations. This comes from the correspondences we mentioned before being flipped.</p>
<p>Combining the physical rotation of the sim space with the scaling that also accounts for the sim being sixteenth-resolution, we have now taken a touch from the XPT2046 format to the sim space.</p>
<p>That leaves the third part to capture: an estimate of the velocity. There is nothing built in for this, so I had to tease one out. The idea I exploited is that, as the stylus is dragged across the screen, it had to have traveled from where we last observed a touch to where we see a touch now. This is a displacement that we divide by the time elapsed to get an <em>approximation</em> of the velocity. To use vector notation, we can write this as the expression</p>
\[\tilde {\bold v} = \frac{\Delta \bold x}{\Delta t}\]
<p>where $\Delta t$ is the time elapsed and $\Delta \bold x$ is a vector composed of the change in the $x$-coordinate and the change in the $y$-coordinate. (We can use either coordinate system’s definition of x and y for this, trick or not.) This approximation gets less accurate as $\Delta t$ increases, but I settled for 20 ms without too much thought. I just enforced this period with a delay.</p>
<figure>
<img src="/images/2023-7-30-figure7.png" alt="diagram showing how we approximate velocity using the previous displacement" />
<figcaption>Approximating the current velocity with the previous displacement</figcaption>
</figure>
<p>That said, the caveat is that this idea doesn’t define what to do when the user <em>starts</em> to drag the stylus, where there is no previous touch. Strictly speaking, we can save ourselves from going into undefined territory if we code in this logic: if the user was touching the screen before <em>and</em> is still touching the screen now, then we can calculate a velocity, and in all other cases (not now but yes before, not now and not before, and yes now but not before) we cannot.</p>
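<p>Putting the cases together, the touch task’s velocity logic amounts to a tiny state machine. Here’s a hedged sketch of my understanding of it in portable C++ (hypothetical names, not the project’s exact code):</p>

```cpp
#include <cassert>
#include <cmath>

// Sketch of the touch task's velocity logic (hypothetical names, not the
// project's exact code). Positions are assumed to already be in sim
// coordinates, and dt is the enforced polling period in seconds.
struct TouchTracker {
    bool was_touched = false;
    float last_x = 0.0f, last_y = 0.0f;

    // Returns true and fills (vx, vy) only if the user was touching both
    // last time and this time; every other case yields no velocity.
    bool update(bool touched, float x, float y, float dt, float& vx, float& vy) {
        const bool valid = touched && was_touched;
        if (valid) {
            vx = (x - last_x) / dt;  // approx. velocity = displacement / time
            vy = (y - last_y) / dt;
        }
        was_touched = touched;
        last_x = x;
        last_y = y;
        return valid;
    }
};
```

<p>With the 20 ms period from above, <code class="language-plaintext highlighter-rouge">dt</code> would be 0.02 here.</p>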
<p>Finally, if we had detected a touch and calculated a velocity, then the touch task succeeded in generating a valid input, and this can be put in the queue to be served to the sim!</p>
<p>That leaves the render task, using the TFT_eSPI library. We’ll again cover this in a future part, but the fluid simulation puts out individual arrays for red, green, and blue, which together represent the color. Let’s say that I had full-resolution arrays instead of sixteenth-resolution ones. Then, we’ve already set the sim up such that we need not do anything to change coordinate systems. Every pixel on the screen is some <code class="language-plaintext highlighter-rouge">i</code> rows down and some <code class="language-plaintext highlighter-rouge">j</code> columns to the right, and its RGB values can be found at <code class="language-plaintext highlighter-rouge">[i][j]</code> in the respective arrays. The approach would be to go pixel by pixel, indexing into the arrays with the pixel’s singular <code class="language-plaintext highlighter-rouge">i</code> and <code class="language-plaintext highlighter-rouge">j</code>, encoding the RGB values into 16-bit color, and then sending it out. It would have been as simple as that.</p>
<p>Now, let’s reintroduce the fact that we only have sixteenth-resolution arrays.</p>
<p>Because this now means that the arrays correspond to a screen <em>smaller</em> than the one we have, we have a classic upscaling problem. There are sophisticated ways to go about it, but I went with the cheapest one: each element in the array gets to be a 4x4 square on the screen. From what I could tell, it was all I could afford. Because it meant that the 4x4 square was of a single color, I could reuse the encoding work sixteen times! Really though, if I had more computing power, I suspect that this would’ve been an excellent situation for those sophisticated methods to tackle.</p>
<p>This choice of upscaling alone might offer a fast enough render of the fluid colors, especially if we batch the 16 pixels that make up the square into a single call of <code class="language-plaintext highlighter-rouge">fillRect()</code>. That’s one of the functions that was established by Adafruit_GFX. However, I found that I needed an even faster render, so I turned to some features that were unique to TFT_eSPI: “Sprites” and the “direct memory access” (DMA) mode.</p>
<p>Now, googling for “direct memory access” is bound to yield what it is and exactly how to implement it, but to use the DMA mode offered by TFT_eSPI, we only need to know the general idea. That is, a peripheral like the display bus can be configured to read a range of memory <em>without the CPU handling it</em>. For us, this means we would be able to construct the next batch of pixels <em>while</em> the last one is being transferred out. However, to do this effectively, we’ll need to batch together more than just 16 pixels.</p>
<p>That’s where “Sprites” come in. Yes, you might think of pixel-art animation when I say “sprites”, but here, it’s a convenient wrapper around some memory. Presenting itself as a tiny virtual screen, called the “canvas”, it offers the same functions that we can expect from a library following Adafruit_GFX. As long as we remember to use coordinates that place the square in this canvas (and <em>not</em> the whole screen!), we can load it up with many squares using the same <code class="language-plaintext highlighter-rouge">fillRect()</code> call, but under the hood, no transferring takes place yet. Once this sprite is packed with squares, only then do we initiate a transaction with a single call to <code class="language-plaintext highlighter-rouge">pushImageDMA()</code>, this function invoking the DMA mode. From there, we can start packing a new batch of squares at the same time.</p>
<figure>
<img src="/images/2023-7-30-figure8.png" alt="" />
<figcaption><code>fillRect()</code> is called with (x_local, y_local) as where the square starts, <code>pushImageDMA</code> is called later with (x_canvas, y_canvas) as where the sprite starts, and meanwhile the previous sprite is still transferring</figcaption>
</figure>
<p>The caveat: if we pack squares into the <em>same</em> memory that we’re transferring out with DMA, then we’d end up overwriting squares before they ever reach the display. Therefore, we’d want two sprites—one for reading and one for writing—and then we’d flip which gets read from and which gets written to. This flip would happen after initiating the transaction but before we start packing new squares. The terminology for this is “double buffering”, specifically the <a href="https://en.wikipedia.org/wiki/Multiple_buffering#Page_flipping">“page flipping”</a> approach to it, though its purpose here goes beyond just preventing screen tearing.</p>
<figure>
<img src="/images/2023-7-30-figure9.png" alt="" />
<figcaption>Classical page flipping is two buffers and two pointers: no data is copied between the buffers but the pointers get swapped.</figcaption>
</figure>
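<p>In a hypothetical Python sketch (the project itself does this in C++ with two TFT_eSPI sprites), page flipping is nothing more than toggling an index between two buffers:</p>

```python
buffers = [bytearray(16), bytearray(16)]  # two "sprites"
write_idx = 0                             # which buffer we pack into next

def begin_transfer(buf):
    # Stand-in for pushImageDMA(): the real call returns immediately
    # while DMA streams `buf` to the display in the background.
    pass

for frame in range(3):
    back = buffers[write_idx]
    back[0] = frame           # "pack squares" into the back buffer
    begin_transfer(back)      # hand it off to DMA...
    write_idx ^= 1            # ...and flip, so packing continues elsewhere
```

<p>No pixel data is ever copied between the two buffers; only the index flips.</p>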
<p>That covers the touch and render tasks, altogether describing how I used the touchscreen module. But between the hardware and my code are the TFT_eSPI and XPT2046_Touchscreen libraries, and they set in stone the features and conventions I got to work with. In particular, I had to lay out the exact relationship between Cartesian indices and the “(x, y)” coordinates that have become ubiquitous across Arduino libraries, in large part because of the Adafruit_GFX library. With the rotation trick, we eliminated the transform between them. With that in mind, using XPT2046_Touchscreen was just a matter of scaling and maintaining a bit of memory. On the other hand, I turned to the DMA mode and “Sprites” that were uniquely offered by the TFT_eSPI library just to keep pace. Those features also kept within the Adafruit_GFX box, so just a bit of extra care (double buffering, that is) was needed.</p>
<p>With this post and the last post done, there’s one last task to cover: the sim task. The <a href="/2023/09/22/esp32_fluid_sim_3.html">next post</a> is an overview of the physics before we get into the implementation. Stay tuned! But if you’re here before that post comes out, there’s always the code itself <a href="https://github.com/colonelwatch/ESP32-fluid-simulation">on Github</a>.</p>
Kenny Pengknypng44@gmail.comSo, how exactly did my rebuild of ESP32-fluid-simulation do the touch and render tasks? This post is the second in a series of posts about it, and the first was a task-level overview of the whole project. But while it’s nice and all to know the general parts of the project and how they communicate in a high-level sense, the meat of it is the implementation, and I’m here to serve it. The next parts are dedicated to the sim physics, but we’ll talk here about the input and output: the touch and screen of a touchscreen.
Rebuilding ESP32-fluid-simulation: overview of tasks and intertask communication in FreeRTOS (Part 1)2023-07-21T00:00:00+00:002023-07-21T00:00:00+00:00http://kenny-peng.com/2023/07/21/esp32_fluid_sim_1<p>I graduated from college a couple of months ago, and ever since I’ve been interested in revisiting the things I put out while I was still learning. In particular, I obsessed over how I could make it appear more accessible <em>and</em> more professional. To that end, I decided that I needed to tie my works closer to established research and—if not that—fundamental concepts that are easy to look up. I had been trying that with my blog posts, but this new post is about <a href="https://github.com/colonelwatch/ESP32-fluid-simulation">ESP32-fluid-simulation</a>, namely one of my old projects about fluid simulation on an ESP32.</p>
<p>Coincidentally, I was lurking on <a href="https://www.youtube.com/c/brianlough">Brian Lough’s</a> Discord channel when I learned of a cool new development board, packing an ESP32 and a touchscreen. Retailing for just about $14 when you count shipping, it was far more accessible than the RGB LED matrix I was using back then. It seemed like a perfect platform to target my new edition of this old fluid sim, and I could even add touch input while I was at it.</p>
<figure>
<img src="/images/2023-7-21-figure1.jpeg" alt="demo of ESP32-fluid-simulation, showing the colors of the screen being stirred by touch" />
</figure>
<p>So, how did this project get built again using established research and otherwise stuff you can look up? I’m trying to be thorough here, so this will actually be the first out of three posts. Where we start and where I started is at the highest level: the breakup of a single loop that does everything into many loops that are smaller, share time on a processor, and communicate with each other. (This is also the perfect chance to show what this project does at a high level.) After this post, we can get to the input, rendering, and simulation itself.</p>
<p>What allows a processor to split its time and facilitate this communication is a <a href="https://en.wikipedia.org/wiki/Real-time_operating_system">“real-time operating system” (RTOS)</a>. I don’t have the expertise to summarize everything that an RTOS is, but I can safely say that two things an RTOS can do, splitting processor time and facilitating communication, are things an “operating system” (OS) can do generally, and not exclusively. Why this disclaimer? My knowledge about these features mainly comes from a lesson in parallel programming on Linux that I took in school. This and “concurrent programming” on an RTOS have some overlapping concepts, but they’re not the same. In fact, the difference led me to a real trip-up as I was rewriting this project, and I can detail how this happened along the way.</p>
<p>The part of operating systems—generally—that allows a processor to split time is the scheduler. Let’s lay out the characteristics of the scheduler that gets used in the ESP32. The ESP-IDF comes with its own distribution of the open-source <a href="https://en.wikipedia.org/wiki/FreeRTOS">FreeRTOS</a>, this version being called <a href="https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos_idf.html">“ESP-IDF FreeRTOS”</a>, and it can <a href="https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos_idf.html#preemption">be</a> <a href="https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/freertos_idf.html#time-slicing">shown</a> that it roughly matches the default configuration. That configuration is <a href="https://www.freertos.org/single-core-amp-smp-rtos-scheduling.html">“preemptive” scheduling with “round-robin time-slicing” for “equal priority tasks”</a>. What do all those keywords mean? “Preemptive” means that splitting processor time is achieved by having higher-priority loops (“tasks” in FreeRTOS terminology) interrupt lower-priority loops. With few exceptions, higher-priority tasks <em>always</em> interrupt lower-priority tasks. These tasks do what they do and then stop interrupting, though they <em>themselves</em> can be interrupted by even higher-priority tasks. The below diagram shows one example of how this happens.</p>
<figure>
<img src="/images/2023-7-21-figure2.png" alt="example of task preemption" />
<figcaption>The highest-priority available task is the one that will be run</figcaption>
</figure>
<p>“Round-robin time-slicing” for “equal priority tasks” just means that tasks take turns in the case of a tie.</p>
<figure>
<img src="/images/2023-7-21-figure3.png" alt="example of round-robin time-slicing" />
<figcaption>Two tasks that are equal in priority do not interrupt each other, but a scheduler with round-robin time-slicing will still split time between them</figcaption>
</figure>
<p>When the scheduling works, all tasks can appear to be running at the same time!</p>
<figure>
<img src="/images/2023-7-21-figure4.png" alt="the ideal" />
<figcaption>The ideal</figcaption>
</figure>
<p>Still, this scheduler behavior isn’t the same as in Linux. On one hand, a high-priority task is guaranteed to run on time, barring even higher-priority tasks and said exceptions (keyword “priority inversion”). On the other, if a high-priority task runs <em>forever</em>, then a lower-priority task <em>never</em> runs. That’s been termed “starvation”. This is what I accidentally caused, but to describe how I got there, we need to lay out the actual tasks that make up the project along with that other feature of an RTOS I mentioned: facilitating communication between tasks.</p>
<p>Originally, ESP32-fluid-simulation was written like any other Arduino project. It used the <code class="language-plaintext highlighter-rouge">setup()</code> and <code class="language-plaintext highlighter-rouge">loop()</code> functions for code that ran once and code that ran forever, respectively. Putting aside the code in <code class="language-plaintext highlighter-rouge">setup()</code>, the <code class="language-plaintext highlighter-rouge">loop()</code> function had five general blocks: (1) calculate new internal velocities, (2) add user input to the new velocities, (3) correct the new velocities, (4) calculate new fluid colors using the corrected velocities, and finally (5) render the new colors. For context, we capture everything we want to model about the fluid using just the internal velocities and color, but we’ll get to that in a later post. Altogether, this sequence can be visualized with a simple flowchart, showing the whole big loop.</p>
<figure>
<img src="/images/2023-7-21-figure5.png" alt="flowchart of original design, showing blocks in sequence" />
</figure>
<p>However, the Arduino core for ESP32 was written on top of ESP-IDF, and we’ve already established that the ESP-IDF uses FreeRTOS. As a result, all FreeRTOS functions can be called in Arduino code (not even a header <code class="language-plaintext highlighter-rouge">#include</code> is needed!). So, I immediately broke out the five blocks into three tasks: a touch task, a simulation task, and a render task. In each task, the input of a block might be the output of another block that sits in another task, and we’ll get the data across… somehow. We’ll get to that. With this in mind, we can at least update the flowchart to show three concurrent tasks and the data dependencies between them.</p>
<figure>
<img src="/images/2023-7-21-figure6.png" alt="preliminary flowchart of new design, showing three concurrent sequences of blocks and data dependencies between them, in blue" />
</figure>
<p>The missing thing here is the facilitation of communication, which I left <em>exclusively</em> to FreeRTOS. To be more precise, FreeRTOS offers a couple of “synchronization primitives” that can be used to guarantee that “race conditions” never happen. Ignore synchronization primitives in your concurrent applications at your own peril: a “race condition” means that the result depends on whatever way the scheduler executes your tasks. In other words, you can’t depend on the result at all! For instance, the classic bank account scenario shows how a badly coded ATM network can vanish your money, thanks to a race condition.</p>
<figure>
<img src="/images/2023-7-21-figure7.png" alt="example of how race conditions can obliterate your bank balance" />
</figure>
<p>I can’t cover every synchronization primitive, but the two I need to cover are the “binary semaphore” and the “mutex”. I’ll also cover the “queue”, an all-in-one package for safe communication that FreeRTOS offers. (You can see the <a href="https://www.freertos.org/features.html">FreeRTOS documentation</a> for the rest, but the <a href="https://www.digikey.com/en/maker/projects/what-is-a-realtime-operating-system-rtos/28d8087f53844decafa5000d89608016">guide to FreeRTOS offered by Digikey</a> is also useful.) As we cover these in the context of my three tasks, we’ll also be able to go over my trip-up.</p>
<p>A <a href="https://en.wikipedia.org/wiki/Lock_(computer_science)">“mutex”</a> is the canonical solution to our bank account race condition. A task must “take” the mutex, read and write the balance (in general, any shared memory), and finally “give” back the mutex. Because no interrupting task can take a mutex that is already taken, the race condition is prevented! This guarantee is called “mutual exclusion”. Furthermore, while the interrupting task cannot take the mutex, it is forced to wait until it can, and in that time the scheduler is free to run lower-priority tasks. When the interrupting task runs into this, it’s in a “blocked” state.</p>
<figure>
<img src="/images/2023-7-21-figure8.png" alt="example of how a locked mutex causes a thread to be blocked" />
</figure>
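<p>The same idea can be sketched in Python, with <code>threading.Lock</code> (a mutex in all but name) standing in for the FreeRTOS API: four “ATMs” depositing into one shared balance can’t lose an update as long as the read-modify-write is wrapped by the mutex.</p>

```python
import threading

balance = 0
lock = threading.Lock()

def deposit(amount, times):
    global balance
    for _ in range(times):
        with lock:               # "take" the mutex...
            balance += amount    # ...read, modify, and write the shared balance
        # ...and "give" it back on exit; no other task can interleave in between

threads = [threading.Thread(target=deposit, args=(1, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# balance is 40000 on every run; drop the lock and it can come up short
```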
<p>A <a href="https://en.wikipedia.org/wiki/Semaphore_(programming)">“binary semaphore”</a> has a different canonical purpose. Quite simply, one task is blocked until another task says it can go ahead, and this go-ahead flag is then reset after that. Because the other task gives the go-ahead, it can also complete any operations it needs to complete before then. This guarantee is called “precedence”.</p>
<figure>
<img src="/images/2023-7-21-figure9.png" alt="example of how a semaphore that has not been incremented causes a thread to be blocked" />
<figcaption>See <a href="https://stackoverflow.com/questions/29606162/what-is-the-original-meaning-of-p-and-v-operations-in-a-context-of-a-semaphore">the StackOverflow question</a> for what "P" and "V" stand for, but they pretty much mean the semaphore operations this figure implies</figcaption>
</figure>
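<p>A hypothetical Python equivalent (with <code>threading.Semaphore</code> standing in for a FreeRTOS binary semaphore) shows the precedence guarantee: the consumer cannot proceed until the producer has finished its work and given the go-ahead.</p>

```python
import threading

go = threading.Semaphore(0)  # starts "not given", so acquire() blocks
log = []

def producer():
    log.append("produced")   # complete the work that must come first...
    go.release()             # ...then give the go-ahead (the "V" operation)

def consumer():
    go.acquire()             # blocked here until the producer releases ("P")
    log.append("consumed")

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
p.join()
c.join()
# log is always ["produced", "consumed"], no matter how the threads are scheduled
```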
<p>Finally, I’ll only be vague here because the <a href="https://www.freertos.org/Embedded-RTOS-Queues.html">FreeRTOS documentation on “queues”</a> is clear enough already: besides the classic synchronization primitives, an all-in-one package for communication between tasks, called a “queue”, is also offered. Tasks can just send to the queue and receive from the queue—all without triggering race conditions. Further, if a task is sending to a full queue or receiving from an empty one, it is blocked. They’re quite convenient in that sense!</p>
<p>All said, when I say that a task is “blocked”, that’s because we’re using the “blocking” mode. FreeRTOS also offers a “non-blocking” mode that instead lets the task do something else, and this non-blocking mode also offers the same guarantees. In all cases except one, I used the blocking mode.</p>
<p>Moving on, how do these apply to our three tasks? Between the touch task and the simulation task, I just needed the touch task to pass along valid touches to the simulation task. For that, I defaulted to a queue, and I used the non-blocking mode here to make the simulation task receive everything in the queue but move on after that. I left the touch task to send into the queue in the blocking mode. Between the simulation task and the render task, however, the semantics of a queue didn’t make much sense. After all, would I really “send” a set of large arrays (representing fluid color) between tasks? Instead, I allocated a single set of arrays and managed to make the two tasks share the set without race conditions. The race conditions I was anticipating: the simulation task starts updating the fluid colors while the render task is still reading them, or the render task starts reading while the simulation task is still writing.</p>
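<p>Sketching that queue usage with Python’s standard <code>queue.Queue</code> (not the FreeRTOS queue API, and with made-up touch locations): the touch side sends in the blocking mode, and the sim side drains whatever has arrived without waiting.</p>

```python
import queue

touch_queue = queue.Queue(maxsize=8)

# Touch-task side: a blocking send (it would wait if the queue were full).
touch_queue.put((12, 34))  # hypothetical touch locations
touch_queue.put((13, 34))

# Sim-task side: receive everything available, then move on (non-blocking).
touches = []
while True:
    try:
        touches.append(touch_queue.get_nowait())
    except queue.Empty:
        break
# touches now holds [(12, 34), (13, 34)]; an empty queue just yields []
```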
<p>At first, I thought that I only needed a mutex. If I wasn’t using an RTOS, this technically would’ve worked, but therein lay my problem. I needed semaphores instead. Why I couldn’t do without semaphores has to do with the preemptive scheduling built into FreeRTOS. Because the render task happened to have a higher priority than the simulation task, it would take the mutex, give it back, and then <em>immediately take it back again</em>.</p>
<figure>
<img src="/images/2023-7-21-figure10.png" alt="figure of the sim task never getting unblocked because the render thread is not stopped" />
</figure>
<p>Nothing stopped the render task from running forever, and so the simulation task was starved. If the scheduler were more like the Linux scheduler or if the tasks were on equal priority, then the simulation task technically would’ve gotten to take the mutex eventually. But I’m glad that I wasn’t technically correct because that forced me to acknowledge the semaphore-based solution to the race condition. This solution also worked on FreeRTOS and didn’t involve the processor wasting time on a task that spun between taking and giving back the mutex endlessly. Using binary semaphores, I got this: a write is always preceded by a completed read, and a read is always preceded by a completed write. In the following diagram, the former is represented by semaphore “R”, and the latter is represented by semaphore “W”.</p>
<figure>
<img src="/images/2023-7-21-figure11.png" alt="figure of the sim and render tasks running concurrently, each task being blocked by a semaphore that the other task eventually raises" />
</figure>
<p>Each semaphore prevented one of the race conditions, but they also blocked the tasks from spinning.</p>
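<p>A Python sketch of the two-semaphore handshake (again with <code>threading.Semaphore</code> in place of the FreeRTOS primitives, and a one-element list in place of the shared color arrays) makes the alternation concrete:</p>

```python
import threading

# "W" signals a completed write; "R" signals a completed read.
W = threading.Semaphore(0)
R = threading.Semaphore(1)  # starts "given" so the very first write may proceed
colors = [0]                # stand-in for the shared color arrays
seen = []

def sim_task():
    for frame in range(1, 4):
        R.acquire()             # wait until the last read finished
        colors[0] = frame       # write the shared colors
        W.release()             # signal: write complete

def render_task():
    for _ in range(3):
        W.acquire()             # wait until a write finished
        seen.append(colors[0])  # read the shared colors
        R.release()             # signal: read complete

r = threading.Thread(target=render_task)
s = threading.Thread(target=sim_task)
r.start()
s.start()
s.join()
r.join()
# seen is [1, 2, 3]: writes and reads strictly alternate, and neither task spins
```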
<p>Now with the queue and these binary semaphores in mind, that completes how I broke apart a single Arduino <code class="language-plaintext highlighter-rouge">loop()</code> into smaller tasks that safely pass data to each other. To visualize it in its entirety, we can update the flowchart with this communication.</p>
<figure>
<img src="/images/2023-7-21-figure12.png" alt="flowchart of new design, showing three concurrent sequences of blocks and communication between them, in blue" />
</figure>
<p>To explain the symbols a bit, the pipeline symbol stands for the queue, and the document symbol stands for the shared fluid colors. The dashed arrows represent communication between the tasks, pointing from where it’s initiated to where it’s awaited. (As we’ve established, they literally do wait for it!)</p>
<p>All said, while this post and flowchart emphasized the concurrent programming with safe communication that FreeRTOS offers, it also happens to serve as a high-level overview of this reimagining of my old project—and from a task-focused perspective at that! This nicely sets the stage for explaining what each task does in the next posts. Stay tuned to read about the touch and render tasks in the <a href="/2023/07/30/esp32_fluid_sim_2.html">next post</a>!</p>
<p>If you’re already here before that post comes out though, there’s always the code itself at the <a href="https://github.com/colonelwatch/ESP32-fluid-simulation">ESP32-fluid-simulation</a> repo on GitHub.</p>
Recoloring backgrounds to align with the Solarized base palette again (plus color, light mode support, and a demo!)2023-06-02T00:00:00+00:002023-06-02T00:00:00+00:00http://kenny-peng.com/2023/06/02/solarized_background_2<p>A couple of months back, I wrote <a href="/2022/11/06/solarized_background.html">“Recoloring backgrounds to align with the Solarized Dark base palette”</a>, and when I wrote that I wasn’t expecting to do a second part. At the time, because I had just encountered the <a href="https://ethanschoonover.com/solarized/">Solarized</a> palette, I didn’t even begin to fathom how you could add colors to the backgrounds. Still, even then I could imagine what it would look like, and shortly after I wrote that article I started to go down what seemed like the right path. I found myself making a 3D scatter plot of the entire Solarized palette as <a href="https://en.wikipedia.org/wiki/CIELAB_color_space">CIELAB</a> values, and it looked to me like a spinning top in the middle of falling over.</p>
<figure>
<img src="/images/2023-6-2-figure1.png" alt="Solarized palette as points in CIELAB space" />
</figure>
<p>So, I thought that all I might need to do was transform the colors of an image into points in CIELAB space, tip them over just the same, and then transform them back into RGB color. However, I didn’t come around to trying that idea until now. After a great deal of experimentation, I’ve found a particular style of “solarizing” images that generally works for any image: start by following the monochrome scheme that aligns with the Solarized base palette, then allow some subtle tinting with the other colors.</p>
<figure>
<img src="/images/2023-6-2-figure2.png" alt="Solarized Carina cliffs with color" />
</figure>
<p>You can try it for yourself using a demo I put on <a href="https://huggingface.co/spaces/colonelwatch/background-solarizer">HuggingFace</a>.</p>
<p>Ultimately, it didn’t just involve tipping over a top. The general outline for achieving the effect is this:</p>
<ol>
<li>Transform the colors of the image into points in CIELAB space,</li>
<li>reduce their saturation/”chroma” component,</li>
<li>remap their lightness component,</li>
<li>rotate and shift them (still in CIELAB space), and finally</li>
<li>transform them back into RGB color.</li>
</ol>
<p>It’s worth noticing here that all the work was done in CIELAB space. It is the coordinate space in which the Solarized palette was canonically defined, but it’s also a space with a very convenient property. That is: the lightness of a color is an independent component. Out of the components of a point in CIELAB space, $L$, $a$, and $b$, lightness is just $L$. Given some—say—purple, you can get the same purple but brighter or darker by varying just $L$, and you leave the $a$ and $b$ components alone. If we worked in RGB instead, we would have to vary the red, green, and blue components together.</p>
<p>The $a$ and $b$ components together form a plane of all possible mixtures of the primary colors, and a specific $a$ and $b$ mean a specific mixture. Going in the $+a$ direction gets a redder mixture. The $-a$ direction gets a greener mixture. $+b$ gets a yellower one, and $-b$ a bluer one. That said, in this case, we should think about the $a-b$ plane in polar coordinates. In polar, the angle is called the “hue” (the very same hue that you’d pick from a color wheel), and the magnitude is called the saturation or “chroma”.</p>
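<p>For concreteness, that polar view is a one-liner (a generic Python sketch, not code from the demo):</p>

```python
import math

def chroma_hue(a, b):
    # Polar view of the CIELAB a-b plane: the magnitude is the chroma,
    # the angle is the hue.
    return math.hypot(a, b), math.degrees(math.atan2(b, a))

c, h = chroma_hue(3.0, 4.0)
# c = 5.0, h ≈ 53 degrees: leaning +a (redder) and +b (yellower), so orange-ish
```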
<p>The $L$, $a$, and $b$ components all have meanings that make each step of the process into simple operations. On top of that, <code class="language-plaintext highlighter-rouge">scikit-image</code> gives us convenient functions that step <a href="https://scikit-image.org/docs/stable/api/skimage.color.html#skimage.color.rgb2lab">in</a> and <a href="https://scikit-image.org/docs/stable/api/skimage.color.html#skimage.color.lab2rgb">out</a> of CIELAB space, called <code class="language-plaintext highlighter-rouge">rgb2lab</code> and <code class="language-plaintext highlighter-rouge">lab2rgb</code> respectively. That’s the advantage of working in CIELAB space. With that in mind, what are we trying to do in each step? We’ll want to cover this backward, starting with the shift and rotate—the meat of the method!</p>
<p>In my previous post, I chose to throw out color, and then I mapped the grayscale values onto the line going through the Solarized base palette in CIELAB space.</p>
<figure>
<img src="/images/2023-6-2-figure3.png" alt="Solarized palette as points in CIELAB space with line" />
</figure>
<p>However, all grayscale values can be thought of as the line where $a=0$ and $b=0$, or in other words the $L$-axis, and “throwing out color” can simply be thought of as a linear projection of all values onto it. Because we can think of the Solarized base palette as a line and all grayscale values as another line, a similar (but not the same) way to do what I did before is to do the projection then apply an “affine” function. “Affine” functions take the general form</p>
\[y = Ax+b\]
<p>and they differ from linear functions (<em>their</em> general form being $y=Ax$) only by a translation, expressed as the additional term $b$. Using an affine function makes sense here because the canonical center of CIELAB space is $(50, 0, 0)$, not the origin. (For that matter, the center of the Solarized base palette isn’t the origin either.)</p>
<p>On the mention of an affine function, you might follow up that thought by solving for $A$ and $b$, perhaps by using a linear algebra package. The trouble is that, though we have the Solarized base palette to possibly serve as $y$, we have <em>nothing</em> to serve for $x$. Before anyone mentions it, the Solarized website shows the colors it replaces for the xterm program, but a different set of colors of a different program can be replaced by the Solarized palette just the same. If we took the xterm colors as $x$, then we could just as arbitrarily take the colors of Google Chrome or Visual Studio Code instead. That is to say again: we have no solid choice for $x$. In that way, we’re forced to give up on using data to determine $A$ and $b$.</p>
<p>Instead, let’s give $A$ and $b$ some value, but we’ll guide our choice with some intuition. We’ll start with this: since we already know the center of CIELAB space and the center of the Solarized base palette, we can rewrite the affine transform as</p>
\[y - y_0 = A (x - x_0)\]
<p>where we should notice that we implicitly set $b$ to $y_0 - A x_0$. This intuitively defines $b$ as whatever brings the center of $Ax$ from $A x_0$ to $y_0$.</p>
<p>That leaves defining $A$. Given that we’re passing in $x-x_0$ and getting out $y-y_0$, subtraction of the centers $x_0$ and $y_0$ means we’re actually passing in a line through the origin and getting out a different line through the origin. The natural operation that should come to mind here is rotation.</p>
<p>One definition of a rotation matrix is parameterized by <a href="https://en.wikipedia.org/wiki/Rotation_matrix#General_rotations">yaw, pitch, and roll</a></p>
\[\begin{align*} A & = \underbrace{ \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix} }_\text{yaw} \underbrace{ \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix} }_\text{pitch} \underbrace{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma & -\sin\gamma \\ 0 & \sin\gamma & \cos\gamma \end{bmatrix} }_\text{roll} \\ & = \begin{bmatrix} \cos\alpha \cos\beta & \cos\alpha \sin\beta \sin\gamma - \sin\alpha \cos\gamma & \cos\alpha \sin\beta \cos\gamma + \sin\alpha \sin\gamma \\ \sin\alpha \cos\beta & \sin\alpha \sin\beta \sin\gamma + \cos\alpha \cos\gamma & \sin\alpha \sin\beta \cos\gamma - \cos\alpha \sin\gamma \\ -\sin\beta & \cos\beta \sin\gamma & \cos\beta \cos\gamma \end{bmatrix} \end{align*}\]
<p>where $\alpha$, $\beta$, and $\gamma$ are the yaw, pitch, and roll angles respectively.</p>
<p>In my previous post, I found that the principal component of the Solarized base palette line was $(0.9510, 0.1456, 0.2726)$. For the $L$-axis, we can just take $(1, 0, 0)$ as the unit vector that spans it. Since these two vectors are unit-length, we can say that the rotation matrix is such that</p>
\[\begin{bmatrix} 0.9510 \\ 0.1456 \\ 0.2726 \end{bmatrix} = A \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\]
<p>Solving for $\alpha$, $\beta$, and $\gamma$ yields</p>
\[\begin{bmatrix} 0.9510 \\ 0.1456 \\ 0.2726 \end{bmatrix} = \begin{bmatrix} \cos\alpha \cos\beta \\ \sin\alpha \cos\beta \\ -\sin\beta \end{bmatrix}\]
\[\begin{align*} \alpha & = 0.152 \\ \beta & = -0.276 \\ \gamma & \text{ is free} \end{align*}\]
<p>where we happen to find that roll about the $L$-axis, or in other words hue rotation, doesn’t matter! Let’s just let $\gamma = 0$.</p>
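<p>We can sanity-check this solution numerically. The sketch below (plain Python, not part of the original derivation) recovers $\alpha$ and $\beta$ from the target vector and confirms that the first column of $A$ reproduces it:</p>

```python
import math

# Principal component of the Solarized base palette line (from the post)
tx, ty, tz = 0.9510, 0.1456, 0.2726

# The first column of the yaw-pitch-roll matrix is
# (cos(a)cos(b), sin(a)cos(b), -sin(b)), so alpha and beta follow directly:
alpha = math.atan2(ty, tx)  # from tan(alpha) = ty / tx
beta = -math.asin(tz)       # from -sin(beta) = tz; the roll gamma never appears

col = (math.cos(alpha) * math.cos(beta),
       math.sin(alpha) * math.cos(beta),
       -math.sin(beta))
# col matches (0.9510, 0.1456, 0.2726) to within rounding
```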
<p>We’ve now fully defined the shift and rotate, that being an affine transform. Therefore, we could now get something like my old post while working entirely in CIELAB space. Instead, remember that we could throw out colors by projecting onto the $L$-axis? To get colors, we just <em>don’t do that</em> and then proceed with the shift and rotate anyway! Let’s visualize what we’ve done so far with the help of this diagram.</p>
<figure>
<img src="/images/2023-6-2-figure4.png" alt="Shift and rotate breakdown" />
</figure>
<p>Now, what about the preprocessing steps?</p>
<p>Let’s look at the lightness remap first. Solarized is a low-contrast palette that offers a light mode and a dark mode. If we flip to the <a href="https://ethanschoonover.com/solarized/#usage-development">development section</a> of the Solarized documentation, we find that it does so by assigning an upper and lower subset (not mutually exclusive) of the base palette to each respectively.</p>
<p>Given one mode or another, a fair expectation is that colors <em>exclusive to the alternate mode</em> are never encountered or else the theme is not low-contrast! For the same reason, we shouldn’t expect colors that are outside both palettes as well. Therefore, we need to restrict the range in which we expect points going through the rotate and shift to land, and that target range is a segment of the line going through the base palette along with the neighborhood around that segment.</p>
<figure>
<img src="/images/2023-6-2-figure5.png" alt="Shift and rotate breakdown" />
</figure>
<p>Taking the dark mode first, the target range is the segment between <code class="language-plaintext highlighter-rouge">base03</code> and <code class="language-plaintext highlighter-rouge">base1</code>—excluding the brightest <code class="language-plaintext highlighter-rouge">base2</code> and <code class="language-plaintext highlighter-rouge">base3</code>—and the neighborhood around it. We can invert the rotate and shift to find what values on the $L$-axis they correspond to. That’s how we find that the condition for achieving the target range is $8.1397 < L < 59.4372$. Therefore, if we remap the points of the input such that their lightness components fall into that range, we’re golden. The remap is</p>
\[L_\text{new} = \frac{59.4372-8.1397}{100-0} L + 8.1397\]
<p>where $100$ and $0$ are the maximum and minimum possible lightness. On top of that, we don’t need to touch the $a$ and $b$ components. However, this remap may as well be the definition of destroying contrast, and breaking out of the target range a bit may be worth it. Taking $8.1397 < L < 59.4372$ as just a guideline, we can bounce between setting a new remap and generating a histogram until the distribution of lightnesses mostly falls in that range. I’ve provided an interface for that tweaking on HuggingFace, and we can go through an example in a moment.</p>
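<p>As a quick sketch (hypothetical Python, though the HuggingFace demo does something equivalent), the dark-mode remap and its endpoints look like this:</p>

```python
def remap_lightness(L, lo=8.1397, hi=59.4372):
    # Linearly compress L from the full [0, 100] range into [lo, hi].
    return (hi - lo) / 100.0 * L + lo

dark = remap_lightness(0.0)     # darkest input lands on the bottom of the range
light = remap_lightness(100.0)  # brightest input lands on the top
```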
<p>Taking the light mode, the target range is between <code class="language-plaintext highlighter-rouge">base01</code> and <code class="language-plaintext highlighter-rouge">base3</code>, ignoring <code class="language-plaintext highlighter-rouge">base03</code> and <code class="language-plaintext highlighter-rouge">base02</code>, and this corresponds to a target lightness of $38.7621 < L < 93.8699$. The rest of the process is the same.</p>
<p>Finally, what about reducing the chroma? We do that to enforce the style, and that called for subtle tinting. As I mentioned before, when we rewrite the $a$-$b$ coordinates as polar coordinates, the chroma is the magnitude and the hue is the angle. So, cutting the chroma by some factor means cutting the magnitude of the $a$-$b$ coordinate. Of course, cutting the $a$ component and the $b$ component each by the same factor is equivalent. If we let the factor by which we cut the chroma be $\mu$, then</p>
\[a_\text{new} = \mu a \qquad b_\text{new} = \mu b\]
<p>where I’ve found that $\mu = 0.25$ is a factor I like.</p>
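<p>To make the two operations concrete, here’s a minimal sketch of them in NumPy. The function names are mine, and the defaults use the dark-mode target range; the HuggingFace demo is the real implementation:</p>

```python
import numpy as np

def remap_lightness(lab, l_min=8.1397, l_max=59.4372):
    # linearly remap L from its full [0, 100] range into the target range
    out = lab.astype(float).copy()
    out[..., 0] = (l_max - l_min) / (100 - 0) * out[..., 0] + l_min
    return out

def cut_chroma(lab, mu=0.25):
    # scaling a and b by the same factor scales the chroma (their magnitude)
    out = lab.astype(float).copy()
    out[..., 1:] *= mu
    return out

# one CIELAB "pixel": L = 50, a = 40, b = -20
pixel = np.array([[[50.0, 40.0, -20.0]]])
pixel = cut_chroma(remap_lightness(pixel))
```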
<p>So, that defines the entire process for “solarizing” a background image. Let’s step through it in order with an example to review. We can input the Carina Cliffs into the <a href="https://huggingface.co/spaces/colonelwatch/background-solarizer">Huggingface demo</a>.</p>
<figure>
<img src="/images/2023-6-2-figure6.png" alt="Demo preprocessing" />
</figure>
<p>Here, we see that I had set the actual lightness range to $10 < L < 70$. After I clicked the preprocess button to perform the chroma cut and lightness remapping, we also see that the lightness histogram is acceptably in the target range for Solarized Dark. Finally, I clicked the transform button to perform the shift and rotate, yielding me the new background.</p>
<figure>
<img src="/images/2023-6-2-figure7.png" alt="Demo transform" />
</figure>
<p>In the absence of data to base this process on, we were still successful in finding a way to align backgrounds to the Solarized base palette while also adding a bit of color to it. To do so, we chose sensible and geometric operations in CIELAB space, and we satisfied some constraints by inverting those operations to find the conditions to do so. Though this method works generally, I’ll add that there are places where change might be interesting, perhaps on the matter of defining a new style that works generally or reshaping the distribution of lightnesses. But in any case, though what I did wasn’t exactly tipping over the spinning top, I can have the wonderful colors of the Carina Cliffs back now!</p>Kenny Pengknypng44@gmail.comA couple of months back, I wrote “Recoloring backgrounds to align with the Solarized Dark base palette”, and when I wrote that I wasn’t expecting to do a second part. At the time, because I had just encountered the Solarized palette, I didn’t even begin to fathom how you could add colors to the backgrounds. Still, even then I could imagine what it would look like, and shortly after I wrote that article I started to go down what seemed like the right path. I found myself making a 3D scatter plot of the entire Solarized palette as CIELAB values, and it looked to me like a spinning top in the middle of falling over.Detecting motion in RPLIDAR data using optical flow2023-05-26T00:00:00+00:002023-05-26T00:00:00+00:00http://kenny-peng.com/2023/05/26/rplidar_motion<p>Over a week, I happened to hack together an interesting procedure that ended up being an important part of the senior capstone project I was contributing to. The objective of this procedure: if it moves…</p>
<figure>
<img src="/images/2023-5-26-figure1.gif" alt="tracking of three moving people in a room anim" />
<figcaption>Context: three people moving in a room</figcaption>
</figure>
<p>…detect it! The sensor involved here is the RPLIDAR, a low-cost “laser range scanner” that yields distances from itself at all angles. The principle behind the procedure is <a href="https://en.wikipedia.org/wiki/Optical_flow">“optical flow”</a>, a whole class of techniques for inferring the velocity of an object in a video by looking from frame to frame. The specific technique I used is a classic called the “Lucas-Kanade method”. It turned out that the same reasoning that constructs it (and optical flow more generally) also works with the data taken from the RPLIDAR.</p>
<p>That said, there has to be a fair bit of preprocessing on that data beforehand. I think the preprocessing itself poses an interesting introduction to some backgrounds though, so I’ll cover it too. To see this whole procedure, we’ll use the below example data to visualize the steps. This is the same data I used to devise the procedure in the first place, and it had been collected for me by someone else.</p>
<figure>
<img src="/images/2023-5-26-figure2.gif" alt="raw samples anim" />
</figure>
<p>First, the RPLIDAR yields an irregular sampling of the room around it for a variety of reasons—from protocol overhead to measurement failure. Some may call this kind of data “unstructured”. On the other hand, with video essentially being a grid of dynamically updating pixels, optical flow expects regularly-sampled data. One easy-to-see solution to this is an “interpolation”. The general idea behind “interpolation” is to construct a continuous function that goes through discrete samples, unstructured or not, then collect new, regularly-sampled data from the function.</p>
<p>At the time, I chose to use <a href="https://en.wikipedia.org/wiki/Radial_basis_function_interpolation">“radial basis function” (RBF) interpolation</a>. However, that ended up being a poor choice because something about the data forced me to accept a very relaxed form of it. What do I mean here? The result of an interpolation is not necessarily smooth. The simplest kind of interpolation, <a href="https://en.wikipedia.org/wiki/Linear_interpolation">linear interpolation</a> or “lerp”, is just connecting the samples with straight lines.</p>
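<p>As a quick sketch of what collecting regularly-sampled data looks like, here’s linear interpolation of unstructured angle-distance samples onto a one-degree grid using NumPy. The sample values are made up, and <code class="language-plaintext highlighter-rouge">period=360</code> handles the wrap-around between 359 and 0 degrees:</p>

```python
import numpy as np

# unstructured samples: (angle in degrees, distance) pairs, not on any grid
angles = np.array([10.0, 95.0, 200.0, 310.0])
distances = np.array([1.0, 2.0, 1.5, 1.2])

# resample onto a regular 360-point grid, wrapping around at 360 degrees
grid = np.arange(360.0)
regular = np.interp(grid, angles, distances, period=360.0)
```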
<figure>
<img src="/images/2023-5-26-figure3.png" alt="linear interpolation" />
</figure>
<p>Linear interpolations can be extremely jagged for some data. RBF interpolation, on the other hand, promises a degree of smoothness, but it can also simply fail—to put it shortly. Explaining exactly how it fails seems a bit beyond the scope of this post, but suffice it to say that it failed on this data. The result of that failure was the relaxed form, and it amounted to a kind of curve-fitting. Though it still yielded a smooth, continuous function, it no longer went through the points. Well, curve-fitting is another solution to this problem, anyway. We can collect regularly-sampled data from it too.</p>
<figure>
<img src="/images/2023-5-26-figure4.png" alt="curve-fitting" />
</figure>
<p>Here, let’s use a proper curve-fitting procedure in the first place! A good one is the Python <code class="language-plaintext highlighter-rouge">make_smoothing_spline</code> function <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.make_smoothing_spline.html">offered by SciPy</a>. This routine has some peculiarities, so I’ll leave here an <code class="language-plaintext highlighter-rouge">Interpolator</code> class that has a working use of it.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">scipy.interpolate</span> <span class="kn">import</span> <span class="n">make_smoothing_spline</span>
<span class="k">class</span> <span class="nc">Interpolator</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">memory_size</span><span class="o">=</span><span class="mi">512</span><span class="p">,</span> <span class="n">lam</span><span class="o">=</span><span class="mf">1e-3</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">memory</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">memory_size</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>
<span class="bp">self</span><span class="p">.</span><span class="n">lam</span> <span class="o">=</span> <span class="n">lam</span>
<span class="k">def</span> <span class="nf">update</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">samples</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">memory</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">roll</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">memory</span><span class="p">,</span> <span class="o">-</span><span class="nb">len</span><span class="p">(</span><span class="n">samples</span><span class="p">),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">memory</span><span class="p">[</span><span class="o">-</span><span class="nb">len</span><span class="p">(</span><span class="n">samples</span><span class="p">):]</span> <span class="o">=</span> <span class="n">samples</span>
<span class="k">def</span> <span class="nf">take</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
<span class="c1"># get samples in ascending order of angle
</span> <span class="n">angles</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">memory</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span>
<span class="n">argsort_angles</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">argsort</span><span class="p">(</span><span class="n">angles</span><span class="p">)</span>
<span class="n">angles</span> <span class="o">=</span> <span class="n">angles</span><span class="p">[</span><span class="n">argsort_angles</span><span class="p">]</span>
<span class="n">distances</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">memory</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">][</span><span class="n">argsort_angles</span><span class="p">]</span>
<span class="c1"># remove duplicate angles
</span> <span class="n">angles_dedup</span> <span class="o">=</span> <span class="p">[</span><span class="n">angles</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
<span class="n">distances_dedup</span> <span class="o">=</span> <span class="p">[</span><span class="n">distances</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">angles</span><span class="p">)):</span>
<span class="k">if</span> <span class="n">angles</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">!=</span> <span class="n">angles</span><span class="p">[</span><span class="n">i</span><span class="p">]:</span>
<span class="n">angles_dedup</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">angles</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="n">distances_dedup</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">distances</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="c1"># the above was because make_smoothing_spline requires angle[i] > angle[i-1]
</span> <span class="n">interp_func</span> <span class="o">=</span> <span class="n">make_smoothing_spline</span><span class="p">(</span><span class="n">angles_dedup</span><span class="p">,</span> <span class="n">distances_dedup</span><span class="p">,</span> <span class="n">lam</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">lam</span><span class="p">)</span>
<span class="k">return</span> <span class="n">interp_func</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div>
<p>Notice that the samples are stored in a buffer before they get interpolated. The person who looked at the data before me noticed that the Python RPLIDAR driver that was used, <code class="language-plaintext highlighter-rouge">rplidar</code>, only gave bursts of samples that didn’t contain a full rotation. Therefore, I needed to hold on to at least part of the previous burst. The output of this particular code when inputting our example data is this</p>
<figure>
<img src="/images/2023-5-26-figure5.gif" alt="interpolated anim" />
</figure>
<p>However, it’s still noisy. It jitters a little from frame to frame, and I’ve seen this kind of noise become a problem before. (For the record, this noise was even worse when I used linear interpolation.)</p>
<div class="info-panel">
<h4 id="review-removing-noise-using-low-pass-filters">Review: Removing noise using low-pass filters</h4>
<p><a href="https://en.wikipedia.org/wiki/Low-pass_filter">“Low-pass filters”</a> and <a href="https://en.wikipedia.org/wiki/Filter_(signal_processing)">“filters”</a> in general have a wide variety of uses, but you may or may not be familiar with a major function of “low-pass filters”: removing noise. But to answer why this works, we have to ask ourselves a more basic question: what is noise? In the broadest sense, it’s the part of a signal that we don’t want. In a specific case, we have to <em>decide what we don’t want</em>, deeming that as noise, before we remove it.</p>
<p>Though I don’t trade stocks, a stock’s price is a great example. When people say to “buy the dip”, they recognize that prices have short-term trends (“the dip”) and long-term trends (a company’s continuing—presumably, anyway—track record of making money and thereby increasing shareholder value). Yet both of these behaviors make up the price. A company’s stock price might fall due to a random string of sells while the company itself makes money over the period at the same rate. If we were long-term traders, then the short-term trends wouldn’t matter to us—they’d be noise, and in this case “high-frequency” noise. We would want to remove them before making our decisions, and that’s where “low-pass filters” would apply. I’m not going to define them more formally, but suffice it to say that moving averages and exponential moving averages happen to fall into this category.</p>
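<p>As a tiny demonstration of the claim (with entirely made-up data, not from any project), a moving average pulls a noisy signal back toward its long-term trend:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)
trend = np.sin(2 * np.pi * t / 200)              # the long-term behavior
noisy = trend + 0.5 * rng.standard_normal(200)   # plus high-frequency noise

# a 20-sample moving average: one simple low-pass filter
window = np.ones(20) / 20
smoothed = np.convolve(noisy, window, mode="same")

# the filtered signal should sit closer to the trend than the raw one
err_noisy = np.mean((noisy - trend) ** 2)
err_smoothed = np.mean((smoothed - trend) ** 2)
```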
<figure>
<img src="/images/2023-5-26-figure6.png" alt="moving average" />
<figcaption>SMA and EMA technical indicators are low-pass filters. By Alex Kofman via Wikimedia and used under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0 license</a></figcaption>
</figure>
<p>Coincidentally, if we happened to be short-term traders, then the opposite would be true! Long-term trends would be noise, and there are “high-pass filters” for that.</p>
</div>
<p>To deal with the noise in the interpolated data, we’ll want to use a low-pass filter. At the time, my choice of a particular one was just a guess: feel free to Google “second-order Butterworth digital filter” or “IIR filters” if you want. Here, just a moving average of the last four frames also suffices.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">MaFilter</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n_channels</span><span class="o">=</span><span class="mi">360</span><span class="p">,</span> <span class="n">n_samples</span><span class="o">=</span><span class="mi">4</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n_channels</span><span class="p">,</span> <span class="n">n_samples</span><span class="p">))</span>
<span class="bp">self</span><span class="p">.</span><span class="n">n_samples</span> <span class="o">=</span> <span class="n">n_samples</span>
<span class="bp">self</span><span class="p">.</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">def</span> <span class="nf">filter</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x_t</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">samples</span><span class="p">[:,</span> <span class="bp">self</span><span class="p">.</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">x_t</span>
<span class="bp">self</span><span class="p">.</span><span class="n">i</span> <span class="o">=</span> <span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">%</span><span class="bp">self</span><span class="p">.</span><span class="n">n_samples</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">samples</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
<p>Applying this code to our example data yields this</p>
<figure>
<img src="/images/2023-5-26-figure7.gif" alt="moving average anim" />
</figure>
<p>This data is finally a good base to extract motion out of! Now, optical flow has a rich history involving many, <em>many</em> specific end-to-end techniques. <a href="http://www.cs.toronto.edu/~fleet/research/Papers/flowChapter05.pdf">“Optical Flow Estimation” by Fleet and Weiss</a> and <a href="https://moodle2.units.it/pluginfile.php/256938/mod_resource/content/1/1994Barron.pdf">“Performance of optical flow techniques” by Barron, Fleet, and Beauchemin</a> look to me like very comprehensive descriptions of the older ones. However, since those texts were about applying optical flow on video, let’s work out the same reasoning on our RPLIDAR data. We can let $r(\theta, t)$ be the distance from the RPLIDAR at angle $\theta$ and time $t$. (It’s worth noting that a single frame here is one-dimensional, whereas a frame of a video is two-dimensional.) Motion can be expressed as the equality</p>
\[r(\theta, t) = r(\theta+\Delta\theta, t+\Delta t)\]
<p>or, in other words, the translation of distances by $\Delta \theta$ over a timespan of duration $\Delta t$. The next step is the “linearization” of this equality: a Taylor series centered at $r(\theta, t)$ replaces the right-hand side, and then we truncate away every term of second order and higher. The approximation we get is</p>
\[r(\theta, t) \approx r(\theta, t) + \frac{\partial r}{\partial \theta} \Delta \theta + \frac{\partial r}{\partial t} \Delta t\]
<p>Considering that $\Delta \theta / \Delta t$ is essentially velocity, we can isolate this as the ratio of partial derivatives</p>
\[\frac{\Delta \theta}{\Delta t} \approx - \frac{\partial r / \partial t}{\partial r / \partial \theta}\]
<p>This here is the point of divergence from the basic optical flow analysis on two-dimensional frames of video. In the two-dimensional case, the velocity has two components, and we wouldn’t have found an expression for both from a single equation. In general, that’s an underdetermined linear system, also called the “aperture problem” in optical flow texts. Here, the one-dimensional frame means a velocity with a single component, which we <em>can</em> just solve for.</p>
<p>To turn this into a procedure, the partial derivatives can be approximated by the finite differences</p>
\[\frac{\partial r}{\partial t} \approx \frac{r(\theta, t) - r(\theta, t-\Delta t)}{\Delta t}\]
\[\frac{\partial r}{\partial \theta} \approx \frac{r(\theta+\Delta \theta, t) - r(\theta-\Delta \theta, t)}{2 \Delta \theta}\]
<p>where $t-\Delta t$ means the previous frame, $\theta+\Delta \theta$ means to the next angle in the grid, and $\theta-\Delta \theta$ the previous. $\Delta \theta$ comes from the spacing of the grid, and $\Delta t$ can be measured using Python’s <code class="language-plaintext highlighter-rouge">time.time()</code>. Altogether, we have now completely specified one possible velocity estimation procedure. In practice, it gave me a few problems that weren’t just noise.</p>
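<p>Putting the finite differences together, the direct estimator fits in a few lines of NumPy. To be clear, this is my reconstruction of it, not the project’s exact code; note how the division blows up wherever $\partial r / \partial \theta$ is near zero, which foreshadows the flash-point problem:</p>

```python
import numpy as np

DTHETA = 2 * np.pi / 360  # spacing of the 360-point angle grid

def estimate_velocity(r, r_prev, dt):
    # finite differences standing in for the partial derivatives
    dr_dt = (r - r_prev) / dt
    dr_dtheta = (np.roll(r, -1) - np.roll(r, 1)) / (2 * DTHETA)
    # one equation, one unknown: solve for the angular velocity directly
    return -dr_dt / dr_dtheta

# sanity check: a profile that shifts by one grid step per frame should
# come out near DTHETA / dt (away from angles where dr/dtheta is ~0)
theta = np.arange(360) * DTHETA
r_prev = 2.0 + np.cos(theta)
with np.errstate(divide="ignore", invalid="ignore"):
    v_est = estimate_velocity(np.roll(r_prev, 1), r_prev, dt=0.1)
```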
<figure>
<img src="/images/2023-5-26-figure8.gif" alt="direct estimation anim" />
</figure>
<p>To be clear, this is the absolute value of the raw velocities times ten. You can see here a couple of issues:</p>
<ul>
<li>Small flash-points in the velocity estimation that were consistent enough to beat the low-pass filter</li>
<li>A hole in the velocity estimate at the center of the moving object</li>
</ul>
<p>One particular thing I tried that seemingly dealt with both problems is the “Lucas-Kanade method”. Originally, it was devised as the solution to the underdetermined linear system conundrum. On the assumption that neighboring pixels shared the same motion, the equations constructed from these pixels were incorporated, and this turned an underdetermined system into an overdetermined one with a least-squares solution. Doesn’t the same assumption apply here?</p>
<p>The modified construction is as follows. We can represent the partial derivatives at some $\theta$ and $t$ as $\partial r / \partial \theta \mid_{(\theta, t)}$ and $\partial r / \partial t \mid_{(\theta, t)}$. For some specific $\theta_i$, let’s also consider the 16 angles to its right and the 16 to its left, altogether $\theta_{i-16}, \theta_{i-15}, \dots, \theta_{i+16}$. The partial derivatives (approximated by finite differences) can be taken at these angles and formed into the vectors</p>
\[R_\theta(\theta, t) = \begin{bmatrix} \partial r / \partial \theta \mid_{(\theta_{i-16}, t)} \\ \partial r / \partial \theta \mid_{(\theta_{i-15}, t)} \\ \vdots \\ \partial r / \partial \theta \mid_{(\theta_{i+16}, t)} \end{bmatrix}\]
\[R_t(\theta, t) = \begin{bmatrix} \partial r / \partial t \mid_{(\theta_{i-16}, t)} \\ \partial r / \partial t \mid_{(\theta_{i-15}, t)} \\ \vdots \\ \partial r / \partial t \mid_{(\theta_{i+16}, t)} \end{bmatrix}\]
<p>What do we do with these vectors? We can start again with the linearization</p>
\[r(\theta, t) \approx r(\theta, t) + \frac{\partial r}{\partial \theta} \Delta \theta + \frac{\partial r}{\partial t} \Delta t\]
<p>and manipulate it into the “equation”</p>
\[0 \approx \frac{\partial r}{\partial \theta} \frac{\Delta \theta}{\Delta t} + \frac{\partial r}{\partial t}\]
<p>which we can extend with our vectors under the shared motion assumption</p>
\[0 \approx R_\theta \frac{\Delta \theta}{\Delta t} + R_t\]
<p>where $R_\theta$ and $R_t$ are just shorthand here for $R_\theta(\theta, t)$ and $R_t(\theta, t)$. Though this vector equation usually doesn’t have a solution, it takes the classic least-squares form “minimize $\left\Vert Ax-b \right\Vert$”, here with $A = R_\theta$ and $b = -R_t$. The solution to “minimize $\left\Vert Ax-b \right\Vert$” is $x = (A^\intercal A)^{-1} A^\intercal b$, or in our case</p>
\[\frac{\Delta \theta}{\Delta t} \approx -(R_\theta^\intercal R_\theta)^{-1} R_\theta^\intercal R_t\]
<p>It is convenient here that $R_\theta$ and $R_t$ are vectors. We can see that $R_\theta^\intercal R_\theta$ is just the square magnitude $\left\Vert R_\theta \right\Vert^2$ and $R_\theta^\intercal R_t$ is just the dot product $R_\theta \cdot R_t$. So, we can reduce the velocity estimator to just</p>
\[\frac{\Delta \theta}{\Delta t} \approx -\frac{R_\theta \cdot R_t}{\left\Vert R_\theta \right\Vert^2}\]
<p>Using the following code, we can apply this estimator to our example data and get the following result</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">VelocityEstimator</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">window_size</span><span class="o">=</span><span class="mi">16</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">h_prev</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">360</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">window_size</span> <span class="o">=</span> <span class="n">window_size</span>
<span class="k">def</span> <span class="nf">estimate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">dt</span><span class="p">):</span>
<span class="n">dtheta</span> <span class="o">=</span> <span class="mi">2</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">pi</span><span class="o">/</span><span class="mi">360</span>
<span class="n">dh_dt</span> <span class="o">=</span> <span class="p">(</span><span class="n">h</span><span class="o">-</span><span class="bp">self</span><span class="p">.</span><span class="n">h_prev</span><span class="p">)</span><span class="o">/</span><span class="n">dt</span>
<span class="n">dh_dtheta</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">roll</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">roll</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span><span class="o">/</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">dtheta</span><span class="p">)</span>
<span class="n">dh_dtheta_neighbors</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">((</span><span class="mi">360</span><span class="p">,</span> <span class="mi">2</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">window_size</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">)</span>
<span class="n">dh_dt_neighbors</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">((</span><span class="mi">360</span><span class="p">,</span> <span class="mi">2</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">window_size</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">)</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="bp">self</span><span class="p">.</span><span class="n">window_size</span><span class="o">+</span><span class="mi">1</span><span class="p">):</span>
<span class="n">shift</span> <span class="o">=</span> <span class="n">j</span><span class="o">-</span><span class="bp">self</span><span class="p">.</span><span class="n">window_size</span>
<span class="n">dh_dtheta_neighbors</span><span class="p">[:,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">roll</span><span class="p">(</span><span class="n">dh_dtheta</span><span class="p">,</span> <span class="n">shift</span><span class="p">)</span>
<span class="n">dh_dt_neighbors</span><span class="p">[:,</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">roll</span><span class="p">(</span><span class="n">dh_dt</span><span class="p">,</span> <span class="n">shift</span><span class="p">)</span>
<span class="c1"># calculates all estimated velocities as many dot products
</span> <span class="n">elementwise_product</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">dh_dtheta_neighbors</span><span class="p">,</span> <span class="n">dh_dt_neighbors</span><span class="p">)</span>
<span class="n">v_est</span> <span class="o">=</span> <span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">elementwise_product</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span><span class="o">/</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">dh_dtheta_neighbors</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="bp">self</span><span class="p">.</span><span class="n">h_prev</span> <span class="o">=</span> <span class="n">h</span>
<span class="k">return</span> <span class="n">v_est</span>
</code></pre></div></div>
<figure>
<img src="/images/2023-5-26-figure9.gif" alt="lucas-kanade anim" />
</figure>
<p>Compared to the results of the other procedure, the hole mostly disappears and the flash-points are suppressed. This signal appears to be so clean that all you need is a threshold detector (possibly with hysteresis) to find all the directions of motion.</p>
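<p>For completeness, a threshold detector with hysteresis can be as small as the sketch below. The thresholds here are made up; in practice they’d be tuned to the velocity signal:</p>

```python
class HysteresisDetector:
    """Turns on when the signal rises above `high` and stays on
    until it falls back below `low`, suppressing chatter in between."""
    def __init__(self, low=0.1, high=0.3):
        self.low = low
        self.high = high
        self.active = False

    def detect(self, value):
        if self.active:
            self.active = value > self.low
        else:
            self.active = value > self.high
        return self.active
```

<p>One detector per angle, fed one velocity magnitude per frame, then yields a stable on/off motion flag.</p>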
<p>So, that’s the process I used to detect motion using the RPLIDAR. It’s made of a lot of random concepts—perhaps because it was hacked together over a week. So, it might serve more as a demonstration of how these concepts get applied than a whole, proven procedure. I’m sure that there are more effective, simple, or rigorous ways to solve the same problem. Still, this outline hopefully was an interesting read that inspires you to dive deeper into any of the backgrounds it invokes.</p>Kenny Pengknypng44@gmail.comOver a week, I happened to hack together an interesting procedure that ended up being an important part of the senior capstone project I was contributing to. The objective of this procedure: if it moves, detect it! The sensor involved here is the RPLIDAR, a low-cost "laser range scanner" that yields distances from itself at all angles. The principle behind the procedure is "optical flow", a whole class of techniques for inferring the velocity of an object in a video by looking from frame to frame. The specific technique I used is a classic called the "Lucas-Kanade method". It turned out that the same reasoning that constructs it (and optical flow more generally) also works with the data taken from the RPLIDAR.Recoloring backgrounds to align with the Solarized Dark base palette2022-11-06T00:00:00+00:002022-11-06T00:00:00+00:00http://kenny-peng.com/2022/11/06/solarized_background<p>I know that I’m not the only person who made the “Carina Cliffs” into their desktop background on the day those first JWST shots were released. I had the idea shortly after I saw them, and it’s stayed on my desktop through the months since. However, I also switched from Windows to Pop!_OS to Arch Linux along the way, and sooner or later I wanted to theme my system. I eventually settled on <a href="https://ethanschoonover.com/solarized/">Solarized Dark</a> as my palette of choice, but then I had a problem. 
Solarized Dark focused on muted hues of blue as its base palette, but that clashed with the vibrant, orange splashes of my new favorite background.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">from</span> <span class="nn">skimage</span> <span class="kn">import</span> <span class="n">io</span><span class="p">,</span> <span class="n">color</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">io</span><span class="p">.</span><span class="n">imread</span><span class="p">(</span><span class="s">'carina.png'</span><span class="p">)</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="p">[::</span><span class="mi">8</span><span class="p">,</span> <span class="p">::</span><span class="mi">8</span><span class="p">]</span> <span class="c1"># downsample the image to 1/64 size for this blog post
</span><span class="n">io</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/images/2022-11-6-figure1.jpeg" alt="original carina imshow" /></p>
<p>The ordinary idea would have been to switch to a background that aligned better, but–no–I wanted to keep my “Carina Cliffs”. So, I needed to recolor it. There were a couple of ways I could have gone about it, like recomposing the shot from scratch: the original infrared data <em>was</em> out there, but I was no color scientist.</p>
<p>Instead, my plan started with converting the image to grayscale (though throwing out the color hurt somewhat).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">image</span> <span class="o">=</span> <span class="n">color</span><span class="p">.</span><span class="n">rgb2gray</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
<span class="n">io</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/images/2022-11-6-figure2.jpeg" alt="grayscale carina imshow" /></p>
<p>Next, I wanted to map grayscale values to colors along a “curve” going through the base palette of Solarized Dark. But what was this “curve”?</p>
<p>The Solarized Dark palette originally defined its colors as carefully placed points in the CIELAB space. Unlike RGB, which encodes raw channel intensities, CIELAB assigns coordinates modeled on human vision. Consequently, moving along any straight path in this space should look like a natural transition of colors. This is what I wanted to take advantage of by drawing a “curve”.</p>
<p>That said, though I knew Solarized Dark was careful about its color coordinates, I didn’t know <em>exactly</em> what it did. At worst, I thought that I might need to draw a Bezier curve, but it turned out to be much simpler.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># palette[:, 0] is L, palette[:, 1] is A, palette[:, 2] is B
</span><span class="n">palette</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span>
<span class="p">[</span> <span class="mi">15</span><span class="p">,</span> <span class="o">-</span><span class="mi">12</span><span class="p">,</span> <span class="o">-</span><span class="mi">12</span><span class="p">],</span> <span class="c1"># Base03
</span> <span class="p">[</span> <span class="mi">20</span><span class="p">,</span> <span class="o">-</span><span class="mi">12</span><span class="p">,</span> <span class="o">-</span><span class="mi">12</span><span class="p">],</span> <span class="c1"># Base02
</span> <span class="p">[</span> <span class="mi">45</span><span class="p">,</span> <span class="o">-</span><span class="mi">7</span><span class="p">,</span> <span class="o">-</span><span class="mi">7</span><span class="p">],</span> <span class="c1"># Base01
</span> <span class="p">[</span> <span class="mi">50</span><span class="p">,</span> <span class="o">-</span><span class="mi">7</span><span class="p">,</span> <span class="o">-</span><span class="mi">7</span><span class="p">],</span> <span class="c1"># Base00
</span> <span class="p">[</span> <span class="mi">60</span><span class="p">,</span> <span class="o">-</span><span class="mi">6</span><span class="p">,</span> <span class="o">-</span><span class="mi">3</span><span class="p">],</span> <span class="c1"># Base0
</span> <span class="p">[</span> <span class="mi">65</span><span class="p">,</span> <span class="o">-</span><span class="mi">5</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span><span class="p">],</span> <span class="c1"># Base1
</span> <span class="p">[</span> <span class="mi">92</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">],</span> <span class="c1"># Base2
</span> <span class="p">[</span> <span class="mi">97</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">],</span> <span class="c1"># Base3
</span><span class="p">])</span>
<span class="n">mean</span> <span class="o">=</span> <span class="n">palette</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'mean:'</span><span class="p">,</span> <span class="n">mean</span><span class="p">)</span>
<span class="n">U</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">V</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">svd</span><span class="p">(</span><span class="n">palette</span><span class="o">-</span><span class="n">mean</span><span class="p">)</span>
<span class="n">principal_component</span> <span class="o">=</span> <span class="n">V</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">print</span><span class="p">(</span><span class="s">'principal_component:'</span><span class="p">,</span> <span class="n">principal_component</span><span class="p">)</span>
<span class="n">line_pts</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">outer</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">42</span><span class="p">,</span> <span class="mi">42</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span> <span class="n">principal_component</span><span class="p">)</span><span class="o">+</span><span class="n">mean</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">axes</span><span class="p">(</span><span class="n">projection</span><span class="o">=</span><span class="s">'3d'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter3D</span><span class="p">(</span><span class="n">palette</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">palette</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">palette</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">],</span> <span class="n">c</span><span class="o">=</span><span class="n">palette</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">])</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot3D</span><span class="p">(</span><span class="o">*</span><span class="n">line_pts</span><span class="p">.</span><span class="n">T</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mean: [55.5 -6.125 -2.875]
principal_component: [0.95104299 0.14562397 0.27260023]
</code></pre></div></div>
<p><img src="/images/2022-11-6-figure3.png" alt="principal component analysis of Solarized Dark base palette" /></p>
<p>In fact, the entire base palette was placed approximately along a straight line! The “curve” I wanted could just be this line. In my searches, I found one approach to getting it: <a href="https://stackoverflow.com/questions/2298390/fitting-a-line-in-3d">finding the “principal component” using the “SVD”</a>. That method gave me the parameters of the line that I needed:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">mean</code>: a reference point on the line</li>
<li><code class="language-plaintext highlighter-rouge">principal_component</code>: a unit vector in the direction of the line</li>
</ol>
<p>There was one last thing I needed: the endpoints. These I simply eyeballed.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">t_start</span> <span class="o">=</span> <span class="o">-</span><span class="mi">42</span> <span class="c1"># approx where base03 is
</span><span class="n">t_end</span> <span class="o">=</span> <span class="mi">11</span> <span class="c1"># approx where base1 is
</span>
<span class="k">print</span><span class="p">(</span><span class="s">'t_start:'</span><span class="p">,</span> <span class="n">t_start</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'t_end:'</span><span class="p">,</span> <span class="n">t_end</span><span class="p">)</span>
<span class="c1"># copied from previous cell
</span><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">axes</span><span class="p">(</span><span class="n">projection</span><span class="o">=</span><span class="s">'3d'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter3D</span><span class="p">(</span><span class="n">palette</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">palette</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">palette</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">],</span> <span class="n">c</span><span class="o">=</span><span class="n">palette</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">])</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot3D</span><span class="p">(</span><span class="o">*</span><span class="n">line_pts</span><span class="p">.</span><span class="n">T</span><span class="p">)</span>
<span class="c1"># plot the endpoints of the line
</span><span class="n">ax</span><span class="p">.</span><span class="n">plot3D</span><span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="n">principal_component</span><span class="o">*</span><span class="n">t_start</span><span class="o">+</span><span class="n">mean</span><span class="p">).</span><span class="n">T</span><span class="p">,</span> <span class="s">'x'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'blue'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot3D</span><span class="p">(</span><span class="o">*</span><span class="p">(</span><span class="n">principal_component</span><span class="o">*</span><span class="n">t_end</span><span class="o">+</span><span class="n">mean</span><span class="p">).</span><span class="n">T</span><span class="p">,</span> <span class="s">'x'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'blue'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>t_start: -42
t_end: 11
</code></pre></div></div>
<p><img src="/images/2022-11-6-figure4.png" alt="PCA of Solarized Dark base palette with endpoints" /></p>
<p>These endpoints were represented as the final parameters:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">t_start</code>: zero brightness will be mapped to <code class="language-plaintext highlighter-rouge">principal_component*t_start+mean</code></li>
<li><code class="language-plaintext highlighter-rouge">t_end</code>: max brightness will be mapped to <code class="language-plaintext highlighter-rouge">principal_component*t_end+mean</code></li>
</ol>
<p>And with this line fully defined, I could hop to it from grayscale as I planned.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">orig_shape</span> <span class="o">=</span> <span class="n">image</span><span class="p">.</span><span class="n">shape</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="p">.</span><span class="n">flatten</span><span class="p">()</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="o">*</span><span class="p">(</span><span class="n">t_end</span><span class="o">-</span><span class="n">t_start</span><span class="p">)</span><span class="o">+</span><span class="n">t_start</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">outer</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">principal_component</span><span class="p">)</span><span class="o">+</span><span class="n">mean</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">image</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">*</span><span class="n">orig_shape</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">image</span> <span class="o">=</span> <span class="n">color</span><span class="p">.</span><span class="n">lab2rgb</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
<span class="n">io</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">image</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/images/2022-11-6-figure5.jpeg" alt="recolored carina imshow" /></p>
<p>And so, I had my new “Carina Cliffs”, recolored to align with my new theme! I’m sure that this isn’t the only method, but it was the first one that I tried and liked.</p>
<figure>
<img src="/images/2022-11-6-figure6.jpeg" alt="themed laptop with recolored carina cliffs" />
</figure>
<p>If anyone else wants to recolor their backgrounds in this way, it turns out to be quite the churn. For an 8K background like the “Carina Cliffs”, I had a couple of OOM-kills along the way on my 8GB machine, but I’ve since optimized the process into this quick and small script.</p>
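<p>That script is the real implementation; purely as a hypothetical sketch of the kind of memory-saving change it needed, here is the grayscale-to-line mapping applied one block of rows at a time instead of flattening the whole 8K image at once (the function name and chunk size are my illustration, not something from the script itself):</p>

```python
import numpy as np

def grayscale_to_lab_line(gray, mean, direction, t_start, t_end, chunk=64):
    """Map a 2D grayscale array onto a line in CIELAB, one block of rows at a time.

    Working in row blocks keeps the intermediate float arrays small; a call to
    color.lab2rgb() on each block inside the same loop would finish the job.
    """
    out = np.empty((*gray.shape, 3))
    for i in range(0, gray.shape[0], chunk):
        t = gray[i:i+chunk] * (t_end - t_start) + t_start  # brightness to line position
        out[i:i+chunk] = t[..., None] * direction + mean   # line position to LAB point
    return out
```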
<script src="https://gist.github.com/cfa1816d06067aceda1f191f8a86ba7d.js"> </script>Kenny Pengknypng44@gmail.comI know that I’m not the only person who made the “Carina Cliffs” into their desktop background on the day those first JWST shots were released. I had the idea shortly after I saw them, and it’s stayed on my desktop through the months since. However, I also switched from Windows to Pop!_OS to Arch Linux along the way, and sooner or later I wanted to theme my system. I eventually settled on Solarized Dark as my palette of choice, but then I had a problem. Solarized Dark focused on muted hues of blue as its base palette, but that clashed with the vibrant, orange splashes of my new favorite background.Investigating the math of waveshapers: Chebyshev polynomials2022-06-18T00:00:00+00:002022-06-18T00:00:00+00:00http://kenny-peng.com/2022/06/18/chebyshev_harmonics<p>Over a year ago, I wrote <a href="/2020/11/23/teensy_harmonic_distortion.html">“Adding harmonic distortions with Arduino Teensy”</a>. In that post, I happened upon a way to apply any arbitrary profile of harmonics using a Teensy-based waveshaper (just except that waveshapers categorically can’t vary the phase of each harmonic). However, when I wrote that, I totally missed out on the established literature on the topic! Even in 1979, there was <a href="https://www.jstor.org/stable/3680281">“A Tutorial on Non-Linear Distortion or Waveshaping Synthesis”</a>, and I ultimately had taken a very convoluted path only to arrive at the same place!</p>
<p>To compare it to the method I showed before, one can adapt that 1979 tutorial to the Teensy waveshaper quite naturally, and the adapted method is far easier to implement and more concise. However, to do the adaptation, we have to know one thing: what is a “Chebyshev polynomial”?</p>
<p><a href="https://en.wikipedia.org/wiki/Chebyshev_polynomials">Chebyshev polynomials</a> can be used in a rigorous approach to building waveshapers according to some desired profile of harmonics. In this context, their claim to fame is that they’re polynomials that can twist a $\cos x$ wave into its $n$-th harmonic, or in other words</p>
\[T_n(\cos x) = \cos(nx)\]
<p>You already know one if you can recall the double-angle formula, $\cos(2x) = 2\cos^2 x - 1$. Now, imagine un-substituting $\cos x$ from the right-hand side, and you’ll get the Chebyshev polynomial $T_2(x) = 2x^2 - 1$. Then, imagine a double-<em>double-</em>angle formula, $\cos(4x) = 2\cos^2(2x)-1$, and expand that to $8\cos^4 x - 8\cos^2 x + 1$. Unsubstituting $\cos x$ from that gets the Chebyshev polynomial $T_4(x) = 8x^4 - 8x^2 + 1$.</p>
<figure>
<img src="/images/2022-6-18-figure2.png" alt="T_4(x) and cos(4x) plots" />
<figcaption>Okay, my only reason for bringing up $T_4(x)$ was this elegant-looking plot, though it's not as elegant for other $n$. That aside!</figcaption>
</figure>
<p>Now, algebraically manipulating these angle identities into polynomials is a nice hat trick, but there is a simpler way to think of all the Chebyshev polynomials. In the first section of <em>Chebyshev Polynomials</em> by Mason and Handscomb (the first book that appeared on Google Scholar, don’t @ me), you can find the claim that algebraic manipulations of De Moivre’s theorem are—technically—all that you need to find a Chebyshev polynomial $T_n(x)$ for arbitrary $n$. But in that same section, you can find an easy recurrence that connects them all:</p>
\[T_n(x) = 2x T_{n-1}(x) - T_{n-2}(x)\]
<p>where $T_0(x) = 1$ and $T_1(x) = x$ to start. For example, we can use this recurrence to get from $T_2(x)$ to $T_4(x)$ by way of $T_3(x)$</p>
\[\begin{align*} T_3(x) & = 2x T_2(x) - T_1(x) \\ & = 2x (2x^2-1)-x \\ & = 4x^3 - 3x \end{align*}\]
\[\begin{align*} T_4(x) & = 2x T_3(x) - T_2(x) \\ & = 2x (4x^3-3x)-(2x^2-1) \\ & = 8x^4 - 6x^2 - 2x^2 + 1 \\ & = 8x^4-8x^2+1 \end{align*}\]
<p>where you can notice here that $T_3(x)$ corresponds with the triple-angle formula!</p>
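<p>That recurrence is also the practical way to compute these polynomials in code. As a quick sketch of my own (coefficient arrays are in ascending-degree order; none of this comes from the references), we can build $T_n$ and numerically confirm that $T_n(\cos x) = \cos(nx)$:</p>

```python
import numpy as np

def chebyshev(n):
    """Coefficients of T_n(x), lowest degree first, via the recurrence."""
    t_prev, t_cur = np.array([1.0]), np.array([0.0, 1.0])  # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        # T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)
        t_next = 2 * np.append(0.0, t_cur)   # multiplying by 2x shifts every degree up
        t_next[:len(t_prev)] -= t_prev
        t_prev, t_cur = t_cur, t_next
    return t_cur

print(chebyshev(4))  # coefficients [1, 0, -8, 0, 8], i.e. 8x^4 - 8x^2 + 1

# numerically confirm T_n(cos x) = cos(nx)
x = np.linspace(0, 2 * np.pi, 100)
assert np.allclose(np.polynomial.polynomial.polyval(np.cos(x), chebyshev(4)),
                   np.cos(4 * x))
```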
<p>Hopefully, that’s enough about Chebyshev polynomials for us to start understanding how to use them here. Assume that $\cos x$ is our input signal (we’ll see how this assumption breaks down later). By the definition of the Chebyshev polynomials, $\cos x$ happens to equal $T_1(\cos x)$, so we can use $T_1(x) = x$ as a kind of stand-in for $\cos x$. In the same way, we can represent some $n$-th harmonic as the polynomial $T_n(x)$. Therefore, a linear combination of $\cos x$ and its harmonics can be represented as a linear combination of the Chebyshev polynomials, which is another polynomial in itself!</p>
<p>In other words, if we let $\alpha_n$ be the ratios between the harmonic and the fundamental (for $n \geq 2$, since $n = 1$ is the fundamental itself), then this polynomial can be written as</p>
\[f(x) = T_1(x) + \sum_{n=2}^\infty \alpha_n T_n(x)\]
<p>In fact, this is only a few minor tweaks away from being what we throw into the lookup table of a Teensy waveshaper. Everything can be written in only four steps!</p>
<div class="info-panel">
<h4 id="how-to-generate-a-waveshaper-lookup-table-in-four-steps">How to generate a waveshaper lookup table in four steps!</h4>
<ol>
<li>
<p>Decide what amplitude ratios $\alpha_n$ each $n$-th harmonic should have with the fundamental frequency</p>
</li>
<li>
<p>Build a preliminary function $f_0(x)$ as the linear combination of the Chebyshev polynomials</p>
\[f_0(x) = T_1(x) + \sum_{n=2}^\infty \alpha_n T_n(x)\]
<p>where the first Chebyshev polynomials are</p>
\[\begin{align*} T_0(x) & = 1 \\ T_1(x) & = x \\ T_2(x) & = 2x^2-1 \\ T_3(x) & = 4x^3-3x \\ T_4(x) & = 8x^4-8x^2+1 \end{align*}\]
<p>and the rest can be derived by the recurrence relation</p>
\[T_{n+1}(x) = 2 x T_n(x)-T_{n-1}(x)\]
</li>
<li>
<p>Shift $f_0(x)$ so that it maps zero to zero (for preventing constant DC) by evaluating $f_0(x)$ at $x=0$ then subtracting that</p>
\[f_1(x) = f_0(x)-f_0(0)\]
</li>
<li>
<p>Normalize $f_1(x)$ by finding the maximum absolute value for $-1 < x < 1$ (try plotting $f_1(x)$) then dividing by that</p>
\[f_2(x) = \frac{f_1(x)}{f_{\text{1,maxabs}}}\]
</li>
</ol>
<p>The above function, $f_2(x)$, is your final function. Evaluate it at as many points within $-1 < x < 1$ as can fit in your waveshaper’s LUT! If the input sine wave swings exactly within $-1 < x < 1$, then the ratios $\alpha_n$ will be realized. Otherwise, different and smaller ratios will occur.</p>
<details>
<p><summary>Using this method, I can perfectly replicate my old post!</summary></p>
<ol>
<li>
<p>In that old post, I chose to give the second harmonic a weight of $0.2$ and no weight to the higher ones, so $\alpha_2 = 0.2$ and $\alpha_n = 0$ for $n > 2$.</p>
</li>
<li>
<p>The sum reduces to a single Chebyshev polynomial term, so the preliminary function is</p>
\[f_0(x) = x + 0.2 (2x^2-1)\]
</li>
<li>
<p>We can calculate that $f_0(0)=-0.2$, so our new function must be</p>
\[f_1(x) = x+0.2 (2x^2-1)+0.2\]
</li>
<li>
<p>Plotting $f_1(x)$ reveals that it achieves a maximum absolute value of 1.4 at $x=1$, so our final function must be</p>
\[f_2(x) = \frac{x+0.2 \cdot (2x^2-1)+0.2}{1.4}\]
</li>
</ol>
<p>That function simplifies to $\frac{2}{7}x^2+\frac{5}{7}x$.</p>
</details>
<figure>
<img src="/images/2022-6-18-figure1.png" alt="New and old plots" />
</figure>
</div>
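<p>To make those four steps concrete, here’s a sketch in Python that generates the table for the worked example in the box (the 257-point table size is an assumption of mine, not a Teensy constant):</p>

```python
import numpy as np

def cheb_vals(n, x):
    """Evaluate T_n at the points x via T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_cur = np.ones_like(x), np.asarray(x, dtype=float)
    for _ in range(n - 1):
        t_prev, t_cur = t_cur, 2 * x * t_cur - t_prev
    return t_prev if n == 0 else t_cur

# Step 1: harmonic ratios -- only the 2nd harmonic at 0.2, as in the example
alphas = {2: 0.2}

def f0(x):
    # Step 2: T_1(x) plus the weighted sum of the higher T_n(x)
    return cheb_vals(1, x) + sum(a * cheb_vals(n, x) for n, a in alphas.items())

grid = np.linspace(-1.0, 1.0, 257)       # 257 is the assumed LUT size
f1 = f0(grid) - f0(np.zeros(1))[0]       # Step 3: shift so that zero maps to zero
lut = f1 / np.abs(f1).max()              # Step 4: normalize into -1..1
```

<p>For these ratios, the resulting table matches the simplified function $\frac{2}{7}x^2+\frac{5}{7}x$ from the worked example.</p>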
<p>We’ve essentially reached parity with my last blog post, but one question remains: what happened to all the phase shifts I had done? In fact, if I had used the $\cos x$ wave and not the $\sin x$ wave as my basis, I could have avoided them altogether. While Chebyshev polynomials do what’s written on their tin when passed $\cos x$ as the input, you can show that they don’t do the same for $\sin x$ waves:</p>
\[\begin{align*} T_n(\sin x) & = T_n \big(\cos(x - \frac{\pi}{2}) \big) \\ & = \cos\big(n(x-\frac{\pi}{2})\big) \\ & = \cos(nx - n\frac{\pi}{2}) \\ & = \sin(nx-n\frac{\pi}{2}+\frac{\pi}{2}) \\ & = \sin\big(nx-(n-1)\frac{\pi}{2}\big)\end{align*}\]
<p>And hence came the phase shifts.</p>
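<p>We can sanity-check that phase-shift identity numerically with NumPy’s built-in Chebyshev evaluator (this is my sketch, using <code class="language-plaintext highlighter-rouge">numpy.polynomial.chebyshev.chebval</code>):</p>

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

n = 3
x = np.linspace(0, 2 * np.pi, 1000)
lhs = cheb.chebval(np.sin(x), [0] * n + [1])  # T_n evaluated at sin(x)
rhs = np.sin(n * x - (n - 1) * np.pi / 2)     # the phase-shifted result
assert np.allclose(lhs, rhs)                  # T_3(sin x) = sin(3x - pi), not sin(3x)
```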
<p>Finally, let’s address the assumption that we made from the start: that our input was a $\cos x$ wave. We’ve now seen that even trying $\sin x$ waves instead breaks the result. That is, only when we feed in one specific sinusoid, $\cos x$, will we get all the harmonics back with no phase shifts. Another way we can break this is to give it a wave of some varying amplitude $a(t) \leq 1$ (e.g. an ADSR envelope) or even an arbitrary input. In that case, I don’t know where the impacts end. At the very least, I can address one of them: constant DC shifts.</p>
<p>For $a(t) = 0$, a waveshaper will see nothing but zero, and it may decide to map that to something nonzero. This is because Chebyshev polynomials weren’t defined with that in mind either. For example, $T_2(0)=-1$. If my headphones saw -1 volts at DC, they’d blow. From my old post, I had only seen that happen when I added even harmonics, and I had seen that adding a constant equal to $\alpha_n$ or $-\alpha_n$ would correct that. Ultimately though, the easiest way to correct this effect is to just evaluate the waveshaper function at $x=0$, then subtract that value. That’s step 3.</p>Kenny Pengknypng44@gmail.comOver a year ago, I wrote “Adding harmonic distortions with Arduino Teensy”. In that post, I happened upon a way to apply any arbitrary profile of harmonics using a Teensy-based waveshaper (just except that waveshapers categorically can’t vary the phase of each harmonic). However, when I wrote that, I totally missed out on the established literature on the topic! Even in 1979, there was “A Tutorial on Non-Linear Distortion or Waveshaping Synthesis”, and I ultimately had taken a very convoluted path only to arrive at the same place!Playing with the VideoCore IV GPU on a Raspberry Pi Zero using VC4CL2021-09-14T00:00:00+00:002021-09-14T00:00:00+00:00http://kenny-peng.com/2021/09/14/raspi_zero_opencl<p>Recently, I learned about <a href="https://github.com/doe300/VC4CL">VC4CL</a>, an implementation of OpenCL on the VideoCore IV, the GPU on every Raspberry Pi (except the Pi 4, which uses the VideoCore VI). The press about it seemed to talk about how it’s been woefully underused in many projects, so I was naturally excited to use it myself. I was lately obsessed with making a fluid simulation toy, and I figured embedded GPGPU might be the answer.</p>
<p>I ended up picking the Raspberry Pi Zero for my project because it was small and cheap yet packing the same GPU, and I’m always attracted to running goliath things on David-like hardware.</p>
<p>That said, getting VC4CL onto the Raspberry Pi Zero was a challenge to begin with–foreshadowing I didn’t notice at the time. I followed this short and neat <a href="https://qengineering.eu/install-opencl-on-raspberry-pi-3.html">guide</a>, but I would wait <em>hours</em> just to see gcc getting killed at the linking stage every time. Some Googling revealed that this was an OOM (out-of-memory) kill, and, according to StackOverflow, the solution was a temporary swap space. The script below makes sure to allocate one, so I can guarantee that it works on a Raspberry Pi Zero.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt update
<span class="nb">sudo </span>apt upgrade <span class="nt">-y</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>cmake git <span class="nt">-y</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>ocl-icd-opencl-dev ocl-icd-dev <span class="nt">-y</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>opencl-headers <span class="nt">-y</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>clinfo <span class="nt">-y</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>libraspberrypi-dev <span class="nt">-y</span>
<span class="nb">sudo </span>apt <span class="nb">install </span>clang clang-format clang-tidy <span class="nt">-y</span>
<span class="nb">mkdir </span>opencl
<span class="nb">cd </span>opencl
git clone https://github.com/doe300/VC4CLStdLib.git
git clone https://github.com/doe300/VC4CL.git
git clone https://github.com/doe300/VC4C.git
<span class="nb">dd </span><span class="k">if</span><span class="o">=</span>/dev/zero <span class="nv">of</span><span class="o">=</span>./tempswap <span class="nv">count</span><span class="o">=</span>1K <span class="nv">bs</span><span class="o">=</span>1M
mkswap ./tempswap
<span class="nb">sudo chown </span>root:root ./tempswap
<span class="nb">sudo chmod </span>600 ./tempswap
<span class="nb">sudo </span>swapon ./tempswap
<span class="nb">cd </span>VC4CLStdLib
<span class="nb">mkdir </span>build
<span class="nb">cd </span>build
cmake ..
make
<span class="nb">sudo </span>make <span class="nb">install
sudo </span>ldconfig
<span class="nb">cd</span> ../../VC4C
<span class="nb">mkdir </span>build
<span class="nb">cd </span>build
cmake ..
make
<span class="nb">sudo </span>make <span class="nb">install
sudo </span>ldconfig
<span class="nb">cd</span> ../../VC4CL
<span class="nb">mkdir </span>build
<span class="nb">cd </span>build
cmake ..
make
<span class="nb">sudo </span>make <span class="nb">install
sudo </span>ldconfig
<span class="nb">cd</span> ../..
<span class="nb">sudo </span>swapoff ./tempswap
<span class="nb">sudo rm</span> ./tempswap
</code></pre></div></div>
<p>After a couple more hours of compiling, I could finally confirm OpenCL functionality with <code class="language-plaintext highlighter-rouge">sudo clinfo</code> (sudo is necessary for all OpenCL applications on the Raspberry Pi because the GPU is wired in with effectively privileged memory access; see the VC4CL repo for more information).</p>
<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">pi@raspberrypi:~ $</span><span class="w"> </span><span class="nb">sudo </span>clinfo
<span class="go">Number of platforms 1
Platform Name OpenCL for the Raspberry Pi VideoCore IV GPU
Platform Vendor doe300
Platform Version OpenCL 1.2 VC4CL 0.4.9999 (2cf1d93)
Platform Profile EMBEDDED_PROFILE
Platform Extensions cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_khr_extended_versioning cl_khr_spirv_no_integer_wrap_decoration cl_khr_suggested_local_work_size cl_vc4cl_performance_counters
Platform Extensions function suffix VC4CL
Platform Name OpenCL for the Raspberry Pi VideoCore IV GPU
Number of devices 1
Device Name VideoCore IV GPU
Device Vendor Broadcom
Device Vendor ID 0x14e4
Device Version OpenCL 1.2 VC4CL 0.4.9999 (2cf1d93)
Driver Version 0.4.9999
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile EMBEDDED_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 1
Max clock frequency 300MHz
Core Temperature (Altera) 31 C
Device Partition (core)
Max number of sub-devices 0
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 12x12x12
Max work group size 12
Preferred work group size multiple 1
Preferred / native vector sizes
char 16 / 16
short 16 / 16
int 16 / 16
long 0 / 0
half 0 / 0 (n/a)
float 16 / 16
double 0 / 0 (n/a)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero Yes
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 32, Little-Endian
Global memory size 67108864 (64MiB)
Error Correction support No
Max memory allocation 67108864 (64MiB)
Unified memory for Host and Device Yes
Minimum alignment for any data type 64 bytes
Alignment of base address 512 bits (64 bytes)
Global Memory cache type Read/Write
Global Memory cache size 32768 (32KiB)
Global Memory cache line size 64 bytes
Image support No
Local memory type Global
Local memory size 67108864 (64MiB)
Max number of constant args 32
Max constant buffer size 67108864 (64MiB)
Max size of kernel argument 256
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
IL version SPIR-V_1.5 SPIR_1.2
SPIR versions 1.2
printf() buffer size 0
Built-in kernels (n/a)
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_nv_pragma_unroll cl_arm_core_id cl_ext_atomic_counters_32 cl_khr_initialize_memory cl_arm_integer_dot_product_int8 cl_arm_integer_dot_product_accumulate_int8 cl_arm_integer_dot_product_accumulate_int16 cl_arm_integer_dot_product_accumulate_saturate_int8 cl_khr_il_program cl_khr_spir cl_khr_create_command_queue cl_altera_device_temperature cl_altera_live_object_tracking cl_khr_icd cl_khr_extended_versioning cl_khr_spirv_no_integer_wrap_decoration cl_khr_suggested_local_work_size cl_vc4cl_performance_counters
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) OpenCL for the Raspberry Pi VideoCore IV GPU
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [VC4CL]
clCreateContext(NULL, ...) [default] Success [VC4CL]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name OpenCL for the Raspberry Pi VideoCore IV GPU
Device Name VideoCore IV GPU
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name OpenCL for the Raspberry Pi VideoCore IV GPU
Device Name VideoCore IV GPU
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name OpenCL for the Raspberry Pi VideoCore IV GPU
Device Name VideoCore IV GPU
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.12
ICD loader Profile OpenCL 2.2
</span></code></pre></div></div>
<p>Just to test the concept, I reused most of the code from <a href="https://github.com/colonelatch/ESP32-fluid-simulation">ESP32-fluid-simulation</a>. The simulation in that project was rather crude, but I just wanted a low baseline to start from quickly. Even that didn’t work out: the numbers it yielded confirmed that OpenCL was working, but it was strangely slow.</p>
<p>Over a weekend, I threw the kitchen sink at it: sacrificing accuracy, optimizing, and even overclocking. Still, to run 10 seconds of simulation at 30 FPS, the Raspberry Pi Zero took 16.982 seconds. Meanwhile, my Raspberry Pi 3, not overclocked and using the same GPU, ran the same code in 6.376 seconds. Since both boards use the same GPU, the more-than-2x gap could only come from the <em>CPU</em>, and the Zero’s was clearly too slow!</p>
<p>The entire simulation was running on the GPU, so why? Jacobi iteration. Each iteration was embarrassingly parallel on its own, but every iteration depended on the result of the previous one. Preserving that dependency required a separate kernel call per iteration, so I ended up launching <em>hundreds</em> of kernels per second, and launching a kernel turned out to be quite expensive on the Zero’s weak CPU. The algorithm, as parallel as it was, just wasn’t parallel enough, and the Zero couldn’t handle OpenCL’s overhead as a result.</p>
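<p>To make that dependency concrete, here’s a NumPy sketch of the general shape of the pressure solve (my own illustration, not the project’s actual kernel): each pass of the loop is one would-be kernel launch, and the ping-pong buffer swap is the serializing step.</p>

```python
import numpy as np

def jacobi_pressure(div, n_iters):
    """Jacobi-iterate toward a solution of the discrete Poisson
    equation lap(p) = div, with p = 0 held on the boundary."""
    p = np.zeros_like(div)
    for _ in range(n_iters):
        # Each pass is embarrassingly parallel on its own: every interior
        # cell is updated independently from its four neighbors.
        p_next = np.zeros_like(p)
        p_next[1:-1, 1:-1] = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] +
                                     p[1:-1, :-2] + p[1:-1, 2:] -
                                     div[1:-1, 1:-1])
        # ...but the next pass reads this pass's output. On the GPU, this
        # "ping-pong" swap marks a kernel boundary: every iteration costs
        # one kernel launch, and that launch is paid for by the host CPU.
        p = p_next
    return p
```

<p>With dozens of iterations per frame and 30 frames per second, those per-launch costs add up on a CPU as weak as the Zero’s.</p>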
<p>So, the sensible move would be to pursue what I originally wanted on the Raspberry Pi 3; its CPU is powerful enough to handle the kernel calls. However, I <em>really</em> want the neat and tiny form factor of the Zero. What I might try instead is a cellular-automaton fluid. A technique used in some 2D games, it’s not as mathematically rigorous as Eulerian fluid simulation, but it should require far fewer kernel calls. Anyway, I think it would be a fun exploration.</p>
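<p>For a sense of what I mean, here’s a toy falling-sand-style water rule in Python (my sketch of the genre’s general idea, not any particular game’s rule): each cell’s update reads only its immediate neighborhood, so a whole frame needs just one sweep. A GPU port would need a parallel-friendly update order (e.g., checkerboard passes), but still only a handful of launches per frame instead of hundreds.</p>

```python
import random

EMPTY, WATER, WALL = 0, 1, 2

def step(grid):
    """One frame of a toy falling-sand water rule: water falls straight
    down if it can, otherwise slides diagonally into an empty cell."""
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for y in range(h - 2, -1, -1):  # bottom-up, so water falls once per step
        for x in range(w):
            if grid[y][x] != WATER:
                continue
            if new[y + 1][x] == EMPTY:
                new[y + 1][x], new[y][x] = WATER, EMPTY
            else:
                # blocked below: try sliding diagonally down-left/down-right
                sides = [d for d in (-1, 1)
                         if 0 <= x + d < w and new[y + 1][x + d] == EMPTY]
                if sides:
                    d = random.choice(sides)
                    new[y + 1][x + d], new[y][x] = WATER, EMPTY
    return new
```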
<p>But I digress. I did succeed in using the Raspberry Pi Zero’s GPU, though it went in a way I didn’t expect at all. I think that VC4CL, and embedded GPGPU as a whole, can offer an unprecedented level of compute that enables projects that were unthinkable before. Eulerian fluid simulation on the Zero turned out to be too CPU-bound to prove it, but I’m determined to make a project that’s really, <em>really</em> parallel and demonstrate it.</p>