Publish 'Deep Dive into Recent Bugfixes' article
Все проверки выполнены успешно
Builds / ExectOS WebSite (push) Successful in 16s

Этот коммит содержится в:
2025-09-14 12:53:40 +02:00
родитель 96ac81ad41
Коммит 99cb326e9b

Просмотреть файл

@@ -0,0 +1,111 @@
+++
title = 'Deep Dive into Recent Bug Fixes'
author = 'Aiken Harris'
date = '2025-09-14T11:51:17+01:00'
+++
August has been an intense and productive month for the ExectOS project. When many thought development had slowed down,
we managed to push forward with progress - a clear sign that ExectOS is still actively evolving. Alongside implementing
Dynamic Multi-Level Paging and adding experimental LA57 support, we also spent a considerable amount of time addressing
a wide range of bugs, improving stability and compatibility across the entire system.
In one of our recent updates, we published a full list of bug fixes included in the latest builds. This time we wanted to
share some of the more interesting technical findings that came out during the development process. While many fixes were
straightforward, others required a deep dive into low-level kernel internals, bootloader behavior, and hardware-specific
quirks. In this article, we have selected several cases that stood out due to their complexity or the insights they
provided. Our goal is to give you a closer look behind the scenes and highlight the complex engineering challenges we
encounter.
### Correcting String Parameter Types: From CONST PWCHAR to PCWSTR
During a recent code review, we identified several inconsistencies in function definitions where parameters or return
types used **CONST PWCHAR** instead of **PCWSTR**. At first glance, the two may seem equivalent, but the distinction is
subtle and significant. The type **PCWSTR** is defined as **const WCHAR***, meaning it represents a pointer to immutable
data - the function promises not to modify the contents of the string it receives. In contrast, **PWCHAR** is defined as
**WCHAR* **, a pointer to mutable data, and when combined with the const qualifier as **CONST PWCHAR**, the compiler
interprets it as **WCHAR * const**. This means the pointer itself is constant, but the data it points to can still be
modified. In other words, the original declarations implied that the function would not change the pointer's address but
made no guarantees about preserving the actual string content. Since our intention was to ensure that the functions never
alter the provided strings, we corrected the signatures to use **PCWSTR**. This change improves type safety, prevents
accidental data modifications, and better communicates the intended contract of these APIs.
### Fixing a Stack Corruption Bug in BlEnumerateBlockDevices
While investigating random crashes in the bootloader, we discovered a subtle but critical bug in the
**BlEnumerateBlockDevices()** function, specifically within a **BlDebugPrint()** call responsible for logging detected
hard disk partitions. The issue stemmed from using the wrong format specifier for printing the partition size. The code
was using **%u** to display a value computed as **HDPath->PartitionSize * Media->BlockSize**, which is stored in a 64-bit
**UINT64** variable. On 32-bit UEFI systems, **%u** expects a **32-bit unsigned int**, causing only the lower 32 bits of
the value to be read while the upper 32 bits remained on the stack. As a result, all subsequent arguments were
misaligned, leading **BlDebugPrint()** to interpret invalid data as a string pointer for the **%S** specifier. In
practice, this caused corrupted output, random memory reads, and eventually a protection fault that froze the system. The
fix was straightforward but essential: we replaced **%u** with **%llu** to correctly print the 64-bit value, preventing
stack corruption and restoring XTLDR stability.
### Eliminating Screen Flicker with Smarter Boot Menu Redrawing
Previously, the boot menu suffered from a performance issue where every keypress - even a simple UP/DOWN arrow -
triggered a complete redraw of the entire menu. This approach was inefficient and caused noticeable flickering. To fix
this, we introduced a smarter redrawing mechanism controlled by two new flags: **RedrawBootMenu** and **RedrawEntries**.
The **RedrawBootMenu** flag indicates when the entire menu, including the header and footer, must be redrawn - for
example, after displaying the help screen or an error message. The **RedrawEntries** flag, on the other hand, signals
when the list of visible entries needs refreshing, such as when the selection moves beyond the currently displayed area.
For normal navigation, the optimized arrow key handling now tracks the previously highlighted item. When the new
selection remains within the visible range, only two drawing operations are performed: the old item is redrawn without
highlighting, and the new one is drawn with highlighting. The rest of the list remains untouched, eliminating unnecessary
updates. This change completely removes flickering, significantly reduces CPU overhead, and results in smooth, menu
navigation.
### Restoring Build Compatibility by Replacing asm with \_\_asm\_\_
After upgrading CMake from 4.0.3 to 4.1.0, the project became uncompilable when building with clang-cl under the C23
standard. The root cause was the stricter enforcement of language compliance introduced in the new CMake release.
Previously, CMake 4.0.3 did not explicitly pass a C standard flag, which allowed clang-cl to operate in a GNU-compatible
mode where the shorthand asm keyword - provided as a GNU extension - was accepted. However, starting with CMake 4.1.0,
the build system began explicitly adding the **-clang:-std=c23** flag, forcing the compiler into strict ISO C23 mode and
disabling GNU-specific extensions. As a result, any occurrence of asm caused a compilation error:
```error: use of undeclared identifier 'asm'```
To resolve this, we replaced all instances of **asm** with the standard-compliant **\_\_asm\_\_** keyword. Unlike **asm**,
which is a GCC-specific convenience, **\_\_asm\_\_** is a reserved identifier explicitly recognized by both GCC and Clang,
even in strict standards mode. This change ensures consistent compatibility across compilers and aligns the codebase with
the C23 standard.
### Fixing a Race Condition in HlComPortReadLsr Caused by a Shared Static Variable
We discovered a critical race condition and state management issue in the HlComPortReadLsr function, which handles
reading from serial (COM) ports. The root cause was the use of a local static variable:
``STATIC UCHAR RingFlag;``
In C, declaring a local variable as static means it is allocated only once and retains its value across all future calls
to the function. However, this also means there is only a single instance of RingFlag, shared by every call to
**HlComPortReadLsr()** - regardless of which COM port (PCPPORT instance) is being accessed. This design flaw introduced
two major problems - State Corruption. The function incorrectly allowed the state of one port to "leak" into another. If
**HlComPortReadLsr()** was called for COM1 and set **RingFlag** to a specific value, a subsequent call for COM2 would
reuse that value, leading to incorrect logic and inconsistent results. The solution was to remove the shared static
variable and make **RingFlag** instance-specific, ensuring each port operates on its own state.
### Fixing Framebuffer Rendering by Using Correct Pitch and Bytes-Per-Pixel Calculations
We fixed a bug in the framebuffer driver that caused severe graphical glitches on many real devices. The issue stemmed
from incorrect assumptions about how pixel data is laid out in video memory. The original implementation in
**xtoskrnl/hl/fbdev.c** made two major mistakes when calculating pixel addresses:
* **Ignoring the Pitch (Stride):** The code assumed that moving from one scanline to the next could be done by adding
**screen_width * bytes_per_pixel**. It used **HlpFrameBufferData.PixelsPerScanLine** - the number of visible pixels
per line - to perform the calculation. However, modern graphics hardware often aligns each scanline to specific memory
boundaries (e.g., 16 bytes), which means the actual distance between the start of consecutive lines, known as the
**Pitch (or stride)**, can be larger than **width * bytes_per_pixel**. By ignoring Pitch, the code miscalculated the
start of each line, causing images to appear skewed, torn, or completely distorted.
* **Hardcoding Pixel Size:** The original driver assumed that each pixel was always 4 bytes (32-bit color), using
formulas like:
``FrameBufferIndex = 4 * PositionY + 4 * PositionX;``
This made the driver incompatible with display modes using 24-bit or 16-bit color depths.
We have addressed these issues by introducing a hardware-correct approach. We have added BytesPerPixel to the framebuffer
data structure. It's value is calculated once during initialization. This eliminates all hardcoded assumptions about pixel
size.
Additionally, all drawing functions were updated to use the proper formula for calculating pixel address. This fix ensures
correct rendering across all supported color depths and hardware configurations, eliminates graphical artifacts, and makes
the framebuffer driver fully compliant with modern graphics standards.