From 03af9ef06b9080ab33ab3d05b7f169060888ed18 Mon Sep 17 00:00:00 2001 From: Derek Schuff Date: Tue, 19 Aug 2025 22:22:44 +0000 Subject: [PATCH 01/12] first draft and debugging update --- site/source/docs/porting/Debugging.rst | 469 ++++++++++++------------- 1 file changed, 234 insertions(+), 235 deletions(-) diff --git a/site/source/docs/porting/Debugging.rst b/site/source/docs/porting/Debugging.rst index 3a7c3b67bfbe2..78a17a01a554f 100644 --- a/site/source/docs/porting/Debugging.rst +++ b/site/source/docs/porting/Debugging.rst @@ -4,103 +4,172 @@ Debugging ========= -One of the main advantages of debugging cross-platform Emscripten code is that the same cross-platform source code can be debugged on either the native platform or using the web browser's increasingly powerful toolset — including debugger, profiler, and other tools. - -Emscripten provides a lot of functionality and tools to aid debugging: - -- :ref:`Compiler debug information flags ` that allow you to preserve debug information in compiled code and even create source maps so that you can step through the native C++ source when debugging in the browser. -- :ref:`Debug mode `, which emits debug logs and stores intermediate build files for analysis. -- :ref:`Compiler settings ` to enable runtime checking of memory accesses and common allocation errors. -- :ref:`debugging-manual-debugging` of Emscripten-generated code is also supported, which is in some ways even better than on native platforms. -- :ref:`debugging-autodebugger`, which automatically instruments LLVM bitcode to write out each store to memory. - -This article describes the main tools and settings provided by Emscripten for debugging, along with a section explaining how to debug a number of :ref:`debugging-emscripten-specific-issues`. - - -.. _debugging-debug-information-g: - -Debugging in the browser -======================== - -:ref:`Emcc ` can output debug information in two formats, either as -DWARF symbols or as source maps. 
Both allow you to view and debug the -*C/C++ source code* in a browser's debugger. DWARF offers the most precise and -detailed debugging experience and is supported as an experiment in Chrome 88 -with an `extension `. See -`here ` for a detailed -usage guide. Source maps are more widely supported in Firefox, Chrome, and -Safari, but unlike DWARF they cannot be used to inspect variables, for example. - -:ref:`Emcc ` strips out most of the debug information from -:ref:`optimized builds ` by default. DWARF can be produced with -the *emcc* :ref:`-g flag `, and source maps can be emitted with the -:ref:`-gsource-map ` option. Be aware that optimisation levels -:ref:`-O1 ` and above increasingly remove LLVM debug information, and -also disable runtime :ref:`ASSERTIONS ` checks. Passing a -``-g`` flag also affects the generated JavaScript code and preserves -white-space, function names, and variable names, - -.. tip:: Even for medium-sized projects, DWARF debug information can be of - substantial size and negatively impact the page performance, particularly - compiling and loading of the module. Debug information can also be emitted in - a file on the side instead with the - :ref:`-gseparate-dwarf ` option! The debug information - size also affects the linking time, because the debug information in all - object files needs to be linked as well. Passing the - :ref:`-gsplit-dwarf ` option can help here, which causes - clang to leave debug information scattered across object files. That debug - information needs to be linked into a DWARF package file (``.dwp``) using the - ``emdwp`` tool then, but that could happen in parallel to the linking of - the compiled output! When running it - after linking, it's as simple as ``emdwp -e foo.wasm -o foo.wasm.dwp``, or - ``emdwp -e foo.debug.wasm -o foo.debug.wasm.dwp`` when used together with - ``-gseparate-dwarf`` (the dwp file should have the same file name as the main - symbol file with an extra ``.dwp`` extension). 
- -The ``-g`` flag can also be specified with an integer levels: -:ref:`-g0 `, :ref:`-g1 `, :ref:`-g2 ` (default with -``-gsource-map``), and :ref:`-g3 ` (default with ``-g``). Each level -builds on the last to provide progressively more debug information in the -compiled output. - -.. note:: Because Binaryen optimization degrades the quality of DWARF info further, ``-O1 -g`` will skip running the Binaryen optimizer (``wasm-opt``) entirely unless required by other options. You can also throw in ``-sERROR_ON_WASM_CHANGES_AFTER_LINK`` option if you want to ensure the debug info is preserved. See `Skipping Binaryen `_ for more details. - -.. note:: Some optimizations may be disabled when used in conjunction with the debug flags both in the Binaryen optimizer (even if it runs) and JavaScript optimizer. For example, if you compile with ``-O3 -g``, the Binaryen optimizer will skip some of the optimization passes that do not produce valid DWARF information, and also some of the normal JavaScript optimization will be disabled in order to better provide the requested debugging information. +One of the main advantages of debugging cross-platform Emscripten code is that the same cross-platform source code can be debugged on either the native platform or using the web browser's increasingly powerful toolset — including a debugger, profiler, and other tools. + +This article describes the main tools and settings provided by Emscripten for debugging, organized by common developer use cases. + +Overview of Debugging Flags +=========================== + +Emscripten offers a variety of flags to control the generation of debug information. Here is a summary of the most common ones: + +.. list-table:: + :header-rows: 1 + :widths: 20 60 20 + :class: wrap-table-content + + * - Flag + - Primary Use Case + - More Info + * - ``-g`` + - Interactive, source-level debugging with full DWARF information. May disable some optimizations. 
+ - :ref:`emcc-g` + * - ``-gsource-map`` + - Symbolicating production crash logs with source maps. Designed to work with optimizations. + - :ref:`emcc-gsource-map` + * - ``-fsanitize=address`` + - Detecting memory errors (buffer overflows, use-after-free, memory leaks). + - `Clang docs `_ + * - ``-fsanitize=undefined`` + - Detecting undefined behavior (e.g., null pointer dereferences, integer overflow). + - `Clang docs `_ + * - ``-sSAFE_HEAP=1`` + - Checking for memory access errors like null pointer dereferences and unaligned access. + - :ref:`SAFE_HEAP` + * - ``-sASSERTIONS=1`` + - Enabling runtime checks for common errors and incorrect program flow. + - :ref:`ASSERTIONS` + * - ``--profiling`` + - Building with information for speed profiling in the browser's devtools. + - :ref:`emcc-profiling` + * - ``--memoryprofiler`` + - Embedding a visual memory allocation tracker in the generated HTML. + - :ref:`emcc-memoryprofiler` + + +Emitting and controlling debug information +========================================== +Debugging-related information comes in several forms: in Wasm object and binary files (DWARF +sections, Wasm name section), side output files (source maps, symbol maps, DWARF sidecar and package files), +and even in the code itself (assertions and instrumentation, whitespace). +In a traditional Unix-style C toolchain, flags such as ``-g`` are passed to the compiler, placing +DWARF sections in the object files. This DWARF info is combined by the linker and appears in the +output, independently of any optimization settings. +In contrast, although :ref:`Emcc ` supports many common +`clang flags ` to generate DWARF into +the object files, final debug output is largely controlled by link-time flags, and is more affected +by optimization. +For example ``emcc`` strips out most of the debug information after linking if a debugging-related +flag is not provided at link time, even if the input object files contain DWARF. 
+ +In addition to DWARF, wasm files may contain a name section (link) which includes names for each +function; these function names are displayed by browsers when they generate stack traces and in +developer tools. (more info). Source maps are also supported (see below). + +This document contains an overview of the flags used to emit and control debugging behavior, and +use-case-based examples. + +DWARF: +Amount of debug information generated: ``-gN``, ``-gline-tables-only`` +Type of debug information in the binary: ``-gdwarf-5`` (others?) +Where DWARF is written: ``-gsplit-dwarf``, ``-gseparate-dwarf`` -.. _debugging-EMCC_DEBUG: +Type of debug information generated: (dwarf flags), ``-gname``, ``--profiling-funcs``, ``--profiling`` +Type of debug information generated alongside: ``-gsource-map``, ``--emit-symbol-map`` -Debug mode (EMCC_DEBUG) -======================= +JS Minification: ``--profiling``, ``--minify=0`` -The ``EMCC_DEBUG`` environment variable can be set to enable Emscripten's debug mode: +Runtime safety and bug detection: ``-fsanitize=address|undefined|leak``, ``-sASSERTIONS`` -.. code-block:: bash +Flags that cause DWARF generation also generate a name section in the binary and suppress +minification of the JS glue file (since most DWARF use cases are for interactive debugging). +Other flags should affect only a specific behavior or type of debug info, and are generally +composable. + + + + +Interactive, Source-Level Debugging +============================================= + +For stepping through C/C++ source code in a browser's debugger, you can use debug information in either DWARF or source map format. + +DWARF offers the most precise and detailed debugging experience and is supported in Chrome with an +`extension `. +See `here `_ for a detailed usage guide. +Source maps are more widely supported in Firefox and Safari, but they provide only location mapping +and cannot be used to inspect variables.
+ + +DWARF can be produced at compile time with the *emcc* :ref:`-g flag `. Be aware that optimixation levels above +:ref:`-O1 ` aincreasingly remove LLVM debug information, and optimization flags at link time also disable +runtime :ref:`ASSERTIONS ` checks. +Passing a ``-g`` flag at link time also affects the generated JavaScript code and preserves white-space, function names, and variable names. + +The ``-g`` flag can also be specified with integer levels: :ref:`-g0 `, :ref:`-g1 `, :ref:`-g2 `, +and :ref:`-g3 ` (default with ``-g``). Each level builds on the last to provide progressively more debug information. +(TODO compile vs link) + +.. tip:: Even for medium-sized projects, DWARF debug information can be of substantial size. Debug information can be emitted in a + separate file with the :ref:`-gseparate-dwarf ` option. To speed up linking, + the :ref:`-gsplit-dwarf ` option can be used. See the next section for more ways to reduce debug info size (TODO update). + +.. note:: Because Binaryen optimization degrades the quality of DWARF info further, higher link-time optimization settings are + not recommended. The ``-O1`` setting will skip running the Binaryen + optimizer (``wasm-opt``) entirely unless required by other options. You can also add the + ``-sERROR_ON_WASM_CHANGES_AFTER_LINK`` option if you want to ensure the debug info is preserved. + See `Skipping Binaryen `_ for more details. +(TODO update) + + +Symbolizing Production Crash Logs +============================================= + +Even when not using an interactive debugger, it's valuable to have source information for compiled +code locations, particularly for stack traces or crash logs. This is also true for fully-optimized +production builds. + +`Source maps `_ are commonly used for languages that compile +to JavaScript (mapping locations in the compiled JS output to locations in the original source +code), but WebAssembly is also supported.
Emscripten can emit source maps with +the :ref:`-gsource-map ` link-time option. Source maps are preserved even with +full post-link optimizations, so they work well for this use case. + +DWARF can also be used for this purpose. Typically a binary containing DWARF would be generated +at build time, and then stripped. The stripped copy would be served to users, and the original +would be saved for symbolication purposes. For this use case, full information about types +and variables from the sources isn't needed; the ``-gline-tables-only`` compile-time flag causes +clang to generate only the line table information, saving DWARF size and compile/linking time. + +Source maps are easier to parse and more widely supported by ecosystem tooling. And as noted +above, preserving DWARF inhibits some Binaryen optimizations. However, DWARF has the advantage +that it includes information about inlining, which can result in more accurate stack traces. - # Linux or macOS - EMCC_DEBUG=1 emcc test/hello_world.cpp -o hello.html +(TODO: -g1 at compile time on native generates DWARF but not for emscripten) - # Windows - set EMCC_DEBUG=1 - emcc test/hello_world.cpp -o hello.html - set EMCC_DEBUG=0 +Emscripten includes a tool called ``emsymbolizer`` that can map wasm code addresses to sources +using several different kinds of debug info, including DWARF (in wasm object or linked files) +and source maps for line/column info, and symbol maps (see :ref:`emcc-emit-symbol-map`), +name sections, and object file symbol tables for function names. -With ``EMCC_DEBUG=1`` set, :ref:`emcc ` emits debug output and generates intermediate files for the compiler's various stages. ``EMCC_DEBUG=2`` additionally generates intermediate files for each JavaScript optimizer pass. -The debug logs and intermediate files are output to -**TEMP_DIR/emscripten_temp**, where ``TEMP_DIR`` is the OS default temporary -directory (e.g. **/tmp** on UNIX).
+Fast Edit+Compile with minimal debug information +================================================ -The debug logs can be analysed to profile and review the changes that were made in each step. +When you want the fastest builds, you generally want to avoid generating large debug information +during compile, because it takes time to link into the final binary. It is still worthwhile to use +the ``--profiling`` (TODO gnames?) +flag because browsers understand the name section even when devtools are not in use, resulting in +more useful stack traces at minimal cost. -.. note:: The more limited amount of debug information can also be enabled by specifying the :ref:`verbose output ` compiler flag (``emcc -v``). -.. _debugging-compilation-settings: -Compiler settings -================== +Detecting Memory Errors and Undefined Behavior +============================================== -Emscripten has a number of compiler settings that can be useful for debugging. These are set using the :ref:`emcc -s` option, and will override any optimization flags. For example: +Emscripten has a number of compiler settings that can be useful for catching errors at runtime. +These are set using the :ref:`emcc -s` option, and will override any optimization flags. For example: .. code-block:: bash @@ -111,21 +180,20 @@ Some important settings are: - .. _debugging-ASSERTIONS: - ``ASSERTIONS=1`` is used to enable runtime checks for common memory allocation errors (e.g. writing more memory than was allocated). It also defines how Emscripten should handle errors in program flow. The value can be set to ``ASSERTIONS=2`` in order to run additional tests. - - ``ASSERTIONS=1`` is enabled by default. Assertions are turned off for optimized code (:ref:`-O1 ` and above). + ``ASSERTIONS=1`` is used to enable runtime checks for many types of common errors. It also + defines how Emscripten should handle errors in program flow. The value can be set to ``ASSERTIONS=2`` in order to run additional tests. 
``ASSERTIONS=1`` is enabled by default at ``-O0``. - .. _debugging-SAFE-HEAP: - ``SAFE_HEAP=1`` adds additional memory access checks, and will give clear errors for problems like dereferencing 0 and memory alignment issues. - - You can also set ``SAFE_HEAP_LOG`` to log ``SAFE_HEAP`` operations. + ``SAFE_HEAP=1`` adds additional memory access checks with a Binaryen pass, and will give clear + errors for problems like dereferencing 0 and memory alignment issues. + You can also set ``SAFE_HEAP_LOG`` to log ``SAFE_HEAP`` operations. - .. _debugging-STACK_OVERFLOW_CHECK: - Passing the ``STACK_OVERFLOW_CHECK=1`` linker flag adds a runtime magic + ``STACK_OVERFLOW_CHECK=1`` adds a runtime magic token value at the end of the stack, which is checked in certain locations to verify that the user code does not accidentally write past the end of the stack. While overrunning the Emscripten stack is not a security issue for @@ -137,214 +205,145 @@ Some important settings are: performance. Default value is 1 if ``ASSERTIONS=1`` is set, and disabled otherwise. -A number of other useful debug settings are defined in `src/settings.js `_. For more information, search that file for the keywords "check" and "debug". -.. _debugging-sanitizers: +TODO: do these actually change optimization flags? -Sanitizers -========== +A number of other useful debug settings are defined in `src/settings.js `_. For more information, search that file for the keywords "check" and "debug". -Emscripten also supports some of Clang's sanitizers, such as :ref:`sanitizer_ubsan` and :ref:`sanitizer_asan`. +In addition to these settings, Emscripten also supports some of Clang's sanitizers, such as the Undefined Behaviour Sanitizer (UBSan) and the Address Sanitizer (ASan). For more information, see :ref:`Sanitizers`. -.. _debugging-emcc-v: +.. 
_debugging-profiling: -emcc verbose output -=================== +Profiling Performance +===================== -Compiling with the :ref:`emcc -v ` will cause Emscripten to output -the sub-command that it runs as well as passes ``-v`` to Clang. +Speed +----- -.. _debugging-manual-debugging: +To profile your code for speed, build with :ref:`profiling info `, +then run the code in the browser's devtools profiler. You should then be able to +see in which functions most of the time is spent. -Manual print debugging -====================== +Memory +------ -You can also manually instrument the source code with ``printf()`` statements, then compile and run the code to investigate issues. Note that ``printf()`` is line-buffered, make sure to add ``\n`` to see output in the console. +The browser's memory profiling tools generally only understand +allocations at the JavaScript level. From that perspective, the entire linear +memory that the emscripten-compiled application uses is a single big allocation +(of a ``WebAssembly.Memory``). +To get information about usage inside that object, you need other tools: + +* Emscripten supports the `mallinfo() `_ +  API, which gives you information from ``dlmalloc`` about current allocations. +* Emscripten also has a ``--memoryprofiler`` option that displays memory usage in a visual manner. +  Note that you need to emit HTML (e.g. with a command like +  ``emcc test/hello_world.c --memoryprofiler -o page.html``) as the memory profiler +  output is rendered onto the page. To view it, load ``page.html`` in your +  browser (remember to use a :ref:`local webserver `). The display +  auto-updates, so you can open the devtools console and run a command like +  ``_malloc(1024 * 1024)``. That will allocate 1MB of memory, which will then show +  up on the memory profiler display. -If you have a good idea of the problem line you can add ``print(new Error().stack)`` to the JavaScript to get a stack trace at that point. +..
_other-debugging-tools: -Debug printouts can even execute arbitrary JavaScript. For example:: +Other Debugging Tools and Techniques +==================================== - function _addAndPrint($left, $right) { - $left = $left | 0; - $right = $right | 0; - //--- - if ($left < $right) console.log('l` will cause emcc to output +the sub-commands that it runs and to pass ``-v`` to Clang. +The ``EMCC_DEBUG`` environment variable can be set to emit even more debug +output and generate intermediate files for the compiler's various stages. -Debugging with Chrome Devtools -============================== +.. _debugging-manual-debugging: -Chrome devtools support source-level debugging on WebAssembly files with DWARF information. To use that, you need the Wasm debugging extension plugin here: -https://goo.gle/wasm-debugging-extension +Manual print debugging +---------------------- -See `Debugging WebAssembly with modern tools -`_ for the details. +You can also manually instrument the source code with ``printf()`` statements, +then compile and run the code to investigate issues. The output from the ``stdout`` and ``stderr`` +streams is copied to the browser console by default. Note that ``printf()`` is +line-buffered, so make sure to add ``\n`` to see output in the console. The functions +in the :ref:`console.h ` header can also be used to access the console +more directly. +.. _debugging-autodebugger: -Handling C++ Exceptions from JavaScript -======================================= +AutoDebugger +------------ -See :ref:`handling-c-exceptions-from-javascript`. +The *AutoDebugger* is the 'nuclear option' for debugging Emscripten code. It will rewrite the +output so it prints out each store to memory. This is useful for comparing the output for +different compiler settings in order to detect regressions. To run the *AutoDebugger*, compile +with the environment variable ``EMCC_AUTODEBUG=1`` set. +.. warning:: This option is primarily intended for Emscripten core developers. ..
_debugging-emscripten-specific-issues: -Emscripten-specific issues +Emscripten-Specific Issues ========================== Memory Alignment Issues ----------------------- -The :ref:`Emscripten memory representation ` is compatible with C and C++. However, when undefined behavior is involved you may see differences with native architectures, and also differences between Emscripten's output for asm.js and WebAssembly: +The :ref:`Emscripten memory representation ` is compatible with C and C++. +However, when undefined behavior is involved you may see differences with native architectures: -- In asm.js, loads and stores must be aligned, and performing a normal load or store on an unaligned address can fail silently (access the wrong address). If the compiler knows a load or store is unaligned, it can emulate it in a way that works but is slow. -- In WebAssembly, unaligned loads and stores will work. Each one is annotated with its expected alignment. If the actual alignment does not match, it will still work, but may be slow on some CPU architectures. +- In asm.js, unaligned loads and stores can fail silently (i.e. access the wrong address). +- In WebAssembly, unaligned loads and stores will work; each may be annotated with its expected + alignment. If the actual alignment does not match, it may be very slow on some systems. .. tip:: :ref:`SAFE_HEAP ` can be used to reveal memory alignment issues. -Generally it is best to avoid unaligned reads and writes — often they occur as the result of undefined behavior, as mentioned above. In some cases, however, they are unavoidable — for example if the code to be ported reads an ``int`` from a packed structure in some pre-existing data format. In that case, to make things work properly in asm.js, and be fast in WebAssembly, you must be sure that the compiler knows the load or store is unaligned. 
To do so you can: +Generally it is best to avoid unaligned reads and writes; often they occur as the result of +undefined behavior, as mentioned above. In some cases, however, they are unavoidable, for example +if the code to be ported reads an ``int`` from a packed structure in some pre-existing data format. +In that case, to make things work properly in asm.js, and be fast in WebAssembly, you must be sure +that the compiler knows the load or store is unaligned. To do so you can: - Manually read individual bytes and reconstruct the full value -- Use the :c:type:`emscripten_align* ` typedefs, which define unaligned versions of the basic types (``short``, ``int``, ``float``, ``double``). All operations on those types are not fully aligned (use the ``1`` variants in most cases, which mean no alignment whatsoever). - +- Use the :c:type:`emscripten_align* ` typedefs, which define unaligned + versions of the basic types (``short``, ``int``, ``float``, ``double``). Operations on those + types do not assume full alignment (use the ``1`` variants in most cases, which assume no alignment + whatsoever). Function Pointer Issues ----------------------- If you get an ``abort()`` from a function pointer call to ``nullFunc`` or ``b0`` or ``b1`` (possibly with an error message saying "incorrect function pointer"), the problem is that the function pointer was not found in the expected function pointer table when called. - -.. note:: ``nullFunc`` is the function used to populate empty index entries in the function pointer tables (``b0`` and ``b1`` are shorter names used for ``nullFunc`` in more optimized builds).
A function pointer to an invalid index will call this function, which simply calls ``abort()`` There are several possible causes: - Your code is calling a function pointer that has been cast from another type (this is undefined behavior but it does happen in real-world code). In optimized Emscripten output, each function pointer type is stored in a separate table based on its original signature, so you *must* call a function pointer with that same signature to get the right behavior (see :ref:`portability-function-pointer-issues` in the code portability section for more information). - Your code is calling a method on a ``NULL`` pointer or dereferencing 0. This sort of bug can be caused by any sort of coding error, but manifests as a function pointer error because the function can't be found in the expected table at runtime. -In order to debug these sorts of issues: -- Compile with ``-Werror``. This turns warnings into errors, which can be useful as some cases of undefined behavior would otherwise show warnings. +To debug these sorts of issues: + +- Compile with ``-Werror`` (or otherwise fix warnings, many of which highlight undefined behavior). - Use ``-sASSERTIONS=2`` to get some useful information about the function pointer being called, and its type. - Look at the browser stack trace to see where the error occurs and which function should have been called. - Enable clang warnings on dangerous function pointer casts using ``-Wcast-function-type``. - Build with :ref:`SAFE_HEAP=1 `. - :ref:`Sanitizers` can help here, in particular UBSan. -Another function pointer issue is when the wrong function is called. :ref:`SAFE_HEAP=1 ` can help with this as it detects some possible errors with function table accesses. - Infinite loops -------------- Infinite loops cause your page to hang. After a period the browser will notify the user that the page is stuck and offer to halt or close it. 
- If your code hits an infinite loop, one easy way to find the problem code is to use a *JavaScript profiler*. In the Firefox profiler, if the code enters an infinite loop you will see a block of code doing the same thing repeatedly near the end of the profile. - .. note:: The :ref:`emscripten-runtime-environment-main-loop` may need to be re-coded if your application uses an infinite main loop. -.. _debugging-profiling: - -Profiling -========= - -Speed ------ - -To profile your code for speed, build with :ref:`profiling info `, -then run the code in the browser's devtools profiler. You should then be able to -see in which functions is most of the time spent. - -.. _debugging-profiling-memory: - -Memory ------- - -The browser's memory profiling tools generally only understand -allocations at the JavaScript level. From that perspective, the entire linear -memory that the emscripten-compiled application uses is a single big allocation -(of a ``WebAssembly.Memory``). The devtools will not show information about -usage inside that object, so you need other tools for that, which we will now -describe. - -Emscripten supports -`mallinfo() `_, which lets -you get information from ``dlmalloc`` about current allocations. For example -usage, see -`the test `_. - -Emscripten also has a ``--memoryprofiler`` option that displays memory usage -in a visual manner, letting you see how fragmented it is and so forth. To use -it, you can do something like - -.. code-block:: bash - - emcc test/hello_world.c --memoryprofiler -o page.html - -Note that you need to emit HTML as in that example, as the memory profiler -output is rendered onto the page. To view it, load ``page.html`` in your -browser (remember to use a :ref:`local webserver `). The display -auto-updates, so you can open the devtools console and run a command like -``_malloc(1024 * 1024)``. That will allocate 1MB of memory, which will then show -up on the memory profiler display. - -.. 
_debugging-autodebugger: - -AutoDebugger -============ - -The *AutoDebugger* is the 'nuclear option' for debugging Emscripten code. - -.. warning:: This option is primarily intended for Emscripten core developers. - -The *AutoDebugger* will rewrite the output so it prints out each store to memory. This is useful because you can compare the output for different compiler settings in order to detect regressions. - -The *AutoDebugger* can potentially find **any** problem in the generated code, so it is strictly more powerful than the ``CHECK_*`` settings and ``SAFE_HEAP``. One use of the *AutoDebugger* is to quickly emit lots of logging output, which can then be reviewed for odd behavior. The *AutoDebugger* is also particularly useful for :ref:`debugging regressions `. - -The *AutoDebugger* has some limitations: - -- It generates a lot of output. Using *diff* can be very helpful for identifying changes. -- It prints out simple numerical values rather than pointer addresses (because pointer addresses change between runs, and hence can't be compared). This is a limitation because sometimes inspection of addresses can show errors where the pointer address is 0 or impossibly large. It is possible to modify the tool to print out addresses as integers in ``tools/autodebugger.py``. - -To run the *AutoDebugger*, compile with the environment variable ``EMCC_AUTODEBUG=1`` set. For example: - -.. code-block:: bash - - # Linux or macOS - EMCC_AUTODEBUG=1 emcc test/hello_world.cpp -o hello.html - - # Windows - set EMCC_AUTODEBUG=1 - emcc test/hello_world.cpp -o hello.html - set EMCC_AUTODEBUG=0 - - -.. _debugging-autodebugger-regressions: - -AutoDebugger Regression Workflow ---------------------------------- - -Use the following workflow to find regressions with the *AutoDebugger*: - -- Compile the working code with ``EMCC_AUTODEBUG=1`` set in the environment. 
-- Compile the code using ``EMCC_AUTODEBUG=1`` in the environment again, but this time with the settings that cause the regression. Following this step we have one build before the regression and one after. -- Run both versions of the compiled code and save their output. -- Compare the output using a *diff* tool. - -Any difference between the outputs is likely to be caused by the bug. - -.. note:: - You may want to use ``-sDETERMINISTIC`` which will ensure that timing - and other issues don't cause false positives. - - Useful Links ============ -- `Blogpost about reading compiler output `_. -- `GDC 2014: Getting started with asm.js and Emscripten `_ (Debugging slides). - `Links to Wasm debugging-related documents `_ From 3c7cad00cca8a6839373d1b4a822dd6f25e1d6c1 Mon Sep 17 00:00:00 2001 From: Derek Schuff Date: Wed, 20 Aug 2025 00:39:00 +0000 Subject: [PATCH 02/12] quality pass --- site/source/docs/porting/Debugging.rst | 133 ++++++++++++---------- site/source/docs/tools_reference/emcc.rst | 7 +- test/test_other.py | 1 + 3 files changed, 80 insertions(+), 61 deletions(-) diff --git a/site/source/docs/porting/Debugging.rst b/site/source/docs/porting/Debugging.rst index 78a17a01a554f..8aec9b9dff2ef 100644 --- a/site/source/docs/porting/Debugging.rst +++ b/site/source/docs/porting/Debugging.rst @@ -29,10 +29,10 @@ Emscripten offers a variety of flags to control the generation of debug informat - :ref:`emcc-gsource-map` * - ``-fsanitize=address`` - Detecting memory errors (buffer overflows, use-after-free, memory leaks). - - `Clang docs `_ + - `Clang ASan docs `_ * - ``-fsanitize=undefined`` - Detecting undefined behavior (e.g., null pointer dereferences, integer overflow). - - `Clang docs `_ + - `Clang UBSan docs `_ * - ``-sSAFE_HEAP=1`` - Checking for memory access errors like null pointer dereferences and unaligned access. 
- :ref:`SAFE_HEAP` @@ -44,7 +44,8 @@ Emscripten offers a variety of flags to control the generation of debug informat - :ref:`emcc-profiling` * - ``--memoryprofiler`` - Embedding a visual memory allocation tracker in the generated HTML. - - :ref:`emcc-memoryprofiler` + - :ref:`emcc-profiling` + Emitting and controlling debug information @@ -52,25 +53,16 @@ Emitting and controlling debug information Debugging-related information comes in several forms: in Wasm object and binary files (DWARF sections, Wasm name section), side output files (source maps, symbol maps, DWARF sidecar and package files), and even in the code itself (assertions and instrumentation, whitespace). -In a traditional Unix-style C toolchain, flags such as ``-g`` are passed to the compiler, placing -DWARF sections in the object files. This DWARF info is combined by the linker and appears in the -output, independently of any optimization settings. -In contrast, although :ref:`Emcc ` supports many common -`clang flags ` to generate DWARF into -the object files, final debug output is largely controlled by link-time flags, and is more affected -by optimization. -For example ``emcc`` strips out most of the debug information after linking if a debugging-related -flag is not provided at link time, even if the input object files contain DWARF. - -In addition to DWARF, wasm files may contain a name section (link) which includes names for each +For information on DWARF, see :ref:`below `. +In addition to DWARF, wasm files may contain a name section (TODO link) which includes names for each function; these function names are displayed by browsers when they generate stack traces and in -developer tools. (more info). Source maps are also supported (see below). +developer tools. (TODO more info?). Source maps are also supported (see :ref:`below `). This document contains an overview of the flags used to emit and control debugging behavior, and use-case-based examples. 
DWARF: -Amount of debug information generated: ``-gN``, ``-gline-tables-only`` +Amount of debug information generated: ``-g``, ``-g``, ``-gline-tables-only`` Type of debug information in the binary: ``-gdwarf-5`` (others?) Where DWARF is written: ``-gsplit-dwarf``, ``-gseparate-dwarf`` @@ -88,40 +80,59 @@ composable. - Interactive, Source-Level Debugging ============================================= For stepping through C/C++ source code in a browser's debugger, you can use debug information in either DWARF or source map format. -DWARF offers the most precise and detailed debugging experience and is supported in Chrome with an -`extension `. +DWARF offers the best debugging experience and is supported in Chrome with an +`extension `_. See `here `_ for a detailed usage guide. -Source maps are more widely supported in Firefox and Safari, but they provide only location mapping -and cannot be used to inspect variables. +Source maps are more widely supported, but they provide only location mapping +and cannot be used easily to inspect variables. + - -DWARF can be produced at compile time with the *emcc* :ref:`-g flag `. Be aware that optimixation levels above -:ref:`-O1 ` aincreasingly remove LLVM debug information, and optimization flags at link time also disable -runtime :ref:`ASSERTIONS ` checks. -Passing a ``-g`` flag at link time also affects the generated JavaScript code and preserves white-space, function names, and variable names. +.. _debugging-dwarf: + +DWARF +----- + +In a traditional Unix-style C toolchain, flags such as ``-g`` are passed to the compiler, placing +DWARF sections in the object files. This DWARF info is combined by the linker and appears in the +output, independently of any optimization settings. +In contrast, although :ref:`Emcc ` supports many of the common +`clang flags `_ to generate DWARF into +the object files, final debug output is also controlled by link-time flags, and is more affected +by optimization. 
+For example ``emcc`` strips out most of the debug information after linking if a debugging-related +flag is not provided at link time, even if the input object files contain DWARF. + +DWARF can be produced at compile time with the *emcc* :ref:`-g flag `. Optimization levels above +:ref:`-O1 ` or :ref:`-Og ` increasingly remove LLVM debug information (as with other architectures), +and optimization flags at link time also disable Emscripten's runtime :ref:`ASSERTIONS ` checks. +Passing a ``-g`` flag at link time also affects the generated JavaScript code (preserving white-space, function names, and variable names). The ``-g`` flag can also be specified with integer levels: :ref:`-g0 `, :ref:`-g1 `, :ref:`-g2 `, -and :ref:`-g3 ` (default with ``-g``). Each level builds on the last to provide progressively more debug information. -(TODO compile vs link) +and :ref:`-g3 ` (default with ``-g``). At compile time these flags control the amount of DWARF in the object files. +At link time, each adds successively more kinds of information in the wasm and JS files (DWARF is only retained after linking +when using ``-g`` or ``-g3``). -.. tip:: Even for medium-sized projects, DWARF debug information can be of substantial size. Debug information can be emitted in a +.. tip:: Even for medium-sized projects, DWARF debug information can be large. Debug information can be emitted in a separate file with the :ref:`-gseparate-dwarf ` option. To speed up linking, - the :ref:`-gsplit-dwarf ` option can be used. See the next section for more ways to reduce debug info size (TODO update). + the :ref:`-gsplit-dwarf ` option can be used at compile time. + See `this article `_ + for more details on debugging large files, and see + :ref:`the next section ` for more ways to reduce debug info size. .. note:: Because Binaryen optimization degrades the quality of DWARF info further, higher link-time optimization settings are not recommended.
The ``-O1`` setting will skip running the Binaryen optimizer (``wasm-opt``) entirely unless required by other options. You can also add the ``-sERROR_ON_WASM_CHANGES_AFTER_LINK`` option if you want to ensure the debug info is preserved. See `Skipping Binaryen `_ for more details. -(TODO update) +.. _debugging-symbolization: + Symbolizing Production Crash Logs ============================================= @@ -129,17 +140,18 @@ Even when not using an interactive debugger, it's valuable to have source inform code locations, particularly for stack traces or crash logs. This is also true for fully-optimized production builds. -`Source maps ` are commonly used for langauges that compile +`Source maps `_ are commonly used for languages that compile to JavaScript (mapping locations in the compiled JS output to locations in the original source -code), but WebAssembly is also supported. Emscripten can emit ource maps can be emitted with -the :ref:`-gsource-map ` link-time option. Source maps are preserved even with +code), but WebAssembly is also supported. Emscripten can emit source maps with +the :ref:`-gsource-map ` link-time flag. Source maps are preserved even with full post-link optimizations, so they work well for this use case. DWARF can also be used for this purpose. Typically a binary containing DWARF would be generated at build time, and then stripped. The stripped copy would be served to users, and the original would be saved for symbolication purposes. For this use case, full information about types -and variables from the sources isn't needed; the `-gline-tables-only` compile-time flag causes -clang to generate only the line table information, saving DWARF size and compile/linking time. +and variables from the sources isn't needed; the +`-gline-tables-only `_ +compile-time flag causes clang to generate only the line table information, saving DWARF size and compile/linking time.
Source maps are easier to parse and more widely supported by ecosystem tooling. And as noted above, preserving DWARF inhibits some Binaryen optimizations. However DWARF has the advantage @@ -147,7 +159,7 @@ that it includes information about inlining, which can result in more accurate s (TODO: -g1 at compile time on native generates DWARF but not for emscripten) -Emscripten includes a tool called `emsymbolizer` that can map wasm code addresses to sources +Emscripten includes a tool called ``emsymbolizer`` that can map wasm code addresses to sources using several different kinds of debug info, including DWARF (in wasm object or linked files) and source maps for line/column info, and symbol maps (see :ref:`emcc-emit-symbol-map`), name sections and object file symbol tables for function names. @@ -159,21 +171,25 @@ Fast Edit+Compile with minimal debug information When you want the fastest builds, you generally want to avoid generating large debug information during compile, because it takes time to link into the final binary. It is still worthwhile to use the ``--profiling`` (TODO gnames?) -flag because browsers understand the name section even when devtools are not in use, resulting in -more useful stack traces at minimal cost. - +flag (at link time only) because browsers understand the name section even when devtools are not +in use, resulting in more useful stack traces at minimal cost. Detecting Memory Errors and Undefined Behavior ============================================== -Emscripten has a number of compiler settings that can be useful for catching errors at runtime. -These are set using the :ref:`emcc -s` option, and will override any optimization flags. For example: +The best tools for detecting memory safety and undefined behavior issues are Clang's sanitizers, +such as the Undefined Behaviour Sanitizer (UBSan) and the Address Sanitizer (ASan). +For more information, see :ref:`Sanitizers`.
+ + +Emscripten has several other compiler settings that can be useful for catching errors at runtime. +These are set using the :ref:`emcc -s` option, and will override any optimization flags (TODO is this true?). For example: .. code-block:: bash - emcc -O1 -sASSERTIONS test/hello_world + emcc -O1 -sASSERTIONS test/hello_world.c Some important settings are: @@ -181,14 +197,16 @@ Some important settings are: .. _debugging-ASSERTIONS: ``ASSERTIONS=1`` is used to enable runtime checks for many types of common errors. It also - defines how Emscripten should handle errors in program flow. The value can be set to ``ASSERTIONS=2`` in order to run additional tests. ``ASSERTIONS=1`` is enabled by default at ``-O0``. + defines how Emscripten should handle errors in program flow. The value can be set to + ``ASSERTIONS=2`` in order to run additional tests. ``ASSERTIONS=1`` is enabled by default at + ``-O0``. - .. _debugging-SAFE-HEAP: ``SAFE_HEAP=1`` adds additional memory access checks with a Binaryen pass, and will give clear - errors for problems like dereferencing 0 and memory alignment issues. - You can also set ``SAFE_HEAP_LOG`` to log ``SAFE_HEAP`` operations. + errors for problems like dereferencing 0 and memory alignment issues. + You can also set ``SAFE_HEAP_LOG`` to log ``SAFE_HEAP`` operations. (TODO: any advantages over ASan?) - .. _debugging-STACK_OVERFLOW_CHECK: @@ -206,11 +224,9 @@ Some important settings are: otherwise. -TODO: do these actually change optimization flags? A number of other useful debug settings are defined in `src/settings.js `_. For more information, search that file for the keywords "check" and "debug". -In addition to these settings, Emscripten also supports some of Clang's sanitizers, such as the Undefined Behaviour Sanitizer (UBSan) and the Address Sanitizer (ASan). For more information, see :ref:`Sanitizers`. .. 
_debugging-profiling: @@ -234,15 +250,15 @@ memory that the emscripten-compiled application uses is a single big allocation To get information about usage inside that object, you need other tools: * Emscripten supports the `mallinfo() `_, -API, which gives you information from ``dlmalloc`` about current allocations. + API, which gives you information from ``dlmalloc`` about current allocations. * Emscripten also has a ``--memoryprofiler`` option that displays memory usage in a visual manner. -Note that you need to emit HTML (e.g. with a command like -``emcc test/hello_world.c --memoryprofiler -o page.html``) as the memory profiler -output is rendered onto the page. To view it, load ``page.html`` in your -browser (remember to use a :ref:`local webserver `). The display -auto-updates, so you can open the devtools console and run a command like -``_malloc(1024 * 1024)``. That will allocate 1MB of memory, which will then show -up on the memory profiler display. + Note that you need to emit HTML (e.g. with a command like + ``emcc test/hello_world.c --memoryprofiler -o page.html``) as the memory profiler + output is rendered onto the page. To view it, load ``page.html`` in your + browser (remember to use a :ref:`local webserver `). The display + auto-updates, so you can open the devtools console and run a command like + ``_malloc(1024 * 1024)``. That will allocate 1MB of memory, which will then show + up on the memory profiler display. .. _other-debugging-tools: @@ -252,7 +268,7 @@ Other Debugging Tools and Techniques .. _debugging-EMCC_DEBUG: Debugging the compiler driver ------------------------ +----------------------------- Compiling with the :ref:`emcc -v ` will cause emcc to output the sub-commands that it runs as well as passes ``-v`` to Clang. 
@@ -316,7 +332,8 @@ Function Pointer Issues ----------------------- If you get an ``abort()`` from a function pointer call to ``nullFunc`` or ``b0`` or ``b1`` (possibly with an error message saying "incorrect function pointer"), the problem is that the function pointer was not found in the expected function pointer table when called. --.. note:: ``nullFunc`` is the function used to populate empty index entries in the function pointer tables (``b0`` and ``b1`` are shorter names used for ``nullFunc`` in more optimized builds). A function pointer to an invalid index will call this function, which simply calls ``abort()`` + +.. note:: ``nullFunc`` is the function used to populate empty index entries in the function pointer tables (``b0`` and ``b1`` are shorter names used for ``nullFunc`` in more optimized builds). A function pointer to an invalid index will call this function, which simply calls ``abort()``. There are several possible causes: diff --git a/site/source/docs/tools_reference/emcc.rst b/site/source/docs/tools_reference/emcc.rst index 266b3d5ed6ba6..b423b90774bfd 100644 --- a/site/source/docs/tools_reference/emcc.rst +++ b/site/source/docs/tools_reference/emcc.rst @@ -54,7 +54,7 @@ Options that are modified or new in *emcc* are listed below: ``-O1`` [compile+link] - Simple optimizations. During the compile step these include LLVM ``-O1`` optimizations. During the link step this does not include various runtime assertions in JS that `-O0` would do. + Simple optimizations. During the compile step these include LLVM ``-O1`` optimizations. During the link step this omits various runtime assertions in JS that `-O0` would include. .. _emcc-O2: @@ -68,7 +68,7 @@ Options that are modified or new in *emcc* are listed below: ``-O3`` [compile+link] - Like ``-O2``, but with additional optimizations that may take longer to run. + Like ``-O2``, but with additional optimizations that may take longer to run and may increase code size. .. 
note:: This is a good setting for a release build. @@ -244,7 +244,8 @@ Options that are modified or new in *emcc* are listed below: Save a map file between function indexes in the Wasm and function names. By storing the names on a file on the side, you can avoid shipping the names, and can still reconstruct meaningful stack traces by translating the indexes back - to the names. + to the names. This is a simpler format than source maps, but less detailed + because it only describes function names and not source locations. .. note:: When used with ``-sWASM=2``, two symbol files are created. ``[name].js.symbols`` (with WASM symbols) and ``[name].wasm.js.symbols`` (with ASM.js symbols) diff --git a/test/test_other.py b/test/test_other.py index 31b9eeed1e332..c937a8ab8d5c1 100644 --- a/test/test_other.py +++ b/test/test_other.py @@ -3161,6 +3161,7 @@ def test_dwarf_sourcemap_names(self): (['-g2', '-gsource-map'], False, True, True), (['-gsplit-dwarf', '-gsource-map'], True, True, True), (['-gsource-map', '-sERROR_ON_WASM_CHANGES_AFTER_LINK'], False, True, True), + (['-gsource-map', '-Og', '-sERROR_ON_WASM_CHANGES_AFTER_LINK'], False, True, True), (['-Oz', '-gsource-map'], False, True, True), ]: print(flags, expect_dwarf, expect_sourcemap, expect_names) From 4f27d9f36f2de6ef9c5d90c56bdd92404fc7e459 Mon Sep 17 00:00:00 2001 From: Derek Schuff Date: Fri, 22 Aug 2025 00:03:47 +0000 Subject: [PATCH 03/12] add examples, remove overview, add tests --- site/source/docs/porting/Debugging.rst | 99 +++++++++-------------- site/source/docs/tools_reference/emcc.rst | 9 ++- test/test_other.py | 3 + 3 files changed, 49 insertions(+), 62 deletions(-) diff --git a/site/source/docs/porting/Debugging.rst b/site/source/docs/porting/Debugging.rst index 8aec9b9dff2ef..0007940e7f80f 100644 --- a/site/source/docs/porting/Debugging.rst +++ b/site/source/docs/porting/Debugging.rst @@ -8,47 +8,8 @@ One of the main advantages of debugging cross-platform Emscripten code is that t This article 
describes the main tools and settings provided by Emscripten for debugging, organized by common developer use cases. -Overview of Debugging Flags -=========================== - -Emscripten offers a variety of flags to control the generation of debug information. Here is a summary of the most common ones: - -.. list-table:: - :header-rows: 1 - :widths: 20 60 20 - :class: wrap-table-content - - * - Flag - - Primary Use Case - - More Info - * - ``-g`` - - Interactive, source-level debugging with full DWARF information. May disable some optimizations. - - :ref:`emcc-g` - * - ``-gsource-map`` - - Symbolicating production crash logs with source maps. Designed to work with optimizations. - - :ref:`emcc-gsource-map` - * - ``-fsanitize=address`` - - Detecting memory errors (buffer overflows, use-after-free, memory leaks). - - `Clang ASan docs `_ - * - ``-fsanitize=undefined`` - - Detecting undefined behavior (e.g., null pointer dereferences, integer overflow). - - `Clang UBSan docs `_ - * - ``-sSAFE_HEAP=1`` - - Checking for memory access errors like null pointer dereferences and unaligned access. - - :ref:`SAFE_HEAP` - * - ``-sASSERTIONS=1`` - - Enabling runtime checks for common errors and incorrect program flow. - - :ref:`ASSERTIONS` - * - ``--profiling`` - - Building with information for speed profiling in the browser's devtools. - - :ref:`emcc-profiling` - * - ``--memoryprofiler`` - - Embedding a visual memory allocation tracker in the generated HTML. - - :ref:`emcc-profiling` - - - -Emitting and controlling debug information + +Overview: Emitting and Controlling Debug Information ========================================== Debugging-related information comes in several forms: in Wasm object and binary files (DWARF sections, Wasm name section), side output files (source maps, symbol maps, DWARF sidecar and package files), @@ -61,22 +22,11 @@ developer tools. (TODO more info?). 
Source maps are also supported (see :ref:`be This document contains an overview of the flags used to emit and control debugging behavior, and use-case-based examples. -DWARF: -Amount of debug information generated: ``-g``, ``-g``, ``-gline-tables-only`` -Type of debug information in the binary: ``-gdwarf-5`` (others?) -Where DWARF is written: ``-gsplit-dwarf``, ``-gseparate-dwarf`` - -Type of debug information generated: (dwarf flags), ``-gname``, ``--profiling-funcs``, ``--profiling`` -Type of debug information generated alongside: ``-gsource-maps``, ``--emit-symbol-map`` - -JS Minification: ``--profiling``, ``--minify=0`` - -Runtime safety and bug detection: ``-fsanitize=address|undefined|leak``, ``-sASSERTIONS`` - -Flags that cause DWARF generation also generate a name section in the binary and suppress -minification of the JS glue file (since most DWARF use cases are for interactive debugging). -Other flags should affect only a specific behavior or type of debug info, and are generally -composable. +Flags that cause DWARF generation (e.g. `-g3`, `-gline-tables-only`) also generate a name section +in the binary and suppress minification of the JS glue file (since most DWARF use cases are for +interactive debugging or where the binary will be stripped). +Other flags (e.g. `-g2`, `-gsource-map`) should affect only a specific behavior or type of debug info, +and are generally composable. @@ -113,10 +63,17 @@ and optimization flags at link time also disable Emscripten's runtime :ref:`ASSE Passing a ``-g`` flag at link time also affects the generated JavaScript code (preserving white-space, function names, and variable names). The ``-g`` flag can also be specified with integer levels: :ref:`-g0 `, :ref:`-g1 `, :ref:`-g2 `, -and :ref:`-g3 ` (default with ``-g``). At compile time these flags control the amount of DWARF in the object files. +and :ref:`-g3 ` (equivalent to ``-g``). At compile time these flags control the amount of DWARF in the object files. 
At link time, each adds successively more kinds of information in the wasm and JS files (DWARF is only retained after linking when using ``-g`` or ``-g3``). +Example: + +.. code-block:: bash + + emcc source.c -c -o source.o -g # source.o has DWARF sections + emcc source.o -o program.js -g # program.wasm has DWARF and a name section + .. tip:: Even for medium-sized projects, DWARF debug information can be large. Debug information can be emitted in a separate file with the :ref:`-gseparate-dwarf ` option. To speed up linking, the :ref:`-gsplit-dwarf ` option can be used at compile time. @@ -145,6 +102,8 @@ to JavaScript (mapping locations in the compiled JS output to locations in the o code), but WebAssembly is also supported. Emscripten can emit source maps with the :ref:`-gsource-map ` link-time flag. Source maps are preserved even with full post-link optimizations, so they work well for this use case. +Source maps are generated by Emscripten from DWARF information. Therefore the object files being linked +must contain DWARF. The final linked output will not contain DWARF unless ``-g`` is also passed. DWARF can also be used for this purpose. Typically a binary containing DWARF would be generated at build time, and then stripped. The stripped copy would be served to users, and the original @@ -164,16 +123,32 @@ using several different kinds of debug info, including DWARF (in wasm object or and source maps for line/column info, and symbol maps (see :ref:`emcc-emit-symbol-map`), name sections and object file symbol tables for function names. +Examples: + +..
code-block:: bash + + emcc source.c -c -o source.o -g # source.o has DWARF sections (-gsource-map also works here) + emcc source.o -o program.js -gsource-map # program.wasm.map contains a source map + + emcc source.o -o program2.js -g # program2.wasm has DWARF + llvm-strip program2.wasm -o program2_stripped.wasm # program2_stripped.wasm has no debug info + Fast Edit+Compile with minimal debug information ================================================ When you want the fastest builds, you generally want to avoid generating large debug information during compile, because it takes time to link into the final binary. It is still worthwhile to use -the ``--profiling`` (TODO gnames?) +the ``--profiling`` (TODO gnames/g2?) flag (at link time only) because browsers understand the name section even when devtools are not in use, resulting in more useful stack traces at minimal cost. +Example: + +.. code-block:: bash + + emcc source.c -c -o source.o # source.o has no debug info + emcc source.o -o program.js -g2 # program.wasm has a name section, program.js is unminified Detecting Memory Errors and Undefined Behavior @@ -240,6 +215,12 @@ To profile your code for speed, build with :ref:`profiling info then run the code in the browser's devtools profiler. You should then be able to see in which functions is most of the time spent. +TODO: IIUC --profiling is the same as g2 (names+whitespace), but --profiling-funcs is names +only, while g1 is whitespace only. Is it really necessary to have both of these (i.e is +there any use for wasm names without JS whitespace?) +Can we just deprecate the profiling flags and recommend -g2 for profiling +(and maybe have --profiling be a legacy alias for -g2 --minify=0?) 
+ Memory ------ diff --git a/site/source/docs/tools_reference/emcc.rst b/site/source/docs/tools_reference/emcc.rst index b423b90774bfd..f6cbd10d01bcf 100644 --- a/site/source/docs/tools_reference/emcc.rst +++ b/site/source/docs/tools_reference/emcc.rst @@ -173,9 +173,11 @@ Options that are modified or new in *emcc* are listed below: .. _emcc-gsource-map: ``-gsource-map[=inline]`` - [link] + [compile+link] + [same as -g3 if passed at compile time, otherwise applies at link] Generate a source map using LLVM debug information (which must - be present in object files, i.e., they should have been compiled with ``-g``). + be present in object files, i.e., they should have been compiled with ``-g`` + or ``-gsource-map``). When this option is provided, the **.wasm** file is updated to have a ``sourceMappingURL`` section. The resulting URL will have format: @@ -219,7 +221,8 @@ Options that are modified or new in *emcc* are listed below: ``--profiling`` [same as -g2 if passed at compile time, otherwise applies at link] - Use reasonable defaults when emitting JavaScript to make the build readable but still useful for profiling. This sets ``-g2`` (preserve whitespace and function names) and may also enable optimizations that affect performance and otherwise might not be performed in ``-g2``. + Use reasonable defaults when emitting JavaScript to make the build readable but still useful for profiling. This sets ``-g2`` (preserve whitespace and function names) and may also enable optimizations that affect performance and otherwise might not be performed in ``-g2``. TODO: does g2 actually suppress any optimizations? + .. 
_emcc-profiling-funcs: diff --git a/test/test_other.py b/test/test_other.py index c937a8ab8d5c1..efacba328ca53 100644 --- a/test/test_other.py +++ b/test/test_other.py @@ -9264,6 +9264,9 @@ def test_binaryen_debug(self): (['-O2', '-g'], False, True, False), (['-O2', '--closure=1'], True, False, True), (['-O2', '--closure=1', '-g1'], True, True, True), + (['-O2', '--minify=0'], False, True, False), + (['-O2', '--profiling-funcs'], True, False, False), + (['-O2', '--profiling'], False, True, False), ]: print(args, expect_clean_js, expect_whitespace_js, expect_closured) delete_file('a.out.wat') From 804f405daa8b5f98f18efacb853e17552d3513bc Mon Sep 17 00:00:00 2001 From: Derek Schuff Date: Fri, 22 Aug 2025 16:55:57 +0000 Subject: [PATCH 04/12] more tweaks --- site/source/docs/porting/Debugging.rst | 9 +++++---- site/source/docs/tools_reference/emcc.rst | 10 ++++++---- 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/site/source/docs/porting/Debugging.rst b/site/source/docs/porting/Debugging.rst index 0007940e7f80f..daf8e24e4e9a8 100644 --- a/site/source/docs/porting/Debugging.rst +++ b/site/source/docs/porting/Debugging.rst @@ -10,7 +10,7 @@ This article describes the main tools and settings provided by Emscripten for de Overview: Emitting and Controlling Debug Information -========================================== +==================================================== Debugging-related information comes in several forms: in Wasm object and binary files (DWARF sections, Wasm name section), side output files (source maps, symbol maps, DWARF sidecar and package files), and even in the code itself (assertions and instrumentation, whitespace). @@ -22,10 +22,11 @@ developer tools. (TODO more info?). Source maps are also supported (see :ref:`be This document contains an overview of the flags used to emit and control debugging behavior, and use-case-based examples. -Flags that cause DWARF generation (e.g. 
`-g3`, `-gline-tables-only`) also generate a name section + +Flags that cause DWARF generation (e.g. ``-g3``, ``-gline-tables-only``) also generate a name section in the binary and suppress minification of the JS glue file (since most DWARF use cases are for interactive debugging or where the binary will be stripped). -Other flags (e.g. `-g2`, `-gsource-map`) should affect only a specific behavior or type of debug info, +Other flags (e.g. ``-g2``, ``-gsource-map``) should affect only a specific behavior or type of debug info, and are generally composable. @@ -58,7 +59,7 @@ For example ``emcc`` strips out most of the debug information after linking if a flag is not provided at link time, even if the input object files contain DWARF. DWARF can be produced at compile time with the *emcc* :ref:`-g flag `. Optimization levels above -:ref:`-O1 ` or :ref:`-Og ` increasingly remove LLVM debug information (as with other architectures), +:ref:`-O1 ` or :ref:`-Og ` increasingly degrade LLVM debug information (as with other architectures), and optimization flags at link time also disable Emscripten's runtime :ref:`ASSERTIONS ` checks. Passing a ``-g`` flag at link time also affects the generated JavaScript code (preserving white-space, function names, and variable names). diff --git a/site/source/docs/tools_reference/emcc.rst b/site/source/docs/tools_reference/emcc.rst index f6cbd10d01bcf..648c3fa745c03 100644 --- a/site/source/docs/tools_reference/emcc.rst +++ b/site/source/docs/tools_reference/emcc.rst @@ -195,7 +195,9 @@ Options that are modified or new in *emcc* are listed below: ``-g`` [compile+link] - Controls the level of debuggability. Each level builds on the previous one: + If used at compile time, adds progressively more DWARF information to the object file, + according to the underlying behavior of clang. + If used at link time, controls the level of debuggability overall. Each level builds on the previous one: - .. 
_emcc-g0: @@ -205,17 +207,17 @@ Options that are modified or new in *emcc* are listed below: - .. _emcc-g1: - ``-g1``: When linking, preserve whitespace in JavaScript. + ``-g1``: Preserve whitespace in JavaScript. - .. _emcc-g2: - ``-g2``: When linking, preserve function names in compiled code. + ``-g2``: Also preserve function names in compiled code (via the wasm name section). - .. _emcc-g3: - ``-g3``: When compiling to object files, keep debug info, including JS whitespace, function names, and LLVM debug info (DWARF) if any (this is the same as :ref:`-g `). + ``-g3``: Also keep LLVM debug info (DWARF) if there is any in the object files (this is the same as :ref:`-g `). .. _emcc-profiling: From bd13b538924759e34bee78f5e5287108889f5771 Mon Sep 17 00:00:00 2001 From: Derek Schuff Date: Fri, 22 Aug 2025 17:01:46 +0000 Subject: [PATCH 05/12] update emcc.txt --- docs/emcc.txt | 77 +++++++++++++++++++++++++++------------------------ 1 file changed, 41 insertions(+), 36 deletions(-) diff --git a/docs/emcc.txt b/docs/emcc.txt index 520a56f55f9bf..61de243c06225 100644 --- a/docs/emcc.txt +++ b/docs/emcc.txt @@ -49,8 +49,8 @@ Options that are modified or new in *emcc* are listed below: "-O1" [compile+link] Simple optimizations. During the compile step these - include LLVM "-O1" optimizations. During the link step this does - not include various runtime assertions in JS that *-O0* would do. + include LLVM "-O1" optimizations. During the link step this omits + various runtime assertions in JS that *-O0* would include. "-O2" [compile+link] Like "-O1", but enables more optimizations. During @@ -61,14 +61,14 @@ Options that are modified or new in *emcc* are listed below: These JavaScript optimizations can reduce code size by removing things that the compiler does not see being used, in particular, parts of the runtime may be stripped if they are not exported on - the "Module" object. 
The compiler is aware of code in --pre-js - and --post-js, so you can safely use the runtime from there. + the "Module" object. The compiler is aware of code in –pre-js and + –post-js, so you can safely use the runtime from there. Alternatively, you can use "EXPORTED_RUNTIME_METHODS", see src/settings.js. "-O3" [compile+link] Like "-O2", but with additional optimizations that - may take longer to run. + may take longer to run and may increase code size. Note: @@ -127,7 +127,7 @@ Options that are modified or new in *emcc* are listed below: Note: For lists that include brackets or quote, you need quotation - marks (") around the list in most shells (to avoid errors being + marks (”) around the list in most shells (to avoid errors being raised). Two examples are shown below: -sEXPORTED_FUNCTIONS="['liblib.so']" @@ -154,7 +154,7 @@ Options that are modified or new in *emcc* are listed below: Options can be specified as a single argument with or without a space between the "-s" and option name. e.g. "-sFOO" or "-s - FOO". It's highly recommended you use the notation without space. + FOO". It’s highly recommended you use the notation without space. "-g" [compile+link] Preserve debug information. @@ -182,36 +182,39 @@ Options that are modified or new in *emcc* are listed below: with "-c". "-gsource-map[=inline]" - [link] Generate a source map using LLVM debug information (which - must be present in object files, i.e., they should have been - compiled with "-g"). + [compile+link] [same as -g3 if passed at compile time, otherwise + applies at link] Generate a source map using LLVM debug information + (which must be present in object files, i.e., they should have been + compiled with "-g" or "-gsource-map"). When this option is provided, the **.wasm** file is updated to have a "sourceMappingURL" section. The resulting URL will have format: "" + "" + ".map". "" defaults to being empty (which means the source map is served from the same - directory as the Wasm file). 
It can be changed using --source-map- + directory as the Wasm file). It can be changed using –source-map- base. Path substitution can be applied to the referenced sources using the "-sSOURCE_MAP_PREFIXES" (link). If "inline" is specified, the sources content is embedded in the source map (in this case you - don't need path substitution, but it comes with the cost of having + don’t need path substitution, but it comes with the cost of having a large source map file). "-g" - [compile+link] Controls the level of debuggability. Each level - builds on the previous one: + [compile+link] If used at compile time, adds progressively more + DWARF information to the object file, according to the underlying + behavior of clang. If used at link time, controls the level of + debuggability overall. Each level builds on the previous one: * "-g0": Make no effort to keep code debuggable. - * "-g1": When linking, preserve whitespace in JavaScript. + * "-g1": Preserve whitespace in JavaScript. - * "-g2": When linking, preserve function names in compiled code. + * "-g2": Also preserve function names in compiled code (via the + wasm name section). - * "-g3": When compiling to object files, keep debug info, - including JS whitespace, function names, and LLVM debug info - (DWARF) if any (this is the same as -g). + * "-g3": Also keep LLVM debug info (DWARF) if there is any in + the object files (this is the same as -g). "--profiling" [same as -g2 if passed at compile time, otherwise applies at link] @@ -219,7 +222,7 @@ Options that are modified or new in *emcc* are listed below: readable but still useful for profiling. This sets "-g2" (preserve whitespace and function names) and may also enable optimizations that affect performance and otherwise might not be performed in - "-g2". + "-g2". TODO: does g2 actually suppress any optimizations? 
"--profiling-funcs" [link] Preserve function names in profiling, but otherwise minify @@ -240,7 +243,9 @@ Options that are modified or new in *emcc* are listed below: [link] Save a map file between function indexes in the Wasm and function names. By storing the names on a file on the side, you can avoid shipping the names, and can still reconstruct meaningful - stack traces by translating the indexes back to the names. + stack traces by translating the indexes back to the names. This is + a simpler format than source maps, but less detailed because it + only describes function names and not source locations. Note: @@ -309,8 +314,8 @@ Options that are modified or new in *emcc* are listed below: "--pre-js" + "--post-js" to put all the output in an inner function scope (see "MODULARIZE" for that). - *--pre-js* (but not *--post-js*) is also useful for specifying - things on the "Module" object, as it appears before the JS looks at + *–pre-js* (but not *–post-js*) is also useful for specifying things + on the "Module" object, as it appears before the JS looks at "Module" (for example, you can define "Module['print']" there). "--post-js " @@ -360,7 +365,7 @@ Options that are modified or new in *emcc* are listed below: Note: - This option is similar to --embed-file, except that it is only + This option is similar to –embed-file, except that it is only relevant when generating HTML (it uses asynchronous binary *XHRs*), or JavaScript that will be used in a web page. @@ -374,13 +379,13 @@ Options that are modified or new in *emcc* are listed below: Packaging Files. "--exclude-file " - [link] Files and directories to be excluded from --embed-file and - --preload-file. Wildcards (*) are supported. + [link] Files and directories to be excluded from –embed-file and + –preload-file. Wildcards (*) are supported. "--use-preload-plugins" [link] Tells the file packager to run preload plugins on the files as they are loaded. 
This performs tasks like decoding images and - audio using the browser's codecs. + audio using the browser’s codecs. "--shell-file " [link] The path name to a skeleton HTML file used when generating @@ -443,7 +448,7 @@ Options that are modified or new in *emcc* are listed below: "--js-library " [link] A JavaScript library to use in addition to those in - Emscripten's core libraries (src/library_*). + Emscripten’s core libraries (src/library_*). "-v" [general] Turns on verbose output. @@ -457,7 +462,7 @@ Options that are modified or new in *emcc* are listed below: or without other arguments. "--check" - [general] Runs Emscripten's internal sanity checks and reports any + [general] Runs Emscripten’s internal sanity checks and reports any issues with the current configuration. "--cache " @@ -522,7 +527,7 @@ Options that are modified or new in *emcc* are listed below: file, and a separate **.js** file containing the JavaScript to be run in a worker. If emitting JavaScript, the target file name contains the part to be run on the main thread, while a second - **.js** file with suffix ".worker.js" will contain the worker + **.js** file with suffix “.worker.js” will contain the worker portion. "--emrun" @@ -549,7 +554,7 @@ Options that are modified or new in *emcc* are listed below: [general] Specifies the location of the **.emscripten** configuration file. If not specified emscripten will search for ".emscripten" first in the emscripten directory itself, and then in - the user's home directory ("~/.emscripten"). This can be overridden + the user’s home directory ("~/.emscripten"). This can be overridden using the "EM_CONFIG" environment variable. "--valid-abspath " @@ -573,7 +578,7 @@ Options that are modified or new in *emcc* are listed below: WebAssembly). * **.wasm** : WebAssembly without JavaScript support code - ("standalone Wasm"; this enables "STANDALONE_WASM"). + (“standalone Wasm”; this enables "STANDALONE_WASM"). These rules only apply when linking. 
When compiling to object code (See *-c* below) the name of the output file is irrelevant. @@ -584,9 +589,9 @@ Options that are modified or new in *emcc* are listed below: "--output-eol windows|linux" [link] Specifies the line ending to generate for the text files - that are outputted. If "--output-eol windows" is passed, the final - output files will have Windows "\r\n" line endings in them. With " - --output-eol linux", the final generated files will be written with + that are outputted. If “–output-eol windows” is passed, the final + output files will have Windows "\r\n" line endings in them. With + “–output-eol linux”, the final generated files will be written with Unix "\n" line endings. "--cflags" @@ -637,7 +642,7 @@ Environment variables * "_EMCC_CCACHE" [general] Internal setting that is set to 1 by emsdk when integrating with ccache compiler frontend -Search for 'os.environ' in emcc.py to see how these are used. The most +Search for ‘os.environ’ in emcc.py to see how these are used. The most interesting is possibly "EMCC_DEBUG", which forces the compiler to dump its build and temporary files to a temporary directory where they can be reviewed. From 23ab64558085b302ca7b9be931141780dc148e5e Mon Sep 17 00:00:00 2001 From: Derek Schuff Date: Fri, 22 Aug 2025 18:17:00 +0000 Subject: [PATCH 06/12] add note --- site/source/docs/porting/Debugging.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/site/source/docs/porting/Debugging.rst b/site/source/docs/porting/Debugging.rst index daf8e24e4e9a8..5d7a947cbac68 100644 --- a/site/source/docs/porting/Debugging.rst +++ b/site/source/docs/porting/Debugging.rst @@ -22,6 +22,9 @@ developer tools. (TODO more info?). Source maps are also supported (see :ref:`be This document contains an overview of the flags used to emit and control debugging behavior, and use-case-based examples. +Unlike traditional Unix-style C toolchains, flags must be passed at link time to preserve or generate +debug information. 
The most common of these are the :ref:`-g flags `; see the flag +documentation or the use cases below for more detail. Flags that cause DWARF generation (e.g. ``-g3``, ``-gline-tables-only``) also generate a name section in the binary and suppress minification of the JS glue file (since most DWARF use cases are for From a2c6ec4a278b135ddd24bc7df7f99c2a8b51a7b7 Mon Sep 17 00:00:00 2001 From: Derek Schuff Date: Fri, 29 Aug 2025 22:33:34 +0000 Subject: [PATCH 07/12] tweak, fix some todos --- site/source/docs/porting/Debugging.rst | 48 +++++++++++++++----------- 1 file changed, 27 insertions(+), 21 deletions(-) diff --git a/site/source/docs/porting/Debugging.rst b/site/source/docs/porting/Debugging.rst index 5d7a947cbac68..38cebf3a77d82 100644 --- a/site/source/docs/porting/Debugging.rst +++ b/site/source/docs/porting/Debugging.rst @@ -11,13 +11,17 @@ This article describes the main tools and settings provided by Emscripten for de Overview: Emitting and Controlling Debug Information ==================================================== -Debugging-related information comes in several forms: in Wasm object and binary files (DWARF -sections, Wasm name section), side output files (source maps, symbol maps, DWARF sidecar and package files), -and even in the code itself (assertions and instrumentation, whitespace). +Debugging-related information comes in several forms: in Wasm object and binary files (as DWARF +sections or Wasm name section), side output files (as source maps, symbol maps, or DWARF sidecar or package files), +and even in the code itself (as assertions or instrumentation, or JS whitespace and comments). For information on DWARF, see :ref:`below `. -In addition to DWARF, wasm files may contain a name section (TODO link) which includes names for each -function; these function names are displayed by browsers when they generate stack traces and in -developer tools. (TODO more info?). Source maps are also supported (see :ref:`below `). 
+In addition to DWARF, wasm files may contain a +`name section `_ +which includes names for each +function; these function names are displayed by browsers when they generate +`stack traces `_ and in +developer tools. Source maps are also supported by Emscripten and by browser +DevTools (see :ref:`below `). This document contains an overview of the flags used to emit and control debugging behavior, and use-case-based examples. @@ -30,7 +34,7 @@ Flags that cause DWARF generation (e.g. ``-g3``, ``-gline-tables-only``) also ge in the binary and suppress minification of the JS glue file (since most DWARF use cases are for interactive debugging or where the binary will be stripped). Other flags (e.g. ``-g2``, ``-gsource-map``) should affect only a specific behavior or type of debug info, -and are generally composable. +and are generally composable. TODO: make this real, or change the text. @@ -120,7 +124,8 @@ Source maps are easier to parse and more widely supported by ecosystem tooling. above, preserving DWARF inhibits some Binaryen optimizations. However DWARF has the advantage that it includes information about inlining, which can result in more accurate stack traces. -(TODO: -g1 at compile time on native generates DWARF but not for emscripten) +(TODO: passing -g1 at compile time on native platforms generates (a reduced amount of) DWARF +but this doesn't work for emscripten). Emscripten includes a tool called ``emsymbolizer`` that can map wasm code addresses to sources using several different kinds of debug info, including DWARF (in wasm object or linked files) @@ -143,7 +148,7 @@ Fast Edit+Compile with minimal debug information When you want the fastest builds, you generally want to avoid generating large debug information during compile, because it takes time to link into the final binary. It is still worthwhile to use -the ``--profiling`` (TODO gnames/g2?) 
+the ``-g2`` flag (at link time only) because browsers understand the name section even when devtools are not in use, resulting in more useful stack traces at minimal cost. @@ -154,6 +159,11 @@ Example: emcc source.c -c -o source.o # source.o has no debug info emcc source.o -o program.js -g2 # program.wasm has a name section, program.js is unminified +Sometimes the use of the ``-O1`` or ``-Og`` flag at compile time can also result in faster +builds, because optimizations early in the pipeline can reduce the amount of IR that is +processed by later phases such as instruction selection and linking. It also of course +reduces test runtime. + Detecting Memory Errors and Undefined Behavior ============================================== @@ -216,14 +226,13 @@ Speed ----- To profile your code for speed, build with :ref:`profiling info `, -then run the code in the browser's devtools profiler. You should then be able to -see in which functions is most of the time spent. +(which is currently the same as `:ref`-g2 `), and then run the code in the browser's +devtools profiler. You should then be able to see in which functions most of the time is spent. TODO: IIUC --profiling is the same as g2 (names+whitespace), but --profiling-funcs is names only, while g1 is whitespace only. Is it really necessary to have both of these (i.e is there any use for wasm names without JS whitespace?) -Can we just deprecate the profiling flags and recommend -g2 for profiling -(and maybe have --profiling be a legacy alias for -g2 --minify=0?) +Is -g1 the same as --minify=0? Memory ------ @@ -293,18 +302,15 @@ Memory Alignment Issues ----------------------- The :ref:`Emscripten memory representation ` is compatible with C and C++. -However, when undefined behavior is involved you may see differences with native architectures: - -- In asm.js, unaligned loads and stores can fail silently (i.e. access the wrong address). 
-- In WebAssembly, unaligned loads and stores will work; each may be annotated with its expected - alignment. If the actual alignment does not match, it may be very slow on some systems. +In WebAssembly, unaligned loads and stores will work; each may be annotated with its expected +alignment. However, if the actual alignment does not match, it may be very slow on some systems. .. tip:: :ref:`SAFE_HEAP ` can be used to reveal memory alignment issues. -Generally it is best to avoid unaligned reads and writesoften they occur as the result of -undefined behavior, as mentioned above. In some cases, however, they are unavoidable — for example +Generally it is best to avoid unaligned reads and writes. Often they occur as the result of +undefined behavior. In some cases, however, they are unavoidable — for example if the code to be ported reads an ``int`` from a packed structure in some pre-existing data format. -In that case, to make things work properly in asm.js, and be fast in WebAssembly, you must be sure +In that case, to be as fast as possible in WebAssembly, you can make sure that the compiler knows the load or store is unaligned. To do so you can: - Manually read individual bytes and reconstruct the full value From 69ed6de5060859d85771ee8f660a61d903a28b76 Mon Sep 17 00:00:00 2001 From: Derek Schuff Date: Fri, 5 Sep 2025 23:55:16 +0000 Subject: [PATCH 08/12] resolve some TODOS --- docs/emcc.txt | 2 +- site/source/docs/porting/Debugging.rst | 10 ++-------- tools/link.py | 2 +- 3 files changed, 4 insertions(+), 10 deletions(-) diff --git a/docs/emcc.txt b/docs/emcc.txt index 61de243c06225..f076afc38da15 100644 --- a/docs/emcc.txt +++ b/docs/emcc.txt @@ -222,7 +222,7 @@ Options that are modified or new in *emcc* are listed below: readable but still useful for profiling. This sets "-g2" (preserve whitespace and function names) and may also enable optimizations that affect performance and otherwise might not be performed in - "-g2".
TODO: does g2 actually suppress any optimizations? + "-g2". "--profiling-funcs" [link] Preserve function names in profiling, but otherwise minify diff --git a/site/source/docs/porting/Debugging.rst b/site/source/docs/porting/Debugging.rst index 38cebf3a77d82..bda834d99e584 100644 --- a/site/source/docs/porting/Debugging.rst +++ b/site/source/docs/porting/Debugging.rst @@ -34,7 +34,7 @@ Flags that cause DWARF generation (e.g. ``-g3``, ``-gline-tables-only``) also ge in the binary and suppress minification of the JS glue file (since most DWARF use cases are for interactive debugging or where the binary will be stripped). Other flags (e.g. ``-g2``, ``-gsource-map``) should affect only a specific behavior or type of debug info, -and are generally composable. TODO: make this real, or change the text. +and are generally composable. @@ -124,9 +124,6 @@ Source maps are easier to parse and more widely supported by ecosystem tooling. above, preserving DWARF inhibits some Binaryen optimizations. However DWARF has the advantage that it includes information about inlining, which can result in more accurate stack traces. -(TODO: passing -g1 at compile time on native platforms generates (a reduced amount of) DWARF -but this doesn't work for emscripten). - Emscripten includes a tool called ``emsymbolizer`` that can map wasm code addresses to sources using several different kinds of debug info, including DWARF (in wasm object or linked files) and source maps for line/column info, and symbol maps (see :ref:`emcc-emit-symbol-map`), @@ -229,10 +226,7 @@ To profile your code for speed, build with :ref:`profiling info (which is currently the same as `:ref`-g2 `), and then run the code in the browser's devtools profiler. You should then be able to see in which functions most of the time is spent. -TODO: IIUC --profiling is the same as g2 (names+whitespace), but --profiling-funcs is names -only, while g1 is whitespace only. 
Is it really necessary to have both of these (i.e is -there any use for wasm names without JS whitespace?) -Is -g1 the same as --minify=0? +TODO: -g1 is not the same as --minify=0. it's closer to g2 but not exactly. Memory ------ diff --git a/tools/link.py b/tools/link.py index ccaac6bae8106..a7a607d8e1378 100644 --- a/tools/link.py +++ b/tools/link.py @@ -1868,7 +1868,7 @@ def get_full_import_name(name): settings.PRE_JS_FILES = options.pre_js settings.POST_JS_FILES = options.post_js - settings.MINIFY_WHITESPACE = settings.OPT_LEVEL >= 2 and settings.DEBUG_LEVEL == 0 and not options.no_minify + settings.MINIFY_WHITESPACE = settings.OPT_LEVEL >= 2 and settings.DEBUG_LEVEL == 0 and not options.no_minify #shouldn't no_minify be equivalent to g1? # Closure might be run if we run it ourselves, or if whitespace is not being # minifed. In the latter case we keep both whitespace and comments, and the From 9a5ce4a5ba625a77ed831fe1c4f3dc68d548d89a Mon Sep 17 00:00:00 2001 From: Derek Schuff Date: Sat, 6 Sep 2025 00:08:19 +0000 Subject: [PATCH 09/12] update emcc.rst --- site/source/docs/tools_reference/emcc.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/source/docs/tools_reference/emcc.rst b/site/source/docs/tools_reference/emcc.rst index 648c3fa745c03..af3ab3a9317e2 100644 --- a/site/source/docs/tools_reference/emcc.rst +++ b/site/source/docs/tools_reference/emcc.rst @@ -223,7 +223,7 @@ Options that are modified or new in *emcc* are listed below: ``--profiling`` [same as -g2 if passed at compile time, otherwise applies at link] - Use reasonable defaults when emitting JavaScript to make the build readable but still useful for profiling. This sets ``-g2`` (preserve whitespace and function names) and may also enable optimizations that affect performance and otherwise might not be performed in ``-g2``. TODO: does g2 actually suppress any optimizations? 
+ Use reasonable defaults when emitting JavaScript to make the build readable but still useful for profiling. This sets ``-g2`` (preserve whitespace and function names) and may also enable optimizations that affect performance and otherwise might not be performed in ``-g2``. .. _emcc-profiling-funcs: From 0da8473774405b5154ea570835c0315720eb4906 Mon Sep 17 00:00:00 2001 From: Derek Schuff Date: Mon, 8 Sep 2025 21:55:02 +0000 Subject: [PATCH 10/12] tweaks, remove TODOs --- docs/emcc.txt | 42 +++++++++++++------------- site/source/docs/porting/Debugging.rst | 13 +++++--- 2 files changed, 30 insertions(+), 25 deletions(-) diff --git a/docs/emcc.txt b/docs/emcc.txt index f076afc38da15..697e49b5bad81 100644 --- a/docs/emcc.txt +++ b/docs/emcc.txt @@ -61,8 +61,8 @@ Options that are modified or new in *emcc* are listed below: These JavaScript optimizations can reduce code size by removing things that the compiler does not see being used, in particular, parts of the runtime may be stripped if they are not exported on - the "Module" object. The compiler is aware of code in –pre-js and - –post-js, so you can safely use the runtime from there. + the "Module" object. The compiler is aware of code in --pre-js + and --post-js, so you can safely use the runtime from there. Alternatively, you can use "EXPORTED_RUNTIME_METHODS", see src/settings.js. @@ -127,7 +127,7 @@ Options that are modified or new in *emcc* are listed below: Note: For lists that include brackets or quote, you need quotation - marks (”) around the list in most shells (to avoid errors being + marks (") around the list in most shells (to avoid errors being raised). Two examples are shown below: -sEXPORTED_FUNCTIONS="['liblib.so']" @@ -154,7 +154,7 @@ Options that are modified or new in *emcc* are listed below: Options can be specified as a single argument with or without a space between the "-s" and option name. e.g. "-sFOO" or "-s - FOO". It’s highly recommended you use the notation without space. + FOO". 
It's highly recommended you use the notation without space. "-g" [compile+link] Preserve debug information. @@ -191,13 +191,13 @@ Options that are modified or new in *emcc* are listed below: a "sourceMappingURL" section. The resulting URL will have format: "" + "" + ".map". "" defaults to being empty (which means the source map is served from the same - directory as the Wasm file). It can be changed using –source-map- + directory as the Wasm file). It can be changed using --source-map- base. Path substitution can be applied to the referenced sources using the "-sSOURCE_MAP_PREFIXES" (link). If "inline" is specified, the sources content is embedded in the source map (in this case you - don’t need path substitution, but it comes with the cost of having + don't need path substitution, but it comes with the cost of having a large source map file). "-g" @@ -314,8 +314,8 @@ Options that are modified or new in *emcc* are listed below: "--pre-js" + "--post-js" to put all the output in an inner function scope (see "MODULARIZE" for that). - *–pre-js* (but not *–post-js*) is also useful for specifying things - on the "Module" object, as it appears before the JS looks at + *--pre-js* (but not *--post-js*) is also useful for specifying + things on the "Module" object, as it appears before the JS looks at "Module" (for example, you can define "Module['print']" there). "--post-js " @@ -365,7 +365,7 @@ Options that are modified or new in *emcc* are listed below: Note: - This option is similar to –embed-file, except that it is only + This option is similar to --embed-file, except that it is only relevant when generating HTML (it uses asynchronous binary *XHRs*), or JavaScript that will be used in a web page. @@ -379,13 +379,13 @@ Options that are modified or new in *emcc* are listed below: Packaging Files. "--exclude-file " - [link] Files and directories to be excluded from –embed-file and - –preload-file. Wildcards (*) are supported. 
+ [link] Files and directories to be excluded from --embed-file and + --preload-file. Wildcards (*) are supported. "--use-preload-plugins" [link] Tells the file packager to run preload plugins on the files as they are loaded. This performs tasks like decoding images and - audio using the browser’s codecs. + audio using the browser's codecs. "--shell-file " [link] The path name to a skeleton HTML file used when generating @@ -448,7 +448,7 @@ Options that are modified or new in *emcc* are listed below: "--js-library " [link] A JavaScript library to use in addition to those in - Emscripten’s core libraries (src/library_*). + Emscripten's core libraries (src/library_*). "-v" [general] Turns on verbose output. @@ -462,7 +462,7 @@ Options that are modified or new in *emcc* are listed below: or without other arguments. "--check" - [general] Runs Emscripten’s internal sanity checks and reports any + [general] Runs Emscripten's internal sanity checks and reports any issues with the current configuration. "--cache " @@ -527,7 +527,7 @@ Options that are modified or new in *emcc* are listed below: file, and a separate **.js** file containing the JavaScript to be run in a worker. If emitting JavaScript, the target file name contains the part to be run on the main thread, while a second - **.js** file with suffix “.worker.js” will contain the worker + **.js** file with suffix ".worker.js" will contain the worker portion. "--emrun" @@ -554,7 +554,7 @@ Options that are modified or new in *emcc* are listed below: [general] Specifies the location of the **.emscripten** configuration file. If not specified emscripten will search for ".emscripten" first in the emscripten directory itself, and then in - the user’s home directory ("~/.emscripten"). This can be overridden + the user's home directory ("~/.emscripten"). This can be overridden using the "EM_CONFIG" environment variable. 
"--valid-abspath " @@ -578,7 +578,7 @@ Options that are modified or new in *emcc* are listed below: WebAssembly). * **.wasm** : WebAssembly without JavaScript support code - (“standalone Wasm”; this enables "STANDALONE_WASM"). + ("standalone Wasm"; this enables "STANDALONE_WASM"). These rules only apply when linking. When compiling to object code (See *-c* below) the name of the output file is irrelevant. @@ -589,9 +589,9 @@ Options that are modified or new in *emcc* are listed below: "--output-eol windows|linux" [link] Specifies the line ending to generate for the text files - that are outputted. If “–output-eol windows” is passed, the final - output files will have Windows "\r\n" line endings in them. With - “–output-eol linux”, the final generated files will be written with + that are outputted. If "--output-eol windows" is passed, the final + output files will have Windows "\r\n" line endings in them. With " + --output-eol linux", the final generated files will be written with Unix "\n" line endings. "--cflags" @@ -642,7 +642,7 @@ Environment variables * "_EMCC_CCACHE" [general] Internal setting that is set to 1 by emsdk when integrating with ccache compiler frontend -Search for ‘os.environ’ in emcc.py to see how these are used. The most +Search for 'os.environ' in emcc.py to see how these are used. The most interesting is possibly "EMCC_DEBUG", which forces the compiler to dump its build and temporary files to a temporary directory where they can be reviewed. diff --git a/site/source/docs/porting/Debugging.rst b/site/source/docs/porting/Debugging.rst index bda834d99e584..9b3f3fb46e49f 100644 --- a/site/source/docs/porting/Debugging.rst +++ b/site/source/docs/porting/Debugging.rst @@ -37,6 +37,7 @@ Other flags (e.g. ``-g2``, ``-gsource-map``) should affect only a specific behav and are generally composable. +.. 
_debugging-interactive: Interactive, Source-Level Debugging ============================================= @@ -161,17 +162,18 @@ builds, because optimizations early in the pipeline can reduce the amount of IR processed by later phases such as instruction selection and linking. It also of course reduces test runtime. +.. _debugging-memory-safety: Detecting Memory Errors and Undefined Behavior ============================================== The best tools for detecting memory safety and undefined behavior issues. are Clang's sanitizers, -such as the Undefined Behaviour Sanitizer (UBSan) and the Address Sanitizer (ASan). +such as the Undefined Behavior Sanitizer (UBSan) and the Address Sanitizer (ASan). For more information, see :ref:`Sanitizers`. Emscripten has several other compiler settings that can be useful for catching errors at runtime. -These are set using the :ref:`emcc -s` option, and will override any optimization flags (TODO is this true?). For example: +These are set using the :ref:`emcc -s` option. For example: .. code-block:: bash @@ -192,7 +194,10 @@ Some important settings are: ``SAFE_HEAP=1`` adds additional memory access checks with a Binaryen pass, and will give clear errors for problems like dereferencing 0 and memory alignment issues. - You can also set ``SAFE_HEAP_LOG`` to log ``SAFE_HEAP`` operations. (TODO: any advantages over ASan?) + You can also set ``SAFE_HEAP_LOG`` to log ``SAFE_HEAP`` operations. :ref:`ASan` + provides most of the functionality of this pass (plus some extras) and is generally preferred to + try first unless :ref:`alginment issues` + are important for your platform. - .. _debugging-STACK_OVERFLOW_CHECK: @@ -223,7 +228,7 @@ Speed ----- To profile your code for speed, build with :ref:`profiling info `, -(which is currently the same as `:ref`-g2 `), and then run the code in the browser's +(which is currently the same as :ref:`-g2 `), and then run the code in the browser's devtools profiler. 
You should then be able to see in which functions most of the time is spent.
TODO: -g1 is not the same as --minify=0. it's closer to g2 but not exactly.
From 25c015649bf6f0f2d7589fafefda57da4a00a759 Mon Sep 17 00:00:00 2001
From: Derek Schuff
Date: Wed, 10 Sep 2025 14:17:49 -0700
Subject: [PATCH 11/12] Apply suggestions from code review

Co-authored-by: Alon Zakai
---
 site/source/docs/porting/Debugging.rst | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/site/source/docs/porting/Debugging.rst b/site/source/docs/porting/Debugging.rst
index 9b3f3fb46e49f..8bccceec20ae6 100644
--- a/site/source/docs/porting/Debugging.rst
+++ b/site/source/docs/porting/Debugging.rst
@@ -27,7 +27,7 @@ This document contains an overview of the flags used to emit and control debuggi
use-case-based examples.
Unlike traditional Unix-style C toolchains, flags must be passed at link time to preserve or generate
-debug information. The most common of these are the :ref:`-g flags `; see the flag
+debug information (these defaults aim to avoid unintended bloat in production builds). The most common of these are the :ref:`-g flags `; see the flag
documentation or the use cases below for more detail.
Flags that cause DWARF generation (e.g. ``-g3``, ``-gline-tables-only``) also generate a name section
@@ -69,7 +69,7 @@ flag is not provided at link time, even if the input object files contain DWARF.
DWARF can be produced at compile time with the *emcc* :ref:`-g flag `. Optimization levels above :ref:`-O1 ` or :ref:`-Og ` increasingly degrade LLVM debug information (as with other architectures), and optimization flags at link time also disable Emscripten's runtime :ref:`ASSERTIONS ` checks.
-Passing a ``-g`` flag at link time also affects the generated JavaScript code (preserving white-space, function names, and variable names).
+Passing a ``-g`` flag at link time also affects the generated JavaScript code (preserving white-space, function names, and variable names, which makes the JavaScript debuggable).
The ``-g`` flag can also be specified with integer levels: :ref:`-g0 `, :ref:`-g1 `, :ref:`-g2 `, and :ref:`-g3 ` (equivalent to ``-g``).
At compile time these flags control the amount of DWARF in the object files.
@@ -106,13 +106,13 @@ Even when not using an interactive debugger, it's valuable to have source inform
code locations, particularly for stack traces or crash logs. This is also true for fully-optimized production builds.
-`Source maps `_ are commonly used for langauges that compile
+`Source maps `_ are commonly used for languages that compile
to JavaScript (mapping locations in the compiled JS output to locations in the original source code), but WebAssembly is also supported. Emscripten can emit source maps with the :ref:`-gsource-map ` link-time flag. Source maps are preserved even with full post-link optimizations, so they work well for this use case.
Source maps are generated by Emscripten from DWARF information. Therefore the linked object
-files must have DWARF. The final linked output will not have DWARF unless `-g` is also passed.
+files must have DWARF. The final linked output will not have DWARF unless `-g` is also passed at link time.
DWARF can also be used for this purpose. Typically a binary containing DWARF would be generated at build time, and then stripped. The stripped copy would be served to users, and the original
@@ -196,7 +196,7 @@ Some important settings are:
errors for problems like dereferencing 0 and memory alignment issues.
You can also set ``SAFE_HEAP_LOG`` to log ``SAFE_HEAP`` operations.
:ref:`ASan` provides most of the functionality of this pass (plus some extras) and is generally preferred to
- try first unless :ref:`alginment issues`
+ try first unless :ref:`alignment issues`
are important for your platform.
-
@@ -227,7 +227,7 @@ Profiling Performance
Speed
-----
To profile your code for speed, build with :ref:`profiling info `,
+To profile your code for speed, build with :ref:`profiling info ` using ``--profiling``,
(which is currently the same as :ref:`-g2 `), and then run the code in the browser's devtools profiler.
You should then be able to see in which functions most of the time is spent.
@@ -276,7 +276,7 @@ Manual print debugging
You can also manually instrument the source code with ``printf()`` statements, then compile and run the code to investigate issues. The output from the `stdout` and `stderr` streams is copied to the browser console by default. Note that ``printf()`` is
-line-buffered, make sure to add ``\n`` to see output in the console. The functions
+line-buffered, so make sure to add ``\n`` to see output in the console. The functions
in the :ref:`console.h ` header can also be used to access the console more directly.
From 37546a74957931e521398c90718c6e36195f4c8c Mon Sep 17 00:00:00 2001
From: Derek Schuff
Date: Wed, 10 Sep 2025 21:26:58 +0000
Subject: [PATCH 12/12] remove TODOs move example block
---
 site/source/docs/porting/Debugging.rst | 12 +++++-------
 tools/link.py | 2 +-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/site/source/docs/porting/Debugging.rst b/site/source/docs/porting/Debugging.rst
index 8bccceec20ae6..8890b8504f7b0 100644
--- a/site/source/docs/porting/Debugging.rst
+++ b/site/source/docs/porting/Debugging.rst
@@ -125,11 +125,6 @@ Source maps are easier to parse and more widely supported by ecosystem tooling.
above, preserving DWARF inhibits some Binaryen optimizations. However DWARF has the advantage that it includes information about inlining, which can result in more accurate stack traces.
Emscripten includes a tool called ``emsymbolizer`` that can map wasm code addresses to sources
using several different kinds of debug info, including DWARF (in wasm object or linked files)
and source maps for line/column info, and symbol maps (see :ref:`emcc-emit-symbol-map`),
name sections and object file symbol tables for function names.
Examples:
.. code-block:: bash
@@ -140,6 +135,11 @@ Examples:
emcc source.o -o program2.js -g # program2.wasm has DWARF
llvm-strip program2.wasm -o program2_stripped.wasm # program2_stripped.wasm has no debug info
+Emscripten includes a tool called ``emsymbolizer`` that can map wasm code addresses to sources
+using several different kinds of debug info, including DWARF (in wasm object or linked files)
+and source maps for line/column info, and symbol maps (see :ref:`emcc-emit-symbol-map`),
+name sections and object file symbol tables for function names.
+
Fast Edit+Compile with minimal debug information
================================================
@@ -231,8 +231,6 @@ To profile your code for speed, build with :ref:`profiling info
(which is currently the same as :ref:`-g2 `), and then run the code in the browser's devtools profiler.
You should then be able to see in which functions most of the time is spent.
TODO: -g1 is not the same as --minify=0. it's closer to g2 but not exactly.
Memory
------
diff --git a/tools/link.py b/tools/link.py
index a4373894b6e41..b033285047857 100644
--- a/tools/link.py
+++ b/tools/link.py
@@ -1878,7 +1878,7 @@ def get_full_import_name(name):
settings.PRE_JS_FILES = options.pre_js
settings.POST_JS_FILES = options.post_js
- settings.MINIFY_WHITESPACE = settings.OPT_LEVEL >= 2 and settings.DEBUG_LEVEL == 0 and not options.no_minify #shouldn't no_minify be equivalent to g1?
+ settings.MINIFY_WHITESPACE = settings.OPT_LEVEL >= 2 and settings.DEBUG_LEVEL == 0 and not options.no_minify
# Closure might be run if we run it ourselves, or if whitespace is not being
# minifed. In the latter case we keep both whitespace and comments, and the