author Dmitry Ilvokhin <d@ilvokhin.com> 2025-03-23 12:48:39 +0000
committer Dmitry Ilvokhin <d@ilvokhin.com> 2025-03-23 12:55:22 +0000
commit e614d4b21ad4a48e05525137c5fd9f850d4cbe65
tree 13602970fb9541f40aba03fe7c301e425caf6116 /posts/jemalloc-hpa-reference/jemalloc-hpa-reference.md
parent 58b241a4df46f66e9d95a98645b23af71292a814
Add Jemalloc HPA Reference post
There is currently no documentation for the HPA options. The best documentation is the code itself, but following it may require too much knowledge of jemalloc internals. This reference provides a starting point: what can be tweaked and what the available options mean.
Diffstat (limited to 'posts/jemalloc-hpa-reference/jemalloc-hpa-reference.md')
-rw-r--r-- posts/jemalloc-hpa-reference/jemalloc-hpa-reference.md 249
1 files changed, 249 insertions, 0 deletions
diff --git a/posts/jemalloc-hpa-reference/jemalloc-hpa-reference.md b/posts/jemalloc-hpa-reference/jemalloc-hpa-reference.md
new file mode 100644
index 0000000..0d5238d
--- /dev/null
+++ b/posts/jemalloc-hpa-reference/jemalloc-hpa-reference.md
@@ -0,0 +1,249 @@
+## Intro
+
+Jemalloc HPA is a hugepage-aware implementation of the page allocator. HPA
+leverages hugepages to reduce the cost of TLB misses and thereby improve
+application performance.
+
+## Glossary
+
+### Pageslab
+
+A pageslab is a hugepage-aligned and hugepage-sized memory range. You can think
+of it as a set of pages packed together into a hugepage. A pageslab is not
+necessarily backed by a hugepage.
+
+
+### Active Page
+
+An active page is a memory page that potentially stores application data.
+
+### Dirty Page
+
+A dirty page is a memory page that might have held application data in the
+past but holds none now. It can be reused (become active again) or returned
+to the OS at any time.
+
+### Purge
+
+Purge is the process of returning dirty pages to the OS.
+
+### Hugification
+
+Hugification is a request to the OS to back a pageslab with a hugepage.
+
+## Constants
+
+### `PAGE`
+
+`PAGE` is the size of a jemalloc page. By default it is `4096` bytes on x86\_64.
+
+### `HUGEPAGE`
+
+`HUGEPAGE` is the size of the OS hugepage. By default it is `2097152` bytes (2
+MiB) on x86\_64.
+
+### `HUGEPAGES_PAGES`
+
+Number of pages in a single hugepage: `HUGEPAGE / PAGE`. By default it is `512`
+on x86\_64.
+
+## Documentation
+
+HPA is under active development and its options are not described in the
+documentation yet. Below is a brief description of the currently available
+options.
+
+### hpa
+
+HPA enabled/disabled.
+
+Master switch to enable HPA. This option must be enabled for the other
+options to take effect.
+
+Boolean. Default value: `false`.
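+
+Jemalloc options are commonly passed as comma-separated `name:value` pairs
+through the `MALLOC_CONF` environment variable. A minimal sketch of turning
+HPA on, assuming a jemalloc build without a symbol prefix (prefixed builds may
+use a differently named variable):
+
+```
+MALLOC_CONF="hpa:true"
+```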
+
+### hpa_slab_max_alloc
+
+Maximum allocation size in bytes that can be served from the HPA page
+allocator.
+
+Larger allocations are served from the more general (for now) classic page
+allocator (PAC), which can handle allocation requests of any size. Slab
+allocations are always served out of HPA, even when `hpa_slab_max_alloc` is
+set to a small value like `PAGE`, due to an implementation quirk. This quirk
+can be leveraged to serve only small allocations out of HPA (in jemalloc
+terms, a small allocation is one smaller than 16 KiB).
+
+Unsigned integer. Default value: `65536` bytes. Minimum value: `PAGE` bytes.
+Maximum value: `HUGEPAGE` bytes.
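+
+The quirk described above could be used roughly like this. It is a sketch
+only: it assumes options are passed through the `MALLOC_CONF` environment
+variable and that `PAGE` is the default `4096` bytes, so the value has to be
+adjusted on builds with a different page size.
+
+```
+MALLOC_CONF="hpa:true,hpa_slab_max_alloc:4096"
+```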
+
+### hpa_hugification_threshold
+
+Minimum number of active bytes in a pageslab required for the pageslab to be
+placed into the hugification queue.
+
+A pageslab is always created in a non-huge state. Over time, when the number
+of active bytes becomes greater than or equal to `hpa_hugification_threshold`,
+jemalloc puts the pageslab into the hugification queue.
+
+Unsigned integer. Default value: `0.95 * HUGEPAGE` bytes. Minimum value: `PAGE`
+bytes. Maximum value: `HUGEPAGE` bytes.
+
+### hpa_hugification_threshold_ratio
+
+Minimum fraction of active bytes in a pageslab required for the pageslab to be
+placed into the hugification queue.
+
+This option has the same semantics as `hpa_hugification_threshold`, but is
+expressed as a fraction of `HUGEPAGE` instead of a number of bytes.
+
+Fixed-point fractional. Default value: `0.95`. Minimum value: `0`. Maximum
+value: `1.0`.
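+
+With the default constants listed earlier, the default ratio works out as
+follows. How jemalloc rounds the result is not covered here, so treat the page
+count as approximate.
+
+```
+0.95 * HUGEPAGE        = 0.95 * 2097152 = 1992294.4 bytes
+0.95 * HUGEPAGES_PAGES = 0.95 * 512     = 486.4 pages
+```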
+
+### hpa_hugify_delay_ms
+
+Time in milliseconds a pageslab must spend in the hugification queue before
+jemalloc asks the OS to back it with a hugepage.
+
+The hugification queue is ordered by the time each pageslab was placed into
+it: the head of the queue is the pageslab queued earliest and the tail is the
+pageslab queued most recently. When a pageslab stops meeting the hugification
+criterion (its number of active bytes drops below
+`hpa_hugification_threshold`), it is **not** removed from the hugification
+queue. Only a purge can remove a pageslab from the hugification queue.
+
+Unsigned integer. Default value: `10000` milliseconds.
+
+### hpa_hugify_sync
+
+Switch to use synchronous hugification requests.
+
+Use `madvise(..., MADV_COLLAPSE)` in addition to
+`madvise(..., MADV_HUGEPAGE)` to ask the OS to back a pageslab with a
+hugepage. Increments the `stats.arenas.<i>.hpa_shard.nhugify_failures` counter
+on failure.
+
+The usual asynchronous hugification introduces a delay of unknown length
+between the request to the OS to hugify a pageslab and the OS actually backing
+the pageslab with a hugepage. This option eliminates that delay. Requires
+Linux 6.1 or higher.
+
+Boolean. Default value: `false`.
+
+### hpa_min_purge_interval_ms
+
+Minimum time between two consecutive purge phases in milliseconds.
+
+Every `hpa_min_purge_interval_ms` milliseconds jemalloc checks whether the
+purging criteria are met and, if they are, purges as many pageslabs as needed
+until the criteria are no longer met. The minimal unit of purging is a
+pageslab: all dirty pages in the chosen pageslab are returned to the OS, even
+if fewer pages would be enough to reach the purging target. If several dirty
+pages are consecutive, a single syscall is issued to purge them together.
+
+Unsigned integer. Default value: 5000 milliseconds.
+
+### hpa_peak_demand_window_ms
+
+Length of peak demand sliding window in milliseconds.
+
+Time component of the purging criteria. Jemalloc tracks the maximum number of
+active pages in use within a sliding window of `hpa_peak_demand_window_ms`
+milliseconds and purges dirty pages above that peak usage.
+
+It is easier to explain with an example. Suppose `hpa_peak_demand_window_ms`
+is `10000`, `ncurrent` is the number of active pages currently in use and
+`npeak` is the peak (maximum) number of active pages within the last 10
+seconds. Then jemalloc is allowed to keep `npeak - ncurrent` dirty pages and
+will purge the rest of them, if there are any.
+
+Option `hpa_peak_demand_window_ms` works in combination with `hpa_dirty_mult`.
+
+Unsigned integer. Default value: 0 milliseconds (disabled by default).
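+
+As a worked instance of the example above, with made-up numbers and ignoring
+any slack added by `hpa_dirty_mult` (that is, assuming it is set to `0`):
+
+```
+npeak    = 1000 pages   (peak within the window)
+ncurrent =  700 pages   (active right now)
+allowed dirty = npeak - ncurrent = 1000 - 700 = 300 pages
+```
+
+If 450 pages were dirty at that moment, 150 of them would exceed the allowance
+and become candidates for purging.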
+
+### hpa_dirty_mult
+
+Maximum ratio of dirty to active pages jemalloc is allowed to keep.
+
+Ratio-based component of the purging criteria.
+
+Jemalloc tries to estimate the maximum amount of active memory the application
+is likely to need in the near future. It does so by projecting future active
+memory demand (based on the peak active memory usage observed within a sliding
+window) and adding slack on top of it (an overhead that is reasonable to accept
+in exchange for higher hugepage coverage). When peak demand tracking is off,
+the projection of future active memory is the current active memory usage.
+
+The estimate is essentially `npeak * (1 + hpa_dirty_mult)`. When
+`hpa_peak_demand_window_ms` is set to `0`, `npeak` equals `ncurrent` and the
+expression becomes `ncurrent * (1 + hpa_dirty_mult)`. When `hpa_dirty_mult` is
+`0`, the expression becomes just `npeak`.
+
+Option `hpa_dirty_mult` works in combination with `hpa_peak_demand_window_ms`.
+
+Fixed-point fractional or `-1`. Default value: `0.25` (not a great default).
+Setting it to `-1` disables purging completely.
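+
+To make the estimate concrete, here is the arithmetic with made-up numbers. It
+assumes, following the `npeak - ncurrent` example in the
+`hpa_peak_demand_window_ms` section, that the dirty pages jemalloc may keep are
+the estimate minus the pages currently active. That reading is an
+interpretation of this text and has not been checked against the code.
+
+```
+hpa_dirty_mult = 0.25, npeak = 1000 pages, ncurrent = 800 pages
+estimate      = npeak * (1 + hpa_dirty_mult) = 1000 * 1.25 = 1250 pages
+allowed dirty = estimate - ncurrent          = 1250 - 800  =  450 pages
+
+peak tracking off (npeak = ncurrent = 800):
+estimate      = 800 * 1.25 = 1000 pages
+allowed dirty = 1000 - 800 =  200 pages  (= ncurrent * hpa_dirty_mult)
+```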
+
+### hpa_sec_nshards
+
+Number of small extent cache (SEC) shards.
+
+SEC is a cache layer above the HPA page allocator. Requests are distributed
+across small extent cache shards `[0, nshards)`. If a request cannot be served
+out of SEC, it is forwarded to the HPA page allocator.
+
+I cannot say I have seen cases where the SEC helped much. Probably more work
+is required to make the SEC useful.
+
+Unsigned integer. Default value: `4` shards. Setting it to `0` disables the
+small extent cache (SEC).
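+
+For instance, to run HPA with the SEC switched off, the options could be
+combined as below. This is a sketch that assumes options are passed through
+the `MALLOC_CONF` environment variable of an unprefixed jemalloc build.
+
+```
+MALLOC_CONF="hpa:true,hpa_sec_nshards:0"
+```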
+
+### hpa_sec_max_alloc
+
+Maximum size in bytes of an allocation that can be served out of SEC.
+
+Jemalloc refuses to cache objects larger than `hpa_sec_max_alloc` and forwards
+such requests to the HPA page allocator.
+
+Unsigned integer. Default value: 32768 bytes. Minimum value: `PAGE` bytes.
+Maximum value: `32768` bytes.
+
+### hpa_sec_max_bytes
+
+Maximum number of bytes a small extent cache shard is allowed to cache.
+
+When the number of bytes cached in a shard exceeds `hpa_sec_max_bytes`,
+jemalloc flushes bins until the number of cached bytes falls below
+`hpa_sec_bytes_after_flush`.
+
+Unsigned integer. Default value: `262144` bytes. Minimum value: `PAGE`.
+
+### hpa_sec_bytes_after_flush
+
+Maximum number of bytes the SEC is allowed to hold after a flush caused by
+exceeding `hpa_sec_max_bytes`.
+
+This option should be less than `hpa_sec_max_bytes` for SEC to be useful.
+
+Unsigned integer. Default value: `131072` bytes. Minimum value: `PAGE`.
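+
+With the defaults and a `PAGE` of `4096` bytes, the two limits translate to
+pages as follows, so a flush releases roughly half of a full shard:
+
+```
+hpa_sec_max_bytes         = 262144 bytes = 64 pages
+hpa_sec_bytes_after_flush = 131072 bytes = 32 pages
+```
+
+A sketch of raising both limits together, keeping the second below the first.
+The values are purely illustrative, and the line assumes options are passed
+through the `MALLOC_CONF` environment variable:
+
+```
+MALLOC_CONF="hpa:true,hpa_sec_max_bytes:1048576,hpa_sec_bytes_after_flush:524288"
+```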
+
+### hpa_sec_batch_fill_extra
+
+Number of extra objects to fetch on a SEC miss.
+
+When an allocation request cannot be satisfied out of the SEC because no
+suitable objects are cached, jemalloc brings `hpa_sec_batch_fill_extra`
+additional objects into the SEC from the HPA page allocator.
+
+Unsigned integer. Default value: `0`. Maximum value: `HUGEPAGES_PAGES`.
+
+### experimental_hpa_max_purge_nhp
+
+Maximum number of pageslabs to purge in each purging phase.
+
+Experimental option that will likely be removed soon. Limits the number of
+pageslabs purged in each purging phase.
+
+Signed integer. Default value: `-1` (disabled by default).
+
+## Acknowledgements
+
+Thanks to Kevin Svetlitski, whose note introduced me to the HPA world.