## Intro
Jemalloc HPA is a hugepage-aware implementation of the page allocator. HPA
leverages hugepages to reduce the cost of TLB misses and thereby improve
application performance.
## Glossary
### Pageslab
A pageslab is a hugepage-aligned and hugepage-sized memory range. You can think
of it as a set of pages packed together into a hugepage. A pageslab is not
necessarily backed by a hugepage.
### Active Page
An active page is a memory page that potentially stores application data.
### Dirty Page
A dirty page is a memory page that might have held application data in the
past but holds none now. It can be reused (become active again) or returned to
the OS at any time.
### Purge
Purging is the process of returning dirty pages to the OS.
### Hugification
Hugification is a request to the OS to back a pageslab with a hugepage.
## Constants
### `PAGE`
`PAGE` is the size of a jemalloc page. By default it is `4096` bytes on x86\_64.
### `HUGEPAGE`
`HUGEPAGE` is the size of the OS hugepage. By default it is `2097152` bytes (2
MiB) on x86\_64.
### `HUGEPAGES_PAGES`
Number of pages in a single hugepage: `HUGEPAGE / PAGE`. By default it is `512`
on x86\_64.
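Because a pageslab is hugepage aligned and hugepage sized, the pageslab that contains a given address, and the page's index inside it, follow from masking with the constants above. The sketch below hard-codes the x86\_64 defaults and uses hypothetical helper names; it only illustrates the arithmetic and is not jemalloc's internal code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86_64 defaults quoted in this document (jemalloc derives the real
 * values at build time). */
#define PAGE ((uintptr_t)4096)
#define HUGEPAGE ((uintptr_t)2097152)
#define HUGEPAGES_PAGES (HUGEPAGE / PAGE) /* 512 */

/* Start of the hugepage-aligned pageslab that contains addr. */
static uintptr_t pageslab_base(uintptr_t addr) {
    return addr & ~(HUGEPAGE - 1);
}

/* Index (0 .. HUGEPAGES_PAGES - 1) of the page holding addr in its pageslab. */
static size_t page_index_in_pageslab(uintptr_t addr) {
    size_t idx = (size_t)((addr - pageslab_base(addr)) / PAGE);
    assert(idx < HUGEPAGES_PAGES);
    return idx;
}
```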
## Documentation
HPA is under active development and its options are not yet described in the
official documentation. Below is a brief description of the currently available
options.
### hpa
Enables or disables HPA.
Master switch for HPA. This option must be enabled for the other HPA options to
take effect.
Boolean. Default value: `false`.
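jemalloc reads its options as comma-separated `name:value` pairs, either from the `MALLOC_CONF` environment variable (for example `MALLOC_CONF="hpa:true" ./app`) or from a `malloc_conf` string compiled into the application. A minimal config fragment for the latter; the option names come from this document and the values are just the defaults shown here. On jemalloc builds configured with a symbol prefix the variable name carries the prefix too (for example `je_malloc_conf`), so check how your copy was built.

```c
/* Read by jemalloc at startup. Assumes an unprefixed jemalloc build;
 * prefixed builds expect e.g. je_malloc_conf instead. */
const char *malloc_conf = "hpa:true,hpa_dirty_mult:0.25";
```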
### hpa_slab_max_alloc
Maximum allocation size, in bytes, that can be served from the HPA page
allocator. Larger allocations are served from the more general (for now)
classic page allocator (PAC), which can handle allocation requests of any size.
Due to implementation quirks, slab allocations are always served out of HPA,
even when `hpa_slab_max_alloc` is set to a small value like `PAGE`. This quirk
can be leveraged to serve only small allocations out of HPA (in jemalloc terms,
a small allocation is one of less than 16 KiB).
Unsigned integer. Default value: `65536` bytes. Minimum value: `PAGE` bytes.
Maximum value: `HUGEPAGE` bytes.
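A simplified model of the routing rule described above, with a hypothetical function name. It looks only at the request size and at whether the request is for a slab, and ignores everything else jemalloc may consider.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE ((size_t)4096)
#define HUGEPAGE ((size_t)2097152)

/* Returns true if the request would be served by HPA, false if by PAC.
 * Slab requests always go to HPA (the implementation quirk); other requests
 * go to HPA only if they fit under hpa_slab_max_alloc. */
static bool served_by_hpa(size_t size, bool is_slab, size_t hpa_slab_max_alloc) {
    assert(hpa_slab_max_alloc >= PAGE && hpa_slab_max_alloc <= HUGEPAGE);
    if (is_slab) {
        return true;
    }
    return size <= hpa_slab_max_alloc;
}
```

Under this model, setting `hpa_slab_max_alloc` to `PAGE` sends every large allocation (16 KiB or more) to PAC, so only slabs, which back small allocations, come from HPA.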
### hpa_hugification_threshold
Minimum number of active bytes a pageslab needs to be placed into the
hugification queue.
A pageslab is always created in a non-huge state. Once its number of active
bytes becomes greater than or equal to `hpa_hugification_threshold`, jemalloc
puts the pageslab into the hugification queue.
Unsigned integer. Default value: `0.95 * HUGEPAGE` bytes. Minimum value: `PAGE`
bytes. Maximum value: `HUGEPAGE` bytes.
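A minimal sketch of the eligibility check, with hypothetical function names. It assumes the default threshold of `0.95 * HUGEPAGE` (exactly 1992294.4) is rounded down to `1992294` bytes; the exact rounding jemalloc applies is an assumption here. At that threshold a pageslab needs 487 of its 512 pages to be active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE ((size_t)4096)
#define HUGEPAGE ((size_t)2097152)

/* True once the pageslab's active bytes reach the threshold. */
static bool hugification_eligible(size_t nactive_pages, size_t threshold_bytes) {
    assert(threshold_bytes >= PAGE && threshold_bytes <= HUGEPAGE);
    return nactive_pages * PAGE >= threshold_bytes;
}

/* Smallest number of active pages that meets the threshold (rounds up). */
static size_t min_active_pages(size_t threshold_bytes) {
    return (threshold_bytes + PAGE - 1) / PAGE;
}
```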
### hpa_hugification_threshold_ratio
Minimum fraction of active bytes a pageslab needs to be placed into the
hugification queue.
This option has the same semantics as `hpa_hugification_threshold`, but is
expressed as a ratio of `HUGEPAGE` rather than in bytes.
Fixed-point fractional. Default value: `0.95`. Minimum value: `0`. Maximum
value: `1.0`.
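To make "fixed-point fractional" concrete, the sketch below assumes a 16.16 representation (16 integer bits, 16 fractional bits); the type and helper names are hypothetical and jemalloc's own fixed-point format may differ. It converts a ratio back into a byte threshold using integer arithmetic only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HUGEPAGE ((size_t)2097152)

/* Hypothetical 16.16 fixed-point type. */
typedef uint32_t fxp16_t;
#define FXP16_ONE ((fxp16_t)1 << 16)

/* Convert a ratio in [0, 1] to 16.16 fixed point, truncating. */
static fxp16_t fxp16_from_double(double ratio) {
    assert(ratio >= 0.0 && ratio <= 1.0);
    return (fxp16_t)(ratio * FXP16_ONE);
}

/* Byte threshold equivalent to HUGEPAGE * ratio. */
static size_t threshold_bytes_from_ratio(fxp16_t ratio) {
    return (size_t)(((uint64_t)HUGEPAGE * ratio) >> 16);
}
```

With the default `0.95`, truncation gives `62259 / 65536` and a threshold of `1992288` bytes, slightly below the exact `0.95 * 2097152 = 1992294.4`.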
### hpa_hugify_delay_ms
Time in milliseconds a pageslab must spend in the hugification queue before
jemalloc asks the OS to back it with a hugepage.
The hugification queue is ordered by the time each pageslab was placed into it:
the head is the pageslab enqueued earliest and the tail is the one enqueued most
recently. When a pageslab stops meeting the hugification criteria (its number of
active bytes drops below `hpa_hugification_threshold`), it is **not** removed
from the hugification queue. Only a purge can remove a pageslab from the
hugification queue.
Unsigned integer. Default value: `10000` milliseconds.
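The queue behaviour described above can be modeled as a FIFO of (pageslab, enqueue time) pairs. A minimal sketch with hypothetical names and a fixed capacity; because entries are ordered by enqueue time, only the head needs to be checked against `hpa_hugify_delay_ms`. Removal on purge is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP 64

/* Hypothetical ring buffer of pageslab ids ordered by enqueue time. */
typedef struct {
    int slab_id[QUEUE_CAP];
    uint64_t enqueued_ms[QUEUE_CAP];
    size_t head, len;
} hugify_queue_t;

/* Append a pageslab at the tail; now_ms must be non-decreasing. */
static void hq_push(hugify_queue_t *q, int slab_id, uint64_t now_ms) {
    assert(q->len < QUEUE_CAP);
    size_t tail = (q->head + q->len) % QUEUE_CAP;
    q->slab_id[tail] = slab_id;
    q->enqueued_ms[tail] = now_ms;
    q->len++;
}

/* Pop the head if it has waited at least delay_ms; returns its id or -1. */
static int hq_pop_ready(hugify_queue_t *q, uint64_t now_ms, uint64_t delay_ms) {
    if (q->len == 0 || now_ms - q->enqueued_ms[q->head] < delay_ms) {
        return -1;
    }
    int id = q->slab_id[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->len--;
    return id;
}
```

For example, with the default delay of `10000` ms, a pageslab enqueued at t = 1000 ms becomes ready at t = 11000 ms.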
### hpa_hugify_sync
Switch to synchronous hugification requests.
Uses `madvise(..., MADV_COLLAPSE)` alongside `madvise(..., MADV_HUGEPAGE)` to ask
the OS to back a pageslab with a hugepage. On failure, the
`stats.arenas.<i>.hpa_shard.nhugify_failures` counter is incremented.
The usual asynchronous hugification introduces a delay of unknown length between
the request to hugify a pageslab and the moment the OS actually backs the
pageslab with a hugepage. This option eliminates that delay. Requires Linux 6.1
or higher.
Boolean. Default value: `false`.
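A Linux-only sketch of the two `madvise` calls this option combines, with hypothetical helper names. `MADV_COLLAPSE` is defined by hand when older libc headers lack it (its value in the Linux UAPI headers is 25), and the local `nfailures` counter only mimics `nhugify_failures`; it is not the jemalloc statistic.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define HUGEPAGE ((size_t)2097152)

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* Linux >= 6.1; older libc headers may lack it */
#endif

/* Map 2 * HUGEPAGE, keep the HUGEPAGE-aligned middle, unmap the slack. */
static void *map_aligned_pageslab(void) {
    void *p = mmap(NULL, 2 * HUGEPAGE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    uintptr_t aligned = ((uintptr_t)p + HUGEPAGE - 1) & ~(uintptr_t)(HUGEPAGE - 1);
    size_t lead = aligned - (uintptr_t)p;
    if (lead > 0) {
        munmap(p, lead);
    }
    size_t trail = HUGEPAGE - lead;
    if (trail > 0) {
        munmap((void *)(aligned + HUGEPAGE), trail);
    }
    return (void *)aligned;
}

/* Returns 0 on success, -1 on failure. */
static int hugify_pageslab(void *slab, bool sync, size_t *nfailures) {
    /* Asynchronous path: mark the range eligible for transparent hugepages. */
    if (madvise(slab, HUGEPAGE, MADV_HUGEPAGE) != 0) {
        return -1;
    }
    /* Synchronous path: collapse now instead of waiting for khugepaged. */
    if (sync && madvise(slab, HUGEPAGE, MADV_COLLAPSE) != 0) {
        (*nfailures)++; /* e.g. EINVAL on kernels older than 6.1 */
        return -1;
    }
    return 0;
}
```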
### hpa_min_purge_interval_ms
Minimum time between two consecutive purge phases, in milliseconds.
Every `hpa_min_purge_interval_ms` milliseconds, jemalloc checks whether the
purging criteria are met and, if they are, purges as many pageslabs as needed
until the criteria are no longer met. The minimal unit of purging is a pageslab:
all dirty pages of a chosen pageslab are returned to the OS, even if fewer pages
need to be purged to reach the purging target. Runs of consecutive dirty pages
are purged together with a single syscall.
Unsigned integer. Default value: 5000 milliseconds.
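The run coalescing described above can be sketched as a scan over a per-pageslab dirty bitmap, modeled here as a plain `bool` array. `purge_range_fn` is a hypothetical stand-in for the actual purge syscall, and the code is not jemalloc's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HUGEPAGES_PAGES 512

/* Hypothetical callback standing in for the purge syscall. */
typedef void (*purge_range_fn)(size_t first_page, size_t npages, void *ctx);

/* Purge every dirty page of one pageslab, issuing one call per run of
 * consecutive dirty pages. Returns the number of calls issued. */
static size_t purge_pageslab(const bool dirty[HUGEPAGES_PAGES],
                             purge_range_fn purge, void *ctx) {
    size_t calls = 0;
    size_t i = 0;
    while (i < HUGEPAGES_PAGES) {
        if (!dirty[i]) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < HUGEPAGES_PAGES && dirty[i]) {
            i++;
        }
        if (purge != NULL) {
            purge(start, i - start, ctx);
        }
        calls++;
    }
    return calls;
}
```

A pageslab whose dirty pages are 0-2, 10, and 510-511 needs three calls instead of six.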
### hpa_peak_demand_window_ms
Length of the peak demand sliding window, in milliseconds.
Time component of the purging criteria. Jemalloc tracks the maximum number of
active pages used within a sliding window of `hpa_peak_demand_window_ms`
milliseconds and purges dirty pages above that peak usage.
It is easier to explain with an example. Suppose `hpa_peak_demand_window_ms` is
`10000`, `hpa_dirty_mult` is `0`, `ncurrent` is the number of active pages
currently in use, and `npeak` is the peak (maximum) number of active pages
within the last 10 seconds. Then jemalloc is allowed to keep `npeak - ncurrent`
dirty pages and will purge the rest, if there are any.
Option `hpa_peak_demand_window_ms` works in combination with `hpa_dirty_mult`.
Unsigned integer. Default value: 0 milliseconds (disabled by default).
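A minimal model of the time component, with hypothetical names: it stores timestamped samples of the active page count and takes the maximum over the window. It computes only `npeak - ncurrent`, which is the case where `hpa_dirty_mult` is `0`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SAMPLES 128

/* Hypothetical tracker: samples of the active page count with timestamps. */
typedef struct {
    uint64_t t_ms[MAX_SAMPLES];
    size_t nactive[MAX_SAMPLES];
    size_t n;
} peak_tracker_t;

static void peak_record(peak_tracker_t *pt, uint64_t now_ms, size_t nactive) {
    assert(pt->n < MAX_SAMPLES);
    pt->t_ms[pt->n] = now_ms;
    pt->nactive[pt->n] = nactive;
    pt->n++;
}

/* Peak over samples no older than window_ms; a window of 0 disables tracking,
 * so the peak is just ncurrent. Sample times must not exceed now_ms. */
static size_t peak_in_window(const peak_tracker_t *pt, uint64_t now_ms,
                             uint64_t window_ms, size_t ncurrent) {
    size_t npeak = ncurrent;
    for (size_t i = 0; i < pt->n; i++) {
        if (window_ms > 0 && now_ms - pt->t_ms[i] <= window_ms &&
            pt->nactive[i] > npeak) {
            npeak = pt->nactive[i];
        }
    }
    return npeak;
}

/* Dirty pages that may be kept when hpa_dirty_mult is 0. */
static size_t dirty_allowed(size_t npeak, size_t ncurrent) {
    return npeak > ncurrent ? npeak - ncurrent : 0;
}
```

Scanning every sample keeps the sketch simple; a bounded tracker would aggregate samples into a fixed number of time buckets instead.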
### hpa_dirty_mult
Maximum ratio of dirty to active pages jemalloc is allowed to keep.
Ratio-based component of the purging criteria.
Jemalloc tries to estimate the maximum amount of active memory the application
is likely to need in the near future. It does so by projecting future active
memory demand (based on the peak active memory usage observed within a sliding
window) and adding slack on top of it (an overhead that is reasonable to pay in
exchange for higher hugepage coverage). When peak demand tracking is off, the
projection of future active memory is the current active memory usage.
The estimate is essentially `npeak * (1 + hpa_dirty_mult)`, and jemalloc keeps
at most the difference between this estimate and `ncurrent` as dirty pages.
When `hpa_peak_demand_window_ms` is set to `0`, `npeak` equals `ncurrent` and
the expression becomes `ncurrent * (1 + hpa_dirty_mult)`, so up to
`ncurrent * hpa_dirty_mult` dirty pages are kept. When `hpa_dirty_mult` is `0`,
the expression becomes just `npeak`.
Option `hpa_dirty_mult` works in combination with `hpa_peak_demand_window_ms`.
Fixed-point fractional or `-1`. Default value: `0.25` (not a great default).
Setting it to `-1` disables purging completely.
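The combined purging limit can be written as a small function. This sketch uses `double` for `hpa_dirty_mult` for readability (the real option is fixed-point), treats `-1` as "no limit", and uses a hypothetical function name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Maximum dirty pages to keep: npeak * (1 + dirty_mult) - ncurrent.
 * A dirty_mult of -1 disables purging, modeled as SIZE_MAX. */
static size_t max_dirty_pages(size_t npeak, size_t ncurrent, double dirty_mult) {
    if (dirty_mult == -1.0) {
        return SIZE_MAX;
    }
    assert(dirty_mult >= 0.0 && npeak >= ncurrent);
    size_t estimate = (size_t)((double)npeak * (1.0 + dirty_mult));
    return estimate > ncurrent ? estimate - ncurrent : 0;
}
```

Under this model, with the default `0.25`, `npeak = 1000` and `ncurrent = 800`, jemalloc may keep up to `1000 * 1.25 - 800 = 450` dirty pages.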
### hpa_sec_nshards
Number of small extent cache (SEC) shards.
The SEC is a cache layer above the HPA page allocator. Requests are distributed
across the SEC shards `[0, nshards)`. If a request cannot be served out of the
SEC, it is forwarded to the HPA page allocator.
I have not seen many cases where the SEC helped much. More work is probably
required to make the SEC useful.
Unsigned integer. Default value: `4` shards. Setting it to `0` disables the
small extent cache (SEC).
### hpa_sec_max_alloc
Maximum allocation size, in bytes, that can be served out of the SEC.
Jemalloc does not cache objects larger than `hpa_sec_max_alloc`; requests for
them are forwarded to the HPA page allocator.
Unsigned integer. Default value: `32768` bytes. Minimum value: `PAGE` bytes.
Maximum value: `32768` bytes.
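A sketch of how a request could be routed to a SEC shard or straight to HPA, with hypothetical names. The shard choice (a per-thread number modulo `nshards`) is an assumption for illustration; the text above only says that requests are spread across shards.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE ((size_t)4096)

/* Returns a shard index in [0, nshards), or -1 when the request must go to
 * the HPA page allocator (SEC disabled or request too large). */
static int sec_pick_shard(size_t size, unsigned thread_no, size_t nshards,
                          size_t sec_max_alloc) {
    assert(sec_max_alloc >= PAGE && sec_max_alloc <= 32768);
    if (nshards == 0 || size > sec_max_alloc) {
        return -1;
    }
    return (int)(thread_no % nshards);
}
```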
### hpa_sec_max_bytes
Maximum number of bytes a single SEC shard is allowed to cache.
When the number of bytes cached by a shard exceeds `hpa_sec_max_bytes`, jemalloc
flushes bins until the number of cached bytes falls below
`hpa_sec_bytes_after_flush`.
Unsigned integer. Default value: `262144` bytes. Minimum value: `PAGE`.
### hpa_sec_bytes_after_flush
Maximum number of bytes a SEC shard may keep cached after a flush triggered by
exceeding `hpa_sec_max_bytes`.
This option should be less than `hpa_sec_max_bytes` for the SEC to be useful.
Unsigned integer. Default value: `131072` bytes. Minimum value: `PAGE`.
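A sketch of one shard's flush policy, with hypothetical names. The order in which jemalloc flushes bins is not described here, so the sketch simply flushes whole bins in index order.

```c
#include <assert.h>
#include <stddef.h>

/* bin_bytes[i] is the number of bytes cached in bin i of one shard. When the
 * total exceeds max_bytes, whole bins are flushed (returned to HPA) until the
 * total is at most bytes_after_flush. Returns the new total. */
static size_t sec_maybe_flush(size_t bin_bytes[], size_t nbins,
                              size_t max_bytes, size_t bytes_after_flush) {
    assert(bytes_after_flush <= max_bytes);
    size_t total = 0;
    for (size_t i = 0; i < nbins; i++) {
        total += bin_bytes[i];
    }
    if (total <= max_bytes) {
        return total;
    }
    for (size_t i = 0; i < nbins && total > bytes_after_flush; i++) {
        total -= bin_bytes[i];
        bin_bytes[i] = 0;
    }
    return total;
}
```

Keeping `hpa_sec_bytes_after_flush` well below `hpa_sec_max_bytes` leaves headroom, so a shard does not flush again on nearly every insertion.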
### hpa_sec_batch_fill_extra
Number of extra objects to fill on a SEC miss.
When an allocation request cannot be satisfied out of the SEC because no
suitable objects are cached, jemalloc fetches `hpa_sec_batch_fill_extra`
additional objects from the HPA page allocator into the SEC.
Unsigned integer. Default value: `0`. Maximum value: `HUGEPAGES_PAGES`.
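A counter-based sketch of batch filling for one SEC bin, with hypothetical names: a miss takes `1 + hpa_sec_batch_fill_extra` objects from HPA, returns one to the caller, and caches the rest, so the next few requests become hits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SEC bin: cached object count plus a counter of objects
 * taken from the HPA page allocator. */
typedef struct {
    size_t ncached;
    size_t hpa_allocs;
} sec_bin_t;

/* Serve one allocation request from this bin. */
static void sec_alloc(sec_bin_t *bin, size_t batch_fill_extra) {
    assert(bin != NULL);
    if (bin->ncached > 0) {
        bin->ncached--; /* hit: served from the cache */
        return;
    }
    bin->hpa_allocs += 1 + batch_fill_extra; /* miss: batch from HPA */
    bin->ncached += batch_fill_extra;
}
```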
### experimental_hpa_max_purge_nhp
Maximum number of pageslabs to purge in each purging phase.
Experimental option that will likely be removed soon. Limits the number of
pageslabs purged in each purging phase.
Signed integer. Default value: `-1` (disabled by default).
## Acknowledgements
Thanks to Kevin Svetlitski, whose note introduced me to the HPA world.