-
Notifications
You must be signed in to change notification settings - Fork 14
/
THOUGHTS
184 lines (144 loc) · 5.82 KB
/
THOUGHTS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
GHC Inefficiencies:
Suspended functions (thunks) are padded with 1 word. 2016-10-28
This padding is necessary to atomically update thunks with an indirection.
let x = succ 10 in ....
GHC
┌──────┬─────┬──┐ ┌────┬────┐
│ succ │ pad │ ├─┤ I# │ 10 │
└──────┴─────┴──┘ └────┴────┘
3 words
After eval + update:
GHC
┌─────┬───┬──┐ ┌────┬────┐
│ Ind │ │ ├─┤ I# │ 10 │
└─────┴─┬─┴──┘ └────┴────┘
┌──┴─┬────┐
│ I# │ 11 │
└────┴────┘
Graph compression: (written 2016-10-30)
Inlining of data 32bit or less:
Object tags fit in 32bits which leaves us 32bit for extra data. For example,
characters only take up a single word compared to two words in GHC.
Consider the memory layout (each box is one word) of 'a' in LHC and GHC:
GHC LHC
┌────┬─────┐ ┌────────┐
│ C# │ 'a' │ │ C# 'a' │
└────┴─────┘ └────────┘
2 words 1 word
Inlining fixed-sized objects:
Objects with a fixed size (ie. not containing any heap pointers) can be
inlined into their parent object. A bitmap in the parent tag indicates
which objects have been inlined.
Memory layout of (Just ()):
GHC LHC
┌──────┬───┐ ┌──────┬──────┐
│ Just │ │ │ Just │ Unit │
└──────┴─┬─┘ └──────┴──────┘
┌───┴──┐
│ Unit │
└──────┘
Inlining doesn't affect sharing. Other objects can still refer to an object
that has been inlined.
Inlining tail pointers:
The last pointer in an object can always be inlined, even if the pointee
itself contains heap pointers.
Consider the memory layout of 'h':'i':[]:
GHC (11 words) LHC (5 words)
┌───┬───┬───┐┌───┬───┬───┐┌────┐ ┌───┬────────┬───┬────────┬────┐
│ : │ │ ├┤ : │ │ ├┤ [] │ │ : │ C# 'h' │ : │ C# 'i' │ [] │
└───┴─┬─┴───┘└───┴─┬─┴───┘└────┘ └───┴────────┴───┴────────┴────┘
┌──┴─┬─────┐ ┌──┴─┬─────┐
│ C# │ 'h' │ │ C# │ 'i' │
└────┴─────┘ └────┴─────┘
Pretty code lowering:
GHC's type-checker likes to re-order definitions. The `haskell-tc` library
guarantees to add type-annotations without re-ordering anything.
Desugaring:
Complicated pattern matching:
GHC tends to use this pattern:
fn X Y = branchCodeTop
fn _ _ = branchCodeBottom```
let fail = branchCodeBottom
in case a of
X -> case b of
Y -> branchCodeTop
_ -> fail
_ -> fail
This in undesirable because it rearranges the order of the code.
It IS acceptable to re-order the branches when the order of the checks is actually changed. Consider:
fn (Just True) = branch1
fn Nothing = branch2
fn (Just False) = branch3
It is OK to desugar like this:
case arg of
Just bool -> case bool of
True -> branch1
False -> branch3
Nothing -> branch2
True for all GC strategies, handled by LHC:
Heap pointers are tagged
1 bit for inlinable (always false on 32bit systems)
2 bits for Generation #
Heap objects are tagged
18 bits for the node ID
14 bits for strategy, for example:
3 bits for stepping
1 bit for marking
10 bits for bitmap of inlined children
optional 32 bits for small field
Node API
Node Layouts:
(,) ptr ptr
(,) ptr node
(,) (static node) node
getTag :: Ptr Node -> Tag
getSmallField :: Ptr Node -> I32
getAddressOf :: Ptr Node -> Int -> Ptr Node
evacuate
scavenge
Semi space GC:
init:
allocate space
set hp
set hp_limit
start_gc:
allocate to-space (same size as from-space)
mark:
semi_space_evacuate(ptr, n_prims, n_ptrs)
mark_frame:
semi_space_evacuate_frame(ptr, n_prims, n_ptrs)
end_gc:
allocate space (2 x live data)
scavenge:
semi_space_evacuate(ptr, n_prims, n_ptrs):
if already in to-space, return;
copy over object
How to add special closures?
unsafePerformIO
Need to use CAS to guarantee it is only executed once even with multiple threads.
MVar
IORef
Array
ByteArray
Garbage collection:
Three generations:
nursery
semi-space
mark-and-sweep (and compact? immix?)
Nursery
Fixed size. Space reused. Lives entirely in cache.
Per-OS-thread
Semi-space
Large allocation area size
Incremental
Concurrent
Mark-and-sweep
Incremental
Concurrent
Let's say we have 4 threads. Semi-space gen decides it is time for a collection.
It sets a flag and waits. One thread fills its nursery, copies over the live
objects and evacuates any object from gen 1 that are reachable.
Garbage collector thread can begin scavenging. Scavenging is not done until
all threads have finished evacuating.
Only use per-object lock for mutable objects. Evacuation might duplicate some
objects.