THOUGHTS

GHC Inefficiencies:
  Suspended functions (thunks) are padded with 1 word. 2016-10-28
    This padding is necessary to atomically update thunks with an indirection.

    let x = succ 10 in ....

       GHC
    ┌──────┬─────┬──┐ ┌────┬────┐
    │ succ │ pad │  ├─┤ I# │ 10 │
    └──────┴─────┴──┘ └────┴────┘
          3 words

    After eval + update:

      GHC
    ┌─────┬───┬──┐ ┌────┬────┐
    │ Ind │   │  ├─┤ I# │ 10 │
    └─────┴─┬─┴──┘ └────┴────┘
         ┌──┴─┬────┐
         │ I# │ 11 │
         └────┴────┘


Graph compression: (written 2016-10-30)
  Inlining of data 32bit or less:
    Object tags fit in 32bits which leaves us 32bit for extra data. For example,
    characters only take up a single word compared to two words in GHC.

    Consider the memory layout (each box is one word) of 'a' in LHC and GHC:

         GHC             LHC
    ┌────┬─────┐      ┌────────┐
    │ C# │ 'a' │      │ C# 'a' │
    └────┴─────┘      └────────┘
       2 words          1 word

  Inlining fixed-sized objects:
    Objects with a fixed size (ie. not containing any heap pointers) can be
    inlined into their parent object. A bitmap in the parent tag indicates
    which objects have been inlined.

    Memory layout of (Just ()):

        GHC                 LHC
    ┌──────┬───┐      ┌──────┬──────┐
    │ Just │   │      │ Just │ Unit │
    └──────┴─┬─┘      └──────┴──────┘
         ┌───┴──┐
         │ Unit │
         └──────┘

    Inlining doesn't affect sharing. Other objects can still refer to an object
    that has been inlined.

  Inlining tail pointers:
    The last pointer in an object can always be inlined, even if the pointee
    itself contains heap pointers.

    Consider the memory layout of 'h':'i':[]:

                 GHC (11 words)                 LHC (5 words)
    ┌───┬───┬───┐┌───┬───┬───┐┌────┐  ┌───┬────────┬───┬────────┬────┐
    │ : │   │   ├┤ : │   │   ├┤ [] │  │ : │ C# 'h' │ : │ C# 'i' │ [] │
    └───┴─┬─┴───┘└───┴─┬─┴───┘└────┘  └───┴────────┴───┴────────┴────┘
       ┌──┴─┬─────┐ ┌──┴─┬─────┐
       │ C# │ 'h' │ │ C# │ 'i' │
       └────┴─────┘ └────┴─────┘


Pretty code lowering:
  GHC's type-checker likes to re-order definitions. The `haskell-tc` library
  guarantees to add type-annotations without re-ordering anything.

  Desugaring:
    Complicated pattern matching:
      GHC tends to use this pattern:
        fn X Y = branchCodeTop
        fn _ _ = branchCodeBottom```

        let fail = branchCodeBottom
        in case a of
             X -> case b of
                    Y -> branchCodeTop
                    _ -> fail
             _ -> fail

        This in undesirable because it rearranges the order of the code.

        It IS acceptable to re-order the branches when the order of the checks is actually changed. Consider:
        fn (Just True) = branch1
        fn Nothing = branch2
        fn (Just False) = branch3

        It is OK to desugar like this:
        case arg of
          Just bool -> case bool of
            True  -> branch1
            False -> branch3
          Nothing -> branch2


True for all GC strategies, handled by LHC:
  Heap pointers are tagged
    1 bit for inlinable (always false on 32bit systems)
    2 bits for Generation #
  Heap objects are tagged
    18 bits for the node ID
    14 bits for strategy, for example:
      3 bits for stepping
      1 bit for marking
      10 bits for bitmap of inlined children
    optional 32 bits for small field

Node API
  Node Layouts:
      (,) ptr ptr
      (,) ptr node
      (,) (static node) node
  getTag :: Ptr Node -> Tag
  getSmallField :: Ptr Node -> I32
  getAddressOf :: Ptr Node -> Int -> Ptr Node


evacuate
scavenge


Semi space GC:
  init:
    allocate space
    set hp
    set hp_limit
  start_gc:
    allocate to-space (same size as from-space)
  mark:
    semi_space_evacuate(ptr, n_prims, n_ptrs)
  mark_frame:
    semi_space_evacuate_frame(ptr, n_prims, n_ptrs)
  end_gc:
    allocate space (2 x live data)
  scavenge:


  semi_space_evacuate(ptr, n_prims, n_ptrs):
    if already in to-space, return;
    copy over object


How to add special closures?
  unsafePerformIO
    Need to use CAS to guarantee it is only executed once even with multiple threads.
  MVar
  IORef
  Array
  ByteArray


Garbage collection:
  Three generations:
    nursery
    semi-space
    mark-and-sweep (and compact? immix?)

  Nursery
    Fixed size. Space reused. Lives entirely in cache.
    Per-OS-thread
  Semi-space
    Large allocation area size
    Incremental
    Concurrent
  Mark-and-sweep
    Incremental
    Concurrent

  Let's say we have 4 threads. Semi-space gen decides it is time for a collection.
  It sets a flag and waits. One thread fills its nursery, copies over the live
  objects and evacuates any object from gen 1 that are reachable.
  Garbage collector thread can begin scavenging. Scavenging is not done until
  all threads have finished evacuating.
  Only use per-object lock for mutable objects. Evacuation might duplicate some
  objects.