How much overhead is associated with managing a ThreadLocal object?

Doug Bell

A ThreadLocal is implemented using a synchronized WeakHashMap with the Thread object as the key.

The time overhead for calling ThreadLocal.get() is essentially the synchronization overhead (with the potential for blocking other threads that are trying to access the same ThreadLocal) plus the WeakHashMap.get() overhead. A WeakHashMap is implemented using a HashMap, and the HashMap documentation describes the performance characteristics of its methods.
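As a rough illustration of that design, here is a minimal sketch of a thread-local built on a synchronized WeakHashMap keyed by Thread. The class name MapBackedThreadLocal is hypothetical, and this is a model of the description above rather than the JDK source: it leaves out the value wrapper and initialValue(), and uses generics for readability.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical model of the design described in the text; not the JDK source.
class MapBackedThreadLocal<T> {
    // Weak keys allow an entry to be dropped once its Thread has been GC'ed.
    // The initial capacity of 53 is the figure quoted in the text.
    private final Map<Thread, T> values =
            Collections.synchronizedMap(new WeakHashMap<Thread, T>(53));

    // Each call acquires the wrapper's lock, then performs one hash lookup.
    T get() {
        return values.get(Thread.currentThread());
    }

    // Stores the value under the calling thread, under the same lock.
    void set(T value) {
        values.put(Thread.currentThread(), value);
    }
}
```

Because every get() and set() goes through the one lock of the synchronized wrapper, threads that share a single instance contend with each other even though each reads only its own entry.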

The space overhead for a ThreadLocal can be divided into two categories: per ThreadLocal overhead and per Thread overhead. The following probably tells you more than you need or want to know regarding the size overhead for a thread local, but since you asked...

When a ThreadLocal is instantiated, it creates the underlying synchronized WeakHashMap with an initial capacity of 53. The synchronized WeakHashMap is a wrapper Map class (with five reference fields, including a mutex lock) wrapped around a WeakHashMap. The WeakHashMap, which also has five reference fields, in turn creates a HashMap with the same capacity and a ReferenceQueue to keep track of any GC'ed hash entries. The ReferenceQueue needs an object to use as a mutex lock and a Reference field to hold the head of the queue. The HashMap, which has six reference fields, three int fields and a float field, allocates an array of 53 initially null hash entries. So, assuming 32-bit references on a 32-bit processor and 4 bytes of overhead per object, instantiating a ThreadLocal should cost a total of:

    ThreadLocal                         4
        synchronized wrapper Map        4
            5 references               20
            mutex lock object           4 + native mutex lock
            WeakHashMap                 4
                5 references           20
                ReferenceQueue          4
                    mutex lock object   4 + native mutex lock
                    2 references        8
                HashMap                 4
                    6 references       24
                    3 ints, 1 float    16
                    entry[53]         220
                                    = 336 bytes + 2 native mutex locks
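The total in the table can be checked with a short calculation. This is only a sketch: the class and method names are hypothetical, the sizes are the 4-byte figures assumed above, and the 220 bytes for the entry array is reproduced by assuming a 4-byte header plus a 4-byte length word in front of the 53 slots, which the table does not spell out. Native mutex locks are not counted.

```java
// Hypothetical helper that re-adds the per-ThreadLocal table above.
class ThreadLocalFootprint {
    static final int HEADER = 4; // assumed per-object overhead, in bytes
    static final int WORD = 4;   // assumed size of a reference, int or float

    // Size of one object: its header plus the given number of 4-byte fields.
    static int object(int fields) {
        return HEADER + fields * WORD;
    }

    static int perThreadLocalBytes() {
        int threadLocal = HEADER;          // counted as a bare header in the table
        int syncWrapper = object(5);       // 5 references
        int wrapperMutex = HEADER;         // lock object, native lock excluded
        int weakHashMap = object(5);       // 5 references
        int referenceQueue = object(2);    // 2 references
        int queueMutex = HEADER;           // lock object, native lock excluded
        int hashMap = object(6 + 3 + 1);   // 6 references, 3 ints, 1 float
        // Assumption: header + length word + 53 reference slots.
        int entryArray = HEADER + WORD + 53 * WORD;
        return threadLocal + syncWrapper + wrapperMutex + weakHashMap
                + referenceQueue + queueMutex + hashMap + entryArray;
    }

    public static void main(String[] args) {
        System.out.println(perThreadLocalBytes() + " bytes + 2 native mutex locks");
    }
}
```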

The initial capacity of 53 means that the WeakHashMap will not need to be rehashed to a larger size until 38 Threads have stored a value in the ThreadLocal. So the 39th Thread (assuming none have been GC'ed) will trigger a rehash and increase the per-ThreadLocal overhead.

Each of the first 38 threads that access a value in the ThreadLocal will have an entry added to the hash. ThreadLocal wraps each value before inserting it in the hash so that the wrapper's hashCode() and equals() methods are used instead of the value's. To be inserted in the weak hash, the wrapper needs a WeakKey reference and a hash entry. The WeakKey has a reference and an int, and the hash entry contains three references and an int. So each thread that accesses the ThreadLocal costs an additional:

    Value wrapper           4
        value reference     4
    WeakKey                 4
        1 int               4
        key reference       4
    Hash entry              4
        3 references       12
        1 int               4
                         = 40 bytes
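Putting the two tables together gives a simple estimate of the total space for a ThreadLocal used by a given number of threads. The sketch below uses hypothetical names and takes its constants straight from the text: 336 bytes per ThreadLocal, 4-byte fields and headers, and 38 threads before the first rehash. It does not model the cost after a rehash, and it excludes native mutex locks.

```java
// Hypothetical estimator combining the per-ThreadLocal and per-Thread figures.
class ThreadLocalPerThreadCost {
    static final int PER_THREAD_LOCAL = 336;     // total of the first table
    static final int THREADS_BEFORE_REHASH = 38; // figure quoted in the text

    static int perThreadBytes() {
        int valueWrapper = 4 + 4;         // header + value reference
        int weakKey = 4 + 4 + 4;          // header + int + key reference
        int hashEntry = 4 + 3 * 4 + 4;    // header + 3 references + int
        return valueWrapper + weakKey + hashEntry;
    }

    // Estimated bytes for one ThreadLocal accessed by the given number of threads.
    static int totalBytes(int threads) {
        if (threads < 0 || threads > THREADS_BEFORE_REHASH) {
            throw new IllegalArgumentException(
                    "estimate only covers 0.." + THREADS_BEFORE_REHASH + " threads");
        }
        return PER_THREAD_LOCAL + threads * perThreadBytes();
    }

    public static void main(String[] args) {
        System.out.println(totalBytes(10) + " bytes for 10 threads");
    }
}
```

Under these assumptions, ten threads come to 336 + 10 * 40 = 736 bytes, and the full 38 threads to 1856 bytes.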

There is some additional overhead, in both space and time, as threads are garbage collected, but it's probably fairly inconsequential unless there are lots of short-lived threads accessing the ThreadLocal.