How much overhead is associated with managing a ThreadLocal object?
A ThreadLocal is implemented using a synchronized WeakHashMap with the Thread object as the key.
The time overhead of calling ThreadLocal.get() is essentially the synchronization overhead (with the potential to block other threads attempting to access the same ThreadLocal) plus the WeakHashMap.get() overhead. A WeakHashMap is implemented using a HashMap, and the HashMap documentation describes the performance characteristics of its methods.
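To make the design described above concrete, here is a minimal sketch of a thread-local built on a synchronized WeakHashMap keyed by Thread. The class name MapBackedThreadLocal is hypothetical and this is not the JDK source; it also omits the value wrapper discussed later and only illustrates why every get() pays for a lock plus a hash lookup.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical illustration only; not the JDK implementation.
class MapBackedThreadLocal<T> {
    // One map shared by all threads. The synchronized wrapper takes a lock
    // on every call, so concurrent callers can block each other.
    private final Map<Thread, T> values =
            Collections.synchronizedMap(new WeakHashMap<Thread, T>(53));

    // Cost: acquire the wrapper's lock, then one WeakHashMap.get().
    T get() {
        return values.get(Thread.currentThread());
    }

    // Cost: acquire the wrapper's lock, then one WeakHashMap.put().
    void set(T value) {
        values.put(Thread.currentThread(), value);
    }

    public static void main(String[] args) throws InterruptedException {
        MapBackedThreadLocal<String> local = new MapBackedThreadLocal<>();
        local.set("main value");

        Thread worker = new Thread(() -> {
            // The worker is a different key, so it has its own slot.
            local.set("worker value");
            System.out.println("worker sees: " + local.get());
        });
        worker.start();
        worker.join();

        System.out.println("main sees: " + local.get());
    }
}
```

Because the Thread is held only as a weak key, an entry becomes eligible for removal once its thread is garbage collected, provided the stored value does not itself strongly reference the thread.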
When a ThreadLocal is instantiated, it creates the underlying synchronized WeakHashMap with an initial capacity of 53. The synchronized WeakHashMap is a wrapper Map class (which has five reference fields, including a mutex lock) wrapped around a WeakHashMap. The WeakHashMap (which also has five reference fields) in turn creates a HashMap with the same capacity and a ReferenceQueue to keep track of any GC'ed hash entries. The ReferenceQueue needs an object to use as a mutex lock and a Reference field to hold the head of the queue. The HashMap (which has six reference fields, three int fields, and one float field) allocates an array of 53 initially null hash entries. So, assuming a 32-bit reference type on a 32-bit processor and a resulting 4-byte overhead per object, the space cost of instantiating a ThreadLocal should total:
ThreadLocal                          4
synchronized wrapper Map             4
    5 references                    20
    mutex lock object                4   + native mutex lock
WeakHashMap                          4
    5 references                    20
ReferenceQueue                       4
    mutex lock object                4   + native mutex lock
    2 references                     8
HashMap                              4
    6 references                    24
    3 ints, 1 float                 16
    entry array                    220
                                 -----
                                 = 336 bytes + 2 native mutex locks
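The itemized total can be checked with a short calculation. The sketch below simply re-adds the figures listed above under the same assumptions (4-byte references, 4-byte per-object overhead); the class and method names are hypothetical, and the 220-byte entry array figure is taken as given rather than derived. Native mutex locks are not counted.

```java
// Hypothetical helper that re-adds the itemized figures; not a measurement.
class ThreadLocalFixedOverhead {
    static final int REF = 4;     // assumed 32-bit reference
    static final int OBJECT = 4;  // assumed per-object overhead
    static final int INT = 4;
    static final int FLOAT = 4;

    static int fixedBytes() {
        int threadLocal = OBJECT;
        // Wrapper object, its five reference fields, and its mutex lock object.
        int syncWrapper = OBJECT + 5 * REF + OBJECT;
        int weakHashMap = OBJECT + 5 * REF;
        // Queue object, its mutex lock object, and two references.
        int referenceQueue = OBJECT + OBJECT + 2 * REF;
        int hashMap = OBJECT + 6 * REF + 3 * INT + FLOAT;
        int entryArray = 220;     // figure for the 53-slot array, as listed
        return threadLocal + syncWrapper + weakHashMap
                + referenceQueue + hashMap + entryArray;
    }

    public static void main(String[] args) {
        System.out.println(fixedBytes() + " bytes + 2 native mutex locks");
    }
}
```

Changing REF or OBJECT gives a rough feel for how the estimate would shift under different assumptions, although the entry array figure would need to be adjusted by hand.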
The initial capacity of 53 means that the WeakHashMap will not need to be rehashed to a larger size until 38 Threads have stored a value in the ThreadLocal. So the 39th Thread (assuming none have been GC'ed) will increase the per-ThreadLocal overhead. Each of the first 38 threads that access a value in the ThreadLocal will have an entry added to the hash. ThreadLocal wraps each value before inserting it in the hash so that the wrapper's hashCode() and equals() methods are used instead of the value's. To be inserted in the weak hash, the wrapper needs a WeakKey reference and a hash entry. The WeakKey has a reference and an int, and the hash entry contains three references and an int. So each thread that accesses the ThreadLocal costs an additional:
Value wrapper                        4
    value reference                  4
WeakKey                              4
    1 int                            4
    key reference                    4
Hash entry                           4
    3 references                    12
    1 int                            4
                                 -----
                                 = 40 bytes
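Putting the two figures together gives a rough footprint for a ThreadLocal used by n threads. The sketch below assumes the 336-byte fixed cost and the per-thread items listed above, and it only covers up to 38 threads, since beyond that the hash is resized and the fixed cost changes. The names ThreadLocalFootprint, perThreadBytes, and totalBytes are hypothetical.

```java
// Hypothetical estimator based on the figures in this answer; not a measurement.
class ThreadLocalFootprint {
    static final int FIXED_BYTES = 336;        // per-ThreadLocal cost stated above
    static final int MAX_BEFORE_REHASH = 38;   // threads that fit before a rehash

    static int perThreadBytes() {
        int valueWrapper = 4 + 4;       // object overhead + value reference
        int weakKey = 4 + 4 + 4;        // object overhead + int + key reference
        int hashEntry = 4 + 3 * 4 + 4;  // object overhead + 3 references + int
        return valueWrapper + weakKey + hashEntry;
    }

    static int totalBytes(int threads) {
        if (threads < 0 || threads > MAX_BEFORE_REHASH) {
            // The model does not cover the resized table.
            throw new IllegalArgumentException("model covers 0..38 threads");
        }
        return FIXED_BYTES + threads * perThreadBytes();
    }

    public static void main(String[] args) {
        System.out.println("per thread: " + perThreadBytes() + " bytes");
        System.out.println("10 threads: " + totalBytes(10) + " bytes");
        System.out.println("38 threads: " + totalBytes(38) + " bytes");
    }
}
```

The totals exclude the stored values themselves and the two native mutex locks, so they are a lower bound on the real cost.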
There is some additional overhead, in both space and time, as threads are garbage collected, but it is probably inconsequential unless many short-lived threads access the ThreadLocal.