9/18/2013

09-18-13 - Per-Thread Global State Overrides

I wrote about this before ( cbloom rants 11-23-12 - Global State Considered Harmful ) but I'm doing it again because I think almost nobody does it right, so I'm gonna be really pedantic.

For concreteness, let's talk about a Log system that is controlled by bit-flags. So you have a "state" variable that is a bitwise OR of flags. The flags are things like where you output to (LOG_TO_FILE, LOG_TO_OUTPUTDEBUGSTRING, etc.) and maybe things like subsection enables (LOG_SYSTEM_IO, LOG_SYSTEM_RENDERER, ...) or verbosity (LOG_V0, LOG_V1, ...). Maybe some bits of the state are an indent level, etc.

So clearly you have a global state where the user/programmer has set the options they want for the log.

But you also need a TLS state. You want to be able to do things like disable the log in scopes :


..

U32 oldState = Log_SetState(0);

FunctionThatLogsTooMuch();

Log_SetState(oldState);

..

(and in practice it's nice to use a scoper-class to do that for you). If you do that on the global variable, your thread is fucking up the state of other threads, so clearly it needs to be per-thread, eg. in the TLS. (similarly, you might want to inc the indent level for a scope, or change the verbosity level, etc.).

(note of course this is the "system has a stack of states which is implemented in the program stack").
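A scoper for that might look something like the following. This is only a sketch: LogStateScoper is a made-up name, and the Log_SetState here is a stand-in that works on a C++11 thread_local variable so the example compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t U32;

// Stand-in for the per-thread log state (assumption; everything on by default)
static thread_local U32 tls_state = 0xFFFFFFFFu;

// Stand-in with the same shape as the Log_SetState call above :
// set the new state, return the old one
static U32 Log_SetState( U32 state )
{
    U32 old = tls_state;
    tls_state = state;
    return old;
}

// Hypothetical scoper : sets a state in the constructor and puts the
// previous state back in the destructor, so the restore can't be forgotten
class LogStateScoper
{
public:
    explicit LogStateScoper( U32 state ) : m_old( Log_SetState(state) ) { }
    ~LogStateScoper() { Log_SetState(m_old); }

    LogStateScoper( const LogStateScoper & ) = delete;
    LogStateScoper & operator = ( const LogStateScoper & ) = delete;

private:
    U32 m_old;
};

// Returns the state that code inside the scope would see
static U32 StateInsideQuietScope()
{
    LogStateScoper quiet(0);
    // FunctionThatLogsTooMuch() would be called here
    return tls_state;
}
```

Because the old state lives in a local object, nested scopers unwind in the right order for free, which is the "stack of states in the program stack" idea.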

So clearly, those need to be Log_SetLocalState. Then the functions that are used to set the overall options should be something like Log_SetGlobalState.

Now some notes on how the implementation works.

The global state has to be thread safe. It should just be an atomic var :


static U32 s_log_global_state;

U32 Log_SetGlobalState( U32 state )
{
    // set the new state and return the old; this must be an exchange

    U32 ret = Atomic_Exchange(&s_log_global_state, state , mo_acq_rel);

    return ret;
}

U32 Log_GetGlobalState( )
{
    // probably could be relaxed but WTF let's just acquire

    U32 ret = Atomic_Load(&s_log_global_state, mo_acquire);

    return ret;
}

(note that I sort of implicitly assume that there's only one thread (a "main" thread) that is setting the global state; generally it's set by command line or .ini options, and maybe from user keys in a HUD. The global state is not being fiddled with by lots of threads at run time, because that creates races. If you did want several threads to change it, eg. to turn on the LOG_TO_FILE bit, it should be done with a CAS loop or an Atomic OR, not by doing a _Get and then _Set).
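For reference, here is roughly what the Atomic OR and the CAS loop look like with C++11 std::atomic. It's a sketch under assumptions: the function names and the LOG_TO_FILE value are invented for the example, and std::atomic stands in for whatever Atomic_ wrappers you actually have.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

typedef uint32_t U32;

// hypothetical flag value, for illustration only
static const U32 LOG_TO_FILE = 1u << 0;

static std::atomic<U32> s_state(0);

// Atomic OR : turns the bits on in one indivisible step, returns the old state
U32 Log_GlobalState_Or( U32 bits )
{
    return s_state.fetch_or( bits, std::memory_order_acq_rel );
}

// CAS loop : same result here, but this form works for any read-modify-write
U32 Log_GlobalState_Or_CAS( U32 bits )
{
    U32 old = s_state.load( std::memory_order_relaxed );

    // if another thread changed the state in between, compare_exchange_weak
    // fails, reloads "old" with the current value, and we try again
    while ( ! s_state.compare_exchange_weak( old, old | bits,
                std::memory_order_acq_rel, std::memory_order_relaxed ) )
    {
    }

    return old;
}
```

The Get-then-Set version has a window between the load and the store where another thread's change gets silently overwritten; both forms above close that window.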

Now the Local functions need to set the state in the TLS and *also* a mask of which bits are set in the local state. So the actual function is like :


per_thread U32_pair tls_log_local_state;

U32_pair Log_SetLocalState( U32 state , U32 state_set_mask )
{
    // read TLS :

    U32_pair ret = tls_log_local_state;

    // write TLS :

    tls_log_local_state = U32_pair( state, state_set_mask );

    return ret;
}

U32_pair Log_GetLocalState( )
{
    // read TLS :

    U32_pair ret = tls_log_local_state;

    return ret;
}

Note obviously no atomics or mutexes are needed in per-thread functions.
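One thing to watch with this interface: Log_SetLocalState replaces the whole (state, mask) pair, so a nested scope that only cares about a few bits would wipe out the overrides of an outer scope. A possible helper for that case is sketched below. Log_OverrideLocalBits is a hypothetical name, and U32_pair is assumed to be std::pair with first = state and second = mask, with thread_local standing in for per_thread.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

typedef uint32_t U32;
typedef std::pair<U32,U32> U32_pair; // first = state bits, second = which bits are set

static thread_local U32_pair tls_log_local_state(0,0);

// Same as the function above : replace the whole pair, return the old one
static U32_pair Log_SetLocalState( U32 state , U32 state_set_mask )
{
    U32_pair ret = tls_log_local_state;
    tls_log_local_state = U32_pair( state, state_set_mask );
    return ret;
}

// Hypothetical helper : override only the bits in "mask" and keep whatever
// other local overrides are already active on this thread
static U32_pair Log_OverrideLocalBits( U32 state , U32 mask )
{
    U32_pair cur = tls_log_local_state;

    U32 new_state = (state & mask) | (cur.first & ~mask);
    U32 new_mask  = cur.second | mask;

    return Log_SetLocalState( new_state, new_mask );
}
```

Restoring is still just Log_SetLocalState( old.first, old.second ) with the returned pair, so it drops into a scoper the same way.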

So now we can get the effective combined state :


U32 Log_GetState( )
{
    U32_pair local = Log_GetLocalState();
    U32 global = Log_GetGlobalState();

    // take local state bits where they are set, else global state bits :

    U32 state = (local.first & local.second) | (global & (~local.second) );

    return state;
}
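To make the masking concrete, here are a few worked cases of that combine expression. The flag values are invented for the example, and CombineState is just the expression from Log_GetState pulled out into a pure function so it can be checked in isolation.

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t U32;

// hypothetical flag values, for illustration only
static const U32 LOG_TO_FILE   = 1u << 0;
static const U32 LOG_TO_DEBUG  = 1u << 1;
static const U32 LOG_SYSTEM_IO = 1u << 2;

// local bits where the mask is set, global bits everywhere else
static U32 CombineState( U32 local_state, U32 local_mask, U32 global )
{
    return (local_state & local_mask) | (global & ~local_mask);
}

static void CombineState_Examples()
{
    U32 global = LOG_TO_FILE | LOG_TO_DEBUG | LOG_SYSTEM_IO;

    // empty mask : the thread overrides nothing, you get the global state
    assert( CombineState( 0, 0, global ) == global );

    // override only the two output bits and force them off;
    // LOG_SYSTEM_IO still comes through from the global state
    U32 out_mask = LOG_TO_FILE | LOG_TO_DEBUG;
    assert( CombineState( 0, out_mask, global ) == LOG_SYSTEM_IO );

    // override one bit to "on" where the global state has it off
    assert( CombineState( LOG_TO_FILE, LOG_TO_FILE, LOG_SYSTEM_IO )
                == (LOG_TO_FILE | LOG_SYSTEM_IO) );

    // full mask with zero state : log fully disabled on this thread
    assert( CombineState( 0, ~(U32)0, global ) == 0 );
}
```

Note that bits outside the mask in the local state are ignored, so stale values there are harmless.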

So internally to the log's operation you start every function with something like :

static bool NoState( U32 state )
{
    // if all outputs or all systems are turned off, no output is possible
    return ((state & LOG_TO_MASK) == 0) ||
        ((state & LOG_SYSTEM_MASK) == 0);
}

void Log_Printf( const char * fmt, ... )
{
    U32 state = Log_GetState();

    if ( NoState(state) )
        return;

    ... more here ...

}

So note that up to the "... more here ..." we have not taken any mutexes or in any way synchronized the threads against each other. So when the log is disabled we just exit there before doing anything painful.

Now the point of this post is not about a log system. It's that you have to do this any time you have global state that can be changed by code (and you want that change to only affect the current thread).

In the more general case you don't just have bit flags, you have arbitrary variables that you want to be per-thread and global. Here's a helper struct to do a global atomic with thread-overridable value :

            
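struct tls_intptr_t
{
    DWORD m_index; // TlsAlloc returns a DWORD slot index
    
    tls_intptr_t()
    {
        m_index = TlsAlloc();
        ASSERT( m_index != TLS_OUT_OF_INDEXES );
        // a freshly allocated slot reads as zero on every thread
        ASSERT( get() == 0 );
    }
    
    intptr_t get() const { return (intptr_t) TlsGetValue(m_index); }

    void set(intptr_t v) { TlsSetValue(m_index,(LPVOID)v); }
};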

struct intptr_t_and_set
{
    intptr_t val;
    intptr_t set; // bool ; is "val" set
    
    intptr_t_and_set(intptr_t v,intptr_t s) : val(v), set(s) { }
};
    
struct overridable_intptr_t
{
    atomic<intptr_t>    m_global;
    tls_intptr_t    m_local;    
    tls_intptr_t    m_localset;
        
    overridable_intptr_t(intptr_t val = 0) : m_global(val)
    {
        ASSERT( m_localset.get() == 0 );
    }       
    
    //---------------------------------------------
    
    intptr_t set_global(intptr_t val)
    {
        return m_global.exchange(val,mo_acq_rel);
    }
    intptr_t get_global() const
    {
        return m_global.load(mo_acquire);
    }
    
    //---------------------------------------------
    
    intptr_t_and_set get_local() const
    {
        return intptr_t_and_set( m_local.get(), m_localset.get() );
    }
    intptr_t_and_set set_local(intptr_t val, intptr_t set = 1)
    {
        intptr_t_and_set old = get_local();
        m_localset.set(set);
        if ( set )
            m_local.set(val);
        return old;
    }
    intptr_t_and_set set_local(intptr_t_and_set val_and_set)
    {
        intptr_t_and_set old = get_local();
        m_localset.set(val_and_set.set);
        if ( val_and_set.set )
            m_local.set(val_and_set.val);
        return old;
    }
    intptr_t_and_set clear_local()
    {
        intptr_t_and_set old = get_local();
        m_localset.set(0);
        return old;
    }
    
    //---------------------------------------------
    
    intptr_t get_combined() const
    {
        intptr_t_and_set local = get_local();
        if ( local.set )
            return local.val;
        else
            return get_global();
    }
};

//=================================================================         

// test code :  

static overridable_intptr_t s_thingy;

int main(int argc,char * argv[])
{
    (void)argc; (void)argv; // unused
    
    s_thingy.set_global(1);
    
    s_thingy.set_local(2,0);
    
    ASSERT( s_thingy.get_combined() == 1 );
    
    intptr_t_and_set prev = s_thingy.set_local(3,1);
    
    ASSERT( s_thingy.get_combined() == 3 );

    s_thingy.set_global(2);
    
    ASSERT( s_thingy.get_combined() == 3 );
    
    s_thingy.set_local(prev);
    
    ASSERT( s_thingy.get_combined() == 2 );
        
    return 0;
}

Or something.

Of course this whole post is implicitly assuming that you are using the "several threads that stay alive for the length of the app" model. An alternative is to use micro-threads that you spin up and down, and rather than inheriting from a global state, you would want them to inherit from the spawning thread's current combined state.
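A rough sketch of that inheritance, using the bit-flag state from earlier: the spawner reads its combined state before handing off the work, and the new thread's entry point installs that snapshot as a full local override. All the names here are hypothetical, and the actual thread creation is left out so the example stays self-contained; RunWithInheritedState is what the micro-thread's entry function would call.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

typedef uint32_t U32;

static std::atomic<U32> s_global(0);
static thread_local U32 tls_state = 0;
static thread_local U32 tls_mask  = 0;

// Spawner side : call this *before* starting the micro-thread and pass
// the result along with the work
U32 GetCombined()
{
    U32 global = s_global.load( std::memory_order_acquire );
    return (tls_state & tls_mask) | (global & ~tls_mask);
}

// Child side : install the spawner's combined state as a full override,
// run the work, then restore (which matters if the thread gets reused)
U32 RunWithInheritedState( U32 inherited, U32 (*work)() )
{
    U32 old_state = tls_state;
    U32 old_mask  = tls_mask;

    tls_state = inherited;
    tls_mask  = ~(U32)0;   // every bit overridden : global is ignored in here

    U32 result = work();

    tls_state = old_state;
    tls_mask  = old_mask;

    return result;
}

// Example work function : just reports the state it sees
U32 ReportState() { return GetCombined(); }
```

With a full mask the child is frozen at the spawner's state and won't see later global changes; whether that is what you want depends on how long the micro-thread lives.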

2 comments:

bionic said...

I've used this approach and it works really well until you start building stuff using jobs/tasks. At that point what you really want is state associated with the job group since the scheduler will most likely reuse threads to join in on other work while a task is blocked.

Unfortunately this is not always easy to implement efficiently, like when you're using a third-party framework like ConcRT (see http://stackoverflow.com/questions/12882719/implementing-task-local-variables-for-concurrency-runtime for example)

Obviously you can "solve" this by explicitly passing all state around, but it gets tedious real quick for certain stuff (like the log in your example).

cbloom said...

I think that in these task-based frameworks one of the first things you have to do is make a TLS replacement.

I guess the most general way to do that is to have every Task take a "State *" as argument, and then every single function in your code has to take "State *" as one of its arguments. Then you can find local vars in the State.
