Most of a programmer's time during a project is spent researching, making decisions, and then scrapping those decisions for one good reason or another. On every project I've worked on, I've kept some sort of daily, linear journal in electronic form that describes the issues, concepts, and decisions around a given research problem, along with the results. These logs are massively important for a number of reasons, the most important being the ability to correlate information, so that I can walk back through niche ideas that were discarded because of earlier constraints but may now be applicable. I can also record the reasoning behind specific decisions (e.g. "we chose 256 as a texture size due to disk read speed on the Xbox 360; here are the stats:").
Earlier versions of this journal lived in Google Docs, but as the amount of information grew, the dataset became very difficult to manage over time, especially if you wanted to find everything related to "terrain."
I've been consulting for a company that goes somewhat overboard with its online wiki process for describing code changes. Effectively, any change must be discussed, validated, added to the wiki, discussed again, and only then made. That particular process is overkill, but it did spark the idea that a custom wiki would help me keep track of my daily journals much better than Google Docs.
So I set off in search of a free wiki to install on my web server. Unfortunately, most wikis require Apache, Ruby, etc., which my web server doesn't provide at its current monthly cost.
And then I stumbled across one of the coolest things I've seen: TiddlyWiki. The entire wiki is contained in a single .html file, so it's easy to keep track of, move, or carry around on a USB stick. Apparently I'm a little late to the party, as it was voted one of the "best tools of 2007."
In addition, quite a few people have scripted or modified the system, with some rather cool results:
TiddlyWiki FTRUS: a simple wiki skin designed for ease of use.
TiddlyThemes: a massive site devoted to custom TiddlyWiki skins.
TiddlyDesktop: control content via movable windows in your browser.
LaTeX: add mathematical symbols to your wiki entries.
Wiki On A Stick: a much slimmer, notes-based take on the idea, and by far the most bare-bones tool of this type.
TiddlyBackpack: a nicely slimmed-down, cleaner version.
If you like keeping these kinds of data logs, or think they might be up your alley, I highly suggest looking at this tool.
5.28.2009
5.04.2009
A different scalable threaded architecture?
DISCLAIMER: This is a wandering brain dump. If it doesn't lead anywhere... well, you've been warned.
A lot of C++-style APIs are now available for task-based decomposition (i.e. massively threaded processing of small job packets):
Intel Threaded Building Blocks
Job Swarm
OpenMP
Cilk++
All of these models handle batch-job processing the same way: a thread manager owns a number of threads, and each thread (quickly) runs through a poll-acquire-work-release cycle over job tasks.
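To make that shared model concrete, here is a minimal C++11 sketch of a pool that owns its worker threads, with each worker running the poll-acquire-work-release cycle. The class name `OwningJobPool` is hypothetical, and a single locked queue is a simplification: real libraries are far more sophisticated (TBB, for one, uses work stealing). The point is only who owns the threads.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical name. A pool that OWNS its worker threads: callers only submit
// jobs, and every worker spins through poll -> acquire -> work -> release.
class OwningJobPool {
public:
    explicit OwningJobPool(std::size_t workerCount) {
        for (std::size_t i = 0; i < workerCount; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }

    // Signals shutdown, lets the workers drain any queued jobs, then joins them.
    ~OwningJobPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& worker : workers_) worker.join();
    }

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                // Poll: sleep until there is work or the pool is shutting down.
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping, and nothing left to drain
                // Acquire: take ownership of the next job.
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // Work, outside the lock; the job is released when `job` goes out of scope.
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Note that nothing outside the pool can ever turn one of these workers into, say, a "network thread"; they exist only to run `workerLoop`.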
Although task-based decomposition is really the only way to ensure scalability as core counts increase, in the games industry we still need functional decomposition (e.g. the "sim" gets thread 0, the "renderer" gets thread 1) as well as task-based decomposition. In particular, non-thread-safe graphics APIs require graphics actions to be executed linearly, in sequence. This is especially true of the main render loop, which might look something like this:
compute shadow maps
sort visible geometry
skin animated characters
render opaque geometry
render transparent geometry
render HUD
Or whatever. The main point is that this process NEEDS to be linear in order to interact correctly with the GPU's depth rendering. As such, at least the innermost render loop needs some form of functional decomposition. And since most APIs require device access to be single-threaded, you either need a single, long "job" with affinity to one thread, or you simply lock down that thread as the "render thread."
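A minimal sketch of "locking down" a render thread, assuming a hypothetical `FakeDevice` that stands in for a non-thread-safe graphics API and simply records which thread touched it. The stage names are the ones from the render loop above; nothing here is a real graphics API.

```cpp
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a non-thread-safe graphics device. It remembers
// the first thread that used it and counts calls made from any other thread.
class FakeDevice {
public:
    void execute(const std::string& stage) {
        const std::thread::id me = std::this_thread::get_id();
        if (owner_ == std::thread::id()) owner_ = me;  // first caller becomes the owner
        if (owner_ != me) ++foreignCalls_;
        log_.push_back(stage);
    }
    int foreignCalls() const { return foreignCalls_; }
    const std::vector<std::string>& log() const { return log_; }

private:
    std::thread::id owner_;
    int foreignCalls_ = 0;
    std::vector<std::string> log_;
};

// The render loop from the text, as an ordered list of stages.
const std::vector<std::string> kRenderStages = {
    "compute shadow maps",    "sort visible geometry",       "skin animated characters",
    "render opaque geometry", "render transparent geometry", "render HUD"};

// Functional decomposition: one dedicated "render thread" owns the device and
// walks the stages strictly in order, frame after frame.
void runRenderThread(FakeDevice& device, int frameCount) {
    std::thread renderThread([&device, frameCount] {
        for (int frame = 0; frame < frameCount; ++frame)
            for (const std::string& stage : kRenderStages)
                device.execute(stage);
    });
    renderThread.join();  // the device is only read back after the thread finishes
}
```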
With that in mind, it's easy to see how most current-generation titles cobble together their threading systems: 1 sim thread, 1 render thread, and an N-thread job pool.
In practice, this scales reasonably well for now, since you can at least count on 2 hardware threads on hyperthreaded machines. The downside, though, is that the job-pool threads are effectively just slaves to swarm computing. For instance, if you want your networking system to receive data on a separate thread, the model above forces you to push a job into the pool that listens for or polls that data, which makes timing difficult (e.g. are you ALWAYS sure the next network poll will happen 16 ms from now?). Additionally, this setup only lets you communicate with the functional threads through message passing. You can't really have a system that 'belongs' to a specific thread, since you only have 2 explicit threads plus N job-pool threads; you can't send a message to a system on a specific thread without thread affinity and advance knowledge of the number of threads on the system. The main point is that the job-pool threads will never be anything BUT job-pool threads: slaves forced to wear the shackles of mindless job computing.
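The message-passing constraint can be sketched as a per-thread mailbox. `Mailbox` is a hypothetical class, not an API from any of the libraries above: any thread (for example, a pool job that just polled the network) posts closures, and the explicit sim thread drains them so the handlers run on that thread. It works only because the sim thread is a known, explicit thread; there is no equivalent address for "pool thread #3."

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical mailbox owned by one functional thread (e.g. the sim thread).
// Any thread may post; only the owner should call drain().
class Mailbox {
public:
    void post(std::function<void()> message) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(message));
    }

    // Intended to be called by the owning thread (e.g. once per sim frame).
    // Swaps the queue out under the lock, then runs the messages without
    // holding it. Returns how many messages were processed.
    std::size_t drain() {
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (auto& message : batch) message();
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> pending_;
};
```

Swapping the whole batch out keeps the lock hold time short, so posting threads are never blocked while the sim thread runs handlers.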
In Halo Wars, we had a similar functionally threaded system, except that our job pool didn't OWN the threads. Instead, we allowed ANY THREAD to query or wait on the job pool to acquire work. So there was really no 'manager' for the threads: anyone could fire up a thread, control its own processing loop, and grab work from the pool whenever it wanted.
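Here is a rough sketch of that idea, assuming a simple locked queue. `PassiveJobPool` and `physicsThreadFrame` are hypothetical names; this is not the actual Halo Wars code, only an illustration of a pool that owns no threads and is serviced by whichever threads choose to call into it.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical pool that owns NO threads: it is only a locked queue.
// Any thread may push work, and any thread may pull work when it chooses.
class PassiveJobPool {
public:
    void push(std::function<void()> job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push_back(std::move(job));
    }

    // Non-blocking: runs one job if one is queued; returns false if the pool is empty.
    bool tryRunOne() {
        std::function<void()> job;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (jobs_.empty()) return false;
            job = std::move(jobs_.front());
            jobs_.pop_front();
        }
        job();  // run outside the lock so other threads can keep pulling
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<std::function<void()>> jobs_;
};

// Hypothetical "physics thread" frame: the thread controls its own loop,
// does its own work first, then helps drain the shared pool.
void physicsThreadFrame(PassiveJobPool& pool, int& stepsTaken) {
    ++stepsTaken;                // stand-in for this thread's own work
    while (pool.tryRunOne()) {}  // then grab pooled work until none is left
}
```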
The excellent part was that it let us bridge the gap between functional and task-based decomposition. For example, you could easily spawn a thread that becomes the 'physics thread' and let it query the job pool whenever it wants. (You could also perform WaitForSingleObject / WaitForMultipleObjects-style calls on the pool; while waiting for your event, the thread would grab work and process it.) It also enabled some interesting concepts, like cloud processing: a given thread could hold a TCP/IP connection to another machine, grab a packet of work (and its marked memory), and send it off to "the cloud" to be computed. The job pool didn't care; it just had jobs and hoped someone would service them.
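The "wait on the pool" behavior can be sketched as a helper that, instead of blocking, keeps servicing jobs until the awaited result is ready. `waitHelping` and the `tryRunOne` hook are hypothetical, and a `std::future` stands in for the Win32 event. The loop yields when the pool is empty, which a production version would likely replace with a proper wait. The future must come from a `std::promise` or `std::async(std::launch::async, ...)`, since a deferred future never becomes ready on its own.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Hypothetical "wait while helping": rather than blocking on the result, the
// waiting thread keeps running pooled jobs until the future becomes ready.
// `tryRunOne` should run one pooled job and return false if the pool is empty.
template <typename T>
T waitHelping(std::future<T>& result, const std::function<bool()>& tryRunOne) {
    using namespace std::chrono_literals;
    while (result.wait_for(0s) != std::future_status::ready) {
        if (!tryRunOne())
            std::this_thread::yield();  // nothing to help with right now; back off briefly
    }
    return result.get();
}
```

A useful property: if the job that completes the future is itself sitting in the pool, a plain blocking wait on a thread with no other workers would never return, whereas the helping wait eventually runs that job itself.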
The downside was that each thread was effectively bottlenecked by the speed and memory cost of grabbing work from the job pool. There was also no way to load-balance the number of jobs against the available workers; the thread issuing the jobs was responsible for choosing a proper job size. That's problematic because it pushes the thread-count and job-size logic onto the caller rather than the manager (which is not good if your team doesn't really "get" multithreading).
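To illustrate the burden this puts on the caller, here is a sketch of the sizing logic that every job-issuing call site ends up owning when there is no manager. The functions are hypothetical, and the heuristics (a few jobs per expected worker, leaving one core for the issuing thread) are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical caller-side heuristic: aim for a few jobs per expected worker.
std::size_t chooseJobSize(std::size_t itemCount, std::size_t workerGuess,
                          std::size_t jobsPerWorker = 4) {
    workerGuess = std::max<std::size_t>(workerGuess, 1);
    const std::size_t targetJobs = workerGuess * std::max<std::size_t>(jobsPerWorker, 1);
    // Round up so every item lands in some job; never return 0.
    return std::max<std::size_t>((itemCount + targetJobs - 1) / targetJobs, 1);
}

// Splits [0, itemCount) into half-open ranges of at most `jobSize` items.
std::vector<std::pair<std::size_t, std::size_t>> splitIntoJobs(std::size_t itemCount,
                                                               std::size_t jobSize) {
    if (jobSize == 0) jobSize = 1;  // guard against an infinite loop
    std::vector<std::pair<std::size_t, std::size_t>> jobs;
    for (std::size_t begin = 0; begin < itemCount; begin += jobSize)
        jobs.emplace_back(begin, std::min(begin + jobSize, itemCount));
    return jobs;
}

// The caller also has to guess how many threads will service the pool.
std::size_t guessWorkers() {
    const unsigned hw = std::thread::hardware_concurrency();  // may be 0 if unknown
    return hw > 1 ? hw - 1 : 1;  // leave one core for the issuing thread (an assumption)
}
```

Every one of these guesses lives at the call site, which is exactly the knowledge a team that doesn't "get" multithreading would rather not have to hold.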
I've been pushing thoughts around for a while on a decent bridge between these two concepts. I don't think the linear-dependency-in-job-pools issue has been solved in a 'right' way, one that doesn't make your brain bleed when you consider thousands of interdependent jobs. I think things like the D language and "Atomic Function Programming" are better ways to solve the problem, but asking the current generation of game programmers to build applications in either would be suicide. That keeps bringing me back to the conclusion that a hybrid of task-based and functional decomposition is still the best option out there.
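For reference, one common way to express dependencies between pooled jobs is with per-job prerequisite counters (essentially Kahn's topological sort run across worker threads). The `JobGraph` class below is a hypothetical sketch of that approach; it does not claim to solve the thousands-of-interdependent-jobs case, only to show the bookkeeping involved. The graph must be acyclic, or `run` will wait forever.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical dependency graph: each job counts its unfinished prerequisites,
// and finishing a job releases any dependent whose count drops to zero.
class JobGraph {
public:
    std::size_t addJob(std::function<void()> work) {
        nodes_.push_back(Node{std::move(work), {}, 0});
        return nodes_.size() - 1;
    }

    // `after` may not start until `before` has finished.
    void addDependency(std::size_t before, std::size_t after) {
        nodes_[before].dependents.push_back(after);
        ++nodes_[after].prerequisites;
    }

    // Runs every job on `workerCount` threads, respecting dependencies.
    void run(unsigned workerCount) {
        std::vector<int> unmet(nodes_.size());
        std::deque<std::size_t> ready;
        for (std::size_t i = 0; i < nodes_.size(); ++i) {
            unmet[i] = nodes_[i].prerequisites;
            if (unmet[i] == 0) ready.push_back(i);
        }
        std::size_t remaining = nodes_.size();
        std::mutex mutex;
        std::condition_variable changed;

        auto worker = [&] {
            for (;;) {
                std::size_t id;
                {
                    std::unique_lock<std::mutex> lock(mutex);
                    changed.wait(lock, [&] { return !ready.empty() || remaining == 0; });
                    if (ready.empty()) return;  // every job has finished
                    id = ready.front();
                    ready.pop_front();
                }
                nodes_[id].work();
                {
                    std::lock_guard<std::mutex> lock(mutex);
                    for (std::size_t next : nodes_[id].dependents)
                        if (--unmet[next] == 0) ready.push_back(next);  // last prerequisite done
                    --remaining;
                }
                changed.notify_all();
            }
        };

        std::vector<std::thread> threads;
        for (unsigned t = 0; t < (workerCount ? workerCount : 1u); ++t) threads.emplace_back(worker);
        for (std::thread& thread : threads) thread.join();
    }

private:
    struct Node {
        std::function<void()> work;
        std::vector<std::size_t> dependents;
        int prerequisites;
    };
    std::vector<Node> nodes_;
};
```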
The depressing thing is that it keeps us from moving forward with bringing more design-specific elements into threading, as opposed to experience-specific ones. I suppose the bigger problem is how to make a 'design concept' play the same on a box with 2 cores and on a box with 20 cores. The first step toward fixing this problem, I think, is already on the table in the form of DX11 / 360-style multithreading. The DCB / PCB generation process effectively lets you 'render' across multiple threads in parallel and then submit the results to the GPU linearly. Currently, though, I can only see this model working for GPUs, where the setup and state-setting for a draw call can be more of a burden than actually drawing the object. In my experience, this doesn't carry over to sim-related tasks, where the operation itself (not the setup) is usually the more expensive part.
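The record-in-parallel, submit-linearly pattern can be sketched generically. `Command`, `recordSlice`, and `recordAndSubmit` are hypothetical stand-ins, not the Direct3D 11 deferred-context API or the Xbox 360 command-buffer API. The point is only that recording touches no shared state, while submission happens on one thread in a fixed order.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical command record; a real API would store state blocks and draw calls.
struct Command {
    std::string what;
};
using CommandList = std::vector<Command>;

// Each recorder builds its own list for a slice of objects. No shared state is
// written here, so no locking is needed.
CommandList recordSlice(const std::vector<std::string>& objects, std::size_t begin,
                        std::size_t end) {
    CommandList list;
    for (std::size_t i = begin; i < end; ++i) {
        list.push_back({"set state for " + objects[i]});  // the expensive "setup" part
        list.push_back({"draw " + objects[i]});
    }
    return list;
}

// Records on `threadCount` threads, then "submits" the lists on the calling
// thread in slice order, so the device still sees one linear stream.
std::vector<Command> recordAndSubmit(const std::vector<std::string>& objects,
                                     std::size_t threadCount) {
    if (threadCount == 0) threadCount = 1;
    std::vector<CommandList> lists(threadCount);  // one slot per recorder; never resized
    std::vector<std::thread> recorders;
    const std::size_t perThread = (objects.size() + threadCount - 1) / threadCount;
    for (std::size_t t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(t * perThread, objects.size());
        const std::size_t end = std::min(begin + perThread, objects.size());
        recorders.emplace_back([&lists, &objects, t, begin, end] {
            lists[t] = recordSlice(objects, begin, end);
        });
    }
    for (std::thread& recorder : recorders) recorder.join();

    std::vector<Command> submitted;  // stand-in for the single-threaded device queue
    for (const CommandList& list : lists)
        submitted.insert(submitted.end(), list.begin(), list.end());
    return submitted;
}
```

Because submission order is fixed by slice index, the GPU-facing stream is identical regardless of how many recorder threads were used.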
So it seems that as we continue down this path toward parallel graphics and heterogeneous computing, client-side tasks (graphics, physics, etc.) will become more and more parallel and scale better with new hardware, but sim-side tasks (e.g. how many units the world can hold, which is capped by the minimum expected experience) will seemingly always stay linear, bound to single-threaded execution. That means we'll always need at least a few functionally designated threads.
Damn.