Walking a list is a common practice for every programming task. Even the most boring, trivial tasks requrie you traverse some sort of grouping of data in order to find data.
Now, contrary to what your CS1 teacher taught you, this is not as simplistic as creating a FOR loop and throwing in some IF statements. Lets take, for example:
You're creating software for a building with 1 million rooms. Each room has 1 light swtich in it. You need to create software that holds the states of every lightswitch in the building : ON, or OFF.
Lets say that at any time, you ned to traverse the list, and find the first ON swtich. What's the most memory efficent, and performance efficent manner to do this?
cs 101 : for(i=0;i< 1 million;i++) if(lightSwitches[i]==on) return i;
what works : Uhh, it works...
what doesn't : What if there's no ON element? we'll walk 1mm objects. That sucks.
Better idea?
cs 102 : Keep a structure of your data that allows you to traverse it in some sort of faster, more efficient manner.
what works : reduces time to find an object greatly (depending on the algorithm)
what doesnt : usually requires more memory to keep all the information for relationships. Also, takes more time to process the data in recursive systems. (this is usually offset by the speedup of faster traversal)
Better idea?
cs 103 : Keep a 2ndary listing of data that will allow you to immedialy identify the first of a node you're looking for. Think of it as a FIFO structure for your data.
what works : finding the first of a node, REALLY FAST.
what doesnt : more memory again
We can keep going on like this forever. Read any ALGORITHMS book by Knuth, and you'll see what i mean.
There are some things though, that programmers need to be aware of before just writing a simplistic walking algorithm.
1) Cache efficiency
Most PC programmers don't have to be aware of this, but console programmers do. Processing chips come with a temporary area of memory called a 'cache' that data is stored into upon access, and kept there until it's time to free it again. What the cache is for, is to improve seek and read times over long amounts of data access by frontloading all that time at first. Any subsiquent accesses check the cache first, before going back to main memory. When your controller goes to the cache to find a piece of data, and it's in the cache, this is called a 'cache hit' This is a good thing! reading memory from the cache is super fast! When you go to the cache for data, and it's not there, this is called a 'cache miss' THIS BAD! GROG HATE THIS! Why? Wellll, because of the time to access data from main memory is usually pretty slow, and that data has to be loaded into the cache. If your cache is full, you also incur the overhead of determining what cacheline to free, and can overwrite something that another function will need in a few milliseconds, thus incurring a cache hit there. (see my previous post about LRU/MRU efficiency for more on this) Planning for this occurance is actually not too difficult, you just need to be aware of your memory access patterns, and ensure that you're doing prefecting correctly.
Firstly, ensure that the data you're accessing can fit into a cacheline. On the 360, a cache line is 128 bytes. This means that you can load that data into the cache, and access all of it before incurring a cache miss again. So, if you have a structure that's 64bytes, you can only fit 2 of those in a cacheline before you get a miss. This degenerates REALLY quick when walking structures of these types. Just like the inibitions of your sister when she's out with that guy who has a fast car and 'nice hair.' slut....
There's some things that make sense about this, that can help.
1) Ensure your structures are aligned properly. Any sort of padding inserted into your structure will add un-needed bytes into the structure class. This takes up precious space in your cache, and can cause a great deal of problems. (I won't go into all the details on how to ensure alignment and padding, you can google that)
2) Try to keep your structures as small as possible. Infact, if needed, break down your structures into smaller data types. Think about 'HOT' and 'COLD' sets. That is, a HOT set is a group of data in your class that's accessed frequently. Say the state, or number of elements it contains. These things are constantly looked up in list walks. the COLD set is things that are only looked at every now and again. If you have a 64byte class, but 40bytes of it can be considered COLD, then it would help you to instead, place all the HOT data into your list, with linkage back to the main data container class. This way, you only have 24bytes per struct of data to queary as you walk the list, and can fit more of these data classes into memory.
3) Try to create 'Structures-of-Arrays' instead of 'Arrays-of-structures' This is just the next stage of (2) above. If your'e not accessing all the data in your HOT set at the same time (ie you'll only ever check the state, OR the number of containments, but rarely both at the same time) it would be smarter to make a list made entirely of STATE, and a seperate list entirely of NUMCONTAINERS. That way, you can fit even MORE information into a cacheline, and you're not wasting cycles moving your memory pointer around for data you don't want.
Of course these optomizations come with some engineering cost, and maybe a bit of overhead in memory for linkage or explicit defintion of workings of the function, but in general, these are the things that can GREATLY improve your coding speed.
2) Memory fragmentation.
True, this is more like a memory management issue, but it is important to know, none the less.
By now, everyone should know that dynamic runtime allocations are the DEVIL. Why? Well, every time you call 'new' the memory allocation function trots off into memory, finds a free spot, and gives you memory back that you asked for. Over time, this can cause lots of spaces to be freed, or used, giving the physical apperance of a cratered surface of the moon. What starts to occur, is that when you've got lots of little pieces of memory free everywhere, and you go to allocate a large block, well, the memory has to defragment in order to accomidate your request.
Not only that, but lets think about jumping around in memory. We already know that when you access spot X in memory, you can get X + 128 in the cache. Well, what if the next data element you're looking for isn't within the next 128 bytes? What if its 2 mb away in memory somewhere? Well, then you have to jump all the way there, cause a cache hit, and load the data. Memory seek times usually aren't that bad, but it can add up the more you start bouncing around. Lets say you've got a linked list, that will dynamically allocate over a number of frames. Well, you've got no insurance that those nodes are conguious in memory. And as you walk that list, you'll find yourself jumping around everywhere, causing cache misses, and incurring perf burden all over the place. This really comes to light in places like file IO, where you have to incurr seek time costs for every read you do off the disk. Especially if you're reading from slow-as-shit optical disks!
What's better? Known thine data! It is much better to allocate large blocks of memory up front, and manage them yourself over time, rather than constantly rely on dynamic allocations and potentials for memory fragmentation. There's some hueristics that go along with this, namely 'how much do i need to allocate before i run out of room, and how do i handle that?' But those are topics for a different post. What you need to know now, is that handling your own memory by hand is tricky, but can cause massive perf and mental wins over time.
Especially for console development, or large project settings. It gets a great deal easier when you know that your AI system will ONLY ever use memory between addresses 0x61227372 and 0x81227372. So, you know if for some reason your XML system is trying to read memory at location 0x71111111, you've got a problem (unless you're talking to the AI). Not only this, but it maks cleanup a great deal easier. Linked lists are the worst for this, because it takes a great deal longer to free tons of tiny blocks, than it does one big one.
3) PRECACHE
if you know your memory is conguious, and you know that you're potetially going to have to walk to the next 128byte block, go ahead and issue a precache command on the block to get all the pipes churning so the data will be loaded in there by the time you get there. The reason cache misses hurt so bad, is that the thread your on WAITS until the cache data gets loaded, then returns the data requested. If you precache, then the cache can start loading while your thread is doing it's dirty business. This is good shit.
4) PRECALCULATE
If, for some ungodly reason, you need to sample a calculated variable while you walk a list (say you need to find the smallest value of a grouping, that requires some math to be done every time its sampled) it's better to NOT do that every time you touch that array element. Instead, precaclulate that stuff as much as you can, even if it's only for this one frame. Remember, list walking is seldom ever a 1 time thing. Most of the time, you'll have to search the same list, a number of times, for different criteria, for different results. Having to caclulate that data every frame, wow, that can suck dick.
So, all in all, make sure that you keep some of these things in mind when designing youre classes and your list searching structures. Look around your code and you'll be suprised how many violations of these patterns you're doing. And as always, don't be afraid to profile your code. it's always good to know how fast an algorithm is in relation to how fast it could be with the proper attention.
~Main
9.01.2006
Subscribe to:
Post Comments (Atom)
0 comments:
Post a Comment