12.01.2008

Lets talk about assert

Gather around children, uncle Roach has something to teach you all.

The assert code path should be defined in non-assert code
Can someone tell me what's WRONG with the following chunk of code?

int customVector::getValueAtIndex(int idx)
{
   assert(idx > 0 && idx < mNumElements);
   return mValues[idx];
}


"Why no uncle Roach, that sure looks fine to me!"


Well it's not; Can you tell me what happens when that assert gets compiled out in a final build, and the index value is out of range? That's right children; Problems happen.


Firstly, we need to understand that an assert is a great debugging tool, because of it's usage and ability to alert the programmer to specific boundary cases of our coding patterns. Checking that a value is null is a great example:


void deleteThis(object* obj)
{
   assert(obj != NULL);
   customDelete(obj);//** edited.. see comment below
}

The real question here though, is what SHOULD the code do in the case that the assert doesn't exist? If you take that into account, than this is not a safe snippet of code; and definitely not something you'd like to ship in your $20million dollar project.The issue is that the code path to be handled when the assert is compiled out is missing; which is crucial, as you'll get a random crash in final builds, that is damn difficult to rack down if you don't properly check for this condition. Below is the proper snippet :


void deleteThis(object* obj)
{
   assert(obj != NULL);
   if(obj == NULL) return;
   customDelete(obj); //**edited.. See comment below
}


The rule of thumb here is that you should properly define the accurate code path to occur when asserts are removed.


Don't use an assert when you meant to use fail
Assert != fail. Let me just get that out in the open. When you want an ASSERT, you're generally looking for information about a specific conditional that can allow your application to hault operation in order to present some information to the user, before deciding if it would like to stop execution (fatal assert), continue on, or ignore that assert in the future.

When you want a FAIL though, you are looking that the code is in a state that things are generally wonky and recovery will be an issue. You'd generally like all the logging information and what-not that an assert has, but a fail is generally more of a hard-termination.

"Well golly uncle Roach, a fail sounds like a fatal assert to me. Why can't I just use assert and stop execution?"

Excellent question Cleetus. The problem is that as a programmer, you should be using the proper syntax for the proper result. Throwing assert as a catchall seems like a good idea, but quite frankly, there's positions where you WANT a failure to occur, so that you can accurately catch it.

A great example is loading data from disk :

void loadAssetArchiveFromDisk(const char* archiveName)
{
   const retCode = loadArchive(archiveName);
   assert(retCode == LOAD_OK); //Dirty disk error?
   if(retCode == LOAD_OK)
      OpenAndProcessArchive(archiveName);
}


Let's say the above code fails to load the archive. The assert fires, and we have the option to continue forward. But is that really what we want to occur? Let's say that this is a critical archive that contains all the strings for the entire game. Would we really want to gracefully bail out like this with a simple assert?


void loadAssetArchiveFromDisk(const char* archiveName, bool failOnNotLoad)
{
   const retCode = loadArchive(archiveName);
   if( failOnNotLoad && retCode != LOAD_OK)
      Fail("retCode != LOAD_OK");// NON RECOVERABLE

      OpenAndProcessArchive(archiveName);
}

Now the code will properly fail if the archive doesn't load (and it's a core archive). This allows you to properly call dirtyDiskError on the console, or whatever other handling you want. But not ASSERT!

So what?
The point here is that assert is a good idea, but it's sometimes used as a catch-all drunk drawer by programmers when it should really be used in proper places at proper times. It is not a silver bullet of debugging, but most programmers use it like it is.

Be careful though, because silver bullets can still shoot you in the foot.

~Main

** changed Originally from "delete obj" to "customDelete(obj)" in order to continue to provide my point without everyone getting caught up with the fact that "delete NULL" is valid in C++. Thank you for your feedback.

8 comments:

hcpizzi said...

I don't think the three examples are the same kind of error.

In the first two, you are checking for invalid inputs. By asserting, you are telling that nobody should call those functions with arguments that don't satisfy the asserted conditions. Doing so would be a code error, and the assert is there to point this out. The final code should trigger no assertions.

On the other hand, reading a file can fail, but that's not a code error: it's a data error. So a non-existent or unreadable or corrupt file is a valid input to the program (in that it it can happen in a bug-free program).

I don't think that testing and asserting should go together. If you are testing for invalid inputs, that makes that input valid by definition (it's just an special case, but valid).

Of course, the array that you are bound-testing may have been read from a file along with the indices. In this case I think the best is to test outside the vector functions, not into. Also this way you avoid testing the same conditions at multiple levels, and the upper the level you test, the better.

So basically my point of view is that you should assert on code errors, and check for data errors.

But, again, this is just my opinion. I'm sure many will disagree :)

Goosey said...

Gotta take issue with the sentiment of this post Roach.

I largely agree with what hcpizzi said: if it is a code error and not the result of bad user input, IE a genuine honest to god bug, the only sane thing to do is terminate immediately. The 'bad data read example' can occur without any code defects, making it an input error and necessitating some recovery scheme.. but the focus here is on asserts and bugs so back to that topic.

So first I have to take issue with the idea that assert == debug only == termination == second solution for release mode. We do not enter this world with a language provided assert keyword: it is a library element and therefore can be made to work how we need. When any other library component doesn't meet our needs we alter or replace it and so it shall be for assert.

So an assert can happen in release to help prevent a class of release-only heisenbugs. I like to have at least 3 build targets: debug, optimized-debug, and release. In this scheme the optimized debug has symbols and enables asserts and things, it just also uses the optimization settings that release does.

I also like to have an assert that executes even when most asserts are disabled (usually I see this called verify). This is invaluable to increase code readability when verifying a function that should succeed does (IE: GG_VERIFY(ThisShouldNotFailSoIReallyWantToKnowWhenItDoes() == GG_success);)

All this assumes that when an error happens you need to either: recover from it or stop IMMEDIATELY. Trying to continue in an invalid state is just madness: the problem will just resurface at a later point and be that much more difficult to track down. And yes I believe this also applies to shipped user code.

My personal code's assert has pluggable behavior: an assert in debug does trigger a breakpoint, but it also logs the failure with as much debug info as possible (stack trace, core dump, etc). In a unit test executable it is hijacked to alert the testing framework of failure so that other tests can run. If I was so inclined it's halt behavior could be disabled such to only log the error in some release build (I am not, though).

The fact is that if there is a bug in code then there is a bug. Users be damned: the error will be handled or cause an issue. The best we can do is be as aggressive as possible in ensuring these is none. DBC (heavy heavy assert usage) helps a lot here, but it is only one side of the coin. The other side is testing (preferably including automated testing in the form of both comprehensive unit tests and game wide functional tests) to ensure good code coverage.. that is to say that all valid code paths be exercised to be verified strictly with assetions of correctness.

Although I don't have it running in my code yet I am a huge fan of journalling systems as well. To me the ideal setup is that after every build a comprehensive set of tests is run, multiple instances of semi-random 'monkey' users are spun up, and they just hammer away at things on and on.. if an error happens the monkey halts immediately with a breakpoint in memory ready for me as well as a full journal I can replay in case it is a multi-frame bug. Add to this monkeys which replay previous journals and compare state data along the way to ensure new builds don't cause play throughs to diverge if they were not supposed to... Well then you have a lot of engineering effort expended to gain a lot of confidence on the correctness of your code. :)

MaciejS said...

In the first case, in release build you're most probably screwed anyway, if someone's trying to index out of bounds. It'll crash sooner or later. In the second case - it's perfectly legal to delete NULL ptr :)

Colt said...

"Doing so would be a code error, and the assert is there to point this out. The final code should trigger no assertions.....So basically my point of view is that you should assert on code errors, and check for data errors."

Your code should be safe in final builds for data checks w/o throwing an assert. At the same time, standard operating procedure of a programmer defines that you shouldn't be doing something like indexing out of bounds. Which means you'd like to be alerted of it in case someone IS indexing out of bounds. Simply checking for that problem, but not accurately alerting the user is asking for weird errors to propagate up the chain, as most people don't understand the proper default return values of functions when they access them. So IMHO it's better to alert the user of incorrect data referencing ASAP so we can get it fixed.

Colt said...

Re : Goosey

What you've said is directly the same thing I said in my post : Assert is a tool, and should be used differently depending on the situation. A fail is a fail, not an assert. And an assert is not a termination clause for the application (because you can ignore it) which means you need safety nets around your code.

I'm not sure why you're saying you "disagree;" We've said the same thing.

Colt said...

"it's perfectly legal to delete NULL ptr"

That doesn't mean it's perfectly legal to allow your program to attempt to delete memory that may already be freed. Your program flow should be safer & smarter about memory allocation, such that news & deletes are handled one step above the delete function, so the programmer knows exactly where there memory is going, and can easily look at the pattern to validate it.

~Main said...

"it's perfectly legal to delete NULL ptr"

Also, it should be noted that it's perfectly legal to override the delete keyword to be redirected to a custom memory manager. In which case, you're most likely going to check for null anyhow. So it's a moot point weather the iso standard for C++ allows a null pointer delete; my point is still that you should use assert when appropriate.

hcpizzi said...

"Your code should be safe in final builds for data checks w/o throwing an assert."

But, what does it mean to be safe? For example, what do you return when you access an array out of bounds? Should we allow the program to continue running returning some default value?