Archive for the ‘code’ Category.

Fibonacci Numbers and Bad Teachers

Recursion is a fascinating and essential idea in mathematics, programming and design (among other things.) As a programmer, if you don’t understand this very simple idea (and I mean really understand it) you are pretty much done for. While it’s not a hard concept for any half-intelligent person to grasp, I don’t exactly know how many of us really do grasp it. But it’s not what I’m going to bitch about today.

In programming courses, one of the first examples used to introduce the idea of recursion is the Fibonacci sequence. You know, the series of integers starting with 0 and 1, where every next number is the sum of the two previous ones. (OK, I know that you do know. I was just covering all my bases.)

Anyways, the most simple and elegant introductory way of implementing a function that returns the nth number in the Fibonacci series goes something like this:

1
2
3
4
5
6
7
// Just making things clear!
typedef unsigned long long Integer;
 
Integer Fib (unsigned n)
{
	return n < 2 ? n : Fib(n - 2) + Fib(n - 1);
}

And every half-descent programmer knows that that’s probably one of the worst ways you can implement the idea. I’m not kidding. The teachers just throw this at the students, and the brighter of the students see the elegance and simple beauty of it and it will take years for most of them to realize the problems with this particular implementation. Some of them never do. This implementation performs very poorly for even the small values of n, which of course should be apparent when you contemplate its time complexity and recursion tree (I won’t go into space considerations, because they are not much of a problem until n goes into (many) thousands, at which point the value of the function becomes larger than you can store naively in most languages, which means you should have already given space considerations a serious thought.)

But even later, when the students learn about complexity analysis and Big O notation and crap like that, more often than not, they fail to apply the newly acquired knowledge to the old beliefs. If the teacher is good, and the student is bright and lucky enough, they walk out of their algorithm and data structure course with the new belief that bubble sort is bad, hash tables are good, and disk seek times dominate everything else (all of which are obviously bullshit in the absolute sense.)

For the more astute reader, I need to clarify that the badness of the mentioned implementation stems from the fact that most common languages (including C++) are mainly side-effect-driven languages, in that they rely mostly on the side-effects of expressions and statements to get the intended job done (the usage of the term “statement” is a clear sign of that.) Otherwise, you wouldn’t see so much assignment in our code. A functional and side-effect-free programming language (or more accurately, programming model) could easily cache the result of each invocation of the Fib function for any particular n (because the function call would be without side effects and therefore time-invariant) and would not need to evaluate any Fib(n) more than once. This is something that is being done in some of the better implementations of functional languages, AFAIK.

To emulate this behavior in this case, one can use a hand-coded version of what is known as Memoization, a la:

1
2
3
4
5
6
7
8
9
10
11
12
13
Integer Fib (unsigned n)
{
	static std::vector<integer> Mem;
	if (n < Mem.size() && 0 != Mem[n]) return Mem[n];  // The second condition is redundant, right?
 
	Integer ret = n < 2 ? n : Fib(n - 1) + Fib(n - 2);
 
	if (n >= Mem.size()) Mem.resize (n + 1, 0);
	Mem[n] = ret;
 
	return ret;
}
</integer>

This is not a bad implementation. It does have some issues in multithreaded applications, but I'm willing to overlook that for now. This implementation is linear in both time and space, and it can be faster than simple non-recursive implementations because it never calculates any Fib(n) more than once over the whole lifetime of the process. But it's not pretty (it has its charms, though!) and this way of doing things is error-prone (because you are adding memory and bookkeeping to a mathematical construct that is otherwise free of this stuff.) Not to mention that it is basically impossible to use as an introductory example; because it needs knowledge from all over the map.

A better implementation might be:

1
2
3
4
5
6
7
8
9
10
11
12
13
std::pair<integer , Integer> _fib (unsigned n)
{
	if (n < 2) return std::make_pair(n, 0);
 
	std::pair<Integer, Integer> Fn_1 = _fib(n - 1);
	return std::make_pair(Fn_1.first + Fn_1.second, Fn_1.first);
}
 
Integer Fib (unsigned n)
{
	return _fib(n).first;
}
</integer>

Of course, the meat of this implementation (such as it is) is the _fib function. The other one is just there to present a nicer and more consistent interface. Let me be the first one to say that there are more ways to implement a better Fibonacci and still stay essentially recursive. This is just one way, and not a particularly great one. But it's not bad either.

Now, how come no programming instructor ever teaches the likes of this implementation? Not the ones I studied under, at least. Are they afraid that their students' heads are going to explode? PROGRAMMING IS HARD PEOPLE! If a programmer can't handle a substantial improvement over a bad implementation of a simple idea, they are in for a big (and not at all nice) surprise. I hereby beg any teachers, instructors, professors and whatnot that have anything to do with teaching starting programmers to quit treating them like idiots, because idiots won't make good programmers. The idiots are not going to understand what they need anyways, so why take the chance of profound understanding from those who have the capacity to learn, but might not if you treat them like babies?

For the sake of completeness, here's a non-recursive implementation:

1
2
3
4
5
6
7
8
9
10
11
12
13
Integer Fib (unsigned n)
{
	if (0 == n) return 0;
 
	Integer last = 1, penultimate = 0;
	for (unsigned i = 1; i < n; ++i)
	{
		penultimate += last;
		std::swap (last, penultimate);
	}
 
	return last;
}

This is the fastest of implementations offered here, with the exception of the memoizing one (and only in amortized sense.) But nothing prevents us from applying the memoization technique to an iterative implementation as well, albeit not in a general and automatic way. It might not be apparent or significant in such a simple sample, but for some class of problems, recursive solutions are usually easier to devise and understand, and therefore easier to get right. In any case, nothing is clear-cut when it comes to programming, and I'm almost offended when people (specially people who should know better) behave as if almost everything is.

VN:F [1.9.13_1145]
Rating: 8.0/10 (6 votes cast)
VN:F [1.9.13_1145]
Rating: +2 (from 4 votes)

“Planning for Debugging Day” Presentation

Here are my presentation slides from the talk I gave this past March in Sharif University, at the first Iranian computer game developer’s mini-conference.

Here I talk (mostly out of personal experience) about things that we should pay attention to, at the beginning of a game development project, so our debugging and issue tracking and solving experience becomes less irritating and more effective. I’d be happy to hear any suggestions, corrections and discussions.

Oh, and I have expanded the talk a bit, from what I actually presented! Also I’d be happy to explain my views on any of the particular points if anyone actually bothers to read through the thing.

VN:F [1.9.13_1145]
Rating: 8.2/10 (5 votes cast)
VN:F [1.9.13_1145]
Rating: +4 (from 6 votes)

Fun with C++: new, delete and Some of the Rest of the Story

Every C++ programmer knows new and delete and how they work. At least it must be so. I sure as hell didn’t know all the theory and detail behind C++ memory management facilities until 3-4 years ago, and I’m obviously still learning the practical details. So, please bear with me and see if there are things that you can learn about these old pals of ours, new and delete.

First, we all should know that new and delete are C++ operators, with all their facilities (and shortcomings, of course.) But not exactly like other operators, you can override them at a global level for every type that does not provide its own type-specific such operators. These global operators are provided as part of the C++ runtime library and are easily overridden. Their declarations are:

1
2
3
4
5
6
7
// The single-object versions
void * operator new (size_t mem_size);
void operator delete (void * mem_block);
 
// The array versions
void * operator new[] (size_t mem_size);
void operator delete[] (void * mem_block);

What new does is allocate a block of memory, and then call the constructor for the type with the address of the newly allocated block passed in as the this pointer. A delete call does the reverse; calling the destructor and then de-allocating the memory. The difference between the single-object versions and the array versions is only in the number of c’tor/d’tors they call. It amazes me how many C++ programmers don’t know and don’t care about details such as this (if you are programming in C++, these kind of details can and will bite you in the places you don’t want to be bitten!) If you fail to match them correctly, they allocate and de-allocate the correct amount of memory for the array or the single object, they just might not call constructors and destructors for all the objects being allocated or freed. That’s it.

Also, there is that small detail about exception-handling handling (yeah, two “handling”s!) The new operators may only throw an object of a sub-type of std::bad_alloc and only in the event that the requested amount of memory cannot be allocated. delete operators should never throw any exceptions (just like destructors. Remember that!) So, the correcter declaration for these operators would be:

1
2
3
4
5
6
7
// The single-object versions
void * operator new (size_t mem_size) throw (std::bad_alloc);
void operator delete (void * mem_block) throw ();
 
// The array versions
void * operator new[] (size_t mem_size) throw (std::bad_alloc);
void operator delete[] (void * mem_block) throw ();

Again, it is amazing how many programmers either don’t know about function exception specification and exception safety or just don’t use them (that includes me.) Of course, compiler providers are at least a little to blame here too. For example, Microsoft C++ compiler only distinguishes the empty exception list after a function declaration (meaning it doesn’t throw anything.) Anything else put there, just is taken to mean the default behavior is used (i.e. this function does throw something sometimes.)
In the meantime, the relationship between C++ programmers and exception handling remains in the love/hate/ignorance/hate/apathy/hate stage.

Later on, I’m going to talk abut overloading these global operators for fun and profit. Stay tuned! ;)

Obviously, a related problem to memory management is calling the c’tor and d’tor for an object directly. Uses for this may not be immediately apparent, but as a few examples I could name implementing good smart pointers, memory pools, memory managers, garbage collectors, generic object containers (e.g. std::vector) and such.
You probably already know how to call the destructor on an instance directly. If you have a pointer x to an object of type T, you can call its d’tor like this: x->~T() (note that you should not generally call the d’tor in this way, unless you yourself have called the c’tor directly on that instance as well.) Calling the constructor is a bit trickier though (not really; I’m just being foreboding!)

What you should realize is that you can overload new and delete with different signatures that the ones above. You can add arguments and of different types. There are a few other signatures for these two provided by the standard C++ library (yeah, there are others!) The less interesting of them are:

8
9
10
11
12
void * operator new (size_t mem_size, std::nothrow_t const & please_dont) throw ();
void operator delete (void * mem_block, std::nothrow_t const & please_dont) throw();
 
void * operator new[] (size_t mem_size, const std::nothrow_t & please_dont) throw ();
void operator delete[] (void * mem_block, std::nothrow_t const & please_dont) throw();

Forget about the deletes for a minute. The additional parameters to the new calls above are actually not used inside of the functions. Any object of type std::nothrow_t will suffice as the second parameter; it will be there just to signal the compiler to use this particular overload of the operator, which doesn’t throw any exceptions whatsoever. I just need to emphasize again that the regular new never returns a NULL pointer. It just throws an exception. But this one returns a 0 pointer upon failure and never throws anything, being it rocks, shoes or exceptions. The syntax for calling them, as you might suspect, is peculiar:

13
14
15
16
17
18
19
20
21
#include <new>  // for std::nothrow
//...
T * p = new (std::nothrow) T (/* usual constructor parameters */);
// The line above calls a particular "new" overload with two
// parameters: a size_t and a std::nothrow_t.
// Oh, and std::nothrow is just an object of type std::nothrow_t.
//... 
delete p;
</new>

Notice that I didn’t call the delete with any extra parameters or anything. In fact, there is no syntax in C++ for calling delete with any parameters! Besides, delete is already a non-throwing function. So what gives?! Why is there a paired delete for every frakking new when there is no frakking way of calling them?! You should keep your cool. I may explain them (if you don’t already know,) or we can leave the subject as an exercise. I would just say that the paired delete is called by the code generated by the compiler in a very specific situation.

Note that anything other than the straightforward, unary new and delete is called a “placement new/delete“. However, I’ve heard the term be used for a specific overload, which is more interesting and looks like this:

21
22
23
24
25
void * operator new (size_t mem_size, void * mem_ptr) throw ();
void operator delete (void * mem_block, void * mem_ptr) throw ();
 
void * operator new[] (size_t mem_size, void * mem_ptr) throw ();
void operator delete[] (void * mem_block, void * mem_ptr) throw ();

The implementations for the above operators are really simple. Here’s a complete listing:

26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
void * operator new (size_t mem_size, void * mem_ptr) throw ()
{
    return mem_ptr;
}
 
void operator delete (void * mem_block, void * mem_ptr) throw ()
{
}
 
void * operator new[] (size_t mem_size, void * mem_ptr) throw ()
{
    return mem_ptr;
}
 
void operator delete[] (void * mem_block, void * mem_ptr) throw ()
{
}

Note that although I haven’t written it, the constructor and destructor calls happen outside of my control. These versions of new and delete are used when we don’t want to allocate or free any memory, and just want the constructors and destructors to be called. For new, we just pass in a pointer to another sufficiently-sized memory location and ask the compiler to generate the code to call the c’tor upon that area of memory. That’s how we call a c’tor directly. We procure some memory area from somewhere and use that, like this:

43
44
45
T * x = (T *)malloc (10 * sizeof(T));
for (unsigned i = 0; i < 10; ++i)
    x[i] = new (x + i) T (/*usual c'tor params. */);

These standard placement operators cannot be hidden or overridden in user code, but there is still a ton of fun to be had.

You can very easily replace the old, simple and default operators with an implementation of yours, a la:

46
47
48
49
50
51
52
53
54
55
56
57
58
59
void * operator new (size_t mem_size)
{
    void * ret = malloc (mem_size);
    if (0 == ret)
        throw std::bad_alloc ();
    return ret;
}
 
void delete (void * mem_block)
{
    free (mem_block);
}
 
// the array versions are exactly the same

As I have stated earlier, the c'tor/d'tor calls are generated automatically for you by the compiler. So now you are free to write your own memory manager!

The way that memory manager/debugger/helper/whatevers usually work under the hood is that they allocate more memory than you have requested, and put their own junk right before and/or right after the area that gets passed back to the user (that's a bad way to do memory management, but that one is a long story.) Some of the stuff that are usually kept there include a pointer to the next and/or previous allocated block of memory (so all the blocks form a linked list,) sentinel values right before and right after the user area to detect buffer overruns (e.g. 0xdeadbabe,) the size of the memory block, the code file/line/function/module that allocated this particular block and so on and so forth. Actually, your default memory manager in the CRT is doing this right now. Just new two large-enough blocks of memory and compare their address differences with the size of the first block. The runtime accompanying some compilers (like VC++) even exposes their internal data structures and means to work with the memory manager (although rather passively.)

You need to keep in mind though, that what I have discussed so far barely scratches the surface of writing memory managers. These are just practicalities and implementation details; the state of the art on the theory of the matter and memory allocation algorithms, policies and mechanisms can fill several books. Even on the implementation side, there are really serious issues with performance, cache-friendliness, thread-safety, multiple thread support, etc. need taking care of.
Besides, much (if not most) of the memory (de)allocation going on in a C++ program these days go through C or operation system API, shared object files (DLLs,) through third part code or through STL, all of which bypass the basic technique discussed above. So, if you really are serious, you should investigate the existing memory debuggers or memory leak detectors or memory managers. There are several open source ones out there, with various degrees of sophistication and complexity. Have fun! (But for what it's worth, I should mention that we have used a memory leak detector using nothing but this technique in Garshasp and a similar project before, and in both projects it has been a great help.)

VN:F [1.9.13_1145]
Rating: 8.0/10 (11 votes cast)
VN:F [1.9.13_1145]
Rating: +3 (from 7 votes)

Most Important Factor

I hate to be just parroting XKCD, but I’m also unbelievably lazy and unimaginative so here it goes:

Always factor in the cost of the annihilation of mankind!

Just make sure you are having it minimize the cost!

Heed this warning Amiross!

VN:F [1.9.13_1145]
Rating: 9.4/10 (5 votes cast)
VN:F [1.9.13_1145]
Rating: 0 (from 0 votes)

Fun with C++: Singletons

I just realized that I’ve not posted anything technical in a long while. Here’s my effort to remedy the situation.
This is a modified version of the straightforward implementation of the “singleton” pattern (STFW yourself.)

Let me add this. I’m rather proud of this code, so I ask you please to give this code a read if you are even a little interested.

First the definition of the singleton (which I named Zingleton to be used in Zorvan) and the handle class:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
//==========================================================
 
#include <cassert>
 
//==========================================================
 
#define DEF_ZINGLETON(T)  \
        template<> T * Zingleton<t>::ms_theone = NULL; \
        template<> int Zingleton</t><t>::ms_refcount = 0
 
//==========================================================
 
template <typename T>
class ZingletonHandle
{
public :
  inline ZingletonHandle ();
  inline ~ZingletonHandle ();
 
  inline T const * operator -> () const {return m_ptr;}
  inline T * operator -> () {return m_ptr;}
 
  inline T const & operator * () const {return *m_ptr;}
  inline T & operator * () {return *m_ptr;}
 
protected :
  T * m_ptr;
};
 
//==========================================================
 
template </typename><typename T>
class Zingleton
{
  friend class ZingletonHandle<t>;
 
public :
  static ZingletonHandle</t><t> getZingleton ()
  {
    return ZingletonHandle</t><t>();
  }
 
protected :
  Zingleton ()
  {
    assert ((NULL == ms_theone && ms_refcount == 0) ||
	        (NULL != ms_theone && ms_refcount > 0));
    assert ((NULL == ms_theone && ms_refcount == 0));
 
    ms_theone = static_cast</t><t *>(this);
    ms_refcount = 1;
  }
 
  ~Zingleton ()
  {
    --ms_refcount;
 
    if (ms_refcount < = 0)
    {
      ms_theone = NULL;
      ms_refcount = 0;
    }
  }
 
private :
  static T * zingletonAcquire ()
  {
    if (NULL == ms_theone)
    {
      assert (0 == ms_refcount);
      zingletonCreate ();
    }
 
    assert (ms_refcount >= 0);
    ++ms_refcount;
 
    return ms_theone;
  }
 
  static void zingletonRelease (T const * instance)
  {
    assert (instance == ms_theone);
    if (instance != ms_theone)
      return;
 
    assert (NULL != ms_theone);
 
    --ms_refcount;
    assert (ms_refcount >= 0);
 
    if (ms_refcount < = 0)
      zingletonDestroy ();
  }
 
  static void zingletonCreate ()
  {
    assert (NULL == ms_theone);
    if (NULL == ms_theone)
    {
      ms_theone = new T ();
      ms_refcount = 0;
    }
  }
 
  static void zingletonDestroy ()
  {
    if (NULL == ms_theone)
      return;
 
    assert (0 == ms_refcount);
    delete ms_theone;
    ms_refcount = 0;
  }
 
protected :
  static T * ms_theone;
  static int ms_refcount;
};
 
//==========================================================
 
template<typename T>
inline ZingletonHandle</t><t>::ZingletonHandle ()
  : m_ptr (Zingleton</t><t>::zingletonAcquire())
{
  assert (NULL != m_ptr);
}
 
//----------------------------------------------------------
 
template<typename T>
inline ZingletonHandle<t>::~ZingletonHandle ()
{
  assert (NULL != m_ptr);
  Zingleton</t><t>::zingletonRelease (m_ptr);
}
 
//==========================================================
</t></typename></t></typename></t></cassert>

I think the code is clear enough, so I’m not gonna explain it any more! Let me just say that the handle class is there as a specific kind of smart pointer.
Here’s one way to use it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
//==========================================================
 
#include <iostream>
 
//==========================================================
 
// Either include or paste the above code snippet here.
 
//==========================================================
 
class TestZ : Zingleton<testz>  // "CRTP"? Say what?
{
public :
  TestZ () {cout < < "+It's alive!" << endl;}
  ~TestZ () {cout << "+I'm dying here..." << endl;}
 
  void Print () {cout << "+Hello, world." << endl;}
};
 
//----------------------------------------------------------
 
DEF_ZINGLETON (TestZ);
 
//==========================================================
 
void use_zingleton ()
{
  cout << "-Start of use_zingleton()..." << endl;
 
  ZingletonHandle<TestZ> h;
  h->Print ();
 
  cout < < "-End of use_zingleton()." << endl;
}
 
//----------------------------------------------------------
 
int main ()
{
  cout << "-Start of main()..." << endl;
 
  {
    cout << "-Entered inner scope..." << endl;
 
    ZingletonHandle<TestZ> h;
    h->Print ();
 
    use_zingleton ();
 
    cout < < "-End of inner scope." << endl;
  }
 
  cout << "-End of main()." << endl;
  return 0;
}
 
//==========================================================

When (if) you run the above code, it should produce something like this:

1
2
3
4
5
6
7
8
9
10
-Start of main()...
-Entered inner scope...
+It's alive!
+Hello, world.
-Start of use_zingleton()...
+Hello, world.
-End of use_zingleton().
-End of inner scope.
+I'm dying here...
-End of main().

As you can see, only one instance of the TestZ class is created and it is destroyed only after all the handles are destroyed. Basically, the handles are objects that guarantee that the singleton is alive and valid in their lifetime.

Homework: There is at least one very serious bug in the above code (that I’m unaware of.) What is it? It’s not the thread-safety issues because I’m aware of that and may fix and post again (that’s a great opportunity to talk about policy-based design in C++, so it demands another post or two.

VN:F [1.9.13_1145]
Rating: 10.0/10 (7 votes cast)
VN:F [1.9.13_1145]
Rating: 0 (from 0 votes)

Lisp in Python

This is definitely the coolest programming related text I’ve read in a long while: Church Encoding in Python. I even almost understood all of it! :D
I saw this on the Planet Python feed aggregation(?) (it is the second entry right now) which I got introduced to today.
But seriously, how cool is the CONS construction and number encoding stuff?!

VN:F [1.9.13_1145]
Rating: 8.5/10 (6 votes cast)
VN:F [1.9.13_1145]
Rating: 0 (from 0 votes)

Fun with C++: Useless Consts?

Many C++ programmers see the keyword “const” as something redundant that only gets in the way. They curse Stroustrup and his ancestors each time they encounter a const-related compile error because of that view and fail to use it correctly quite often (note that I’m not saying that I’m not one of these people, but I curse neither Stroustrup nor anyone else.)

If you ask them about the purpose of the const keyword, they would give you the classic answer that any C++ textbook gives: variables declared as consts cannot be changed by the programmer, and const class methods cannot modify their instance data (and can be called on const class instances), etc.

The programmer may give you the textbook answer (or some mutilated version of it, depending on which textbook she read and how much of it she actually remembers,) but she is wondering with herself who will stop the other programmer to change the header file and remove the const from the declaration? Who is there to prevent her from doing so?!

She may even make a conjecture that if she removes all the consts from all her source files and headers (including standard and 3rd party headers she uses,) then her programs will continue to compile and work just fine (after all, the const keyword has no runtime effect. It’s an exclusively compile-time construct, is it not? So if her programs used to work, and they compile after removing all consts, they should continue to run without problem.)

If she is a little bit more sophisticated and skilled in C++, she knows that the C++ compilers differentiates between functions that take const arguments and those that take exactly the same arguments, but not const. This fact lets you have function overloads that only differ in constness of their arguments. The same is also true for template specializations (which is a form of overload anyway.)

So this more sophisticated C++ programmer will conjecture that if she removes all the consts and eliminates the redundant overloads (which were nothing but nuisances to begin with) then her programs will compile.

Now, is she right? Is this the case with const?
(I should say that I know of one case that is not so (the program will not compile without the the const) but it’s not a common case. I’ll write about that in a later post (if no one guesses it first.) I’m genuinely interested in other such cases.)
(Another note. The use of const frees the compiler to do some optimizations that may not be safe with a non-const object. My question is about correctness and not about performance.)

VN:F [1.9.13_1145]
Rating: 8.2/10 (5 votes cast)
VN:F [1.9.13_1145]
Rating: 0 (from 0 votes)

Fun with C++: When Parentheses Don’t Call!

Just a quick one. Most of us have come to think of parentheses as sometimes necessary but always harmless little creatures. Our view is that unless parentheses change the order of evaluation (and therefore the meaning) of an expression, they are completely harmless.
Well, while that is true almost by definition, there are cases that we might not realize the meaning of an expression is being changed. One such case is in the presence of the versatile and extremely useful (and possibly massively pain inducing) comma operator (‘,‘.)
You see, the comma has two uses in C++ language: as an operator and as a separator for many different lists of things in the language grammar (function arguments, object declaration, etc.) In the first context, a pair of parentheses work exactly as anyone would expect; in the second, probably not. The problem is that the use of commas in the first context is not common at all, while the commas of second kind riddle any and every C++ source file. It’s mixing the meaning of the uses that’ll lead to potentially cryptic compile errors (what a shocker! Cryptic compile error messages in C++?!), or worse, logical bugs that will make you wish you were coding in machine language.

Just for the sake of comprehensiveness, while I know that you all know how the comma operator operates, I’m gonna describe its workings anyway. The comma operator has the lowest precedence among all the operators in C++ and it is a left-associative operator (meaning “a, b, c” will be evaluated as “(a, b), c“.)
An expression of the form “a, b” (where a and b are expressions themselves (and not statements)) will always evaluate a first, and then b and will return the evaluated value of b.
Please believe me when I say that overloading this operator can achieve magic and convenience one can only dream of in C++! Just see the Boost.Assignment library for a sample.

One place that the implicit mixing up of the meaning of comma could cause trouble is in function calls. Suppose that a function X accepts two arguments. Now, if instead of X (a, b) someone writes X ((a, b)) then they will be given a compile error saying that function X does not take one argument(s);-) . This is because the extra pair of parens have changed the meaning of the comma from a simple argument list separator to a C++ operator.
This will not confuse any C++ programmer out of her first few months with the language, and will be obvious and easy to find and fix. But what if the extra pair of parentheses is the result of a macro expansion? Then you’ll have to chase down the macro definition. Still not very hard, because the compiler will catch you. But what happens if the X function has an overload with a single argument, incidentally with a type compatible with b? What happens if the function takes more than two arguments and has many overloads and it’s called like this: X (a, (b, c), d, (e, f, g))?
I should probably list this as another macro pitfall. Take care where you put parentheses! Don’t just slap them whenever you’re in doubt!

VN:F [1.9.13_1145]
Rating: 8.2/10 (5 votes cast)
VN:F [1.9.13_1145]
Rating: 0 (from 0 votes)

Fun with C++: Metaprogramming, Part 1

Metaprogramming can mean many things, but all of them can be made to (loosely) fit under the description of “writing programs that write or rather describe (a class of) other programs.” A classic and somewhat useless sample of this is writing a program that prints it’s own source code. My first real contact with this field came from working with YACC. The rule file one writes to be fed to YACC, is kind of a metaprogram.
As a better example, if you could write a program that would accept a set of strings and generate a C++ program that would implement a “perfect hash function” that would hash those N input strings onto N collision-less values, then you’d have written a metaprogram. (Note that you are not implementing a hash function; you are writing a program that implements many hash functions based on the input.)

Now, these metaprograms can be in any language and output any language. If you write one in C++ that produces C++, then you have to run your compiler twice to get to the final usable program. Not really convenient, but it is used in the real world. For example, that perfect hash function generator example is exactly how gperf works.
On a cooler side, there’s already a couple of other mini-languages that any C++ compiler package can understand, e.g. the preprocessor language. I’m the first one to admit that writing a program only in preprocessor is like having trying to open a locked door with your head, but it’s not entirely without merits and fantastic experiences.

OK, so let’s get down to it. Before we can program in any language, we should know how some basic ideas like logic, conditions, repetition, etc. are expressed in that language. First off, doing loops a la “for” and gang is really really hard to do in preprocessor (I’d say impossible, but I’ve seen Boost.Preprocessor!) Fortunately, C++ preprocessor language does support a form of recursion (guess how?) and we can use that facility both as loops and as subroutines.
If you think about it, you’ll find out that the preprocessor’s programming model is not unlike digital logic circuits, with all the variables, inputs and outputs being single bits (whether a specific macro has been #defined or not) and the only operations being ANDs, ORs and NOTs. Although the preprocessor can assign arbitrary integers to macros and can compare them, but cannot do arithmetic on them.
As a first example, let’s see how an XOR logical function with two one-bit inputs can be implemented.

1
2
3
4
5
#if (defined(IN_a) && !defined(IN_b)) || (!defined(IN_a) && defined(IN_b))
    #define OUT_a
#else
    #undef OUT_a
#endif

That’s it. Not very complex or revolutionary, but this idea can be taken far. What if we put the above code in a header file, and #include that header whenever and wherever we need a 1-bit XOR? Of course, the parameters to that XOR are always the same (think of them as global variables) but this concept can be use to implement functions and therefore recursion. I’m not going to go into more elaborate examples of recursion. Just let me demonstrate a little more of the preprocessor with another example that actually does something and generates output. I’ve implemented a 1-but full-adder (three inputs: A, B and Carry, and two outputs: A and Carry. The input carry is for reusability.) This example consists of two files; a “OneBitAdder.h” like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
/*
	OneBitAdder.h
 
	Inputs: "IN_X" and "IN_Y" and "IN_C"
	Outputs: "OUT_X" and "OUT_C"
 
	Returns the sum of the inputs. Note that each input and
	output here is one bit: either a macro is defined (it's 1)
	or not (it's 0.)
*/
 
  #if !defined(IN_X) && !defined(IN_Y) && !defined(IN_C)	// 0
	#undef OUT_X
	#undef OUT_C
#elif  defined(IN_X) &&  defined(IN_Y) &&  defined(IN_C)	// 3
	#define OUT_X
	#define OUT_C
#elif  defined(IN_X) && !defined(IN_Y) && !defined(IN_C)	// 1
	#define OUT_X
	#undef OUT_C
#elif !defined(IN_X) &&  defined(IN_Y) && !defined(IN_C)	// 1
	#define OUT_X
	#undef OUT_C
#elif !defined(IN_X) && !defined(IN_Y) &&  defined(IN_C)	// 1
	#define OUT_X
	#undef OUT_C
#else						// 2
	#undef OUT_X
	#define OUT_C
#endif

and the “PreprocessorAdderTest.cpp“:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// Set input
#define IN_X
#undef IN_Y
#define IN_C
 
// Compute
#include "OneBitAdder.h"
 
// Display(!) output
#  if !defined(OUT_X) && !defined(OUT_C)
	#pragma message ("--- The answer is 0")
#elif  defined(OUT_X) && !defined(OUT_C)
	#pragma message ("--- The answer is 1")
#elif !defined(OUT_X) &&  defined(OUT_C)
	#pragma message ("--- The answer is 2")
#elif  defined(OUT_X) &&  defined(OUT_C)
	#pragma message ("--- The answer is 3")
#endif	
 
// main is mainly cosmetic.
int main () {return 0;}

I’ve used #pragma message for output here. I’m not sure which compilers support it (I suspect all that matter) but if you want you can use other methods, like using #error.
Create these two files and compile the “.cpp” file (note: just compile it.) There should be a string “--- The answer is 2” in your compiler output! See what we did? We just run a program by compiling a C++ source file! Cool, eh?

Of course, getting any real use out of the preprocessor (beside the obvious) is really is not worth the effort, but if you want to see whether you have the hang of it, try writing a 2-bit adder using the above 1-bit adder. It’s not as easy (nor as hard) as it may sound. The samples here were not really good examples for metaprogramming, but I think you can extrapolate from here on if you want to.

I originally (i.e when I started writing this post!) wanted to write about template metaprogramming, but I went in a completely different direction. Maybe next time then.

VN:F [1.9.13_1145]
Rating: 8.2/10 (5 votes cast)
VN:F [1.9.13_1145]
Rating: 0 (from 0 votes)

Fun with C++: Macro Pitfalls, Batch 1!

(UPDATE: Added the 6th item.)

I’m a fan of C/C++ preprocessor macros. I would be very happy if C++ received macros on par in power with those of Common Lisp macros, but that’s only a dream! We have to work with what we have.
Anyways, with use of macros come a lot of potential pitfalls. I’m going to go over a couple of them. I’m sure any C++ programmer is aware of these, but the reiteration would hurt now, would it? I’m also hoping that with your feedback, we can complete these into something referable and usable.

  1. Operator Mishaps From Hell!
    This is a classic example.

    #define SQR(x)    x * x

    Imagine what happens when you use something like SQR(a + b). But as I said, this is a classic example, and any C programmer with half a brain should know enough to avoid this (doesn’t mean it always happens though. We all have our shameful secrets!)
    The moral is: always enclose every usage of macro parameters in parentheses as well as the whole thing. (#define SQR(x) ((x) * (x)))

  2. The Space From Hell!
    #define SQR (x)   ((x) * (x))

    Notice the space between SQR and the (x)? The programmer didn’t! It usually results in a cryptic compile error about something not being a function or about incorrect number of arguments. Easy for more experienced programmers to understand and rectify (usually with a bang to their heads.)
    Also, it’s not hard to imagine cases without any compile errors and severe oddities in runtime behavior. Remember: Compile-time errors and warnings are your friends as long as you pay attention to them! They might be a little hard to understand, but a project with 100 pain-in-the-neck compile errors is better than 100 hidden runtime bugs just waiting to cut your neck off.

  3. The Semicolon From Hell!
    Let me demonstrate the perils of the semicolon.

    // A macro to increment a counter, and wrap around to zero is a limit is reached.
    #define ADVANCE(x, wrap)    ((x) = ((x)+1)%(wrap));
     
    // Usage 1
    while (should_go_on)
        ADVANCE(i, 100);
     
    // Usage 2
    for ( ; should_go_on; ADVANCE(i, 100))
    {}
     
    // Usage 3
    while (should_go_on)
        p = ADVANCE(i, 100) + 100;

    As long as you stick to the first usage (i.e. as a statement,) you get away with your mistake (the spurious semicolon at the end of the macro definition.) However, usage 2 won’t compile. But this is a good thing, since you will know that you have an error in your macro (of course, if you’re a newbie in C++, it may take you quite a few minutes to figure this one out.)
    Usage 3 is where things get interesting. The code will compile (at most with a mysterious and seemingly unrelated warning,) but it won’t do what you think it does. That can be a bitch to track down.
    So, the moral of this story is: don’t put semicolons in macros!

  4. The Multiple Evaluations From Hell!
    #define SQR(x)    ((x) * (x))

    What happens when you evaluate the value of SQR(++a)? Does that seem alright to you? (By the way, any one who tells you a += b is equivalent with a = a + b is either a moron, or thinks you are a moron!)
    The lesson of this item is: don’t use macros as fast, generic functions! Use inline templates (which are even potentially faster (do you know when? :D )) Like this:

    template <typename T>
    inline const T & sqr (const T & x) {return x * x;}
    </typename>
  5. The Multiple Statements From Hell!
    The following macro deletes a pointer and sets it to 0, a very sensible thing to do nearly all the time.

    // This is not OK. Don't use this macro like this. Keep reading down this article!
    // Also note that I've not placed a semicolon at the end.
    #define SAFE_DELETE(ptr)  delete (x); (x) = 0
     
    // Usage 1
    char * s = new char [42 + 1];
    ...
    SAFE_DELETE(s);
     
    // Usage 2
    if (some_condition) SAFE_DELETE(s);

    But it doesn’t work, does it? I mean, usage 1 is OK, but you’ll wish you were never born when you have to stay up late into the night and debug the results of the second usage, won’t you? So, what do we do? I imagine everybody’s first response would be:

    // This is not OK, still.
    #define SAFE_DELETE(ptr)  {delete (x); (x) = 0;}
     
    // Usage 3
    if (some_condition)
        SAFE_DELETE(s);
    else
        doSomeWork (s);

    See the problem? No? The problem is in the semicolon before else. C++ doesn’t let you put a semicolon after an closing curly brace (‘}’) and before “else”. So what do we do? This is one of the solutions:

    // This is almost OK, and usable if you are not very picky.
    #define SAFE_DELETE(ptr)  do { delete (x); (x) = 0; } while (false)

    That’s right! We put the multiple statements inside a do/while block that is guaranteed to be executed only once! Brilliant, isn’t it? Well, I didn’t come up with it. I wish I could say that I though of it independently, but that’s not the case. I saw it in kernel code even before I had run into this problem! But the code didn’t say why it was used this way, and analyzing and discovering the purpose of this small piece of code was one of my biggest “Aha!” moments in C programming.
    So, we are good to go, right? Well, not quite. If you don’t care much about warnings the compiler gives off, you are good to go. But the above code results in a warning (only when you have turned the compiler warning level to the highest degree,) that the used condition is constant. Although the occasions for this warning are intentional most of the time (like here,) it may be the case that the compiler is warning us about a typo. Because of this, I don’t want to disable this warning (unlike those blasted 4996 warnings Visual C++ gives off like candy on Halloween, which I shut off with a wide and hysterical grin, while caressing my ax handle!)
    So what do we do? The best solution that occurs to me is like this:

    // We define a function that always returns false
    namespace __useless {
    inline bool i_am_false () {return false;}
    }
     
    // Now we use the above function instead of hard-coding the constant expression
    #define SAFE_DELETE(ptr)  do { delete (x); (x) = 0; } while (__useless::i_am_false())

    We trick the compiler here. The compiler is wise enough (or not smart enough) to give a warning no more, but any optimizer would optimize the overhead of the function call away completely.

  6. The “if” From Hell!
    Suppose we “optimize” the above SAFE_DELETE macro, to check whether the argument is NULL, and do nothing in that case.

    #define SAFE_DELETE(ptr)    if(ptr) delete (ptr), ((ptr) = 0)
    // Note that I've incorporated the comments from Hadi (comment 2) here, to make a point.
     
    // Usage 1
    if (some_condition)
        SAFE_DELETE (s);
    else
        return 0;

    Ouch! That hurts. I don’t think there’s anyone who doesn’t know this already, but in case there is (almost no shame in not knowing,) the “else” in the first usage will be the else clause for the “if” inside the macro! You are going to hate Dennis Ritchie after the first two hours of bewilderment over this (but of course, that would be out of frustration, and without logical cause!)
    You may be tempted to solve the problem like this:

    #define SAFE_DELETE(ptr)    if(ptr) {delete (ptr); (ptr) = 0;} else {}

    But don’t! Do this:

    #define SAFE_DELETE(ptr)    do { if(ptr) {delete (ptr); (ptr) = 0;} } while (__uselss::i_am_false())

    The moral? If you write “if” clauses in macros, be extra careful.

The above were only some of the pitfalls, but I can’t continue right now. Looking forward to your suggestions. Don’t be shy!

VN:F [1.9.13_1145]
Rating: 9.3/10 (6 votes cast)
VN:F [1.9.13_1145]
Rating: -1 (from 1 vote)

KOPCS Distribution 1371

This is the long overdue package (and “ymeta” file,) containing KOPCS source code (verion 1.0.0 (codenamed Rincewind) build 1371.) The package also contains pre-built binaries mostly for Windows, but “Zero” is provided for 32-bit x86 Linux as well.
The package contains KOPCS (plus less-than-minimal documentation,) JFKOPCS scripts, yCrypto (plus documentation,) Zero, the web page templates, some tests and VS2005 project files and Makefiles.
There’s no guarantee that anything in it works or even compiles. Do not trust anything you hear or receive from me! The only external dependency is MySQL (preferably 4.x or 5.x, but 3.2x may work) to build and run and Apache 2.0.x or 2.2.x to run it.
There’s no setup guide whatsoever. Good luck!
BTW, I’m going to setup a subversion repository for KOPCS soon. The only problem I have is migrating from the existing local CVS (on Mike) to a remote SVN on yaserzt.com.
UPDATE 2007-11-04: The authors of KOPCS are (in no particular order) Ehsan Adeli Mosabbeb, B. Maryam Elahi and myself. The source code has been released with their expressed permission. An implementation of the SHA-256 in JavaScript language by Angel Marin has been used. Please refer to the file “webfiles/media/sha256.js” in the distro for more information.
I guess a quick note about licensing is in order. All the work done for KOPCS by the above-mentioned authors are released under the terms of the Lesser General Public License version 3 (a.k.a. LGPLv3) with the exception of the IAUM-CCC website templates and the JavaScript code used in it, which are under the terms of the General Public License version 3. The website templates include and is limited to any file with the extension of “.htmltemp”, “.js”, “.css”, “.png” and “.ico”. Note that KOPCS is released AS IS in the minuscule hope that it would be useful, but without any guarantee (even implied) that it will be suitable for any purpose. That means, if KOPCS destroyed your hard drives or burned down your house, it’s your fault not ours!

VN:F [1.9.13_1145]
Rating: 8.2/10 (5 votes cast)
VN:F [1.9.13_1145]
Rating: +1 (from 1 vote)

C++: Fun With Comments

There’s much fun to be had with C++ comments, and a lot of seriously usable tricks there as well. Of course, most of programmers already know all I could tell about the subject, but nonetheless, I think I’ll take a shot at bringing all the techniques I think might be useful together, and I might be able to use a few possible tricks that a possible visitor would possibly leave on this post!
The first thing any programmer has to realize about comments is that (as their name suggests)

Comments are not the code that the compiler doesn’t see, but the sentences that a human would read.

Write comments (and code in general) for another human (or you yourself) to read. This generally means you should generally avoid too clever constructs that have no other advantage. I say it again: Write code for other humans to read, not the compiler to compile.
You can use special comments and tools like doxygen to generate useful, beautiful, always-up-to-date and comprehensive documentation of your source codes automatically, with little extra work (e.g. with putting a line with three slashes (“///“) at the beginning and describing the purpose of a function before its prototype.) These kinds of tools make you appreciate the value of comments!
My first advice is never use “//” or “/*” to comment out a piece of code. Those characters are reserved for “comments”, i.e for writing descriptions of your code! Always put another character (or sequence of characters) after them, and use different characters to distinguish the different situations in which you’ve commented out the code. For example, use “//0” and “/*0” for code that you suspect is incorrect, use “//1” and “/*1” for test code, and so on and so forth. Keep in mind that these are only suggestions. Any system you opt to use, must be documented somewhere. Write 10 lines to describe the meaning and connotation of the 10 different types of comments you use. Don’t take this lightly! As time goes by and you write more code, you’ll accumulate these rules and split or merge them, but your commenting rules become more comprehensive, more concise and more useful.
Anyway, suppose you want to comment a multi-line piece of code. Always put the comment openning and closing on a line by themselves. No code on those lines! Also, use “//*/” for the closing marker (instead of just “*/“.) This way, you can uncomment and recomment that part with only a single character; a ‘/‘ that you add to or remove from the beginning of the openning marker. Look at snippets 1 and 2.
There are times that you want to switch between commenting two parts of a single line. In these situations, you can put “/**” before the first part, put “/*/” between the two parts and “/**/” after the second part. Like snippets 3 and 4, you can then switch between the first part being commented out and the second one by adding a single ‘/‘ after the comment before the first part. (Also note the parsing/formatting bug of jEdit on line 18!)
You can also do it in the slightly prettier way that snippets 5 and 6 demonstrate (I really think so!) but only if you want to switch between two multi-line segments of code.
Commenting large blocks with “/*” and “*/” has a major problem. The commented blocks don’t nest. For this reason, I suggest the cleaner method of using “#if 0“, “#else” and “#endif” blocks (snippets 7 and 8, but they lack the coloring.) This way, not only you can nest commented out blocks inside of each other and enable/disable them individually just changing the ‘0‘ to ‘1‘ (provided the outer blocks are not disabled) but also you can comment the commented-outness(!) of these peices, and you can use preprocessor macros instead of just ‘0‘ to be able to switch these segments on and off from elsewhere. The only problem is that many editors and IDEs don’t process preprocessor directives and they won’t colorify the disabled parts as inactive.
In your “comment” comments, you can use well-recognized keywords like “NOTE“, “WARNING“, “TODO” and “FIXME” to make their meanings more apparent and to searching for them easier. Just remember to use them neatly and consistently.
If you have more ideas for using comments, I’d be happy to hear about them and to include them here. By the way, anybody has a clean and pretty way of switching (commenting/uncommenting) among three or more segments of code?

   1://{Snippet 1}
   2:/*0
   3:    for (unsigned i = 0; i < v.size(); ++i)
   4:        swap (v[i], v[v.size() - 1 - i]);
   5://*/
   6:
   7://{Snippet 2}
   8://*0
   9:    for (unsigned i = 0; i < v.size(); ++i)
  10:        swap (v[i], v[v.size() - 1 - i]);
  11://*/
  12:
  13://{Snippet 3}
  14:    for (unsigned i = 0; i < /**/ v.size() /*/ v.size() / 2 /**/; ++i)
  15:        swap (v[i], v[v.size() - 1 - i]);
  16:        
  17://{Snippet 4}
  18:    for (unsigned i = 0; i < /** v.size() /*/ v.size() / 2 /**/; ++i)
  19:        swap (v[i], v[v.size() - 1 - i]);
  20:        
  21://{Snippet 5}
  22:/*0
  23:    for (unsigned i = 0; i < v.size(); ++i)
  24:        swap (v[i], v[v.size() - 1 - i]);
  25:/*/
  26:    for (unsigned i = 0; i < v.size() / 2; ++i)
  27:        swap (v[i], v[v.size() - 1 - i]);
  28://*/
  29:        
  30://{Snippet 6}
  31://*0
  32:    for (unsigned i = 0; i < v.size(); ++i)
  33:        swap (v[i], v[v.size() - 1 - i]);
  34:/*/
  35:    for (unsigned i = 0; i < v.size() / 2; ++i)
  36:        swap (v[i], v[v.size() - 1 - i]);
  37://*/
  38:        
  39://{Snippet 7}
  40:#if 0
  41:    for (unsigned i = 0; i < v.size(); ++i)
  42:        swap (v[i], v[v.size() - 1 - i]);
  43:#else
  44:    for (unsigned i = 0; i < v.size() / 2; ++i)
  45:        swap (v[i], v[v.size() - 1 - i]);
  46:#endif
  47:        
  48://{Snippet 8}
  49:#if 1
  50:    for (unsigned i = 0; i < v.size(); ++i)
  51:        swap (v[i], v[v.size() - 1 - i]);
  52:#else
  53:    for (unsigned i = 0; i < v.size() / 2; ++i)
  54:        swap (v[i], v[v.size() - 1 - i]);
  55:#endif

UPDATE:I was wrong about the parsing/formatting bug in jEdit I mentioned above. That’s a Doxygen comment and jEdit handles it correctly. Oops!

VN:F [1.9.13_1145]
Rating: 7.8/10 (4 votes cast)
VN:F [1.9.13_1145]
Rating: 0 (from 0 votes)

A Breach in the Hull

The Convex Hull code I posted earlier on this blog has a bug. When the points are all have the same X component, the implementation will misbehave. I may post a fix later; I’m off to work now, but I had meant to write and mention this for so long that when I remembered the matter in the shower I didn’t postpone the writing in fear of forgetting it once more.

VN:F [1.9.13_1145]
Rating: 7.8/10 (4 votes cast)
VN:F [1.9.13_1145]
Rating: 0 (from 0 votes)