The Shame of Being a Man

From a male creature, to the women of my country,

I am ashamed before you. For the lewd stares in the streets and alleys, I am ashamed before you. For the brazen whistles and catcalls, I am ashamed before you. For the lack of safety at the workplace, I am ashamed before you. For the lack of safety at home and in the city, I am ashamed before you. For the way I sit badly in the taxi, I am ashamed before you. For the unwanted phone calls and messages, I am ashamed before you. For the uninvited propositions, I am ashamed before you. For the braking and the repeated honking, I am ashamed before you. For the meaningful smirks, I am ashamed before you. For my ingrained belief in the inequality of the sexes, I am ashamed before you. For my firm conviction that women are inferior, I am ashamed before you. For my hatred of the female sex because of my own sexual frustrations, I am ashamed before you. For the domestic and family violence, I am ashamed before you. For my silence in the face of unjust laws, I am ashamed before you. For my thoughtless following of a misogynistic way of thinking, I am ashamed before you. For my silence when any woman who does not behave as we wish is called a whore, I am ashamed before you. For being male and not being human, I am ashamed before you. For the systematic sexism, I am ashamed before you.

I am ashamed before you that I see the daily and blatant oppression of half the people of my country and, as always, say nothing and do nothing. Please do not forgive me and those like me. Do not forgive us, and take your rights from us. Forgive no one. Do not tolerate this social stupidity and filth. Know that I will not give you your rights. Do not align yourselves with my way of thinking. Rebel and be defiant and learn and know and be independent and rise up, and make me equal to you and yourselves equal to me.

For being male and not being human, I am ashamed.


PS4 Unveiling

My impressions of last night’s unveiling of the PS4:

  1. Glad they didn’t spend any time on Vita, LBP, Move, 3D, Last Guardian, Kaz Hirai, …
  2. Glad they had only some bullshit in there. I guess they controlled themselves as much as they could!
  3. Delighted that they didn’t try to come up with a “cooler” name than PS4!
  4. Ecstatic that it’s basically a PC. And an integrated high-end CPU-GPU chip to boot! With unified memory for system and graphics!
  5. Glad that there is dedicated hardware in there to run the OS and to get the video compression (for video sharing, I guess) and background downloading and shit out of the way of the actual game! Who wants to lose a CPU core to that stuff?!
  6. Curious about how they’re going to support the indies… Will that mean the system can act as a devkit, maybe with a software upgrade? (doubt it.) Will that mean tools and SDK will be easily available? (doubt it.) What did they mean?
  7. Where the fuck were the fucking numbers?!?! I want more than just 8 cores, 8 GiB!
  8. Very disappointed that Naughty Dog and Santa Monica Studios were absent.
  9. Very underwhelmed about the announced first-party games. They definitely needed shock and awe! They needed to announce something on the scale of “Last of Us”.
  10. Very disappointed that we neither got to see the actual console nor a firm release date was announced!
  11. Quite impressed with their dedication to a strong support/content network.
  12. Blizzard, you traitorous bastard! (“Ha ha, only serious!”)
  13. Since I won’t be using any of the “social” features of the new console (for various reasons,) I don’t have a strong opinion there. Seemed “ubiquitous but not intrusive”, which is good.
  14. Mildly surprised that they didn’t try to push 4K and 8K resolutions! Probably because they know only two people in the world actually really want them, and one of them is me, who doesn’t buy consoles!
  15. What else?

I Wish I Was Three

Not three years old, but three instead of one. I wish there were three of me. This is actually close to my biggest impossible wish. And here it is, in more detail.

I wish there were three of me. Three identical copies that had instant mental replication. That means that everything one of us saw or heard or felt, the others did too instantly; while still being able to separate their three sets of I/O streams if they wanted. We should also have elective brain-to-brain communication with a lot of bandwidth whenever we want, with no protocol and content restrictions. Basically, the three are better than a single triple-sized brain with three independent identical bodies. It is also worth noting that the I/O sharing is elective too. If one of us is in pain, the other two don’t necessarily have to be.

The physical bodies are always in the best shape among the three. An injury can’t affect just one of our bodies, because the best state is not being injured. It can’t affect just two of them either. One of our legs can’t be cut off, because there is at least one of us who has both legs. If one of us is not tired or hungry or thirsty, the other two won’t be either. If one of us sleeps enough, that would keep the other two rested and energetic. If one is happy, the other two will be too. This of course opens a loophole, via which one of us can be kept drugged up and happy, which will in turn incapacitate the other two. But I’m not a crime-fighting superhero; I’m just a guy making an impossible wish! This might also mean that we are immortal (won’t die of natural causes) but this depends on the exact mechanics of this “bodies in best shape among the three”; I have several candidates but I’m not discussing them here.

The three of us can merge and re-divide into three or two or one whenever we want. This involves any of the instances vanishing, or reappearing beside another of us. This means that we can be only one person whenever we want (without losing any information) and get by with only a single legal identity. This also means we can teleport to each other!

With all that settled (and much detail missing; like what happened in our childhood and how other people react to this,) I can divide my life into three roles: one of us will sleep, eat, watch movies, read books and enjoy himself all day long. He basically will do whatever he wants. He will smoke and drink (without getting too drunk, because that would be a worse state than the other two) and what not. Another will only read and learn and work out, focusing on keeping the three brains and bodies in tiptop shape. The last one will write code and think about code 24/7.

Obviously some of us have to work. The “enjoying me” and the “learning me” do do some work and earn some money (mostly teaching and consulting) but the bulk of the work falls to the “coding me”; which is fine because I can find (have found) work that is enjoyable and educational and fulfilling and also pays the bills.

Well… there you go. This is my wish. Why only three you ask? No reason. There should be at least three. But more will obviously work out better. I’d like the ability to be more than three.


“Modern CPUs and Caches: A Starting Point for Programmers” Presentation

I gave a talk about some features of the modern CPUs and (a little bit about) caches. It’s an introductory talk on what makes them “modern”.

Here are the slides: in PDF (4.5MiB) and PPTX (207KiB).

Note that this talk is heavily influenced by Cliff Click’s excellent presentation on modern hardware (which can be found in the references of my slides.)


The Wheel of Time and I

I ran into the Wheel of Time series in late 2001. I started seriously reading the series in early 2002, and I’ve been reading and rereading them ever since. It’s a truly epic high-fantasy story, with characters and an intricate story line that would literally take books to describe.

There have been 13 books in the main series, plus a prequel, plus an encyclopedia-like book, and I’ve read them all at least twice. I think I have read the series at least 4 full times, and some favorite books and parts of books much more (Sorry, Elayne. I’m not gonna waste time on your sniffing and snubbing and bitching and being stupid and petty, when Mat Cauthon is adventuring around!)

My fourth read-through is coming to a close, as I am partway into Towers of Midnight (book 13,) warming myself up for the 14th and final book, A Memory of Light, to be released on January 8th, 2013.

With that, which I will probably have my hands on on January 9th, an epic adventure will come to a close, and a chapter of my life too. I’ve spent the third decade of my life, the most defining and fraught part of it so far, with the Wheel of Time, and although I wouldn’t go so far as to say it defined me, it certainly was worth all the time I put into it (and still will.)

So, with great expectations and very fond memories, I welcome the end to this great companion.


Fun with C++: std::list vs. std::vector Performance

Motivation

Bjarne Stroustrup, in his Going Native 2012 presentation, talks about a piece of example code that does some routine things with std::vectors and std::lists. He meant to present the resulting timing graphs, but the chart was missing from his presentation (it doesn’t matter that the graph was missing; everybody should know what the result is.)

A couple of weeks ago, my friend Amir H. Fassihi gave a talk about video game optimization, so he and I sat down and wrote a couple of small programs to measure some simple stuff. The vector vs. list program was one of them. I’m going to present here the results and graphs that I hope are analogous to Stroustrup’s missing one. I should also mention that while I was at it, I measured some other aspects of the same program, like the performance of Debug vs. Release builds and the effects of Microsoft’s iterator debugging “feature” on performance.

I did all these tests using Microsoft Visual C++ 2012 (the newly released final version, a.k.a. VC++11, a.k.a. Microsoft C/C++ compiler version 17.00.50727.1 for x86) on Windows 7 64-bit, on a machine with an Intel Core i7-920 CPU and a lot of DDR3-1066 memory. All my builds were 32-bit.

The Test Case

The code does two simple things. It fills a container (std::vector or std::list) with many thousands of random integers and keeps the numbers in order. It generates a number, finds the place in the container it should be at (and does that in a linear and stable manner,) and inserts it there (which in the case of vector means moving everything after it one place over, and possibly a reallocation and copy.)

After that the code erases the numbers from the container one-by-one, using the same numbers that were used to fill it, and does it in the same order. Again it searches for the number linearly, and in the case of vector a lot of numbers need to be moved around and possibly a reallocation and data copy needs to be performed.

Note that with a sorted vector, I could have easily done a binary search, but I didn’t do that (you know, to be fair to std::list!)

Here are the two functions that do the insertion and deletion of those many numbers:

template <typename ContainerType> // Assuming ContainerType is std::vector<int> or std::list<int>
void Insert (ContainerType & c, unsigned n, unsigned seed)
{
  srand (seed);
  // Insert a sentinel to simplify the logic inside the loop.
  // The sentinel never occurs in the data stream. The Rand() function ensures this.
  c.push_back (std::numeric_limits<typename ContainerType::value_type>::max());
  for (unsigned i = 0; i < n; ++i)
  {
    auto d = Rand();	// This function generates an integer in [0, 1'000'000)
    for (auto j = c.begin(), je = c.end(); j != je; ++j)
      if (*j > d)
      {
        c.insert (j, d);
        break;
      }
  }
  c.pop_back (); // Remove the sentinel
}
 
template <typename ContainerType>
void Delete (ContainerType & c, unsigned n, unsigned seed)
{
  srand (seed);
  for (unsigned i = 0; i < n; ++i)
  {
    auto d = Rand ();
    for (auto j = c.begin(), je = c.end(); j != je; ++j)
      if (*j == d)
      {
        c.erase (j);
        break;
      }
  }
  assert (c.empty());
}

The rest of the file is timing code, loops that run these functions many times for different values of “n”, and reporting code.

Here’s the full test source code: vector_vs_list_win32.cpp. Note that you must edit (a couple of lines in) this file to run with different containers and iterator debugging levels. The edits should be self-evident.
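For readers who don’t have the full file at hand, here is a minimal sketch of what such a driver could look like. It is not the code from vector_vs_list_win32.cpp: the Rand() body, the TimeRun() helper and the use of std::chrono are all assumptions made for illustration (Rand() combines two rand() calls because RAND_MAX can be as small as 32767).

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <limits>
#include <list>
#include <vector>

// Hypothetical stand-in for Rand(): an integer in [0, 1'000'000).
// Uses 15 bits from each of two rand() calls, since RAND_MAX may be only 32767.
inline int Rand ()
{
  unsigned const hi = static_cast<unsigned>(rand()) & 0x7FFFu;
  unsigned const lo = static_cast<unsigned>(rand()) & 0x7FFFu;
  return static_cast<int>(((hi << 15) | lo) % 1000000u);
}

// Same logic as the Insert() shown above: linear search, then insert in order.
template <typename ContainerType>
void Insert (ContainerType & c, unsigned n, unsigned seed)
{
  srand (seed);
  c.push_back (std::numeric_limits<typename ContainerType::value_type>::max());
  for (unsigned i = 0; i < n; ++i)
  {
    auto d = Rand();
    for (auto j = c.begin(), je = c.end(); j != je; ++j)
      if (*j > d)
      {
        c.insert (j, d);
        break;
      }
  }
  c.pop_back (); // Remove the sentinel
}

// Same logic as the Delete() shown above: replay the sequence and erase each number.
template <typename ContainerType>
void Delete (ContainerType & c, unsigned n, unsigned seed)
{
  srand (seed);
  for (unsigned i = 0; i < n; ++i)
  {
    auto d = Rand();
    for (auto j = c.begin(), je = c.end(); j != je; ++j)
      if (*j == d)
      {
        c.erase (j);
        break;
      }
  }
  assert (c.empty());
}

// Hypothetical helper: fills and then empties a fresh container of type
// ContainerType with n numbers and returns the elapsed wall-clock seconds.
template <typename ContainerType>
double TimeRun (unsigned n, unsigned seed)
{
  ContainerType c;
  auto const t0 = std::chrono::steady_clock::now();
  Insert (c, n, seed);
  Delete (c, n, seed);
  auto const t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}
```

A driver would then call something like TimeRun<std::vector<int>>(n, seed) and TimeRun<std::list<int>>(n, seed) in a loop over n and print the results.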

So without further ado, this is the result of two runs of the program, optimized in both runs, for std::vector and std::list, for lengths from 5’000 to 100’000 in increments of 5K:

std::vector vs. std::list

Figure 1. Run of optimized program (i.e. Release mode) for std::vector and std::list

The vector out-performs list by a large margin. It basically blows it out of the water. This result should not surprise any experienced programmer, but unfortunately it does surprise many of us. I want you to think about it a little bit: the test case is almost ideal for linked lists. It does only linear access (no binary searches and other shenanigans) and inserts and deletes a lot of items. Vectors are supposed to perform badly in these situations. There is a lot of pushing stuff along, one by one, and quite a bit of copying of the whole vector when the need for a re-size and reallocation arises. I could have pre-allocated the vector, but I didn’t. I could have done a binary search, but I didn’t. I could have put all the numbers in and then sorted them, but I didn’t. This is an almost pathologically bad case to use a vector.

The vector still kills the linked list in this test. And that’s all because vector stores its data in a contiguous block of memory. The vector has so much better performance (and as you increase the size of your data structure, the difference in performance becomes much more pronounced) because it accesses memory linearly.

The memory is so slow when you access it randomly, that the penalty for cache misses almost always dominates your CPU usage time. If you care at all about performance, the main memory should be considered a sequential-access device. You have been lied to all your life; RAM is not a random-access memory device!
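If you want to see this effect on your own machine, a rough sketch like the following can help. It sums the same array twice: once walking the indices in order, and once walking the same indices after shuffling them. All the names here (SumInOrder, SequentialOrder, ShuffledOrder) are made up for this illustration, and the actual timings will depend on your hardware and on how large the array is compared to your caches.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Sums data[] visiting the elements in the order given by `order`.
// Returns the sum and stores the elapsed wall-clock seconds in `seconds`.
inline long long SumInOrder (std::vector<int> const & data,
                             std::vector<std::size_t> const & order,
                             double & seconds)
{
  auto const t0 = std::chrono::steady_clock::now();
  long long sum = 0;
  for (std::size_t i : order)
    sum += data[i];
  auto const t1 = std::chrono::steady_clock::now();
  seconds = std::chrono::duration<double>(t1 - t0).count();
  return sum;
}

// Indices 0, 1, ..., n-1: a linear walk through memory.
inline std::vector<std::size_t> SequentialOrder (std::size_t n)
{
  std::vector<std::size_t> order (n);
  std::iota (order.begin(), order.end(), std::size_t (0));
  return order;
}

// The same indices in a shuffled order: a random walk over the same memory.
inline std::vector<std::size_t> ShuffledOrder (std::size_t n, unsigned seed)
{
  auto order = SequentialOrder (n);
  std::mt19937 gen (seed);
  std::shuffle (order.begin(), order.end(), gen);
  return order;
}
```

Both walks do exactly the same additions and produce the same sum; only the order in which memory is touched differs, so any difference in the measured times comes from the memory system.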

This has been true in the last few years and in the future it will only become more so, barring any true revolutions in hardware technology (think Ansible!) or a fundamental change in computing model (think abandoning the von Neumann model,) of which there is no sign yet (and it would take decades, even if we had an alternative model.) This particular performance killer will get worse and worse with time.

By the way, I actually ran the vector vs. list for much larger numbers of elements. I went up to 800’000 elements and the list took 40 times the time to do the operations (about 4 and a half hours.) I didn’t include those results because I had changed the code a little bit and didn’t want to spend another 48 hours to run the full set of tests up to a million. Besides, the extreme difference in those cases would have made the charts unusable.

A Word About Memory Allocator Pressure

In my test program, I start from small values of N and go successively upwards. On the way, the program allocates and frees a lot of memory (especially the list version.) It is theoretically possible that after those many (de)allocations, the memory is fragmented too much (shouldn’t happen, because all those allocations in the list test are of the same size) or the internal data structures of the memory allocator are in such a state that makes later allocations slower. This, should it happen, would be a problem specific to this test and wouldn’t happen if I had just run the test for a single specific (large) size.

Of course, that’s a theoretical problem. I did run the test for single specific sizes and didn’t see any meaningful differences from the numbers presented in the above chart. So, the allocator in VC++ 2012 and Win 7 at least doesn’t have these problems. (The allocator seems quite good actually. I haven’t done any real benchmarks (don’t know how!) but it looks very good.)

The Iterator Debugging Fiasco

Let me give you some background information about iterator debugging first. For years now (at least since VS2005) the Standard Template Library that is shipped with VC++ has had some debugging and security “features”. And I’m not talking about a handful of compile-time and run-time assertions here and a handful of NULL checks there.

Among other things, every instance of an iterator could (and probably did) store a pointer to its parent container (which adds 50%-100% to the size of most iterators) and had to keep this pointer current and correct, which required some bookkeeping, and did a lot of checks against that on almost all operations (e.g. to see whether a begin() and end() passed to a function belong to the same exact container or not, etc.) Let’s call this category of “features” security features.

Also, every container could store a linked list of all iterators created against itself, and had to keep this list up-to-date (oh, the horror!) so that it could (among other things) notify all its iterators that it is being destructed and they are becoming invalid. This also adds to the size of the iterators, in addition to the obvious performance penalty. Let’s call this category of “features” debugging features.

The basic causes of the problems that ensued (which I will describe in a moment) were that having these “features” enabled produced a large and certainly noticeable performance hit, and also it changed the in-memory size and layout of STL containers and iterators. The performance thing is obviously undesirable, but at least most of us are used to trading off some performance to gain some correctness or robustness or reliability or security or whatever (although in this case, it wasn’t always some performance. It’s not hard to find cases that are several hundred or thousand times slower with these “features” enabled.)

The change in size and layout that these “features” brought with them caused a horrible problem: you couldn’t link a module that had either of these sets of features enabled with one that didn’t. I mean you could, but you had to be very very very careful and redesign all your interfaces and review all your code and do some other very “trivial” things! (You couldn’t link them together because then your program would have violated the “One Definition Rule” of C++; which basically states that objects – functions, structs, classes, etc. – must have one and only one definition, which most certainly includes size and memory layout. Google “ODR” if you don’t know exactly what it means and why it is important.)

All this so far is quite inevitable from the point you put those “features” in onwards. I mean, sure, Microsoft could have introduced a special mode that would change the size and layout of data structures but not do the checks and not have most of the overhead. But hey, how can you expect them to put in a (partial) solution to a problem they have created out of the blue?!

And then they went on and added real insult to injury: they enabled those features by default (all of them in Debug mode, the “security” “features” in Release.) I don’t know whether you can imagine how horrible this was. Either you paid the performance penalty (sometimes several orders of magnitude,) or you had to make sure every piece of code you linked together had the same consistent state for these “features”. And this, unfortunately, is a lot harder than it should be, even with open-source software and libraries.

But fear not! Microsoft decided to (partly) do the right thing with VS2010 (VC++10) and disabled these “features” by default in Release mode. The full insanity is still in force in debug builds, unless you explicitly disable it.

The funny thing (funny like “you just cut my stomach open and spilled all my guts on the floor, ha ha!”) is that I have been using all these versions of Visual C++ almost exclusively over the past decade and I have been programming in C++ almost exclusively for all that time, and I have been doing only programming almost exclusively as a job and hobby and education, and (wait for it) I have never ever seen an error or exception or assertion or something that would mean that these “features” have found a semantic or logical bug in my code. NEVER. (This has happened in experimental code I have written to explore these exact “features” though. Oh, and I think once I did get a runtime error when I was demonstrating a deliberately wrong use of STL iterators to a colleague. But that’s it.)

Anyways, the way you control these “features” now is through a macro: _ITERATOR_DEBUG_LEVEL. By setting its value to zero, you disable all these “features”. If you set it to 1, you enable the security checks (internally, this sets _SECURE_SCL to 1, if you are interested,) and setting iterator debug level to 2 (which is only possible in Debug builds) enables both “security” and “debugging” “features” (that is, internally sets both _SECURE_SCL and _HAS_ITERATOR_DEBUGGING to 1.)

Note that iterator debugging features cannot be enabled in release builds (thank goodness!) only the security stuff. And you must make sure that these settings are consistent (i.e. the same) in all your translation units linked together, or be ready for a horrible and painful death.

Another redeeming step that Microsoft has taken is that its linker now actually detects whether the values of these macros differ between the object files being linked together and emits an error at link time. This is done via #pragma detect_mismatch which you can use too, but only if you really really absolutely need it. I don’t recommend its use at all. (Actually, I don’t recommend doing anything that would make you need that pragma, not the use of the pragma itself per se.)

Let me reiterate: by default, _ITERATOR_DEBUG_LEVEL is at 2 in debug builds and at 0 in release builds since VC++10. So you have fewer worries.
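As a rough illustration of the above, this is one way a translation unit could pin the level and report what it ended up with. Treat it as a sketch under assumptions: the helper names IteratorDebugLevel() and PrintIteratorInfo() and the detect_mismatch key "my_project_checked_build" are invented for this example, and in a real project the macro is usually set for the whole build (e.g. with /D_ITERATOR_DEBUG_LEVEL=0) instead of in a source file.

```cpp
// The macro must be defined before ANY standard library header is included,
// and it must have the same value in every translation unit linked together.
#ifdef _MSC_VER
#  ifndef _ITERATOR_DEBUG_LEVEL
#    define _ITERATOR_DEBUG_LEVEL 0
#  endif
#endif

#include <cstdio>
#include <list>
#include <vector>

#ifdef _MSC_VER
// Hypothetical project-specific key and value. The linker emits an error if
// two object files record different values for the same key.
#pragma detect_mismatch("my_project_checked_build", "0")
#endif

// Returns the iterator debug level in effect, or -1 when the macro is not
// defined at all (for example, when not compiling with VC++).
inline int IteratorDebugLevel ()
{
#ifdef _ITERATOR_DEBUG_LEVEL
  return _ITERATOR_DEBUG_LEVEL;
#else
  return -1;
#endif
}

// Prints the level and a few sizes, so that builds with different settings
// can be compared side by side.
inline void PrintIteratorInfo ()
{
  std::printf ("_ITERATOR_DEBUG_LEVEL             = %d\n", IteratorDebugLevel ());
  std::printf ("sizeof(std::vector<int>)           = %u\n",
               static_cast<unsigned>(sizeof (std::vector<int>)));
  std::printf ("sizeof(std::vector<int>::iterator) = %u\n",
               static_cast<unsigned>(sizeof (std::vector<int>::iterator)));
  std::printf ("sizeof(std::list<int>::iterator)   = %u\n",
               static_cast<unsigned>(sizeof (std::list<int>::iterator)));
}
```

Calling PrintIteratorInfo() from builds with different levels is a quick way to see the size changes that make mixing such builds an ODR violation.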

Debug vs. Release, Iterator Debugging Levels and Vector vs. List

I wanted to see how much of an impact building in debug mode has on performance. While I was at it, I figured I would test the effect of iterator debugging too. Of course, this particular test case might not be too representative of the extent of the impact of debug iterators. It doesn’t make a lot of them. But it’s a test case, right?!

So, I ran the above code in 10 configurations: 5 times for std::vector and 5 times for std::list. Of each of those 5, two were release builds (with iterator debugging level at 0 and 1) and three were debug builds (with iterator debugging level at 0, 1 and 2.) And here are the results:

All variations of Debug/Release and "iterator debugging" levels of std::vector and std::list

Figure 2. The slower tests are charted to 50’000 items only to make it more readable and comparable. Also, “vector (D1)” means a test run with std::vector, in Debug mode and with _ITERATOR_DEBUG_LEVEL set to 1. Others are similar.

Clearly, the results are divided into 4 or 5 groups:

Last and least are std::list(D1) which for some reason is slower than everything else, and list(D2), vector(D1) and vector(D2). These are the slowest of the bunch, and very excruciatingly so. The surprising aspect is the performance of vectors in debug mode and with _ITERATOR_DEBUG_LEVEL at more than 0. They match perfectly with performance of linked lists under the same circumstances. It might be interesting for you to know that the performance of vector(D1) is consistently 350-380 times slower than vector(R0)!

Then there are list(D0) and vector(D0) which, although they do show a visible difference in performance between themselves, run in about half the time of their (D1) and (D2) counterparts. Which is nice; showing that Microsoft took something slow (containers at D0) and made it twice as much so, by adding iterator debugging! Again, note that vector basically turns into a linked list (performance-wise) in debug mode.

Then there are release runs. Slower among them are (not surprisingly) list(R0) and list(R1) which have no significant difference in performance.

Then comes vector(R1) which consistently takes about 4.2 times that of vector(R0) to run.

And vector(R0) is obviously the fastest of them all.

The Numbers

Here are the raw timing numbers, in case you are interested:

(vector columns: std::vector<int>; list columns: std::list<int>)

| Items/1000 | vector (R0) | vector (R1) | vector (D0) | vector (D1) | vector (D2) | list (R0) | list (R1) | list (D0) | list (D1) | list (D2) |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 0.014574 | 0.063870 | 2.558822 | 5.570211 | 5.538715 | 0.045877 | 0.055235 | 2.871468 | 5.467757 | 5.474841 |
| 10 | 0.057579 | 0.254296 | 10.087894 | 22.083037 | 21.996993 | 0.212346 | 0.223635 | 11.576666 | 21.970288 | 21.819562 |
| 15 | 0.134310 | 0.584379 | 22.688879 | 49.628201 | 49.437376 | 0.671322 | 0.690941 | 26.006502 | 49.546681 | 49.321179 |
| 20 | 0.242330 | 1.028257 | 40.302217 | 88.207767 | 87.877841 | 1.621966 | 1.636993 | 46.400773 | 89.217524 | 88.499149 |
| 25 | 0.381292 | 1.609348 | 62.950749 | 137.784724 | 137.224947 | 3.099696 | 3.090824 | 72.671461 | 138.326046 | 138.027257 |
| 30 | 0.551077 | 2.318301 | 90.645978 | 198.336218 | 197.555785 | 5.034639 | 5.043223 | 104.800285 | 204.790316 | 199.959935 |
| 35 | 0.751020 | 3.157186 | 123.398738 | 269.935877 | 268.897665 | 7.447922 | 7.482581 | 143.059471 | 295.871096 | 271.500342 |
| 40 | 0.981919 | 4.124002 | 161.125857 | 352.516223 | 356.929195 | 10.363181 | 10.404890 | 187.012014 | 388.014482 | 357.468342 |
| 45 | 1.245177 | 5.220314 | 203.909820 | 446.220991 | 449.804260 | 13.722670 | 13.774679 | 236.784123 | 477.310521 | 454.270915 |
| 50 | 1.535809 | 6.444861 | 251.741221 | 550.736461 | 549.767501 | 17.565583 | 17.611244 | 300.148923 | 579.745649 | 552.227950 |
| 55 | 1.860030 | 7.801910 | 304.488853 | 666.369270 | 664.563089 | 21.896122 | 21.939313 | 355.486155 | 680.804643 | 664.930526 |
| 60 | 2.213955 | 9.282642 | 362.487499 | 793.449197 | 791.424358 | 26.732949 | 26.802989 | 433.181225 | 818.329155 | 791.776140 |
| 65 | 2.598303 | 10.894812 | | | 929.046145 | 32.013240 | 32.094197 | 510.505818 | 957.129359 | 930.691816 |
| 70 | 3.016086 | 12.637096 | | | | 37.816432 | 37.757181 | 595.500633 | 1110.138180 | 1079.866346 |
| 75 | 3.466504 | 14.510042 | | | | 44.219127 | 44.011504 | 690.915856 | 1304.677958 | 1241.188626 |
| 80 | 3.947258 | 16.532691 | | | | 50.760717 | 50.753722 | 787.618202 | 1474.479303 | 1413.809649 |
| 85 | 4.461017 | 18.650448 | | | | 58.458141 | 58.003727 | | | 1590.378265 |
| 90 | 5.007600 | 20.911465 | | | | 67.005726 | 65.900983 | | | 1784.907041 |
| 95 | 5.586207 | 23.301805 | | | | 74.255781 | 74.111179 | | | 1990.003730 |
| 100 | 6.192424 | 25.824041 | | | | 83.084121 | 83.078030 | | | 2208.519321 |
Table 1. The raw data that the above charts are generated from.
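The “times slower” figures quoted earlier are simple ratios of these timings. As a small worked example, the sketch below takes four values from the 50’000-item row of Table 1 and divides them. The Slowdown() helper and the constant names are invented for this illustration.

```cpp
#include <cassert>

// How many times slower `slow` is than `fast` (both timings in the same unit).
inline double Slowdown (double slow, double fast)
{
  assert (fast > 0.0);
  return slow / fast;
}

// Timings copied from the 50'000-item row of Table 1.
double const kVectorR0 = 1.535809;
double const kVectorR1 = 6.444861;
double const kVectorD1 = 550.736461;
double const kListR0   = 17.565583;
```

With these numbers, Slowdown (kVectorD1, kVectorR0) comes out at roughly 359, Slowdown (kVectorR1, kVectorR0) at about 4.2, and Slowdown (kListR0, kVectorR0) at about 11.4.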

Conclusions

  1. It is probably always worth it to linearize your data and flatten your data structures. This, instead of hierarchical data structures and pointer spaghetti, should be your default choice of data structures. Again: unless you have proven otherwise, a flat vector of data (not pointers) should be your default data structure for everything!
  2. Think of vector vs. linked list this way (I’m paraphrasing Stroustrup here): A linked list is a sequential data structure, but when you walk it sequentially, you are randomly jumping around in memory. A vector is a random-access data structure, but when you walk it sequentially, you are also accessing memory linearly and very nicely.
  3. Look at Figure 2, at the slowest performers. Two vector runs are in there as well as two list runs, and they have almost exactly the same performance. It shows that it doesn’t take much to ruin a great thing: just some unconscious and uninformed decisions.
  4. I could have made a functionally equivalent version of the test for vector that would be much faster than this; by doing a binary search and by preallocating the vector and by employing other sort strategies (instead of essentially doing the worst things possible short of shuffling it randomly until it is sorted!) And I could have done all this without much more code, maybe 20-50 more lines. None of these optimizations are possible with a list. You have to go for other, far more complicated data structures to accomplish some of these (skip lists, balanced BSTs, etc.) I might do the optimizations one day to see how a faster version of vector compares to std::set for example.
  5. Building in debug mode and using iterator debugging features of VC++ very effectively turns your vector into a linked list!
  6. Both levels of iterator debugging are toxic to STL performance. It’s not 10% or 15%. It’s several hundred times! By the way, as I mentioned before, our particular use of iterators and containers doesn’t trigger the horrors of full iterator debugging. It’s a whole other level of hell by itself.
  7. Unfortunately, but somewhat unsurprisingly, some programmers “know” without reason or equivocation that STL containers are slow. That is absolutely not true. You need to know what’s going on under the hood, and make a case-by-case decision, or at least time your fucking code! Do not automatically assume anything, especially about performance.
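To make conclusion 4 a bit more concrete, here is a minimal sketch of what a faster, functionally equivalent vector version might look like, using a binary search and preallocation. I have not measured this version; it is only an illustration. InsertSorted() and DeleteSorted() are names invented here, and Rand() is a hypothetical stand-in for the function used in the test (any function returning an integer in [0, 1'000'000) would do).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for Rand(): an integer in [0, 1'000'000).
// Uses 15 bits from each of two rand() calls, since RAND_MAX may be only 32767.
inline int Rand ()
{
  unsigned const hi = static_cast<unsigned>(rand()) & 0x7FFFu;
  unsigned const lo = static_cast<unsigned>(rand()) & 0x7FFFu;
  return static_cast<int>(((hi << 15) | lo) % 1000000u);
}

// Inserts n numbers in sorted order, finding each position by binary search.
inline void InsertSorted (std::vector<int> & c, unsigned n, unsigned seed)
{
  srand (seed);
  c.reserve (c.size () + n); // Preallocate: no reallocation during the loop
  for (unsigned i = 0; i < n; ++i)
  {
    int const d = Rand ();
    // upper_bound finds the first element greater than d, which is the same
    // position the linear scan in Insert() stops at. No sentinel is needed.
    c.insert (std::upper_bound (c.begin (), c.end (), d), d);
  }
}

// Replays the same sequence and erases each number, again by binary search.
// Expects c to contain exactly what InsertSorted() put into an empty vector.
inline void DeleteSorted (std::vector<int> & c, unsigned n, unsigned seed)
{
  srand (seed);
  for (unsigned i = 0; i < n; ++i)
  {
    int const d = Rand ();
    // lower_bound finds the first element not less than d, i.e. the first
    // occurrence of d if it is present.
    auto const pos = std::lower_bound (c.begin (), c.end (), d);
    if (pos != c.end () && *pos == d)
      c.erase (pos);
  }
  assert (c.empty ());
}
```

The element shuffling on every insert and erase is still there; only the search and the reallocations are gone. std::list cannot use the same trick, because its iterators are not random-access and a binary search over it still has to walk the nodes one by one.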


Unstupendous Next-gen Consoles and the Revenge of PC?

The Rumors

The hardware specs for the next generation of gaming consoles, namely Orbis (which is a codename for the next PlayStation) and Durango (for the next Xbox) are just rumors and speculations at the moment. Here’s what’s rumored: Orbis is going to have an AMD x86 CPU, plus an AMD “Southern Islands” (i.e. Radeon HD 7000 series) GPU. Durango is rumored to have a 16-core IBM POWER family CPU with (again) an AMD Southern Islands GPU. I don’t know how much RAM either of these systems might have, and I’ll be happy to know of any rumors that might be flying around if anyone knows of any.

Hardware specifications for Nintendo’s Wii U, while not completely known, are at least official. They also list a “multi-core” IBM POWER-based CPU and an AMD GPU.

Change from Current Generation

For the next generation, Microsoft has not changed manufacturers of its two main parts. Xbox 360 also uses an IBM CPU and an AMD (then ATI) GPU. But Sony has turned its back on its previous suppliers (including itself.) Instead of the CELL Broadband Engine jointly developed by IBM, Sony and Toshiba, with an Nvidia designed and manufactured GPU, Sony has given the heart and soul of its next flagship console to AMD to play with. For completeness’s sake, the current generation high-end consoles both have 512MiB of RAM, although they use different configurations and specs.

AMD GPUs FTW!

If the rumors turn out to be close to reality, AMD will turn out to be the most important and unrivaled-in-that-importance gaming hardware manufacturer for the coming generation. The obvious first benefit is the revenue they will be harvesting from all those Microsoft, Nintendo and Sony consoles sold. Besides that, they will be in a unique position to drive the next generation of graphics. Also, since all these platforms will have AMD GPUs, PC developers (most of whom have an eye towards console targets) might also become inclined to use their hardware for development, testing and optimization.

The Steam hardware survey shows AMD’s GPU adoption at 35%, versus Nvidia’s current lead at 47%. The above, added to my own personal opinion that AMD’s GPUs have better performance and features at the same price point as Nvidia’s, might finally let AMD displace Nvidia as the top gaming GPU seller for the desktop. I hope neither captures too big a lead though, as a monopoly will certainly mean trouble for the users, and their being ignored.

An Integrated Dream for PlayStation 4

If both CPU and GPU of Orbis are indeed made by AMD, there is a chance for that platform to offer new levels of integration and cooperation between these two most important components of that system. Of course, all console platforms (except maybe a few, by which I mean the original Xbox!) have been much more integrated and accessible than PCs (where you have 16 layers of APIs and operating systems and drivers and crap between you and hardware,) but still there is more opportunity for integration and low-level cooperation when a high-end console’s CPU and GPU are designed and manufactured by the same company.

Since AMD’s general direction in the past years has been integrating CPUs and GPUs (albeit at the low end,) I can only wonder what Orbis will actually be like and capable of.

Generational Gap with PCs

When the last generation of high-end consoles was released (in 2005 and 2006) they both boasted around 10 times the compute power of typical high-end PCs of their day. This time around though, even if these two new consoles were released today, they wouldn’t be more powerful than the enthusiast PCs that anybody with a few thousand bucks can buy. Much cheaper at the same performance, sure, but nothing to be excited about. (Note that some of the 16 cores in Durango, for example, are going to be dedicated to running its version of Kinect, which brings the number of available cores down to 12 or 10 or 8, which is not that remarkable today.)

And they are not going to be released today. The consensus in the industry seems to be that they are going to be launched at the end of 2013 at the earliest, which is still 15 months off. It seems obvious that by then, even mid-high-end PCs will be comparable to them in theoretical power. So, the generation gap between consoles and PCs will be a thing of the past; and that means PCs will have much more juice in them at least until the end of this decade.

Cost-mitigating Hardware

So, why this unstupendous hardware? Simple: to control costs. Both the costs of the hardware at launch (and the couple of years after that) and much more importantly, the cost of developing games for these platforms. There is a very insightful and very informative article called the Rise of Costs, the Fall of Gaming about this subject, which I recommend for anyone even the least bit interested in video games.

It seems clear that most AAA games need to take advantage of the power of the platforms they run on, or they will be considered inferior-looking and last-gen by many if not most gamers and game media. This mostly means higher-quality and more detailed textures, models, animations, sounds, etc. and more content in all areas. And this almost always directly translates to higher production budgets for those AAA games. Of course, other grades of games have to follow suit too.

Therefore, by introducing an overpowered platform, console manufacturers automatically increase the costs of making games for those platforms, and this is not something the content-makers and their consumers (who ultimately will be stuck with the costs) want.

It makes a lot of sense to me for the next generation of consoles to be more powerful than the current generation by only a manageable margin. This lets the developers keep most of their current tools and technologies and processes in place (all of which have cost a lot of money, experience and time to create) and also helps mitigate the rampant race in useless graphical quality. And anything that occurs to me has obviously occurred years ago to the experts in the field.

Revenge of the PC

Right now, at the end of a console generation cycle, an average gaming PC has more power than the consoles. And the games with the best visuals that are unencumbered by platform politics all have their best presentations by far on the PC (e.g. Battlefield 3, Crysis 2, Rage, etc.)

With the release of the new generation, for probably the first time in the past decades, this pattern will continue and PCs will still hold the cutting edge. However, the general gamer public will always view consoles as the standard for visuals and presentation quality. This puts the PC in a unique position of being more powerful and at the same time not having to have much better looks.

The new generation of consoles will also be more like PCs, hardware-wise. Certainly much more similar than say, the CELL was!

Developing anything for PCs is already easier (more tools, a more accessible platform and information, developers more familiar with PCs.) On top of that, the hardware in PCs will be more like that available in consoles (which mitigates some of the cost and hardship of porting code and content,) and PCs will have more power in them. So the cost of developing and porting games to the PC will probably be much lower compared to consoles (certainly with a wider gap than over the lifetime of the current console generation.)

This, in all probability, won’t be able to dethrone consoles as the primary gaming platform, but may speed up the revival of PC as a target for more AAA games and more respect and care from the game developers. I myself certainly hope so!

A New “C++11: A Change in Style” Workshop

I’m going to present this subject again (as promised) at the Iranian Game Development Institute, with new and expanded content and in a more interactive form. The workshop will be held on Thursday, August 2nd, 2012 (12th of Mordad(?), 1391) from 2:00pm for 4 hours.

For registration and more info, see here.

Update (2012-08-03): Here are the slides for the talk: PDF and PPTX. (Ignore the 10 or so slides in the back after the “Any Questions?” slide.)

C++11 – A Change in Style

That was the title of my Tehran Game Expo 2012 presentation that I gave on Friday, June 29th, 2012. Here are the files (pptx (147KiB) and pdf (1.8MiB)) for the impatient.

I doubt if the slides are of any use by themselves (even with me talking, they won’t be of much use!) But there they are anyways. You should note that I had planned many more slides about variadic templates, threads and atomics, general design and the re-emergence of C++.

However, halfway through making them, I realized that I would never get to even mention all I wanted in an hour, so I stopped. But I couldn’t bring myself to trim the slides I had already made. If you take a look at the presentation slides, keep in mind that I skipped everything from std::move to the end completely, except for the sample about std::async and std::future (which I lifted from STL’s GoingNative 2012 talk, and was too cool to pass over.)

By the way, the presentations were being filmed, and if I receive the film and I can get permission to post it myself and I like my own presentation, I might post it here.

No More Singletons Ever

Four score and seven years ago, or maybe four and a half years ago, I wrote about my problems with unguaranteed order of initialization of static member data, across classes.

Obviously this kind of problem comes up a lot in any mid-sized to large project, especially if you are using singletons. The solution that seems to be working for me is to forgo the use of normal singletons altogether. I have used singletons for their convenience, for their ability to solve a certain class of lifetime and object initialization order issues, and mostly as a means for service location.

The first and second reasons are basically moot. Any gains in convenience and lifetime management will be lost when your project reaches middle sizes (5 digits or low 6.) And there are many other ways to implement the service locator pattern without use of singletons.
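
As an illustration of service location without singletons, here is a minimal sketch. It is only one possible approach, and all the names in it (`Logger`, `Renderer`, `Services`, `run_frames`) are hypothetical, made up for this example. The idea is that a plain context object owns the services and is passed explicitly to the code that needs them, so the order of construction is simply the order of the member declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical service: keeps its log lines in ordinary per-instance storage.
class Logger {
public:
    void log(const std::string& line) { lines_.push_back(line); }
    std::size_t count() const { return lines_.size(); }
private:
    std::vector<std::string> lines_;
};

// Hypothetical service that depends on a Logger handed to it explicitly.
class Renderer {
public:
    explicit Renderer(Logger& logger) : logger_(logger) {}
    void draw_frame() { logger_.log("frame drawn"); }
private:
    Logger& logger_;
};

// The "locator": members are constructed in declaration order (logger first,
// then renderer) and destroyed in reverse, so no static initialization order
// is involved at all.
struct Services {
    Services() : logger(), renderer(logger) {}
    // Copying would leave the copy's renderer bound to the original's logger.
    Services(const Services&) = delete;
    Services& operator=(const Services&) = delete;

    Logger logger;
    Renderer renderer;
};

// Code that needs a service receives the context instead of reaching for a global.
std::size_t run_frames(Services& services, int frames) {
    assert(frames >= 0);
    for (int i = 0; i < frames; ++i) {
        services.renderer.draw_frame();
    }
    return services.logger.count();
}
```

Since `Services` is an ordinary type, nothing stops you from creating one for the game and another for, say, an editor window, each with its own logger and renderer.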

You see, the problem with traditional ways of implementing singletons – when I say traditional, I mean ways that require global or static data, and strictly allow for exactly one object of one type – is exactly that they allow for one and only one instance of a certain type.

I can give you examples of when you need more than one instance of a singleton (e.g. more than one RenderSystem or LogManager) but I won’t. It probably won’t make sense until you’ve really felt the need. Instead I’m going to give you another kind of reason for abandoning traditional singletons.

Any tool, any abstraction, any language, anything like those, needs to at least do one thing to be generally usable and beneficial: it needs to make common tasks easy and uncommon tasks possible. Although singletons make dealing with a certain class of common problems easy, they also make a (not entirely) uncommon task completely impossible. If your singleton class puts real data, per-instance data, in static variables, that makes creating two instances of that class impossible.
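
To make the last point concrete, here is a small sketch. Both classes and the helper `keeps_separate_state` are hypothetical, invented for this illustration. `StaticLogManager` keeps its data in a static member, so even if you manage to create two objects of it they are really the same log. `LogManager` keeps the same data per instance.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class in the traditional-singleton style: what should be
// per-instance data is stored in a static member.
class StaticLogManager {
public:
    void log(const std::string& line) { lines_.push_back(line); }
    std::size_t count() const { return lines_.size(); }
private:
    static std::vector<std::string> lines_;
};
std::vector<std::string> StaticLogManager::lines_;

// The same interface with ordinary, non-static data.
class LogManager {
public:
    void log(const std::string& line) { lines_.push_back(line); }
    std::size_t count() const { return lines_.size(); }
private:
    std::vector<std::string> lines_;
};

// Returns true if logging to one object leaves a second object untouched.
template <typename Log>
bool keeps_separate_state() {
    Log first;
    Log second;
    const std::size_t before = second.count();
    first.log("meant only for the first log");
    return second.count() == before;
}
```

With these definitions, `keeps_separate_state<StaticLogManager>()` is false and `keeps_separate_state<LogManager>()` is true: the static version can never give you two independent logs, no matter how many objects you construct.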

In the future, I might post the technique I’m currently using and happy with here. Who knows?! It’s by no means revolutionary or spectacular, and I might not stay happy with it, though.

Another Birthday Present?!!!

Holy shit! Holy shit! It’s a Radeon HD 6990 (dual GPU)!!! Holy fucking tera FLOPS!

Thanks, Siamac!

Yes, an AMD Radeon HD 6990x2 graphics card!

AMD Radeon HD 6990x2

Goodbye, dmr…

#include <stdio.h>

int main ()
{
    printf ("goodbye, dad.\n");
    return 0;
}

New Cellphone! Yay!!!

Just got a coooooooooool new cellphone as an (early) birthday gift. It’s a Samsung I9100 Galaxy S II. I’ll write more about it later, but let me tell you now that it’s a monster!

Thank you, sweetheart.
