“Newer software does try to be sexier by doing flashy things such as the talking paperclip. This causes a tendency for the software to bloat to the point of saturation, i.e. until it performs at barely acceptable speeds. But application programming seems to get worse with faster processors. Even with more memory, faster disks and multiple CPUs, your average web application runs slower. Is it possible that the faster hardware has made us worse programmers?”
Why not use better hardware to simplify software? Complexity’s hell. I had a boss who seriously considered running 16-bit Intel code on a 386 for the code-size advantage. Who wants to work with segment registers… yuck.
I don’t think it’s as simple as the guy is making it out to be.
Also, it seems like he’s making out that programmers not worrying about the details of how stuff is done (such as how a value gets set to 1) is a bad thing. To the contrary, it’s a good thing. The less a programmer has to worry about the low level details of an application, the better, even if it’s at a cost of performance and/or memory.
Why? Because the more a programmer has to worry about low level details, the more likely they are to screw something up. Something bad too. Code should be more abstract, even if it means increasing the overhead, performance- and memory-wise. In the end, it will produce more robust and trustworthy code.
An example of this would be using std::string or something similar in C++ instead of char* and allocating the strings yourself. There is some overhead, but you no longer have to worry about buffer overflows (unless you really screw up) or memory leaks.
As for the other reasons, you can pin it mainly on software becoming increasingly complicated. Also, people tend to run more applications at once now, because they can.
But yes, programmers have also become lazier, because they can get away with it.
I do not think that programmers should generally be encouraged not to worry about the details. You should really try to be as abstract as possible, but do not forget that you are not living in a perfect world: all the statements you have written will eventually translate to low-level assembly (which in turn goes down to binary, as we all know). So what you should really know is how much things cost: how much does it cost to allocate 100 objects, how much does it cost to do things asynchronously, how much does it cost to call this method or the other one. If you don’t really think and know about this, then I call that carelessness.
But it again boils down to “the right tool for the right job”. It’s not that you cannot write system software in Java or that you cannot write object-modelled game frameworks in C; it just is not the best thing to do. So I just want to say to all programmers that you should really know what you are doing and know the details. Don’t use a garbage collector as an excuse for your code being slow, etc.
Regards
Right, but if you need to optimize, you should go back and do it later, and not worry about it while writing the code.
Definitely. But I think you should be able to make a few guesses before you start profiling at where problems will arise. But early optimization is definitely the devil.
I’d bet that in many software houses there’s no time left to optimize before it’s shipped.
This industry is still evolving at an ungodly pace. It’s hard to perfect anything, because everything is changing.
Over time, the tools will improve, the processes will be improved and tweaked. Certain processes will be proven to be more efficient than others and more widely adopted, and we’ll start to see efficiency in programming improve again. Not because of the programmers so much, but partly because the damage inefficient code can do will be better limited and contained.
I have a feeling we’re still a ways off though.
What I think would be interesting is seeing more and more of the stuff software does now moved into the hardware. Imagine if a garbage-collected, safe, secure language (such as, say, Java or a .NET language) were implemented almost purely at the hardware level?
edit: Not the libraries included, of course, just the core parts of the language(s).
Edited 2006-03-22 22:28
deleted
Edited 2006-03-22 23:56
Um.. what? I looked around and didn’t find any information on OCaml being handled on a hardware level.
Sorry, I misinterpreted your post. I thought by “hardware level” you meant “not in a VM”.
I remember reading about garbage collector hardware once.
Edit: For example:
http://www.google.co.uk/search?q=%22A+performance+analysis+of+t…
Edited 2006-03-23 00:06
I agree.
Why? Because the more a programmer has to worry about low level details, the more likely they are to screw something up. Something bad too.
C strings are a prime example. I wonder how many millions (billions?) of dollars have been lost due to the buffer overflow vulnerability. Everything from large systems to business workstations to home computers has been affected by this vulnerability, whether by a virus or by direct hacking. The downtime is expensive.
The designers of the C language designed C strings the way they did for a very specific reason: they are fast. C was intended to be a systems programming language (portable assembly); as such it needed to be as efficient as possible (computers at the time were slow), even at the risk of being unsafe. The designers of the C language were not ignorant people living in an early and ignorant time. C was invented in the early 1970s; by that time garbage collection was over 10 years old. They knew exactly the tradeoffs they were making, but again, C was for systems programming; safety wasn’t as important as speed. Now we live in a different time. Safety is way more important than speed.
C strings are a prime example. I wonder how many millions (billions?) of dollars have been lost due to the buffer overflow vulnerability. Everything from large systems to business workstations to home computers has been affected by this vulnerability, whether by a virus or by direct hacking. The downtime is expensive.
I agree with the need for security too. I am TAing an embedded systems course at a local university. I can’t believe the code the students write. Given that most of them will not become embedded systems coders, making coding more abstract makes sense.
It’s better to have the really talented people work on optimizing the compilers rather than making many not-so-talented people write better code.
There’s definitely a problem with that mentality. I agree that really talented people should be doing things to make life easier for those of us who aren’t so skilled. However, I think that’s definitely already happening!
The trouble is this. Incoming freshmen at my school, ISU, take one year of “hardcore programming” classes right off (a good thing). They’re in Java.
Right off the bat they’re working with immutable strings. They’re never going to appreciate them if they don’t play with C strings for several hundred hours.
Before that (last year) they came in with C++. Similar problem. While they now have mutable strings with no bounds checking, they still have a correct length function to use. And they have other nice little helpers that make strncat sound scary.
Realistically, students should first be exposed to the basic logic structures they’re going to need. It doesn’t matter what language that happens in. Then you expose them to theory classes which tell them only asymptotic problems are problems (never mind the user wondering why it takes half a second to click a button). What they’re missing is a nice class to expose them to C programming. Let them get in trouble with corrupted heaps, corrupted stacks, string manipulation as if it were a statically sized array, etc.
TMK, the university has one c programming class and it’s under the college of engineering, computer engineering. I think it has a focus on embedded programming as well.
As for asm, well, that’s one section of one semester on MIPS. I’m not convinced asm should be required though.
After spending a lot of time with C, when I program in a high-level language these days I notice when I do something that should be costly, even if it’s one line. Because I know how much work it’d take to do it in C.
Anyway, a lot of the slowdown is added functionality, I think. Comparing Lotus 123 and Excel, ouch. Besides, Excel is a snappy program!
I know I truly enjoy having a GC when I’m using a language with one. Realistically we have bigger problems to solve these days, and we have the same programmers. We need better tools, better abstractions. And hence, people need to buy better computers.
I still program with a 700 Celeron in mind. No matter what language. Unless the project were to dictate more (a 3d game).
I basically agree with your point, particularly regarding safety and speed. At the desktop/workstation level we live in a world of GUIs, and things like strings should be made into “failsafe” objects, not to mention all the graphic bits.
However, it’s still exciting to see people who can work “magic” with low level code, and know how to better optimize between safety and speed. It’s not clear that you can always build the optimization into the compiler.
The designers of the C language designed C strings the way they did for a very specific reason: they are fast.
What is a C string? The C standard has no concept of strings; it is not a built-in data type, it is just an array of data (which some software interprets as an array of ASCII characters).
Software can organise data in any format it wishes. There are character-handling libraries which operate on structures where a count is also included with the buffer. Software engineers can use whichever format suits them best.
Just because some engineers are lazy and do no bounds checking, doesn’t mean that there is a fault with the language.
I think he’s referring to null-terminated strings, which are sometimes called “C strings” as well. While it may not be technically correct, it doesn’t really matter.
The designers of the C language designed C strings the way they did for a very specific reason: they are fast.
Actually, the C language’s strings are not fast; in fact it’s quite a suboptimal format. They went with it because null-terminated strings had some hardware support on the PDP-11 (or some other PDP machine).
Using a marker at the end makes finding the length of a string an O(n) operation. This then has knock-on effects on a lot of other functions, like strncat(), which require the length of a string. This performance hit is the reason so many programmers went with things like strcpy() even when strncpy() was available.
The better approach, used by almost all other languages, is to store the length of the string separately and update it accordingly in each string operation. In Pascal, the first byte of the character array holds the length of the string, and all string operations update that value. It sounds complex, but it is actually far more performant. In C you often see something like this:
#include <stdlib.h>   /* for malloc() */

typedef char* string_t;
typedef unsigned char uint8_t;

string_t string_create (uint8_t len)
{
    /* the byte in front of the returned pointer holds the length */
    char * result = (char *) malloc ((len + 1) * sizeof (char));
    result[0] = (char) len;
    ++result;
    return (string_t) result;
}

int string_len (string_t str)
{
    char *data = (char *) str;
    --data;
    return (uint8_t) data[0];
    /* Note this would realistically be written as
       return (uint8_t) *(((char *) str) - 1);
    */
}
Microsoft’s old BSTR type was implemented this way. By having the data before the declared start of the pointer, things like printf() still work, while the internal methods can easily and efficiently return the length of the strings for checks in other functions.
Edited 2006-03-23 12:35
What I hate about strings with lengths is you have to worry they won’t overflow. I guess you could use a 32-bit length and be pretty safe. In my operating system, I read text files into strings, and I imagine you’d pick a length too small for a decent-sized text file. Some files (the Gutenberg Bible) are 3 meg. I prefer uniformity and simplicity.
It’s true some operations are more efficient, but not all. A copy on some architectures is faster with a zero terminator: you loop until the zero flag is set.
The less a programmer has to worry about the low level details of an application, the better, even if it’s at a cost of performance and/or memory.
In a perfect world, yes this would be how all programmers work. But it’s not a perfect world, and some of us are limited by the SDK and/or available code libraries. If you’ve ever developed for Palm OS you’d understand. Low level memory management is a requirement for Palm OS app development, because that’s how their API works. On top of that, older versions of Palm OS limited apps to 96KB of heap space. The icing on the cake is “how do we make the most of this 16 MHz chip?”
I’ve seen my clients flat out stop using certain handheld programs because they were “too slow.”
Coding on such devices is a whole different ballgame.
As the level of abstraction rises, the level of system resource abuse rises, and software becomes slower.
It’s been known for quite a while.
Your entire notion of “Code should be more abstract, even if it means increasing the overhead, performance- and memory-wise. In the end, it will produce more robust and trustworthy code” is funny, since most applications show the opposite to be true.
The higher the abstraction, the lower the code quality.
“The higher the abstraction, the lower the code quality.”
I should have been more clear. I don’t mean in regard to everything. But things like using std::string, std::vector, etc, in C++ are examples of the kind of things that should be abstracted.
Okay. I can clearly agree with you in this situation.
Balance is the key. See my operating system at http:/www.justrighteous.org
As a programmer of some 34 years (yep, I’m a dinosaur) I have to state that there is bloat in software and it’s getting worse.
For example:
1975: a Fortran program plus various assembler modules ran the avionics for a Harrier aircraft in real time. This drove all the real aircraft avionics & weapons systems. It ran on a PDP-11/45 in less than 56KB of code!
Please try that today.
This drove an attached processor (11/05) via a 4KB memory window. That system supplied the graphics to a large CRT placed in front of the cockpit.
I could even mention a Boeing 707 flight simulator powered by a Honeywell DDP124. The application was loaded using punched cards (circa 1970).
Then there was the team of 10 software developers all coding in Basic & C, working on a VAX 11/780 with 1.5MB of RAM!
Now I write Java & .NET apps that have so much bloat it’s unbelievable, due to the way it’s all packaged.
All these wrappers, JVMs, etc. are all very well in principle, but they vastly reduce the amount of finite resources left for the actual code that you are trying to run.
Even PDAs run vastly bloated software.
IMHO, the only arena where bloat is considered evil is in the embedded market, but 99.99% of programmers don’t even understand what proper real-time programming is.
That’s my 0.02 Tenge’s worth.
Ah, the good old days!
IMHO, the only arena where bloat is considered evil is in the embedded market…
Well you are right about embedded. I am very pleased to be using gcc to program an Atmel AVR microcontroller with 8K FLASH and 1K RAM.
To further your point, I was once criticized for using C instead of assembler to program a PIC microcontroller because C bloats by comparison.
Was this a shell code writer who was accusing you?
Was this a shell code writer who was accusing you?
No. It was another engineer who is proud of his hard-to-read assembly code.
Consider: Lotus 123 had strict rules about case and white-space. Excel is case and white-space insensitive, which is an O(n) cleanup job. Lotus 123 only supported ASCII. Excel supports Unicode. Excel has multiple levels of undo, prompting in formulas, colour coding of values, and a lot of other things that make life easier. Most of this isn’t bloat or eye-candy in the traditional sense, it’s worthwhile, and in some cases necessary, improvements.[1]
Also it’s important to realise, as has already been said, that high-level languages (HLLs) encapsulate best practices. When everyone was writing assembler, a lot of people probably wrote mov ax, 1. When everyone started using HLLs, the intelligence was built into the compiler, which knows all about tricks like the xor one.
In fact, with modern CPUs being hyper-threaded, pipelined and superscalar, and having SIMD units built in, it would take a prohibitive amount of time for an assembly programmer to match what a compiler can do in a few minutes.
Arguably some HLLs feature poor decisions: for example, it will only be with Java 6 that objects will be allocated on the stack where possible (using IBM’s escape analysis technique). However, in the general case they allow programmers to write more powerful, featureful programs in drastically less time than before. Indeed, they let programmers attempt projects that would have been nigh-on impossible before.
That said, I would always advise teaching students a low-ish level language like C, so they are aware of what goes on under the hood in more modern languages. For the same reason it’s a good idea to teach them the basics of CPU design with some CPU emulators on which they can experiment with basic assembly. However there are very few genuine areas in the modern world where you need, or even should, code to the metal in assembly, or even C.
————
[1] That said, the example of Gnumeric shows that Excel need not be as large as it is. Ironically, this is probably because Gnumeric is built on a much more abstracted API layer (GLib, GTK) than Excel (Win32).
Edited 2006-03-22 18:44
However there are very few genuine areas in the modern world where you need, or even should, code to the metal in assembly, or even C.
One such area is my domain–small embedded microcontrollers.
*insert horrible mental image*
I left-off the other slash in http://www.justrighteous.org. I’m a dinosaur too.
I did. I had to copy/paste it.
Actually looks like a really neat project. You rock!
Looks nice, indeed…
I did not understand what the rambling about god was about though…
Maybe you should keep it separate from your programming notes…
It’s gone, but not all. Thanks.
How about an article on what people would want if there were a fresh slate?
Why did he get modded down?
As a man dressed as a lady with vikings in a restaurant once said: “I don’t like SPAM!”
ASCII only? Yes, it is a lot smaller than Unicode… but at a great loss of functionality. It makes computers difficult for any non-English speaker to use and hell for anyone using a non-Latin alphabet. So ASCII is not more efficient… it’s just more useless and just happens to be smaller as a side effect.
I run a space to tab utility on my code and everything, so I don’t double-space sentence starts or they get messed-up. I think I’m going to add an ASCII code for spaces not to touch. I love starting with a fresh slate.
In a company you always try to make project times as short as possible and make the product as cheap as possible and easy to maintain; this means as little time as possible is set aside for speeding up the code, which means using a lot of tools that generate bloated code.
Many companies which sponsor software development have interests that are contrary to better software development. For example, why do companies like Sun, IBM and Microsoft create software development tools like Java and .NET but do not provide linkers which could free the end developers from having to deploy huge runtimes separately? They prefer to waste money on JIT rather than on linkers? On the other hand, many of these abstractions do make sense, like creating code that’s cross-platform. Then again, they still need to depend on platform-specific code to run their code. Thus, change everything to stay the same?
Some people mention the Objective-C and D programming languages as good alternatives. But Java and .NET are the de facto standards nowadays. What gives? When did it go wrong?
After several rounds of job postings and interviews, I have to say it certainly does feel like the quality of programmers has seriously declined; there should be an apprenticeship program.
Yes, there are a lot of crappy programmers today, but maybe this is so because the industry wants it this way. Back in the day when people were moving away from assembly a similar thing occurred. The demand for great programmers goes down and the demand for cheaper and less skilled labour goes up. Only this will come back to bite us in the butt, I’m afraid.
Word
Well, one thing forgotten is that by using those bloated frameworks/platforms, productivity in fact often goes DOWN and maintenance costs go UP, as more complicated frameworks inevitably result in more interface contracts to deal with, and therefore in many more opportunities for bugs.
So in fact, it is even worse – you not only get slower programs, but also more bugs…
One of the things that seems to be very influential on the memory hogging that is introduced by new programs is the *bad* way that memory gets allocated, and the wasteful way that memory is used.
For example, it seems to me that anyone who writes a Java app manages to do this in such a p*ss-poor way that the garbage collector can’t reclaim memory as it should (due to references being kept to objects that are never used again).
One of the things that seems to be very influential on the memory hogging that is introduced by new programs is the *bad* way that memory gets allocated, and the wasteful way that memory is used.
Indeed. I see a lot of heap allocation used when static or stack allocation would suffice. I’ve seen many Java classes where fields are declared and objects allocated for every instance, when only one instance is ever needed (they should be declared static). Also, I’ve seen many classes where fields are declared which should be locals. It all adds up. (The code I’m talking about is by fellow students, so I don’t know how widespread such amateurish practices are.) Got any more examples yourself? I’d like to catalogue these “memory abuses” for fun & profit 😉
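In code form, the two abuses I mean look something like this (a tiny hypothetical sketch, class and names invented):

class ReportPrinter {
    // Abuse 1: every instance carries its own copy of a read-only table,
    // when one shared (static final) copy would serve all instances.
    private final String[] columnNames = { "Qty", "Item", "Price" };

    // Abuse 2: a field that is only ever used inside one method; it keeps
    // its last value reachable for the whole lifetime of the object.
    private StringBuilder line;

    String format(int qty, String item, double price) {
        line = new StringBuilder();   // should simply be a local variable
        line.append(qty).append(' ').append(item).append(' ').append(price);
        return line.toString();
    }
}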
I have written an allocator with essentially zero overhead per object (they are packed together like an array) with only a small overhead per page (20 bytes, I think). Of course this requires that you have separate pages for different object sizes, but it’s a nice trade-off and works well.
Think about the overhead you pay with java.lang.Object for small objects. If you have, say, an object with two int fields, 8 bytes, and the overhead was 8 bytes (hypothetical, I don’t know), you’ve just doubled your memory usage! People don’t consider these things.
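A rough sketch of that doubling effect; the per-object header size varies by VM, so treat the overhead as an assumption rather than a constant:

final class Pair { int x, y; }

class HeaderCost {
    static final int N = 1000000;

    // One object per element: N object headers plus N references in the
    // array, on top of the 8 bytes of actual data per pair.
    static Pair[] asObjects() {
        Pair[] pairs = new Pair[N];
        for (int i = 0; i < N; i++) pairs[i] = new Pair();
        return pairs;
    }

    // The same data in two parallel primitive arrays: two array headers
    // in total, and no per-element object or reference.
    static int[][] asArrays() {
        return new int[][] { new int[N], new int[N] };
    }
}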
For example, it seems to me that anyone who writes a Java app manages to do this in such a p*ss-poor way that the garbage collector can’t reclaim memory as it should (due to references being kept to objects that are never used again).
What would you suggest the GC do though? It can’t leave dangling references. Java supports weak pointers (see java.lang.ref) if that’s the behaviour you need (very useful for object caching).
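A minimal sketch of that caching pattern with java.lang.ref; the class name and the loader here are invented for illustration:

import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// The WeakReference lets the GC reclaim a thumbnail once nothing else
// holds a strong reference to it; the cache then simply reloads it.
class ThumbnailCache {
    private final Map<String, WeakReference<byte[]>> cache =
            new HashMap<String, WeakReference<byte[]>>();

    byte[] get(String path) {
        WeakReference<byte[]> ref = cache.get(path);
        byte[] data = (ref == null) ? null : ref.get();
        if (data == null) {                // never cached, or already collected
            data = loadThumbnail(path);    // stand-in for the real, slow work
            cache.put(path, new WeakReference<byte[]>(data));
        }
        return data;
    }

    private byte[] loadThumbnail(String path) {
        return new byte[16 * 1024];
    }
}

(SoftReference works the same way but is normally only cleared when memory gets tight, which is often closer to what a cache actually wants.)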
Nature hates a void: as the size of RAM/hard disks increases, programs/data will expand to fill it.
Lisp (/Java/Mono) programmers know the value of anything and the price of nothing.
I don’t think it’s simply abstraction vs low level optimization. It’s sheer laziness as well (if it works, somewhat, why try harder?)
Back in the days when I was using a PII 350, my computer could only just run new applications such as MS Office 2000. Today I run KDE 3.5 on Linux and Windows XP SP2 without any speed problems. So in my experience software has become faster relative to the hardware.
I’m glad I haven’t been alone on this opinion (re: pathetic software performance) for all of these years.
If you try running WordPerfect or Lotus 123 from 1990 on current hardware, you’ll see that performance is pretty much instantaneous. Of course those apps are missing many of the features of the current versions. But do you really need those features? People were obviously able to get work done using those programs.
There’s indeed a problem with the “need” for new features every time a new software release comes out. Obviously some people do need this new feature or that one, but in the end the result is that Adobe Acrobat Reader takes 30 seconds to start because it loads dozens of plugins 99% of people will never use.
On the other hand, Apple’s Preview app does display PDFs instantly, and for most people the missing features aren’t missed at all. Same goes for GPDF.
I think part of the problem is related to the ease of piracy. So many people run Photoshop CS2 on hardware that wasn’t designed for it (e.g. not the highest-end machines).
Actually there’s one constant, not related to coding techniques or hardware or computers at all: acceptable user interface response time.
Whatever program we’re using, we consider it fast when we don’t need to wait for things to happen (text editing: characters need to show instantly), or when we don’t wait more than a few seconds for “state-changing” operations (document saving, for example). If the response is slower, we consider such an application crap and (if possible) start searching for alternatives.
Due to the very high cost of software development, applications will be optimized exactly to the point where they are [barely] usable on average hardware. (Of course, there are exceptions.) And this makes “Gates’ law” (from the article) absolutely clear: software tends to become slower at the same rate as hardware becomes faster. The net effect is zero; the user-experienced speed doesn’t change.
First, his knowledge of the CPU seems at the level of a typical assembly programmer’s — he thinks the ASM is actually what the CPU runs. For example, his example about mov ax, 1 is completely off-base. x86 can encode immediate data up to 32-bits (one of the nice things about being a variable-length ISA). So his mov ax,1 example is likely more efficient than his two-instruction example.
Second, he misses the forest for the trees. Most modern code is slow not because of a lack of low-level optimizations, but because of poor algorithms. He goes on and on about Java, but fails to realize that most software (most of which is slow), is written in C. Indeed, C is potentially the problem. C programmers spend so much time worrying about details, and C is so bad at abstracting complexity, that C code all too often uses simpler, slower algorithms instead of faster, more complex ones.
“x86 can encode immediate data up to 32-bits (one of the nice things about being a variable-length ISA). So his mov ax,1 example is likely more efficient than his two-instruction example. “
No, actually it isn’t, the guy is right. xor’ing a register on itself is faster than moving a value to it.
—-
Another thing is that code reuse is pushed heavily. You know, don’t write your own code to do common things, use existing code. Well, let’s say you want to do a certain function, and some library provides the ability to do that, but you only need THAT function. It doesn’t matter: all the code for that library will (most likely) be included in your program, and most of it is unused. That is “bloat” in a sense, but not out of laziness.
It’s simply faster and safer to use existing trustworthy code, even if you have to take the overhead of including code that you won’t use.
No, actually it isn’t, the guy is right. xor’ing a register on itself is faster than moving a value to it.
Historically, xor eax,eax was preferred to mov eax,0 because the former can be encoded in 2 bytes, while the latter takes 5. However, we’re not clearing eax, we’re setting ax to 1. That means the first sequence is at least three bytes (two for the xor, one for the inc), and the second sequence probably has a shorter encoding. Since the first sequence decomposes into 2 uops, and the second into 1 uop, the second should be faster.
No, xor took less clock cycles, that’s why.
You’re right that at one time the XOR was preferred because it took fewer clock cycles, but that was on the 8086. Ever since the 286, clearing a register via MOV and via XOR has taken the same number of clock cycles. However, to this day, GCC will still clear a register using XOR because it takes fewer bytes.
According to my sources, the last processor where this occurred was the pentium III.
What source was this? XOR and MOV have been single-cycle since the 486. Perhaps you’re referring to the partial register stall issue? That’s not really relevant to this particular case, though it might be depending on the surrounding code.
An ASM programmer.
The P4 manual is where they started recommending mov eax,0, because of register-stall issues. Or you can pair opcodes a certain way.
I don’t believe it had to do with actual clock cycles so much as register stalls and surrounding code.
ASM programming knowledge is evanescent. Your ASM programmer friend is right that the XOR form can avoid partial register stalls, but only on the P6 core, which is prone to them. On that architecture, reading a 32-bit register after writing to that register’s 16-bit lower half will cause the processor to stall, except with XOR or SUB. So depending on whether eax is ever read in the above fragment, the XOR form could be faster, but only for the P6 core. No other current x86 core (including P6 derivatives like the Dothan or Yonah cores) suffers from this specific case of partial register stall.
Second, he misses the forest for the trees. Most modern code is slow not because of a lack of low-level optimizations, but because of poor algorithms. He goes on and on about Java, but fails to realize that most software (most of which is slow), is written in C. Indeed, C is potentially the problem. C programmers spend so much time worrying about details, and C is so bad at abstracting complexity, that C code all too often uses simpler, slower algorithms instead of faster, more complex ones.
Your postulation ignores the fact that faster algorithms usually have large constants that overwhelm n when n is small, and n is almost always small.
That’s highly dependent on the algorithm in question. Most “complicated” algorithms I’ve seen are rarely more than a constant factor of 2 or 3 slower than the simple alternative, and are vastly faster for large data sets. Also, remember that in most cases, performance isn’t particularly critical for small data sets anyway. If you’re writing an iPhoto competitor, an O(1) algorithm that takes 0.5 seconds with 100 photos is better than an O(n) algorithm that takes 0.1 seconds with 100 photos. Both are “fast enough” for the small case, but the former is far better for the cases where performance will truly be relevant: large data sets.
More generally, my point is that if you’re trying to figure out what makes software slow, don’t look at the ASM, look at the algorithms. Excess I/O, excess IPC, bad asymptotic behavior, cache thrashing, etc., are all far more likely candidates than a few wasted cycles here or there.
This is not always the case, and we can now say it is really not the case with open source software projects like GNOME (the speed of 2.14 is greatly increased compared to the releases from a year ago). With commercial software projects, though, it is inevitable that speed goes down and complexity goes up. It is not that programmers are worse than they were 10 years ago, but every company wants to increase its profits, and debugging is REALLY expensive. Debugging is actually the most expensive part of coding, and while it is sometimes necessary, when it comes to speed of execution it is merely a suggestion. After all, your user base is likely to purchase a new computer in the next 3 to 5 years anyway, so the money you would spend increasing execution speed is better spent developing new features.
This is exactly why so many companies are falling in love with managed code. Managed code is the worst thing that could happen to performance, but it does provide a consistent framework for writing medium-speed software at a relatively fast rate. Perfect for big business and terrible for the consumer’s pocket, as it basically means that good design patterns are out the window and great ones are never even going to enter through the front door. That, combined with the fact that most office software has a lifecycle of 10 or more years, spells disaster for me.
High level language abstractions, and to some extent, frameworks of one kind or another have undoubtedly affected the speed and memory footprint of much of today’s software.
As we develop in languages that are further and further removed from the system’s hardware, though, these abstractions are meant to give us more useful and robust building blocks to build ever more complex and useful software. Should OpenOffice.org Writer or Microsoft Word be rewritten in assembler because it would give us gains in speed or reduce program size? I say no! The negative consequences would be immediately obvious — the program would never get out the door.
Look at all of the wonderful programs that are being designed with Python and GTK. These could be written in C, but the code size and complexity to some extent would go up. And where speed becomes an issue, the relevant sections of code can be written in C instead.
The same abstractions and tradeoffs have been made for years. How many lines of code does it take in assembler to make a hello world program? I can’t even remember my first hello.com file that I compiled with TASM back in the day, but I could still rattle off the code for ‘Hello World’ in dozens of other languages — an example of where this abstraction (and unfairly characterized laziness/ignorance) leads to easier code reading, maintenance, and faster development.
… programmers have been spoiled into utter toddlers. If the programming language doesn’t clean up after them, they don’t want to play with it. I think the problem is that it has come to a point where so much is done behind the scenes by these IDE/GUI-building/compiler tools to silently fix or accommodate errors that you end up building complex systems without the slightest bit of consideration for getting the fundamental building blocks right.
The concept of on the first day of class learning how to make a ‘hello world’ application window is just beyond me.
Let’s put this one to rest.
==============================
[shaurz@proxima ~]$ cat test.asm
use32
global _start
_start:
mov ecx, 0xFFFFFFF
.loop:
xor ax, ax
inc ax
dec ecx
jnz .loop
mov eax, 1
xor ebx, ebx
int 0x80
[shaurz@proxima ~]$ nasm -felf test.asm
[shaurz@proxima ~]$ gcc -nostdlib -o test test.o
[shaurz@proxima ~]$ time ./test
real 0m0.363s
user 0m0.360s
sys 0m0.004s
[shaurz@proxima ~]$ time ./test
real 0m0.364s
user 0m0.364s
sys 0m0.000s
[shaurz@proxima ~]$ cat test.asm
use32
global _start
_start:
mov ecx, 0xFFFFFFF
.loop:
mov ax, 1
dec ecx
jnz .loop
mov eax, 1
xor ebx, ebx
int 0x80
[shaurz@proxima ~]$ nasm -felf test.asm
[shaurz@proxima ~]$ gcc -nostdlib -o test test.o
[shaurz@proxima ~]$ time ./test
real 0m0.365s
user 0m0.360s
sys 0m0.000s
[shaurz@proxima ~]$ time ./test
real 0m0.362s
user 0m0.364s
sys 0m0.000s
==============================
The important timing here is “user”. As you can see for both “mov ax, 1” and “xor ax, ax; inc ax” the first and second times are exactly the same!
This is on an Athlon XP 1800+ in Linux.
Moral of the story, micro-optimisations like this are irrelevant. Clarity first. Choose a better algorithm.
The author’s code fragments are interesting in that even on the 8086, on which the XOR versus MOV thing came about, the first code fragment would’ve been slower. The cycle counts are shown below:
Fragment 1:
xor ax, ax ; 3 cycles
inc ax ; 2 cycles
Fragment 2:
mov ax, 1 ; 4 cycles
On that processor, the two fragments would’ve also been the exact same size, coming in at 3 bytes apiece.
On any modern processor, the first form will be half as fast as the second form, because it uses two uops instead of one. You’re not seeing it in your benchmark because, on a superscalar processor, the INC can execute in parallel with the DEC. You can modify the benchmark as such:
Fragment 1:
xor ax,ax
inc ax
xor bx,bx
inc bx
Fragment 2:
mov ax, 1
mov bx, 1
You’ll see that the second version runs faster than the first (0.45s versus 0.75s on my Pentium 4 2.0 GHz).
Furthermore, the “better” version is actually 1 byte larger (mainly due to 16-bit prefix):
66 b8 01 00 mov $0x1,%ax
66 31 c0 xor %ax,%ax
66 40 inc %ax
The boss was suggesting using a 16-bit code segment, where 16-bit instructions are the default and 32-bit ones take the prefix. An old rule of thumb was that you could estimate speed by counting memory accesses. With caches that’s mostly no longer true; however, also because of caches, shorter code is faster under certain conditions.
I imagine that in a 16-bit code segment, 16-bit instructions are the same speed as 32-bit instructions in a 32-bit code segment.
And this guy should know that using 16-bit registers on modern x86 CPUs is suboptimal, right?
My first computer was 233MHz Pentium II, with 32 megs of RAM, and had Win95 on it. It ran MS Office, SQL Server, Visual Basic, and a host of games quite quickly.
My current machine is a 2.2 GHz Pentium IV, with 768 megs of RAM, with WinXP SP2, dual-booting with Mandriva 2005. On the Windows side, Office is fairly sluggish, as is OpenOffice. NetBeans and Eclipse take a while to launch and use lots of memory, but once up, they’re pretty quick.
Anyway, while the newer machine has vastly superior specs to the original machine, overall it’s only a bit faster, and in some cases, with some apps, it’s slower.
Amazing.
I once wrote a time-management application for a library. It worked just great on my Pentium II, and was about a 5MB install (expanding out to about 10MB once users started entering info).
I was then told it would be running on a 386 with 8MB RAM that had approximately 15MB of free disk space. Not surprisingly, it was a total disaster.
So I went back to the drawing board and created Mk2, which could fit on a floppy disk, proving that when you have to, your programs are leaner and more efficient.
Does anyone really want to return to the good old days? I know I don’t. The reason software is “bloated” today is that expectations are higher. Users today demand a better user experience, more automation, more intelligence, robustness, scalability and more eye-candy! They didn’t buy that 512MB graphics card and 4GHz CPU to be stunned by a 1970 PDP experience.
Then in the programming world they came up with these things called abstractions and libraries. I hear developers who taste of these things begin to write “bloated” code. You see as opposed to rewriting the sun and the moon, they use ready-made packages that help them focus on the problem at hand as opposed to playing hide and seek with hardware.
The reality is that code only needs to be fast enough, and that’s about it. In modern graphical user environments, the application just sits there doing absolutely nothing 90% of the time, and eating memory while it’s at it.
What developers need today are better diagnostic tools to enable them to identify the bottlenecks in their programs, and thus allow them to focus on what’s most important in software. No, sorry, it’s not speed; it’s good design and better experiences. And these come at a cost.
Smart programmers appreciate fast software, they’re just not obsessed over it. And apart from geeks who have numerous ugly skinned system monitors active on their desktop and who get an orgasm monitoring CPU spike patterns, most users just don’t give a damn. I want a better experience, thank you very much. The good old days were fun, but I look forward to the future.
1. In this discussion we concentrate on the technical aspects of programming. That’s important, but not so much: much more important is MARKETING.
Consider Microsoft Office. Let’s say they have version 6 and want to release version 7. Do you think they will be able to sell it just by saying the newer version is less buggy, faster and needs less memory? No, probably no one will buy it. Instead, they must say the newer version has more functions, more colors and is more eye-candy. Therefore they introduce more and more new functions. And then they will find new clients. And this is, in my opinion, the most important reason why we have bloated software.
2. Solution? Better compilers, yes. Better education of programmers, without a doubt. Better algorithms, of course. But I would like to point to another solution, suggested by Niklaus Wirth: the operating system and applications as a set of independent modules. As a consequence, everybody would be able to create their own “Office” application by gathering together the modules they need (see the Oberon System).
3. Which is faster: xor ax,ax; inc ax versus mov ax,1?
In my opinion both answers are correct and it depends on the hardware we consider:
a) on the very old 8086 processors (4MHz), where access time to memory was comparable with the processor clock, both sequences probably gave the same results (it would be nice to check it :-);
b) later on, with the advent of the 286, 386 and subsequent processors, access to memory (compared to the processor clock) became so slow that the first solution was much faster and widely used;
c) nowadays, with highly optimised superscalar processors (as rayiner shows), the second solution is faster again.
Marek
When did access time to memory ever become a factor? Neither sequence accesses memory directly, and in terms of instruction fetch, both sequences are the same size.
Ask (real) people about RAS/CAS clock latency, and whether it can have an impact on performance. Do EDO RAM sticks perform better than DDR3?
Kochise
I’m a legacy assembler and embedded coder who, for various reasons, had to ‘convert’ himself to Windows. I used to code on ARM and 68K architectures, and shifting to x86 was a pain. I had never experienced memory segmentation before, and I can tell you it’s a no-go. Thankfully, Intel copied Motorola’s flat memory concept from its 68K family in their 386+ CPUs. Good…
What puzzles me the most is seeing all these human/coder resources, for 40+ years, still struggling with strings and the like. I’m currently fighting against Microsoft’s new security policy in the Windows Mobile 5 platform, as if they had just discovered what coding security is. Note they released many PDA devices stuck at a QVGA 240×320 resolution, with no API for higher resolutions, as if things would stay that way forever, unable to learn from their own desktop experience.
So it’s easy to bash the author of the article for telling an already-known truth. But just admit YOU are at fault too: too lazy to code better yourself, or under the dictatorship of a manager/CTO who was offered a new technology supposedly able to BOOST his peers’ productivity, letting him show off a nice profit at the end of the year.
I’m an average-joe coder who won’t see a cent of any profit; only the stockholders will. So what’s my interest in the process? Making my own coding easier? Making my boss more money? The fun of it?
For my personal projects, I try to code more wisely, as I know it’s not done for profit and I will have to maintain the stuff later. On a coding staff, you don’t really care about the poor novice who will have to maintain your code a few years from now, as with a job evaluation or project thesis.
Kochise
Today’s software seems slower than the software of yesteryear due to slower algorithms being applied to a problem. It is due neither to more abstraction, nor to a lack of micro-optimizations, nor to feature bloat.
1) most garbage-collected languages require that all objects are allocated on the heap. That has a big impact on many algorithms, where local stack data would be sufficient.
2) most garbage-collected languages use handles for objects. Handles are double pointers, and the cost of dereferencing double pointers is greater than the direct approach.
3) many languages check everything at run time, whereas a simple proof at compile time would suffice.
4) many languages, especially dynamic ones, sacrifice efficiency for flexibility, without succeeding in either in the end. For example, Java’s type system cannot handle primitives like ints in generics: instead ‘int’ is converted to Integer internally. (A small sketch of this follows below.)
5) most environments are boxed inside virtual machines that provide yet another abstraction of a Turing machine. Take a J2EE app, for example: it runs under the JVM, which in turn runs under the local O/S in protected mode, which in turn runs under a kernel in supervisor mode, which in turn runs as 80x86 code that is translated on the fly into RISC-like instructions for the Pentium/AMD/other processor.
6) operations have to go through many layers of library code to get the job done. Take drawing of a line in Java, for example:
a) the programmer has to instantiate a device context,
b) draw a line in the context,
c) the context stores the command in a command buffer (probably enlarging the buffer if needed),
d) the Swing window manager refreshes its dirty rectangle lists,
e) the call is passed to Win32 through JNI
f) Win32 passes the call to the underlying device object
h) the O/S switches to kernel mode to execute the call
i) the video driver is invoked
j) the O/S preempts the kernel and gets back to the application.
What used to be a nice algorithm of going through the VRAM of the VGA card, setting pixels along the path of a line, is now “reduced” to 6-7 calls and 4 context switches (2 for inside/outside the kernel, 2 for inside/outside the JVM).
And if we talk about more serious things like I/O, it gets more ridiculous: one has to instantiate a file object, then a byte-reader object, then open the file, then call the byte reader to read bytes, then call the O/S object to read bytes into its own buffer, then copy the data from the O/S buffer to the C buffer the JVM uses, then to the user buffer, then close the file and the byte reader…
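To make point 4 concrete, here is a small sketch; the element count is only illustrative:

import java.util.ArrayList;
import java.util.List;

class BoxingCost {
    static final int N = 1000000;

    // Generics cannot hold primitives: each add() wraps the int in an
    // Integer object, a separate heap allocation for most values.
    static List<Integer> boxed() {
        List<Integer> values = new ArrayList<Integer>(N);
        for (int i = 0; i < N; i++) values.add(i);   // autoboxing on every call
        return values;
    }

    // A primitive array stores the same numbers with no per-element object.
    static int[] unboxed() {
        int[] values = new int[N];
        for (int i = 0; i < N; i++) values[i] = i;
        return values;
    }
}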
Here is a real example that I have to confront every day: the window ‘project options’ in Visual Studio 6 is very quick. The same window in Visual Studio 2005 is very slow, because of all the work the .NET VM has to do. This happens with lots of UI parts of VS 2005, even if it is a superior product to VS6. It is annoying to have to wait for things that you wouldn’t in the previous version.
1) most garbage-collected languages require that all objects are allocated on the heap. That has a big impact on many algorithms, where local stack data would be sufficient.
You mean Java and C# require objects to be allocated on the heap. Sophisticated Lisp (and probably ML) compilers will quite happily allocate non-escaping objects on the stack.
2) most garbage-collected languages use handles for objects. Handles are double pointers, and the cost of dereferencing double pointers is greater than the direct approach.
Actually, I think Java and C# both use a single-pointer representation for objects these days.
3) many languages check everything at run time, whereas a simple proof at compile time would suffice.
Again, you mean Java and C# check everything at run time. Most sophisticated compilers for other high-level languages do a great deal of type inference to elide run-time checks. In any case, the cost of run-time checks on modern superscalar processors is minimal. The check can be performed in parallel, and it’s an easily predictable branch.
4) many languages, especially dynamic ones, sacrifice efficiency for flexibility, without succeeding in either in the end. For example, Java’s type system cannot handle primitives like ints in generics: instead ‘int’ is converted to Integer internally.
Java is not a dynamic language, and its particular faults are not applicable to dynamic languages in general.
6) operations have to go through many layers of library code to get the job done. Take drawing of a line in Java, for example:
This is quite true in C as well, with most modern libraries. Ever see the call stack depth in a GNOME program?
I think you’d do well to realize that your complaints about C# and Java have more to do with their relatively primitive implementations than about the nature of high-level languages in general.
What most people here missed is that he says “In the old days”… Back then, P3s and P4s did not exist. You had the 8088, and indeed xor + inc was faster than mov. At the very least, shorter.
I still wish people would stop spreading the false myth that GCs are fast or that C is slow. GCs obtain their speed at the expense of efficiency (in RAM), by not being friendly to other applications.
His point was simply that we think allocating objects comes for free because we don’t deallocate them ourselves. If we were more careful about object allocation, our applications would be more efficient, both in speed and in memory.
This isn’t about Java, GC’s or all that. It’s about good programming practices in any language. Why do people turn this into Java vs. the world when there is a clear and valid message here?
People (like the writer) who close their eyes and wish really, really hard that they know what they’re doing. Java will allocate objects on the stack and do all other kinds of neat things this guy would never think about. He’s why we have Java.
See:
http://www-128.ibm.com/developerworks/java/library/j-jtp01274.html
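For what it’s worth, here is a tiny sketch of the kind of allocation escape analysis targets; whether a given VM actually keeps it off the heap depends on the VM and version:

class Escape {
    static final class Vec {
        final int x, y;
        Vec(int x, int y) { this.x = x; this.y = y; }
    }

    // The Vec never escapes: it is not stored in a field, not returned and
    // not passed to other code, so an escape-analysing JIT is free to keep
    // it in registers or on the stack instead of allocating it on the heap.
    static int lengthSquared(int x1, int y1, int x2, int y2) {
        Vec d = new Vec(x2 - x1, y2 - y1);
        return d.x * d.x + d.y * d.y;
    }
}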
Think before you speak dude.