Java vs. C++: The Performance Showdown

Wednesday Jan 6th 2010 by Liviu Tudor

It's time to settle this once and for all: for memory allocation, looping, and floating-point operations, does Java or C++ perform better?

Since the early days of the Java programming language, some have argued that Java's being an interpreted language has made it inferior to the likes of C and C++ in terms of performance. Of course, C++ devotees would never, ever even consider Java a “proper” language, while the Java crowd always throws “write once, run everywhere” in the faces of the C++ programmers.

First things first, how well does Java perform when it comes to basic integer arithmetic? If I asked you how much 2 x 3 is, you would probably answer in no time. How long would it take a program? To check, here’s a basic test:

1. Generate X random integers.
2. Multiply each of those numbers by every number from 2 to Y.
3. Measure how long the whole set of multiplications takes.

Because you’re not interested in how long it takes to generate a random number, it is important that you generate the random numbers upfront, before you start measuring.

Putting Java to the Test

In Java, generating the random numbers is a very simple task:

``````   private void generateRandoms()
{
randoms = new int[N_GENERATED];

for( int i = 0; i < N_GENERATED; i++)
{
randoms[i] = (int)(i * Math.random());
}
}``````

The computations are just as easy:

``````   private void javaCompute()
{
int result = 0;
for(int i = 2; i < N_MULTIPLY; i++)
{
for( int j = 0; j < N_GENERATED; j++ )
{
result = randoms[j] * i;
result++;
}
}
}``````

Now, you could simply measure the time it took you to execute the javaCompute method shown above and consider that a valid test. However, that wouldn’t be fair for a few reasons. First, during the execution of a program, the JVM will load classes as needed by the classloaders. The OS itself also will prepare various data structures needed by the JVM as it requests them. So, chances are the first execution of the above function actually spends a lot of time preparing the execution. That's why measuring the time as a one-off is probably not a good idea.

You could instead run the program a few times and average the results. However, the first execution of the function will still suffer from the same start-up costs, which will prevent you from getting an accurate timing. On the other hand, the more often you execute the method above, the greater the chances that some data will get cached, some by the JVM and some by the OS. As a result, some of your measurements will reflect caching and other runtime optimizations rather than the raw execution time, which again skews the average.

To counterbalance these problems, you can run the method above a few times and simply eliminate the “freakiest” cases: the lowest and the highest times. That should get you closer to the true average. Assuming you store the duration of each run of the javaCompute method in an array, here’s a simple method to achieve this:

``````   private static long testTime( long diffs[] )
{
long shortest = Long.MAX_VALUE;
long longest = 0;
long total = 0;
for( int i = 0; i < diffs.length; i++ )
{
if( shortest > diffs[i] )
shortest = diffs[i];
if( longest < diffs[i] )
longest = diffs[i];
total += diffs[i];
}
total -= shortest;
total -= longest;
total /= ( diffs.length - 2 );
return total;
}``````

Now, putting all of this together, you get the following code (also found in IntMaths.java):

``````public static void main(String[] args)
{
IntMaths maths = new IntMaths();
maths.generateRandoms();
//compute in Java
long timeJava[] = new long[N_ITERATIONS];
long start, end;
for( int i = 0; i < N_ITERATIONS; i++ )
{
start = System.currentTimeMillis();
maths.javaCompute();
end = System.currentTimeMillis();
timeJava[i] = (end - start);
}
System.out.println( "Java computing took " + testTime(timeJava) );
}``````

Notice that you’re measuring time in milliseconds here.

On my laptop (which is not quite high-spec but new enough to be representative), running the above code shows an average of about 25 milliseconds. That is, it takes my laptop about 25 milliseconds to multiply 10,000 random integer numbers with each number from 2 to 1,000, which basically means it takes 25ms to perform about 10,000,000 integer arithmetic operations.
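The listings above leave out the class-level constants and the randoms field. The following is a minimal sketch of how the pieces might fit into one compilable unit. The class name IntMathsSketch and the value of N_ITERATIONS are assumptions made for illustration; N_GENERATED and N_MULTIPLY are inferred from the 10,000 numbers and the range 2 to 1,000 mentioned in the text, not taken from IntMaths.java.

```java
// Sketch of the integer benchmark as a single compilable unit.
// Assumptions: class name, N_ITERATIONS, and the exact constant values.
final class IntMathsSketch {
    static final int N_GENERATED = 10_000;  // inferred from the text
    static final int N_MULTIPLY = 1_000;    // inferred from the text
    static final int N_ITERATIONS = 10;     // assumed; not stated in the article

    private final int[] randoms = new int[N_GENERATED];

    // Fill the array before any timing starts.
    void generateRandoms() {
        for (int i = 0; i < N_GENERATED; i++) {
            randoms[i] = (int) (i * Math.random());
        }
    }

    // The loop being measured: multiply every number by 2..N_MULTIPLY-1.
    void javaCompute() {
        int result = 0;
        for (int i = 2; i < N_MULTIPLY; i++) {
            for (int j = 0; j < N_GENERATED; j++) {
                result = randoms[j] * i;
                result++;
            }
        }
    }

    // Average of the samples after dropping the lowest and the highest one.
    static long testTime(long[] diffs) {
        long shortest = Long.MAX_VALUE;
        long longest = 0;
        long total = 0;
        for (long d : diffs) {
            if (d < shortest) shortest = d;
            if (d > longest) longest = d;
            total += d;
        }
        return (total - shortest - longest) / (diffs.length - 2);
    }

    public static void main(String[] args) {
        IntMathsSketch maths = new IntMathsSketch();
        maths.generateRandoms();
        long[] timeJava = new long[N_ITERATIONS];
        for (int i = 0; i < N_ITERATIONS; i++) {
            long start = System.currentTimeMillis();
            maths.javaCompute();
            timeJava[i] = System.currentTimeMillis() - start;
        }
        System.out.println("Java computing took " + testTime(timeJava));
    }
}
```

As a quick check of the averaging, the samples 30, 25, 24, 26, and 90 lose 24 and 90, leaving (30 + 25 + 26) / 3 = 27.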

Putting C++ to the Test

Now let’s try something similar in C++:

``````void generate_randoms( int randoms[] )
{
for( int i = 0; i < N_GENERATED; i++ )
randoms[i] = rand();
}``````

To be precise, the numbers generated in Java are not the same numbers generated in C++. Therefore, you cannot ensure that both implementations will use the same set of numbers. However, the effect of this difference on the timings should be quite small.

The C++ computation is similar to the Java one:

``````void nativeCompute(int randoms[])
{
int result = 0;
for(int i = 2; i < N_MULTIPLY; i++)
{
for( int j = 0; j < N_GENERATED; j++ )
{
result = randoms[j] * i;
result++;
}
}
}``````

Because C++ doesn't seem to offer a standard way to get millisecond precision for timing the operations, this code is specifically targeted to Windows platforms. Using the QueryPerformanceCounter function provides access to high-resolution timers. To use this function, the code utilizes a simple “stopwatch” class that has just two methods (Start and Stop). Based on the timing, this class records when each of these methods is called and returns the time difference in milliseconds:

``````// Stop watch class.
class CStopWatch
{
public:
// Constructor.
CStopWatch()
{
// Ticks per second.
QueryPerformanceFrequency( &liPerfFreq );
}

// Start counter.
void Start()
{
QueryPerformanceCounter( &liStart );
}

// Stop counter.
void Stop()
{
QueryPerformanceCounter( &liEnd );
}

// Get duration.
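long double GetDuration()
{
// Elapsed ticks divided by ticks per second, converted to milliseconds.
return (long double)( liEnd.QuadPart - liStart.QuadPart ) * 1000.0 / (long double)liPerfFreq.QuadPart;
}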

private:
LARGE_INTEGER liStart;
LARGE_INTEGER liEnd;
LARGE_INTEGER liPerfFreq;
};``````

You can apply a similar pattern for finding the average of the times the stopwatch class took:

``````long double test_times( long double diffs[] )
{
long double shortest = 65535;
long double longest = -1;
long double total = 0;
for( int i = 0; i < N_ITERATIONS; i++ )
{
if( shortest > diffs[i] )
shortest = diffs[i];
if( longest < diffs[i] )
longest = diffs[i];
total += diffs[i];
}
total -= shortest;
total -= longest;
total /= ( N_ITERATIONS - 2 );
return total;
}``````
This pattern leaves you with the following implementation (present in IntMaths.c):
``````int main(int argc, char* argv[])
{
int randoms[N_GENERATED];
generate_randoms( randoms );

CStopWatch watch;
long double timeNative[N_ITERATIONS];

for( int i = 0; i < N_ITERATIONS; i++ )
{
watch.Start();
nativeCompute(randoms);
watch.Stop();
timeNative[i] = watch.GetDuration();
}
printf( "C computing took %Lf\n", test_times(timeNative) );

return 0;
}``````

Find the above code in the IntMaths project included in the JavaVsCPP.zip code download.

Running the above code on my laptop returns the following (the measurement is in milliseconds):

``C computing took 0.001427``

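So, it took the C++ code about 1/1000th of a millisecond to perform about 10,000,000 arithmetic operations! (A figure this small suggests that the optimizer removed much of the loop, because result is never read after the loop ends.)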

To compare: 25ms in Java, 0.001ms in C++. That's quite a difference! However, bear in mind that the C++ code was compiled using full compiler and linker optimization for speed. Simply disabling the optimization will return the following result:

``C computing took 70.179901``

Ouch! That is three times slower than the Java version! The moral of the story is: yes, C++ performs better at first glance, but (and this is a big but) only if the compiler optimizes the code well!

Most C++ compilers nowadays will perform a decent optimization of the generated code. However, the difference between 1/1000th of a millisecond and 70 milliseconds is left in the hands of the compiler. Always bear that in mind when you switch to C++ for speed reasons!
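One thing worth keeping in mind when reading these numbers: both javaCompute and nativeCompute overwrite result on every pass and never read it afterward, so an optimizer is free to drop some or all of the work. A common way to guard against that is to fold every product into a value that the caller actually uses. The sketch below shows the idea in Java; the class and method names are hypothetical and it is not the code used for the measurements in this article. It does not stop a compiler from simplifying the arithmetic, but the result can no longer be thrown away outright.

```java
// Sketch: keep the benchmark result observable so the loop is not dead code.
// ChecksumCompute and computeWithChecksum are illustrative names only.
final class ChecksumCompute {
    // Same multiplication loop as javaCompute, but every product is added
    // to a running sum that is returned to the caller.
    static long computeWithChecksum(int[] values, int maxMultiplier) {
        long checksum = 0;
        for (int i = 2; i < maxMultiplier; i++) {
            for (int j = 0; j < values.length; j++) {
                checksum += (long) values[j] * i;
            }
        }
        return checksum;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        // Multipliers 2, 3, 4: (1 + 2 + 3) * (2 + 3 + 4) = 54
        System.out.println(computeWithChecksum(values, 5)); // prints 54
    }
}
```

Printing or otherwise consuming the checksum after the timed section is what makes the computed values count as used.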

You’ve seen how Java and C++ compare in integer arithmetic, but what about floating point? Let’s face it, you would struggle to find an application that uses only integer arithmetic—most programs rely on floating point computation at some point (even if it is for the simple purpose of averaging two numbers).

Using the same approach as in the previous example, consider the results of generating some random floating point numbers, multiplying them with each other, measuring the time it takes, and then averaging those times. This time though, to ensure you are “properly” multiplying floating point numbers, generate two sets of random numbers and multiply them with each other:

``````   private void generateRandoms()
{
randoms = new double[N_GENERATED];
for( int i = 0; i < N_GENERATED; i++)
{
randoms[i] = Math.random();
}

multiply = new double[N_MULTIPLY];
for( int i = 0; i < N_MULTIPLY; i++ )
{
multiply[i] = Math.random();
}
}

private void javaCompute()
{
double result = 0;
for(int i = 0; i < N_MULTIPLY; i++)
{
for( int j = 0; j < N_GENERATED; j++ )
result = randoms[j] * multiply[i];
}
}``````

The above code (which you can find in DoubleMaths.java) generates the following result on my laptop:

``Java computing took 47``

In other words, it takes on average 47 milliseconds to perform about 10,000,000 floating-point operations (multiplications) in Java (roughly twice as much time as it took to perform integer arithmetic operations).

Now, let’s look at how well C++ performs when it comes to this (find the code in the DoubleMaths project included in the JavaVsCPP.zip code download):

``C computing took 0.001477``

Again, this is the result of using the compiler optimization for speed; disabling any compiler optimization renders:

``C computing took 84.734633``

So, with optimized compilation, switching from integer to floating-point arithmetic seems to hardly affect the C++ version. A decent compiler renders little difference between integer and floating-point arithmetic, and what's more, by these measurements the generated code runs tens of thousands of times faster than the Java version! However, you do have to choose your compiler and its settings carefully, or you could end up with execution times almost twice as long as those of the Java code!

Number Comparison

In terms of computations, so far C++ seems to be winning. But how do the two languages perform when it comes to number comparisons? Consider two examples:

1. Two integers are used in an if statement. The if statement will perform a simple assignment if the statement is true.
2. An if statement is used with floating point numbers involved in a tested expression.

For the first example, you will generate a series of random integer numbers and traverse the array comparing the previous number in the series with the current one. If the current one is bigger, you’ll simply store it in a variable. Thus, at the end of the traversal, you will have found the largest number in the series.
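The comparison method itself is not listed in this article, so here is a minimal sketch of what such a traversal might look like. It assumes the comparison is made against the largest value seen so far, and it takes the array as a parameter and returns the maximum so that it can stand alone; the class name and signature are illustrative and may differ from the downloadable IntComparison.java.

```java
// Sketch of the integer comparison traversal described above.
// IntComparisonSketch and this javaCompare signature are assumptions.
final class IntComparisonSketch {
    // Walk the array once; whenever the current element is bigger than the
    // largest value seen so far, store it.
    static int javaCompare(int[] randoms) {
        int largest = randoms[0];
        for (int i = 1; i < randoms.length; i++) {
            if (randoms[i] > largest) {
                largest = randoms[i];
            }
        }
        return largest;
    }

    public static void main(String[] args) {
        int[] sample = {3, 17, 5, 17, 2};
        System.out.println(javaCompare(sample)); // prints 17
    }
}
```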

You'll be using the same number-generation method as in the previous examples:

``````   /**
* Generate random numbers
*/
private void generateRandoms()
{
randoms = new int[N_GENERATED];

for( int i = 0; i < N_GENERATED; i++)
{
randoms[i] = (int)(i * Math.random());
}
}``````

The same goes for the execution time averaging, where you will use the same method again. However, the way you are going to execute this will differ. You could of course allocate a long array (a few tens of millions or so) of integers and traverse it (as described above), but you would run into another issue: the indexed memory access time (which is explained shortly). Instead, you will use a small array of ints (100 in this case), repeat the operation 100,000 times, and time the whole run.

In Java, the code comes down to this:

``````   public static void main( String args[] )
{
IntComparison comp = new IntComparison();
comp.generateRandoms();
long timeJava[] = new long[N_ITERATIONS];
long start, end;
for( int i = 0; i < N_ITERATIONS; i++ )
{
start = System.currentTimeMillis();
for( int j = 0; j < N_REPEAT; j++ )
comp.javaCompare();
end = System.currentTimeMillis();
timeJava[i] = (end - start);
}
System.out.println( "Java compare took " + testTime(timeJava) );
}``````

Running the above generates the following result on my laptop:

``Java compare took 50``

So, on average it takes 50 milliseconds to perform 100,000 x 100 = 10 million integer comparisons.

Let’s have a look at the result of a similar implementation in C++ (find the source in the IntComparison project included in the JavaVsCPP.zip code download):

``C computing took 0.001971``

Draw your own conclusion, but remember that the code was compiled using “optimize for speed” settings.

So far, you’ve been looking at small chunks of data that are being accessed, but how does the memory allocation and access model perform when it comes to indexed data (i.e., arrays)? To find out, consider simply iterating over an array with a lot of items in it (millions!) and comparing the time it took to simply access each element. For the purpose of this exercise, accessing it simply means reading the data, storing it into a variable, and then writing it back into the array.

So the actual function you are going to measure looks like this in Java:

``````   private void javaTraverse()
{
int temp = 0;
for( int i = 0; i < N_ELEMS; i++ )
{
temp = array[i];
array[i] = temp;
}
}``````

Running the above code (found in ArraysAccess.java) renders the following result:

``Java traverse took 53``

So it takes on average 53 milliseconds to traverse an array of 10 million entries! Implementing the equivalent C++ code (ArraysAccess project included in the JavaVsCPP.zip code download) is a bit different from the previous examples because an array this large cannot simply be declared as a local variable: 10 million ints occupy about 40 MB, far more than the default stack size. To overcome that, this example uses a bit of Windows API again and incorporates the GlobalAlloc function, which allocates large chunks of memory from the heap:

``````int main(int argc, char* argv[])
{
int * randoms;
HGLOBAL h = GlobalAlloc( GPTR, sizeof(int) * N_GENERATED );
randoms = (int *)h;
generate_randoms( randoms );

CStopWatch watch;
long double timeNative[N_ITERATIONS];

for( int i = 0; i < N_ITERATIONS; i++ )
{
watch.Start();
nativeTraverse(randoms);
watch.Stop();
timeNative[i] = watch.GetDuration();
}
printf( "C traversing took %Lf\n", test_times(timeNative) );

GlobalFree( h );
return 0;
}``````

As you can see, you’re simply allocating 10 million ints again using GlobalAlloc, and you then traverse them in the same manner as you do in Java. The average result of this operation is:

``C traversing took 10.857639``

So, it is about five times faster than Java. (However, compile this code with optimization disabled and the timings take about 2-3 times longer than those recorded in Java!)

Memory Allocation

One of the arguments C++ programmers are confronted with quite often is the memory management issue. While Java looks after this for you, C++ memory management is subject to constructors and destructors and the dreaded new/delete pair!

To compare memory allocation performance, start with a simple task: allocating an array of 10,000 bytes repeatedly and measuring the time it takes to allocate and de-allocate it. This will be a tricky task in Java because there is no definite way to free up memory (as a reminder, System.gc is merely a suggestion to the JVM that it may free up some memory if needed, but there is no guarantee). So just accept that the measurements taken in Java might not include the time spent freeing memory, and use them in the comparison anyway. (After all, as stated before, one of the main arguments between C++ and Java developers is that you don’t have to worry about memory management in Java, whereas you do have to take time to handle it in C++!)

Notice that you are allocating an array of bytes, not integers! That’s because Java’s int is always 32 bits, whereas the size of int in C++ depends on the compiler and platform, and allocating arrays of different sizes would be unfair!

In Java, the function for allocation therefore looks like this (IntAlloc.java):

``````   private void javaAlloc()
{
System.gc();
elements = new byte[N_ELEMS];
elements[0] = 0;
}``````

Note that you are first suggesting the garbage collection and then allocating the memory. In order to prevent the compiler from being “clever” and not allocating the memory until it’s actually accessed, you’re forcing it to by setting the first element in the array.

Also, for the purpose of this test, you're going to measure your time in nanoseconds because milliseconds might not be enough. In Java, you'll be using the System.nanoTime function for this exercise. (May I remind you that a millisecond = 1,000,000 nanoseconds!)

Bearing this in mind, the average timing of running the above looks like this:

``Java memory allocation took 11,755,693``

So it took nearly 12 milliseconds for 10,000 bytes to be allocated. Let’s try the above without suggesting the GC (simply commenting out the System.gc line in the javaAlloc function above):

``Java memory allocation took 12,994``

This indicates that occasionally, as a result of your suggesting the GC via System.gc(), the garbage collection kicks in and recollects the memory used. It also suggests that if you don’t worry about the garbage collection, it takes less than a millisecond to allocate 10,000 bytes—more specifically, it takes about 13,000 nanoseconds—a rough average of 1.3 nanoseconds per byte!
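To make the two measurements concrete, here is a minimal sketch of how a single allocation might be timed with System.nanoTime, with and without the System.gc() suggestion. The class name, the method name, and the value of N_ELEMS (taken from the 10,000 bytes mentioned above) are assumptions for illustration; the actual IntAlloc.java may be organized differently, and a single sample like this is far noisier than the averaged figures quoted in the text.

```java
// Sketch: time one allocation in nanoseconds, optionally suggesting a GC first.
// AllocTimingSketch, timeAllocationNanos, and N_ELEMS = 10_000 are assumptions.
final class AllocTimingSketch {
    static final int N_ELEMS = 10_000;

    // Stored in a static field so the array stays reachable after the method
    // returns.
    static byte[] elements;

    static long timeAllocationNanos(boolean suggestGc) {
        long start = System.nanoTime();
        if (suggestGc) {
            System.gc(); // only a hint; the JVM may ignore it
        }
        elements = new byte[N_ELEMS];
        elements[0] = 0; // touch the array, as in javaAlloc
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        System.out.println("With System.gc():    " + timeAllocationNanos(true) + " ns");
        System.out.println("Without System.gc(): " + timeAllocationNanos(false) + " ns");
    }
}
```

To convert a reading to milliseconds, divide by 1,000,000.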

Now let’s try this in C++ (IntAlloc project included in the JavaVsCPP.zip code download):

``C allocation took 3661.273463``

This might come as a surprise to you, but guess what: The memory allocation in C++ is comparable to Java's. It’s hard to tell based on these measurements which one is faster, as they are all less than a millisecond. However, they are of similar magnitude. And again, bear in mind that this is the result of the C++ optimized code you're comparing against!

As for not using System.gc: ask any Java programmer and you will find that, apart from those programming for real-time systems, very few actually call it. They mostly leave the memory management to the JVM, which is implemented cleverly enough to “know” when to kick in so that it doesn’t affect system performance that much.

It is interesting to note though, for those of you who do have the patience to go through my code and change it a bit, that if you change the allocation to int, neither the Java nor the C++ version “suffers” that much!

You’ve seen how both languages perform when it comes to primitive data types. What if you take this one step further: how about objects? For this, you'll consider a class that maps the “complex number” data type, a type with two components: a “real” part and an “imaginary” part—both being floating point numbers. So let’s try to implement this class in both languages and then try to instantiate it a few times and see whether you can draw any conclusions from that.

In Java, as you would imagine, you simply have two private members that store the data and expose it via getters and setters. Also, as to be expected, on top of the default constructor you'll provide a constructor that takes two values and initializes the members with the values provided (Complex.java):

``````public class Complex
{
private double real;
private double imaginary;

public Complex()
{
this( 0.0, 0.0 );
}

public Complex( double real, double imaginary )
{
this.real = real;
this.imaginary = imaginary;
}

public double getReal()
{
return real;
}
public void setReal(double real)
{
this.real = real;
}

public double getImaginary()
{
return imaginary;
}
public void setImaginary(double imaginary)
{
this.imaginary = imaginary;
}
}``````

Similarly, C++ supports the following implementation:

``````class Complex
{
public:
Complex(double real = 0, double imaginary = 0)
{
this->real = real;
this->imaginary = imaginary;
}

double getReal()
{
return real;
}
void setReal( double real )
{
this->real = real;
}

double getImaginary()
{
return imaginary;
}
void setImaginary( double imaginary )
{
this->imaginary = imaginary;
}

private:
double real;
double imaginary;
};``````

The code used (in the ComplexCreate project included in the JavaVsCPP.zip code download and in ComplexCreate.java) is very simple. However, a major difference is that the instantiation process in Java takes two steps:

1. Allocate memory for the actual array
2. Create each object
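In code, the two steps might look like the sketch below. It uses a trimmed-down stand-in for the Complex class so that it compiles on its own; the wrapper class and the createAll helper are hypothetical names, not taken from ComplexCreate.java.

```java
// Sketch of the two-step instantiation in Java.
// ComplexCreateSketch and createAll are illustrative names only.
final class ComplexCreateSketch {
    // Minimal stand-in for the Complex class shown earlier.
    static final class Complex {
        private final double real;
        private final double imaginary;

        Complex() {
            this(0.0, 0.0);
        }

        Complex(double real, double imaginary) {
            this.real = real;
            this.imaginary = imaginary;
        }

        double getReal() {
            return real;
        }

        double getImaginary() {
            return imaginary;
        }
    }

    static Complex[] createAll(int count) {
        // Step 1: allocate the array; every slot is still a null reference.
        Complex[] arr = new Complex[count];
        // Step 2: create each object and store it in its slot.
        for (int i = 0; i < count; i++) {
            arr[i] = new Complex();
        }
        return arr;
    }

    public static void main(String[] args) {
        Complex[] arr = createAll(3);
        System.out.println(arr.length + " " + arr[0].getReal()); // prints 3 0.0
    }
}
```

After step 1 alone, reading arr[0].getReal() would throw a NullPointerException, which is why the second loop is needed in Java.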

In C++, however, you can do this in one step:

``arr = new Complex[N_GENERATED];``

Using the new operator will in fact create and initialize the items in one step. Looking at the execution of the C++ and Java code side by side, you get the following results:

``````(Java) Java create took 710788

(C++) C instantiation took 29348.262052``````

These times are again in nanoseconds. And for those who can’t figure it out, the difference is about 710,000 (Java) compared with 29,000 (C++). Ahem, no comment :D

C++ Wins—with Some Help

There certainly is a lot more to a programming language than memory allocation, looping, and floating-point operation, but you will struggle to find a program that doesn’t use at least one of those. And when it does—whether you like it or not—an optimized C++ compiler can deliver faster code than Java. However, that does raise the question of whether Java compilation (I’ve used the standard JDK compiler with no extra flags) itself needs more optimizations that will improve execution times in cases similar to the ones shown above.