Salivating over the idea of using Amazon EC2 GPU Instances for Map-Reduce via CUDA 4.x

The cost is only about $2.10 per hour ($1,500+ per month), but just think how cool a Map-Reduce process could be with all those CUDA cores churning away on the data.

Amazon offers GPU Instances for something like $1,540 per month; Tesla GPUs (2x NVIDIA Tesla “Fermi” M2050, for 448×2 CUDA cores) can do a whole lot of Map-Reduce for the money. One might not need very many of these, connected to compressed EBS, to realize the potential of some form of Analytics process. The real question is how many customers a single Amazon GPU Instance can handle before we need to scale, and what the real scaling cost looks like going forward. Each customer’s data could be stored in a separate compressed EBS Volume, all handled by a single Amazon GPU Instance or GPU Cluster. We might be able to greatly reduce cost while providing very good performance across the board.

The storage cost might be reduced by using a compressed file system for Linux, like fusecompress, or we might build a custom solution using FUSE.  A fair number of people seem to be using fusecompress on Ubuntu, and it might also be available for Red Hat.  This could help mitigate the Amazon EBS cost.  All that remains is checking how we might leverage Amazon micro instances to store the Hadoop data, combined with a real-time solution for gathering aggregations – who knows, this could reduce deployment costs by a fair margin.

Keep in mind that raw data can easily be compressed to roughly 1/30th of its original size.
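For what it’s worth, a ratio like that is easy to sanity-check with nothing but the Python standard library.  The sample data below is hypothetical; real-world ratios depend entirely on how redundant the raw data is:

```python
import zlib

# Hypothetical highly repetitive raw data (think log lines); actual
# ratios depend entirely on the redundancy of the data being compressed.
raw = b"2011-07-01 12:00:00 GET /index.html 200 OK\n" * 10000
compressed = zlib.compress(raw, 9)

print(len(raw), "->", len(compressed), "bytes")

# Round-trip check: decompression restores the original bytes.
assert zlib.decompress(compressed) == raw
```

Repetitive log-style data like this compresses far better than 30x; mixed binary data will do much worse.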

I may even go out and buy a slick NVIDIA 560 Ti or the like just to get my hands on something close to 500 CUDA cores, rather than the 64 I have in my 2+ year old laptop that happens to have a dual-GPU SLI setup; not bad for development and testing.

On-Demand High-Power Map-Reduce via NVidia CUDA and Stackless Python might just be too much to let sit idle… might have to begin paying Amazon for some time on their Massive GPU Instances just for fun some weekend.

Java is half the performance of Python, and Ruby fails to run this benchmark!

Benchmarks are about as valuable as… lies, damned lies and…

However, given the limited value of benchmarks, this particular use case can highlight the relative weakness of Java and the relative strength of Python, while showing that even though Ruby was built for recursion it can FAIL completely or, at best, give sloppy performance.

Be warned, if you love Java and/or Ruby you might not like this benchmark and the results presented here…

First the problem:

Factorial… plain and simple.

I wanted to see how well Stackless Python fares against standard Python, where Stackless used recursion (or something close to it) and regular Python 2.5.5.x used generators.  Bottom line: generators smoke, but you already knew this.
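For the curious, a minimal sketch of the generator approach (my own reconstruction, not necessarily the exact benchmark code):

```python
def partial_products(stop, start):
    """Yield running products stop * (stop-1) * ... * (start+1)."""
    product = 1
    for n in range(stop, start, -1):
        product *= n
        yield product

def factorial_ratio(a, b):
    """Compute a! / b! for a >= b iteratively -- no recursion needed."""
    result = 1
    for result in partial_products(a, b):
        pass  # consume the generator; keep the last value
    return result

print(factorial_ratio(1000, 998))    # 1000 * 999 = 999000
print(factorial_ratio(10000, 9998))  # 10000 * 9999 = 99990000
```

Because each step is just one loop iteration, there is no call stack to blow, which is exactly why the generator version never hits a recursion limit.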

Then I wanted to see how Ruby 1.8.7 versus Ruby 1.9.2 might fare, since Ruby is known to be heavily optimized for recursion.  Bottom line: Ruby 1.8.7 falls dead-even with Stackless Python (DUH!) and well behind the Python generator; still, Ruby 1.8.7 fares very well against Java, and both landed at the bottom of the pack.  Ruby 1.9.2 fails completely for the most extreme use case, as shown below.

The Results:

Factorial program using Stackless Python.
1000! / 998! = 999000
         8 function calls in 0.010 CPU seconds
10000! / 9998! = 99990000
         8 function calls in 0.234 CPU seconds

Factorial program using Python Generators.
1000! / 998! = 999000
         2004 function calls in 0.002 CPU seconds
10000! / 9998! = 99990000
         20004 function calls in 0.135 CPU seconds

Factorial program using Ruby 1.8.7.
1000! / 998! = 999000
             user     system      total        real
         0.016000   0.000000   0.016000 (  0.010002)
10000! / 9998! = 99990000
         0.421000   0.062000   0.483000 (  0.555111)

Factorial program using Ruby 1.9.2.
1000! / 998! = 999000
             user     system      total        real
         0.000000   0.000000   0.000000 (  0.003000)
10000! / 9998! FAILS due to Recursion Limit

Factorial program using Java.
1000!/998! = 999000
Runtime is :0.017

10000!/9998! = 99990000
Runtime is :0.389

Results Analysis: (From Fastest to Slowest, Top-down)

All runtimes are given in seconds unless otherwise indicated.
Factorial program using Ruby 1.9.2.
10000! / 9998!                                                          FAILS due to Recursion Limit

Factorial program using Python Generators.
1000! / 998! = 999000                                                    0.002 CPU seconds

Factorial program using Ruby 1.9.2.
1000! / 998! = 999000                   0.000000   0.000000   0.000000 (  0.003000)

Factorial program using Stackless Python.
1000! / 998! = 999000                                                    0.010 CPU seconds

Factorial program using Ruby 1.8.7.
1000! / 998! = 999000                     0.016000   0.000000   0.016000 (  0.010002)

Factorial program using Python Generators.
10000! / 9998! = 99990000                                                0.135 CPU seconds

Factorial program using Java.
1000!/998! = 999000                                                      0.017

Factorial program using Stackless Python.
10000! / 9998! = 99990000                                                0.234 CPU seconds

Factorial program using Java.
10000!/9998! = 99990000                                                    0.389

Factorial program using Ruby 1.8.7.
10000! / 9998! = 99990000                 0.421000   0.062000   0.483000 (  0.555111)

Ruby 1.9.2 came in first or last, depending on how you look at such things, but only because it failed to run for the most extreme case.
Python 2.5.5.x generators smoke, as expected!
Ruby 1.9.2 comes in just behind Python generators. So much for Ruby 1.9.2 being the Python killer.
Stackless Python comes in just ahead of, or dead-even with, Ruby 1.8.7, depending on how one takes the Ruby profiler results.
Java comes in ahead of Ruby 1.8.7 across the board, but its showing against Python suffers because one has to use BigDecimal to get Java to handle the large numbers. Python transparently promotes regular integers to arbitrary-precision integers as they are processed; Java might fare better if it did the same, but sadly it does not.
Ruby 1.8.7 comes in dead last.
Download the code.
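To see the large-number point concretely, here is a quick illustration (Python 3 syntax; the benchmark above used Python 2.5.5.x, where the same promotion happens between `int` and `long`):

```python
import math

# Python ints promote to arbitrary precision transparently; no special
# wrapper class (Java's BigDecimal/BigInteger) is ever needed.
small = 2 ** 62        # still fits typical machine-word arithmetic
big = small * small    # silently becomes an arbitrary-precision int
assert big == 2 ** 124

# The benchmark's extreme case, computed directly:
print(math.factorial(10000) // math.factorial(9998))  # 99990000
```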

Final Thoughts

Obviously Ruby generators would have fared better than recursion, but given that Ruby recursion is supposed to smoke, I felt it was only fair to pit Ruby recursion against Python generators, since Python generators are supposed to smoke.  Test what is supposed to smoke against what is supposed to smoke.
Java sucks for any problem not covered by the typical design of the JVM.  Python, having been tweaked over the years, tends to perform very well and more than adequately, given that it is dynamic where Java is more static at runtime.  If Python had been more heavily optimized it surely would have done much better, but the goal here was to use off-the-shelf solutions rather than the hand-optimized kind.
Needless to say, Python can be made to run a whole lot faster than Java, especially when the typical boundaries are pushed well beyond the typical limits.  This is the ugly secret about Java; look at how Java is used in the real world and you too will see where and how Java fails to please.

Python is the other white meat !

Java is the Enterprise favorite now but... Python is the language that gets the most lovin' when it comes to squeezing the most out of any programming problem.  I have personally pushed Python to the mat in terms of performance and it comes back asking for more.  I do not see anyone in the Enterprise doing anything like this with Java; I do, however, see more multi-tier Java solutions than would be required to make the systems work.

First Impressions of iMac Core i7 + OS/X Lion

OS/X Lion tends to break some OS/X Leopard software…

I don’t have a list for you here… suffice it to say if you wanted to upgrade to Lion right now, you might want to wait a good long while to let everybody else update their Open Source code or you may run into some issues.

SVN Clients for OS/X Lion are a pain…

Apple needs to ship Parallels Desktop in OS/X

OS/X cannot do it all !

This was the first thing I had to learn about OS/X: since it is not used heavily by software developers, some Open Source goodies one might want to use just won’t be available, or won’t be as nice as what one finds on Windows.

For instance, there ain’t nothing like Tortoise SVN for OS/X and close don’t cut it.  Parallels Desktop running Windows 7 Ultimate takes care of this problem.

Setting up a simple SSH server in OS/X is a pain – simple to do with Linux but not so simple with OS/X.  Seems like Steve Jobs feels nobody should be able to use OS/X Leopard as a headless file server, cuz there ain’t no easy way to get a simple SSH server up and running without Parallels Desktop running Ubuntu, for instance.  Or maybe I just don’t get the genius behind OS/X, which could be my problem.  In any case, if I gotta spend 3+ hrs trying to make a simple SSH server run in OS/X and it’s still not working to the point where I gotta do some research, then you can forget me thinking it’s gonna happen anytime soon.  Setting up an SSH server takes literally seconds on just about any Linux, including installing them keys so I can disable password logins – nobody uses my SSH server unless they got a key with a passphrase – seems like I just gotta have security.

Parallels Desktop running Ubuntu Server 11.04 solves this problem.

The bottom line is… OS/X alone is a huge waste of time for a software developer unless you make sure to install and use Parallels Desktop 6 cuz you will have to use Windows 7 and/or Ubuntu or some combination of these flavors.

Microsoft should cozy up with the Parallels folks too

I have been saying this for several years by now but… Windows needs to be combined with Linux and Ubuntu makes the most sense.

Parallels for Windows with Ubuntu 11.04 would do the trick.  VmWare and VirtualBox can’t hold a candle to Parallels Desktop 6 – sorry but I have been smitten by Parallels Desktop 6.

Makes more sense to run Mac OS/X in Windows than vice-versa

Every so often someone asks me to take some App I wrote for Windows and run it on a Mac – Adobe AIR 2.6+ makes this a breeze especially when I can simulate the Windows File Naming conventions in Mac OS/X using a single AS3 function… been there and done that.

What sucks about this is having to drop the bucks for a Mac just to make this happen; with all the subtleties of Mac OS/X, you just gotta have the hardware sometimes.  It would save money and time if Parallels Desktop for Windows were as good as Parallels Desktop for OS/X, so someone like me could run Mac OS/X in Windows, since Windows makes development a breeze compared to how this happens in Mac OS/X – or doesn’t happen, as the case may be.

Mac OS/X handles Apps in a childish way

Why does Steve Jobs get so anal about how you can use Apps in Mac OS/X ?!?

I can only use one App at a time… the App I want to use must be in focus… this is the only way the one menu bar for the whole OS will allow me to interact with the App… me thinks Steve Jobs has never used OS/X for doing anything serious.

Windows allows me to do whatever I want whenever I want however I want.

Windows just works better for Apps.  Sorry but this is how I like to roll… tons of windows open at the same time, I get to use them all without having to tell the OS which one I want to use next.

Mac OS/X would be way too clumsy for serious development, let alone power use.

Getting back to work using Windows…

By the way… I happen to own an iMac Core i7 fully loaded with all the bells and whistles… along with not one but two Mac Minis.  Just didn’t want you to think I am some kind of Windows snob.  I just happen to think Windows lets me rock-and-roll the way I want to, while OS/X wants me to do just one thing at a time before moving along to the next.  Maybe that makes multitasking easier for the OS – it reminds me of cooperative multitasking, which is far easier to code than real multitasking – but Microsoft is just more serious about their OS’s than Apple could ever be, based on my experience using OS’s for hard-core programming.

PogoPlug is FASTER than DropBox for my needs !!!

I have been using PogoPlug for a while in parallel with DropBox, with the PogoPlug Agent running on one of my Mac Mini boxes (seems Apple is good for something after all, albeit a bit pricey compared with the Atom alternatives, since Mac Minis run pretty cheap).

PogoPlug is FASTER !!

Well FASTER than DropBox anyway…

FASTER than that damned silly NAS I wasted money on last year – it has something like 8 TB but is as slow as molasses even though it is connected to my LAN via 1 Gbps.  Go figure.

PogoPlug Agent is fast enough to be useful when it says I have 14 TB spread between the slow NAS and the Mac Mini that has 6 TB in the form of 3 TB USB 2.0 drives.  Damn Apple for not supporting USB 3.0!  LOL.  But at least USB 2.0 is FASTER than the SATA II NAS!!  If you can believe that!!  Or maybe the Mac Mini is just FASTER than whatever OS is running in the SATA II NAS?!?  Who knows why it is FASTER… I just like FASTER.

PogoPlug Agent is fast enough to let me search for files from Windows 7 while at work, with reasonably quick results, even though I did no file indexing other than whatever PogoPlug is doing for me.

Is PogoPlug Secure ?

Who knows… all I know is if I didn’t build it then it is likely not secure enough for my tastes.

PogoPlug is likely not using SSH when shipping files to and fro – it seems too fast for any kind of encryption – but for all I know they use SSL, or nothing.

Back to learning more about PogoPlug, while trusting that their software is not letting everyone on the planet into my files like DropBox has been known to do.

Needless to say, I am getting closer to ditching DropBox for my most important files… just getting some other things done before I get around to it while giving PogoPlug a spin for a while.


MySQL for BigData

If all you have to work with is MySQL but you have PetaBytes to store… you could be in trouble unless… you happen to be me…

Assumption #1

Relational databases love executing really small SQL Statements.

Assumption #2

Relational databases do NOT have to use any relational features.

Assumption #3

Networked Object-Oriented data models are very efficient when all you have to work with is a Relational Db as the data management platform.

Assumption #4

BigData solutions tend to use really big heaps of key/value storage systems because the data can easily be spread out over a large number of nodes.

Assumption #5

Many instances of MySQL can execute the same query faster than a single instance because the distributed query can be executed in parallel.

Assumption #6

Forget everything you ever thought you knew about how to cluster MySQL because all that crap won’t help you when you have PetaBytes to store and manage efficiently.

Solution #1

Store your BigData in many instances of MySQL (think 10’s or 100’s) using a Networked Object-Oriented Data Model: key/value pairs are linked to form objects using nothing but metadata, itself stored as key/value pairs, while the data is spread out across all available MySQL nodes.  Then execute the SQL required to retrieve Collections of Objects in parallel, and MySQL can be nice and fast for BigData.
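A rough sketch of Solution #1 in Python, using in-memory SQLite shards to stand in for the MySQL instances.  The schema, names, and sharding rule here are all illustrative assumptions, not a finished design:

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

SHARD_COUNT = 4  # stand-in for 10's or 100's of MySQL nodes

def make_shard():
    # One key/value table per shard; objects are just groups of rows.
    db = sqlite3.connect(":memory:", check_same_thread=False)
    db.execute("CREATE TABLE kv (obj_id INTEGER, key TEXT, value TEXT)")
    return db

shards = [make_shard() for _ in range(SHARD_COUNT)]

def put(obj_id, key, value):
    # Spread objects across shards by object id (illustrative rule).
    shards[obj_id % SHARD_COUNT].execute(
        "INSERT INTO kv VALUES (?, ?, ?)", (obj_id, key, value))

def fan_out(sql, params=()):
    # Execute the SAME statement on every shard in parallel, then merge.
    with ThreadPoolExecutor(max_workers=SHARD_COUNT) as pool:
        results = list(pool.map(
            lambda db: db.execute(sql, params).fetchall(), shards))
    return [row for rows in results for row in rows]

for i in range(100):
    put(i, "color", "red" if i % 2 else "blue")

rows = fan_out("SELECT obj_id FROM kv WHERE key = ? AND value = ?",
               ("color", "red"))
print(len(rows))  # 50 objects reassembled from all shards
```

The key idea is that every shard receives the same statement and the per-shard result sets are simply concatenated – Collections of Objects, not Rows of Data.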

Caveat #1

Do you know what is meant by “Networked Object-Oriented Data Model” ?!?  Probably not but this gives you something to figure-out while looking for all those cheap computers you will use to form your MySQL Network.

Caveat #2

Do you know what is meant by “executing the same SQL Statement in Parallel” ?!?  Probably not but this gives you something to figure-out while you think about the prior Caveats.

Caveat #3

Do you know that fetching data from all those MySQL Instances can be done using a single SQL Statement ?!?  Probably not, but then you probably forgot to read over and understand Assumption #6 above.  Think about Collections of Objects more than Rows of Data.

Caveat #4

Keep it super-simple.  Super-Simple runs faster than the other thing.

Computers are really stupid but can be fast.

Stupid requires simple.

Simple is FAST.

BigData is FAST when the solution is parallel but stupid simple.

Caveat #5

Try to optimize each MySQL Instance by increasing the available RAM to a minimum of 4 GB per instance, using 32-bit MySQL running in a 32-bit Linux OS, and use VmWare Workstation to run each instance on a separate CPU core, with a minimum of 1 VmWare Workstation instance per CPU core.  Unless, that is, you can find a MySQL implementation that automatically uses multiple cores – and then you have to give some serious thought to how to make all them MySQL Instances execute the same SQL Statements in parallel.  Better think about this one for a while… I already know how to do this but you might not.


HADOOP Optimization Technique #1

HADOOP is slow !

BigData should be FAST !

Single Server installations for HADOOP tend to want to use the entire multi-core CPU for one single HADOOP instance.

Assumption #1

The Java JVM has NOT been optimized for multiple cores for anything other than garbage collection when one uses an out of the box JRE.

Assumption #2

HADOOP has NOT been optimized for multiple cores for anything other than garbage collection, based on Assumption #1.

Assumption #3

Most servers HADOOP might run on probably have multiple cores, especially when Intel or AMD chips are being used, due to the need to keep Moore’s Law alive in a universe where the upper bound for CPU performance is the RAM bus speed.

Assumption #4

VmWare Workstation Appliances can be run each using a separate core when the host OS is Windows Server 2008 R2.

Assumption #5

VmWare Workstation Appliance Instances will be run at the HIGH Priority setting (one level below Real-time for Windows Server 2008 R2).

Assumption #6

VmWare Workstation Appliance Instances will be given 4 GB RAM using 32-bit HADOOP in a 32-bit Linux OS; all software being used is 32-bit.  No 64-bit code will be used.

Possible Solution #1

If the server has 4 cores, then run 4 instances of HADOOP, each in a separate VmWare Appliance, where each VmWare Workstation instance is dedicated to one of the available cores.

Scale for the number of cores.

Continue packing in separate VmWare instances using VmWare Workstation until the aggregate performance begins to degrade, then use empirical performance data to determine the optimal configuration.
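The empirical approach above can be sketched in plain Python: measure aggregate throughput of a CPU-bound stand-in task as the worker count grows, and stop adding workers once the rate degrades.  All names here are illustrative assumptions, not HADOOP code:

```python
import time
from multiprocessing import Pool

def busy_work(n):
    # Stand-in CPU-bound task; in the real experiment each "task" would
    # be a HADOOP workload inside its own VmWare instance.
    total = 0
    for i in range(n):
        total += i * i
    return total

def throughput(workers, tasks=16, size=200000):
    """Tasks completed per second with the given number of worker processes."""
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(busy_work, [size] * tasks)
    return tasks / (time.perf_counter() - start)

if __name__ == "__main__":
    best = (0, 0.0)
    for workers in (1, 2, 4, 8):
        rate = throughput(workers)
        print("%d workers: %.1f tasks/sec" % (workers, rate))
        if rate > best[1]:
            best = (workers, rate)
    print("Empirical sweet spot: %d workers" % best[0])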

Caveat #1

Solution #1 has not yet been tried; however, based on the available information it should produce better performance for HADOOP and/or Java in general.


DropBox.Com is out, PogoPlug.Com is in !!!

Read the fine print in your latest DropBox Policy Statement and you will notice your files belong to them!

DropBox has been known to leave the door open whenever they wish which means your files can and have been viewed by others.

DropBox likes to play games with your files and mine to get the most from their storage back-end systems.

DropBox is about to get the boot from me simply because there is a better solution with a much lower cost.

The better solution, this month, is PogoPlug.Com

What is PogoPlug ?

If you want to buy the optional hardware you can do so, but it ain’t required.  This may be why there are so many PogoPlug units up for sale on eBay lately.

All you need to use PogoPlug is their FREE software.

Download the FREE software, install it on every computer you want to access remotely, and then you got your own DropBox – without, potentially, letting others get into your files.

PogoPlug works GREAT for small files, obviously

A test on a 2.5 GB file resulted in a very slow process of getting that file into my LAN at home but after several hours it was done.

Small files are a breeze, obviously.

Keep in mind I am not using the PogoPlug device when copying files to my own PogoPlug Storage connected to my Mac Mini, where the PogoPlug software is installed, so who knows how much faster the copy might be when performed through the PogoPlug device.  I have concerns about how easily the PogoPlug device can share files on a LAN – that use case is more interesting to me than having my own DropBox… time will tell.  Who knows, I may be selling my PogoPlug device on eBay once I have gotten my fun out of it.  Had I known how the PogoPlug software works I would not have bothered with the device, but that’s life!

Now all I need is to get rid of Catch.Com !

I am not all that comfy with trusting others to handle my valuable data with TLC.  Heck, I am not all that into letting Google do it, and I surely don’t want DropBox messing around with my goodies any more.

We should all be skeptical of those who claim to provide FREE services, because they all got to make some $$ somehow, and most of the time they will try selling whatever they can to raise money – they could be selling your information.

Beware… and be safe.  Always use safe computing, use encryption !!!
