Cloud Security

Secure your Virtual Machines in the cloud by doing the following:

  1. Reduce your open ports to as few as possible (see the sketch after this list).
  2. Do NOT allow any process serving as a TCP/IP listener to run as root.
  3. Change your SSH port from the default of 22 to something non-standard.
    1. Change it every day if you are paranoid.
    2. Change it every hour if you are crazy paranoid.
  4. Use only Public Key authentication for SSH access.
    1. Change your Public Keys every day if you are paranoid.
    2. Change your Public Keys every hour if you are crazy paranoid.
    3. Use 2048-bit keys whether paranoid or not.
  5. Deny root-level access for all users other than via the console, and ensure the console requires physical access in a secure building.
    1. Deny root-level access completely if paranoid.
  6. Encrypt your disks – this keeps anyone who may steal your VM Image from being able to use it.
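
To audit item 1, something as simple as the following can help. This is a minimal sketch (not a real security scanner), assuming Python 2 on your workstation; the hostname and the expected-port set are placeholders for your own:

import socket

EXPECTED_OPEN = set([2222])  # hypothetical: SSH moved off of port 22

def is_open(host, port, timeout=1.0):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        return s.connect_ex((host, port)) == 0
    finally:
        s.close()

def audit(host, ports):
    # flag anything listening that you did not plan to expose
    for port in ports:
        if (is_open(host, port)) and (port not in EXPECTED_OPEN):
            print 'WARNING: unexpected open port %d on %s' % (port, host)

audit('my-vm.example.com', range(1, 1025))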

Hackers will exploit those details you have neglected!!!

Leave too many ports open with root-level access and YOU will be hacked!!!

Make things too convenient for your own use and YOU will be hacked!!!

Remain ignorant of how hackers work and YOU will be hacked!!!

Be lazy and stupid and YOU will be hacked!!!


SalesForce.Com Issues/Gotchas aka. Bugs/Defects (Redux)

Yet another undocumented SalesForce Bug has been found!!!

This brings my personal count to 4 and climbing…

It is rather odd that these kinds of defects can exist in such a prestigious framework as SalesForce/VisualForce/Apex, with so many people using it, when other web-based frameworks (Django and others) do NOT exhibit such behaviors.

I have revised #2 and #3 and added some additional issues since my last post. (This is where experience counts… I know what to expect when unexpected things begin to happen…)

SalesForce.Com Issues/Gotchas aka. Bugs/Defects

1). Cannot reference null values with <apex:outputText/> or <apex:inputText/>.
2). Cannot reference non-String objects with <apex:outputText/> or <apex:inputText/>.
2a). Under some circumstances VisualForce will allow String objects to be referenced by <apex:outputText/> or <apex:inputText/> tags; however, after a certain threshold has been reached the Force IDE will begin to complain about what it calls SObject references, after which the only acceptable correction is to reference the result from a SOQL query – this can clash with #2 for some obvious reasons.
3). Cannot retrieve the Parameters from ApexPages.currentPage() more than once because the associated [Parameters] information is lost for subsequent invocations.
4). Cannot pass back the ApexPages.currentPage() when expecting the user to make a correction in the <apex:inputText/> because the VisualForce page will remember the values from the Parameters, thus ignoring user inputs for subsequent form posts.

Ubuntu Enterprise Linux 11.04

Everyone knows RHEL (Red Hat Enterprise Linux) is all “Enterprise” just because “Enterprise” is in the name – Doh!

Now I give you Ubuntu Enterprise Linux 11.04

Method #1

This is really super-simple because, as my readers know, the simpler the better – even when simple approaches are largely overlooked by the masses just because they are perceived as too simple.

  1. Install RHEL (any version works) on the computer of your choice; the computer should be no older than 2009 for this to work. (DO NOT USE A VM)
  2. Install the latest VirtualBox, version 4 or later.
  3. Create a VM using VirtualBox 4.x in RHEL (see the sketch below).
  4. Install Ubuntu Server or Desktop 11.04 in the VM you created in Step #3.
  5. Done!
Now all I/O will flow through the Magic Unicorn OS known as Red Hat – and now Ubuntu is using Red Hat for everything one might want to use Red Hat for.

Just in case some of you are reaching for your phones to call the nearest Asylum to have little ole me admitted on a 72-hour administrative hold… LOL. Keep in mind this is exactly what your nearest Citrix Xen Server 5.6 does: it too uses RHEL as the host OS in which you are expected to run your Guest OS in a VM – hehe.

I actually proposed this to a Manager I have been working with, but the idea was not embraced – probably because it just makes too damned much sense, especially if the product you are managing runs in Debian but your I.T. support people are telling you they only want to support RHEL because it is the blessed OS for the Enterprise.

This same technique works great for any guest OS – Windows in Red Hat, and other variations.
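
For the curious, Step #3 can even be scripted. Here is a rough sketch, assuming the VBoxManage command-line tool that ships with VirtualBox 4.x is on the PATH of the RHEL host; the VM name and sizes are illustrative only:

import subprocess

VM = 'UbuntuEnterprise1104'  # illustrative name

def vbox(*args):
    subprocess.check_call(['VBoxManage'] + list(args))

vbox('createvm', '--name', VM, '--ostype', 'Ubuntu', '--register')
vbox('modifyvm', VM, '--memory', '2048', '--cpus', '2')
vbox('createhd', '--filename', VM + '.vdi', '--size', '20480')  # 20 GB
vbox('storagectl', VM, '--name', 'SATA', '--add', 'sata')
vbox('storageattach', VM, '--storagectl', 'SATA', '--port', '0',
     '--device', '0', '--type', 'hdd', '--medium', VM + '.vdi')
# attach the Ubuntu 11.04 ISO the same way, then boot the VM and install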

Method #2

Install apt-get in Red Hat – don’t laugh, this has been done – google it.

At the end of the day, Linux code is Linux code and works in any Linux – all roads lead to the same China where Linux is concerned. Even those who say they are adding special powers to their favorite Linux (RHEL) are really doing very little other than branding the same Linux everyone else is using. If any Linux were gonna have Magic Unicorn Powers it would surely be RHEL, because it says it is “Enterprise” and should surely be measurably better than the rest, but this just ain’t the case in real terms.

Slap apt-get into RHEL and you get the best of Ubuntu in RHEL, without Ubuntu.

Conclusions

Crazy ideas only seem “crazy” until they catch on with the masses.
Not all that long ago we all might have scoffed at the idea of using Virtual Machines rather than real computers – until we learned just what the Cost of Operation is for a real server. Deploy real servers by the thousands and then get ready to buy your own small power plant, because that is exactly what you will need every time you see the electric bill.

The idea of running a VM inside a host OS is nothing more than yet another way to achieve the same goal as VMware ESX and other products you have to spend real money to get.
Cheers.

SleepyMongoose meets the Python after dark

For a while now I have been looking for a reasonable REST-based NoSQL database I can use from the Google App Engine, just to avoid having to pay Google their database fees.

Earlier today… I was able to make the SleepyMongoose snuggle up with the Python for some after-hours fun in the dark.

SleepyMongoose

You can read up on SleepyMongoose here.

It’s a REST-based interface for MongoDB that runs as a lightweight Python HTTP Server. I will eventually get around to making this SleepyMongoose wake up by shoving some Stackless Python into it, just for fun and because Stackless always makes Python work better.

So what does the SleepyMongoose need once I get around to using it from the Google App Engine?

SleepyMongoose needs a slick Python wrapper so I don’t gotta deal with all that REST stuff. After all, the REST interface is only needed because Google doesn’t seem to care much about bandwidth consumption for anything other than their own database, and it just makes sense to make Google give me a FREE No-Cost MongoDB Proxy – which is what this is all leading to anyway.

SleepyMongoose meets the Python

So, since I am a pretty good Python coder, I chose to whip up a Magic little object that allows me to get away from REST and right into Python objects.

I implemented the SleepyMongoose API as documented in their Wiki.

Keep in mind the code presented here requires the Vyper Logix Library as found here (shameless plug, of course).

Here’s a sample of what the usage looks like:

from vyperlogix import mongodb

if (__name__ == '__main__'):
    import os,sys

    sm = mongodb.SleepyMongoose('127.0.0.1:27080').db('gps_1M').collection('num_connections')
    results = sm.connect()
    print 'sm.connect() #1 --> ', results

    results = sm.server('mongodb://127.0.0.1:65535').connect()
    print 'sm.connect() #2 --> ', results

    results = sm.server('mongodb://127.0.0.1:65535').name('backup').connect()
    print 'sm.connect() #3 --> ', results

    results = sm.docs({"x":2}).insert()
    print 'sm.insert() #1 --> ', results

    results = sm.find({"x":2})
    print 'sm.find() #1 --> ', results

    results = sm.criteria({"x":2}).remove()
    print 'sm.remove() #1 --> ', results

    results = sm.docs().insert({"x":3})
    print 'sm.insert() #2 --> ', results

    results = sm.find({"x":3})
    print 'sm.find() #2 --> ', results

    results = sm.remove({"x":3})
    print 'sm.remove() #2 --> ', results

    results = sm.docs([{"x":2},{"x":3}]).insert()
    print 'sm.insert() #3 --> ', results

    results = sm.criteria({"x":2}).remove()
    print 'sm.remove() #3 --> ', results

    results = sm.criteria([{"x":3}]).remove()
    print 'sm.remove() #4 --> ', results

    results = sm.insert([{"x":2},{"x":3}])
    print 'sm.insert() #4 --> ', results

    results = sm.remove({"x":2})
    print 'sm.remove() #5 --> ', results

    results = sm.remove([{"x":3}])
    print 'sm.remove() #6 --> ', results
    pass

Here’s the rest of the code (simple and to the point, no-nonsense code, written in a few hours…):

__copyright__ = """\
(c). Copyright 2008-2011, Vyper Logix Corp., All Rights Reserved.

Published under Creative Commons License
(http://creativecommons.org/licenses/by-nc/3.0/)
restricted to non-commercial educational use only.

http://www.VyperLogix.com for details

THE AUTHOR VYPER LOGIX CORP DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL,
INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING
FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
WITH THE USE OR PERFORMANCE OF THIS SOFTWARE !

USE AT YOUR OWN RISK.
"""

import logging
import urllib2
import simplejson

from vyperlogix import misc
from vyperlogix.misc import _utils
from vyperlogix.classes import MagicObject

arg0 = lambda args:args[0] if (misc.isIterable(args)) and (len(args) > 0) and (misc.isString(args[0])) else None
list0 = lambda args:args[0] if (misc.isIterable(args)) and (len(args) > 0) and ( (misc.isList(args[0])) or (misc.isDict(args[0])) ) else []
int0 = lambda args:args[0] if (misc.isIterable(args)) and (len(args) > 0) and (misc.isInteger(args[0])) else None
bool0 = lambda args:args[0] if (misc.isIterable(args)) and (len(args) > 0) and (misc.isBooleanString(args[0])) else None

__only__ = lambda value,target:value if (str(value).lower().capitalize() == str(target).lower().capitalize()) else None

class SleepyMongoose(MagicObject.MagicObject2):
    def __init__(self,sleepy_mongoose):
        '''
        See also: https://github.com/kchodorow/sleepy.mongoose/wiki/
        '''
        toks = sleepy_mongoose.split(':')
        try:
            self.__sleepy_mongoose__ = sleepy_mongoose if (misc.isString(toks[0]) and misc.isInteger(int(toks[-1]))) else None
        except:
            self.__sleepy_mongoose__ = None
        self.__database__ = None
        self.__collection__ = None
        self.__server__ = None
        self.__server_name__ = None
        self.__last_exception__ = None
        self.__n__ = None
        self.__docs__ = None
        self.__criteria__ = None
        self.__newobj__ = None
        self.__fields__ = None
        self.__sort__ = None
        self.__skip__ = None
        self.__limit__ = None
        self.__explain__ = None
        self.__batch_size__ = None
        self.__id__ = None

    def __http_get__(self,url):
        data = None
        try:
            response = urllib2.urlopen(url)
            json = response.read()
            data = simplejson.loads(json)
        except Exception, ex:
            self.__last_exception__ = _utils.formattedException(details=ex)
            data = None
        return data

    def __http_post__(self,url,parms=()):
        from vyperlogix.url._urllib2 import http_post
        data = None
        try:
            json = http_post(url,parms)
            data = simplejson.loads(json)
        except Exception, ex:
            self.__last_exception__ = _utils.formattedException(details=ex)
            data = None
        return data

    def __handle_exceptions__(self):
        if (not self.__sleepy_mongoose__):
            logging.error('ERROR: Cannot understand "SleepyMongoose(%s)"' % (self.__sleepy_mongoose__))
        elif (not self.__database__):
            logging.error('ERROR: Cannot understand "db(%s)"' % (self.__database__))
        elif (not self.__collection__):
            logging.error('ERROR: Cannot understand "collection(%s)"' % (self.__collection__))

    def __getattr__(self,name):
        self.__n__ = name
        return super(SleepyMongoose, self).__getattr__(name)

    def __call__(self,*args,**kwargs):
        if (self.__n__ == 'db'):
            self.__database__ = arg0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'collection'):
            self.__collection__ = arg0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'server'):
            self.__server__ = arg0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'name'):
            self.__server_name__ = arg0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'docs'):
            self.__docs__ = list0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'criteria'):
            self.__criteria__ = list0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'fields'):
            self.__fields__ = list0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'sort'):
            self.__sort__ = list0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'skip'):
            self.__skip__ = int0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'limit'):
            self.__limit__ = int0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'batch_size'):
            self.__batch_size__ = int0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'id'):
            self.__id__ = int0(args)
            self.__reset_magic__()
        elif (self.__n__ == 'explain'):
            self.__explain__ = __only__(bool0(args),'True')
            self.__reset_magic__()
        elif (self.__n__ == 'newobj'):
            self.__newobj__ = list0(args)
            self.__reset_magic__()

        elif (self.__n__ == 'find'):
            if (self.__sleepy_mongoose__) and (self.__database__) and (self.__collection__):
                p = []
                if (self.__criteria__):
                    p.append(tuple(["criteria",simplejson.dumps(self.__criteria__)]))
                    self.__criteria__ = None
                if (self.__fields__):
                    p.append(tuple(["fields",simplejson.dumps(self.__fields__)]))
                    self.__fields__ = None
                if (self.__sort__):
                    p.append(tuple(["sort",simplejson.dumps(self.__sort__)]))
                    self.__sort__ = None
                if (self.__skip__):
                    p.append(tuple(["skip",'%d'%(self.__skip__)]))
                    self.__skip__ = None
                if (self.__limit__):
                    p.append(tuple(["limit",'%d'%(self.__limit__)]))
                    self.__limit__ = None
                if (self.__explain__):
                    p.append(tuple(["explain",'%s'%(self.__explain__)]))
                    self.__explain__ = None
                if (self.__batch_size__):
                    p.append(tuple(["batch_size",'%d'%(self.__batch_size__)]))
                    self.__batch_size__ = None
                if (self.__server_name__):
                    p.append(tuple(["name",self.__server_name__]))
                url = 'http://%s/%s/%s/_%s' % (self.__sleepy_mongoose__,self.__database__,self.__collection__,self.__n__)
                return self.__http_post__(url,tuple(p))
            else:
                self.__handle_exceptions__()
            self.__reset_magic__()

        elif (self.__n__ == 'insert'):
            if (self.__sleepy_mongoose__) and (self.__database__) and (self.__collection__):
                p = []
                if (self.__docs__):
                    p.append(tuple(["docs",simplejson.dumps(self.__docs__)]))
                    self.__docs__ = None
                else:
                    p.append(tuple(["docs",simplejson.dumps(list0(args))]))
                    self.__docs__ = None
                if (self.__server_name__):
                    p.append(tuple(["name",self.__server_name__]))
                url = 'http://%s/%s/%s/_%s' % (self.__sleepy_mongoose__,self.__database__,self.__collection__,self.__n__)
                return self.__http_post__(url,tuple(p))
            else:
                self.__handle_exceptions__()
            self.__reset_magic__()

        elif (self.__n__ == 'update'):
            if (self.__sleepy_mongoose__) and (self.__database__) and (self.__collection__):
                p = []
                if (self.__criteria__):
                    p.append(tuple(["criteria",simplejson.dumps(self.__criteria__)]))
                    self.__criteria__ = None
                if (self.__newobj__):
                    p.append(tuple(["newobj",simplejson.dumps(self.__newobj__)]))
                    self.__newobj__ = None
                if (self.__server_name__):
                    p.append(tuple(["name",self.__server_name__]))
                url = 'http://%s/%s/%s/_%s' % (self.__sleepy_mongoose__,self.__database__,self.__collection__,self.__n__)
                return self.__http_post__(url,tuple(p))
            else:
                self.__handle_exceptions__()
            self.__reset_magic__()

        elif (self.__n__ == 'more'):
            if (self.__sleepy_mongoose__) and (self.__database__) and (self.__collection__):
                p = []
                if (self.__id__):
                    p.append(tuple(["id",'%d'%(self.__id__)]))
                    self.__id__ = None
                if (self.__batch_size__):
                    p.append(tuple(["batch_size",'%d'%(self.__batch_size__)]))
                    self.__batch_size__ = None
                if (self.__server_name__):
                    p.append(tuple(["name",self.__server_name__]))
                q = '?'+'&'.join(['='.join(list(t)) for t in p]) if (len(p) > 0) else ''
                url = 'http://%s/%s/%s/_%s%s' % (self.__sleepy_mongoose__,self.__database__,self.__collection__,self.__n__,q)
                return self.__http_get__(url)
            else:
                self.__handle_exceptions__()
            self.__reset_magic__()

        elif (self.__n__ == 'cmd'):
            if (self.__sleepy_mongoose__) and (self.__database__):
                p = []
                _c_ = list0(args)
                if (_c_):
                    p.append(tuple(["cmd",simplejson.dumps(_c_)]))
                url = 'http://%s/%s/_%s' % (self.__sleepy_mongoose__,self.__database__,self.__n__)
                return self.__http_post__(url,tuple(p))
            else:
                self.__handle_exceptions__()
            self.__reset_magic__()

        elif (self.__n__ == 'remove'):
            if (self.__sleepy_mongoose__) and (self.__database__) and (self.__collection__):
                p = []
                if (self.__criteria__):
                    p.append(tuple(["criteria",simplejson.dumps(self.__criteria__)]))
                    self.__criteria__ = None
                else:
                    p.append(tuple(["criteria",simplejson.dumps(list0(args))]))
                    self.__criteria__ = None
                if (self.__server_name__):
                    p.append(tuple(["name",self.__server_name__]))
                url = 'http://%s/%s/%s/_%s' % (self.__sleepy_mongoose__,self.__database__,self.__collection__,self.__n__)
                return self.__http_post__(url,tuple(p))
            else:
                self.__handle_exceptions__()
            self.__reset_magic__()

        elif (self.__n__ == 'connect'):
            _token = self.__n__
            if (self.__sleepy_mongoose__):
                p = []
                if (self.__server__):
                    p.append(tuple(["server",self.__server__]))
                if (self.__server_name__):
                    p.append(tuple(["name",self.__server_name__]))
                url = 'http://%s/_%s' % (self.__sleepy_mongoose__,_token)
                return self.__http_post__(url,tuple(p))
            else:
                self.__handle_exceptions__()
            self.__reset_magic__()
        else:
            logging.debug('DEBUG: Cannot understand "%s(%s,%s)"' % (self.__n__,args,kwargs))
        return self


Salivating over the idea of using Amazon EC2 GPU Instances for Map-Reduce via Cuda 4.x

Cost is only about $2.10 per hour ($1,500+ per month), but just think of how cool a Map-Reduce process might be with all those Cuda Cores churning away on the data.
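
A quick sanity check of those figures, assuming a 30-day month of around-the-clock use:

hourly = 2.10
print 'monthly: $%.2f' % (hourly * 24 * 30)   # ~$1512 per month
print 'Cuda Cores: %d' % (2 * 448)            # 2x Tesla M2050 per instance = 896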

Amazon offers GPU Instances for something like $1,540 per month; Tesla GPUs (2x NVIDIA Tesla “Fermi” M2050, with 448×2 Cuda Cores) can do a whole lot of Map-Reduce for the money. One might not need very many of these, connected to compressed EBS, to realize some potential for an Analytics process, for instance. The real question is just how many customers a single Amazon GPU Instance can handle before we need to scale, and then what the real scaling cost is going forward. Each customer’s data could be stored in a separate compressed EBS Volume, all handled by a single Amazon GPU Instance or GPU Cluster. We might be able to greatly reduce cost while providing very good performance across the board.

The storage cost might be reduced by using a compressed file system for Linux, like fusecompress, or we might build a custom solution using FUSE. A fair number of people seem to be using fusecompress on Ubuntu; it might also be available for Red Hat. This could help mitigate the Amazon EBS cost. All that remains is checking how we might leverage micro instances at Amazon to store the Hadoop data, combined with a real-time solution for gathering aggregations – who knows, this could reduce deployment costs by a fair margin.

Keep in mind raw data can easily be compressed to roughly 1/30th of its original size.
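
Don’t take my word for it – ratios like that depend entirely on how repetitive the data is, and highly regular log/telemetry data is the best case. A toy demonstration with made-up log lines:

import zlib

raw = '2011-06-01 12:00:00 GET /index.html 200 OK\n' * 100000
packed = zlib.compress(raw, 9)
print 'compressed to 1/%d of the original size' % (len(raw) / len(packed))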

I may even go out and buy a slick Nvidia 560 Ti or the like just to get my hands on something close to 500 Cuda Cores, rather than the 64 I have in my 2+ year old laptop that just happens to have a dual-GPU SLI setup; not bad for development and testing.

On-Demand High-Power Map-Reduce via NVidia CUDA and Stackless Python might just be too much to let sit idle… I might have to begin paying Amazon for some time on their massive GPU Instances just for fun some weekend.

MySQL for BigData

If all you have to work with is MySQL but you have PetaBytes to store… you could be in trouble unless… you happen to be me…

Assumption #1

Relational databases love executing really small SQL Statements.

Assumption #2

Relational databases do NOT have to use any relational features.

Assumption #3

Networked Object-Oriented data models are very efficient when all you have to work with is a Relational Db as the data management platform.

Assumption #4

BigData solutions tend to use really big heaps of key/value storage because the data can be spread out over a large number of nodes easily.

Assumption #5

Many instances of MySQL can execute the same query faster than a single instance because the distributed query can be executed in parallel.

Assumption #6

Forget everything you ever thought you knew about how to cluster MySQL, because all that crap won’t help you when you have PetaBytes to store and manage efficiently.

Solution #1

Store your BigData in many instances of MySQL (think 10’s or 100’s) using a Networked Object-Oriented Data Model: key/value pairs are linked to form objects using nothing but Metadata (itself key/value pairs), the data is spread out across all available MySQL nodes, and the SQL required to retrieve Collections of Objects is executed in parallel. Do this and MySQL can be nice and fast for BigData. (See the sketch below.)
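
Here is a rough sketch of the fan-out half of that idea (not the full Networked Object-Oriented Data Model). The run_query() helper is hypothetical – wrap whatever MySQL driver you like (MySQLdb, for instance) – and the host names and schema are made up:

from multiprocessing.dummy import Pool  # threads, since the work is I/O-bound

NODES = ['mysql%02d.example.com' % i for i in range(1, 21)]  # made-up hosts
SQL = "SELECT obj_id, k, v FROM kv_store WHERE k = 'customer_id'"

def parallel_select(sql):
    pool = Pool(len(NODES))
    try:
        rows = []
        # run_query(host, sql) is a hypothetical helper around your MySQL driver
        for part in pool.map(lambda host: run_query(host, sql), NODES):
            rows.extend(part)  # merge the partial results from every node
        return rows
    finally:
        pool.close()

# key/value pairs linked by obj_id form the Objects in the Collection
objects = {}
for obj_id, k, v in parallel_select(SQL):
    objects.setdefault(obj_id, {})[k] = v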

Caveat #1

Do you know what is meant by “Networked Object-Oriented Data Model”?!? Probably not, but this gives you something to figure out while looking for all those cheap computers you will use to form your MySQL Network.

Caveat #2

Do you know what is meant by “executing the same SQL Statement in Parallel”?!? Probably not, but this gives you something to figure out while you think about the prior Caveats.

Caveat #3

Do you know that fetching data from all those MySQL Instances can be done using a single SQL Statement?!? Probably not, but then you probably forgot to read over and understand Assumption #6 from above. Think about Collections of Objects more than Rows of Data.

Caveat #4

Keep it super-simple.  Super-Simple runs faster than the other thing.

Computers are really stupid but can be fast.

Stupid requires simple.

Simple is FAST.

BigData is FAST when the solution is parallel but stupid simple.

Caveat #5

Try to optimize each MySQL Instance by increasing the available RAM to a minimum of 4 GB per instance, using 32-bit MySQL running in a 32-bit Linux OS, and use VmWare Workstation to run each instance on a separate CPU Core, with a minimum of 1 VmWare Workstation Instance per CPU Core. Unless, that is, you can find a MySQL implementation that automatically uses multiple cores – and then you have to give some serious thought to how to make all them MySQL Instances execute the same SQL Statements in parallel. Better think about this one for a while… I already know how to do this, but you might not.


HADOOP Optimization Technique #1

HADOOP is slow!

BigData should be FAST!

Single-server HADOOP installations tend to use the entire multi-core CPU for one single HADOOP instance.

Assumption #1

The Java JVM has NOT been optimized for multiple cores for anything other than garbage collection when one uses an out-of-the-box JRE.

Assumption #2

HADOOP has NOT been optimized for multiple cores for anything other than garbage collection, based on Assumption #1.

Assumption #3

Most servers HADOOP might run on probably have multiple cores, especially when Intel or AMD chips are being used, due to the need to keep Moore’s Law alive in a Universe where the upper bound for CPU performance is the RAM bus speed.

Assumption #4

VmWare Workstation Appliances can each be run using a separate core when the host OS is Windows Server 2008 R2.

Assumption #5

VmWare Workstation Appliance Instances will be run at the HIGH Priority setting (one level below Real-time for Windows Server 2008 R2).

Assumption #6

VmWare Workstation Appliance Instances will be given 4 GB RAM using 32-bit HADOOP in a 32-bit Linux OS; all software being used is 32-bit.  No 64-bit code will be used.

Possible Solution #1

If the server has 4 cores, then run 4 instances of HADOOP, each in a separate VmWare Appliance, where each VmWare Workstation instance is dedicated to one of the available cores.

Scale for the number of cores.

Continue packing in separate VmWare Instances using VmWare Workstation until the aggregate performance begins to degrade, then use empirical performance data to determine the optimal configuration.
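
A sketch of how the packing might be automated on the Windows Server 2008 R2 host. This assumes VMware Workstation’s vmrun tool is on the PATH and that the psutil package is installed; the .vmx paths are illustrative, and remember this whole solution is as yet untried (see Caveat #1):

import subprocess
import psutil  # assumption: psutil is installed on the Windows host

VMX = r'C:\VMs\hadoop%02d\hadoop.vmx'  # illustrative, one appliance per core
CORES = psutil.cpu_count()

for core in range(CORES):
    subprocess.check_call(['vmrun', 'start', VMX % core, 'nogui'])

# vmrun exits once a VM is up; the VM itself lives in a vmware-vmx.exe
# process, so pin and prioritize those processes, one core apiece.
core = 0
for proc in psutil.process_iter():
    if (proc.name().lower().startswith('vmware-vmx')):
        proc.cpu_affinity([core % CORES])
        proc.nice(psutil.HIGH_PRIORITY_CLASS)  # HIGH, one level below Real-time
        core += 1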

Caveat #1

Solution #1 has not yet been tried; however, based on the available information it should produce better performance for HADOOP and/or Java in general.
