MySQL for BigData

If all you have to work with is MySQL but you have petabytes to store… you could be in trouble. Unless, of course, you happen to be me…

Assumption #1

Relational databases love executing really small SQL statements.

Assumption #2

Relational databases do NOT have to use any relational features.

Assumption #3

Networked Object-Oriented data models are very efficient when all you have to work with is a relational database as the data management platform.

Assumption #4

BigData solutions tend to use really big heaps of key/value storage because the data can be spread out over a large number of nodes easily.
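Spreading key/value data over many nodes usually comes down to hashing each key into the node range. A minimal sketch of that idea (the node names and the `node_for_key` helper are my own invention, not anything from MySQL itself):

```python
import hashlib

# Hypothetical fleet of MySQL instances; any count works the same way.
NODES = ["mysql-node-%02d" % i for i in range(16)]

def node_for_key(key: str) -> str:
    """Pick the node that owns a key by hashing the key into the node range."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]
```

Because the hash is deterministic, every client agrees on which node owns which key without any central lookup table.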

Assumption #5

Many instances of MySQL can execute the same query faster than a single instance, because a distributed query can be executed in parallel.

Assumption #6

Forget everything you ever thought you knew about how to cluster MySQL, because all that crap won't help you when you have petabytes to store and manage efficiently.

Solution #1

Store your BigData in many instances of MySQL (think tens or hundreds) using a Networked Object-Oriented Data Model: key/value pairs are linked to form objects using nothing but metadata, itself stored as key/value pairs, while the data is spread out across all available MySQL nodes. Then execute the SQL required to retrieve Collections of Objects in parallel, and MySQL can be nice and fast for BigData.
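One way to read "objects formed from nothing but key/value pairs": each object is a bag of key/value rows tied together by an object id. A minimal sketch, using SQLite in place of one MySQL node (the `kv` table and column names are my own invention, not a schema from the post):

```python
import sqlite3

# SQLite stands in for a single MySQL node; the schema is the point.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (obj_id TEXT, k TEXT, v TEXT)")

db.executemany("INSERT INTO kv VALUES (?, ?, ?)", [
    ("user:1", "name", "Ada"), ("user:1", "role", "admin"),
    ("user:2", "name", "Bob"), ("user:2", "role", "user"),
])

def load_object(obj_id):
    """Reassemble one object from its key/value rows."""
    cur = db.execute("SELECT k, v FROM kv WHERE obj_id = ?", (obj_id,))
    return dict(cur.fetchall())
```

Note that nothing relational is being used here beyond a plain table and an indexable id column, which is exactly what Assumption #2 allows.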

Caveat #1

Do you know what is meant by "Networked Object-Oriented Data Model"?!?  Probably not, but this gives you something to figure out while looking for all those cheap computers you will use to form your MySQL network.

Caveat #2

Do you know what is meant by "executing the same SQL statement in parallel"?!?  Probably not, but this gives you something to figure out while you think about the prior Caveat.

Caveat #3

Do you know that fetching data from all those MySQL instances can be done using a single SQL statement?!?  Probably not, but then you probably forgot to read over and understand Assumption #6 above.  Think about Collections of Objects rather than Rows of Data.
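Here is one way a single statement can pull a whole Collection of Objects: a subquery selects the matching object ids, the outer query fetches every key/value row for those ids, and grouping by id turns rows back into objects. A sketch using SQLite in place of a MySQL node (the `kv` schema and the `type` convention are my own illustration):

```python
import sqlite3
from collections import defaultdict

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (obj_id TEXT, k TEXT, v TEXT)")
db.executemany("INSERT INTO kv VALUES (?, ?, ?)", [
    ("doc:1", "type", "report"), ("doc:1", "title", "Q1"),
    ("doc:2", "type", "report"), ("doc:2", "title", "Q2"),
    ("doc:3", "type", "memo"),   ("doc:3", "title", "Hi"),
])

def load_collection(obj_type):
    """One statement fetches every row of every matching object;
    grouping by obj_id turns the rows back into objects."""
    sql = """SELECT obj_id, k, v FROM kv
             WHERE obj_id IN
                   (SELECT obj_id FROM kv WHERE k = 'type' AND v = ?)"""
    objects = defaultdict(dict)
    for obj_id, k, v in db.execute(sql, (obj_type,)):
        objects[obj_id][k] = v
    return dict(objects)
```

The same statement can be fanned out to every node, since each node simply returns the objects it happens to hold.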

Caveat #4

Keep it super-simple.  Super-Simple runs faster than the other thing.

Computers are really stupid but can be fast.

Stupid requires simple.

Simple is FAST.

BigData is FAST when the solution is parallel but stupid simple.

Caveat #5

Try to optimize each MySQL instance by giving it a minimum of 4 GB of RAM, using 32-bit MySQL running on a 32-bit Linux OS.  Use VMware Workstation to run each instance on a separate CPU core, with a minimum of one VMware Workstation instance per core.  Unless, that is, you can find a MySQL implementation that automatically uses multiple cores, in which case you have to give some serious thought to how to make all those MySQL instances execute the same SQL statements in parallel.  Better think about this one for a while… I already know how to do this, but you might not.
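For reference, a per-instance tune along those lines might start like the following. This is a sketch only: each option shown is a real MySQL setting, but the values are placeholders for you to adjust, and note that a 32-bit process cannot actually address a full 4 GB for its buffer pool.

```ini
# my.cnf sketch -- illustrative values, not a tuned configuration.
[mysqld]
# Give InnoDB most of the instance's RAM (bounded by 32-bit address space).
innodb_buffer_pool_size = 2G
# Key/value lookups are point queries; keep per-thread buffers small.
sort_buffer_size = 256K
read_buffer_size = 256K
# Each node serves one slice of the fleet, so it needs few connections.
max_connections = 64
```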



About Ray C Horn
See my profile, with more than 1,286 connections and growing all the time.
