Thursday, December 20, 2012

Cloud Computing vs. Parallel Computing

Recently, I was asked how one would implement a Map-Reduce-like system (or platform, or feature) on data that is relational in nature. I couldn't think of a really good answer right at that moment, but the question kept me awake for the next couple of nights, pondering a good solution. That, of course, led me to the bigger question of Cloud Computing vs. Parallel Computing overall, or rather of a parallel computational framework in mainstream programming.

Cloud Computing and Map-Reduce

Well, we all know that lots and lots of data is getting written to disk. We know how many tweets are created and how many photos are uploaded to Facebook every day, not to mention sites like WordPress, YouTube, Flickr, Tumblr, and Instagram. It is estimated that the amount of data we have created so far is about to reach 500 billion GB. It is also estimated that the information that flows over the internet every day could fill 168 million DVDs. I guess those are reasons enough why we need a huge infrastructure to crunch all this data and make sense of it.

Now, if we look at how this data is written, it is mostly written only once. Most of the time it is never even updated, let alone updated concurrently. Map-Reduce was invented for crunching exactly this kind of data: entirely read-only, with no relations (as in an RDBMS), just key-value pairs. Map-Reduce runs on data that is scattered (or sharded) across many disks, with some structure (or, I would say, no structure). And most importantly, Map-Reduce can run across many CPUs on commodity hardware in a data center run by a provider such as AWS or Rackspace, or in a private cloud for that matter, rather than on a single extraordinarily big machine. Of course, virtualization software is what really makes it possible to create the illusion of a VM of any configuration one could wish for.
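To make the idea concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps, using word counting as the example. This is only an illustration and not how Hadoop or any other framework is implemented: the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the sample shards are made up, and each inner list merely stands in for data sitting on a separate disk.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (key, value) pair for every word in one record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def map_reduce(shards):
    # Each shard stands in for data on a separate disk or machine. The map
    # calls do not depend on each other, which is what lets a real cluster
    # run them in parallel; here they simply run one after another.
    mapped = chain.from_iterable(
        map_phase(record) for shard in shards for record in shard
    )
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical sample data: two shards of write-once text records.
shards = [
    ["cloud computing", "parallel computing"],
    ["cloud data"],
]
word_counts = map_reduce(shards)
```

Note that nothing in the sketch updates a record in place, which matches the write-once, read-only nature of the data described above.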

Cloud computing is not simply a SaaS model of doing business. With Big Data in the picture, it's really becoming something more than that. 

Cloud Computing coming to mainstream programming

What intrigues me is that many cloud computing technologies, such as Map-Reduce, are now coming to mainstream programming. I see more and more technology start-ups coming up that offer frameworks and services on top of open-source Big Data technologies like Hadoop or MongoDB. Extremely young software engineers are using this technology, which is just amazing. I don't think we have ever heard of technology start-ups based on Parallel Computing technologies coming up in every nook and corner, or of a young software programmer writing Parallel Computing code.

IMO, this can be attributed to the following factors.
  • The availability of the Internet.
  • Cheaper and better hardware - multi-core and faster CPUs.
  • Virtualization technologies such as VMware, Xen, etc.
  • The open-source revolution created by the GNU Project, the Free Software Foundation, and the Apache Software Foundation.
  • And above all, the Google phenomenon!
    • Google has virtually destroyed pretty much all the boundaries that we are aware of (China is an exception and always will be).
    • Google has also destroyed most of the barriers to the free flow of knowledge on the internet.

Getting back to the main point of this section: yes, Cloud Computing has indeed come to mainstream programming, and Big Data is already becoming something of a cliché, mostly due to the above factors.

To me, it seems that the worldwide cloud computing software industry has evolved as an epiphenomenon of the above factors.

Then why not Parallel Computing in mainstream programming?

Of all the factors mentioned above, I would like to stress one: cheaper and better hardware. PC hardware is now completely commoditized, and even a household PC comes with a multi-core CPU. So the question is: is software keeping pace with the advances in hardware? Just today, I read a blog post (http://blog.8thlight.com/uncle-bob/2012/12/19/Three-Paradigms.html). The post talks more about programming paradigms, but it highlights the exact same aspect that I am trying to stress in this post.

I think that, just like Cloud Computing, Parallel Computing will pretty soon become a reality in mainstream programming. Most of today's mainstream programming languages do not make it easy to truly exploit multi-core architectures, so there is a need for a platform that lets mainstream programming exploit the real advances in hardware. But I do see things moving in this direction, and I hope it grows fully, just as cloud computing technologies have.

Parallel Programming at Intel
Intel has parallel programming tools due for release next year - http://software.intel.com/en-us/intel-parallel-studio-xe

Parallel Programming in Java

Java 7 adds Parallel Programming support through its fork/join framework - http://www.oracle.com/technetwork/articles/java/fork-join-422606.html
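The fork/join idea itself is simple enough to sketch outside Java: split the work into chunks, hand the chunks to a pool of workers, then join the partial results. The Python sketch below is a simplified, flat version of that pattern, not the Java API and not a recursive, work-stealing implementation. The names `split`, `parallel_sum`, and `THRESHOLD` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 1000  # hypothetical chunk size; a real value would need tuning


def split(numbers, threshold=THRESHOLD):
    """Fork: cut the input into chunks of at most `threshold` items."""
    return [numbers[i:i + threshold] for i in range(0, len(numbers), threshold)]


def parallel_sum(numbers, workers=4):
    """Sum each chunk on a pool of workers, then join the partial sums."""
    chunks = split(numbers)
    # Caveat: CPython threads share one interpreter lock, so this shows the
    # structure only. For a CPU-bound speedup on multiple cores,
    # ProcessPoolExecutor would be the pool to reach for.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)  # Join: combine the partial results.


total = parallel_sum(list(range(10_000)))
```

The chunks never overlap, so the workers need no locks. Only the final join step touches all the partial results.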

There are also Java language extensions available, such as Ateji PX - http://en.wikipedia.org/wiki/Ateji_PX

Parallel Programming in .NET
Microsoft is also adding Parallel Computing support to the .NET stack - http://msdn.microsoft.com/en-us/vstudio/bb964701. I also recently read an article about a research team at Microsoft that has created thread-safe constructs on top of C# to take care of parallelism on multi-core architectures.

It will be very interesting to see how Parallel Programming evolves, just as Cloud Computing is evolving.

How about Functional Programming?
One of the promises of languages like Haskell (and, to some extent, Scala) is that pure code maintains no mutable state anywhere (since there is no assignment operation). The consequence is that, with no shared state to protect, such code has no concurrency issues like data races to handle at the application level. Everything is just a function call!
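The same point can be shown in a few lines of Python, written in a functional style. This is only a sketch of the idea, not Haskell semantics: `square` and `add` are hypothetical pure functions, and the input is an immutable tuple, so the workers have nothing shared to overwrite.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def square(x):
    """Pure function: the result depends only on the argument."""
    return x * x


def add(a, b):
    """Pure combiner used to fold the partial results together."""
    return a + b


inputs = tuple(range(1, 11))  # immutable input: nothing can modify it

# Sequential evaluation.
sequential = reduce(add, map(square, inputs), 0)

# Parallel evaluation of the same pure function. No locks are needed
# because the workers only read their argument and return a new value.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = reduce(add, pool.map(square, inputs), 0)
```

Because `square` has no side effects, the sequential and parallel runs give the same result regardless of how the calls are scheduled.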

But I believe it is a complete change from the programming paradigm that we are all used to (outside the academic and scientific community). So I am doubtful whether it can really come to mainstream programming. Perhaps Scala may catch on a bit, as it has OOP features as well, so it will be good to see how that goes.

It will be interesting to see how things progress during 2013.
