Photo
justmigrate:

Hi,
I just moved my posts from Posterous! Do go through my blog for all the new posts.
It’s easy to migrate; try JustMigrate
3Crumbs app - Are you the local thrifter we all have been looking for? 

Text

Hardware Heterogeneity in AWS EC2 and Impact on Instance Performance

If you have been using AWS EC2 for long enough, you will have noticed certain instances being slower than other instances of the same type (like m1.large or m1.xlarge). An interesting study presented at USENIX HotCloud ’12 confirms that the underlying hardware heterogeneity in AWS EC2 does have an impact on instance performance from a CPU, memory and disk perspective.

The benchmark used UnixBench to measure the CPU, Redis to measure the memory, and Dbench to measure the disk subsystem of instances of the same type in the US-EAST region across different availability zones. It confirms that you can gain a 30% to 60% improvement in system performance depending on the actual underlying hardware on which the instance is running.

Pro and lazy tip: Launch instances in new availability zones of your AWS region to exploit the probabilistic chance of running your instance on better hardware :)
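To see which hardware an instance actually landed on, one simple option is to look at the CPU model the guest reports. The sketch below is only an illustration, assuming a Linux instance that exposes /proc/cpuinfo; the function names are my own and are not taken from the study.

```python
from collections import Counter


def cpu_models(cpuinfo_text):
    """Count the 'model name' entries found in /proc/cpuinfo-style text."""
    models = Counter()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            models[value.strip()] += 1
    return dict(models)


def read_local_cpu_models(path="/proc/cpuinfo"):
    """Read the CPU models of the machine this runs on (Linux only)."""
    with open(path) as handle:
        return cpu_models(handle.read())
```

Running read_local_cpu_models() on a few freshly launched instances of the same type and comparing the results gives a quick hint of whether they sit on different processor generations.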

Text

BarcampBangalore Talk : Scalable Load Testing using JMeter in AWS Cloud

I gave a presentation at the BarcampBangalore Techlash session on “Scalable Load Testing using JMeter in AWS Cloud”. The idea was to showcase how to leverage JMeter and Amazon EC2 to generate massive load tests at low cost, without having to worry about complex setup or configuration.

I used the jmeter-ec2 script as the base for this talk, and you can find the presentation below.

I would love to hear your feedback. Do check out our new web-based load testing service, Minjar CloudLoad, where you can run tests with up to 1000 concurrent requests for free.

 

Text

AWS India Summit - 2012

Amazon Web Services is organizing its second cloud summit in India, across Bangalore, Chennai and Mumbai. You can find more details on the AWS India Summit site.

The AWS summit focuses on bringing together partners, existing customers and prospects to discuss and share ideas around cloud computing, including best practices for AWS infrastructure. It’s a must-attend event for architects and startups planning to use cloud infrastructure.

Here are the top 10 reasons to attend, as listed on the AWS Summit page:

  • Hear the opening keynote by Amazon.com CTO, Dr. Werner Vogels, on the future of the AWS Cloud and learn about the 7 major transformations of cloud computing.
  • Ask questions directly to our customer panel about how they leverage AWS in their own line of business applications.
  • Learn about The Total Cost of (Non) Ownership in the Cloud and cost savings using AWS.
  • Discover the latest services and features in the AWS Cloud and learn how to put them to use in your business applications.
  • Deep dive into common solutions and workloads in the AWS Cloud: Enterprise Applications, Content Delivery, Disaster Recovery, Big Data, and more.
  • Gain understanding of AWS best practices for developing, architecting, and securing applications in the Cloud.
  • Hear how AWS customers have successfully built and migrated their applications to the AWS Cloud.
  • Learn from first-hand experiences about AWS empowering Agile development for both startups and greenfield projects.
  • Explore best practices for running Microsoft applications on AWS securely and effectively.
  • Meet AWS partners who offer consulting and technology solutions to help get you started in the Cloud

Hope to see you at the Bangalore event on 4th October.

Text

Top 10 Things To Know About Google Compute Engine

Google took its first significant step into the Infrastructure-as-a-Service market with Compute Engine, which lets customers run on-demand virtual machines on its global network of data centers. For a while, it has had a good presence in the platform cloud with GAE, Cloud Storage, BigQuery, and the Prediction and Translation APIs, but its primary focus was making it easier for developers to build new applications. With Compute Engine, it can now target developers who want to port their existing applications to Google Cloud.

Here are the top 10 things that you should know about Google Compute Engine:

1. Pricing is cheaper than Amazon EC2 or other public cloud platforms’ compute services. This might not be a significant advantage, as Amazon is known to bring down costs almost every quarter.

2. Guest OS Support - Currently it supports only CentOS and Ubuntu, and by default it starts your instances with the Ubuntu 12.04 LTS server image.

3. A Google Compute Engine persistent disk can be attached to more than one instance in read-only mode. This helps use cases where you need to share a docbase or configuration across instances without resorting to rsync or NFS.

4. Predictable Performance - Google makes strong claims of reliable and highly predictable performance from its instances, unlike most public cloud platforms, where performance varies for large-scale workloads or under heavy consumption by another tenant (mostly due to the shared nature of the underlying physical resources). Some of the early customers are raving about the reliable performance of Google Compute Engine instances.

5. Data Security - Another significant advantage: Google Compute Engine encrypts the data stored on disks (both persistent and ephemeral), which takes care of data at rest. For persistent disks, it also encrypts data on the host before transmitting it to network storage, which takes care of data-in-transit security issues. This should ease data compliance and security constraints for enterprise applications.

6. Networking - End users get a very high level of control in terms of creating and managing their instances’ networks and firewalls. One interesting aspect is that you can have a private network and connect all your instances across different Google Cloud regions through it, without going over the public internet, using Google’s high-performance global network instead.

7. HPC Focus - Google is currently focusing on big data, batch processing and HPC workloads for Compute Engine, which can offer very large-scale computing resources. Given the predictable performance, high memory per core in every instance type, scalable cloud storage and data security, it should be enticing for most large-scale computations or workloads.

8. Maintenance Windows - During the limited preview for developers, Google will have pre-defined, announced maintenance windows in its data centers. During a maintenance window your instances will be terminated and your persistent disks won’t be available for use. Google encourages distributed deployments to avoid any issues.

9. No IPv6 Support - It doesn’t support IPv6 yet, but this should be added in the near future. Also, if you need a static IP address for your instances, you need to request it via email; I guess this will be fixed soon.

10. Limited Preview - Google Compute Engine is in limited preview and you can place your request here for access.

With Amazon, Google and Microsoft all focusing strongly on the public cloud market and each trying to out-innovate the others, developers, startups and enterprises stand to benefit from the real advantages of on-demand computing. With this accelerated innovation, it’s more than a pricing/platform war!

 

Text

Remote JMX monitoring of java application in AWS Cloud

Recently, one of our cloud engineers was enabling JMX monitoring for a Java application deployed using AWS Elastic Beanstalk. While he could connect to it locally using JConsole, he got a connection refused exception when connecting from a remote machine using JConsole, even though he could connect to the JMX port using Telnet.

Standard configuration exported via CATALINA_OPTS of BeanStalk AMI to enable remote JMX monitoring:

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8090 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

I troubleshot the problem by enabling the debug option of JConsole on the client side and found that the RMI stub on the server was running on a different port, which was causing the connection failure.

jconsole -J-Djava.util.logging.config.file=<path-to-log-properties-file>

I created the following file with the required log properties for JConsole:

Logging.properties
handlers = java.util.logging.ConsoleHandler
.level = INFO
java.util.logging.ConsoleHandler.level = FINEST
java.util.logging.ConsoleHandler.formatter = \
java.util.logging.SimpleFormatter
javax.management.level = FINEST
javax.management.remote.level = FINEST

The debug log showed that JConsole was unable to connect to the RMI stub listening on a different port. According to the JMX monitoring documentation, JMX uses two ports: one for the RMI registry (which was configured using the above settings) and one where the RMI connection objects are exported (this is chosen at random unless we extend JMXServiceURL).

For us, the challenge was opening all ports on the EC2 machines, so we set up a dedicated machine with JConsole installed and opened the ports from our Beanstalk security group to that specific machine for access.
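The Telnet check described earlier can be scripted so that both the registry port and the RMI port reported in the JConsole debug log are probed from the remote machine. This is only a minimal sketch: the helper names are my own, and the host name in the usage note is a placeholder, not our real setup.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it is reachable from this machine."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, check_ports("my-app.example.com", [8090]) probes only the registry port; adding the RMI port seen in the debug log to the list shows whether that second port is the one being blocked.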


Text

Oracle Database Licensing In AWS Cloud

I was evaluating running an Enterprise Edition of Oracle on Amazon EC2 for a data warehouse application and stumbled upon the licensing policy, which is quite different from that for physical hardware environments. I have simplified it for quick understanding.

Following are the details for the Standard and Enterprise Editions:

1. Oracle Standard Edition - A single EC2 instance of up to 4 virtual cores is counted as 1 processor license. It’s also important to note that if you are running Oracle SE on 2 EC2 machines, each with 1 core, you would need 2 processor licenses.

2. Oracle Enterprise Edition - A single EC2 instance of 8 virtual cores (a platform with a core processor licensing factor of 0.5) would require 8 * 0.5 = 4 processor licenses. So with 4 virtual cores, we would need 2 processor licenses.

You can find the details in the Oracle Cloud Documentation.
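The two rules above can be sketched as a small calculator. It only encodes what is described in this post: the function names are my own, rounding a fractional Enterprise Edition result up is my assumption, and Standard Edition instances with more than 4 virtual cores are rejected because the post does not cover them. Always check the Oracle policy before relying on the numbers.

```python
import math

SE_MAX_VCORES_PER_LICENSE = 4   # from the post: up to 4 virtual cores = 1 license
EE_CORE_FACTOR = 0.5            # from the post: core processor licensing factor


def se_licenses(instance_vcores):
    """Standard Edition: one processor license per instance of up to 4 virtual cores."""
    for vcores in instance_vcores:
        if not 1 <= vcores <= SE_MAX_VCORES_PER_LICENSE:
            raise ValueError("the post only covers instances with 1 to 4 virtual cores")
    return len(instance_vcores)


def ee_licenses(vcores, core_factor=EE_CORE_FACTOR):
    """Enterprise Edition: virtual cores times the core factor, rounded up (assumption)."""
    return math.ceil(vcores * core_factor)
```

Here se_licenses takes a list with the virtual core count of each instance, so two single-core machines are passed as [1, 1], while ee_licenses takes the virtual core count of a single instance.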

 

Text

Really FAST : High Performance JSON Parser for JAVA

I was evaluating JSON parsers with Java wrappers for use in one of our big data projects. While there are hundreds of them, our first criteria were high performance (serialization and de-serialization) and robustness (the ability to deal and scale well with large payloads). In our use case, even a 100 ms difference in parsing or creating JSON would mean good gains for us, as it will be operating on hundreds of millions of input records in HDFS through MapReduce.

Here is what I stumbled upon:

1. Test cases and results for most of the JAVA JSON wrappers.

2. JSON performance on Android with warm-up.

3. Google-gson performance page.

After looking at the above benchmarking results and digging further, we chose to go with Jackson as it proves to be FAST.

P.S.: We like json-simple and google-gson as well, but speed matters to us.

Text

Hadoop Optimization : Dealing with small files problem

Hadoop is not really good at dealing with tons of small files; it is much better at handling large files. Too many small files increase the number of mappers and the job coordination effort (task scheduling), leave less work for each map task, and increase the overall processing time.

Input to a Hadoop MapReduce job is abstracted by the InputFormat in use, and FileInputFormat is the default implementation that deals with files in HDFS. With FileInputFormat, each file is split into one or more InputSplits, each typically upper-bounded by the block size. This means the number of input splits is lower-bounded by the number of input files. This is not ideal for a MapReduce job dealing with a large number of small files, because the overhead of coordinating distributed processes is far greater than with a relatively small number of large files.

You can overcome this problem by extending CombineFileInputFormat, implementing a RecordReader, and then controlling the number of maps by passing a mapred.max.split.size value to your custom JAR command.
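To get a feel for the difference, here is a rough back-of-the-envelope estimate of the number of splits (and hence map tasks) with and without combining. It is only a sketch under simplifying assumptions: it ignores data locality and the node and rack grouping that CombineFileInputFormat performs, so the real numbers will differ, and the function names are my own.

```python
import math


def splits_per_file(file_sizes, block_size):
    """Estimate FileInputFormat splits: each file yields at least one split."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)


def splits_combined(file_sizes, max_split_size):
    """Rough estimate when small files are packed into splits of max_split_size."""
    if not file_sizes:
        return 0
    return max(1, math.ceil(sum(file_sizes) / max_split_size))
```

With 10,000 hypothetical files of 1 MB each and a 64 MB block size, the first function gives 10,000 splits, while packing the same data into 128 MB splits gives 79.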

With the right number of mappers for the capacity of your cluster, you can improve the overall efficiency of each map task by providing the right configuration (io sort buffer and JVM heap size). We have noticed significant performance improvements (over 40 to 50%) for some customers when using CombineFileInputFormat over a large number of small files.

Be aware that processing a large amount of data per map is bad in case of task failures, as recovery takes more time and hurts overall processing latency. So you need to tune the data split for each map depending on your processing complexity, available resources (especially memory) and intermediate output size.

Text

How To Guide : Tata Docomo 3G on Ubuntu 11.10

It’s pretty straightforward; just follow the steps below.

1. Connect your 3G stick and boot up Ubuntu.

2. Select network connections and click on the New Mobile Broadband Connection.

3. Select Continue in the dialog box, choose India as the country, and then click Continue.

4. It will now show a list of service providers. DON’T select “Tata Docomo”, as it’s for Photon+, not for 3G. Instead, select the “I don’t know my provider” option, enter “TATA DOCOMO UMTS” and click Continue.

5. In the billing dialog, select the “My plan is not listed” option, enter “tatadocomo3g” as the APN, then click Confirm and save your settings.

6. Now, under Network Connections, you should see a “TATA DOCOMO UMTS connection” option that you can click on. If it doesn’t connect to the internet, just unplug and re-plug your 3G stick.

That’s it, you can use Tata Docomo 3G stick with your Ubuntu 11.10 :)