Why we upgraded from Amazon Web Services Mark Marsiglio

As you may know, we provide our clients with enterprise-grade hosting using a complex architecture consisting of many servers dedicated to individual functions such as hosting databases, file servers, load balancers and front end servers. 

For many years, our servers were located in the Amazon Web Services US-East Region. It seemed to be a perfect platform for this, allowing our array of servers to increase and decrease in capacity (and cost) as our usage patterns demanded. When we implemented our scalable array of servers within AWS it offered many significant advantages over our previous dedicated-server hosting platform. 

However, there are two inherent flaws in the design of AWS that prevented it from being the best choice for us now. 

1) Downtime

First, there have been several high-profile failures of the AWS infrastructure resulting in downtime. I say "high profile" because they frequently make the mainstream news. I believe these failures are so well known because the very popularity of AWS prevents it from handling them effectively.

In Amazon's epic 6,000-word explanation of their April 2011 downtime, they identified the design flaw and indicated they would be adding reserve capacity:

As a result, many users who wrote their applications to take advantage of multiple Availability Zones did not have significant availability impact as a result of this event.

and:

We now understand the amount of capacity needed for large recovery events and will be modifying our capacity planning and alarming so that we carry the additional safety capacity that is needed for large scale failures. We have already increased our capacity buffer significantly, and expect to have the requisite new capacity in place in a few weeks.

At the same time, AWS continues to drive down the cost of their service with regular price cuts. They could be cutting prices and making the system more stable at the same time, but those goals seem at odds with each other. Does AWS have enough reserve capacity in their infrastructure to handle a massive increase in utilization that would result from the failure of an availability zone? 

Any competent system administrator will plan for failure of single instances or services within their cloud deployment. Even multiple instance failure. But AWS has had issues with instance failure and simultaneous API failures that prevent recovery. 

I would argue that as more customers deploy larger cloud-hosted platforms with AWS, an infrastructure failure in one AZ becomes more likely to disrupt service in other AZs or even regions. 

With the release of the new EBS Snapshot Copy feature they have made it easier to maintain a warm copy of your data in another AWS region. I suppose using a separate region is safer than a separate availability zone in the same region for failover, but why stop there?

Our choice - two completely separate providers

For the sake of argument, let's accept the premise that the best failover solution is a live backup isolated in a separate region (AWS US-West for instance) with failover handled by DNS. Then I would propose that there is still risk in depending on the same provider for both your primary and secondary systems.

The more AWS enables cross-region data transfer features the more likely it is that they will not have the reserve capacity to properly spin up the recovery resources required by their customer base. If an AZ fails, every AWS customer will be scrambling to spin up replacement instances all at the same time and this is likely to continue to overwhelm the infrastructure just as it has done multiple times in the past. 

Our solution depends on a primary data center and backup data center that are completely independent of each other. Separate companies, different technologies, thousands of miles apart geographically. We use DNS failover to monitor the primary system and switch to the backup if there is a problem. We sync the data daily and can run indefinitely on the backup platform if necessary.
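As a rough illustration of the failover decision described above, here is a minimal sketch in Python. It is not our production monitoring: the names `FailoverState` and `record_check`, the threshold of three consecutive failed checks, and the choice to stay on the backup until someone switches back by hand are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverState:
    """Hypothetical state kept by a DNS failover monitor."""
    consecutive_failures: int = 0
    active: str = "primary"  # which platform DNS currently points at


def record_check(state, primary_healthy, failure_threshold=3):
    """Return the new state after one health check of the primary.

    Assumption: once traffic moves to the backup it stays there until an
    operator fails back manually, because data is only synced daily.
    """
    if state.active == "backup":
        return state
    if primary_healthy:
        # A healthy check resets the failure counter.
        return FailoverState(0, "primary")
    failures = state.consecutive_failures + 1
    if failures >= failure_threshold:
        return FailoverState(failures, "backup")
    return FailoverState(failures, "primary")


if __name__ == "__main__":
    state = FailoverState()
    for healthy in [True, False, False, True, False, False, False, True]:
        state = record_check(state, healthy)
    print(state.active)
```

Requiring several failed checks in a row keeps a single dropped request from triggering a switch, at the cost of a slightly longer detection time.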

2) I/O Performance

While there is an incredible array of services at AWS, our primary use was built around their EC2 servers and EBS storage volumes. We host enterprise web content management systems on this platform.

We are not building our own web app, architecting the system from scratch to overcome the performance limitations of EBS. We are optimizing the performance of an off-the-shelf CMS so we can't do much about its dependence on IO performance. 

With the release of High I/O instances and EBS-Optimized instances and volumes, AWS has attempted to address this limitation. The former adds 2TB of SSD and the latter increases the performance reliability of normal EBS volumes. But both add significantly to the cost. 

Consider the costs of a typical component in the platform:

Instance Type                     | Instance Monthly Cost | 140GB Storage | Total
AWS XLarge Instance               | $374                  | $14           | $388
AWS EBS-Optimized XLarge Instance | $410                  | $217*         | $627
AWS High IO Instance with SSD     | $2232                 | n/a**         | $2232
Non-AWS SSD Instance              | $300                  | Included      | $300

*Configured for the maximum 2,000 IOPS
**Includes 2TB of SSD, currently the minimum configuration at AWS

While the AWS SSD instances are very performant, the associated resources (CPU, RAM, Disk size) are wrong for our system design. The EBS-Optimized instances in the example are priced out at the max available capacity of 2,000 IOPS. The SSD performance at competing providers is at least 10-50x that depending on whose benchmarks you believe. 
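To put the monthly totals from the table in relative terms, here is a small sketch that divides each total by the non-AWS SSD price. The dollar figures come straight from the table; the dictionary and the `cost_ratio` helper are names made up for this example.

```python
# Monthly totals in USD, taken from the table above.
monthly_totals = {
    "AWS XLarge Instance": 388,
    "AWS EBS-Optimized XLarge Instance": 627,
    "AWS High IO Instance with SSD": 2232,
    "Non-AWS SSD Instance": 300,
}


def cost_ratio(name, baseline="Non-AWS SSD Instance"):
    """How many times the baseline's monthly total an option costs."""
    return monthly_totals[name] / monthly_totals[baseline]


if __name__ == "__main__":
    for name in monthly_totals:
        print(f"{name}: {cost_ratio(name):.2f}x")
```

On these numbers the EBS-Optimized configuration comes to roughly twice the non-AWS SSD price, and the High IO instance to more than seven times.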

Our choice - cloud-based SSD servers

Our platform uses SSD for all primary services except our web servers' boot volume. Database, file serving (NFS) and search indexing services all run 100% SSD in production and in our backup platform. 

The improvement in performance and stability is significant and noticeable. The bottleneck with AWS was always IO wait and it often was so bad that it led to downtime. At AWS we used 4 EBS volumes in a striped RAID 0 configuration to improve the throughput but in practice what we found is that we were 4x as likely to have the volume hang because of inconsistent IO performance (we now had to have 4 volumes working consistently instead of just 1). 
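The "4x as likely" figure can be sanity-checked with a little probability. The sketch below assumes each EBS volume hangs independently with the same probability in a given period, which is a simplification (real volumes on shared infrastructure may fail together). The function name and the 1% example rate are illustrative, not measured values.

```python
def stripe_hang_probability(per_volume_probability, volume_count):
    """Probability that at least one volume in a RAID 0 stripe hangs.

    A RAID 0 stripe stalls if any member stalls, so the stripe is healthy
    only when every volume is healthy. Assumes independent volumes.
    """
    all_healthy = (1.0 - per_volume_probability) ** volume_count
    return 1.0 - all_healthy


if __name__ == "__main__":
    single = stripe_hang_probability(0.01, 1)
    striped = stripe_hang_probability(0.01, 4)
    print(f"1 volume: {single:.4f}, 4 volumes: {striped:.4f}")
    print(f"ratio: {striped / single:.2f}")
```

With a small per-volume rate the ratio comes out just under 4, which matches what we saw in practice: striping across 4 volumes bought throughput at roughly 4 times the risk of a hang.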

Summary

We used AWS exclusively for our enterprise web content management system cloud hosting platform from 2007 until 2012. Our average uptime from 2007 to 2012 was 99.67%, far from great. The majority of the downtime was caused by infrastructure and performance problems at AWS.

Since we switched to SSD our uptime has been 99.97% with the primary downtime being a planned maintenance event. With our new system design we would expect a maximum of 10 minutes of downtime before requests are automatically rerouted to our backup platform. 
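Uptime percentages are easier to compare as hours of downtime. Here is a minimal sketch of that conversion, assuming a 365-day year (8,760 hours); the function name is made up for the example.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year, 8,760 hours


def downtime_hours_per_year(uptime_percent):
    """Convert an uptime percentage into expected hours of downtime per year."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    for uptime in (99.67, 99.97):
        hours = downtime_hours_per_year(uptime)
        print(f"{uptime}% uptime is about {hours:.1f} hours down per year")
```

By this measure 99.67% works out to about 28.9 hours of downtime a year, while 99.97% is about 2.6 hours.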

Performance and uptime are constantly constrained by cost and we continue to look for the best combination for our clients. 

Filed under: Hosting, CMS, Web Development

 

The Magnited States of America Ryan Gray

The Alamo Drafthouse has one simple rule: "If you talk or text during a movie, we kick you out." People who disagree with this thoughtful, well-placed rule are clearly insolent and disrespectful. 

Fear not! Fire bad customers and preserve your brand. Don't compromise to make a couple extra bucks in the short-term when brand experience is on the line.

Your customers will love you and support you for it, and you'll gain great opportunities to turn fools' drunkenness into YouTube gold.

 

Sending Visitors Away From Your Site Ryan Gray

As a marketer, I am aware of the value of a customer "in your house." That "house" may be your physical branch, your website, mobile application or another important touchpoint. When they are in front of you, you gain opportunity. Got it. 

However, it's important not to be 100% isolationist in your marketing. Have some fun. I love this example of the Mailchimp chimp:

Mailchimp example 1
Mailchimp screenshot

First, they've given a voice (albeit, a primate voice) to the organization. Awesome. But notice, he's not trying to sell me anything! Whaaaaa? Check it out:

Mailchimp 3

Here's another:

Mailchimp 4

And another:

Mailchimp 5

Not only is he not selling me another valuable Mailchimp service, but he's actually distracting me with a video. And it's a YouTube video! It's not even on their site! (Forgive the dramatic exclamations.)

Don't they know I could be captured and enraptured by the millions of time-wasting kitty and baby videos on YouTube? God forbid that I discover Schmoyoho and never return!

I love it. That is building a brand that people can enjoy. Any departure from hard sales is a good one, I think. Just make a good product, find your market and have fun with it.

 

Holy usability, Google! Ryan Gray

Forget all the "don't be evil" speak and even all of the big brother contradictions to that mantra. Forget the war with Apple, Microsoft and the others. Forget even the foreboding likelihood that we will one day be paying taxes to Google.

There is something FAR more important. Google has added a fancy notification that tells you when you forget an email attachment!

Gmail attachments

Google, all is forgiven.

Filed under: Google, Usability, Web Development

 

Branding in a "small, small world" Ryan Gray

It's easy to be overwhelmed with the fact that you are just one of almost 7,000,000,000 people on the planet. Don't be discouraged. Even in the seemingly unrelated series of events that transpire every day amongst so many, there is purpose and opportunity.

I was reminded of this yesterday when I spoke to a colleague in South Carolina. Michelle is the Marketing Manager for Health Facilities FCU, a credit union in Florence, South Carolina. We were catching up from seeing each other at the South Carolina Credit Union League conference in April when Michelle said, "Hey, were you at downtown Disney last week with your family?" Now, I've only met Michelle once at the conference.

"Uh, yes."

"I saw you with your family but by the time I got to where you were, you were gone."

It turns out that Michelle recognized the ThinkCreative shirt I had on that day. Typically, these shirts are gobbled up at credit union conventions by the notorious adult trick-or-treaters (credit union board members) that attend. In this case, the shirt actually made it into the hands of a colleague.

Somehow, Michelle was able to single out a recognizable item (a tiny logo on a shirt) in the midst of the smorgasbord of enticing shops and restaurants and those pesky gigantic animal monsters that are so desperate for adolescent attention.

Think about that for a second. Michelle was surrounded by hundreds, if not thousands, of people she didn't know. She was in the pleasure capital of the world, Orlando, FL (what now, Vegas?). She was surrounded by children overwrought with titillation demanding to get bedazzled like a princess or rub the hunchback's hump for good luck. Even in the midst of activities altogether separate from the context of our relationship, Michelle focused in on a simple red and white mark she had seen just a handful of times and said, "ThinkCreative."

Although it is a fun coincidence, it's also the fruit of consistent branding. Every chance for an interaction with a client, potential client, colleague or vendor should present a consistent and clear face. Every touchpoint should be well-placed, thoroughly thought through and recognizable. Opportunities to have your voice heard are few, and if you miss one, it's gone. If you think you've done this already, dig deeper. You haven't. Don't sit back and settle. Keep pushing and you will see how much opportunity is truly out there.

There is always something to improve that only a second or third or forty-fifth look will reveal.

For your enjoyment and annoyance...