Cloud Native Application Interfaces

_qc3o · on Oct 24, 2016

Reactive and evented models are much harder to reason about and design properly. When you're just doing things sequentially in a fabric script it is much easier to make sense of what is going on. As soon as you make things reactive and evented to support dynamic cloud topologies you are basically on another planet and none of the old rules apply. This is why it is hard to design "cloud native" systems. I don't think it is the lack of interfaces and standards but the model being inherently non-sequential. In many cases it is also non-transactional and barely eventually consistent.

drieddust · on Oct 23, 2016

> Let’s go back to first principles. To describe Cloud Native in one word, we'd choose "automatable". Most existing applications are not.

This is doubly true for Enterprise Customers. Most of them are cloud minded without understanding their applications which aren't cloud conditioned.

user5994461 · on Oct 23, 2016

Existing applications will work in the cloud. There is no need to use the fancy cloud stuff (multi region, auto scaling...) if it's not needed [or not possible].

At the minimum, renting an instance in AWS or GCE is equivalent to renting a physical server. Applications don't care what's the brand of the server they're running on.

seneca · on Oct 24, 2016

>Existing applications will work in the cloud. There is no need to use the fancy cloud stuff (multi region, auto scaling...) if it's not needed [or not possible].

I'd like to unpack this.

>Existing applications will work in the cloud.

They may work, but there's a good chance they will not work well. Doing a direct lift and drop of an application onto AWS can be catastrophic if you don't understand storage persistence, or VM availability. An EC2 VM is not like a physical server in that it will not continue to run indefinitely until something breaks. I would say that existing applications will likely not work well without a shift in the way you treat underlying infrastructure. There are a lot of considerations around IO and locality as well.

>There is no need to use the fancy cloud stuff (multi region, auto scaling...) if it's not needed

You've just said "There is no need... if it's not needed.

>[or not possible]

If it is not possible to use the surrounding services your application is probably a poor fit for a cloud platform. It can become prohibitively expensive to try to directly replicate your physical datacenter architecture on a cloud platform.

Is it possible to just drop your existing application onto some VMs? Sure, but it's probably a bad idea.

user5994461 · on Oct 24, 2016

> An EC2 VM is not like a physical server in that it will not continue to run indefinitely until something breaks.

Sorry to contradict but an EC2 VM does run indefinitely until something breaks ;)

There are differences in physical storage between local disks, SAN, NAS, Network Storage, NFS, EBS volumes and Google Volume. A sysadmin should know the characteristics of these, doesn't matter whether it's cloud tech or own tech or homelab tech.

People with all this knowledge are rare and expensive, yet critical for major migrations to go well. I can understand that this is an obstacle for major migrations to the cloud (and a benefit for my payroll).

> You've just said "There is no need... if it's not needed".

I think it's VERY important for legacy migrations. A migration should be done starting with the fundamentals, progressing in stages.

All articles and talks focus on shiny bleeding edge stuff, which is only the latest stage(s). Depending on the applications and organization, this stage may or may not be worthwhile, it should or should NOT be a goal in the first place.

> If it is not possible to use the surrounding services your application is probably a poor fit for a cloud platform. It can become prohibitively expensive to try to directly replicate your physical datacenter architecture on a cloud platform.

I'm talking to clients who have to run their own datacenter right now and want to migrate. It is prohibitively expensive.

dmourati · on Oct 24, 2016

One commentator put it thus:

Legacy: unreliable software on reliable hardware. Cloud: reliable software on unreliable hardware.

https://twitter.com/samj/status/25208089148

zeroxfe · on Oct 23, 2016

> At the minimum, renting an instance in AWS or GCE is equivalent to renting a physical server. Applications don't care what's the brand of the server they're running on.

This frequently trips people up. Cloud instances (typically) have far lower availability than physical servers. They also have no (cheap) persistence across VM evictions.

Prepare for an unpleasant surprise if you're blindly migrating applications from physical servers to cloud instances.

williamstein · on Oct 23, 2016

> This frequently trips people up. Cloud instances (typically) have far lower availability than physical servers. They also have no (cheap) persistence across VM evictions.

This may be true for AWS but (in my experience since 2014) it is not true for Google compute engine. On GCE, VM instances automatically migrate when the host is rebooted, and VM availability and uptime is impressive. Also, on GCE, the default recommended disks are all persistent, since they are block devices mounted over the network; this wasn't the case years ago, but is now, and it is very impressive.

Disclaimer: my startup is in the Google startup program.

zeroxfe · on Oct 24, 2016

Yep. GCE VM migration and persistent storage is really amazing, and getting better with every release, however there are still limitations and caveats that can get you in trouble (especially if you have very tight latency requirements.)

Glad you like it! Good luck with your startup.

Disclaimer: I work on infrastructure systems at Google :-)

user5994461 · on Oct 23, 2016

> Cloud instances (typically) have far lower availability than physical servers.

This is common belief that is just plain wrong.

Servers can stay online for years without issues, same things on AWS and physical servers.

> They also have no (cheap) persistence across VM evictions.

I don't even understand that this sentence mean. You can start by explaining what is "VM eviction"?

Whatever it is. Physical servers have no soft of failover or persistence for when one server suddenly dies.

People started migrating from physical servers to virtualized more than 10 years ago (i.e. VmWare). AWS/GCE are not fundamentally different.

zeroxfe · on Oct 24, 2016

> This is common belief that is just plain wrong.

Sorry, you'll need to back that up.

Evictions are when your VM is booted off a physical machine to a different one. This usually happens when something in the the underlying infrastructure is undergoing maintenance (e.g., networking, power, kernel, physical machine, etc.)

VMs have to deal with servers dying plus evictions plus control and management plane failures, which significantly reduces the availability of a single instance. So strictly speaking, cloud instances typically have lower availability than physical machines.

> Physical servers have no soft of failover or persistence for when one server suddenly dies.

Servers suddenly dying is not the failure mode that requires persistence -- it's evictions. This is why cloud services tend to prefer external distributed storage systems to local disk, and a multitude of ways to quickly and automatically reinstall machines on instance turnup.

BTW, GCE recently started supporting VM migration (which you can opt into), where they attempt to migrate your instance to another physical machine without losing any state. There are still limitations to this, but for a lot of simple cases can work well.

user5994461 · on Oct 24, 2016

It doesn't matter whether the application is abstracted on bare metal, on VmWare or on AWS.

There's still physical hardware down the chain that calls for maintenance. When the physical hardware require maintenance everything on top is fucked or need to be moved.

For planned maintenance:

- If you're bare metal. You cannot migrate to a working server.

- If you're virtualized, you have to stop the service and restart on another host.

- If you're virtualized on GCE or vSphere datacenter, they'll migrate the VM live while it's running.

GCE has supported live migrations for multiple years. VmWare supports that for about 10 years (yep, no kidding ^^)

> VMs have to deal with servers dying plus evictions plus control and management plane failures, which significantly reduces the availability of a single instance. So strictly speaking, cloud instances typically have lower availability than physical machines.

Bare metal servers have to deal with hardware dying plus evictions plus control and management plane failures, which significantly reduces the availability of a single server.

So strictly speaking, cloud instances and physical servers are the same thing. They both live on physical hardware.

boulos · on Oct 24, 2016

What do you mean recently? We've been doing this for about three years, and it was part of declaring Compute Engine as generally available.

As for "limitations", I mean we can't save you if the NIC or ToR fails, but we lose VMs really rarely. Feel free to ping me internally, and I'll point you at our dashboards.

Disclosure: I work on Google Cloud.

parasubvert · on Oct 24, 2016

In fairness, I could launch EC2 instances as a whole about 10 years ago so the whole of GCP is recent. (Good job on it though, quality is a differentiator)

boulos · on Oct 24, 2016

Agreed! My issue was which word "recently" bound to (GCE or "the universe"). I interpreted it as GCE.

zeroxfe · on Oct 24, 2016

Thanks, yes I know :-) I consider three years to be pretty recent (from the perspective of people managing their own servers.)

drieddust · on Oct 24, 2016

> Servers can stay online for years without issues, same things on AWS and physical servers

Why they don't back it up SLA promise then? Having a VM does not guarantee increased uptime. On premises virtualization can guarantee it,cloud cannot.

parasubvert · on Oct 24, 2016

"Servers can stay online for years without issues, same things on AWS and physical servers."

This is mostly luck of the draw, I typically have seen one VM failure out of 50 every 5-6 months on AWS. Ymmv.

Of course with EBS that usually just means "restart it".

cryptica · on Oct 24, 2016

I understand the benefit of designing software components (and stacks) to run and autoscale on Kubernetes - I actually did that with my open source project SocketCluster. See https://github.com/SocketCluster/socketcluster/blob/master/s...

I think that standardisation should happen at the level of the stack/component (not at the application level). Most application developers don't know enough about specific components like app servers, databases, message queues, in-memory data stores... to be able to effectively configure them to run and scale on K8s (it's difficult and requires deep knowledge of each component).

I think it should be the responsibility of open source project owners to standardize their components to run and autoscale on K8s. It's not practical to delegate this responsibility to application developers (whose primary focus is business logic).

Application developers should be able to use an OSS stack/component at scale on K8s without having to understand the details of how that stack/component scales itself.

So for example, if I wanted to run Redis as a cluster on K8s, I should be able to just upload some .yaml files (provided in the Redis repo) and it should all just work - Then I can start storing data inside Redis cluster straight away (without having to understand how the sharding works behind the scenes).

Rancher has the concept of a 'Catalog' which pretty much embodies this idea.

thorgaardian · on Oct 24, 2016

> I think that standardisation should happen at the level of the stack/component (not at the application level). Most application developers don't know enough about specific components like app servers, databases, message queues, in-memory data stores... to be able to effectively configure them to run and scale on K8s (it's difficult and requires deep knowledge of each component).

Can't agree more with this, but I would add that its not limited to the specific components listed like databases, message queues, and others. Getting any component or service configured to autoscale on K8s and work its way into a larger infrastructure can often require far more working knowledge than should be necessary. Standardizing the interface these components use to publish themselves would help K8s take on this responsibility more fully. I can only speak for myself, but I for one would happily adopt an interface like this if it meant seamless distribution, autoscaling, and consumption for peer components.

The last part about consumption for peers is important as well. Though the standardized interface would empower a higher level of scale automation, the standardization of this automation could be translated to interface assumptions for external components as well. In the Redis example above, a standardized interface for the service would mean that K8s can deploy it automatically, but also that other services can make similar assumptions about it's location in a deployed environment.

jdc · on Oct 24, 2016

Would be cool if this were a thing nowadays:

https://en.wikipedia.org/wiki/Single_system_image