The Six Principles of Cloud Computing Architecture Design: Do You Follow Them?


Cloud computing has now penetrated almost every industry and application scenario. We may not feel its impact directly in our daily life, work, and study, but as IT infrastructure it quietly supports the applications we use every day.

Another way to understand the overall architecture and service capabilities of cloud computing is through the cloud computing architecture system. It comprises the infrastructure, the cloud operating system, the product system (including security and compliance, monitoring, and management), the solution system, and the service system. When an architecture is designed on the cloud, every technical solution should follow certain principles, which define the goals the design pursues. There are six major principles:

     Appropriate Deployment

On the public cloud, business systems can be deployed on virtual machines (cloud hosts), on higher-performance physical cloud hosts, or through hosting services such as managed applications and managed physical servers.

Because of their existing IT assets and compliance requirements, many enterprises have not yet moved to the cloud. For them, the cloud operating system is extracted, packaged as standalone software and services, and deployed in the customer's own private environment. Unlike a public cloud, which is open to "any" user, a private deployment is available only to a small number of designated users.

A hybrid architecture unifies the management and scheduling of resources across the public cloud, privately deployed platforms, traditional VMware or OpenStack virtualization platforms, and physical servers. It lets an enterprise keep its local environment unchanged and meet compliance requirements while still benefiting from the abundant resources and strong service capabilities of the cloud platform. Hybrid architecture is also an intermediate stage in the current enterprise transition to the cloud, and it is likely to persist for a long time.
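To make "unified management and scheduling" more concrete, here is a minimal Python sketch of a scheduler placing workloads across public and private pools. It is not any vendor's hybrid-cloud API: the `Platform`, `Workload`, and `HybridScheduler` names, the vCPU capacity model, and the placement rule (compliance-bound workloads stay on private platforms; everything else goes to the eligible pool with the most free capacity) are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Platform:
    """One resource pool under unified management (hypothetical model)."""
    name: str
    private: bool          # True for private deployments, VMware, OpenStack, or physical servers
    capacity_vcpus: int
    used_vcpus: int = 0
    placed: List[str] = field(default_factory=list)

    def free(self) -> int:
        return self.capacity_vcpus - self.used_vcpus


@dataclass
class Workload:
    name: str
    vcpus: int
    must_stay_private: bool  # e.g. data that compliance rules keep on premises


class HybridScheduler:
    """Single scheduling entry point across public and private platforms."""

    def __init__(self, platforms: List[Platform]) -> None:
        self.platforms = platforms

    def place(self, workload: Workload) -> str:
        # Keep only platforms with enough spare capacity that also satisfy compliance
        candidates = [
            p for p in self.platforms
            if p.free() >= workload.vcpus and (p.private or not workload.must_stay_private)
        ]
        if not candidates:
            raise RuntimeError(f"no platform can host {workload.name}")
        # Among eligible platforms, pick the one with the most free capacity
        target = max(candidates, key=lambda p: p.free())
        target.used_vcpus += workload.vcpus
        target.placed.append(workload.name)
        return target.name


scheduler = HybridScheduler([
    Platform("public-cloud", private=False, capacity_vcpus=1000),
    Platform("on-prem-vmware", private=True, capacity_vcpus=64),
    Platform("on-prem-openstack", private=True, capacity_vcpus=128),
])
print(scheduler.place(Workload("web-frontend", 32, must_stay_private=False)))  # public-cloud
print(scheduler.place(Workload("customer-db", 16, must_stay_private=True)))    # on-prem-openstack
```

The compliance filter runs before the capacity preference, so a regulated workload never lands on the public cloud even when the public pool has far more free capacity.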

     Business Continuity

Business continuity mainly covers three aspects: high availability, high reliability, and disaster recovery. The design model follows the same structure.

High Availability means using redundancy and similar designs so that the business is not interrupted when the resources it runs on fail.
Continuous Operations (high reliability) means that the resources running the business operate without faults, so the business can provide service continuously.
Disaster Recovery is the ability to recover applications and data in a different environment when the environment running the business is damaged.
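The high-availability idea can be illustrated with a small failover sketch. It assumes each redundant replica is exposed as a plain Python callable that raises `ConnectionError` when its resources fail; `call_with_failover`, `AllReplicasFailed`, and the zone names are hypothetical. It covers only the redundancy aspect, not data replication or disaster recovery.

```python
from typing import Callable, List


class AllReplicasFailed(Exception):
    """Raised when every redundant replica has failed."""


def call_with_failover(replicas: List[Callable[[], str]]) -> str:
    """Call redundant replicas in order and return the first successful result."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except ConnectionError as exc:
            # This replica's resources have failed; try the next one
            errors.append(exc)
    raise AllReplicasFailed(errors)


def primary_zone_a() -> str:
    # Hypothetical primary replica whose host is down
    raise ConnectionError("zone-a host unreachable")


def standby_zone_b() -> str:
    # Hypothetical standby replica in another zone
    return "served by zone-b"


print(call_with_failover([primary_zone_a, standby_zone_b]))  # served by zone-b
```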

     Resilient Scaling

Tightly coupled systems are hard to scale, and when software bugs or system failures occur they are hard to troubleshoot: components call one another under different loads, small problems are amplified step by step, and the entire business can easily be interrupted. To keep a system elastic and scalable, we must first decouple its components, including separating dynamic data from static data, so that each decoupled component becomes an independent functional unit that can evolve on its own.

Once decoupled, components and services can be scaled out. Migrating applications and data from one environment to another also counts as scaling the whole system, so the system should remain elastic and be able to migrate quickly when needed. Finally, there is load balancing: after components are decoupled and resources and services are scaled out, a unified access entry point is needed to shield clients from the underlying decoupling and scaling and from problems such as inconsistent interfaces.
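A minimal sketch of such a unified access entry point, assuming a simple round-robin policy over decoupled service instances. `AccessGateway` and the `order-svc-*` instance names are hypothetical; a real deployment would typically rely on a managed load balancer with health checks instead.

```python
from typing import Iterable, List


class AccessGateway:
    """Unified entry point that spreads requests across decoupled service instances."""

    def __init__(self, backends: Iterable[str]) -> None:
        self._backends: List[str] = list(backends)
        self._counter = 0

    def scale_out(self, backend: str) -> None:
        # New instances join behind the same entry point, so clients see no interface change
        self._backends.append(backend)

    def route(self) -> str:
        """Pick the next backend in round-robin order."""
        if not self._backends:
            raise RuntimeError("no backend instances registered")
        backend = self._backends[self._counter % len(self._backends)]
        self._counter += 1
        return backend


gateway = AccessGateway(["order-svc-1", "order-svc-2"])
print(gateway.route(), gateway.route())  # order-svc-1 order-svc-2
gateway.scale_out("order-svc-3")
print(gateway.route(), gateway.route())  # order-svc-3 order-svc-1
```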

     Performance Efficiency

Many solutions and cases face the performance challenges brought by high concurrency and traffic surges. The main goal of performance efficiency is to identify and improve application performance and to raise the efficiency of resources and components.

The first aspect is compute performance: single-node performance can be improved with high-spec cloud hosts or physical cloud hosts, and overall service performance can be extended with clusters. The second is storage and caching: using Redis to cache hot data and hold temporary state, that is, in-memory computing, can improve business performance. The third is network performance: when the business is deployed globally, choose the optimal data centers and accelerate requests using the global infrastructure network, a CDN, and global application acceleration.
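Caching hot data in Redis usually follows the cache-aside pattern: read from the cache first, and on a miss load from the database and write the result back with an expiry. The sketch below assumes that pattern. So that it runs without a Redis server, `TTLCache` is a hypothetical in-process stand-in that mimics Redis `GET` and `SET ... EX`; with the redis-py client the calls would roughly correspond to `r.get(key)` and `r.set(key, value, ex=ttl)`. `get_product`, `load_from_db`, and the `product:<id>` key scheme are illustrative only.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for Redis GET and SET with an expiry (EX) in seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, str]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: str, ex: float) -> None:
        self._data[key] = (self._clock() + ex, value)


def get_product(cache: TTLCache, product_id: str,
                load_from_db: Callable[[str], str], ttl: float = 60.0) -> str:
    """Cache-aside read: serve hot data from the cache, fall back to the database on a miss."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(product_id)  # slow path: query the database
    cache.set(key, value, ex=ttl)     # populate the cache for later readers
    return value


db_calls = []


def load_from_db(product_id: str) -> str:
    # Hypothetical database lookup; records each call so cache hits are visible
    db_calls.append(product_id)
    return f"details of product {product_id}"


cache = TTLCache()
get_product(cache, "42", load_from_db)
get_product(cache, "42", load_from_db)
print(len(db_calls))  # 1: the second read was served from the cache
```

The expiry bounds how stale cached data can become; choosing the TTL is a trade-off between database load and freshness.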

Finally, application performance monitoring (APM) and stress testing are introduced to evaluate current performance, find bottlenecks, and solve problems from the application's perspective.
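Stress-test results are usually judged by latency percentiles rather than averages, because a healthy median can hide a slow tail. The sketch below assumes latency samples in milliseconds have already been collected per endpoint; the nearest-rank percentile method, `find_bottleneck`, and the sample numbers are illustrative, not output from any real APM tool.

```python
import math
from typing import Dict, List, Tuple


def percentile(samples_ms: List[float], p: float) -> float:
    """Nearest-rank percentile of latency samples, for 0 < p <= 100."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def find_bottleneck(latencies: Dict[str, List[float]], p: float = 99) -> Tuple[str, float]:
    """Return the endpoint with the worst p-th percentile latency."""
    scores = {endpoint: percentile(samples, p) for endpoint, samples in latencies.items()}
    worst = max(scores, key=lambda endpoint: scores[endpoint])
    return worst, scores[worst]


# Hypothetical stress-test samples in milliseconds
results = {
    "/login": [12, 15, 11, 14, 90],
    "/search": [40, 42, 45, 41, 43],
}
print(find_bottleneck(results, p=50))  # ('/search', 42): judged by the median
print(find_bottleneck(results, p=99))  # ('/login', 90): the slow tail shows up
```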

     Security and Compliance

Security compliance serves two purposes: meeting the organization's own need to protect its business, and meeting the compliance requirements of security regulations. In practice, the two overlap.

First, in account and permission management, assign appropriate accounts and roles to the right people and grant least privilege; issue appropriate public keys, private keys, and permissions to the programs or people that access the system through an API or CLI; and strictly manage the tokens that grant temporary access to files in object storage.
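Temporary access tokens for object storage are commonly implemented as signed URLs that expire. The sketch below shows the general idea with an HMAC-SHA256 signature over the method, path, and expiry time. It is a hypothetical scheme, not the signing format of any particular cloud provider: the host `storage.example.com`, `SECRET_KEY`, `sign_url`, and `verify_url` are all assumptions, and in practice the key would come from a key management service rather than source code.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical signing key; in practice fetch it from a key management service
SECRET_KEY = b"replace-with-a-managed-secret"
BASE_URL = "https://storage.example.com"  # hypothetical object storage endpoint


def _signature(path: str, expires: int) -> str:
    message = f"GET\n{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(bucket: str, obj: str, expires_in: int, now: Optional[int] = None) -> str:
    """Issue a URL that grants read access to one object until it expires."""
    now = int(time.time()) if now is None else now
    path = f"/{bucket}/{obj}"
    expires = now + expires_in
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{BASE_URL}{path}?{query}"


def verify_url(url: str, now: Optional[int] = None) -> bool:
    """Accept the URL only if it is unexpired and its signature matches."""
    now = int(time.time()) if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False  # the temporary token has expired
    expected = _signature(parsed.path, expires)
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(sig.encode(), expected.encode())


url = sign_url("reports", "q3.pdf", expires_in=300, now=1_000_000)
print(verify_url(url, now=1_000_100))                              # True
print(verify_url(url, now=1_000_400))                              # False: expired
print(verify_url(url.replace("q3.pdf", "q4.pdf"), now=1_000_100))  # False: tampered path
```

Because the signature covers the object path, a token issued for one file cannot be reused for another, which keeps the grant as narrow as the least-privilege principle asks.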

Second, the overall security system covers endpoint security, data security, network security, and application security, along with auditing of logs, user behavior, and database operations. Finally, there are regulatory requirements: China's Multi-Level Protection Scheme (MLPS) 2.0, website ICP filing, and the business and data privacy requirements of various regions, such as the GDPR.

     Continuous Operation

The resources and services provided by a cloud platform come with SLAs; the SLA for cloud hosts is typically 99.95%. Users build business systems with higher availability and reliability on top of the SLAs of these cloud resources and services. They also need to define SLAs for their own business systems that state service availability or other metrics. Once a business SLA is defined, they can set high-availability limits based on its thresholds, assess overall service availability and data reliability, and define contingency measures for failures.
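A short worked example of the SLA arithmetic, as a sketch. It assumes the 99.95% cloud host SLA above, a 30-day month, and that component failures are independent; the 99.99% load balancer figure is a hypothetical value, not a published SLA. Components that a request must pass through in series multiply their availabilities, while n redundant replicas fail only if all of them fail, giving 1 - (1 - a)^n.

```python
from functools import reduce


def downtime_budget_minutes(sla: float, period_days: float = 30) -> float:
    """Maximum downtime implied by an availability figure over a period."""
    return (1 - sla) * period_days * 24 * 60


def serial(*availabilities: float) -> float:
    """Availability of components that must all be up (independent failures assumed)."""
    return reduce(lambda acc, a: acc * a, availabilities, 1.0)


def redundant(a: float, n: int) -> float:
    """Availability of n redundant replicas that fail only if all fail (independence assumed)."""
    return 1 - (1 - a) ** n


host_sla = 0.9995  # cloud host SLA from the text
lb_sla = 0.9999    # hypothetical load balancer SLA

print(f"single host: {downtime_budget_minutes(host_sla):.1f} min/month")  # single host: 21.6 min/month
two_hosts = redundant(host_sla, 2)
print(f"two hosts: {two_hosts:.8f}")  # two hosts: 0.99999975
system = serial(lb_sla, two_hosts)
print(f"LB + two hosts: {downtime_budget_minutes(system):.1f} min/month")  # LB + two hosts: 4.3 min/month
```

The result shows why the business SLA can exceed the SLA of any single cloud host: redundancy pushes the host tier close to 100%, and the remaining budget is dominated by the single load balancer in series. Correlated failures (for example, both hosts in one zone) would break the independence assumption and lower the real figure.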

During continuous operation, cloud resources, cloud services, events, and user applications are monitored and alarms are configured. When an alarm condition is met, the alarm is delivered to a callback function, which can carry out automated fault handling or the corresponding contingency plan and reduce manual intervention.
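The alarm-and-callback flow can be sketched as a threshold rule that invokes a callback. `AlarmRule`, the `cpu_util_percent` metric, the 90% threshold, the three-consecutive-breaches rule (to avoid firing on a single spike), and `restart_instance` are hypothetical; a cloud monitoring service would normally evaluate the rule itself and deliver the alarm to a webhook or function.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AlarmRule:
    """Fire a callback when a metric breaches its threshold several samples in a row."""
    metric: str
    threshold: float
    callback: Callable[[str, float], None]
    consecutive: int = 3  # require repeated breaches so a single spike does not fire
    _breaches: int = field(default=0, init=False)

    def observe(self, value: float) -> bool:
        """Record one sample; return True if this sample fired the alarm."""
        self._breaches = self._breaches + 1 if value > self.threshold else 0
        # Fire once when the streak reaches the limit, not on every later breach
        if self._breaches == self.consecutive:
            self.callback(self.metric, value)
            return True
        return False


actions: List[str] = []


def restart_instance(metric: str, value: float) -> None:
    """Hypothetical automated fault-handling step."""
    actions.append(f"restart triggered: {metric}={value}")


rule = AlarmRule("cpu_util_percent", threshold=90, callback=restart_instance)
for sample in [85, 95, 97, 99, 98, 70]:
    rule.observe(sample)
print(actions)  # ['restart triggered: cpu_util_percent=99']
```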