Redundant Computer Systems

This article introduces techniques used to keep computer systems available and accessible even when part of the system fails.

When you run critical systems that must be available 24 hours a day, 365 days a year, you should try to minimize the failures that can affect normal operation. Failures will happen, but there are techniques and configurations that make a system redundant, so that certain parts can fail without affecting the operation of the whole.

A modern computer system needs many components in order to work, and the more components there are, the more likely it is that something will fail. Problems can occur in the server itself (disks, power supplies, network cards, etc.) and in the infrastructure the server depends on (network components, Internet access, the electrical system, etc.).
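
The link between the number of components and the chance of failure can be sketched numerically. The snippet below is a minimal illustration, assuming that components fail independently and that the system needs every one of them to work. The function name and the 99.9% availability figure are hypothetical and do not describe any real system.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a chain in which every component must be working.

    Assumes independent failures; each value is a fraction between 0 and 1.
    """
    return prod(availabilities)


# Hypothetical figure: every component is up 99.9% of the time.
for count in (1, 5, 20):
    total = series_availability([0.999] * count)
    print(f"{count:2d} components -> system availability {total:.4f}")
```

Under these assumptions the availability of the whole chain can only go down as components are added, which is why the rest of the article concentrates on duplicating the parts most likely to fail.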

Below we discuss some of the techniques used to build redundant systems. The degree of redundancy a system needs depends on its importance and on the money lost when it is unavailable because of a failure. It is not worth investing in redundancy if the investment costs more than what a failure would cost in money, reputation and working hours.
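
The cost comparison above can be written as a simple break-even check. This is only a sketch: the function and parameter names are hypothetical, the figures in the example are invented, and it assumes that the cost of reputation and lost working hours has already been folded into a single hourly loss figure.

```python
def redundancy_pays_off(redundancy_cost_per_year,
                        expected_outage_hours_per_year,
                        loss_per_outage_hour):
    """Return True if the expected yearly loss exceeds the yearly cost of redundancy."""
    expected_loss = expected_outage_hours_per_year * loss_per_outage_hour
    return expected_loss > redundancy_cost_per_year


# Invented example: 8 hours of outage a year at 2,000 per hour,
# compared with 10,000 a year spent on redundant hardware.
print(redundancy_pays_off(10_000, 8, 2_000))
```

In practice the hard part is estimating the inputs, not the comparison itself.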

The techniques and configurations discussed below are not unique to Linux systems. The vast majority can be applied to other operating systems and platforms.

Server Component Redundancy

The components most commonly made redundant on a server are disks, network cards and power supplies. Some servers with multiple CPUs can even keep working without problems when a CPU or memory module is damaged.

Disks

Hard disks are the devices on which data is stored. The most common fault on a server is a hard disk failure. If the server has a single disk and it fails, the whole server fails and the data on it becomes inaccessible. There are techniques that minimize this problem, so that the server keeps working and loses no data even when a hard disk fails. It is also common to be able to replace failed disks without turning off the server (hot swap).

The most common technique is RAID (redundant array of independent disks). With it we create a set of redundant disks that can both increase the speed and performance of the storage system and keep the system running if a disk fails. There are software and hardware implementations and different RAID configurations, the most common being RAID1, RAID5 and RAID10.
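
To give an idea of how a parity-based level such as RAID5 can survive the loss of a disk, the sketch below rebuilds a missing block from the surviving blocks and an XOR parity block. It is a simplified illustration, assuming equal-length blocks; real arrays work on fixed-size stripes and rotate the parity across disks, which is ignored here. The helper name and the sample data are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if stored on three different disks.
data_blocks = [b"disk", b"fail", b"safe"]

# The parity block would be stored on a fourth disk.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk and rebuild its block
# from the remaining data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_blocks[1])
```

The same XOR operation is used both to compute the parity and to recover the lost block, because XOR-ing a value twice cancels it out.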

Network cards

The network card is the device that lets the server communicate with the rest of the world. It is therefore very common for servers to have at least two network cards, so that communication is not cut off if one of the cards fails.

Linux also offers a technique called 'bonding', which lets us use two or more network cards as if they were a single device, combining their capacity and providing redundancy if any of the cards fails.
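
The idea behind bonding can be modelled in a few lines. The sketch below is a conceptual model only, not the behaviour of the Linux bonding driver, whose actual behaviour depends on the bonding mode configured. The function name, interface names and link speeds are hypothetical.

```python
def bond_status(links):
    """Summarize a bonded interface built from several network cards.

    links maps an interface name to a tuple (is_up, speed_mbps).
    Returns (is_connected, usable_mbps), counting only cards that are up.
    """
    up_speeds = [speed for is_up, speed in links.values() if is_up]
    return bool(up_speeds), sum(up_speeds)


# Two hypothetical 1 Gbit/s cards; the second one has failed.
print(bond_status({"eth0": (True, 1000), "eth1": (False, 1000)}))
```

As long as one card is up the bonded device stays connected, only with less capacity.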

Power supplies

The power supply is in charge of supplying electricity to the server. It is common for servers to have two or more power supplies connected to different electrical circuits, to guarantee power if one of the power supplies or one of the circuits fails. Failed power supplies can usually be replaced without turning off the server (hot swap). Other system components such as routers, switches and disk enclosures usually use the same redundancy technique.

Redundancy in electrical supply

Every electrical component, and a server is no exception, needs a constant supply of electricity to operate. Failures in this supply, even for very short periods, can have catastrophic consequences for our system. We do not only need a constant supply; we also need to avoid abrupt surges and dips that can damage electronic components.

To achieve this, different components can be used, depending on the degree of protection required:

  • UPS (uninterruptible power supply): more or less advanced batteries connected between the server and the power source. They guarantee a constant and stable supply for a limited time, depending on their capacity
  • Electric generators: generally diesel powered and connected between the UPS and the electrical grid. They only start up when the supply has been cut for longer than a certain time, and they can supply electricity indefinitely as long as there is fuel in the tank
  • Independent supply lines: large data centers usually have at least two separate and independent connections to the electrical grid
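
How the UPS and the generator work together can be estimated with a rough calculation: the UPS has to carry the load for at least as long as the generator takes to start. The sketch below is a back-of-the-envelope model with hypothetical names and figures. It assumes a fixed conversion efficiency and a constant load, and ignores battery ageing and discharge behaviour.

```python
def ups_runtime_minutes(battery_wh, load_w, efficiency=0.9):
    """Rough UPS runtime: usable stored energy divided by the load."""
    return battery_wh * efficiency / load_w * 60


def ups_bridges_generator(battery_wh, load_w, generator_start_minutes,
                          efficiency=0.9):
    """True if the UPS can carry the load until the generator takes over."""
    runtime = ups_runtime_minutes(battery_wh, load_w, efficiency)
    return runtime >= generator_start_minutes


# Hypothetical figures: 1000 Wh battery, 500 W load,
# generator needs 5 minutes to start and stabilize.
print(ups_bridges_generator(1000, 500, 5))
```

A real sizing exercise should rely on the manufacturer's runtime tables; this only illustrates the relationship between capacity, load and runtime.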

If we want redundancy in the electrical system, it goes without saying that it is not only the servers that need dual connections: routers, switches and ultimately any component of the system that uses electricity should have redundant power supplies, each one connected. As the saying goes, your system will only be as safe, stable and redundant as its weakest component. It has happened more than once, for example, that groups of servers in a data center with redundancy at every level were cut off because they were connected to a switch that failed for lack of a redundant power supply.

Redundancy in network components

It is no use having servers with duplicate, redundant components and a constant, stable electrical supply if one of the network components fails and we cannot reach the server.

The most common components in a network are:

  • Router: a device that interconnects network segments or entire networks
  • Switch: a device that interconnects two or more network segments
  • NIC or network card: an electronic device that allows a DTE (Data Terminal Equipment), such as a computer or printer, to access a network and share resources
  • Network cables: used to interconnect the different components; there are many types, the most common being twisted pair cable and optical fiber
  • Connection lines: links to a wide area network, WAN (e.g. the Internet)

Any of these components may fail and leave the system cut off. There are techniques to prevent this: the usual approach is to configure the network so that there are at least two different paths between any two components A and B. The following diagram shows how to configure a network with double redundancy from the server to the Internet. With this layout a router, a switch and a network card can all fail at the same time without connectivity being lost. The same scheme can be extended to triple or quadruple redundancy of the components.
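
The "at least two paths" rule can be checked mechanically. The sketch below models a network as a list of links and looks for components whose failure alone would cut the server off from the Internet. The topology and all component names are hypothetical, chosen to resemble a doubly redundant layout; they are not taken from the diagram.

```python
from collections import deque


def is_reachable(links, source, target, failed=frozenset()):
    """Breadth-first search over undirected links, skipping failed components."""
    if source in failed or target in failed:
        return False
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(links, source, target):
    """Components whose failure alone disconnects source from target."""
    components = {node for link in links for node in link} - {source, target}
    return sorted(c for c in components
                  if not is_reachable(links, source, target, {c}))


# Hypothetical doubly redundant layout: two cards, two switches, two routers.
LINKS = [
    ("server", "nic1"), ("server", "nic2"),
    ("nic1", "switch1"), ("nic2", "switch2"),
    ("switch1", "router1"), ("switch1", "router2"),
    ("switch2", "router1"), ("switch2", "router2"),
    ("router1", "internet"), ("router2", "internet"),
]

print(single_points_of_failure(LINKS, "server", "internet"))
print(is_reachable(LINKS, "server", "internet",
                   failed={"nic1", "switch1", "router1"}))
```

In this particular layout the simultaneous loss of one card, one switch and one router is survivable when the failed card and the failed switch sit on the same path; whether other combinations are survivable depends on how the cards are cabled to the switches.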

Server redundancy, load balancing

What happens if the power supply and the network both work, but the server itself fails in a way that none of its redundant components can prevent? There are configurations with multiple servers, called clusters, that help with this problem. There are different types of cluster, but one of the most common is load balancing with fault tolerance. In this type of cluster it does not matter if one or several servers stop working, and if we need more resources to provide a service, we can add new servers to increase the processing capacity of the cluster.

The most important components of this type of cluster are the storage system shared by all the servers that provide a service, and the load balancer, which can be dedicated hardware or software running on an ordinary server. The most important Linux project in this area is Linux Virtual Server (LVS).
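
What a load balancer with fault tolerance does can be sketched with a simple round-robin scheduler that skips servers marked as down. This is a conceptual illustration with hypothetical class and server names; it is not how LVS is implemented or configured, and it leaves out health checking, which a real balancer would perform itself.

```python
class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the current position.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
balancer.mark_down("web2")
print([balancer.pick() for _ in range(4)])
```

Adding capacity to the cluster corresponds to adding a name to the server list; a failed server is simply skipped until it is marked as up again.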

Below are some examples of how these clusters can be organized so that the failure of a server does not stop the service. When one or more servers in the cluster fail, processing capacity is reduced, so it is important to always keep some unused capacity so that response times do not degrade much in the event of a failure.

An example of a load-balanced cluster connected to a disk array for storing the data. Typically used for file and web servers.
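
The advice about keeping unused capacity in a cluster can be turned into a quick check of how many servers may fail before the remaining ones can no longer carry the peak load. The sketch assumes identical servers and a load that spreads evenly; the function name and the figures are hypothetical.

```python
from math import ceil


def tolerated_failures(server_count, capacity_per_server, peak_load):
    """Number of servers that can fail while the rest still cover peak_load."""
    servers_needed = ceil(peak_load / capacity_per_server)
    return max(server_count - servers_needed, 0)


# Hypothetical cluster: 4 servers handling 100 requests/s each,
# with a peak load of 250 requests/s.
print(tolerated_failures(4, 100, 250))
```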

Software Development

LTD “Fraiteg” has extensive experience in developing both conventional software and large cross-platform, cloud-based client-server software.

Supported operating systems: Windows, Linux, MacOS, Android, iOS, SmartTV.

Supported devices: all computers, mobile devices and TVs; writing programs for other devices can also be considered.

To develop software is to build it starting from nothing more than its description. This is a very good reason to regard software development as an engineering activity. At a more general level, the relationship between software and its environment is clear: software is introduced into the world in order to cause certain effects in it.

The parts of the world that affect the software and are affected by it make up the application domain. This is where users or customers will see whether the software has fulfilled its purpose.

One of the biggest shortcomings in the practice of software construction is how little attention is given to discussing the problem. Developers generally focus on the solution and leave the problem unexplored, so the problem being solved has to be deduced from its solution.

This solution-oriented approach can work in fields where all the problems are well known, classified and investigated, and where innovation consists of finding new solutions to old problems.

But software development is not such a field. The versatility of computers and their rapid evolution mean that the repertoire of problems is constantly changing, and solving them in software is of great importance.

Software Development

Many people are involved in developing software. The client is the one who has a problem in their company and wants it solved. The systems analyst is in charge of gathering all the client's requirements and needs and passing them on to the programmers, who are responsible for designing and coding the system, then testing it and installing it for the client. Several people take part because a single person could not identify everything that is needed and would almost certainly miss some requirement or some part of the new system; the more people are involved, the better the system's requirements are covered.

Process

The software development process is shown graphically above; a brief explanation of it follows.

The first step of the process is analysis. Here the analyst gets in touch with the company to see how it is organized, what it does and what activities it carries out, getting to know the company in general in order to later identify the needs or requirements it has at that time and analyze them.

It is important to know the company's requirements, because systems are often developed without thinking about the client, and the result is a system that does not meet the needs of the company. Based on the requirements, the relational diagram is drawn up; everything must follow a logical sequence of activities. All of this is done by hand to see what the logical design and the screen design will look like. It is in this step that everything is captured and the way the system's functionality will work is fully defined.

The second step is design. This covers the design of the whole system, i.e. the screens and the database. All of this must comply with certain standards, which are taken into account in order to produce a quality design and offer a user-friendly design in terms of button sizes, text boxes, etc.

The third step is coding. This is where the programmer writes all of the system's code. How this is done depends on each programmer, since each has their own background and way of working, but all must reach the same goal: providing the system's functionality while adhering to the customer's specifications.

The fourth step is testing. As the name says, this is where the system is tested in order to find the errors it produces and improve it by eliminating all the errors a program can present; the fewer the errors, the higher the quality it can achieve.