Linx Network – Network Services

Case

Upgrading the network infrastructure for a large enterprise

How to upgrade networking equipment at a large enterprise without stopping production? 

Oleg Fedorov, Project Manager at Linxdatacenter, describes a large-scale project carried out in "open-heart surgery" mode.



Over the past few years, we have seen increased customer demand for services related to the network component of IT infrastructure. The need to connect IT systems, services, and applications, together with monitoring and operational business management tasks in almost every industry, is forcing companies to pay closer attention to their networks.

Requests range from ensuring network resiliency to creating and managing a client's autonomous system: purchasing IP address blocks, setting up routing protocols, and managing traffic according to the organization's policies.

Demand is also growing for integrated solutions for building and maintaining network infrastructure, primarily from customers whose network is either being built from scratch or is obsolete and requires major modification.

This trend coincided with a period in which Linxdatacenter's own network infrastructure was growing and becoming more complex. We expanded our presence in Europe by connecting to remote sites, which in turn required improvements to our network.

The company has launched a new service for clients, Network-as-a-Service: we take care of all of our clients' networking needs, allowing them to focus on their core business.

In the summer of 2020 we completed the first big project in this area, and that is the one I would like to tell you about.

At the start 

A large industrial complex approached us to upgrade the network part of the infrastructure at one of its enterprises. It was necessary to replace old equipment with new equipment, including the network core.

The last equipment upgrade at the company took place about 10 years ago. The new management of the company decided to improve connectivity, starting with an upgrade of the infrastructure at the most basic, physical level. 

The project was divided into two parts: upgrading the server fleet and upgrading the network equipment. We were responsible for the second part.

The basic requirement was to minimize downtime of the company's production lines during the work, and in some areas to eliminate it altogether. Any stoppage is a direct monetary loss for the client and had to be avoided at all costs. Because the facility operates 24x7x365 and the company had never practiced planned downtime, we were in effect asked to perform open-heart surgery. This was the main distinguishing feature of the project.

Let's go

We planned the work to move from the network nodes farthest from the core toward those closer to it, and from the nodes that affect the production lines least toward those that affect them directly.

Take a network node in the sales department, for example: a connection failure caused by work there will not affect production. At the same time, such an incident lets us, as the contractor, check whether our approach to working on these nodes is correct and adjust it before moving on to the next stages of the project.

The task was not only to replace the nodes and wires in the network, but also to configure all the components properly so that the solution worked correctly as a whole. It was the configurations that we tested this way: by starting far from the core, we gave ourselves the "right to make a mistake" without putting the critical areas of the enterprise at risk.

We identified the areas that do not affect the production process as well as the critical ones: shops, the loading and unloading unit, warehouses, and so on. For the key areas, we agreed with the client on the permissible downtime for each network node individually, from 1 to 15 minutes. Disconnecting individual network nodes could not be avoided entirely, because the cables had to be physically moved from the old equipment to the new. In the process we also had to untangle the "beard" of wires that had grown over several years of operation without proper maintenance (one of the consequences of outsourcing cable installation work).
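The ordering principle described above can be sketched in a few lines of Python. This is only an illustration: the Cabinet record, the hops_to_core metric, and the cabinet names are assumptions made for the example and do not come from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cabinet:
    name: str
    hops_to_core: int         # assumed metric: switch hops between the cabinet and the core
    affects_production: bool  # True for shops, warehouses, loading units, etc.


def work_order(cabinets):
    """Non-production cabinets first; within each group, farthest from the core first."""
    return sorted(cabinets, key=lambda c: (c.affects_production, -c.hops_to_core))


# Hypothetical cabinets, invented for the example.
cabinets = [
    Cabinet("warehouse", 2, True),
    Cabinet("sales", 3, False),
    Cabinet("assembly-shop", 1, True),
    Cabinet("accounting", 2, False),
]

order = [c.name for c in work_order(cabinets)]
print(order)
```

With this key, mistakes made early land on nodes where they cost the least, and the cabinets closest to the core are handled last.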
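A per-node downtime budget of this kind is easy to track in code. The sketch below is a hypothetical illustration: the node names and all the minute values are invented, and only the 1 to 15 minute range comes from the text.

```python
def over_budget(budget_min, measured_min):
    """Return {node: (measured, budget)} for nodes whose downtime exceeded the agreed budget."""
    return {
        node: (measured, budget_min[node])
        for node, measured in measured_min.items()
        if measured > budget_min[node]
    }


# Invented figures: agreed budgets (1 to 15 minutes) and measured downtime per node.
budget_min = {"sales": 15, "warehouse": 5, "assembly-shop": 1}
measured_min = {"sales": 9, "warehouse": 6, "assembly-shop": 0}

print(over_budget(budget_min, measured_min))
```

An empty result means every node stayed within its budget. Any entry that does appear is a deviation that would have to be agreed with the client.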

The work was divided into several stages.

Stage 1 - Audit. Preparation and approval of the work planning approach and assessment of the readiness of the teams: the client, the contractor performing the installation, and our team.

Stage 2 - Development of a format for the work, with detailed analysis and planning. We chose a checklist format with the exact sequence of actions, down to the order in which patch cords are switched on the ports.

Stage 3 - Conducting work in cabinets that do not affect production. Estimating and adjusting downtime for subsequent work steps.

Stage 4 - Carrying out work in cabinets directly affecting production. Estimating and adjusting downtime for the final stage of work.

Stage 5 - Performing work in the server room to switch the remaining hardware. Starting routing on the new core.

Stage 6 - Sequentially switching the system core from the old network configurations to the new ones for a smooth transition of the entire system (VLANs, routing, etc.). At this stage we connected all users and moved all services to the new hardware, checked that the connections were correct, and made sure that none of the enterprise's services had stopped. By this point, any problems that arose could only be related to the core itself, which made troubleshooting and final configuration easier.

Wire beard hairstyle

The project was also made harder by the difficult initial conditions.

First, there was a huge number of network nodes and segments, with a confusing topology and an equally confusing classification of wires by purpose. These "beards" had to be pulled out of the cabinets and painstakingly "combed" to figure out which wire led from where to where.

It looked something like this: [photos of the tangled cabling in the cabinets]

Second, for each such task we had to prepare a file describing the process: "Take wire X from port 1 of the old equipment, plug it into port 18 of the new equipment." It sounds simple, but when you start with 48 fully occupied ports and downtime is not an option (remember the 24x7x365), the only way out is to work in blocks. The more wires you can pull out of the old equipment at one time, the faster you can comb them and plug them into the new network hardware, avoiding network failures and downtime.

Therefore, during the preparatory phase, we broke the network down into blocks, each belonging to a particular VLAN. Each port (or group of ports) on the old equipment maps to one of the VLANs in the new network topology. We grouped them as follows: the first ports of the switch carry user networks, the middle ports carry production networks, and the last ports carry access points and uplinks.

This approach made it possible to pull out and comb 10-15 wires from the old equipment in one go, which sped up the work several times over.
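A process file like this can be generated and sanity-checked from a simple port map. The sketch below is a hypothetical illustration rather than the tooling used on the project: the Move record, the validate and render helpers, and the sample cables are all invented, and the 48-port limit is taken from the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    cable: str
    old_port: int
    new_port: int


def validate(moves, port_count=48):
    """Return a list of problems: a cable or port used twice, or a port out of range."""
    problems = []
    for field in ("cable", "old_port", "new_port"):
        seen = set()
        for m in moves:
            value = getattr(m, field)
            if value in seen:
                problems.append(f"duplicate {field}: {value}")
            seen.add(value)
    for m in moves:
        for port in (m.old_port, m.new_port):
            if not 1 <= port <= port_count:
                problems.append(f"{m.cable}: port {port} out of range")
    return problems


def render(moves):
    """Turn the moves into numbered checklist lines for the installers."""
    return [
        f"{i}. Take cable {m.cable} from port {m.old_port} of the old switch, "
        f"plug it into port {m.new_port} of the new switch."
        for i, m in enumerate(moves, 1)
    ]


moves = [Move("X", 1, 18), Move("Y", 2, 19)]  # invented sample data
assert validate(moves) == []
print("\n".join(render(moves)))
```

Checking the map before anyone touches a cable catches the simplest mistakes, such as two cables assigned to the same new port, while the old equipment is still in service.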
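The block breakdown can be pictured as grouping ports by VLAN and capping each block at the number of wires that can be handled in one go. This is a minimal sketch under assumed data: the port-to-VLAN split is invented, and the cap of 15 is taken from the 10-15 wires mentioned above.

```python
from collections import defaultdict


def vlan_blocks(port_vlan, max_block=15):
    """Group ports by VLAN, then split each group into blocks of at most max_block ports."""
    by_vlan = defaultdict(list)
    for port in sorted(port_vlan):
        by_vlan[port_vlan[port]].append(port)
    blocks = []
    for vlan, ports in by_vlan.items():
        for i in range(0, len(ports), max_block):
            blocks.append((vlan, ports[i:i + max_block]))
    return blocks


# Invented layout for a 48-port switch: users first, production in the middle,
# access points and uplinks last.
port_vlan = {}
port_vlan.update({p: "users" for p in range(1, 21)})
port_vlan.update({p: "production" for p in range(21, 41)})
port_vlan.update({p: "ap-uplinks" for p in range(41, 49)})

blocks = vlan_blocks(port_vlan)
for vlan, ports in blocks:
    print(vlan, ports[0], "-", ports[-1], f"({len(ports)} cables)")
```

Each block can then be pulled, combed, and reconnected as a unit, with the downtime of that block checked against the budget agreed for the node.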

By the way, this is what the wires in the cabinets look like after combing: [photos of the tidied cabling in the cabinets]

After the completion of Phase 2, we took a break to analyze the errors and the progress of the project. For example, small defects surfaced immediately because of inaccuracies in the network diagrams provided to us: the wrong connector on a diagram meant the wrong patch cord was purchased and had to be replaced.

The pause was necessary because, when working in the server room, even a small disruption to the process was unacceptable. If the target was to keep downtime for a section of the network within 5 minutes, it could not be exceeded. Any possible deviation from the schedule had to be agreed with the client.

However, the advance planning and the breakdown of the project into blocks made it possible to stay within the planned downtime at all sites, and in most cases to avoid downtime altogether.

The Challenge of Time - a project under COVID 

It was not without additional complications, though. One of the obstacles, of course, was the coronavirus.

The pandemic complicated the work: not all the specialists involved could be present at the customer site. Only employees of the installation contractor were allowed on site, and the work was monitored through a Zoom room attended by a network engineer from Linxdatacenter, myself as project manager, the customer's network engineer responsible for the work, and the team performing the installation.

During the work, unforeseen problems arose and we had to make adjustments on the fly. This let us quickly head off the effects of human error (errors in the diagrams, errors in determining whether an interface was active, etc.).

Although the remote format of the work seemed unusual at the beginning of the project, we quickly adapted to the new conditions and reached the final stage of work. 

We brought up a temporary network configuration that ran the two network cores, old and new, in parallel in order to make a smooth transition. However, it turned out that one extra line had not been removed from the configuration file of the new core, and the transition did not take place. We spent some time looking for the problem.

It turned out that the main traffic was transmitted correctly, while the control traffic was not reaching the node through the new core. Thanks to the clear division of the project into stages, it was possible to quickly identify the network section where the problem arose, identify the problem and fix it. 

And as a result

Technical results of the project 

First of all, we created a new core for the enterprise network and built physical and logical rings around it, so that every switch in the network has a second uplink. In the old network, many switches were connected to the core by a single path (one uplink). If it broke, the switch became completely unreachable. And if several switches were connected through one uplink, a failure would take out an entire department or production line.

In the new network, even a fairly serious network incident will not, in any scenario, bring down the entire network or a significant section of it.

90% of all network equipment has been replaced, media converters (devices that convert between transmission media) have been retired, and the need for dedicated power lines has been eliminated by connecting equipment to PoE switches, which supply power over the Ethernet cables.

Also, all optical connections in the server room and in the field cabinets - at all key communication nodes - were labeled. This made it possible to prepare a topology diagram of the equipment and connections that reflects the network's actual current state.
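The difference between a single uplink and a ring can be checked mechanically: remove each link in turn and see which switches lose their path to the core. The sketch below uses two invented toy topologies and a brute-force search. It is an illustration only, and it covers link failures, not the failure of a switch itself.

```python
from collections import deque


def reachable(links, start):
    """Return the set of nodes reachable from start over undirected links (BFS)."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def single_points_of_failure(links, core="core"):
    """Map each link whose failure isolates switches to the set of switches it cuts off."""
    nodes = {n for link in links for n in link}
    result = {}
    for i, link in enumerate(links):
        remaining = links[:i] + links[i + 1:]
        lost = nodes - reachable(remaining, core)
        if lost:
            result[link] = lost
    return result


# Invented examples: a tree with single uplinks versus a ring through the core.
old = [("core", "sw1"), ("sw1", "sw2"), ("sw1", "sw3")]
ring = [("core", "sw1"), ("sw1", "sw2"), ("sw2", "sw3"), ("sw3", "core")]

print(single_points_of_failure(old))
print(single_points_of_failure(ring))
```

In the tree, losing the link between the core and sw1 cuts off all three switches. In the ring, no single link failure isolates anything, which is the property the second uplink provides.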

Network diagram

The most important result in technical terms: quite extensive infrastructure work was carried out quickly, without creating any interference in the work of the enterprise and almost unnoticed by its staff. 

Business results of the project

In my opinion, this project is interesting primarily from the organizational rather than the technical side. The main difficulty lay in planning and thinking through the steps for implementing the project tasks.

The success of the project lets us say that developing the networking area within the Linxdatacenter service portfolio was the right direction for the company. A responsible approach to project management, a sound strategy, and accurate planning allowed us to perform the work at the proper level.

The quality of the work is confirmed by the client's request that we continue providing network modernization services at its remaining sites in Russia.

