CRC Error Rate

Objectives. We are interested in studying the Undetected Error Probability (Pue) of the CRC used to compute the FCS field of the Ethernet frame. A CRC error occurs when a device (either a network device or a host connected to the network) receives an Ethernet frame with a CRC value in the FCS field that does not match the CRC value the device calculates for the frame. For arbitrary multiple errors spanning more than 16 bits, a 16-bit CRC misses at worst 1 in 2^16 of them, which is nonetheless a detection rate of over 99.99%.

The PCIe® Specification Webinar Q&A: Error Detection and Correction with FEC

By Debendra Das Sharma, PCI-SIG® Board Member

The PCI Express® (PCIe®) specification will feature two primary mechanisms to correct errors: Forward Error Correction (FEC) and Cyclic Redundancy Check (CRC). Each FLIT comprises a payload that is protected by 8 Bytes of CRC, and the payload and CRC together are protected by 6 Bytes of FEC. FEC operates on the principle of sending redundant data that the Receiver can use to correct some errors, while CRC is an error detection code. A Receiver uses the FEC to correct any errors in a FLIT, after which it applies the CRC check to the Bytes that are protected by the CRC. If a FLIT fails the CRC check, it is eventually corrected through the Link Layer Retry mechanism of PCIe. PCIe technology achieves low latency by combining a relatively low First Bit Error Rate (FBER) of 10^-6 with a lightweight, low-latency FEC that performs the initial correction. This blog provides detailed answers to questions about FEC that were asked during the PCIe Specification webinar.

  1. Will a higher Bit Error Rate (BER) such as 10^-4 provide more channel length?

We conducted extensive studies before settling on the 10^-6 First Bit Error Rate (FBER). As mentioned in the presentation, 10^-6 is a critical number for keeping the latency of FEC and Cyclic Redundancy Check (CRC) below 2ns and for keeping the bandwidth overhead to an impact of less than 2%. Another point to note is that the BER would be about an order of magnitude worse than the FBER, due to burst errors in a Lane as well as Lane-to-Lane correlation. If the FBER were relaxed, we would need a networking-style FEC, even with the retry, to keep the retry probability sufficiently low. Based on our analysis, we are confident of securing the existing channel reach at a 1E-6 FBER. For longer channels we can deploy Retimers.

Based on our experience over the last two decades, channels always improve over time. We always produce better materials with lower loss characteristics but, once we set a target FBER and deploy the FEC/CRC accordingly, that does not change over time. The FBER values are set for the life of the technology, so we needed to make the right set of trade-offs. A higher FBER might provide an extra inch or two of channel reach. However, that gain was not worth the penalty in area, performance, cost, and power, and above all the loss of the substantial segment of latency-sensitive and power-sensitive usage models. The key metrics have been met, including channel reach, even with today’s materials that are deployed in volume.

  1. How does a CRC error identify which byte has the error? 

The Cyclic Redundancy Check (CRC) evaluation happens after FEC decode and correction. Since FEC can correct errors, it has to know the exact location and magnitude of the error in order to perform the correction. Therefore, its detection ability is limited. On the other hand, CRC is deployed to detect errors irrespective of where the errors occurred. As a result, its detection ability is much stronger. Once a FLIT fails the CRC check, it will be replayed. Upon replay, the FLIT is corrected.

  1. What is the code gain of the low latency FEC used in the PCIe specification?

PCI-SIG® deploys a lightweight FEC for correction. The goal was to pay close to zero latency penalty and then rely on a very robust CRC for detection, combined with a fast link level replay to handle any errors that the FEC could not correct. As long as the replay probability of a FLIT is around 10^-6, there is no appreciable performance impact either due to the FEC latency or the replay latency in case of an uncorrected error. A combination of an FBER of 10^-6 with a three-way interleaved single symbol correct FEC gets us to this solution space. Unlike other standards, PCI-SIG does not rely on FEC alone for correction, nor do we view FEC as a means to obtain code gain in the channel. Instead, we leverage a combination of FEC correction and CRC detection that results in a replay that effectively corrects.

  1. Why does FEC force the move to the use of FLITs?

FEC works on a fixed number of symbols. If the code size were dynamically variable, we would need some kind of framing token, with its own independent FEC protection, to say how many symbols the next FEC code covers. However, this would result in a very inefficient interconnect. Once we decided on a fixed number of symbols protected by FEC, it was easy to move to FLITs since they are of fixed size. The FLIT is the basic unit of transfer, within which there can be variable-sized transactions, data link payload, etc.

  1. What frequency was adopted to keep FEC latency within 2ns?

The Link frequency is 64 GT/s. The FEC logic can be run at any frequency. In general, we expect the logic to run at around 1G (slower or faster clocks, such as 2G, are also possible) and easily reach a latency much better than 2ns. We have run the logic at 1G and could perform the decode and correction in one clock cycle.

  1. Can the FEC be bypassed if the link is running at lower data rates?

The FEC can be bypassed at lower data rates and still result in a robust, operational Link. As the PCIe specification is finalized, PCI-SIG will decide whether it is beneficial to create the complexity associated with a different mode for the lower data rates while in FLIT Mode.

  1. Given that the effective BER with the FEC is still worse than 10^-12 (ten to the power -12), will that be an issue?

PCI-SIG does not expect an issue since we have link level retry that will correct the error. It is true that the probability of retry of a FLIT is about three orders of magnitude worse than in the prior generations of PCIe specifications with NRZ signaling. However, as long as the retry probability per FLIT is in the range of 10^-6 and the round-trip retry latency is in the ns range, we do not expect to see any noticeable performance impact. We are operating on the principle that it is better to keep the latency identical to prior generations and take the retry latency hit with a probability of 10^-6 than to add extra latency to each and every FLIT.

  1. What happens to latency when the FEC cannot successfully correct the error and how often this is expected to happen?

When the FEC process cannot successfully correct, the CRC evaluation will detect the error. A negative acknowledgement (NAK) will be issued to the Link Partner, which will then retry the same FLIT from its replay (or retry) buffer. We expect the probability of this event to be in the range of 10^-6 and the round-trip retry latency to be in the ns range.

When the FLIT is correctly received, either the first time or after one or more retries, the Port sends an Ack to its Link Partner, which then retires the FLIT from its replay buffer.
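
The Ack/NAK handling described above can be illustrated with a minimal sketch. The `ReplayBuffer` class, its method names, and the use of integer sequence numbers are assumptions made for illustration only; they are not taken from the PCIe specification, which defines its own sequencing and acknowledgement rules.

```python
from collections import OrderedDict


class ReplayBuffer:
    """Hypothetical sender-side replay buffer (illustrative, not the PCIe design)."""

    def __init__(self):
        # Sent FLITs that have not been acknowledged yet, in send order.
        self._pending = OrderedDict()

    def send(self, seq, flit):
        # Keep a copy of every transmitted FLIT until the Link Partner acks it.
        self._pending[seq] = flit
        return flit

    def on_nak(self, seq):
        # The receiver's CRC check failed: retransmit the same stored FLIT.
        return self._pending[seq]

    def on_ack(self, seq):
        # The receiver got the FLIT intact: retire it from the buffer.
        self._pending.pop(seq, None)

    def outstanding(self):
        # Sequence numbers still awaiting an Ack.
        return list(self._pending)
```

In this sketch a NAK never removes anything from the buffer, so the same FLIT can be retried more than once; only an Ack frees its slot.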

Dive Deeper Into the PCIe Specification

The recording of the PCIe Specification webinar is available to watch anytime on the PCI-SIG YouTube channel. Also, this series of Q&A blogs will continue to provide answers to the questions asked by attendees during the live presentation. Follow PCI-SIG on Twitter and LinkedIn for updates about these blogs.

remainder = CRC

Our CRC word is simply the remainder, i.e., the result of the last 6-bit exclusive OR operation. Of course, the leading bit of this result is always 0, so we really only need the last five bits. This is why a 6-bit key word leads to a 5-bit CRC. When I transmit the message word M I will also send the corresponding CRC word. When you receive them you can repeat the above calculation on M with our agreed generator polynomial k and verify that the resulting remainder agrees with the CRC word I included in my transmission.

What we've just done is a perfectly fine CRC calculation, and many actual implementations work exactly that way, but there is one potential drawback in our method. As you can see, the computation described above totally ignores any number of "0"s ahead of the first "1" bit in the message. It so happens that many data strings in real applications are likely to begin with a long series of "0"s, so it's a little bothersome that the algorithm isn't working very hard in such cases. To avoid this "problem", we can agree in advance that before computing our n-bit CRC we will always begin by exclusive ORing the leading n bits of the message string with a string of n "1"s.

That's really all there is to computing a CRC, and many commercial applications work exactly as we've described. People sometimes use various table-lookup routines to speed up the divisions, but that doesn't alter the basic computation or change the result. In addition, people sometimes agree to various non-standard conventions, such as interpreting the bits in reverse order, or carrying out the division with a string of filler bits appended to the end of the message, but the essential computation is still the same. (Of course, it's crucial for the transmitter and receiver to agree in advance on any unusual conventions they intend to observe.)
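
The division described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `crc_remainder`, the bit-string representation, and the `preset` flag (which models the optional "XOR the leading n bits with ones" convention) are assumptions introduced here.

```python
def crc_remainder(message: str, key: str, preset: bool = False) -> str:
    """Remainder of `message` divided by `key` using modulo-2 arithmetic.

    Both arguments are strings of '0'/'1'. The key must start with '1'.
    The result has len(key) - 1 bits (5 bits for a 6-bit key).
    """
    n = len(key) - 1
    bits = [int(b) for b in message]
    if preset:
        # Optional convention: flip the leading n bits so that leading
        # zeros in the message still influence the result.
        for i in range(min(n, len(bits))):
            bits[i] ^= 1
    key_bits = [int(b) for b in key]
    # Slide the key along the message. Wherever the current leading bit
    # is 1, exclusive-OR the key into the message at that position.
    for i in range(len(bits) - n):
        if bits[i]:
            for j, k in enumerate(key_bits):
                bits[i + j] ^= k
    # Whatever is left in the last n positions is the remainder.
    return "".join(str(b) for b in bits[-n:])
```

For example, `crc_remainder("11010011101100000", "1011")` returns `"100"`. The receiver runs the same function on the received message and compares the result with the transmitted CRC word.
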
Now that we've seen how to compute CRC's for a given key polynomial, it's natural to wonder whether some key polynomials work better (i.e., give more robust "checks") than others. From one point of view the answer is obviously yes, because the larger our key word, the less likely it is that corrupted data will go undetected. By appending an n-bit CRC to our message string we are increasing the total number of possible strings by a factor of 2^n, but we aren't increasing the degrees of freedom, since each message string has a unique CRC word. Therefore, we have established a situation in which only 1 out of 2^n total strings (message+CRC) is valid. Notice that if we append our CRC word to our message word, the result is a multiple of our generator polynomial. Thus, of all possible combined strings, only multiples of the generator polynomial are valid.

So, if we assume that any corruption of our data affects our string in a completely random way, i.e., such that the corrupted string is totally uncorrelated with the original string, then the probability of a corrupted string going undetected is 1/(2^n). This is the basis on which people say a 16-bit CRC has a probability of 1/(2^16) = 1.5E-5 of failing to detect an error in the data, and a 32-bit CRC has a probability of 1/(2^32), which is about 2.3E-10 (less than one in a billion).

Since most digital systems are designed around blocks of 8-bit words (called "bytes"), it's most common to find key words whose lengths are a multiple of 8 bits. The two most common lengths in practice are 16-bit and 32-bit CRCs (so the corresponding generator polynomials have 17 and 33 bits respectively). A few specific polynomials have come into widespread use.
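
The "1 out of 2^n" counting argument can be checked exhaustively for a small case. The sketch below assumes the 6-bit key 100101 (the polynomial x^5 + x^2 + 1) and an arbitrarily chosen total string length of 12 bits; the helper name `mod2_remainder` is introduced here for illustration.

```python
def mod2_remainder(value: int, key: int) -> int:
    """Remainder of `value` divided by `key`, both read as polynomials mod 2."""
    shift = value.bit_length() - key.bit_length()
    while shift >= 0:
        # Cancel the current leading term of `value` with a shifted key.
        value ^= key << shift
        shift = value.bit_length() - key.bit_length()
    return value


KEY = 0b100101          # x^5 + x^2 + 1, so n = 5
TOTAL_BITS = 12         # assumed length of message plus CRC

# Count every 12-bit string that is an exact multiple of the key.
valid = sum(1 for v in range(1 << TOTAL_BITS) if mod2_remainder(v, KEY) == 0)
fraction = valid / (1 << TOTAL_BITS)
```

Here `valid` is 128 out of 4096 strings, so `fraction` equals 1/32 = 1/(2^5), matching the claim for n = 5.
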
For 16-bit CRCs one of the most popular key words is 10001000000100001, and for 32-bit CRCs one of the most popular is 100000100110000010001110110110111. In the form of explicit polynomials these would be written as

x^16 + x^12 + x^5 + 1

and

x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1

The 16-bit polynomial is known as the "X25 standard", and the 32-bit polynomial is the "Ethernet standard", and both are widely used in all sorts of applications. (Another common 16-bit key polynomial familiar to many modem operators is 11000000000000101, which is the basis of the "CRC-16" protocol.) These polynomials are certainly not unique in being suitable for CRC calculations, but it's probably a good idea to use one of the established standards, to take advantage of all the experience accumulated over many years of use.

Nevertheless, we may still be curious to know how these particular polynomials were chosen. It so happens that one could use just about ANY polynomial of a certain degree and achieve most of the error detection benefits of the standard polynomials. For example, ANY n-bit CRC will certainly catch any single "burst" of m consecutive "flipped bits" for any m less than n, basically because a smaller polynomial can't be a multiple of a larger polynomial. Also, we can ensure the detection of any odd number of bit errors simply by using a generator polynomial that is a multiple of the "parity polynomial", which is x+1. A polynomial of our simplified kind is a multiple of x+1 if and only if it has an even number of terms. It's interesting to note that the standard 16-bit polynomials both include this parity check, whereas the standard 32-bit CRC does not. It might seem that this represents a shortcoming of the 32-bit standard, but it really doesn't, because the inclusion of a parity check comes at the cost of some other desirable characteristics.
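
As a quick check of the relationship between polynomials, key words, and the parity factor, the sketch below builds key words from lists of exponents and counts terms. The helper names `key_word` and `has_parity_factor` are introduced here for illustration only.

```python
def key_word(exponents):
    """Binary key word for a polynomial given the exponents of its terms."""
    degree = max(exponents)
    return "".join("1" if e in exponents else "0"
                   for e in range(degree, -1, -1))


def has_parity_factor(exponents):
    """True when the polynomial is a multiple of x + 1 (even number of terms)."""
    return len(set(exponents)) % 2 == 0


# x^16 + x^12 + x^5 + 1 (the "X25 standard")
X25 = [16, 12, 5, 0]
# The 32-bit "Ethernet standard" polynomial
ETHERNET = [32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0]
```

With these definitions, `key_word(X25)` is `"10001000000100001"`, `has_parity_factor(X25)` is true (four terms), and `has_parity_factor(ETHERNET)` is false (fifteen terms).
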
In particular, much emphasis has been placed on the detection of two separated single-bit errors, and the standard CRC polynomials were basically chosen to be as robust as possible in detecting such double-errors. Notice that the basic "error word" E representing two erroneous bits separated by j bits is of the form x^j + 1 or, equivalently, x^j - 1. Also, an error E superimposed on the message M will be undetectable if and only if E is a multiple of the key polynomial k. Therefore, if we choose a key that is not a divisor of any polynomial of the form x^t - 1 for t = 1, 2, ..., m, then we are assured of detecting any occurrence of precisely two erroneous bits that occur within m places of each other.

How would we find such a polynomial? For this purpose we can use a "primitive polynomial". For example, suppose we want to ensure detection of two bits within 31 places of each other. Let's factor the error polynomial x^31 - 1 into its irreducible components (using our simplified arithmetic with coefficients reduced modulo 2). We find that it splits into the factors

x^31 - 1 = (x + 1)
         * (x^5 + x^3 + x^2 + x + 1)
         * (x^5 + x^4 + x^2 + x + 1)
         * (x^5 + x^4 + x^3 + x + 1)
         * (x^5 + x^2 + 1)
         * (x^5 + x^4 + x^3 + x^2 + 1)
         * (x^5 + x^3 + 1)

Aside from the parity factor (x+1), these are all primitive polynomials, representing primitive roots of x^31 - 1, so they cannot be divisors of any polynomial of the form x^j - 1 for any j less than 31. Notice that x^5 + x^2 + 1 is the generator polynomial for the 5-bit CRC in our first example.

Another way of looking at this is via recurrence formulas. For example, the polynomial x^5 + x^2 + 1 corresponds to the recurrence relation s[n] = (s[n-3] + s[n-5]) modulo 2. Beginning with nonzero initial values, this recurrence yields a bit sequence that repeats with period 31, because x^5 + x^2 + 1 is primitive.
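
Both views can be verified numerically. The sketch below computes the smallest j for which a key divides x^j - 1, and separately measures the period of the recurrence s[n] = (s[n-3] + s[n-5]) modulo 2. The function names and the seed 0,0,0,0,1 are assumptions chosen for illustration.

```python
def order_of_x(key: int) -> int:
    """Smallest j > 0 such that `key` divides x^j - 1 (coefficients mod 2).

    Assumes the key has a constant term of 1; otherwise no such j exists.
    """
    degree = key.bit_length() - 1
    r = 1                      # r holds x^j mod key, starting from x^0
    j = 0
    while True:
        r <<= 1                # multiply by x
        if r >> degree:        # degree reached: reduce by XORing the key
            r ^= key
        j += 1
        if r == 1:
            return j


def recurrence_period(seed):
    """Period of s[n] = (s[n-3] + s[n-5]) mod 2 started from five seed bits."""
    start = tuple(seed)        # (s[n-5], s[n-4], s[n-3], s[n-2], s[n-1])
    state = start
    steps = 0
    while True:
        new_bit = (state[-3] + state[-5]) % 2
        state = state[1:] + (new_bit,)
        steps += 1
        if state == start:
            return steps
```

For the key 100101 (x^5 + x^2 + 1), `order_of_x(0b100101)` is 31, and `recurrence_period((0, 0, 0, 0, 1))` is also 31, so no two-bit error with a separation of less than 31 places can be a multiple of this key.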

Understand Cyclic Redundancy Check Errors on Nexus Switches

Introduction

This document describes details surrounding Cyclic Redundancy Check (CRC) errors observed on interface counters and statistics of Cisco Nexus switches.

Prerequisites

Requirements

Cisco recommends that you understand the basics of Ethernet switching and the Cisco NX-OS Command Line Interface (CLI). For more information, refer to one of these applicable documents:

Components Used

The information in this document is based on these software and hardware versions: 

  • Nexus series switches starting from NX-OS software release (8) 
  • Nexus series switches starting from NX-OS software release (8) 

The information in this document was created from devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.

Background Information

This document describes details surrounding Cyclic Redundancy Check (CRC) errors observed on interface counters on Cisco Nexus series switches. This document describes what a CRC is, how it is used in the Frame Check Sequence (FCS) field of Ethernet frames, how CRC errors manifest on Nexus switches, how CRC errors interact in Store-and-Forward switching and Cut-Through switching scenarios, the most likely root causes of CRC errors, and how to troubleshoot and resolve CRC errors. 

Applicable Hardware

The information in this document is applicable to all Cisco Nexus Series switches. Some of the information in this document can also be applicable to other Cisco routing and switching platforms, such as Cisco Catalyst routers and switches.

CRC Definition

A CRC is an error detection mechanism commonly used in computer and storage networks to identify data changed or corrupted during transmission. When a device connected to the network needs to transmit data, the device runs a computation algorithm based on cyclic codes against the data that results in a fixed-length number. This fixed-length number is called the CRC value, but colloquially, it is often called the CRC for short. This CRC value is appended to the data and transmitted through the network towards another device. This remote device runs the same cyclic code algorithm against the data and compares the resulting value with the CRC appended to the data. If both values match, then the remote device assumes the data was transmitted across the network without being corrupted. If the values do not match, then the remote device assumes the data was corrupted during transmission across the network. This corrupted data cannot be trusted and is discarded.

CRCs are used for error detection across multiple computer networking technologies, such as Ethernet (both wired and wireless variants), Token Ring, Asynchronous Transfer Mode (ATM), and Frame Relay. Ethernet frames have a 32-bit Frame Check Sequence (FCS) field at the end of the frame (immediately after the payload of the frame) where a 32-bit CRC value is inserted. 

For example, consider a scenario where two hosts named Host-A and Host-B are directly connected to each other through their Network Interface Cards (NICs). Host-A needs to send the sentence “This is an example” to Host-B over the network. Host-A crafts an Ethernet frame destined to Host-B with a payload of “This is an example” and calculates that the CRC value of the frame is a hexadecimal value of 0xABCD. Host-A inserts the CRC value of 0xABCD into the FCS field of the Ethernet frame, then transmits the Ethernet frame out of Host-A's NIC towards Host-B. When Host-B receives this frame, it will calculate the CRC value of the frame with the use of the exact same algorithm as Host-A. Host-B calculates that the CRC value of the frame is a hexadecimal value of 0xABCD, which indicates to Host-B that the Ethernet frame was not corrupted while the frame was transmitted to Host-B. 

CRC Error Definition

A CRC error occurs when a device (either a network device or a host connected to the network) receives an Ethernet frame with a CRC value in the FCS field of the frame that does not match the CRC value calculated by the device for the frame. 

This concept is best demonstrated through an example. Consider a scenario where two hosts named Host-A and Host-B are directly connected to each other through their Network Interface Cards (NICs). Host-A needs to send the sentence “This is an example” to Host-B over the network. Host-A crafts an Ethernet frame destined to Host-B with a payload of “This is an example” and calculates that the CRC value of the frame is the hexadecimal value 0xABCD. Host-A inserts the CRC value of 0xABCD into the FCS field of the Ethernet frame, then transmits the Ethernet frame out of Host-A's NIC towards Host-B.

However, damage on the physical media connecting Host-A to Host-B corrupts the contents of the frame such that the sentence within the frame changes to “This was an example” instead of the desired payload of “This is an example”. 

When Host-B receives this frame, it will calculate the CRC value of the frame including the corrupted payload. Host-B calculates that the CRC value of the frame is a hexadecimal value of 0xDEAD, which is different from the 0xABCD CRC value within the FCS field of the Ethernet frame. This difference in CRC values tells Host-B that the Ethernet frame was corrupted while the frame was transmitted to Host-B. As a result, Host-B cannot trust the contents of this Ethernet frame, so it will drop it. Host-B will usually increment some sort of error counter on its Network Interface Card (NIC) as well, such as the “input errors”, “CRC errors”, or “RX errors” counters. 
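
The Host-A and Host-B exchange can be mimicked with Python's standard `zlib.crc32`, which uses the same 32-bit generator polynomial as the Ethernet FCS. This is only a sketch: a real FCS covers the whole frame (addresses and type as well as payload) and is computed by the NIC, and the values 0xABCD and 0xDEAD above are illustrative placeholders rather than real CRC results. The helper names `fcs` and `receiver_accepts` are assumptions.

```python
import zlib


def fcs(data: bytes) -> int:
    """CRC-32 of `data` as an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF


def receiver_accepts(data: bytes, fcs_field: int) -> bool:
    """Recompute the CRC on receipt and compare it with the FCS field."""
    return fcs(data) == fcs_field


# Host-A computes the CRC and places it in the FCS field.
sent_payload = b"This is an example"
fcs_field = fcs(sent_payload)

# Host-B receives either the intact payload or a corrupted one.
intact = receiver_accepts(b"This is an example", fcs_field)
corrupted = receiver_accepts(b"This was an example", fcs_field)
```

Here `intact` is true and `corrupted` is false, which is the point at which a real NIC would drop the frame and increment an error counter.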

Common Symptoms of CRC Errors

CRC errors typically manifest themselves in one of two ways: 

  1. Incrementing or non-zero error counters on interfaces of network-connected devices.
  2. Packet/Frame loss for traffic traversing the network due to network-connected devices dropping corrupted frames.

These errors manifest themselves in slightly different ways depending on the device you are working with. These sub-sections go into detail for each type of device. 

Received Errors on Windows Hosts

CRC errors on Windows hosts typically manifest as a non-zero Received Errors counter displayed in the output of the netstat -e command from the Command Prompt. An example of a non-zero Received Errors counter from the Command Prompt of a Windows host is here: 

>netstat -e
Interface Statistics 

                           Received            Sent 
Bytes                           
Unicast packets                    
Non-unicast packets               0               0 
Discards                          0               0 
Errors                                       0 
Unknown protocols                 0 

The NIC and its respective driver must support accounting of CRC errors received by the NIC in order for the number of Received Errors reported by the netstat -e command to be accurate. Most modern NICs and their respective drivers support accurate accounting of CRC errors received by the NIC.

RX Errors on Linux Hosts 

CRC errors on Linux hosts typically manifest as a non-zero “RX errors” counter displayed in the output of the ifconfig command. An example of a non-zero RX errors counter from a Linux host is here: 

ifconfig eth0
eth0: flags=<UP,BROADCAST,RUNNING,MULTICAST>  mtu  
        inet   netmask   broadcast  
        inet6 fe  prefixlen 64  scopeid 0x20<link> 
        ether beb  txqueuelen   (Ethernet) 
        RX packets   bytes  ( GiB) 
        RX errors   dropped 0  overruns 0  frame 0 
        TX packets   bytes  ( GiB) 
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0 

CRC errors on Linux hosts can also manifest as a non-zero “RX errors” counter displayed in the output of ip -s link show command. An example of a non-zero RX errors counter from a Linux host is here: 

ip -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu  qdisc mq state UP mode DEFAULT group default qlen  
    link/ether f:6d brd ff:ff:ff:ff:ff:ff 
    RX: bytes  packets  errors  dropped overrun mcast 
                 0        
    TX: bytes  packets  errors  dropped carrier collsns 
    0       0       0       0 
    altname enp11s0 

The NIC and its respective driver must support accounting of CRC errors received by the NIC in order for the number of RX Errors reported by the ifconfig or ip -s link show commands to be accurate. Most modern NICs and their respective drivers support accurate accounting of CRC errors received by the NIC.
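
On Linux, the same counters can usually also be read from sysfs, which is convenient for scripted polling. The sketch below assumes the standard `/sys/class/net/<interface>/statistics/` layout with `rx_errors` and `rx_crc_errors` files; whether `rx_crc_errors` is populated depends on the NIC driver. The function name and the `sysfs_root` parameter are introduced here for illustration.

```python
from pathlib import Path


def read_rx_error_counters(interface, sysfs_root="/sys/class/net"):
    """Return the RX error counters that exist for `interface`.

    Missing files are skipped, so an unknown interface yields {}.
    """
    stats_dir = Path(sysfs_root) / interface / "statistics"
    counters = {}
    for name in ("rx_errors", "rx_crc_errors"):
        path = stats_dir / name
        if path.exists():
            # Each statistics file holds a single decimal integer.
            counters[name] = int(path.read_text().strip())
    return counters
```

Calling `read_rx_error_counters("eth0")` twice, a few seconds apart, shows whether the counters are still incrementing or merely non-zero from an earlier event.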

CRC Errors on Network Devices

Network devices operate in one of two forwarding modes: Store-and-Forward forwarding mode and Cut-Through forwarding mode. The way a network device handles a received CRC error differs depending on its forwarding mode. The subsections here describe the specific behavior for each forwarding mode.

Input Errors on Store-and-Forward Network Devices

When a network device operating in a Store-and-Forward forwarding mode receives a frame, the network device will buffer the entire frame (“Store”) before it validates the frame’s CRC value, makes a forwarding decision on the frame, and transmits the frame out of an interface (“Forward”). Therefore, when a network device operating in a Store-and-Forward forwarding mode receives a corrupted frame with an incorrect CRC value on a specific interface, it will drop the frame and increment the “Input Errors” counter on the interface.

In other words, corrupt Ethernet frames are not forwarded by network devices operating in a Store-and-Forward forwarding mode; they are dropped on ingress.

Cisco Nexus and Series switches operate in a Store-and-Forward forwarding mode. An example of a non-zero Input Errors counter and a non-zero CRC/FCS counter from a Nexus or Series switch is here: 

switch# show interface
<snip> 
Ethernet1/1 is up 
  RX 
    unicast packets   multicast packets  5 broadcast packets 
    input packets   bytes 
    0 jumbo packets  0 storm suppression packets 
    0 runts  0 giants   CRC/FCS  0 no buffer 
     input error  0 short frame  0 overrun   0 underrun  0 ignored 
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop 
    0 input with dribble  0 input discard 
    0 Rx pause 

CRC errors can also manifest themselves as a non-zero “FCS-Err” counter in the output of show interface counters errors. The "Rcv-Err" counter in the output of this command will also have a non-zero value, which is the sum of all input errors (CRC or otherwise) received by the interface. An example of this is shown here: 

switch# show interface counters errors
<snip> 
 
Port          Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize OutDiscards 
 
Eth1/1                0               0               0           0 

Input and Output Errors on Cut-Through Network Devices

When a network device operating in a Cut-Through forwarding mode starts to receive a frame, the network device will make a forwarding decision on the frame's header and begin transmitting the frame out of an interface as soon as it receives enough of the frame to make a valid forwarding decision. As frame and packet headers are at the beginning of the frame, this forwarding decision is usually made before the payload of the frame is received. 

The FCS field of an Ethernet frame is at the end of the frame, immediately after the frame’s payload. Therefore, a network device operating in a Cut-Through forwarding mode will already have started transmitting the frame out of another interface by the time it can calculate the CRC of the frame. If the CRC calculated by the network device for the frame does not match the CRC value present in the FCS field, that means the network device forwarded a corrupted frame into the network. When this happens, the network device will increment two counters: 

  1. The “Input Errors” counter on the interface where the corrupted frame was originally received. 
  2. The “Output Errors” counter on all interfaces where the corrupted frame was transmitted. For unicast traffic, this will typically be a single interface – however, for broadcast, multicast, or unknown unicast traffic, this could be one or more interfaces.

An example of this is shown here, where the output of the show interface command indicates multiple corrupted frames were received on Ethernet1/1 of the network device and transmitted out of Ethernet1/2 due to the Cut-Through forwarding mode of the network device: 

switch# show interface
<snip> 
Ethernet1/1 is up 
  RX 
    unicast packets   multicast packets  0 broadcast packets 
    input packets   bytes 
    15 jumbo packets  0 storm suppression bytes 
    0 runts  0 giants   CRC  0 no buffer 
     input error  0 short frame  0 overrun   0 underrun  0 ignored 
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop 
    0 input with dribble  0 input discard 
    0 Rx pause 
  
Ethernet1/2 is up 
  TX 
    unicast packets   multicast packets   broadcast packets 
    output packets   bytes 
    jumbo packets 
     output error  0 collision  0 deferred  0 late collision 
    0 lost carrier  0 no carrier  0 babble  0 output discard 
    0 Tx pause 

CRC errors can also manifest themselves as a non-zero “FCS-Err” counter on the ingress interface and non-zero "Xmit-Err" counters on egress interfaces in the output of show interface counters errors. The "Rcv-Err" counter on the ingress interface in the output of this command will also have a non-zero value, which is the sum of all input errors (CRC or otherwise) received by the interface. An example of this is shown here: 

switch# show interface counters errors 
<snip> 
 
Port          Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize OutDiscards 
 
Eth1/1                0                0                0           0 
Eth1/2                0          0                       0          0           0  

The network device will also modify the CRC value in the frame’s FCS field in a specific manner that signifies to upstream network devices that this frame is corrupt. This behavior is known as “stomping” the CRC. The precise manner in which the CRC is modified varies from one platform to another, but generally, it involves inverting the current CRC value present in the frame's FCS field. An example of this is here: 

Original CRC: 0xABCD (1010101111001101) 
Stomped CRC:  0x5432 (0101010000110010) 

As a result of this behavior, network devices operating in a Cut-Through forwarding mode can propagate a corrupt frame throughout a network. If a network consists of multiple network devices operating in a Cut-Through forwarding mode, a single corrupt frame can cause input error and output error counters to increment on multiple network devices within your network. 
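
The difference between the two forwarding modes, including the counter updates and the stomped CRC, can be summarized in a small model. This is a simplified sketch rather than a description of any switch's actual data path: the `Port` class, the `switch_frame` function, the mode strings, and the 16-bit CRC width (chosen to match the 0xABCD example) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Per-interface error counters (illustrative)."""
    input_errors: int = 0
    output_errors: int = 0


def stomp(crc: int, width: int = 16) -> int:
    """Invert every bit of the CRC, one common way of stomping it."""
    return crc ^ ((1 << width) - 1)


def switch_frame(mode, ingress, egress, crc_ok, crc):
    """Return the CRC sent out of `egress`, or None if the frame is dropped."""
    if crc_ok:
        return crc                      # healthy frame, forwarded unchanged
    ingress.input_errors += 1           # both modes count the input error
    if mode == "store-and-forward":
        return None                     # dropped on ingress, never forwarded
    # Cut-through: the frame is already leaving, so count an output
    # error and mark the frame as corrupt by stomping its CRC.
    egress.output_errors += 1
    return stomp(crc)
```

With this model, `stomp(0xABCD)` is `0x5432`, a corrupt frame in store-and-forward mode touches only the ingress counter, and the same frame in cut-through mode increments one counter on each side of the device.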

Trace and Isolate CRC Errors

The first step in identifying and resolving the root cause of CRC errors is isolating the source of the CRC errors to a specific link between two devices within your network. One device connected to this link will have an interface output errors counter that is zero or not incrementing, while the other device connected to this link will have a non-zero or incrementing interface input errors counter. This suggests that traffic egresses the interface of one device intact, is corrupted during transmission to the remote device, and is counted as an input error by the ingress interface of the other device on the link.

Identifying this link in a network consisting of network devices operating in a Store-and-Forward forwarding mode is a straightforward task. However, identifying this link in a network consisting of network devices operating in a Cut-Through forwarding mode is more difficult, as many network devices will have non-zero input and output error counters. An example of this phenomenon can be seen in the topology here, where the link highlighted in red is damaged such that traffic traversing the link is corrupted. Interfaces labeled with a red "I" indicate interfaces that could have non-zero input errors, while interfaces labeled with a blue "O" indicate interfaces that could have non-zero output errors.

Network topology showing interfaces that could have input and output errors due to a single faulty link connecting to a host.

Identifying the faulty link requires you to recursively trace the "path" corrupted frames follow in the network through non-zero input and output error counters, with non-zero input errors pointing upstream towards the damaged link in the network. This is demonstrated in the diagram here.

Network topology showing how input errors can be traced to identify a single faulty link in a network.

A detailed process for tracing and identifying a damaged link is best demonstrated through an example. Consider the topology here:

Network topology showing two hosts connected through two switches in a series.

In this topology, interface Ethernet1/1 of a Nexus switch named Switch-1 is connected to a host named Host-1 through Host-1's Network Interface Card (NIC) eth0. Interface Ethernet1/2 of Switch-1 is connected to a second Nexus switch, named Switch-2, through Switch-2's interface Ethernet1/2. Interface Ethernet1/1 of Switch-2 is connected to a host named Host-2 through Host-2's NIC eth0.

The link between Host-1 and Switch-1 through Switch-1's Ethernet1/1 interface is damaged, causing traffic that traverses the link to be intermittently corrupted. However, we do not yet know that this link is damaged. We must trace the path the corrupted frames leave in the network through non-zero or incrementing input and output error counters to locate the damaged link in the network.

In this example, Host-2's NIC reports that it is receiving CRC errors.

Host-2$ ip -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu  qdisc mq state UP mode DEFAULT group default qlen  
    link/ether f:6d brd ff:ff:ff:ff:ff:ff 
    RX: bytes  packets  errors  dropped overrun mcast 
                 0        
    TX: bytes  packets  errors  dropped carrier collsns 
    0       0       0       0 
    altname enp11s0 

You know that Host-2's NIC connects to Switch-2 via interface Ethernet1/1. You can confirm that interface Ethernet1/1 has a non-zero output errors counter with the show interface command.

Switch-2# show interface
<snip>
Ethernet1/1 is up
admin state is up, Dedicated Interface
  RX
    unicast packets  multicast packets  broadcast packets
    input packets  bytes
    0 jumbo packets  0 storm suppression bytes
    0 runts  0 giants  0 CRC  0 no buffer
    0 input error  0 short frame  0 overrun  0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  0 input discard
    0 Rx pause
  TX
    unicast packets  multicast packets  broadcast packets
    output packets  bytes
    0 jumbo packets
    output error  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble  0 output discard
    0 Tx pause

Since the output errors counter of interface Ethernet1/1 is non-zero, there is most likely another interface of Switch-2 that has a non-zero input errors counter. You can use the show interface counters errors non-zero command to identify whether any interfaces of Switch-2 have a non-zero input errors counter.

Switch-2# show interface counters errors non-zero
<snip>
Port  Align-Err  FCS-Err  Xmit-Err  Rcv-Err  UnderSize  OutDiscards
Eth1/1 0 0 0 0 0
Eth1/2 0 0 0 0
Port  Single-Col  Multi-Col  Late-Col  Exces-Col  Carri-Sen  Runts
Port  Giants  SQETest-Err  Deferred-Tx  IntMacTx-Er  IntMacRx-Er  Symbol-Err
Port  InDiscards

You can see that Ethernet1/2 of Switch-2 has a non-zero input errors counter. This suggests that Switch-2 receives corrupted traffic on this interface. You can confirm which device is connected to Ethernet1/2 of Switch-2 through the Cisco Discovery Protocol (CDP) or Link Layer Discovery Protocol (LLDP) features. An example of this is shown here with the show cdp neighbors command.

Switch-2# show cdp neighbors
<snip>
Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater,
                  V - VoIP-Phone, D - Remotely-Managed-Device,
                  s - Supports-STP-Dispute

Device-ID        Local Intrfce  Hldtme  Capability  Platform   Port ID
Switch-1(FDO)    Eth1/2                 R S I s     N9K-CYC-   Eth1/2

You now know that Switch-2 is receiving corrupted traffic on its Ethernet1/2 interface from Switch-1's Ethernet1/2 interface, but you do not yet know whether the link between Switch-1's Ethernet1/2 and Switch-2's Ethernet1/2 is damaged and causes the corruption, or if Switch-1 is a cut-through switch forwarding corrupted traffic it receives. You must log into Switch-1 to verify this.

You can check whether Switch-1's Ethernet1/2 interface has a non-zero output errors counter with the show interface command.

Switch-1# show interface
<snip>
Ethernet1/2 is up
admin state is up, Dedicated Interface
  RX
    unicast packets  multicast packets  broadcast packets
    input packets  bytes
    0 jumbo packets  0 storm suppression bytes
    0 runts  0 giants  0 CRC  0 no buffer
    0 input error  0 short frame  0 overrun  0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  0 input discard
    0 Rx pause
  TX
    unicast packets  multicast packets  72 broadcast packets
    output packets  bytes
    0 jumbo packets
    output error  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble  0 output discard
    0 Tx pause

You can see that Ethernet1/2 of Switch-1 has a non-zero output errors counter. This suggests that the link between Switch-1's Ethernet1/2 and Switch-2's Ethernet1/2 is not damaged - instead, Switch-1 is a cut-through switch forwarding corrupted traffic it receives on some other interface. As previously demonstrated with Switch-2, you can use the show interface counters errors non-zero command in order to identify if any interfaces of Switch-1 have a non-zero input errors counter.

Switch-1# show interface counters errors non-zero
<snip>
Port  Align-Err  FCS-Err  Xmit-Err  Rcv-Err  UnderSize  OutDiscards
Eth1/1 0 0 0 0
Eth1/2 0 0 0 0 0
Port  Single-Col  Multi-Col  Late-Col  Exces-Col  Carri-Sen  Runts
Port  Giants  SQETest-Err  Deferred-Tx  IntMacTx-Er  IntMacRx-Er  Symbol-Err
Port  InDiscards

You can see that Ethernet1/1 of Switch-1 has a non-zero input errors counter. This suggests that Switch-1 is receiving corrupted traffic on this interface. We know that this interface connects to Host-1's eth0 NIC. We can review Host-1's eth0 NIC interface statistics to confirm whether Host-1 sends corrupted frames out of this interface.

Host-1$ ip -s link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu  qdisc mq state UP mode DEFAULT group default qlen  
    link/ether f:6d brd ff:ff:ff:ff:ff:ff 
    RX: bytes  packets  errors dropped overrun mcast 
     0       0     0        
    TX: bytes  packets  errors  dropped carrier collsns 
    0       0       0       0 
    altname enp11s0 

The eth0 NIC statistics of Host-1 suggest the host is not transmitting corrupted traffic. This suggests that the link between Host-1's eth0 and Switch-1's Ethernet1/1 is damaged and is the source of this traffic corruption. Further troubleshooting will need to be performed on this link to identify the faulty component causing this corruption and replace it.
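
The hop-by-hop tracing in this example can be sketched as a small Python routine. It is a simplified model for illustration, not a Cisco tool. The `topology` dictionary, the counter names `in_err` and `out_err`, the function `find_faulty_link`, and all counter values are invented for this sketch. In practice the counters would come from commands such as show interface and ip -s link.

```python
# Hypothetical snapshot: device -> interface -> error counters and link peer.
# Counter values are invented; only zero versus non-zero matters here.
topology = {
    "Host-2": {"eth0": {"in_err": 478, "out_err": 0, "peer": ("Switch-2", "Eth1/1")}},
    "Switch-2": {
        "Eth1/1": {"in_err": 0, "out_err": 478, "peer": ("Host-2", "eth0")},
        "Eth1/2": {"in_err": 478, "out_err": 0, "peer": ("Switch-1", "Eth1/2")},
    },
    "Switch-1": {
        "Eth1/2": {"in_err": 0, "out_err": 478, "peer": ("Switch-2", "Eth1/2")},
        "Eth1/1": {"in_err": 478, "out_err": 0, "peer": ("Host-1", "eth0")},
    },
    "Host-1": {"eth0": {"in_err": 0, "out_err": 0, "peer": ("Switch-1", "Eth1/1")}},
}


def find_faulty_link(topology, device, interface):
    """Follow input errors upstream until the sending side shows no output errors."""
    visited = set()
    while (device, interface) not in visited:
        visited.add((device, interface))
        peer_dev, peer_if = topology[device][interface]["peer"]
        if topology[peer_dev][peer_if]["out_err"] == 0:
            # The sender transmitted clean frames, so this link corrupts them.
            return (peer_dev, peer_if), (device, interface)
        # The sender forwarded frames that were already corrupt (cut-through),
        # so look for the interface on which it received them.
        upstream = [name for name, counters in topology[peer_dev].items()
                    if name != peer_if and counters["in_err"] > 0]
        if not upstream:
            return None  # trail ends without a clear culprit
        device, interface = peer_dev, upstream[0]
    return None  # loop detected


print(find_faulty_link(topology, "Host-2", "eth0"))
```

Starting from Host-2's eth0, the sketch walks through Switch-2 and Switch-1 and stops at the link between Host-1's eth0 and Switch-1's Ethernet1/1, matching the manual trace. It follows only the first upstream interface with input errors, so a network with several damaged links would need repeated runs.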

Root Causes of CRC Errors

The most common root cause of CRC errors is a damaged or malfunctioning component of a physical link between two devices. Examples include:

  • Failing or damaged physical medium (copper or fiber) or Direct Attach Cables (DACs).
  • Failing or damaged transceivers/optics.
  • Failing or damaged patch panel ports.
  • Faulty network device hardware (including specific ports, line card Application-Specific Integrated Circuits [ASICs], Media Access Controls [MACs], fabric modules, etc.).
  • Malfunctioning network interface card inserted in a host.

It is also possible for one or more misconfigured devices to inadvertently cause CRC errors within a network. One example of this is a Maximum Transmission Unit (MTU) configuration mismatch between two or more devices within the network, which causes large packets to be incorrectly truncated. Identifying and resolving this configuration issue can correct CRC errors within a network as well.

Resolve CRC Errors

You can identify the specific malfunctioning component through a process of elimination:

  1. Replace the physical medium (either copper or fiber) or DAC with a known-good physical medium of the same type.
  2. Replace the transceiver inserted in one device's interface with a known-good transceiver of the same model. If this does not resolve the CRC errors, replace the transceiver inserted in the other device's interface with a known-good transceiver of the same model.
  3. If any patch panels are used as part of the damaged link, move the link to a known-good port on the patch panel. Alternatively, eliminate the patch panel as a potential root cause by connecting the link without using the patch panel if possible.
  4. Move the damaged link to a different, known-good port on each device. You will need to test multiple different ports to isolate a MAC, ASIC, or line card failure.
  5. If the damaged link involves a host, move the link to a different NIC on the host. Alternatively, connect the damaged link to a known-good host to isolate a failure of the host's NIC.

If the malfunctioning component is a Cisco product (such as a Cisco network device or transceiver) that is covered by an active support contract, you can open a support case with Cisco TAC detailing your troubleshooting to have the malfunctioning component replaced through a Return Material Authorization (RMA).


--> cycle repeats Notice that the sequence repeats with a period of 31, which is another consequence of the fact that x^5 + x^2 + 1 is primitive. You can also see that the sets of five consecutive bits run through all the numbers from 1 to 31 before repeating. In contrast, the polynomial x^5 + x + 1 corresponds to the recurrence s[n] = (s[n-4] + s[n-5]) modulo 2, and gives the sequence
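
The period claim can be checked with a short Python sketch of the shift-register recurrence. The function name `lfsr_period` and the seed are assumptions for this sketch. It also assumes, by analogy with the recurrence given for x^5 + x + 1, that x^5 + x^2 + 1 corresponds to s[n] = (s[n-3] + s[n-5]) modulo 2.

```python
def lfsr_period(taps, seed=(0, 0, 0, 0, 1)):
    """Count steps until the 5-bit state of s[n] = sum(s[n-t] for t in taps) mod 2 repeats."""
    state = tuple(seed)  # state[-1] is s[n-1], state[0] is s[n-5]
    start = state
    steps = 0
    while True:
        new_bit = 0
        for t in taps:
            new_bit ^= state[-t]  # addition modulo 2 is XOR
        state = state[1:] + (new_bit,)
        steps += 1
        if state == start:
            return steps


print(lfsr_period((3, 5)))  # x^5 + x^2 + 1, primitive: full period of 31
print(lfsr_period((4, 5)))  # x^5 + x + 1, not primitive: shorter period
```

Because x^5 + x + 1 factors as (x^2 + x + 1)(x^3 + x^2 + 1) modulo 2, its sequences cannot reach the full period of 31.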

Dell EMC VxRail: High vSAN latency and CRC errors on network nic stats


Symptoms

  • From cluster > Monitor > vSAN: MTU check (ping with large packet size); see KB VxRail: MTU check (ping with large packet size)
  • The hosts large ping test (MTU check) fails intermittently
  • High CRC errors are seen in the vSAN vmnic stats counters

esxcli network nic stats get -n vmnic1 

host1
NIC statistics for vmnic1:
Receive packets dropped: 0
Transmit packets dropped: 0
Total receive errors:
Receive length errors: 0
Receive over errors: 0
Receive CRC errors:
Receive frame errors: 0

host2
NIC statistics for vmnic1:
Receive packets dropped: 0
Transmit packets dropped: 0
Total receive errors:
Receive length errors: 0
Receive over errors: 0
Receive CRC errors:
Receive frame errors: 0


host3
NIC statistics for vmnic1:
Receive packets dropped: 0
Transmit packets dropped: 0
Total receive errors: 0
Receive length errors: 0
Receive over errors: 0
Receive CRC errors: 0
Receive frame errors: 0


host4
NIC statistics for vmnic1:
Receive packets dropped: 0
Transmit packets dropped: 0
Total receive errors:
Receive length errors: 0
Receive over errors: 0
Receive CRC errors:

Cause

High CRC errors are a symptom of Layer 1 issues. The cause could be a defective cable, SFP, or NIC.

The stats above indicate that host1, host2, and host4 were receiving frames with CRC errors (inbound) from host3, and therefore host3 may be the source of the CRC errors causing the vSAN performance issues.

Resolution

1. To watch the CRC error counters increment, run the command: watch esxcli network nic stats get -n=vmnic1
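
To compare two captures of this output rather than watching it live, the counters can be parsed with a small Python sketch like the one below. The helper names `parse_nic_stats` and `crc_delta` are hypothetical, the field labels are taken from the output shown above, and the sample counter values are invented.

```python
import re


def parse_nic_stats(text):
    """Parse 'Label: number' lines from esxcli nic stats output into a dict."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(\d+)\s*$", line)
        if match:  # header lines such as 'NIC statistics for vmnic1:' are skipped
            stats[match.group(1).strip()] = int(match.group(2))
    return stats


def crc_delta(before, after, field="Receive CRC errors"):
    """Return how much the CRC error counter grew between two captures."""
    return parse_nic_stats(after)[field] - parse_nic_stats(before)[field]


# Invented sample captures, taken some time apart on the same vmnic.
sample_1 = """NIC statistics for vmnic1:
Receive packets dropped: 0
Total receive errors: 1200
Receive CRC errors: 1200
"""
sample_2 = """NIC statistics for vmnic1:
Receive packets dropped: 0
Total receive errors: 1350
Receive CRC errors: 1350
"""

print(crc_delta(sample_1, sample_2))
```

A counter that keeps growing between captures points to an ongoing Layer 1 problem rather than a past event.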

Cyclic Redundancy Checks

One of the most popular methods of error detection for digital signals is the Cyclic Redundancy Check (CRC). The basic idea behind CRCs is to treat the message string as a single binary word M, and divide it by a key word k that is known to both the transmitter and the receiver. The remainder r left after dividing M by k constitutes the "check word" for the given message. The transmitter sends both the message string M and the check word r, and the receiver can then check the data by repeating the calculation, dividing M by the key word k, and verifying that the remainder is r. The only novel aspect of the CRC process is that it uses a simplified form of arithmetic, which we'll explain below, in order to perform the division. By the way, this method of checking for errors is obviously not foolproof, because there are many different message strings that give a remainder of r when divided by k. In fact, about 1 out of every k randomly selected strings will give any specific remainder. Thus, if our message string is garbled in transmission, there is a chance (about 1/k, assuming the corrupted message is random) that the garbled version would agree with the check word. In such a case the error would go undetected. Nevertheless, by making k large enough, the chances of a random error going undetected can be made extremely small. That's really all there is to it. The rest of this discussion will consist simply of refining this basic idea to optimize its effectiveness, and describing the simplified arithmetic that is used to streamline the computations for maximum efficiency when processing binary strings. When discussing CRCs it's customary to present the key word k in the form of a "generator polynomial" whose coefficients are the binary bits of the number k.
For example, suppose we want our CRC to use the key k=37. This number written in binary is 100101, and expressed as a polynomial it is x^5 + x^2 + 1. In order to implement a CRC based on this polynomial, the transmitter and receiver must have agreed in advance that this is the key word they intend to use. So, for the sake of discussion, let's say we have agreed to use the generator polynomial 100101. By the way, it's worth noting that the remainder of any word divided by a 6-bit word will contain no more than 5 bits, so our CRC words based on the polynomial 100101 will always fit into 5 bits. Therefore, a CRC system based on this polynomial would be called a "5-bit CRC". In general, a polynomial with k bits leads to a "k-1 bit CRC". Now suppose I want to send you a message consisting of a string of bits M, and I also want to send you some additional information that will allow you to check the received string for correctness. Using our agreed key word k=100101, I'll simply "divide" M by k to form the remainder r, which will constitute the CRC check word. However, I'm going to use a simplified kind of division that is particularly well-suited to the binary form in which digital data is expressed. If we interpret k as an ordinary integer (37), its binary representation, 100101, is really shorthand for (1)2^5 + (0)2^4 + (0)2^3 + (1)2^2 + (0)2^1 + (1)2^0. Every integer can be expressed uniquely in this way, i.e., as a polynomial in the base 2 with coefficients that are either 0 or 1. This is a very powerful form of representation, but it's actually more powerful than we need for purposes of performing a data check. Also, operations on numbers like this can be somewhat laborious, because they involve borrows and carries in order to ensure that the coefficients are always either 0 or 1. (The same is true for decimal arithmetic, except that all the digits are required to be in the range 0 to 9.)
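
The correspondence between the integer 37, its bit pattern, and the polynomial x^5 + x^2 + 1 can be illustrated with a short Python sketch. The function name `poly_str` is an assumption for this illustration only.

```python
def poly_str(k: int) -> str:
    """Render the bits of k as a polynomial in x, highest power first."""
    terms = []
    for power in range(k.bit_length() - 1, -1, -1):
        if (k >> power) & 1:
            if power == 0:
                terms.append("1")
            elif power == 1:
                terms.append("x")
            else:
                terms.append(f"x^{power}")
    return " + ".join(terms) if terms else "0"


k = 37
print(format(k, "b"))      # bit pattern of the key word
print(poly_str(k))         # the same bits written as a polynomial
print(k.bit_length() - 1)  # width of the resulting CRC in bits
```

For k = 37 this prints 100101, then x^5 + x^2 + 1, then 5, in line with the "5-bit CRC" described above.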
To make things simpler, let's interpret our message M, key word k, and remainder r, not as actual integers, but as abstract polynomials in a dummy variable x (rather than a definite base like 2 for binary numbers or 10 for decimal numbers). Also, we'll simplify even further by agreeing to pay attention only to the parity of the coefficients, i.e., if a coefficient is an odd number we will simply regard it as 1, and if it is an even number we will regard it as 0. This is a tremendous simplification, because now we don't have to worry about borrows and carries when performing arithmetic. This is because every integer coefficient must obviously be either odd or even, so it's automatically either 0 or 1. To give just a brief illustration, consider the two polynomials x^2 + x + 1 and x^3 + x + 1. If we multiply these together by the ordinary rules of algebra we get (x^2 + x + 1)(x^3 + x + 1) = x^5 + x^4 + 2x^3 + 2x^2 + 2x + 1, but according to our simplification we are going to call every 'even' coefficient 0, so the result of the multiplication is simply x^5 + x^4 + 1. You might wonder if this simplified way of doing things is really self-consistent. For example, can we divide the product x^5 + x^4 + 1 by one of its factors, say, x^2 + x + 1, to give the other factor? The answer is yes, and it's much simpler than ordinary long division. To divide the polynomial 110001 by 111 (which is the shorthand way of expressing our polynomials) we simply apply the bit-wise exclusive-OR operation repeatedly, aligning 111 under the leading 1 of what remains at each step. The quotient is 1011, i.e., x^3 + x + 1, and the remainder is zero.
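
The parity-only arithmetic described here can be sketched in Python, with polynomials stored as integers whose bits are the coefficients. The function names `gf2_mul` and `gf2_divmod` are assumptions for this sketch, which shows shift-and-XOR arithmetic in general rather than any particular CRC implementation.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two polynomials with coefficients reduced modulo 2."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # adding without carries is XOR
        a <<= 1
        b >>= 1
    return result


def gf2_divmod(dividend: int, divisor: int):
    """Divide polynomials modulo 2 by repeated XOR; return (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = 0
    divisor_len = divisor.bit_length()
    while dividend.bit_length() >= divisor_len:
        shift = dividend.bit_length() - divisor_len
        quotient ^= 1 << shift
        dividend ^= divisor << shift  # cancel the current leading term
    return quotient, dividend


product = gf2_mul(0b111, 0b1011)          # (x^2 + x + 1)(x^3 + x + 1)
print(format(product, "b"))               # 110001, i.e. x^5 + x^4 + 1
quotient, remainder = gf2_divmod(product, 0b111)
print(format(quotient, "b"), remainder)   # 1011 0
```

Dividing the product by one factor returns the other factor with a zero remainder, which is the self-consistency check discussed above.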
