Every communication system must handle transmission errors caused by noise, interference, jitter, or hardware faults. Some protocols only detect errors, while others also correct them automatically. This is why error detection and correction are needed.


1. Why Do Errors Happen?

  • Electrical noise
  • Crosstalk
  • Signal attenuation
  • Clock drift
  • Bit flipping (0 → 1 or 1 → 0)
  • Packet loss or corruption

2. How Protocols Ensure Reliable Communication

Protocols use two mechanisms:

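+------------------+-----------------------------------------------+
| Mechanism        | Purpose                                       |
+------------------+-----------------------------------------------+
| Error Detection  | Detects that data is corrupted                |
+------------------+-----------------------------------------------+
| Error Correction | Repairs data (with or without retransmission) |
+------------------+-----------------------------------------------+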

3. Layers Responsible for Error Handling

+---------------------+--------------------------+
| Application Layer   | End-to-end validation    |
+---------------------+--------------------------+
| Transport Layer     | TCP checksum, retransmit |
+---------------------+--------------------------+
| Data Link Layer     | CRC, ACK/NAK             |
+---------------------+--------------------------+
| Physical Layer      | FEC, DFE                 |
+---------------------+--------------------------+

4. Common Error Detection Techniques

4.1 Parity Bit (Simple)

Adds one bit so that the total number of 1s is even (even parity) or odd (odd parity).

Data: 1011 → 3 ones (odd)
Parity (even) → 1
Send: 10111
  • Detects single-bit errors (and any odd number of bit errors)
  • Cannot detect double-bit errors (or any even number of bit errors)
  • Cannot correct errors
  • Used in UART and simple memory
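
The parity example above can be reproduced in a few lines of Python. This is a minimal sketch rather than any particular UART's implementation; the helper names (`add_parity`, `parity_ok`, `flip`) are made up for illustration. It shows that one flipped bit is caught while two flipped bits slip through.

```python
def even_parity_bit(bits: str) -> str:
    """Return the bit that makes the total number of 1s even."""
    return "1" if bits.count("1") % 2 else "0"


def add_parity(bits: str) -> str:
    """Append the even-parity bit to the data bits."""
    return bits + even_parity_bit(bits)


def parity_ok(frame: str) -> bool:
    """A frame passes the check when it holds an even number of 1s."""
    return frame.count("1") % 2 == 0


def flip(frame: str, index: int) -> str:
    """Flip one bit to simulate a transmission error."""
    flipped = "1" if frame[index] == "0" else "0"
    return frame[:index] + flipped + frame[index + 1:]


frame = add_parity("1011")        # "10111", as in the example above
single = flip(frame, 0)           # one bit corrupted
double = flip(single, 1)          # two bits corrupted

print(frame, parity_ok(frame))    # 10111 True
print(single, parity_ok(single))  # 00111 False (error detected)
print(double, parity_ok(double))  # 01111 True  (error missed)
```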

4.2 Checksum (Sum of Bytes)

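  • Add all data bytes (or 16-bit words)
  • Append the result
  • Receiver recomputes and compares

Used in IP, TCP, UDP (as a 16-bit one's-complement sum)

Better than parity but still limited: reordered words, or errors that cancel out in the sum, go undetected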
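
As a rough illustration of the steps above, here is a sketch of a 16-bit one's-complement checksum in the style of RFC 1071, the scheme used by IP, TCP, and UDP. The sample payload bytes are arbitrary, and the function name `internet_checksum` is chosen only for this example. The last check shows one of the limitations: swapping two 16-bit words leaves the sum unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


payload = b"\x45\x00\x00\x1c\x00\x01"             # arbitrary sample bytes
packet = payload + internet_checksum(payload).to_bytes(2, "big")

# Summing the data together with its checksum yields 0 for an intact packet.
intact_ok = internet_checksum(packet) == 0

# A single flipped bit changes the sum and is detected.
corrupted = bytes([packet[0] ^ 0x01]) + packet[1:]
corrupted_ok = internet_checksum(corrupted) == 0

# Swapping two 16-bit words does not change the sum, so it goes unnoticed.
swapped = packet[2:4] + packet[0:2] + packet[4:]
swapped_ok = internet_checksum(swapped) == 0

print(intact_ok, corrupted_ok, swapped_ok)        # True False True
```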


4.3 CRC (Cyclic Redundancy Check)

Treats the packet as a binary polynomial, divides it by a fixed generator polynomial, and appends the remainder as the CRC value.

[Data][CRC Value]
  • Receiver recalculates CRC
  • If mismatch → error

Used in PCIe, Ethernet, USB, SATA

Very strong detection but cannot correct by itself
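
To make the polynomial division concrete, the sketch below computes a CRC-32 bit by bit, using the reflected form (0xEDB88320) of the polynomial shared by Ethernet and zlib, and cross-checks the result against Python's `zlib.crc32`. The frame layout (data followed by a 4-byte little-endian CRC) and the helper names are assumptions made for this example, not the format of any specific protocol.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC-32 (reflected polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF                      # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):                # process one bit at a time
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # standard final XOR


def make_frame(data: bytes) -> bytes:
    """[Data][CRC Value]: append the CRC to the data (layout is illustrative)."""
    return data + crc32_bitwise(data).to_bytes(4, "little")


def frame_ok(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare with the received one."""
    data, received = frame[:-4], int.from_bytes(frame[-4:], "little")
    return crc32_bitwise(data) == received


print(hex(crc32_bitwise(b"123456789")))                    # 0xcbf43926
print(crc32_bitwise(b"hello") == zlib.crc32(b"hello"))     # True

frame = make_frame(b"hello")
damaged = bytes([frame[0] ^ 0x04]) + frame[1:]             # flip one bit
print(frame_ok(frame), frame_ok(damaged))                  # True False
```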


5. Error Correction Techniques (No Retransmission Needed)

5.1 Hamming Code (Single-Bit Correction)

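Adds multiple parity bits at the power-of-two positions (1, 2, 4), each covering a different subset of the data bits.

+---------+----+----+----+----+----+----+----+
| Bit Pos | 1  | 2  | 3  | 4  | 5  | 6  | 7  |
+---------+----+----+----+----+----+----+----+
| Content | P1 | P2 | D1 | P4 | D2 | D3 | D4 |
+---------+----+----+----+----+----+----+----+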

Corrects single-bit errors. Used in ECC memory.
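
The following sketch encodes four data bits into a 7-bit Hamming codeword with parity bits at positions 1, 2, and 4, then locates and repairs a single flipped bit. It illustrates the textbook Hamming(7,4) code only; real ECC memory typically uses wider codes, and the function names here are invented for the example.

```python
def hamming74_encode(data):
    """Encode [d1, d2, d3, d4] as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4        # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4        # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code):
    """Return (data bits, error position); position 0 means no error found."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s4      # syndrome = 1-based bit position
    if error_pos:
        c[error_pos - 1] ^= 1             # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_pos


codeword = hamming74_encode([1, 0, 1, 1])
print(codeword)                           # [0, 1, 1, 0, 0, 1, 1]

received = list(codeword)
received[4] ^= 1                          # corrupt bit position 5
data, position = hamming74_decode(received)
print(data, position)                     # [1, 0, 1, 1] 5
```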


5.2 Reed-Solomon

Works on multi-bit symbols rather than single bits, so it corrects burst errors (runs of adjacent corrupted bits).
Used in CDs, DVDs, QR codes, DSL, satellite communication.


5.3 Convolutional + Viterbi Decoding

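Used in wireless links such as GSM, 3G, Wi-Fi (802.11a/g), and satellite. Newer cellular standards rely mostly on turbo codes (4G data channels) and LDPC and polar codes (5G).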


5.4 FEC (Forward Error Correction)

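FEC is the general name for codes, such as Hamming and Reed-Solomon, that let the receiver repair errors without asking for the data again. It is used in high-speed serial protocols (PCIe Gen6, Ethernet 10G+).

  • Corrects a limited number of bit errors per block
  • Reduces the need for retransmission
  • Increases reliability
  • Adds overhead and latency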

6. ARQ: Automatic Repeat reQuest

Instead of correcting errors locally, the receiver requests retransmission.

Types:

  • Stop-and-Wait ARQ
  • Go-Back-N ARQ
  • Selective Repeat ARQ

Used in TCP, PCIe Data Link, USB, SATA
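
The difference between the ARQ variants is mostly about which frames are sent again after one frame fails. The sketch below is a deliberately simplified model (no timers, no sequence-number wraparound, hypothetical function names): Go-Back-N resends the failed frame and everything sent after it, while Selective Repeat resends only the failed frame. Stop-and-Wait corresponds to having a single outstanding frame.

```python
def go_back_n_resend(outstanding, failed):
    """Go-Back-N: the receiver discards everything after the bad frame,
    so the sender resends the failed frame and all later ones."""
    return [seq for seq in outstanding if seq >= failed]


def selective_repeat_resend(outstanding, failed):
    """Selective Repeat: the receiver buffers later frames,
    so only the failed frame is sent again."""
    return [seq for seq in outstanding if seq == failed]


outstanding = [0, 1, 2, 3, 4]      # frames sent but not yet acknowledged
print(go_back_n_resend(outstanding, 2))         # [2, 3, 4]
print(selective_repeat_resend(outstanding, 2))  # [2]
```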


7. ACK / NAK Protocol

Sender → [Packet] → Receiver
Receiver checks CRC
If OK  → ACK
If Error → NAK (retransmit)

PCIe uses Data Link Layer Packets (DLLPs) to carry ACK and NAK.
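
The ACK/NAK exchange above can be sketched as a stop-and-wait loop. This is a toy model, not PCIe's actual DLLP mechanism: the frame layout (one sequence byte, payload, 4-byte CRC), the `corrupt_attempts` parameter that stands in for a noisy channel, and the retry limit are all assumptions made for the example.

```python
import zlib


def make_frame(seq: int, payload: bytes) -> bytes:
    """Build [seq][payload][CRC-32] (an illustrative layout)."""
    body = bytes([seq]) + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def receiver_check(frame: bytes) -> str:
    """Receiver side: recompute the CRC and answer ACK or NAK."""
    body, received_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return "ACK" if zlib.crc32(body) == received_crc else "NAK"


def send_stop_and_wait(payload, seq, corrupt_attempts, max_tries=5):
    """Resend the same frame until it is ACKed; return the attempt count.

    corrupt_attempts: set of 1-based attempt numbers that the simulated
    channel corrupts (a stand-in for real noise).
    """
    for attempt in range(1, max_tries + 1):
        frame = make_frame(seq, payload)
        if attempt in corrupt_attempts:
            # Damage one byte in transit.
            frame = frame[:1] + bytes([frame[1] ^ 0xFF]) + frame[2:]
        if receiver_check(frame) == "ACK":
            return attempt
    raise RuntimeError("retry limit reached")


attempts = send_stop_and_wait(b"hello", seq=0, corrupt_attempts={1, 2})
print(attempts)   # 3: two NAKs, then an ACK
```

A real link would also need timeouts for lost frames and lost ACKs, which this sketch leaves out.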


8. PCIe Error Handling (Layered Approach)

Transaction Layer: optional end-to-end CRC (ECRC), completion timeouts
Data Link Layer: link CRC (LCRC), ACK/NAK, Replay Buffer
Physical Layer: equalization (including DFE); FEC from Gen6

PCIe combines all three layers for high reliability.


9. Example of FEC Correction

Original:   10110110
Received:   10100110   (1 bit error)
Corrected:  10110110   (FEC restores)
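
The same correction can be demonstrated with the simplest possible FEC, a 3x repetition code with majority voting. This is used here only because it fits in a few lines; real links use far more efficient codes such as Reed-Solomon. The function names are made up for the example.

```python
def rep3_encode(bits: str) -> str:
    """Send every bit three times."""
    return "".join(b * 3 for b in bits)


def rep3_decode(coded: str) -> str:
    """Majority vote over each group of three received bits."""
    groups = [coded[i:i + 3] for i in range(0, len(coded), 3)]
    return "".join("1" if g.count("1") >= 2 else "0" for g in groups)


original = "10110110"
sent = rep3_encode(original)

# Flip one transmitted copy of the fourth data bit (coded index 9).
flipped = "0" if sent[9] == "1" else "1"
received = sent[:9] + flipped + sent[10:]

corrected = rep3_decode(received)
print(corrected)                 # 10110110
print(corrected == original)     # True
```

The cost is visible in the length of `sent`: 24 bits on the wire for 8 bits of data, which is the overhead trade-off mentioned earlier.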

10. Summary Table

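+--------------+--------+--------------------------+---------------------+
| Technique    | Detect | Correct                  | Used In             |
+--------------+--------+--------------------------+---------------------+
| Parity       | Yes    | No                       | UART                |
| Checksum     | Yes    | No                       | IP/TCP/UDP          |
| CRC          | Yes+++ | No                       | PCIe, Ethernet      |
| Hamming      | Yes++  | Yes                      | ECC RAM             |
| Reed-Solomon | Yes+++ | Yes                      | Storage, Wireless   |
| FEC          | Yes    | Yes                      | PCIe Gen6, Ethernet |
| ACK/NAK      | Yes    | Yes (via retransmission) | PCIe DLLP           |
+--------------+--------+--------------------------+---------------------+

More + marks indicate stronger detection.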

11. Why Some Protocols Only Detect Errors

Simple protocols (UART, SPI, I2C):

  • Lower complexity
  • Lower overhead
  • Less robust

High-speed protocols (PCIe, Ethernet, SATA):

  • Strong detection (CRC)
  • Correction (FEC)
  • Replay/Retry (ACK/NAK)
  • Error reporting

12. Error Reporting in PCIe and Similar Protocols

PCIe categorizes errors:

  • Correctable (recovered automatically by hardware)
  • Uncorrectable non-fatal (affects some transactions, system continues)
  • Uncorrectable fatal (requires reset)

The Advanced Error Reporting (AER) capability logs errors and reports them to software or the OS.

