Every communication system must handle transmission errors caused by noise, interference, jitter, or hardware faults. Some protocols only detect errors, while others also correct them automatically. This is why error detection and correction are needed.
1. Why Do Errors Happen?
- Electrical noise
- Crosstalk
- Signal attenuation
- Clock drift
- Bit flipping (0 → 1 or 1 → 0)
- Packet loss or corruption
2. How Protocols Ensure Reliable Communication
Protocols use two mechanisms:
| Mechanism | Purpose |
|---|---|
| Error Detection | Detects that data is corrupted |
| Error Correction | Repairs data (with or without retransmission) |
3. Layers Responsible for Error Handling
| Layer | Error handling |
|---|---|
| Application Layer | End-to-end validation |
| Transport Layer | TCP checksum, retransmission |
| Data Link Layer | CRC, ACK/NAK |
| Physical Layer | FEC, DFE (decision feedback equalization) |
4. Common Error Detection Techniques
4.1 Parity Bit (Simple)
Adds one bit so that the total number of 1s in the frame is even (even parity) or odd (odd parity).
Data: 1011 → 3 ones (odd)
Parity (even) → 1
Send: 10111
- Detects all single-bit errors (and any other odd number of bit errors)
- Cannot detect double-bit errors (or any other even number of bit errors)
- Cannot correct errors
- Used in UART and simple memory
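The parity example above can be sketched in a few lines of Python. The function names (`add_even_parity`, `parity_ok`, `flip`) are made up for this illustration; the point is to show that one flipped bit is caught while two flipped bits slip through.

```python
def even_parity_bit(bits: str) -> str:
    """Return the bit that makes the total number of 1s even."""
    return "1" if bits.count("1") % 2 else "0"

def add_even_parity(bits: str) -> str:
    """Append the even-parity bit to the data bits."""
    return bits + even_parity_bit(bits)

def parity_ok(frame: str) -> bool:
    """Under even parity, a valid frame holds an even number of 1s."""
    return frame.count("1") % 2 == 0

def flip(frame: str, i: int) -> str:
    """Simulate a bit error at index i (helper for this demo only)."""
    return frame[:i] + ("0" if frame[i] == "1" else "1") + frame[i + 1:]

frame = add_even_parity("1011")   # "10111", as in the example above
one_err = flip(frame, 2)          # one bit flipped: parity check fails
two_err = flip(one_err, 0)        # second flip: parity check passes again

print(frame, parity_ok(frame))
print(one_err, parity_ok(one_err))
print(two_err, parity_ok(two_err))
```

The last line is the weakness in practice: the frame is wrong, yet the receiver accepts it.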
4.2 Checksum (Sum of Bytes)
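- Sum the data (IP, TCP, and UDP sum 16-bit words using ones' complement arithmetic)
- Append the result (the Internet checksum stores the complement of the sum)
- Receiver recomputes and compares
Used in IP, TCP, and UDP
Better than parity but still limited: reordered words, or errors that cancel out in the sum, go undetected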
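As a rough sketch of the checksum used by IP, TCP, and UDP, the function below computes a 16-bit ones' complement checksum in the style of RFC 1071. The name `internet_checksum` and the sample payloads are assumptions for this example; real stacks also cover header fields (and a pseudo-header for TCP/UDP) that are omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement of the 16-bit ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

payload = b"\x12\x34\x56\x78"
corrupted = b"\x12\x35\x56\x78"   # one bit changed: checksum differs
swapped = b"\x56\x78\x12\x34"     # two words reordered: checksum unchanged

print(hex(internet_checksum(payload)))
print(internet_checksum(corrupted) == internet_checksum(payload))
print(internet_checksum(swapped) == internet_checksum(payload))
```

The swapped payload shows one reason a plain sum is "still limited": addition does not care about word order.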
4.3 CRC (Cyclic Redundancy Check)
Treats the packet as a polynomial over GF(2), divides it by a fixed generator polynomial, and appends the remainder as the CRC value.
[Data][CRC Value]
- Receiver recalculates CRC
- If mismatch → error
Used in PCIe, Ethernet, USB, SATA
Very strong detection but cannot correct by itself
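To make the `[Data][CRC Value]` layout concrete, here is a minimal bit-by-bit CRC-32 sketch using the reflected polynomial `0xEDB88320` (the variant used by Ethernet and implemented by Python's `zlib.crc32`). The helpers `make_frame` and `check_frame` and the 4-byte little-endian trailer are assumptions for this example, not the framing of any particular protocol.

```python
import zlib

POLY = 0xEDB88320  # CRC-32 generator polynomial, bit-reflected form

def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC-32: repeated shift and conditional XOR with POLY."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

def make_frame(payload: bytes) -> bytes:
    """Sender side: [Data][CRC Value]."""
    return payload + crc32_bitwise(payload).to_bytes(4, "little")

def check_frame(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare with the trailer."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "little")
    return crc32_bitwise(payload) == received

frame = make_frame(b"hello")
damaged = bytes([frame[0] ^ 0x04]) + frame[1:]   # flip one bit in the data

print(crc32_bitwise(b"hello") == zlib.crc32(b"hello"))
print(check_frame(frame), check_frame(damaged))
```

Production code would use a table-driven or hardware CRC; the bitwise loop is shown only because it mirrors the polynomial division most directly.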
5. Error Correction Techniques (No Retransmission Needed)
5.1 Hamming Code (Single-Bit Correction)
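Adds several parity bits at the power-of-two positions (1, 2, 4, ...). Each parity bit covers a different subset of positions, so the pattern of failed checks points to the bit in error.
| Bit Pos | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Content | P1 | P2 | D1 | P4 | D2 | D3 | D4 |
Corrects single-bit errors. With one extra overall parity bit (SECDED), it also detects double-bit errors. Used in ECC memory.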
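The 7-bit layout in the table above corresponds to a Hamming(7,4) code. The sketch below assumes even parity and the usual coverage rule (the parity bit at position 1 covers positions 1, 3, 5, 7; position 2 covers 2, 3, 6, 7; position 4 covers 4, 5, 6, 7). The function names are chosen for this illustration.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode [D1, D2, D3, D4] into positions 1..7: P, P, D1, P, D2, D3, D4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Return (data bits, error position). Position 0 means no error seen."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4      # syndrome = 1-based position of the error
    if pos:
        c[pos - 1] ^= 1             # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], pos

code = hamming74_encode([1, 0, 1, 1])   # [0, 1, 1, 0, 0, 1, 1]
received = list(code)
received[4] ^= 1                        # corrupt position 5 (D2)
data, pos = hamming74_decode(received)
print(data, pos)                        # recovers [1, 0, 1, 1], error at 5
```

Note that this plain (7,4) code miscorrects when two bits flip; distinguishing that case is what the extra SECDED parity bit is for.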
5.2 Reed-Solomon
Works on multi-bit symbols rather than single bits, so it corrects burst errors that damage several adjacent bits.
Used in CDs, DVDs, QR codes, DSL, satellite communication.
5.3 Convolutional + Viterbi Decoding
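A convolutional encoder derives each output bit from a sliding window of recent input bits; the Viterbi algorithm then finds the most likely transmitted sequence.
Used in wireless and satellite links such as GSM, 3G, LTE control channels, and 802.11a/g Wi-Fi. 5G NR uses LDPC codes for data and polar codes for control instead.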
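As a small illustration of convolutional coding with Viterbi decoding, the sketch below uses a textbook rate-1/2 code with constraint length 3 and generator polynomials (7, 5) in octal. That choice, the hard-decision Hamming metric, and all function names are assumptions for this example; real systems use longer constraint lengths and usually soft decisions.

```python
G = (0b111, 0b101)   # generator polynomials (7, 5) octal, constraint length 3

def parity(x: int) -> int:
    return bin(x).count("1") & 1

def step(state: int, bit: int):
    """Shift one input bit in; return (next state, two output bits)."""
    reg = (bit << 2) | state                 # newest bit + two previous bits
    return reg >> 1, (parity(reg & G[0]), parity(reg & G[1]))

def conv_encode(bits):
    state, out = 0, []
    for b in list(bits) + [0, 0]:            # two tail bits return to state 0
        state, sym = step(state, b)
        out.extend(sym)
    return out

def viterbi_decode(received):
    """Hard-decision Viterbi: keep the lowest-distance path into each state."""
    INF = float("inf")
    metrics = [0, INF, INF, INF]             # encoder starts in state 0
    paths = [[], [], [], []]
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new_metrics, new_paths = [INF] * 4, [None] * 4
        for state in range(4):
            if metrics[state] == INF:
                continue                      # state not reachable yet
            for bit in (0, 1):
                nxt, sym = step(state, bit)
                m = metrics[state] + (sym[0] != r[0]) + (sym[1] != r[1])
                if m < new_metrics[nxt]:
                    new_metrics[nxt] = m
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    return paths[0][:-2]                      # end in state 0, drop tail bits

message = [1, 0, 1, 1, 0, 0, 1, 0]
coded = conv_encode(message)
noisy = list(coded)
noisy[3] ^= 1                                 # two channel errors
noisy[10] ^= 1
print(viterbi_decode(noisy) == message)
```

Storing whole paths per state keeps the code short; practical decoders store only survivor decisions and trace back.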
5.4 FEC (Forward Error Correction)
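FEC is the general term for codes that let the receiver repair errors without asking for a retransmission; Hamming, Reed-Solomon, and convolutional codes are all FEC codes.
Used in high-speed serial links (PCIe 6.0 and later, many Ethernet PHYs at 10G and above).
- Corrects a limited number of bit errors per block
- Avoids retransmission for correctable errors
- Increases reliability
- Adds overhead and latency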
6. ARQ: Automatic Repeat reQuest
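Instead of correcting errors locally, the receiver asks the sender to retransmit.
Types:
- Stop-and-Wait ARQ: send one frame, then wait for its acknowledgment before sending the next
- Go-Back-N ARQ: keep up to N frames in flight; on an error, resend the bad frame and everything after it
- Selective Repeat ARQ: keep several frames in flight; resend only the frames that were lost or corrupted
Used in TCP, the PCIe Data Link Layer, USB, and SATA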
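Stop-and-Wait, the simplest of the three ARQ types, can be simulated in a few lines. Everything here is a toy model under stated assumptions: the channel flips at most one bit per frame, `zlib.crc32` stands in for the link's error check, and the ACK/NAK is modelled as a return value rather than a separate packet. A real link also needs sequence numbers and timeouts to cope with lost frames and lost acknowledgments.

```python
import random
import zlib

def noisy_channel(frame: bytes, rng: random.Random, error_rate: float) -> bytes:
    """Hypothetical channel: with probability error_rate, flip one random bit."""
    if rng.random() < error_rate:
        i = rng.randrange(len(frame) * 8)
        damaged = bytearray(frame)
        damaged[i // 8] ^= 1 << (i % 8)
        return bytes(damaged)
    return frame

def receive(frame: bytes):
    """Return the payload if the CRC matches (ACK), otherwise None (NAK)."""
    payload, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == crc else None

def stop_and_wait(payloads, error_rate=0.3, max_tries=20, seed=1):
    """Send each payload, resending until it is acknowledged."""
    rng = random.Random(seed)
    delivered, attempts = [], 0
    for payload in payloads:
        frame = payload + zlib.crc32(payload).to_bytes(4, "big")
        for _ in range(max_tries):
            attempts += 1
            result = receive(noisy_channel(frame, rng, error_rate))
            if result is not None:           # ACK: move on to the next frame
                delivered.append(result)
                break
            # NAK: fall through and resend the same frame
        else:
            raise RuntimeError("link failed: retry limit reached")
    return delivered, attempts

delivered, attempts = stop_and_wait([b"alpha", b"beta", b"gamma"])
print(delivered, attempts)
```

The gap between `attempts` and the number of payloads is the cost of retransmission, which is why faster links prefer Go-Back-N or Selective Repeat.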
7. ACK / NAK Protocol
Sender → [Packet] → Receiver
Receiver checks CRC
If OK → ACK
If Error → NAK (retransmit)
PCIe uses Data Link Layer Packets (DLLPs) to carry ACK and NAK.
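On the sender side, an ACK/NAK scheme needs somewhere to keep packets until they are acknowledged. The class below is a simplified sketch loosely modelled on a replay buffer: an ACK for sequence number N releases everything up to N, and a NAK reports the last good sequence number so the rest can be replayed. The class name, method names, and unbounded sequence numbers are assumptions for this example; real PCIe sequence numbers wrap and replays are also triggered by a timer.

```python
from collections import OrderedDict

class ReplayBuffer:
    """Keeps every sent packet until the receiver acknowledges it."""

    def __init__(self):
        self.next_seq = 0
        self.pending = OrderedDict()     # seq -> payload, in send order

    def send(self, payload: bytes):
        """Assign a sequence number and keep a copy for possible replay."""
        seq = self.next_seq
        self.pending[seq] = payload
        self.next_seq += 1
        return seq, payload

    def on_ack(self, seq: int) -> None:
        """ACK(seq): everything up to and including seq arrived intact."""
        for s in [s for s in self.pending if s <= seq]:
            del self.pending[s]

    def on_nak(self, last_good_seq: int):
        """NAK: release what did arrive, return the rest for retransmission."""
        self.on_ack(last_good_seq)
        return list(self.pending.items())

tx = ReplayBuffer()
for p in (b"A", b"B", b"C", b"D"):
    tx.send(p)

tx.on_ack(0)               # packet 0 confirmed
replay = tx.on_nak(1)      # packet 1 was fine, packet 2 failed its CRC
print(replay)              # packets 2 and 3 must be sent again
```

Replaying everything after the last good packet is the Go-Back-N behaviour; it trades some redundant traffic for a very simple receiver.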
8. PCIe Error Handling (Layered Approach)
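Transaction Layer: optional end-to-end CRC (ECRC), completion timeouts, poisoned-packet signaling
Data Link Layer: link CRC (LCRC), ACK/NAK, replay buffer
Physical Layer: equalization (including DFE), line-code error checks, and FEC in PCIe 6.0
PCIe combines all three layers for high reliability.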
9. Example of FEC Correction
Original: 10110110
Received: 10100110 (1 bit error)
Corrected: 10110110 (FEC restores)
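The example above does not say which code is in use. Purely as an illustration, the sketch below reproduces the same correction with the simplest possible FEC, a triple-repetition code with majority voting. This is not what PCIe or Ethernet use (they rely on far more efficient codes), and the function names are invented for this example.

```python
def rep3_encode(bits: str) -> str:
    """Send every bit three times."""
    return "".join(b * 3 for b in bits)

def rep3_decode(coded: str) -> str:
    """Majority vote over each group of three received bits."""
    out = []
    for i in range(0, len(coded), 3):
        group = coded[i:i + 3]
        out.append("1" if group.count("1") >= 2 else "0")
    return "".join(out)

original = "10110110"
sent = rep3_encode(original)

# Flip one transmitted bit that belongs to the fourth data bit,
# the same data bit that is wrong in the example above.
i = 10
received = sent[:i] + ("0" if sent[i] == "1" else "1") + sent[i + 1:]

corrected = rep3_decode(received)
print(original, corrected, corrected == original)
```

The 200% overhead here is the extreme case of the trade-off: stronger codes achieve the same correction with far fewer redundant bits.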
10. Summary Table
| Technique | Detect | Correct | Used In |
|---|---|---|---|
| Parity | Yes (odd numbers of bit errors) | No | UART |
| Checksum | Yes (moderate) | No | IP/TCP/UDP |
| CRC | Yes (strong) | No | PCIe, Ethernet, USB, SATA |
| Hamming | Yes | Yes (single-bit) | ECC RAM |
| Reed-Solomon | Yes (strong) | Yes (bursts) | Optical discs, QR codes, DSL, satellite |
| FEC | Yes | Yes | PCIe 6.0+, high-speed Ethernet |
| ACK/NAK | Yes (via CRC) | Yes (via retransmission) | PCIe DLLPs |
11. Why Some Protocols Only Detect Errors
Simple protocols (UART, SPI, I2C) offer at most basic detection, such as UART's optional parity bit:
- Lower complexity
- Lower overhead
- Less robust
High-speed protocols (PCIe, Ethernet, SATA):
- Strong detection (CRC)
- Correction (FEC, on links that define it, such as PCIe 6.0 and high-speed Ethernet)
- Replay/Retry (ACK/NAK)
- Error reporting
12. Error Reporting in PCIe and Similar Protocols
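PCIe categorizes errors:
- Correctable (recovered automatically by hardware, for example through a replay)
- Uncorrectable non-fatal (a transaction is lost or corrupted, but the link stays usable)
- Uncorrectable fatal (the link is no longer reliable; recovery requires a reset)
The Advanced Error Reporting (AER) capability logs these errors and reports them to software or the OS.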

