Post-training quantization (PTQ) applies quantization to a model that has already been trained in floating-point precision. This approach is simple, but it can sacrifice accuracy, particularly for smaller models or precision-sensitive tasks.
Key Characteristics:
- Process: Train model normally → Calibrate quantization parameters → Convert to quantized format.
- Calibration: Uses a representative dataset to observe value ranges and derive scaling factors (and zero points, where the scheme uses them).
- Development Speed: Faster development cycle (no retraining required).
- Accuracy Impact: Typically higher accuracy loss than quantization-aware training (QAT), because the model never sees quantization error during training.
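
The calibrate-then-convert steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the workflow of any particular framework: it assumes 8-bit affine quantization with a simple min/max calibration rule, and the helper names `calibrate`, `quantize`, and `dequantize` are hypothetical.

```python
import math

def calibrate(samples, num_bits=8):
    """Derive an affine scale and zero point from calibration samples (min/max rule)."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # The representable range must include 0 so that 0.0 maps to an exact integer.
    lo, hi = min(min(samples), 0.0), max(max(samples), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all samples are 0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point, qmin, qmax

def quantize(x, scale, zero_point, qmin, qmax):
    """Map a float to the nearest integer level, clamped to the integer range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map an integer level back to its float value."""
    return (q - zero_point) * scale

# Assumed calibration data: samples of the sine function over one period.
calibration_set = [math.sin(2 * math.pi * i / 199) for i in range(200)]
scale, zero_point, qmin, qmax = calibrate(calibration_set)

x = 0.5
q = quantize(x, scale, zero_point, qmin, qmax)
x_hat = dequantize(q, scale, zero_point)
print(f"scale={scale:.6f} zero_point={zero_point} q={q} "
      f"round-trip error={abs(x - x_hat):.6f}")
```

No weights are retrained at any point: the only quantities derived from data are `scale` and `zero_point`, which is why the calibration set needs to cover the value range the model will see in practice.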
Advantages:
- Simpler workflow with existing trained models.
- No need to modify training procedures.
- Faster deployment path.
- Lower computational requirements for development.
Limitations:
- Can result in significant accuracy degradation, especially at low bit widths or when the calibration data does not represent real inputs.
- Less control over quantization effects.
- Particularly challenging for regression tasks like our sine function, where the output is a continuous value and rounding error shows up directly in the prediction.
- Limited ability to compensate for quantization artifacts.
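
To make the regression limitation concrete, the sketch below rounds exact sine values onto an int8 grid and measures what is lost. It is an illustration under stated assumptions rather than a full PTQ pipeline: it quantizes only the output values (no trained network is involved), uses symmetric quantization with outputs assumed to lie in [-1, 1], and the name `fake_quantize` is hypothetical.

```python
import math

NUM_BITS = 8
QMAX = 2 ** (NUM_BITS - 1) - 1      # 127 for int8
SCALE = 1.0 / QMAX                  # assumes outputs lie in [-1, 1]

def fake_quantize(y):
    """Round y to the nearest int8 grid point and map it back to float."""
    q = max(-QMAX, min(QMAX, round(y / SCALE)))
    return q * SCALE

# Sweep one period of the sine function.
xs = [2 * math.pi * i / 9999 for i in range(10000)]
exact = [math.sin(x) for x in xs]
approx = [fake_quantize(y) for y in exact]

max_error = max(abs(a - b) for a, b in zip(exact, approx))
distinct_levels = len(set(approx))
print(f"max abs error: {max_error:.5f}  (half step: {SCALE / 2:.5f})")
print(f"distinct output values: {distinct_levels}")
```

Even with a perfect model, the output can take at most 255 distinct values here, so the smooth curve becomes a staircase with error up to half a quantization step. In a real quantized network, rounding in weights and intermediate activations adds to this, and PTQ has no training step in which to compensate.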