I need to look these values up all the time, so here's a quick reference page. These are rough numbers for the model weights only; expect to add roughly 20% on top for runtime overhead.
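The general formula, where PRECISION is the number of bytes per parameter (dividing by 1024 three times converts bytes to GiB):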
VRAM_GB = (NUM_PARAMS * PRECISION) / 1024 / 1024 / 1024
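A worked example for an 8b model in fp16 (2 bytes per parameter), with the ~20% overhead rule of thumb applied at the end:

```latex
\text{VRAM} = \frac{8 \times 10^{9} \times 2}{1024^{3}}
            = \frac{1.6 \times 10^{10}}{1\,073\,741\,824}
            \approx 14.9\ \text{GiB},
\qquad 14.9 \times 1.2 \approx 17.9\ \text{GiB}
```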
Bytes per parameter (PRECISION) for common formats:
- float16/bfloat16: 2 bytes
- int8: 1 byte
- int4: 0.5 bytes
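The formula and the bytes-per-parameter values above fit in a few lines of Python. This is a minimal sketch: the names `BYTES_PER_PARAM` and `vram_gb` are hypothetical, and `overhead` is just the rough ~20% fudge factor expressed as a fraction, not a measured value.

```python
# Minimal sketch of VRAM_GB = NUM_PARAMS * PRECISION / 1024^3.
# BYTES_PER_PARAM and vram_gb are illustrative names, not from any library.
BYTES_PER_PARAM = {
    "fp16": 2.0,  # float16 or bfloat16
    "int8": 1.0,
    "int4": 0.5,
}


def vram_gb(num_params: float, precision: str, overhead: float = 0.0) -> float:
    """Estimate weight memory in GiB (1024**3 bytes).

    overhead is a fraction added on top, e.g. 0.2 for the ~20% rule of thumb.
    """
    weight_bytes = num_params * BYTES_PER_PARAM[precision]
    return weight_bytes / 1024**3 * (1.0 + overhead)


if __name__ == "__main__":
    print(f"{vram_gb(8e9, 'fp16'):.1f}")                # 14.9
    print(f"{vram_gb(8e9, 'fp16', overhead=0.2):.1f}")  # 17.9
```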
1b
- fp16: 1.9gb
- 8bit: 0.93gb
- 4bit: 0.47gb
3b
- fp16: 5.6gb
- 8bit: 2.8gb
- 4bit: 1.4gb
8b
- fp16: 15gb
- 8bit: 7.5gb
- 4bit: 3.7gb
32b
- fp16: 60gb
- 8bit: 30gb
- 4bit: 15gb
70b
- fp16: 130gb
- 8bit: 65gb
- 4bit: 33gb
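To recompute the figures in this table (or add new model sizes), a short script can loop over sizes and precisions. This is a sketch under the same assumptions as the rest of the page: weights only, GiB units, and a flat 1.2 multiplier standing in for the ~20% overhead. `build_table` and `MODEL_SIZES_B` are hypothetical names.

```python
# Sketch: recompute the table above, plus a column with ~20% overhead.
# Weights only; the 1.2 multiplier is a rule of thumb, not a measurement.
GIB = 1024**3
BYTES_PER_PARAM = {"fp16": 2.0, "8bit": 1.0, "4bit": 0.5}
MODEL_SIZES_B = [1, 3, 8, 32, 70]  # billions of parameters


def build_table(overhead: float = 0.2) -> dict[str, dict[str, tuple[float, float]]]:
    """Map model size -> precision -> (weights_gib, with_overhead_gib)."""
    table = {}
    for size in MODEL_SIZES_B:
        row = {}
        for label, nbytes in BYTES_PER_PARAM.items():
            weights = size * 1e9 * nbytes / GIB
            row[label] = (weights, weights * (1.0 + overhead))
        table[f"{size}b"] = row
    return table


if __name__ == "__main__":
    for model, row in build_table().items():
        print(model)
        for label, (weights, padded) in row.items():
            # .3g keeps three significant digits without switching to exponent notation here
            print(f"- {label}: {weights:.3g}gb (~{padded:.3g}gb with overhead)")
```

Adding a size is just a matter of appending to `MODEL_SIZES_B`.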