How C++ Virtual Functions Work, and Their Memory Layout

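How do virtual functions achieve dynamic binding? The compiler does the work behind the scenes, and in practice mainstream implementations rely on two mechanisms: the virtual function table (vtable) and the virtual function pointer (vptr). The C++ standard mandates neither; both are implementation details.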

#include <iostream>

class Shape {
public:
    // Declared virtual in the base class
    virtual void draw() const {
        std::cout << "Drawing a generic shape." << std::endl;
    }
    
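    // A base class destructor should usually be virtual too, so that deleting
    // a derived object through a base class pointer destroys it correctly
    virtual ~Shape() {}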
};

class Circle : public Shape {
public:
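    // Override the function in the derived class.
    // The 'override' keyword is optional but good practice: it makes the
    // compiler verify that the function really overrides a base class virtual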
    void draw() const override {
        std::cout << "Drawing a circle." << std::endl;
    }
};

class Rectangle : public Shape {
public:
    void draw() const override {
        std::cout << "Drawing a rectangle." << std::endl;
    }
};

void drawShape(const Shape& shape) {
    shape.draw(); // dynamic binding happens here
}
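To see the dynamic binding in action, the sketch below repeats the classes above and adds a hypothetical helper, captureDraw (an assumption of this example, not part of the original listing), which redirects std::cout into a string so that the effect of drawShape can be inspected.

```cpp
#include <iostream>
#include <sstream>
#include <string>

class Shape {
public:
    virtual void draw() const { std::cout << "Drawing a generic shape." << std::endl; }
    virtual ~Shape() = default;
};

class Circle : public Shape {
public:
    void draw() const override { std::cout << "Drawing a circle." << std::endl; }
};

class Rectangle : public Shape {
public:
    void draw() const override { std::cout << "Drawing a rectangle." << std::endl; }
};

void drawShape(const Shape& shape) {
    shape.draw();  // resolved at run time from the object's actual type
}

// Hypothetical helper (assumption of this sketch): runs drawShape while
// std::cout is temporarily redirected, and returns what was printed.
std::string captureDraw(const Shape& shape) {
    std::ostringstream captured;
    std::streambuf* original = std::cout.rdbuf(captured.rdbuf());
    drawShape(shape);
    std::cout.rdbuf(original);  // restore the real output buffer
    return captured.str();
}
```

With these definitions, captureDraw(Circle{}) yields the circle message and captureDraw(Shape{}) the generic one, even though drawShape only ever sees a const Shape&.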

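1. The virtual function table (vtable)

When a class declares a virtual function (or inherits from a base class that has one), the compiler generates a static array for that class: its virtual function table (vtable).
The vtable holds the addresses of all of the class's virtual functions.
If a derived class overrides a base class virtual function, the corresponding slot in the derived class's vtable holds the address of the overriding function instead.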

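Illustration (conceptual; a real ABI may store additional entries, such as RTTI data or more than one destructor slot):

Shape's vtable:     [ address of Shape::draw, address of Shape::~Shape ]
Circle's vtable:    [ address of Circle::draw, address of Circle::~Circle ] (the draw slot now points to the override)
Rectangle's vtable: [ address of Rectangle::draw, address of Rectangle::~Rectangle ]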

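2. The virtual function pointer (vptr)

When a class has a vtable, the compiler adds a hidden member to every object of that class: a pointer called the virtual function pointer (vptr).
The vptr is initialized while the object is being constructed. Each constructor in the inheritance chain sets it to its own class's vtable, so once construction finishes it points to the vtable of the object's actual (most derived) class.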

Illustration:

// When you create a Circle object
Circle c;

// the memory layout of c looks roughly like this:
[      vptr       ] --------> [ Circle's vtable   ]
[ data members... ]           [ &Circle::draw     ]
                              [ &Circle::~Circle  ]
                              [ ...               ]
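The hidden pointer can be made visible with a small experiment. The sketch below uses a hypothetical helper, peekVptr, that copies the first pointer-sized bytes out of an object. It assumes the vptr is stored at offset 0, which holds for classes like these on the mainstream ABIs (Itanium C++ ABI, MSVC) but is not guaranteed by the standard, so treat it as an illustration rather than portable code.

```cpp
#include <cstring>
#include <iostream>

class Shape {
public:
    virtual void draw() const { std::cout << "Drawing a generic shape." << std::endl; }
    virtual ~Shape() = default;
};

class Circle : public Shape {
public:
    void draw() const override { std::cout << "Drawing a circle." << std::endl; }
};

class Rectangle : public Shape {
public:
    void draw() const override { std::cout << "Drawing a rectangle." << std::endl; }
};

// Hypothetical helper: copies the first pointer-sized bytes of the object.
// ASSUMPTION: the vptr lives at offset 0 (implementation-specific).
const void* peekVptr(const Shape& object) {
    const void* vptr = nullptr;
    std::memcpy(&vptr, static_cast<const void*>(&object), sizeof vptr);
    return vptr;
}

// Two objects of the same class are expected to share one vtable.
bool sameClassSharesVtable() {
    Circle a;
    Circle b;
    return peekVptr(a) == peekVptr(b);
}

// Objects of different classes are expected to point at different vtables.
bool differentClassesUseDifferentVtables() {
    Circle c;
    Rectangle r;
    return peekVptr(c) != peekVptr(r);
}
```

Both functions should return true on the platforms named above: the vptr is per object, while the table it points to is per class.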

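3. The call sequence

When code such as shape_ptr->draw(); executes, the following steps take place:

Locate the object: the program follows shape_ptr to the object it points to (which may be a Circle or a Rectangle).
Find the vtable: the program reads the vptr stored inside that object, which leads to the matching vtable (for a Circle object, Circle's vtable).
Call the function: the program loads the address stored in the vtable slot assigned to draw (slot 0, say) and calls the function at that address.

Because a Circle object's vptr points to Circle's vtable, and that vtable stores the address of Circle::draw, the function that ends up being called is Circle::draw().

The slot index is fixed at compile time, but which table is consulted is only known at run time, and that is what makes the binding dynamic.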
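The three steps above can be imitated by hand. The following is a minimal sketch of what a compiler conceptually generates, written with plain structs and function pointers. Every name in it (ShapeObject, ShapeVTable, callDraw and so on) is hypothetical and chosen for illustration; real compilers emit this machinery implicitly and with ABI-specific details.

```cpp
#include <string>

struct ShapeObject;  // forward declaration for the function pointer type

// The table layout shared by the hierarchy: slot 0 is draw.
struct ShapeVTable {
    std::string (*draw)(const ShapeObject* self);
};

// Every object carries a pointer to its class's table (the "vptr").
struct ShapeObject {
    const ShapeVTable* vptr;
};

// The "member functions", taking the object explicitly.
std::string shapeDraw(const ShapeObject*)  { return "Drawing a generic shape."; }
std::string circleDraw(const ShapeObject*) { return "Drawing a circle."; }

// One static table per "class"; the circle table replaces the draw slot.
const ShapeVTable kShapeVTable{&shapeDraw};
const ShapeVTable kCircleVTable{&circleDraw};

// The "constructors" store the address of the matching table.
ShapeObject makeShape()  { return ShapeObject{&kShapeVTable}; }
ShapeObject makeCircle() { return ShapeObject{&kCircleVTable}; }

// Hand-written equivalent of shape_ptr->draw().
std::string callDraw(const ShapeObject* shape_ptr) {
    const ShapeVTable* vtable = shape_ptr->vptr;  // locate the object, read its vptr
    return vtable->draw(shape_ptr);               // load the draw slot and call it
}

// Small demos so the result can be observed without extra setup.
std::string demoShape()  { ShapeObject s = makeShape();  return callDraw(&s); }
std::string demoCircle() { ShapeObject c = makeCircle(); return callDraw(&c); }
```

callDraw never checks which kind of object it received; the object's own vptr selects the table, which is the essence of the mechanism.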

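4. Memory layout

When a class contains virtual functions, the compiler does some extra work to support polymorphism.

It creates one virtual function table (vtable) for each class that has virtual functions. The table is a static array that stores the addresses of the virtual functions.
It inserts a virtual function pointer (vptr) into every object, typically at the very start of the object's memory layout. The pointer refers to the vtable of the class the object belongs to.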

class Base_V {
public:
    int b_data;
    virtual void func() {}
    virtual void funcWithoutOverride() {}
    virtual ~Base_V() {}
};

class Derived_V : public Base_V {
public:
    char d_data;
    void func() override {}
    ~Derived_V() override {}
};

Virtual function tables (vtables)

A vtable exists independently of any object; there is exactly one per class.

Base_V's vtable (vtable_Base_V)
+------------------------------+
| &Base_V::func                |  (function pointer)
+------------------------------+
| &Base_V::funcWithoutOverride |  (function pointer)
+------------------------------+
| &Base_V::~Base_V             |  (function pointer)
+------------------------------+

Derived_V's vtable (vtable_Derived_V)
+------------------------------+
| &Derived_V::func             |  (overridden: points to the derived class function)
+------------------------------+
| &Base_V::funcWithoutOverride |  (not overridden: still points to the base class function)
+------------------------------+
| &Derived_V::~Derived_V       |  (overridden: points to the derived class destructor)
+------------------------------+
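The effect of the two tables can be checked from ordinary code. In the sketch below the virtual functions return their own names instead of void; that change, and the helper functions callBoth, throughBase and throughDerived, are assumptions made purely so the result is observable.

```cpp
#include <string>

class Base_V {
public:
    int b_data = 0;
    virtual std::string func() { return "Base_V::func"; }
    virtual std::string funcWithoutOverride() { return "Base_V::funcWithoutOverride"; }
    virtual ~Base_V() {}
};

class Derived_V : public Base_V {
public:
    char d_data = 0;
    std::string func() override { return "Derived_V::func"; }
    ~Derived_V() override {}
};

// Calls both virtual functions through a Base_V reference, so each call
// goes through whichever vtable the object's vptr refers to.
std::string callBoth(Base_V& object) {
    return object.func() + " / " + object.funcWithoutOverride();
}

std::string throughBase()    { Base_V b;    return callBoth(b); }
std::string throughDerived() { Derived_V d; return callBoth(d); }
```

For a Derived_V object the first name comes from the derived class and the second from the base class, matching the overridden and inherited slots in the diagram.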

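Base class object: Base_V b_v;

The object's memory starts with a vptr that points to Base_V's vtable, followed by the data members.

Memory layout of object b_v
(size on a typical 64-bit platform: 8 (vptr) + 4 (int) = 12 bytes, padded to 16 bytes for alignment)

+---------------------+
|       vptr          | -------> [ vtable_Base_V ]
+---------------------+
|      b_data         | (int, 4 bytes)
+---------------------+
|      (padding)      | (alignment padding, typically 4 bytes)
+---------------------+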

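Derived class object: Derived_V d_v;

Likewise, a derived class object contains a base class subobject, and the vptr again sits at the front. The crucial difference is that this vptr points to the vtable of the derived class Derived_V. That is the core of how polymorphism works.

Memory layout of object d_v
(size on a typical 64-bit platform: 8 (vptr) + 4 (int) + 1 (char) = 13 bytes, padded to 16 or 24 bytes depending on whether the compiler places d_data in the base class's tail padding)

+----------------------+
|  --- base part ---   |
|   +---------------+  |
|   |     vptr      |  | -------> [ vtable_Derived_V ]
|   +---------------+  |
|   |    b_data     |  | (int, 4 bytes)
|   +---------------+  |
| --- derived part --- |
|   +---------------+  |
|   |    d_data     |  | (char, 1 byte)
|   +---------------+  |
|   |   (padding)   |  | (alignment padding)
|   +---------------+  |
+----------------------+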
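The sizes and offsets in these diagrams can be measured rather than assumed. The sketch below reuses Base_V and Derived_V as defined earlier and adds a hypothetical helper, offsetWithin, that computes a member's byte offset from addresses (offsetof is only conditionally supported for classes with virtual functions).

```cpp
#include <cstddef>

class Base_V {
public:
    int b_data;
    virtual void func() {}
    virtual void funcWithoutOverride() {}
    virtual ~Base_V() {}
};

class Derived_V : public Base_V {
public:
    char d_data;
    void func() override {}
    ~Derived_V() override {}
};

// Hypothetical helper: byte distance from the start of an object to one of
// its members, computed from their addresses.
template <typename Object, typename Member>
std::ptrdiff_t offsetWithin(const Object& object, const Member& member) {
    return reinterpret_cast<const char*>(&member) -
           reinterpret_cast<const char*>(&object);
}

std::ptrdiff_t offsetOfBData() { Derived_V d; return offsetWithin(d, d.b_data); }
std::ptrdiff_t offsetOfDData() { Derived_V d; return offsetWithin(d, d.d_data); }
```

On a typical 64-bit GCC or Clang build one would expect sizeof(Base_V) and sizeof(Derived_V) to both be 16, with b_data at offset 8 and d_data at offset 12. MSVC typically reports 24 for Derived_V, with d_data at offset 16. These are expectations for common ABIs, not guarantees.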

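When Base_V* ptr = new Derived_V(); ptr->func(); executes:

ptr holds the starting address of the newly created Derived_V object.
Through ptr, the program reaches the vptr at the head of the object.
The vptr points to vtable_Derived_V.
The slot for func() in vtable_Derived_V holds a function pointer to Derived_V::func().
Derived_V::func() is called.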
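The destructor slot is looked up the same way when the object is deleted through the base pointer. The sketch below makes that visible by recording destructor calls in a log; the log and the function destroyThroughBasePointer are hypothetical additions for this illustration.

```cpp
#include <string>

// Hypothetical log, used only to make the destruction order observable.
std::string& destructionLog() {
    static std::string log;
    return log;
}

class Base_V {
public:
    virtual void func() {}
    virtual ~Base_V() { destructionLog() += "~Base_V "; }
};

class Derived_V : public Base_V {
public:
    void func() override {}
    ~Derived_V() override { destructionLog() += "~Derived_V "; }
};

std::string destroyThroughBasePointer() {
    destructionLog().clear();
    Base_V* ptr = new Derived_V();
    ptr->func();  // dispatched to Derived_V::func
    delete ptr;   // dispatched to ~Derived_V, which then runs ~Base_V
    return destructionLog();
}
```

Because ~Base_V is virtual, the derived destructor runs first and the base destructor second. Without the virtual keyword, deleting a Derived_V through a Base_V* would be undefined behavior.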

Exploring the vtable by printing addresses

#include <iostream>

class base
{
public:
    void print_nv()
    {
        std::cout << "base::print_nv" << std::endl;
    }

    virtual void print_v()
    {
        std::cout << "base::print_v" << std::endl;
    }

    virtual ~base() = default;

private:
    int a = 0;
};

class derived : public base {...};

int main()
{
    base* pArr[2]{new base(), new derived};

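    // The two commented-out lines below print 1: during overload resolution, the only
    // operator<< that accepts a pointer to member function is the bool overload, so the
    // pointer is converted to bool before it is printed. (The cast to void* in the first
    // line is not standard C++ and compiles only as a compiler extension.)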
    // std::cout << "address of base::print_v: " << reinterpret_cast<void*>(&base::print_v) << std::endl; // print: 1
    // std::cout << "address of derived::print_v: " << &derived::print_v << std::endl; // print: 1
    
    std::cout << "wtf is void(base::*)()" << std::endl; // a pointer to member function type
    std::cout << "sizeof(void(base::*)()): " << sizeof(void(base::*)()) << std::endl; // 16 in the run shown below; implementation-specific
    std::cout << std::endl;

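    // The casts below assume the vptr sits at offset 0 of the object and read memory through
    // unrelated pointer types; this is implementation-specific and not portable C++.
    std::cout << pArr[0] << "\t\t: address of *pArr[0] and also vptr" << std::endl;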
    std::cout << *(reinterpret_cast<void****>(pArr[0]))
        << "\t\t: value of vptr, also address of vtable and function base::print_v" << std::endl;
    std::cout << **(reinterpret_cast<void****>(pArr[0]))
        << "\t\t: vtable[0], value of base::print_v" << std::endl;
    std::cout << ***(reinterpret_cast<void****>(pArr[0]))
        << "\t: 8 bytes of machine code in function base::print_v" << std::endl;

    delete pArr[0];
    delete pArr[1];
    return 0;
}

Output (the addresses differ between runs and platforms):

wtf is void(base::*)()
sizeof(void(base::*)()): 16

0x25540531a80           : address of *pArr[0] and also vptr
0x7ff6242c46d0          : value of vptr, also address of vtable and function base::print_v
0x7ff6242c2910          : vtable[0], value of base::print_v
0x20ec8348e5894855      : 8 bytes of machine code in function base::print_v
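The remark in the listing about a pointer to member function being printed as 1 can be verified in isolation. The sketch below streams such a pointer into a string; the class is a cut-down copy of base, and the two helper functions are hypothetical names introduced for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

class base {
public:
    virtual void print_v() {}
    virtual ~base() = default;
};

// Streams a pointer to member function. The only viable operator<< is the
// bool overload, so the (non-null) pointer is converted to true.
std::string streamMemberFunctionPointer() {
    void (base::*pmf)() = &base::print_v;
    std::ostringstream out;
    out << pmf;
    return out.str();
}

// Size of the pointer to member function type; the value is
// implementation-specific (the run shown above reported 16).
std::size_t memberFunctionPointerSize() {
    return sizeof(void (base::*)());
}
```

streamMemberFunctionPointer returns "1", which is why printing &base::print_v directly says nothing about where the function lives and why the listing reads the address out of the vtable instead.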