
I'm reading the article "Virtual method table".

Example in the above article:

class B1 {
public:
  void f0() {}
  virtual void f1() {}
  int int_in_b1;
};

class B2 {
public:
  virtual void f2() {}
  int int_in_b2;
};

class D : public B1, public B2 {
public:
  void d() {}
  void f2() {}  // override B2::f2()
  int int_in_d;
};

B2 *b2 = new B2();
D  *d  = new D();

In the article, the author shows the memory layout of object d as follows:

          d:
D* d-->      +0: pointer to virtual method table of D (for B1)
             +4: value of int_in_b1
B2* b2-->    +8: pointer to virtual method table of D (for B2)
             +12: value of int_in_b2
             +16: value of int_in_d

Total size: 20 Bytes.

virtual method table of D (for B1):
  +0: B1::f1()  // B1::f1() is not overridden

virtual method table of D (for B2):
  +0: D::f2()   // B2::f2() is overridden by D::f2()

The question is about d->f2(). The call to d->f2() passes a B2 pointer as the this pointer, so we have to do something like:

(*(*(d[+8]/*pointer to virtual method table of D (for B2)*/)[0]))(d+8) /* Call d->f2() */

Why should we pass a B2 pointer as the this pointer rather than the original D pointer? We are actually calling D::f2(). As I understand it, we should pass a D pointer as this to D::f2().

___update____

If a B2 pointer is passed as this to D::f2(), what if we want to access the members of class B1 inside D::f2()? I believe the B2 pointer (this) looks like this:

          d:
D* d-->      +0: pointer to virtual method table of D (for B1)
             +4: value of int_in_b1
B2* b2-->    +8: pointer to virtual method table of D (for B2)
             +12: value of int_in_b2
             +16: value of int_in_d

It is already offset from the start of this contiguous block of memory. For example, if we want to access int_in_b1 inside D::f2(), I guess that at run time it will do something like *(this+4) (this points to the same address as b2), which would actually point to int_in_b2 in B2?

Fihop

2 Answers


We cannot pass the D pointer to a virtual function overriding B2::f2(), because all overrides of the same virtual function must accept the same memory layout.

Since the function B2::f2() expects the object passed to it through its this pointer to have B2's memory layout, i.e.

b2:
  +0: pointer to virtual method table of B2
  +4: value of int_in_b2

the overriding function D::f2() must expect the same layout as well. Otherwise, the functions would no longer be interchangeable.

To see why interchangeability matters, consider this scenario:

class B2 {
public:
  void test() { f2(); }
  virtual void f2() {}
  int int_in_b2;
};
...
B2 b2;
b2.test(); // Scenario 1
D d;
d.test(); // Scenario 2

B2::test() needs to call f2() in both scenarios. It has no additional information telling it how the this pointer would have to be adjusted when making these calls*. That is why the compiler passes the fixed-up pointer, so that the call to f2() inside test() works with both D::f2() and B2::f2().

* Other implementations may very well pass this information; however, multiple inheritance implementation discussed in the article does not do it.

Sergey Kalinichenko
  • (1) what do you mean by "interchangeable", can you explain a little bit more detail? (2) If passing a `B2` pointer as `this` to D::f2(), what if we want to access the members of B1 class in D::f2()? For (2), please see the update of the question. – Fihop Jun 10 '15 at 21:18
  • Thanks so much! Please check my understanding. For `B2 b2; b2.test()`, a `B2` pointer is definitely passed as `this` to `B2::test()`, since `b2` is a standalone object. For `D d; d.test()`, the compiler passes the fixed-up pointer, which actually points to the `B2` sub-object of `D`, as `this` to `test()`, since the function actually being called is `B2::test()`. If `this` did not point to the `B2` inside `D`, accessing the members of `B2` inside `B2::test()` would cause problems. This is why I think we should pass a fixed-up pointer as `this`. This example cannot explain the update, though. Thanks anyway – Fihop Jun 10 '15 at 22:06
  • For `D d; d.test()`, I agree we do something like `d.test(B2* b2)` (meaning `this` points to the `B2` sub-object of `D`). However, inside `B2::test`, `b2->f2()` should execute `D::f2`. Am I right? Now the problem becomes what kind of `this` is passed to `b2->f2()` – Fihop Jun 10 '15 at 22:17
  • Also consider the common situation of `void foo(B2& b2) { b2.f2(); }`. Calling `foo()` with an object of type `D` needs to work, but there's no possibility for `foo()` to be able to pass a `this` pointer of type `D`. So the compiler must compile `D::f2()` assuming that the hidden `this` pointer parameter points to a `B2`. However, it can assume that the hidden pointer parameter points to the `B2` that is embedded in a `D`. So the function still has access to members of `D` (but must adjust the passed-in `this` pointer appropriately and automatically). – Michael Burr Jun 10 '15 at 23:35
  • @FihopZz "inside `B2::test`, `b2->f2()` should execute `D::f2`" You are absolutely right! That's precisely the reason why the `this` pointer passed to `test()` must be that of the `B2` inside `D`, because `test()` has no way to adjust the pointer for the call of `D::f2()`. Therefore, it passes its own `this` regardless of how it was called, and it works, because both `B2::f2` and `D::f2` expect an identical layout of what is passed through the `this` pointer. – Sergey Kalinichenko Jun 10 '15 at 23:50

Given your class hierarchy, an object of type B2 will have the following memory footprint.

+------------------------+
| pointer for B2 vtable  |
+------------------------+
| int_in_b2              |
+------------------------+

An object of type D will have the following memory footprint.

+------------------------+
| pointer for B1 vtable  |
+------------------------+
| int_in_b1              |
+------------------------+
| pointer for B2 vtable  |
+------------------------+
| int_in_b2              |
+------------------------+
| int_in_d               |
+------------------------+

When you use:

D* d  = new D();
d->f2();

That call is the same as:

B2* b  = new D();
b->f2();

f2() can be called through a pointer of type B2* or a pointer of type D*. Since the runtime must work correctly with a B2*, it has to dispatch the call to D::f2() using the appropriate function pointer in B2's vtable. However, when the call is dispatched to D::f2(), the original B2* must somehow be offset so that, inside D::f2(), this points to a D, not a B2.

Here's your example code, altered a little bit to print useful pointer values and member data to help understand the changes to the value of this in various functions.

#include <iostream>

struct B1 
{
   void f0() {}
   virtual void f1() {}
   int int_in_b1;
};

struct B2 
{
   B2() : int_in_b2(20) {}
   void test_f2()
   {
      std::cout << "In B::test_f2(), B*: " << (void*)this << std::endl;
      this->f2();
   }

   virtual void f2()
   {
      std::cout
         << "In B::f2(), B*: " << (void*)this
         << ", int_in_b2: " << int_in_b2 << std::endl;
   }

   int int_in_b2;
};

struct D : B1, B2 
{
   D() : int_in_d(30) {}
   void d() {}
   void f2()
   {
      // ======================================================
      // If "this" is not adjusted properly to point to the D
      // object, accessing int_in_d will lead to undefined 
      // behavior.
      // ======================================================

      std::cout
         << "In D::f2(), D*: " << (void*)this
         << ", int_in_d: " << int_in_d << std::endl;
   }
   int int_in_d;
};

int main()
{
   std::cout << "sizeof(void*) : " << sizeof(void*) << std::endl;
   std::cout << "sizeof(int)   : " << sizeof(int) << std::endl;
   std::cout << "sizeof(B1)    : " << sizeof(B1) << std::endl;
   std::cout << "sizeof(B2)    : " << sizeof(B2) << std::endl;
   std::cout << "sizeof(D)     : " << sizeof(D) << std::endl << std::endl;

   B2 *b2 = new B2();
   D  *d  = new D();
   b2->test_f2();
   d->test_f2();
   return 0;
}

Output of the program:

sizeof(void*) : 8
sizeof(int)   : 4
sizeof(B1)    : 16
sizeof(B2)    : 16
sizeof(D)     : 32

In B::test_f2(), B*: 0x1f50010
In B::f2(), B*: 0x1f50010, int_in_b2: 20
In B::test_f2(), B*: 0x1f50040
In D::f2(), D*: 0x1f50030, int_in_d: 30

When the actual object used to call test_f2() is a D, the value of this changes from 0x1f50040 in test_f2() to 0x1f50030 in D::f2(). That is consistent with the sizes of B1, B2, and D: the offset of the B2 sub-object within a D object is 16 (0x10) bytes. The value of this in test_f2(), a B2*, is adjusted by 0x10 before the call reaches D::f2().

I am going to guess that the value of the offset from D to B2 is stored in B2's vtable. Otherwise, there is no way a generic function dispatch mechanism can change the value of this properly before dispatching the call to the right virtual function.

R Sahu