Does adding virtual to a C# method may break legacy clients?

Question

The question is very straightforward,

if I have a following class:

public class ExportReservationsToFtpRequestOld 
{
    public int A { get; set; }
    public long B { get; set; }
}

and change it to:

public class ExportReservationsToFtpRequestOld 
{
    public virtual int A { get; set; }
    public virtual long B { get; set; }
}

can it break a legay client dll?

There are ways it can break, but the only way to know for sure would be to try it. — Ron Beyer, Apr 18 '18 at 17:26
I'm pretty sure the answer here is no. Adding the option to override a method has no impact on dependencies. Of course, removing the option to override (delete virtual) is a breaking change. This is a change I make regularly for testability reasons. I've never had a break doing it. I don't see this listed either way here: https://stackoverflow.com/questions/1456785/a-definitive-guide-to-api-breaking-changes-in-net — P.Brian.Mackey, Apr 18 '18 at 17:29
Are you looking for source compatibility, binary compatibility, or both? Do you mind if adding making the properties virtual introduces a warning at build time, but not an error? (Questions of compatibility are rarely very straightforward, I'm afraid.) Admittedly I'll need to work to find examples of how it breaks things, but I strongly suspect there *are* such examples, if you want both source and binary compatibility for everything other than reflection. — Jon Skeet, Apr 18 '18 at 17:31
You didn't declare the class *sealed*. So yes, if the client code derived its own class from yours then it is borken. — Hans Passant, Apr 18 '18 at 17:34
@HansPassant: Can you give an example of how making the properties virtual breaks a derived class? That's what I'm trying to come up with at the moment. — Jon Skeet, Apr 18 '18 at 17:34
Also note that what you've got here are *properties* - whereas your question asks about *methods*. The answers may just be different for subtle reasons. Do you care about just properties, or properties and methods? — Jon Skeet, Apr 18 '18 at 17:35
You *can't* have virtual members in a sealed class. https://docs.microsoft.com/en-us/dotnet/csharp/misc/cs0549 — P.Brian.Mackey, Apr 18 '18 at 17:39
@RonBeyer: What are the ways it can break? Don't be coy; enumerate them. — Eric Lippert, Apr 18 '18 at 19:00
@RicardoAlves: Can you describe why you wish to make this change? Even if there is no immediate breaking change caused by moving from instance to virtual, there might be other issues or better techniques. — Eric Lippert, Apr 18 '18 at 19:01
@EricLippert The "way" I was thinking was that the OP means to make them virtual to derive them in a sub-class and then pass that sub class back to the legacy code, I believe (you are the expert and I haven't tried this) that it would call the base class implementation and not the derived one, or am I off base on that? — Ron Beyer, Apr 18 '18 at 19:08

Eric Lippert · Answer 1 · 2018-04-19T15:07:20.673

The answer from Dai is good but a little hard to read and it buries the lede. Let's not bury the lede.

can making a non-virtual instance method virtual break a legacy client dll?

Yes. The breakage is subtle and unlikely, but it is possible. Legacy clients should be recompiled when you make a member of a dependency virtual.

More generally: if you are changing anything about a base class's public or protected surface area, recompile all the assemblies that make derived classes.

Let's see how this specific scenario can break a legacy client. Suppose we have a dependency assembly with:

public class B {
  public void M() { }
}

and then we use this in a client assembly:

class C {
  static void Q() {
    B b = new B();
    b.M();
  }
}

What IL is generated?

    newobj instance void B::.ctor()
    callvirt instance void B::M()
    ret

Perfectly reasonable code. C# generates a callvirt to a non-virtual call because that means we do not have to emit a check to ensure that the receiver is non-null. This keeps the code small.

If we update B.M to be virtual, the call site does not need to change; it is already doing a virtual call. So everything is fine, right?

Now, suppose before the new version of the dependency ships, some super genius comes along and says oh, we can refactor this code into the obviously better code:

  static void Q() {
    new B().M();
  }

Surely that refactoring changes nothing, right?

WRONG. The code generated is now:

  newobj instance void B::.ctor()
  call instance void B::M()
  ret

C# reasons "I'm doing a call to a non-virtual method and I know the receiver is a new expression which never produces null, so I'll save that nanosecond and skip the null check".

Why not do that in the first case? Because C# does not do the control flow analysis in the first version and deduce that on every control flow, the receiver is known to be non-null. It just does a cheap check to see if the receiver is one of a handful of expressions known to be impossible to be null.

If you now change the dependency B.M to a virtual method and do not recompile the assembly with the call site, the code in the call site is now not verifiable because it violates the CLR security rules. A non-virtual call to a virtual method is only legal when the call is directly in a member of a derived type.

See my comment on another answer for a scenario that motivates this security design decision.

Aside: the rule applies even to nested types! That is, if we have class D : B { class N { } } then code inside N is not allowed to do a non-virtual call to a virtual member of B, though code inside D is!

So already we have a problem; we're turning verifiable code in another assembly we don't own into nonverifiable code.

But wait, it gets worse.

Suppose we have a slightly different scenario. I suspect this is the scenario actually motivating your change.

// Original code
public class B {
  public void M() {}
}
public class D : B { }

And a client

class C {
  static void Q() {
    new D().M();
  }
}

What code is generated now? The answer might surprise you. It is the same as before. C# does NOT generate

  call instance void D::M()

Rather it generates

  call instance void B::M()

Because after all, that's the method being called.

And now we change the dependency to

// New code
class B {
  public virtual void M() {}
}
class D : B { 
  public override void M() {}
}

The authors of the new code reasonably believe that all calls to new D().M() should be dispatched to D.M, but as we see, the not-recompiled client will still be doing an unverifiable non-virtual dispatch to B.M! So this is a not-breaking change in the sense that the client still gets the behaviour they used to get (assuming they ignore the verification failure) but that behaviour is no longer correct, and will change upon recompilation.

The basic problem here is that a non-virtual call can show up where you don't expect it, and then if you change the requirement to make the call virtual, that doesn't happen until recompilation.

Let's look at another version of the scenario we just did. In the dependency we have as before

public class B { public void M() {} }
public class D : B {}

and in our client we now have:

interface I { void M(); }
class C : D, I {}
...
I i = new C();
i.M();

All is good; C inherits from D, which gives it a public member B.M which implements I.M and we're all set.

Except that there's a problem. The CLR requires that a method B.M that implements I.M be virtual, and B.M is not. Rather than rejecting this program, C# pretends that you wrote:

class C : D, I 
{
  void I.M()
  {
    base.M();
  }
}

Where base.M() is compiled as non-virtual call to B.M(). After all, we know that this is non-null and B.M() is not virtual, so we can do a call instead of a callvirt.

But now what happens when we recompile the dependency without recompiling the client:

class B {
  public virtual void M() {}
}
class D : B { 
  public override void M() {}
}

Now calling i.M() will do a verifiable non-virtual call to B.M but the author of D.M expected that D.M would be called in this scenario, and on re-compilation of the client, it will be.

Finally, there are potentially more scenarios involving explicit base. calls where changing a dependency "in the middle" of a class hierarchy can produce bizarre unexpected results. See https://blogs.msdn.microsoft.com/ericlippert/2010/03/29/putting-a-base-in-the-middle/ for details of that scenario. This is not your scenario, but it further illustrates the dangers of non-virtual calls to virtual methods.

Thanks for the explanation! In my case, I really don't think it will break anything, as I want to turn obsolute old fields and make sure that our server still processes old data based on overriden properties. So, legacy code can still use whatever they were using, but my server has to make sure that, when reading old properties, they translate it to other property. — Ricardo Alves, Apr 18 '18 at 21:14
*"[...] will still be doing an unverifiable non-virtual dispatch to B.M!"* There is a good joke in there somewhere, like turning your call into excrement... Good explanation as always! — Ron Beyer, Apr 18 '18 at 21:29
@EricLippert : "if you are changing **anything** about a base class, recompile all the assemblies that make derived classes." Is this required even if the changes are only about private members of the base class? — Luca Cremonesi, Apr 19 '18 at 14:57
@LucaCremonesi: You are correct; I was too broad there. I've updated the text to be more clear. Thanks! — Eric Lippert, Apr 19 '18 at 15:07

Dai · Accepted Answer · 2018-04-18T18:35:24.107

C# is compiled to CIL (formerly known as MSIL).
Property accesses and assignments are compiled to method calls:
- value = foo.Bar becomes value = foo.get_Bar().
- foo.Bar = value becomes foo.set_Bar( value ).
Method calls are compiled to call or callvirt opcodes.
The call and callvirt opcodes' first operand is a "symbol"/identifier by-name, so changing your class's members from non-virtual to virtual will not break JIT compilation.
call and callvirt can both be used to call virtual and non-virtual methods with differing behaviour, and the opcodes are chosen by the compiler for various reasons, and importantly the compiler may use callvirt to call a non-virtual method ( http://www.levibotelho.com/development/call-and-callvirt-in-cil/ )
- .call NonVirtualMethod
  - The NonVirtualMethod is directly invoked.
- .callvirt NonVirtualMethod
  - The NonVirtualMethod is directly invoked (with a null-check).
- .call VirtualMethod
  - The VirtualMethod is directly invoked, even if the current object overrides it.
- .callvirt VirtualMethod
  - The current object's override of VirtualMethod is invoked.

So while a compiled application will still start and JIT after swapping your old binary assembly for a new one with virtual members there are still some scenarios to consider regarding the behavior of the consumer of your assembly depending on what opcode (call or callvirt) the consumer's compiler used:

A consumer binary-assembly has .call ExportReservationsToFtpRequestOld::get_A

Provided you don't pass any subclass of ExportReservationsToFtpRequestOld with overridden members to the consumer, then the correct property will be invoked. If you do pass a subclass with overridden virtual members then the overridding version will not be invoked:

It is valid to call a virtual method using call (rather than callvirt); this indicates that the method is to be resolved using the class specified by method rather than as specified dynamically from the object being invoked.

(I'm surprised that C# doesn't let consumer types do this explicitly, only classes in an inheritance tree can use the base keyword).
A consumer binary-assembly has .callvirt ExportReservationsToFtpRequestOld::get_A

If the consumer is working with a subclass then the subclass's override of get_A will be invoked, and not necessarily ExportReservationsToFtpRequestOld's version.
A consumer already subclasses ExportReservationsToFtpRequestOld and adds shadows (new) versions of get_A and get_B, then calls those properties:
```
class Derived : ExportReservationsToFtpRequestOld {

    public new int A { get; set; }
    public new long B { get; set; }
}
```
or even:
```
class Derived : ExportReservationsToFtpRequestOld {

    public new virtual int A { get; set; }
    public new virtual long B { get; set; }
}

// with:

class Derived2 : Derived {

    public override int A { get; set; }
    public override long B { get; set; }
}
```
As Derived's members have different internal identifiers then ExportReservationsToFtpRequestOld's get_A and get_B will not be invoked. Even if the consumer's compiler used .callvirt instead of .call, the virtual-method lookup will start with their subclass, not ExportReservationsToFtpRequestOld. Things get complicated with Derived2 however and what happens depends on how it's consumed, see here: what is "public new virtual void Method()" mean?

TL;DR:

If you're sure no-one is deriving from ExportReservationsToFtpRequestOld with shadow+virtual members then go ahead and change it to virtual - you won't break anything.

Regarding your parenthetical "I'm surprised that C# doesn't let consumer types do this explicitly" -- there are security and correctness concerns. Consider: `abstract class B { private B() {} public virtual void M() { Danger(); } private class D : B { public override void M() { SecurityCheck(); base.M(); } public static B GetD() { return new D(); }}` We would reasonable assume that low-trust code which gets its hands on a `D` should be unable to bypass the security check in `D.M` by calling `B.M` directly on an instance of `D`. — Eric Lippert, Apr 18 '18 at 18:57
Thanks for the explanation! In my case, I really don't think it will break anything, as I want to turn obsolute old fields and make sure that our server still processes old data based on overriden properties. So, legacy code can still use whatever they were using, but my server has to make sure that, when reading old properties, they translate it to other property. — Ricardo Alves, Apr 18 '18 at 21:14

Does adding virtual to a C# method may break legacy clients?

2 Answers2

TL;DR: