The answer from Dai is good but a little hard to read and it buries the lede. Let's not bury the lede.
can making a non-virtual instance method virtual break a legacy client dll?
Yes. The breakage is subtle and unlikely, but it is possible. Legacy clients should be recompiled when you make a member of a dependency virtual.
More generally: if you are changing anything about a base class's public or protected surface area, recompile all the assemblies that make derived classes.
Let's see how this specific scenario can break a legacy client. Suppose we have a dependency assembly with:
public class B {
public void M() { }
}
and then we use this in a client assembly:
class C {
static void Q() {
B b = new B();
b.M();
}
}
What IL is generated?
newobj instance void B::.ctor()
callvirt instance void B::M()
ret
Perfectly reasonable code. C# generates a callvirt
to a non-virtual call because that means we do not have to emit a check to ensure that the receiver is non-null. This keeps the code small.
If we update B.M
to be virtual, the call site does not need to change; it is already doing a virtual call. So everything is fine, right?
Now, suppose before the new version of the dependency ships, some super genius comes along and says oh, we can refactor this code into the obviously better code:
static void Q() {
new B().M();
}
Surely that refactoring changes nothing, right?
WRONG. The code generated is now:
newobj instance void B::.ctor()
call instance void B::M()
ret
C# reasons "I'm doing a call to a non-virtual method and I know the receiver is a new
expression which never produces null, so I'll save that nanosecond and skip the null check".
Why not do that in the first case? Because C# does not do the control flow analysis in the first version and deduce that on every control flow, the receiver is known to be non-null. It just does a cheap check to see if the receiver is one of a handful of expressions known to be impossible to be null.
If you now change the dependency B.M
to a virtual method and do not recompile the assembly with the call site, the code in the call site is now not verifiable because it violates the CLR security rules. A non-virtual call to a virtual method is only legal when the call is directly in a member of a derived type.
See my comment on another answer for a scenario that motivates this security design decision.
Aside: the rule applies even to nested types! That is, if we have class D : B { class N { } }
then code inside N
is not allowed to do a non-virtual call to a virtual member of B
, though code inside D
is!
So already we have a problem; we're turning verifiable code in another assembly we don't own into nonverifiable code.
But wait, it gets worse.
Suppose we have a slightly different scenario. I suspect this is the scenario actually motivating your change.
// Original code
public class B {
public void M() {}
}
public class D : B { }
And a client
class C {
static void Q() {
new D().M();
}
}
What code is generated now? The answer might surprise you. It is the same as before. C# does NOT generate
call instance void D::M()
Rather it generates
call instance void B::M()
Because after all, that's the method being called.
And now we change the dependency to
// New code
class B {
public virtual void M() {}
}
class D : B {
public override void M() {}
}
The authors of the new code reasonably believe that all calls to new D().M()
should be dispatched to D.M
, but as we see, the not-recompiled client will still be doing an unverifiable non-virtual dispatch to B.M
! So this is a not-breaking change in the sense that the client still gets the behaviour they used to get (assuming they ignore the verification failure) but that behaviour is no longer correct, and will change upon recompilation.
The basic problem here is that a non-virtual call can show up where you don't expect it, and then if you change the requirement to make the call virtual, that doesn't happen until recompilation.
Let's look at another version of the scenario we just did. In the dependency we have as before
public class B { public void M() {} }
public class D : B {}
and in our client we now have:
interface I { void M(); }
class C : D, I {}
...
I i = new C();
i.M();
All is good; C
inherits from D
, which gives it a public member B.M
which implements I.M
and we're all set.
Except that there's a problem. The CLR requires that a method B.M
that implements I.M
be virtual, and B.M
is not. Rather than rejecting this program, C# pretends that you wrote:
class C : D, I
{
void I.M()
{
base.M();
}
}
Where base.M()
is compiled as non-virtual call to B.M()
. After all, we know that this
is non-null and B.M()
is not virtual, so we can do a call
instead of a callvirt
.
But now what happens when we recompile the dependency without recompiling the client:
class B {
public virtual void M() {}
}
class D : B {
public override void M() {}
}
Now calling i.M()
will do a verifiable non-virtual call to B.M
but the author of D.M
expected that D.M
would be called in this scenario, and on re-compilation of the client, it will be.
Finally, there are potentially more scenarios involving explicit base.
calls where changing a dependency "in the middle" of a class hierarchy can produce bizarre unexpected results. See https://blogs.msdn.microsoft.com/ericlippert/2010/03/29/putting-a-base-in-the-middle/ for details of that scenario. This is not your scenario, but it further illustrates the dangers of non-virtual calls to virtual methods.