
The Rust Guide states that:

The semicolon turns any expression into a statement by throwing away its value and returning unit instead.

I thought I got this concept down until I ran an experiment:

fn print_number(x: i32, y: i32) -> i32 {
    if x + y > 20 { return x }      
    x + y 
}

This compiles fine. Then I added a semicolon at the end of the return line (return x;). From what I understand, this turns the line into a statement, which should yield the unit type ().

Nonetheless, the end result is the same.

Shepmaster
sargas

2 Answers

5

Normally, every branch in the if expression should have the same type. If the type for some branch is underspecified, the compiler tries to find the single common type:

fn print_number(x: int, y: int) {
  let v = if x + y > 20 {
    3 // this can be either 3u, 3i, 3u8 etc.
  } else {
    x + y // this is always int
  };
  println!("{}", v);
}

In this code, 3 is underspecified but the else branch forces it to have the type of int.
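For readers on current Rust, the same unification can be sketched as follows. This is only an illustration: `int` no longer exists, so `i32` is assumed here, and the function name `pick` is hypothetical.

```rust
// Hypothetical helper; mirrors the example above in current Rust syntax.
fn pick(x: i32, y: i32) -> i32 {
    // The literal `3` carries no suffix, so its integer type must be inferred.
    // The `else` branch has type `i32`, which forces the literal to be `i32` too.
    let v = if x + y > 20 { 3 } else { x + y };
    v
}
```

Calling `pick(15, 10)` takes the first branch, and `pick(2, 5)` takes the second.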

This sounds simple: There is a function that "unifies" two or more types into the common type, or it will give you an error when that's not possible. But what if there were a fail! in the branch?

fn print_number(x: int, y: int) {
  let v = if x + y > 20 {
    fail!("x + y too large") // ???
  } else {
    x + y // this is always int
  };
  println!("{}", v); // uh wait, what's the type of `v`?
}

I'd want fail! not to affect the other branches; it is an exceptional case, after all. Since this pattern is quite common in Rust, the concept of a diverging type has been introduced. No value has the diverging type. (It is also called an "uninhabited type" or "void type" depending on the context. Not to be confused with the "unit type", which has the single value ().) Since the diverging type is naturally a subset of every other type, the compiler concludes that v's type is just that of the else branch, int.
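A minimal sketch of the same idea in current Rust, where `fail!` has since been renamed `panic!` and `i32` stands in for `int`. The function name `checked_sum` is hypothetical.

```rust
// Hypothetical helper; the diverging branch does not influence the type of `v`.
fn checked_sum(x: i32, y: i32) -> i32 {
    let v = if x + y > 20 {
        // `panic!` never produces a value, so this branch is diverging
        // and is compatible with whatever type the other branch has.
        panic!("x + y too large")
    } else {
        x + y // this branch alone decides that `v` is an `i32`
    };
    v
}
```

Since `v` is returned as an `i32`, the code compiling at all suggests the diverging branch was accepted as compatible with the `else` branch.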

A return expression is no different from fail! for the purpose of type checking. It abruptly escapes from the current flow of execution just like fail! (but does not terminate the task, thankfully). Still, the diverging type does not propagate to the next statement:

fn print_number(x: int, y: int) {
  let v = if x + y > 20 {
    return; // this is diverging
    () // this is implied, even when you omit it
  } else {
    x + y // this is always int
  };
  println!("{}", v); // again, what's the type of `v`?
}

Note that a lone semicolon-terminated statement x; is equivalent to the expression x; (). Normally a; b has the same type as b, so it would be quite strange for x; () to have the type () only when x does not diverge, and to diverge when x does. That's why your original code didn't work.

It is tempting to add special cases like these:

  • Why don't you make x; () diverging when x diverges?
  • Why don't you assume uint for every underspecified integer literal when its type cannot be inferred? (Note: this was the case in the past.)
  • Why don't you automatically find the common supertrait when unifying multiple trait objects?

The truth is that designing a type system is not very hard, but verifying it is much harder, and we want to ensure that Rust's type system is future-proof and long-lasting. Some of these may happen if they are really useful and proved "correct" for our purposes, but not immediately.

Kang Seonghoon
  • I feel like I understand more about how `return`, `fail!`, and *Diverging Functions* work. My code example compiles regardless of the presence of a semicolon at `return x`. So I guess that the bottom line is: adding it won't make a difference? – sargas Oct 20 '14 at 04:05
  • I edited my code example to clarify my question a little more. Still, both `return x` and `return x;` seem to do the same thing in the end, and I don't know what difference the semicolon makes in this case. Does my edit change the context for your question? (Still trying to understand those low level concepts of Rust). – sargas Oct 20 '14 at 04:17
  • @sargas: Your edit completely changes the situation. Before, the function's tail was an `if` expression; now, the function's tail is the `x + y` expression. In both cases, the `return` expression can cause an early exit. However, in your edited sample, the type of the `if` expression becomes irrelevant (and we know it's `()` because it doesn't have an `else` clause, which is only allowed if the `if` clause evaluates to `()`). That said, personally, if I end an `if` with a `return`, I don't write an `else` and instead put that code right after the `if`, as you did in your edit. – Francis Gagné Oct 20 '14 at 04:34
  • @FrancisGagné Ok. But again, what difference does the *semicolon* make? None? Is `if`, `return`, or both responsible for returning type `()`? That is my question. – sargas Oct 20 '14 at 16:12
  • @sargas: It appears that a semicolon after a `return` expression or a diverging expression has no effect: it's as if the semicolon wasn't there. – Francis Gagné Oct 20 '14 at 23:46
  • @FrancisGagné Cool, now I just need an answer that includes what you said so I can accept it. – sargas Oct 21 '14 at 01:26
  • @sargas: I've edited both my answer and this answer (currently pending review). I've removed a significant part of this answer because I think the original author misread your question (as I did originally!) and basically contradicted (or perhaps, denied) the conclusion you came to in your question. – Francis Gagné Oct 21 '14 at 02:39
3

I'm not 100% sure of what I'm saying but it kinda makes sense.

There's another concept coming into play: reachability analysis. The compiler knows that whatever follows a return expression is unreachable. For example, if we compile this function:

fn test() -> i32 {
    return 1;
    2
}

We get the following warning:

warning: unreachable expression
 --> src/main.rs:3:5
  |
3 |     2
  |     ^
  |

The compiler can ignore the "true" branch of the if expression if it ends with a return expression and only consider the "false" branch when determining the type of the if expression.

You can also see this behavior with diverging functions. Diverging functions are functions that don't return normally (e.g. they always fail). Try replacing the return expression with the fail! macro (which expands to a call to a diverging function). In fact, return expressions are also considered to be diverging; this is the basis of the aforementioned reachability analysis.
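As a rough illustration in current Rust, a diverging function is declared with the return type `!`. The names `bail` and `sum_or_bail` below are hypothetical, and `panic!` is used as the modern replacement for `fail!`.

```rust
// Hypothetical diverging function: the `!` return type says it never returns normally.
fn bail(msg: &str) -> ! {
    panic!("{}", msg)
}

// Hypothetical caller: the "true" branch diverges, so only the "false" branch
// has to produce the `i32` that the function promises.
fn sum_or_bail(x: i32, y: i32) -> i32 {
    if x + y > 20 {
        bail("x + y too large")
    } else {
        x + y
    }
}
```

Swapping `bail(...)` for `return x` in the first branch should type-check in the same way, since both are diverging expressions.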

However, if there's an actual () expression after the return statement, you'll get an error. This function:

fn print_number(x: i32, y: i32) -> i32 {
    if x + y > 20 {
        return x;
        ()
    } else {
        x + y
    }
}

gives the following error:

error[E0308]: mismatched types
 --> src/main.rs:4:9
  |
4 |         ()
  |         ^^ expected i32, found ()
  |
  = note: expected type `i32`
             found type `()`

In the end, it seems diverging expressions (which include return expressions) are handled differently by the compiler when they are followed by a semicolon: the statement is still diverging.
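To make the comparison concrete, here is a small sketch of the question's function written both ways. The function names are hypothetical; the bodies differ only in the semicolon after `return x`.

```rust
// Hypothetical name: the question's function, with no semicolon after `return x`.
fn without_semicolon(x: i32, y: i32) -> i32 {
    if x + y > 20 { return x }
    x + y
}

// Hypothetical name: the same function, with a semicolon after `return x`.
fn with_semicolon(x: i32, y: i32) -> i32 {
    if x + y > 20 { return x; }
    x + y
}
```

Both versions compile, and they should return the same value for any given pair of inputs.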

Shepmaster
Francis Gagné
  • Different how? What would the difference be without the semicolon? – sargas Oct 19 '14 at 03:07
  • Without a semicolon, `return` expressions and divergent expressions seem to be type-compatible with any expression, and they seem to retain that special characteristic even when they are followed by a semicolon. – Francis Gagné Oct 19 '14 at 04:51
  • This is exactly what they told me on the IRC channel. Basically, it makes no difference to the end result (at least for my example), but adding the semicolon can give peace of mind :) – sargas Oct 20 '14 at 03:46