
An official tutorial on @tf.function says:

To get peak performance and to make your model deployable anywhere, use tf.function to make graphs out of your programs. Thanks to AutoGraph, a surprising amount of Python code just works with tf.function, but there are still pitfalls to be wary of.

The main takeaways and recommendations are:

  • Don't rely on Python side effects like object mutation or list appends.
  • tf.function works best with TensorFlow ops, rather than NumPy ops or Python primitives.
  • When in doubt, use the for x in y idiom.

It only mentions how to implement @tf.function-annotated functions, but not when to use them.

Is there a heuristic for deciding whether I should at least try to annotate a function with tf.function? It seems that there are no reasons not to do it, unless I am too lazy to remove side effects or change some things like range() -> tf.range(). But if I am willing to do this...

Is there any reason not to use @tf.function for all functions?

problemofficer
  • Why add these tags? We could as well add `tensorflow0.1`, `tensorflow0.2`, `tensorflow0.3`, `tensorflow0.4`, `tensorflow0.5` and so on, as well as a tag for each of [these `tf` modules and classes](https://www.tensorflow.org/api_docs/python/tf) then. Also, why not add a tag for each of Python's standard modules and its functions and classes? – ForceBru Jan 21 '20 at 18:26
  • That is why I introduced the tensorflow2.x tag, because there are questions that are not related only to tensorflow2.0 but to tensorflow2.x in general. However, it would be unsuitable and unfeasible to add a tag for each and every version of a library. Take the example of Python. You don't have python3.4.6.....python.3.8.2, but python3.x – Timbus Calin Jan 25 '20 at 16:17
  • On one hand, the [`tf.function` guide](https://www.tensorflow.org/guide/function) says "Decorate module-level functions, and methods of module-level classes, and avoid decorating local functions or methods". I seem to remember more explicit wording, like "do not decorate every function, use `tf.function` in higher-level functions, like a training loop", but I may misremember (or maybe it has been removed). OTOH, [this discussion](https://github.com/tensorflow/addons/issues/13) has interesting input from devs; in the end it seems to be okay to use it in about any function for tensors/vars. – jdehesa Apr 24 '20 at 10:47
  • @jdehesa AFAIK `@tf.function` annotated functions also compile the functions that they call themselves to graphs. So you would only need to annotate the entry point to the module which is coherent with what you describe. But it also would not hurt to manually annotate functions lower in the call stack. – problemofficer Apr 24 '20 at 11:08
  • @problemofficer Yes, so in the GitHub issue I linked there is some discussion about whether creating multiple intermediate functions could have a slight performance impact, but it seems that the graph optimizer (grappler) can "inline" functions if needed, but on the other hand if another non-`tf.function` is called multiple times it cannot prevent the "code duplication" in the graph, which is why widespread usage appears to be recommendable. – jdehesa Apr 24 '20 at 13:59
  • AFAIK these are the two best resources that describe in more detail when to use tf.function: https://www.tensorflow.org/tutorials/customization/performance and https://www.tensorflow.org/guide/function. They don't answer every question you mentioned, but I want to post those two links for any new user getting started with @tf.function. Hope that helps. – Vishnuvardhan Janapati Apr 25 '20 at 22:47

3 Answers


TLDR: It depends on your function and whether you are in production or development. Don't use tf.function if you want to be able to debug your function easily, or if it falls under the limitations of AutoGraph or tf.v1 code compatibility. I would highly recommend watching the Inside TensorFlow talks about AutoGraph and Functions, not Sessions.

In the following I'll break down the reasons, which are all taken from information made available online by Google.

In general, the tf.function decorator causes a function to be compiled as a callable that executes a TensorFlow graph. This entails:

  • Conversion of the code through AutoGraph if required (including any functions called from an annotated function)
  • Tracing and executing the generated graph code

There is detailed information available on the design ideas behind this.
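
As a minimal sketch of what this looks like in practice (the toy function below is illustrative, not taken from the tutorial): decorating a function turns it into a callable that traces a graph on the first call and reuses it afterwards, and the AutoGraph-generated source can be inspected with tf.autograph.to_code:

import tensorflow as tf

# Decorating compiles the Python function into a graph-backed callable.
@tf.function
def dense_layer(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.uniform((3, 4))
w = tf.random.uniform((4, 2))
b = tf.zeros((2,))
print(dense_layer(x, w, b))  # first call traces the graph; later calls reuse it

# Inspect what AutoGraph generated from the original Python source:
print(tf.autograph.to_code(dense_layer.python_function))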

Benefits of decorating a function with tf.function

General benefits

  • Faster execution, especially if the function consists of many small ops (Source)

For functions with Python code / Using AutoGraph via tf.function decoration

If you want to use AutoGraph, using tf.function is highly recommended over calling AutoGraph directly. Reasons for this include: automatic control dependencies, the fact that it is required by some APIs, more caching, and exception helpers (Source).
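
To illustrate (a minimal sketch with an illustrative function, not taken from the sources above): when the decorated function contains Python control flow that depends on tensors, AutoGraph rewrites it into graph ops such as tf.while_loop and tf.cond:

import tensorflow as tf

@tf.function
def count_even(values):
    count = tf.constant(0)
    for v in values:      # tensor-dependent loop -> converted to a graph loop
        if v % 2 == 0:    # tensor-dependent conditional -> converted to tf.cond
            count += 1
    return count

print(count_even(tf.constant([1, 2, 3, 4, 5, 6])))
# => tf.Tensor(3, shape=(), dtype=int32)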

Drawbacks of decorating a function with tf.function

General drawbacks

  • If the function only consists of few expensive ops, there will not be much speedup (Source)

For functions with Python code / Using AutoGraph via tf.function decoration

  • No exception catching (should be done in eager mode, outside of the decorated function) (Source)
  • Debugging is much harder (see the sketch below for one way to temporarily fall back to eager execution)
  • Limitations due to hidden side effects and TF control flow

Detailed information on AutoGraph limitations is available.
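
Regarding the debugging drawback, one common workaround is to temporarily disable graph execution so the decorated function runs eagerly and can be stepped through; a sketch (on older TF 2.x versions the switch is tf.config.experimental_run_functions_eagerly instead):

import tensorflow as tf

@tf.function
def buggy(x):
    return x / (x - x)  # deliberately produces inf; hard to inspect once traced

# Run all tf.functions eagerly while debugging, then switch back.
tf.config.run_functions_eagerly(True)
print(buggy(tf.constant(2.0)))  # executes eagerly; breakpoints and pdb work as usual
tf.config.run_functions_eagerly(False)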

For functions with tf.v1 code

  • It is not allowed to create variables more than once in tf.function, but this is subject to change as tf.v1 code is phased out (Source)

For functions with tf.v2 code

  • No specific drawbacks

Examples of limitations

Creating variables more than once

It is not allowed to create variables more than once, such as v in the following example:

@tf.function
def f(x):
    v = tf.Variable(1)
    return tf.add(x, v)

f(tf.constant(2))

# => ValueError: tf.function-decorated function tried to create variables on non-first call.

In the following code, this is mitigated by making sure that self.v is only created once:

class C(object):
    def __init__(self):
        self.v = None
    @tf.function
    def f(self, x):
        if self.v is None:
            self.v = tf.Variable(1)
        return tf.add(x, self.v)

c = C()
print(c.f(tf.constant(2)))

# => tf.Tensor(3, shape=(), dtype=int32)

Hidden side effects not captured by AutoGraph

Hidden changes, such as the mutation of self.a inside change_state in this example, lead to an error since cross-function analysis is not done (yet) (Source):

class C(object):
    def change_state(self):
        self.a += 1

    @tf.function
    def f(self):
        self.a = tf.constant(0)
        if tf.constant(True):
            self.change_state() # Mutation of self.a is hidden
        tf.print(self.a)

x = C()
x.f()

# => InaccessibleTensorError: The tensor 'Tensor("add:0", shape=(), dtype=int32)' cannot be accessed here: it is defined in another function or code block. Use return values, explicit Python locals or TensorFlow collections to access it. Defined in: FuncGraph(name=cond_true_5, id=5477800528); accessed from: FuncGraph(name=f, id=5476093776).

Changes in plain sight are no problem:

class C(object):
    @tf.function
    def f(self):
        self.a = tf.constant(0)
        if tf.constant(True):
            self.a += 1 # Mutation of self.a is in plain sight
        tf.print(self.a)

x = C()
x.f()

# => 1

Example of limitation due to TF control flow

This if statement leads to an error because a value for the else branch needs to be defined for TF control flow:

@tf.function
def f(a, b):
    if tf.greater(a, b):
        return tf.constant(1)

# If a <= b, the function would return None
x = f(tf.constant(3), tf.constant(2))   

# => ValueError: A value must also be returned from the else branch. If a value is returned from one branch of a conditional a value must be returned from all branches.
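
For completeness, a version that also returns a value from the else branch (a minimal sketch) traces without error:

@tf.function
def f(a, b):
    if tf.greater(a, b):
        return tf.constant(1)
    else:
        return tf.constant(0)

print(f(tf.constant(3), tf.constant(2)))

# => tf.Tensor(1, shape=(), dtype=int32)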
prouast
  • This is a good summary. It's also worth noting that when called from eager mode, tf.function has an overhead of about 200 us (give or take) after the first call. Calling a tf.function from another tf.function is fine though. So you want to wrap as much computation as possible. If it wasn't for the limitations, you should wrap the whole program. – Dan Moldovan Apr 30 '20 at 12:39
  • This answer is tl;dr IMHO and it doesn't really answer my question but just gives the same fragmented info I found myself. Also, saying that I should not use `@tf.function` for production but only for development is not a feasible solution. Firstly, in machine learning (at least in research) the training during development stage also creates the final product (the trained model). Secondly, decorators are a significant change. I can't just put them in "after development" and be sure that the code behaves the same. This means that I have to develop and test with them already there. – problemofficer Apr 30 '20 at 15:59
  • @problemofficer Sorry for the confusion. When talking about production in my answer I was considering training (on a large dataset) to be part of that. In my own research, I develop/debug my functions with a toy dataset in eager mode, and then add `tf.function` if appropriate. – prouast Apr 30 '20 at 21:24

tf.function is useful for creating and using computational graphs; it should be used in training and in deployment. However, it isn't needed for most of your functions.

Let's say that we are building a special layer that will be a part of a larger model. We would not want to have the tf.function decorator above the function that constructs that layer, because it is merely a definition of what the layer will look like.

On the other hand, let's say that we are going to make a prediction or continue our training using some function. We would want to have the tf.function decorator, because we are actually using the computational graph to get some value.

A great example would be constructing an encoder-decoder model. DON'T put the decorator around the functions that create the encoder or decoder or any layer; that is only a definition of what it will do. DO put the decorator around the "train" or "predict" method, because those are actually going to use the computational graph for computation (a sketch of this split follows below).
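
A minimal sketch of that split (the toy autoencoder and names below are illustrative, not from the answer): the functions that build the layers stay plain Python, while the training step that actually runs the graph gets the decorator:

import tensorflow as tf

# Layer/model construction: plain Python, no tf.function needed.
def build_autoencoder():
    encoder = tf.keras.Sequential([tf.keras.layers.Dense(32, activation="relu")])
    decoder = tf.keras.Sequential([tf.keras.layers.Dense(784)])
    return encoder, decoder

encoder, decoder = build_autoencoder()
optimizer = tf.keras.optimizers.SGD(0.01)

# Build the variables once, eagerly, so the traced function never has to create them.
_ = decoder(encoder(tf.zeros((1, 784))))

# The computation that actually runs the graph: decorate this.
@tf.function
def train_step(x):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(decoder(encoder(x)) - x))
    variables = encoder.trainable_variables + decoder.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss

print(train_step(tf.random.uniform((8, 784))))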

Drew
  • But what about side effects or e.g. `tf.range()`? AFAIK these cannot be converted automatically. So I would need to write my custom layers with AutoGraph in mind from the start. Therefore I cannot just decorate the calling (prediction) function. – problemofficer Apr 30 '20 at 07:18

Per my understanding, and according to the documentation, using tf.function is highly recommended mainly for speeding up your code, since the code wrapped by tf.function is converted to a graph, and therefore there is room for some optimizations (e.g. op pruning, folding, etc.) that may not be performed when the same code is run eagerly.

However, there are also a few cases where using tf.function might incur additional overhead or might not result in noticeable speedups. One notable case is when the wrapped function is small and only used a few times in your code, so the overhead of calling the graph might be relatively large. Another case is when most of the computations are already done on an accelerator device (e.g. GPU, TPU), and therefore the speedups gained by graph computation might not be significant.

There is also a section in the documentation where the speedups are discussed in various scenarios, and at the beginning of this section the two cases above have been mentioned:

Just wrapping a tensor-using function in tf.function does not automatically speed up your code. For small functions called a few times on a single machine, the overhead of calling a graph or graph fragment may dominate runtime. Also, if most of the computation was already happening on an accelerator, such as stacks of GPU-heavy convolutions, the graph speedup won't be large.

For complicated computations, graphs can provide a significant speedup. This is because graphs reduce the Python-to-device communication and perform some speedups.

But at the end of the day, if it's applicable to your workflow, I think the best way to determine this for your specific use case and environment is to profile your code when it gets executed in eager mode (i.e. without using tf.function) vs. when it gets executed in graph mode (i.e. using tf.function extensively).
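
A minimal sketch of such a comparison using timeit (the toy function and numbers are illustrative; results depend heavily on your hardware and on how op-heavy the function is):

import timeit
import tensorflow as tf

def model(x, w):
    for _ in range(20):            # many small matmuls tend to favour graph mode
        x = tf.tanh(tf.matmul(x, w))
    return x

graph_model = tf.function(model)   # same function, graph-compiled

x = tf.random.uniform((64, 256))
w = tf.random.uniform((256, 256))
graph_model(x, w)                  # trigger tracing outside the measurement

print("eager:", timeit.timeit(lambda: model(x, w), number=100))
print("graph:", timeit.timeit(lambda: graph_model(x, w), number=100))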

today