I'm opening this question by request from the author of ruamel.yaml at "How to change an anchored scalar in a sequence without destroying the anchor in ruamel.yaml?".

In answer https://stackoverflow.com/a/55717146/5880190, the following code was given to solve the question of how to update aliased values in ruamel.yaml data:

def update_aliased_scalar(data, obj, val):
    def recurse(d, ref, nv):
        if isinstance(d, dict):
            for i, k in [(idx, key) for idx, key in enumerate(d.keys()) if key is ref]:
                d.insert(i, nv, d.pop(k))
            for k, v in d.non_merged_items():
                if v is ref:
                    d[k] = nv
                else:
                    recurse(v, ref, nv)
        elif isinstance(d, list):
            for idx, item in enumerate(d):
                if item is ref:
                    d[idx] = nv
                else:
                    recurse(item, ref, nv)

    if hasattr(obj, 'anchor'):
        recurse(data, obj, type(obj)(val, anchor=obj.anchor.value))
    else:
        recurse(data, obj, type(obj)(val))

This brilliant contributed code worked so well that I wrapped it in a function and used it in my project to handle performing all changes to data, as seen here (with mild renaming to fit the style of the code it was pasted into): https://github.com/wwkimball/yamlpath/blob/319585620abfab199f3e15c87e0a2dc2c900aa1d/yamlpath/processor.py#L739-L781

This worked extremely well for most use-cases of my project. To wit, I've been using this code with great success in production ever since. The values I work with are almost exclusively string data and, by chance, any non-string data happens to be aliased because it usually consists of reused service port numbers.

Based on these successes and not critically reading the code (I wholly trust the author to know ruamel.yaml far better than me), I incorrectly believed this code was updating only the target node and any references to it. As such, I also thought it was safe to use this code for updating non-aliased data, too. I was mistaken. This is my fault.

As it turns out, whenever a non-string value is passed to this function for update, it replaces not only the target node but every node with the same value, even though those nodes are not references to each other. So, when the data looks like this:

---
key: 42
other_key: 42

A call into the function to change key: 42 to key: 5280 not only makes the expected change, but also changes other_key: to 5280. This does not occur when the value being changed is non-aliased string data, no matter how many other nodes have the same value (which is what led me to believe it was safe to use this function to update any value, aliased or not). This does also occur when the values are Boolean.
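
The effect can be reproduced without ruamel.yaml at all. The following is a minimal sketch, assuming CPython and plain dicts and lists; `replace_by_identity` is a hypothetical, stripped-down stand-in for the identity test inside recurse, not the original function:

```python
def replace_by_identity(d, ref, nv):
    """Replace every value that *is* ref, mimicking the identity test in recurse."""
    if isinstance(d, dict):
        for k, v in list(d.items()):
            if v is ref:
                d[k] = nv
            else:
                replace_by_identity(v, ref, nv)
    elif isinstance(d, list):
        for idx, item in enumerate(d):
            if item is ref:
                d[idx] = nv
            else:
                replace_by_identity(item, ref, nv)


# Build the values at run time, as a parser would, rather than as literals.
data = {"key": int("42"), "other_key": int("42")}

# Only "key" is meant to change, but both values are the same object in CPython.
replace_by_identity(data, data["key"], 5280)
print(data)
```

On CPython both entries end up as 5280, which matches the behaviour described above.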

I didn't understand what the code was actually doing. I used the code in a way it was not designed for.

What I need is a function that accepts a node to change and then behaves as follows. When the node is a non-aliased value, it changes only that node. When the node is an aliased value, it also changes all other nodes sharing that alias. In neither case should it affect nodes under an unrelated alias (*alias1 versus *alias2) that happen to hold the same value, or non-aliased nodes whose value happens to equal that of the alias being updated. When I invoke the function, I have only the whole of the data, the target node, and the expected new value for it.

I'm open to refactoring my own code if the function needs more information at invocation.

seWilliam

1 Answer

Your problems occur because of a restriction (or rather, a bug) in the recurse function as presented in the other answer.

When you load your sample YAML, both "nodes" holding the value 42 are the same Python object (they have the same id). This is a CPython optimization, and it applies to booleans, small integers (-5 through 256), and a few other values. Since recurse tests for identity (using is), the check if v is ref matches twice.
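
To see the sharing in isolation, here is a small sketch that assumes CPython (object caching is an implementation detail, not a language guarantee); `same_object` is a hypothetical helper introduced only for this illustration:

```python
def same_object(make):
    """Call make() twice and report whether both results are one object."""
    return make() is make()


small_int = same_object(lambda: int("42"))           # cached small integer
large_int = same_object(lambda: int("5280"))         # new object on every call
boolean = same_object(lambda: bool(int("1")))        # True is a singleton
string = same_object(lambda: "".join(["po", "rt"]))  # new str on every call

print(small_int, large_int, boolean, string)
```

On CPython this is expected to print True False True False, which is consistent with strings being unaffected while small integers and booleans are.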

The root cause is that only the object obj is passed in, and identity alone cannot tell two such shared objects apart. What you need to pass instead is the parent object and the key/index on that object:

import sys
import ruamel.yaml

yaml_str = """\
- key: 42
  other_key: 42
  k: &xx 196
  l: *xx
"""

def update_aliased_scalar(data, parent, key_index, val):
    def recurse(d, parent, key_index, ref, nv):
        if isinstance(d, dict):
            for i, k in [(idx, key) for idx, key in enumerate(d.keys()) if key is ref]:
                d.insert(i, nv, d.pop(k))
            for k, v in d.non_merged_items():
                if v is ref:
                    if hasattr(v, 'anchor') or (d is parent and k == key_index):
                        d[k] = nv
                else:
                    recurse(v, parent, key_index, ref, nv)
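        elif isinstance(d, list):
            for idx, item in enumerate(d):
                if item is ref:
                    # same guard as for dicts: only anchored scalars or the targeted element
                    if hasattr(item, 'anchor') or (d is parent and idx == key_index):
                        d[idx] = nv
                else:
                    recurse(item, parent, key_index, ref, nv)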

    obj = parent[key_index]
    if hasattr(obj, 'anchor'):
        recurse(data, parent, key_index, obj, type(obj)(val, anchor=obj.anchor.value))
    else:
        recurse(data, parent, key_index, obj, type(obj)(val))

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)

update_aliased_scalar(data, data[0], 'key', 43)
update_aliased_scalar(data, data[0], 'k', 197)
yaml.dump(data, sys.stdout)

which gives:

- key: 43
  other_key: 42
  k: &xx 197
  l: *xx
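
If the caller knows the path to the target node rather than its parent, the parent and key can be derived first. This is a rough sketch under that assumption; `resolve_parent` and the list-of-steps path format are hypothetical and not part of ruamel.yaml. It uses plain dicts and lists here, and should behave the same on loaded data, since only `[]` indexing is used:

```python
def resolve_parent(data, path):
    """Return (parent, key_or_index) for the node reached by following path."""
    if not path:
        raise ValueError("path must name at least one key or index")
    parent = data
    for step in path[:-1]:
        # descend through every step except the last one
        parent = parent[step]
    return parent, path[-1]


data = [{"key": 42, "other_key": 42}]
parent, key = resolve_parent(data, [0, "key"])

# With the parent and key in hand, only the targeted entry is touched.
parent[key] = 43
print(data)
```

The resulting pair could then be handed to update_aliased_scalar(data, parent, key, val) in place of the direct assignment.
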
Anthon