Why does performing DFS with this code result in duplicate leaves?

I am writing an algorithm that discerns whether two trees have the same leaves.

These have the same leaf numbers in the same order so this returns true

These have the same leaf numbers in the same order so this returns true.

This is the code I wrote:

function leafSimilar(root1: TreeNode | null, root2: TreeNode | null): boolean {

    console.log(DFS(root1))

    const leavesRoot1 = DFS(root1);
    const leavesRoot2 = DFS(root2);

    for (let i = 0; i < Math.max(leavesRoot1.length, leavesRoot2.length); i += 1) {
        if (leavesRoot1[i] !== leavesRoot2[i]) {
            return false;
        }
    }

    return true;
};

function DFS(root, leaves = [] ) {

    if(!root) return leaves; 

    if (!root.left && !root.right) {
        leaves.push(root.val);
        return leaves;
    }

    // return DFS(root.left).concat(DFS(root.right)); // this is the correct answer

    return DFS(root.left, leaves).concat(DFS(root.right, leaves)); // why doesn't this work?
}

The last line in the code is what i initially thought, but it is wrong.

I couldn’t draw it in my mind so I logged them as seen in the second line.

It logs:

[
  6, 7, 4, 
  6, 7, 4, 
  6, 7, 4, 
  6, 7, 4, 9, 8,
  6, 7, 4, 9, 8
] 

After 2 hours, I cannot figure out why this is.

I would think it should be at least something similar to this:

[6, 
 6, 7, 
 6, 7, 4, 
 6, 7, 4, 9,
 6, 7, 4, 9, 8,
]

or just

[6,7,4,9,8]

which is the correct one.

Would someone be able to explain to me why?

The DFS function takes leaves array argument from the previous call.

This means that it should receive leaves from the node above, not below, so there shouldn’t be a repeat pattern in the leaves array because it is empty.

The DFS is in preorder according to the code I wrote, so the left-most nodes are evaluated first.

Please help me understand.

  • 3

    Why didn’t you just debug it? Consider the case where you’re at the node with value 1. If you use DFS(root.left, leaves) then you run into if (!root.left && !root.right) and then return all leaves so far because that’s what you passed as leaves. Instead of the leaves reachable from 1.

    – 




  • @VLAZ Thanks. I get that part, but the part I don’t get is this. When we are at value 1, isn’t leaves = [] ? I don’t get how it is [6,7,4] at that moment. We evaluate the left nodes first, but that shouldn’t influence the right nodes. In other words, at the VERY root(val: 3), how does the right nodes have access to what was returned from the left nodes?([6,7,4])

    – 




  • when we are at value 1, isn’t leaves = []” you’ve traversed depth first, left to right, so when you reach 1 you’ve visited 3 – 4 – 5- 2 – 7 – 4 (taking the left tree as an example) before reaching it. If you pass the same array of leaves around, then you have one array of leaves. It’s not empty because it’s already been modified.

    – 




Your approach would work if you would have done it this way:

    DFS(root.left, leaves);
    DFS(root.right, leaves);
    return leaves;

Here we ignore the array returned by DFS. Realise that when the first recursive call returns, leaves has some values from the left subtree. The second recursive call extends this array with more values. The right thing to do is to return leaves after that second population has happened.

Be aware that DFS returns a reference to the values array, so in essence each call returns the same reference. If you do values.concatenate(values) it is clear you duplicate values, but that is exactly what you are doing with:

DFS(root.left, leaves).concat(DFS(root.right, leaves));

It just is a bit more obscure… because values is getting populated in both recursive calls. But that is quite irrelevant to explain the problem. The key understanding is that both recursive calls return a reference to the same array.

To avoid such problems, it is good practice to never have a function return the reference to an object that was also passed as argument, and which it mutates. It is better to choose between these two patterns:

  • Have the function take the object (array in this case) as argument, but return undefined. This way the caller is made more aware that their argument has been mutated.

  • Have the function not take the object as argument, but have it construct one from scratch, which it then can return to the caller. In this pattern each call of the function returns a distinct object reference.

Mixing these two patterns can easily lead to the problems you have encountered. In this actual algorithm, I have a clear preference for the second pattern, and then there shouldn’t be a values argument. So:

function DFS(root) {
    if (!root) return []; 
    if (!root.left && !root.right) return [root.val];
    return DFS(root.left).concat(DFS(root.right));
}

The downside is that a lot of little arrays are created here, and concatenating all of them can be costly.

More elegant is to make use of a generator function. You get the best of both worlds: you avoid creating little arrays from scratch each time, and you avoid passing an array as argument that mutates.

function leafSimilar(root1, root2) {
    const leavesRoot1 = [...DFS(root1)]; // consume the iterator
    const leavesRoot2 = [...DFS(root2)];

    for (let i = 0; i < Math.max(leavesRoot1.length, leavesRoot2.length); i += 1) {
        if (leavesRoot1[i] !== leavesRoot2[i]) {
            return false;
        }
    }
    return true;
};

function* DFS(root) {
    if (!root) return; 
    if (!root.left && !root.right) yield root.val;
    yield* DFS(root.left);
    yield* DFS(root.right);
}

Leave a Comment