A Weird Imagination

Deleting deeply nested OPFS directories


The problem#

The straightforward OPFS API for deleting a directory

await parentDir.removeEntry(name, {recursive: true});

doesn't work if the directory contains too many (several hundred) levels of nested directories. The sensible workaround is to never create such a directory structure in the first place, and to wipe the OPFS storage entirely if you ever do by accident; but as discussed previously, I did get into that situation due to a bug and wrote some helpers to deal with it.
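
For concreteness, here is a rough sketch of how you can end up in that situation (the name 'deep', the entry name 'z', and the depth of 1000 are arbitrary choices for illustration):

const root = await navigator.storage.getDirectory();
let dir = await root.getDirectoryHandle('deep', {create: true});
for (let i = 0; i < 1000; i++) {
  // Each iteration adds another level of nesting.
  dir = await dir.getDirectoryHandle('z', {create: true});
}
// This is the call that doesn't work once the nesting is
// several hundred levels deep.
await root.removeEntry('deep', {recursive: true});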

The solution#

For any reasonable real-world case, what you actually want is probably removeEntry():

await parentDir.removeEntry(name, {recursive: true});

or to delete everything from the root directory:

const root = await navigator.storage.getDirectory();
for await (const handle of root.values()) {
  await root.removeEntry(handle.name, {recursive: true});
}

or possibly to simply use your browser's settings to delete all site data, which should include OPFS data.

In case you do still want to delete a directory in OPFS without worrying about how deeply nested the directory structure is, you can use

async function removeDirectoryFast(dir) {
  // Walk the entire tree once, recording every entry and the
  // maximum depth of any entry.
  const toDelete = [];
  let maxDepth = 0;
  for await (const fileHandle
             of getFilesNonRecursively(dir)) {
    maxDepth = Math.max(maxDepth, fileHandle.depth);
    toDelete.push(fileHandle);
  }
  // Recursively delete every entry at exactly the given depth.
  async function deleteAtDepth(depth) {
    for (const f of toDelete) {
      if (f.depth === depth) {
        await f.parentDir.removeEntry(f.name,
                            {recursive: true});
      }
    }
  }
  const increment = 500; // Works empirically in Firefox.
  // Delete from the deepest entries upward in steps of 500 levels,
  // so no single removeEntry() call has to recurse too deeply.
  for (let depth = maxDepth; depth > 1; depth -= increment) {
    await deleteAtDepth(depth);
  }
  await deleteAtDepth(1);
}

This depends on the getFilesNonRecursively() helper from my previous blog post.
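
I won't reproduce that helper here, but as a rough sketch of the shape it needs to have for the code above to work: it is an async generator that walks the tree iteratively (with an explicit stack instead of recursion) and yields each entry's name, depth, and parent directory handle. The actual implementation in the previous post may differ in its details:

// Sketch only -- see the previous post for the real helper.
async function* getFilesNonRecursively(dir) {
  // Explicit stack instead of recursion, so deep nesting is not a problem.
  const stack = [{handle: dir, name: dir.name, depth: 0,
                  parentDir: undefined}];
  while (stack.length > 0) {
    const entry = stack.pop();
    yield entry;
    if (entry.handle.kind === 'directory') {
      for await (const child of entry.handle.values()) {
        stack.push({handle: child, name: child.name,
                    depth: entry.depth + 1,
                    parentDir: entry.handle});
      }
    }
  }
}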

The details#

Slow solution#

My first attempt was to simply reverse the output order of getFilesNonRecursively() to get a post-order traversal of the directory tree, so that by the time removeEntry() was called on any directory, its children had already been deleted, making the recursive option unnecessary:

async function removeDirectoryNaive(dir) {
  const toDelete = [];
  for await (const fileHandle of getFilesNonRecursively(dir)) {
    toDelete.unshift(fileHandle);
  }
  // Enumerate in reverse order.
  for (const f of toDelete) {
    const p = f.parentDir;
    if (p) {
      await p.removeEntry(f.name);
    }
  }
}

Unfortunately, that's slow. In a test using buildNested(root, 'z', 1000); (from my previous post, to create a 1000-level-deep nested directory) followed by each of the two functions, the naive version took more than 17 times as long as the fast one:

15828ms for buildNested()
18075ms for removeDirectoryNaive()
 1012ms for removeDirectoryFast()

So I tried to minimize the number of removeEntry() calls.

Broken solution#

My first idea was to recursively delete only every 500th entry, skipping the rest, since removeEntry() seems to work fine on directories nested only 500 levels deep:

async function removeDirectoryBroken(dir) {
  const increment = 500; // Works empirically in Firefox.
  const toDelete = [];
  let i = 0;
  for await (const fileHandle
             of getFilesNonRecursively(dir)) {
    if ((i++ % increment) === 0) {
      toDelete.unshift(fileHandle);
    }
  }
  // Enumerate in reverse order.
  for (const f of toDelete) {
    const p = f.parentDir;
    if (p) {
      await p.removeEntry(f.name, {recursive: true});
    }
  }

  // Delete any remaining files fully recursively.
  for await (const handle of dir.values()) {
    await dir.removeEntry(handle.name, {recursive: true});
  }
}

But that's wrong in a way that's masked by the simple data I'm generating. Since the test data is just a single chain of nested directories with no other entries in them, every 500th entry really is 500 levels deeper in the directory structure. In general, though, there's no reason every 500th entry couldn't just be a file in one of those directories. We could probably craft an OPFS filesystem where that happens, although doing so is a bit tricky, partly because Firefox and Chromium sort the entries differently, which makes it difficult to construct a counterexample that breaks in both.
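
As a sketch of the kind of layout that could defeat this approach (purely hypothetical; whether it actually breaks removeDirectoryBroken() depends on the order a given browser enumerates entries in), imagine a deep chain where every directory also contains a plain file:

// Hypothetical layout, not a verified counterexample: a deep chain where
// every level also contains a file, so many enumerated entries are not
// directories and "every 500th entry" no longer lines up with
// "500 levels deeper".
async function buildNestedWithFiles(parent, name, depth) {
  let dir = parent;
  for (let i = 0; i < depth; i++) {
    dir = await dir.getDirectoryHandle(name, {create: true});
    await dir.getFileHandle('file.txt', {create: true});
  }
}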

Fast solution#

What we actually want is to recursively delete directories with no more than 500 levels below them. With the way we're enumerating the files, it's not straightforward to compute the exact number of levels below a particular directory, but it is easy to compute the depth of every entry, and from there the maximum depth of any entry. Then we can observe that no entry can have more levels below it than the maximum depth minus that entry's own depth.

So if we start by deleting everything with a depth of exactly 500 less than the max depth, removeEntry() should work, because it won't have to recurse more than 500 levels to delete any of those entries. (The loop in the code above actually starts one step earlier, at the max depth itself, which is harmless since those entries have nothing below them.) Furthermore, the maximum depth of the remaining entries is then at least 500 less than the original max depth, so we can repeat on the entries at depth 1000 less than the max depth, and so on. Finally, we delete at depth 1, which is equivalent to the normal delete operation, now without having to worry about overly nested directories.
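
Purely for illustration (this helper is not part of the actual code), these are the depths that the loop in removeDirectoryFast() passes to deleteAtDepth() for a given maximum depth:

// Illustration only: list the depths removeDirectoryFast() deletes at,
// in order.
function depthsToDelete(maxDepth, increment = 500) {
  const depths = [];
  for (let depth = maxDepth; depth > 1; depth -= increment) {
    depths.push(depth);
  }
  depths.push(1);
  return depths;
}

// For example, depthsToDelete(1200) returns [1200, 700, 200, 1]: by the
// time each pass runs, no entry it deletes has more than 500
// not-yet-deleted levels below it.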

This results in many fewer removeEntry() calls. In particular, in the case that there aren't any directories nested deep enough to cause a problem, it makes the exact same removeEntry() calls as the code above that doesn't worry about nested directories at all, albeit after walking the entire OPFS filesystem to confirm that.

Measuring performance#

As I wasn't trying to get the highest-quality numbers, I got the performance numbers quoted above by just subtracting the results of performance.now() calls:

  const t1 = performance.now();
  await buildNested(root, 'z', 1000);
  const t2 = performance.now();
  await removeDirectoryNaive(root);
  const t3 = performance.now();
  await buildNested(root, 'z', 1000);
  const t4 = performance.now();
  await removeDirectoryFast(root);
  const t5 = performance.now();

  console.log((t2-t1) + "ms for buildNested()");
  console.log((t3-t2) + "ms for removeDirectoryNaive()");
  console.log((t5-t4) + "ms for removeDirectoryFast()");

For higher precision, you would probably want to do something more complicated involving running the functions multiple times and averaging, perhaps using a library to help with the details.
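
For example, a minimal repeat-and-average helper might look something like the following (the helper and its parameters are just an illustration, not taken from any particular benchmarking library):

// Rough sketch of averaging over several runs; the setup callback
// recreates the test data before each timed run. A real benchmarking
// library would also handle warm-up runs and report variance.
async function timeAverage(label, setup, fn, runs = 5) {
  let total = 0;
  for (let i = 0; i < runs; i++) {
    await setup();
    const start = performance.now();
    await fn();
    total += performance.now() - start;
  }
  console.log(`${(total / runs).toFixed(0)}ms average for ${label}`);
}

// Hypothetical usage with the functions from this post:
// await timeAverage('removeDirectoryFast()',
//                   () => buildNested(root, 'z', 1000),
//                   () => removeDirectoryFast(root));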
