While writing tests for the QUERY dialect against some sample files in a directory, I ran into an issue with the order in which those files were returned. Operating system APIs generally do not return the list of files in a defined order, and the ordering also varies across filesystems.
This means that even with the same files, you could have the lists come back differently. One OS could say:
Domain-SQL>> select 'name 'date from %tests/file-tests/
where 'size > 100 and 'date = 26-Jul-2021
1 [%tests/file-tests/Disk50.txt 26-Jul-2021]
2 [%tests/file-tests/11barz99.txt 26-Jul-2021]
3 [%tests/file-tests/Apple3.txt 26-Jul-2021]
4 [%tests/file-tests/Banana1.txt 26-Jul-2021]
5 [%tests/file-tests/BANANA22.txt 26-Jul-2021]
...
While another would say:
Domain-SQL>> select 'name 'date from %tests/file-tests/
where 'size > 100 and 'date = 26-Jul-2021
1 [%tests/file-tests/Apple3.txt 26-Jul-2021]
2 [%tests/file-tests/Banana1.txt 26-Jul-2021]
3 [%tests/file-tests/BANANA22.txt 26-Jul-2021]
4 [%tests/file-tests/Disk50.txt 26-Jul-2021]
5 [%tests/file-tests/11barz99.txt 26-Jul-2021]
...
This made it hard to get reproducible outputs to verify against.
I Made QUERY Use SORT/CASE on the READ DIR Result
Getting determinism in the output meant using a function that guarantees an ordering for filenames:
Domain-SQL>> select 'name 'date from %tests/file-tests/
where 'size > 100 and 'date = 26-Jul-2021
1 [%tests/file-tests/11barz99.txt 26-Jul-2021]
2 [%tests/file-tests/Apple3.txt 26-Jul-2021]
3 [%tests/file-tests/BANANA22.txt 26-Jul-2021]
4 [%tests/file-tests/Banana1.txt 26-Jul-2021]
5 [%tests/file-tests/Disk50.txt 26-Jul-2021]
...
Having to pay for the sort adds a little bit of overhead, but it's not that significant.
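For anyone who wants to see the same idea outside of Ren-C, here is a minimal Python sketch of it. It is only an analogy, not how QUERY is implemented: `sorted_listing` is a hypothetical helper name, and Python's default string comparison stands in for SORT/CASE (both are case-sensitive, so uppercase ASCII letters order before lowercase ones).

```python
import os
import tempfile


def sorted_listing(directory):
    """Return the entry names in `directory` in a platform-independent order.

    Hypothetical helper for illustration. os.listdir() documents its result
    as being in arbitrary order, so the sort is what provides the guarantee.
    """
    # Default str comparison is by code point, hence case-sensitive:
    # digits < uppercase letters < lowercase letters for ASCII names.
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    names = ["Disk50.txt", "11barz99.txt", "Apple3.txt",
             "Banana1.txt", "BANANA22.txt"]
    with tempfile.TemporaryDirectory() as tmp:
        # Create empty files, deliberately not in sorted order.
        for name in names:
            open(os.path.join(tmp, name), "w").close()
        for index, name in enumerate(sorted_listing(tmp), start=1):
            print(index, name)
```

With the five sample names above, this gives the same order as the sorted QUERY output: 11barz99, Apple3, BANANA22, Banana1, Disk50.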
Should READ DIR be Sorted By Default?
WASI in WebAssembly is looking to chase down sources of non-determinism and see what it can do to eliminate them. They mention directory listing order as one potential source of problems:
Roadmap to determinism in WASI · Issue #190 · WebAssembly/WASI · GitHub
They seem to believe that on the same OS the directory ordering would be deterministic for the same files, but I don't know of any guarantee of that.
All This Points to Bigger Issues About Reproducibility
We can pick many examples... like whether a MAP! will always enumerate in the same order on different platforms, or with the same contents. Using a deterministically sorted implementation of map would seem to have a number of advantages.
There's also a growing push in software toward giving deterministic outputs by default. If you want some of the reasoning, see this article:
Determinism in software engineering • Buttondown
The more testing one does, the more important it seems.
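To make the MAP! point concrete, here is a small Python sketch of the difference. It says nothing about how MAP! actually behaves in Ren-C; it uses Python's `dict`, which enumerates in insertion order, and a hypothetical helper `sorted_items` to show what enumerating by sorted key buys you.

```python
def sorted_items(mapping):
    """Return (key, value) pairs ordered by key, ignoring insertion order.

    Hypothetical helper for illustration; assumes the keys are mutually
    comparable (e.g. all strings).
    """
    return sorted(mapping.items(), key=lambda pair: pair[0])


# Same contents, built in a different order.
a = {"pear": 1, "apple": 2}
b = {"apple": 2, "pear": 1}

# Plain enumeration reflects how each map was built...
print(list(a) == list(b))

# ...while sorted enumeration depends only on the contents.
print(sorted_items(a) == sorted_items(b))
```

The first comparison prints False and the second prints True. The tradeoff is the same as with the directory listing: you pay for the sort each time you enumerate, in exchange for output that depends only on what is in the map.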