You’re Risking Data Loss By Using This Linux Wildcard Wrong
Linux wildcards let you type a single command that acts on whole groups of files at the same time. That’s a great time saver, unless things go wrong. And they can. Destructively.
What Wildcards Are For
The well-known wildcards are the question mark, ?, and the asterisk, *. These can be used to create filename patterns. The question mark represents any single character, and the asterisk represents any sequence of characters, including zero characters.
Knowing this, we can construct patterns that match multiple filenames. Instead of typing all the filenames on the command line, we type the pattern instead. All files that match the pattern are acted on by the command.
If we have a collection of files in a directory like this:
We can select groups of files that match the patterns we provide.
ls taf_*
That gives us all files with “taf_” at the start of their names.
ls *.sh
ls s*.sh
The first command lists all the shell script files in the directory. The second command lists only files that start with “s” that are also shell script files.
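To see the difference between the two wildcards for yourself, here is a minimal sketch that runs in a scratch directory. The filenames are invented for the illustration and are not the ones from the directory listing above.

```shell
# Build a scratch directory so the patterns run against known filenames.
# (These filenames are hypothetical, chosen only for this illustration.)
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch taf_1.txt taf_22.txt sort.sh sync.sh build.sh

# "?" matches exactly one character, so taf_22.txt is not matched.
one_char=$(echo taf_?.txt)

# "*" matches any run of characters, including none.
any_run=$(echo taf_*)

# Literal text and wildcards can be combined in one pattern.
s_scripts=$(echo s*.sh)

echo "$one_char"     # taf_1.txt
echo "$any_run"      # taf_1.txt taf_22.txt
echo "$s_scripts"    # sort.sh sync.sh

# Clean up the scratch directory.
cd / && rm -r "$demo_dir"
```

Using echo rather than ls makes it clear that the list of names comes from the shell, not from the command.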
That all seems simple enough, and with ls, it is. But other commands can make use of this type of pattern matching. Problems arise when the shell tries to help by pattern matching before the command gets a chance.
Using the Asterisk With the find Command
The action of expanding a pattern into a list of matching files is called globbing.
Globbing started out as a standalone program in early versions of Unix, up to Version 6. It then became a library that could be linked into other programs, and nowadays it is built into the shell. The expansion of the pattern is performed by the shell, and the results of the expansion are passed to the command as command line parameters.
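A quick way to convince yourself that the shell does the expansion is to pass a pattern to something that simply reports what it received. This is a sketch only: show_args is a hypothetical helper function, and the filenames are made up.

```shell
# show_args is a hypothetical helper: it reports the arguments it was given.
show_args() {
    echo "received $# argument(s)"
    for arg in "$@"; do
        echo "  [$arg]"
    done
}

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch alpha.c beta.c notes.md

# The shell replaces *.c with the matching names before show_args runs,
# so the function never sees the asterisk.
expanded=$(show_args *.c)
echo "$expanded"

# Clean up the scratch directory.
cd / && rm -r "$demo_dir"
```

The function reports two arguments, alpha.c and beta.c. The pattern itself never reaches it.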
We’ll look at two examples using the find command. One does what you might expect, but the second one may well surprise you.
For this example, we’re going to use a directory with a single file in it, called readme.txt. There are two directories, src and inc. They contain a mix of C, H, MD and TMP files.
ls -R
We can use find to recursively find files (-type f) with names that match our pattern (-name *.c), giving us a list of the C files.
find . -type f -name *.c
We can add the -not option to invert the search, showing us everything apart from the C files.
find . -type f -not -name *.c
Having reviewed this list, we choose to delete everything apart from the C files. We can do this by adding the -delete option.
find . -type f -not -name *.c -delete
find .
The second find command recursively lists everything in and below the current directory. All that remains are our C files.
That worked the way most of us would have expected. Now we’ll do the exact same thing, but this time the file in the current directory isn’t a text file, it’s a C file.
ls -R
We’ll use the same find command and options, intending to delete everything but the C files. The result is not what we wanted at all.
find . -type f -not -name *.c -delete
find .
That’s blithely deleted every single file in the directory tree, apart from the one C file in the current directory.
We’ll reset the files once more, and issue the command in the way we’re supposed to use it.
All the files are in place, and we have a C file in the current directory, just as we did before.
ls -R
This time, we’ll wrap the wildcard pattern in single quotes.
find . -type f -not -name '*.c' -delete
find .
That is what we wanted. Everything’s gone apart from our C files.
OK, So What Went Wrong?
The single quotes stop the shell from expanding the filename pattern. It’s passed to the command or program as is, for the command to act upon.
In the example that worked, we had a readme.txt file in the current directory. The shell couldn’t find a match to *.c, so it passed *.c to find to act upon.
In the example that went wrong, we had a file called main.c in the current directory. The shell matched the pattern to that file, and passed the name of the file to the find command in place of the pattern. So find’s instructions were to delete everything that wasn’t called main.c, and that included the other C files.
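The two cases can be reproduced safely in a scratch directory. This sketch uses -print instead of -delete, so nothing is removed. The directory names, readme.txt, and main.c come from the example above, while the other filenames are assumptions for the illustration. It also uses !, the POSIX spelling of -not.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir src inc
touch readme.txt src/app.c src/notes.md inc/app.h

# Case 1: nothing in the current directory matches *.c, so the shell
# leaves the pattern alone and find receives it as written.
no_match=$(find . -type f ! -name *.c -print | sort)

# Case 2: main.c now matches, so the shell rewrites the command to
#   find . -type f ! -name main.c -print
# and src/app.c ends up on the list of files that would be deleted.
touch main.c
unquoted=$(find . -type f ! -name *.c -print | sort)

# Quoted: find receives the pattern itself and excludes every C file.
quoted=$(find . -type f ! -name '*.c' -print | sort)

echo "--- no match in current directory:"; echo "$no_match"
echo "--- main.c present, unquoted:";      echo "$unquoted"
echo "--- main.c present, quoted:";        echo "$quoted"

# Clean up the scratch directory.
cd / && rm -r "$demo_dir"
```

Only the unquoted run with main.c present lists ./src/app.c. Swap -print for -delete and that file would be gone.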
We can illustrate this with a small C program that does no more than display its command line parameters in the terminal window.
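#include <stdio.h>
#include <stdlib.h>

/* Print each command line argument exactly as the shell delivered it. */
int main(int argc, char *argv[])
{
    int i;

    /* argc counts argv[0], the program name, so subtract one. */
    printf("You supplied %d arguments.\n", argc - 1);

    for (i = 1; i < argc; i++)
        printf("%-2d) \"%s\"\n", i, argv[i]);

    exit(0);
}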
I saved this as a file called glob.c, and compiled it with:
gcc -o glob glob.c
The variable argc holds the number of command line arguments, counting the name of the program itself, which is why the program reports argc-1. A for loop runs through the list of arguments and prints each one to the terminal window.
The for loop starts at argument one, not zero. There is an argument zero. It always holds the name of the binary itself. To avoid muddying the water, I’ve avoided printing it. The only arguments that get printed are ones we provide on the command line.
./glob one two 3 ant beetle cockroach
Let’s try that with *.c as the command line parameter.
ls *.c
./glob *.c
Without any C files in the current directory, the shell passes *.c to our program unchanged, just as it did with the find command. The find command then acts upon the wildcard pattern itself. But when we have a C file in the current directory, the shell passes the name of the matching C file to the program.
ls *.c
./glob *.c
Our program receives the name of the C file as its parameter, and the same is true for the find command. So actually, find was doing what it was told to do: delete all files except for the main.c file.
This time, we’ll wrap the wildcard pattern in single quotes.
ls *.c
./glob '*.c'
The shell ignores the chance to apply its globbing to the wildcard pattern, and passes it straight to the command for further processing.
A Simple Fix, You Can Quote Me
As a general rule, quote wildcard patterns that you’re passing to commands like find. That’s all it takes to prevent this type of potentially disastrous mishap.
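Single quotes are not the only way to protect a pattern. As a rough sketch, the following compares the common quoting styles by counting how many arguments arrive. Here count_args is a hypothetical helper and the filenames are invented.

```shell
# count_args is a hypothetical helper: it prints how many arguments arrived.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch one.c two.c three.c

unquoted=$(count_args *.c)    # the shell expands the pattern: three names
single=$(count_args '*.c')    # single quotes: one literal argument
double=$(count_args "*.c")    # double quotes also suppress globbing
escaped=$(count_args \*.c)    # so does a backslash before the asterisk

echo "unquoted=$unquoted single=$single double=$double escaped=$escaped"

# Clean up the scratch directory.
cd / && rm -r "$demo_dir"
```

All three protected forms deliver a single argument. Single quotes are the simplest habit to adopt, because double quotes still let the shell expand variables inside them.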