Shell Variables (Learning the Korn Shell, 2nd Edition)

4.2.1. Positional Parameters

As we have already seen, you can define values for variables with statements of the form varname=value, e.g.:

$ fred=bob
$ print "$fred"
bob

Some environment variables are predefined by the shell when you log in. There are other built-in variables that are vital to shell programming. We look at a few of them now and save the others for later.

The most important special, built-in variables are called positional parameters. These hold the command-line arguments to scripts when they are invoked. Positional parameters have names 1, 2, 3, etc., meaning that their values are denoted by $1, $2, $3, etc. There is also a positional parameter 0, whose value is the name of the script (i.e., the command typed in to invoke it).

Two special variables contain all of the positional parameters (except positional parameter 0): * and @. The difference between them is subtle but important, and it's apparent only when they are within double quotes.

"$*" is a single string that consists of all of the positional parameters, separated by the first character in the variable IFS (internal field separator). By default IFS contains space, TAB, and newline, so the default separator is a space. On the other hand, "$@" is equal to "$1" "$2" ... "$N", where N is the number of positional parameters. That is, it's equal to N separate double-quoted strings, which are separated by spaces. We'll explore the ramifications of this difference in a little while.

The variable # holds the number of positional parameters (as a character string). All of these variables are "read-only," meaning that you can't assign new values to them within scripts. (They can be changed, just not via assignment. See Section 4.2.1.2, later in this chapter.)

For example, assume that you have the following simple shell script:

print "fred: $*"
print "$0: $1 and $2"
print "$# arguments"

Assume further that the script is called fred. Then if you type fred bob dave, you will see the following output:

fred: bob dave
fred: bob and dave
2 arguments

In this case, $3, $4, etc., are all unset, which means that the shell substitutes the empty (or null) string for them (unless the option nounset is turned on).
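
The behavior just described can be tried without creating a script file. The following is only a sketch: show_args is a hypothetical function name, printf is used instead of print so that the example also runs in other POSIX-style shells, and nounset is switched off inside a subshell so that the unset parameter expands to the empty string.

```shell
# show_args: hypothetical helper that mirrors the fred script above.
# The body runs in a subshell ( ... ) so that turning nounset off
# does not affect the calling shell.
show_args() (
    set +o nounset                      # make sure unset parameters expand to ""
    printf 'all: %s\n' "$*"             # every positional parameter, space-separated
    printf 'first two: %s and %s\n' "$1" "$2"
    printf 'third: [%s]\n' "$3"         # $3 is unset here, so nothing appears in the brackets
    printf '%s arguments\n' "$#"        # number of positional parameters
)

show_args bob dave
```

Called with bob dave, the third line shows empty brackets, and the count is 2.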

4.2.1.1. Positional parameters in functions

Shell functions use positional parameters and special variables like * and # in exactly the same way that shell scripts do. If you wanted to define fred as a function, you could put the following in your .profile or environment file:

function fred {
    print "fred: $*"
    print "$0: $1 and $2"
    print "$# arguments"
}

You get the same result if you type fred bob dave.

Typically, several shell functions are defined within a single shell script. Therefore each function needs to handle its own arguments, which in turn means that each function needs to keep track of positional parameters separately. Sure enough, each function has its own copies of these variables (even though functions don't run in their own subprocess, as scripts do); we say that such variables are local to the function.

Other variables defined within functions are not local; they are global, meaning that their values are known throughout the entire shell script.[54] For example, assume that you have a shell script called ascript that contains this:

[54] However, see the section on typeset in Chapter 6 for a way of making variables local to functions.

function afunc {
    print "in function $0: $1 $2"
    var1="in function"      # var1 is global, so this change outlives the function
}
var1="outside of function"
print "var1: $var1"
print "$0: $1 $2"
afunc funcarg1 funcarg2
print "var1: $var1"
print "$0: $1 $2"

If you invoke this script by typing ascript arg1 arg2, you will see this output:

var1: outside of function
ascript: arg1 arg2
in function afunc: funcarg1 funcarg2
var1: in function
ascript: arg1 arg2

In other words, the function afunc changes the value of the variable var1 from "outside of function" to "in function," and that change is known outside the function, while $0, $1, and $2 have different values in the function and the main script. Figure 4-2 shows this graphically.

Figure 4-2. Functions have their own positional parameters

It is possible to make other variables local to functions by using the typeset command, which we'll see in Chapter 6. Now that we have this background, let's take a closer look at "$@" and "$*". These variables are two of the shell's greatest idiosyncrasies, so we'll discuss some of the most common sources of confusion.

Why are the elements of "$*" separated by the first character of IFS instead of just spaces? To give you output flexibility. As a simple example, let's say you want to print a list of positional parameters separated by commas. This script would do it:
```
IFS=,
print "$*"
```
Changing IFS in a script is fairly risky, but it's probably OK as long as nothing else in the script depends on it. If this script were called arglist, the command arglist bob dave ed would produce the output bob,dave,ed. Chapter 10 contains another example of changing IFS.
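
One way to limit the risk of changing IFS is to confine the change to a subshell. The following is a minimal sketch, not the only approach: join_args is a hypothetical function name, and printf stands in for print so that the example also runs in other POSIX-style shells.

```shell
# join_args: hypothetical helper that prints its arguments separated by commas.
# The function body is a subshell ( ... ), so the assignment to IFS
# disappears when the function returns.
join_args() (
    IFS=,
    printf '%s\n' "$*"      # "$*" joins the parameters with the first character of IFS
)

join_args bob dave ed
```

The call prints bob,dave,ed, and IFS in the calling shell keeps its previous value.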

Why does "$@" act like N separate double-quoted strings? To allow you to use them again as separate values. For example, say you want to call a function within your script with the same list of positional parameters, like this:
```
function countargs {
    print "$# args"
}
```
Assume your script is called with the same arguments as arglist above. Then if it contains the command countargs "$*", the function prints 1 args. But if the command is countargs "$@", the function prints 3 args.

Being able to retrieve the arguments as they came in is also important in case you need to preserve any embedded white space. If your script was invoked with the arguments "hi", "howdy", and "hello there", here are the different results you might get:
```
$ countargs $*
4 args
$ countargs "$*"
1 args
$ countargs $@
4 args
$ countargs "$@"
3 args
```
Because "$@" always exactly preserves arguments, we use it in just about all the example programs in this book.
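
The four cases above can be reproduced in one place. This is a sketch under a few assumptions: compare_forms is a hypothetical name, countargs is redefined here with printf so that the block is self-contained, and IFS has its default value.

```shell
# countargs: reports how many arguments it received.
countargs() {
    printf '%s args\n' "$#"
}

# compare_forms: hypothetical helper that passes its own arguments on
# in each of the four ways discussed above.
compare_forms() {
    countargs $*        # unquoted: split again on white space
    countargs "$*"      # one single string
    countargs $@        # unquoted: split again on white space
    countargs "$@"      # one string per original argument
}

compare_forms hi howdy "hello there"
```

Only the last form hands countargs the same three arguments that compare_forms received.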

4.2.1.2. Changing the positional parameters

Occasionally, it's useful to change the positional parameters. We've already mentioned that you cannot set them directly, using an assignment such as 1="first". However, the built-in command set can be used for this purpose.

The set command is perhaps the single most complicated and overloaded command in the shell. It takes a large number of options, which are discussed in Chapter 9. What we care about for the moment is that additional non-option arguments to set replace the positional parameters. Suppose our script was invoked with the three arguments "bob", "fred", and "dave". Then countargs "$@" tells us that we have three arguments. Upon using set to change the positional parameters, $# is updated too.

$ set one two three "four not five"   # Change the positional parameters
$ countargs "$@"                      # Verify the change
4 args

The set command also works inside a shell function. The shell function's positional parameters are changed, but not those of the calling script:

$ function testme {
>     countargs "$@"           # Show the original number of parameters
>     set a b c                # Now change them
>     countargs "$@"           # Print the new count
> }
$ testme 1 2 3 4 5 6           # Run the function: original count, then new count
6 args
3 args
$ countargs "$@"               # No change to invoking shell's parameters
4 args

4.2.2. More on Variable Syntax

Before we show the many things you can do with shell variables, we have to make a confession: the syntax of $varname for taking the value of a variable is not quite accurate. Actually, it's the simple form of the more general syntax, which is ${varname}.

Why two syntaxes? For one thing, the more general syntax is necessary if your code refers to more than nine positional parameters: you must use ${10} for the tenth instead of $10. (This ensures compatibility with the Bourne shell, where $10 means ${1}0.) Aside from that, consider the Chapter 3 example of setting your primary prompt variable (PS1) to your login name:

PS1="($LOGNAME)-> "

This happens to work because the right parenthesis immediately following LOGNAME isn't a valid character for a variable name, so the shell doesn't mistake it for part of the variable name. Now suppose that, for some reason, you want your prompt to be your login name followed by an underscore. If you type:

PS1="$LOGNAME_ "

then the shell tries to use "LOGNAME_" as the name of the variable, i.e., to take the value of $LOGNAME_. Since there is no such variable, the value defaults to null (the empty string, ""), and PS1 is set just to a single space.

For this reason, the full syntax for taking the value of a variable is ${varname}. So if we used:

PS1="${LOGNAME}_ "

we would get the desired yourname_. It is safe to omit the curly braces ({}) if the variable name is followed by a character that isn't a letter, digit, or underscore.
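
Both reasons for the braces can be checked with a short experiment. The sketch below makes some assumptions: brace_demo and tenth are hypothetical function names, name holds a made-up login name, and printf stands in for print. The square brackets in the output only make the trailing space visible.

```shell
# brace_demo: hypothetical helper contrasting $name_ with ${name}_.
# It runs in a subshell so that its settings do not leak to the caller.
brace_demo() (
    set +o nounset                  # let unset variables expand to ""
    unset name_                     # make sure no variable called name_ exists
    name=billr                      # made-up login name
    printf '[%s]\n' "$name_ "       # the shell looks up name_, which is unset
    printf '[%s]\n' "${name}_ "     # braces end the variable name before the underscore
)

# tenth: hypothetical helper contrasting ${10} with $10.
tenth() {
    printf '%s\n' "${10}"           # the tenth positional parameter
    printf '%s\n' "$10"             # read as ${1} followed by a literal 0
}

brace_demo
tenth a b c d e f g h i j
```

The first function prints a lone space and then billr_ followed by a space, each inside brackets. The second prints j and then a0.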

4.2.3. Appending to a Variable

As mentioned, Korn shell variables tend to be string-oriented. One operation that's very common is to append a new value onto an existing variable. (For example, collecting a set of options into a single string.) Since time immemorial, this was done by taking advantage of variable substitution inside double quotes:

myopts="$myopts $newopt"

The values of myopts and newopt are concatenated together into a single string, and the result is then assigned back to myopts. Starting with ksh93j, the Korn shell provides a more efficient and intuitive mechanism for doing this:

myopts+=" $newopt"

This accomplishes the same thing, but it is more efficient, and it also makes it clear that the new value is being added onto the string. (In C, the += operator adds the value on the right to the variable on the left; x += 42 is the same as x = x + 42.)
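
To see that the two forms build the same string, you can run them side by side. This is a sketch only: collect_opts, old_style, and new_style are hypothetical names, printf stands in for print, and the shell is assumed to support += (ksh93j or later, as noted above).

```shell
# collect_opts: hypothetical helper that gathers its arguments into one
# string twice, once with each form of appending.
collect_opts() {
    old_style=
    new_style=
    for newopt in "$@"; do
        old_style="$old_style $newopt"      # traditional form: rebuild the whole string
        new_style+=" $newopt"               # += form: append in place
    done
    printf '[%s]\n[%s]\n' "$old_style" "$new_style"
}

collect_opts -v -x
```

Both lines of output are identical, including the leading space that each form inserts before the first option.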