LINUX AWK Tutorial Series

LINUX AWK Tutorial Series | Chapter 5

Hello, and welcome to this series. In this chapter, I am going to discuss the built-in variables in AWK.

Built-In Variables

These variables are predefined, so we can use them directly. Also, remember not to give user-defined variables the same names as built-in variables.

Below are the built-in variables available in awk.

RS: Record separator for the input being processed. The default value is a newline.

FS: Field separator. The default value is a single space, which awk treats specially: fields are split on any run of spaces or tabs.

ORS: Output record separator. The default value is a newline.

OFS: Output field separator. The default value is a single space.

NR: Number of records processed by awk so far, counted across all input files.

NF: Number of fields in the current record.

FNR: Current record number within the current file. It is incremented each time a new record is read and is reset at the start of each new file, so the first record of every file has FNR equal to 1.

FILENAME: Name of file being read

ARGC: Number of command-line arguments stored in ARGV. The count includes the command name but not awk's options or the program text.

ARGV: Array that stores the command-line arguments.

ENVIRON: Array of environment variables and their values.

Remember that these names are case-sensitive and must be written in capital letters.
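ARGC, ARGV and ENVIRON are not used in the examples later in this chapter, so here is a minimal sketch of them. The file names one.txt and two.txt and the variable GREETING are made up for illustration. The files do not need to exist, because a program that has only a BEGIN section never reads its input.

```shell
# ARGC counts the ARGV entries; options and the program text are not included.
awk 'BEGIN {
    print "ARGC =", ARGC
    for (i = 1; i < ARGC; i++)
        print "ARGV[" i "] =", ARGV[i]
}' one.txt two.txt

# ENVIRON is indexed by the environment variable's name.
GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }'
```

ARGV[0] holds the command name, which is why the loop starts at 1.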

In this chapter, I am also going to use a new file. You can create the same file for practice.

random_file.txt

Name
Gender
Dept
Salary

Bob
20000

Marlin
300000

Peter
ADMIN
34455

Rosy
78098

Pete
89023

In the previous example with the Employee data, the field separator was a comma (,) and the record separator was a newline.

How to use the FS variable

By default, the field separator is whitespace (one or more spaces or tabs).

To use a different separator, we can pass the -F option (explained in earlier chapters).

Alternatively, we can set FS in the BEGIN section to define the field separator.

Example 1:

In the previous chapter, we displayed the total salary of employees, and I used the -F option there.

That option can be replaced by the FS built-in variable.

In this example, FS is set inside the BEGIN section and assigned a comma (,) as its value.
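Here is a minimal sketch of the two forms side by side. The file employee.csv and its layout (name, gender, dept, salary, with no header row) are made-up sample data, not the Employee data from the earlier chapters.

```shell
# Hypothetical sample data: name,gender,dept,salary
printf '%s\n' 'Asha,F,HR,52000' 'Ben,M,IT,61000' \
              'Cara,F,IT,48000' 'Dev,M,ADMIN,39000' > employee.csv

# With the -F option
awk -F, '{ total += $4 } END { print "Total salary:", total }' employee.csv

# With FS set in the BEGIN section; prints the same line
awk 'BEGIN { FS = "," } { total += $4 } END { print "Total salary:", total }' employee.csv
# → Total salary: 200000
```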

Example 2:

A question arises here: if FS is just a variable, why should it be set in BEGIN?

Consider what happens when FS is set without BEGIN.

In the first awk command, I set FS="," inside the main action and printed the name and gender. In the output, the first line of the Employee data was printed in full, and only from the second line onward were the name and gender printed. This happens because awk reads the first record and splits it with the default separator before the action runs, so the new FS value takes effect only from the next record.

So in these scenarios, either set FS in BEGIN, which runs before any input is read, or use the -F option of the awk command.
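The difference can be reproduced with a small sketch. The file employee.csv and its columns (name, gender, dept, salary) are hypothetical sample data.

```shell
# Hypothetical sample data: name,gender,dept,salary
printf '%s\n' 'Asha,F,HR,52000' 'Ben,M,IT,61000' 'Cara,F,IT,48000' > employee.csv

# FS set in the main action: record 1 was already split on whitespace,
# so $1 is the whole line; records 2 and 3 are split on commas.
awk '{ FS = ","; print $1, $2 }' employee.csv

# FS set in BEGIN: every record is split on commas.
awk 'BEGIN { FS = "," } { print $1, $2 }' employee.csv
```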

Example 3:

Suppose I want the fields in my output to be separated by "|". What can we do?

Here we can use the OFS variable.

Note also that BEGIN can be used on its own, without END.

In the BEGIN section, I define FS and OFS, and then the main action prints the fields.
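A minimal sketch of FS and OFS used together, again with a hypothetical employee.csv (name, gender, dept, salary):

```shell
# Hypothetical sample data: name,gender,dept,salary
printf '%s\n' 'Asha,F,HR,52000' 'Ben,M,IT,61000' 'Cara,F,IT,48000' > employee.csv

# Read comma-separated fields, write them separated by "|"
awk 'BEGIN { FS = ","; OFS = "|" } { print $1, $2, $4 }' employee.csv
# → Asha|F|52000
# → Ben|M|61000
# → Cara|F|48000
```

OFS is inserted only where print has a comma between its arguments. A plain print $0 shows the record unchanged unless a field has been assigned, which makes awk rebuild the record with OFS.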

If I need a blank line after each record in the output, I can use ORS. By default, ORS is a single newline, so I set it to two newlines (\n\n).
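As a sketch, with the same kind of hypothetical employee.csv (name, gender, dept, salary):

```shell
# Hypothetical sample data: name,gender,dept,salary
printf '%s\n' 'Asha,F,HR,52000' 'Ben,M,IT,61000' \
              'Cara,F,IT,48000' 'Dev,M,ADMIN,39000' > employee.csv

# Each print now ends with two newlines, so a blank line follows every record
awk 'BEGIN { FS = ","; OFS = "|"; ORS = "\n\n" } { print $1, $4 }' employee.csv
```

With four input records, the output is eight lines long: four data lines, each followed by an empty line.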

Example 4:

I have created a new file, random_file.txt. In that file, fields are separated by a newline and records are separated by two newlines (that is, by a blank line).

In this scenario, I will use both the FS and RS variables.
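Here is one way this could look. The file sample_records.txt is a made-up stand-in with the same layout, not the exact random_file.txt. I set RS to an empty string, which is awk's portable "paragraph mode" where blank lines separate records. A multi-character value such as "\n\n" also works in gawk, but POSIX leaves its behaviour unspecified.

```shell
# Hypothetical file: one field per line, a blank line between records
printf '%s\n' Name Gender Dept Salary '' \
              Asha F HR 52000 '' \
              Ben M IT 61000 > sample_records.txt

# RS = "" : blank lines separate records; FS = "\n" : each line is a field
awk 'BEGIN { RS = ""; FS = "\n"; OFS = "," } { print $1, $2, $3, $4 }' sample_records.txt
# → Name,Gender,Dept,Salary
# → Asha,F,HR,52000
# → Ben,M,IT,61000
```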

I hope this is clear. If you have any doubts, please post them in the comments section.

Example 5:

How to use the NR variable

Here NR prints the record number for each record displayed in the output.
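A minimal sketch, assuming a hypothetical employee.csv (name, gender, dept, salary):

```shell
# Hypothetical sample data: name,gender,dept,salary
printf '%s\n' 'Asha,F,HR,52000' 'Ben,M,IT,61000' 'Cara,F,IT,48000' > employee.csv

# Prefix every record with its record number
awk '{ print NR, $0 }' employee.csv
# → 1 Asha,F,HR,52000
# → 2 Ben,M,IT,61000
# → 3 Cara,F,IT,48000
```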

How to use the NF variable

This displays the number of fields present in each record.
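For example, with a made-up staff.csv whose records have different numbers of fields:

```shell
# Hypothetical sample data with 5, 4 and 3 comma-separated fields
printf '%s\n' 'Asha,F,HR,52000,Pune' 'Ben,M,IT,61000' 'Cara,F,IT' > staff.csv

# Print each name with the number of fields in its record
awk -F, '{ print $1, "has", NF, "fields" }' staff.csv
# → Asha has 5 fields
# → Ben has 4 fields
# → Cara has 3 fields
```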

Example 6:

I want to print just the last field.

What should I do? NF is the solution: print $NF prints the last field of each record.

Next, find the records in which the total number of fields is less than 5.
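Both ideas can be sketched on a made-up staff.csv whose records have different numbers of fields:

```shell
# Hypothetical sample data with 5, 4 and 3 comma-separated fields
printf '%s\n' 'Asha,F,HR,52000,Pune' 'Ben,M,IT,61000' 'Cara,F,IT' > staff.csv

# $NF is the field whose number is NF, that is, the last field
awk -F, '{ print $NF }' staff.csv
# → Pune
# → 61000
# → IT

# Used as a pattern, NF < 5 selects only the records with fewer than 5 fields
awk -F, 'NF < 5 { print NR ": " $0 }' staff.csv
# → 2: Ben,M,IT,61000
# → 3: Cara,F,IT
```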

Try at home: Write an awk command to print the first and last fields of each record.

Example 7:

How to print the first 3 lines from a file.

I have used the condition NR<4, which is true only for records 1 to 3.
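A minimal sketch on a hypothetical five-line file, lines.txt:

```shell
# Hypothetical five-line file
printf '%s\n' one two three four five > lines.txt

# A pattern with no action prints every record for which it is true
awk 'NR < 4' lines.txt
# → one
# → two
# → three
```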

Example 8:

How to read the first 3 lines from 2 different files.

Here I should use FNR.

Notice that the FNR value restarts for each file, whereas NR does not reset with a new file: it keeps counting from where the previous file left off.
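To see both counters together, here is a sketch with two made-up four-line files, first.txt and second.txt:

```shell
# Two hypothetical four-line files
printf '%s\n' a1 a2 a3 a4 > first.txt
printf '%s\n' b1 b2 b3 b4 > second.txt

# FNR < 4 selects the first three records of each file
awk 'FNR < 4 { print FILENAME, NR, FNR, $0 }' first.txt second.txt
# → first.txt 1 1 a1
# → first.txt 2 2 a2
# → first.txt 3 3 a3
# → second.txt 5 1 b1
# → second.txt 6 2 b2
# → second.txt 7 3 b3
```

With NR < 4 instead, only the first three lines of first.txt would be printed.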

Example 9:

How to print the name of the file being processed: I will use the FILENAME variable, which can be printed in several ways.
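Two possible forms, sketched on a hypothetical employee.csv (name, gender, dept, salary):

```shell
# Hypothetical sample data: name,gender,dept,salary
printf '%s\n' 'Asha,F,HR,52000' 'Ben,M,IT,61000' > employee.csv

# Print the file name once, when its first record is read
awk 'FNR == 1 { print "Reading", FILENAME }' employee.csv
# → Reading employee.csv

# Print the file name in front of a field on every record
awk -F, '{ print FILENAME ": " $1 }' employee.csv
# → employee.csv: Asha
# → employee.csv: Ben
```

FILENAME is not useful inside BEGIN, because no input file has been opened at that point.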

By now, you should be able to understand these commands easily.

Try at home: Print the data of all employees who are female and have a salary of more than 1000. Output fields should be separated by "|". Also, print the file name at the end.

The commands are getting longer now, and they are becoming difficult to type on the terminal. Is there a way to make this a little easier?

Yes: try AWK scripts.

AWK Scripts

With an awk script, I create a separate file containing the awk patterns and actions, and pass it to awk using the -f option.

The command section is the part that we have been writing inside single quotes.

Example 10:

Here I have created a file named com.awk containing the commands I was previously typing on the terminal, and I execute it with the following command.

awk -f <command_file> <file_name_for_processing>

There is no restriction on the command file's extension.

More chapters will follow.
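As a rough sketch of the whole flow: the contents of com.awk below are my own illustration, and employee.csv (name, gender, dept, salary) is hypothetical sample data.

```shell
# Hypothetical sample data: name,gender,dept,salary
printf '%s\n' 'Asha,F,HR,52000' 'Ben,M,IT,61000' 'Cara,F,IT,48000' > employee.csv

# Write the awk program to a file; no surrounding single quotes are needed
cat > com.awk <<'EOF'
# com.awk: print name and salary separated by "|", then a record count
BEGIN { FS = ","; OFS = "|" }
{ print $1, $4 }
END { print "Records: " NR }
EOF

# Run the script file against the data file
awk -f com.awk employee.csv
# → Asha|52000
# → Ben|61000
# → Cara|48000
# → Records: 3
```

Inside a script file, each pattern and action can go on its own line and be commented with #, which makes long programs easier to read than a one-liner.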

If you like this series, please follow and comment.
