Extra commands
Overview
Teaching: 5 min
Exercises: 10 min
Questions
- How can I search for a string in a file?
- What are pipes and how do they work?
- What are sed and awk?
- How can I see the differences between two files?
- How can I zip files?
- What are .tar archives?
Objectives
- Be able to observe differences between files using diff
- Create and work with .zip files and .tar archives
- Understand the benefits of searching and using pipes

Searching
We will now look into one of the most useful tools in Linux. There is possibly enough material on this command for a whole episode, but we will keep it simple here.
The command for searching is grep, which stands for "globally search for a regular expression and print". In plain English, it
searches a specified file for a string of characters and prints every line that contains it.
Let’s use grep to search for all lines in hamlet.txt that contain the word “hamlet”.
$ grep 'hamlet' hamlet.txt
You will notice that running this prints nothing. This is because grep is case sensitive, so we need the
case-insensitive flag (-i) to get this working. We will also add the line-number flag (-n), which prefixes each matching line with its line number.
$ grep -in "hamlet" hamlet.txt
As you can see, it produces a long output. But where grep really comes in useful is with pipes.
Pipes work on the concept that it is better to combine smaller commands into a more powerful and useful one. A pipe sends the output of one command to the input of the next, which also removes the need for temporary files. The syntax is:
command_1 | command_2 | ... | command_n
Let’s work on this by piping the output of our previous command to less.
$ grep -i "hamlet" hamlet.txt | less
The matching lines now open in less rather than scrolling past in the terminal. Remember you can press q to exit.
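As a minimal sketch of chaining more than two commands, the example below builds a small made-up file, sample.txt, as a stand-in for hamlet.txt and pipes grep into further commands. It assumes the wc command is available for counting lines; the file name and its contents are hypothetical.

```shell
# Work in a throwaway directory so nothing in your lesson folder is touched.
workdir=$(mktemp -d)
cd "$workdir"

# sample.txt is a made-up stand-in for hamlet.txt.
printf 'HAMLET: To be, or not to be\n' > sample.txt
printf 'GHOST: Mark me\n' >> sample.txt
printf 'QUEEN: Hamlet, thou hast thy father much offended\n' >> sample.txt
printf 'HAMLET: Mother, you have my father much offended\n' >> sample.txt

# command_1 | command_2: count the lines that mention hamlet in any case.
grep -i 'hamlet' sample.txt | wc -l

# command_1 | command_2 | command_3: of those lines, keep only the ones
# that also mention "father", then count them.
grep -i 'hamlet' sample.txt | grep 'father' | wc -l
```

The first pipeline counts three lines and the second counts two, and no intermediate file is written at any point.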
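Piping Hamlet to a file
Use grep to search hamlet.txt for a word, “Hamlet”, “Ghost”, “Queen” or another of your choosing, and send the matching lines to a new file. A pipe (|) passes output to another command rather than to a file, so writing to a file needs the redirect operator (>).
Solution
$ grep -n "Hamlet" hamlet.txt > hamlet_occurrence.txt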
Spot the difference with diff
We now move on to analysing the differences between files with the diff command. This compares one file to another,
line by line, and reports where they differ.
We type the following command to compare our two files.
$ diff correct.txt incorrect.txt
It produces the following output, where < marks lines from the first file entered, in this case correct.txt, and >
marks lines from the second file.
1c1
< This message is correct
---
> This message is incorrect
The 1c1 tells us that line 1 of the first file needs to be changed (c) to match line 1 of the second file. This can
be very helpful if you are looking at two very similar files.
Let us change our files incorrect.txt and correct.txt with echo and see the resulting difference with diff.
We can press Enter before closing our quotes to spread the message onto the next line. The prompt changes to > to
show that the shell is waiting for the rest of the command.
$ echo "This message is correct" > incorrect.txt
$ echo "This message is correct
> but not as correct as this." > correct.txt
Now we will use diff again to see what has changed.
$ diff correct.txt incorrect.txt
2d1
< but not as correct as this.
We see that the output 2d1 is telling us to delete (d) the second line of correct.txt to make it the same as
incorrect.txt.
So having the order right is important. The best way to describe diff is as the answer to the question "How can I
change file 1 to make it the same as file 2?"
If we reverse the command, it will tell us to add a line to incorrect.txt to make it the same as correct.txt.
$ diff incorrect.txt correct.txt
1a2
> but not as correct as this.
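The sketch below repeats the idea on two made-up three-line files, first.txt and second.txt (hypothetical names), so that the line numbers in the output are something other than 1. It also shows that diff prints nothing when the files are identical.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

printf 'one\ntwo\nthree\n' > first.txt
printf 'one\n2\nthree\n' > second.txt

# diff exits with status 1 when the files differ, so "|| true" stops a
# script that uses "set -e" from halting at this line.
diff first.txt second.txt || true
# Prints:
# 2c2
# < two
# ---
# > 2

# Identical files produce no output and an exit status of 0.
cp first.txt copy.txt
diff first.txt copy.txt && echo "No differences"
```

Here 2c2 says that line 2 of first.txt must be changed to match line 2 of second.txt.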
Zipped files and archives
You may have encountered file types like .zip, .gz and .tar in your travels. Operating systems such
as Windows may need specific programs to open them. Fortunately, Linux provides commands to create, zip, unzip, tar and un-tar
files like these.
A zip file is a single file containing one or more compressed files. Tar archives, on the other hand, are used to package files together for backup or distribution purposes. They are sometimes called “tape archives”. Many companies archive data on tape for long-term storage, which requires less floor space and energy. These archives can then be compressed with GNU Zip (gzip).
Zipping a file with GNU Zip compression is fairly straightforward: specify the file you want to zip.
$ touch new.txt
$ gzip new.txt
On running this command, we get no confirmation that anything has happened, but we can use ls to confirm that
new.txt has been replaced by the compressed file new.txt.gz. To unzip the file, we use gunzip.
$ gunzip new.txt.gz
This returns the file to its original state.
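As a rough check that the round trip really does restore the file, the sketch below compresses a file that has some content in it and compares the result against a copy kept aside. The file names are made up, and cmp is assumed to be available for the byte-for-byte comparison.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Build a file with repetitive content so gzip has something to compress.
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "This line is repeated to give gzip something to compress."
done > repeated.txt
cp repeated.txt original_copy.txt

gzip repeated.txt
ls    # repeated.txt is gone; repeated.txt.gz has taken its place

gunzip repeated.txt.gz
ls    # repeated.txt is back and the .gz file is gone

# cmp prints nothing and succeeds when two files are identical.
cmp repeated.txt original_copy.txt && echo "Contents unchanged"
```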
For .tar archives, we need some flags. To create an archive, we need the create (c) and
file (f) flags, followed by a name for the tar archive and the files to put in it. You can also add the verbose flag (v), which lists
the files as they are added.
$ tar -cf archive.tar new.txt
If we check our directory, we can see that both the original file and the tar archive are now present. Because this
is an archive, not a directory, we cannot look inside it with ls. Luckily there is a flag (t) that we can use to view
the archive contents.
$ tar -tf archive.tar
new.txt
As you can see, it lists out the contents of the archive in the same way that ls lists out the content of a directory.
Let us remove the original file, new.txt and then extract the contents of our .tar archive using the x flag.
$ rm new.txt
$ tar -xf archive.tar
Running our ls command will confirm that extracting the archive has returned new.txt to the directory. You can
remove .tar archives using the rm command.
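The same create, list and extract steps also work with more than one file. The sketch below is a small illustration with two made-up files, a.txt and b.txt, and includes the verbose flag mentioned earlier.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

echo "first note" > a.txt
echo "second note" > b.txt

# c = create, v = verbose (name each file as it is added), f = archive name.
tar -cvf notes.tar a.txt b.txt

# t = list the contents of the archive without extracting anything.
tar -tf notes.tar

# Remove the originals, then x = extract them again from the archive.
rm a.txt b.txt
tar -xf notes.tar
cat a.txt    # prints: first note
```

The archive name always comes straight after the f flag, and the files to add follow it.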
tar-ing and gzip-ing a directory
Use tar plus its flags to create a .tar archive of the wildcards/ directory. Check the contents of the archive. Now zip the archive using gzip. Finally, unzip and untar the archive.
Solution
$ tar -cf my_archive.tar wildcards/
$ tar -tf my_archive.tar
wildcards/
wildcards/01.txt
wildcards/00.txt
wildcards/02.txt
wildcards/07.txt
wildcards/06.txt
wildcards/10.txt
wildcards/11.c
wildcards/04.c
wildcards/08.txt
wildcards/03.c
wildcards/09.txt
$ gzip my_archive.tar
$ gunzip my_archive.tar.gz
$ tar -xf my_archive.tar
sed and awk sub-languages
We won’t go into too much detail about these, as they are primarily used in bash scripting for advanced topics, but it is still useful to know that such tools exist and that they can be used effectively.
sed - the UNIX stream editor
The sed command is another handy tool in UNIX that can be used for:
- Searching
- Finding and replacing
- Inserting/deleting
It is mainly used to edit the contents of a file without opening the file in an editor and changing it by hand. The general format is as follows:
sed 'opt/act/flag' file
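To make that format more concrete, here is a minimal sketch of the three uses listed above, run on a made-up file called message.txt. The file name and its contents are assumptions for illustration only.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'This message is incorrect\n' > message.txt
printf 'This line is a spare\n' >> message.txt
printf 'This message is also incorrect\n' >> message.txt

# Finding and replacing: s/old/new/ swaps the first match on each line.
sed 's/incorrect/correct/' message.txt

# Searching: -n turns off the default printing, p prints matching lines only.
sed -n '/spare/p' message.txt

# Deleting: 2d leaves line 2 out of the output.
sed '2d' message.txt

# sed writes to the terminal and leaves message.txt untouched, so
# redirect the output if you want to keep the result.
sed 's/incorrect/correct/' message.txt > fixed.txt
```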
awk - a language within UNIX
The awk command gives access to a small language of its own within UNIX. In a similar way that Python can be run from the
terminal, awk can be called as a self-contained program. An awk program is a series of rules that take the form:
awk '(condition) {action}' file
This can be used for anything from printing to complex mathematical statements, and can be useful in bash scripting over multiple lines.
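As a small sketch of the condition and action form, the example below runs two awk rules on a made-up file, counts.txt, holding a word and a number on each line. The numbers are invented for illustration and are not real counts from the play.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'hamlet 358\nghost 14\nqueen 69\n' > counts.txt   # made-up numbers

# Condition and action: for lines whose second column is above 50,
# print the first column. awk splits each line on whitespace into $1, $2, ...
awk '$2 > 50 {print $1}' counts.txt
# Prints:
# hamlet
# queen

# No condition: the action runs on every line. The END rule runs once
# after the last line has been read.
awk '{total += $2} END {print total}' counts.txt
# Prints:
# 441
```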
Key Points
- grep selects lines in files that match patterns. It can be combined with pipes (|) to be even more useful.
- .tar archives are a very useful way of converting a whole folder into a single file. They are often used in data sharing.
- The creation of .tar archives requires the use of flags to create and untar them.
- The -cf tar flags create an archive with a specified name. The -xf flags are used to extract the archive contents.