How to use the awk command (string extraction)

This section explains how to use the awk command to extract strings. It is especially useful for pulling specific fields out of command output that is laid out in columns.

Format

awk [options] 'program' [file...]

Examples of Use

Display specified fields ( '{print $1, $3}' )

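By default, awk splits each line into fields on whitespace (spaces or tabs), and $1, $2, $3, and so on refer to the first, second, and third fields. The following example displays only the first and third fields.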

$ cat test.txt 
1111 aaaa AAAA
2222 bbbb BBBB
3333 cccc CCCC
$ awk '{print $1, $3}' test.txt 
1111 AAAA
2222 BBBB
3333 CCCC
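
As a small extension of the example above, the built-in variable NF holds the number of fields on the current line, so $NF refers to the last field without counting columns by hand. The following is a minimal sketch that assumes the same three rows as test.txt; the variable names rows and result are only for illustration.

```shell
# Sample rows matching test.txt above, fed through a pipe so no file is needed.
rows='1111 aaaa AAAA
2222 bbbb BBBB
3333 cccc CCCC'

# NF is the number of fields on the current line, so $NF is the last field.
result=$(printf '%s\n' "$rows" | awk '{print NF, $NF}')
printf '%s\n' "$result"
```

Each output line shows the field count followed by the last field, for example "3 AAAA" for the first row.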

Change delimiter ( -F )

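The -F option sets the field separator (delimiter) used to split each input line. Note that the output is still separated by spaces, because the comma in print inserts the output field separator, which defaults to a single space.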

$ cat test.txt 
1111,aaaa,AAAA
2222,bbbb,BBBB
3333,cccc,CCCC
$ awk -F',' '{print $1, $3}' test.txt 
1111 AAAA
2222 BBBB
3333 CCCC

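The argument to -F is treated as a regular expression, so a bracket expression can specify more than one delimiter. The following example splits on both : and /.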

$ echo "111:aaa/bbb" | awk -F'[:/]' '{print $1, $3}'
111 bbb
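
If the output should keep the same delimiter as the input, the output field separator OFS can be set with -v. The following is a minimal sketch that assumes the comma-separated rows shown earlier; the variable names rows and csv_out are only for illustration.

```shell
# Comma-separated sample rows, matching the earlier test.txt example.
rows='1111,aaaa,AAAA
2222,bbbb,BBBB
3333,cccc,CCCC'

# -F',' splits the input on commas; -v OFS=',' makes print join its arguments with commas.
csv_out=$(printf '%s\n' "$rows" | awk -F',' -v OFS=',' '{print $1, $3}')
printf '%s\n' "$csv_out"
```

With OFS set, the first line is printed as "1111,AAAA" instead of "1111 AAAA".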

Specify pattern

If you write the program as 'pattern {action}', the action is performed only on lines that match the pattern.

In the following example, NR>1 skips the first line of the output (the "total 0" line) and $5==256 selects lines whose fifth field (the size) is 256. Only lines that satisfy both conditions are output.

$ ls -l
total 0
drwxr-xr-x  22 root  staff   704 Apr 12 23:38 Auth
drwxr-xr-x  14 root  staff   448 Apr 12 23:38 Broadcasting
drwxr-xr-x   6 root  staff   192 Apr 12 23:38 Bus
drwxr-xr-x  26 root  staff   832 Apr 12 23:38 Cache
drwxr-xr-x   4 root  staff   128 Apr 12 23:38 Config
drwxr-xr-x  12 root  staff   384 Apr 12 23:38 Console
drwxr-xr-x   7 root  staff   224 Apr 12 23:38 Container
drwxr-xr-x  31 root  staff   992 Apr 12 23:38 Contracts
drwxr-xr-x   6 root  staff   192 Apr 12 23:38 Cookie
drwxr-xr-x  29 root  staff   928 Apr 12 23:38 Database
drwxr-xr-x   5 root  staff   160 Apr 12 23:38 Encryption
drwxr-xr-x   6 root  staff   192 Apr 12 23:38 Events
drwxr-xr-x   8 root  staff   256 Apr 12 23:38 Filesystem
drwxr-xr-x  22 root  staff   704 Apr 12 23:38 Foundation
drwxr-xr-x   7 root  staff   224 Apr 12 23:38 Hashing
drwxr-xr-x  16 root  staff   512 Apr 12 23:38 Http
drwxr-xr-x   7 root  staff   224 Apr 12 23:38 Log
drwxr-xr-x  14 root  staff   448 Apr 12 23:38 Mail
drwxr-xr-x  20 root  staff   640 Apr 12 23:38 Notifications
drwxr-xr-x   9 root  staff   288 Apr 12 23:38 Pagination
drwxr-xr-x   6 root  staff   192 Apr 12 23:38 Pipeline
drwxr-xr-x  32 root  staff  1024 Apr 12 23:38 Queue
drwxr-xr-x   8 root  staff   256 Apr 12 23:38 Redis
drwxr-xr-x  36 root  staff  1152 Apr 12 23:38 Routing
drwxr-xr-x  16 root  staff   512 Apr 12 23:38 Session
drwxr-xr-x  27 root  staff   864 Apr 12 23:38 Support
drwxr-xr-x   8 root  staff   256 Apr 12 23:38 Translation
drwxr-xr-x  17 root  staff   544 Apr 12 23:38 Validation
drwxr-xr-x  13 root  staff   416 Apr 12 23:38 View
$
$ ls -l | awk 'NR>1 && $5==256 {print $5, $9}'
256 Filesystem
256 Redis
256 Translation
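
Patterns can also combine a regular expression match on a field with a numeric comparison. The following is a minimal sketch that assumes a shortened copy of the listing above, stored in a variable instead of being read from ls -l; the variable names listing and matches are only for illustration.

```shell
# A shortened copy of the ls -l output above, so the example runs anywhere.
listing='total 0
drwxr-xr-x  22 root  staff   704 Apr 12 23:38 Auth
drwxr-xr-x  26 root  staff   832 Apr 12 23:38 Cache
drwxr-xr-x   4 root  staff   128 Apr 12 23:38 Config
drwxr-xr-x   8 root  staff   256 Apr 12 23:38 Redis'

# $9 ~ /^C/ is true when the ninth field starts with "C";
# $5 >= 500 is a numeric comparison on the size field.
matches=$(printf '%s\n' "$listing" | awk 'NR>1 && $9 ~ /^C/ && $5 >= 500 {print $5, $9}')
printf '%s\n' "$matches"
```

Only Cache satisfies all three conditions here, so the output is "832 Cache".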

Extract a range of lines

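With a range pattern of the form '/start/,/end/', awk outputs every line from a line matching the start pattern through the next line matching the end pattern, inclusive. When no action is given, the matching lines are simply printed.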

<div>
  <h1>Hello World</h1>
  <ul class="list-main">
    <li>aaaaa</li>
    <li>bbbbb</li>
  </ul>
  <h3>Essential Links</h3>
  <ul class="list-sub">
    <li>11111</li>
    <li>22222</li>
    <li>33333</li>
  </ul>
</div>

To extract the lines from <ul class="list-main"> to </ul> in the above HTML (saved here as tmp.html), do the following:

$ awk '/<ul class="list-main"/,/<\/ul/' tmp.html
  <ul class="list-main">
    <li>aaaaa</li>
    <li>bbbbb</li>
  </ul>
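
The range pattern always includes the start and end lines themselves. When only the lines in between are wanted, one common alternative is to track the range with a flag variable. The following is a minimal sketch that assumes a shortened copy of the HTML above held in a shell variable; the names html, inner, and inside are only for illustration.

```shell
# A shortened copy of the HTML above, so the example does not depend on tmp.html.
html='<div>
  <h1>Hello World</h1>
  <ul class="list-main">
    <li>aaaaa</li>
    <li>bbbbb</li>
  </ul>
  <h3>Essential Links</h3>
  <ul class="list-sub">
    <li>11111</li>
  </ul>
</div>'

inner=$(printf '%s\n' "$html" | awk '
  /<ul class="list-main"/ { inside = 1; next }  # start line: set the flag and skip this line
  /<\/ul/                 { inside = 0 }        # end line: clear the flag before the print rule
  inside                                        # print the line only while the flag is set
')
printf '%s\n' "$inner"
```

This prints only the two <li> lines inside the list-main block, without the surrounding <ul> and </ul> lines.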