Local terminal - Genetic Diversity Centre (GDC)

Learning Objectives

Main
◇ Be able to write simple Bash commands and understand more complex commands.
◇ Be able to redirect output and write log files.
Minor
◇ Be able to find help for terminal commands and applications in general.

Lecture Notes

⬇︎ Local Terminal (PDF)

Terminal

Hollywood uses the terminal when it is necessary to convince the audience that something extraordinary is happening and that the person is a highly skilled user (i.e. a hacker). In the course we use it for other, more practical reasons, but it is cool. Command line tools help you not only run programs, but also manage files efficiently, automate tasks, access powerful tools, and gain greater control over your system in a repeatable way, which is essential in many technical and scientific fields.

Get Started

First, a few words about this manual. The gray boxes are code chunks. Click on the page icon (⎘) at the top-right corner of a code block to copy its content to the clipboard. Everything after the octothorpe # (also called hash, pound, or number sign) is a comment and will not be executed. The number of octothorpes does not matter. Use comments to explain and document your code; they can also be used to format and structure your code to make it more readable.

Here is a simple bash example. You do not need to understand every line, but remember to use comments (#) to explain and structure your code to improve readability. We will talk more about comments and structure later.

## \\\\\|///// ##
##  ASCII Art  ##
## /////|\\\\\ ##

##   Get Ready 

mkdir -p ${HOME}/GDA/terminal # Create a working directory
cd ${HOME}/GDA/terminal       # Go to the newly created directory

## (a) Print an ASCII-Ant to a file

echo "My ASCII-Ant"    >  ant.txt # Write text and create a new file
echo "  \(¨)/ "        >> ant.txt # Write the head and the front legs
echo "  -( )- "        >> ant.txt # Write the thorax and the middle legs
echo "  /(_)\ "        >> ant.txt # Write the abdomen and the hind legs
echo ""                >> ant.txt # Write some space

## (b) Print an ASCII-Bee to a file

echo "    <\         " >  bee.txt # Write first line to new file
echo " (¨)(_)(()))=- " >> bee.txt # Write second line to existing file
echo " \ |// /__     " >> bee.txt # Write line #3 to existing file
echo "    | )/ )     " >> bee.txt # Write 4th line
echo "     _  _      " >> bee.txt # Write last line
echo ""                >> bee.txt # Some space 

## (c) Show the insects

cat ant.txt

tac bee.txt

# tac is not working? try:
tail -r bee.txt                   # Alternative to `tac`

# My Comments:
# (!) tac is not a standard command, tail would be a more general workaround. 
# (?) Is there a way to flip the ant?  
# (?) Is there a better way to generate ASCII art?

After this example, we start slowly and explore different aspects of the Linux command terminal. I encourage more experienced students to jump from challenge to challenge and read the theory above if you struggle with the challenge.

Command Line Syntax

The general syntax pattern of a command line is:

# prompt> command [option(s)] (File)

Command example: List (ls) content of a directory (also known as folder).

ls         # show content of the current directory / folder
ls test/   # show content of folder test
ls -l -h   # list (option -l) content of current folder and use human-readable (option -h) file size 
ls -lh     # the same but shorter

Note

Most problems have more than one solution. Finding the most efficient way (e.g. -lh instead of -l -h) is very Linux. It is almost an obsession to find a faster and shorter solution to every Linux terminal command. In the beginning, you are stuck with what you know, and that is fine.

You may have noticed that the list command ls with the -lh option gives you more than just the directory contents. It shows you the permissions, the owner, the size of the file or folder, the (last modified) date and of course the contents (e.g. files and folders).

# - rw- r-- r-- jwalser users 161K Jan 17 12:22 p117_Help.txt
# - rw- r-- r-- jwalser users  92M Jan 17 16:51 p117_Metafile.txt
# d rwx r-x r-x jwalser users  16G Feb 19 15:43 data
# | --- --- ---  |       |     |    |            |
# |  |   |   |   |       |     |    |            └⊳ files / directories
# |  |   |   |   |       |     |    └⊳ date and time
# |  |   |   |   |       |     └⊳ size
# |  |   |   |   |       └⊳ group
# |  |   |   |   └⊳ owner
# |  |   |   └⊳ rights others (read/write/execute)
# |  |   └⊳ rights group (read/write/execute)
# |  └⊳ rights owner (read/write/execute)
# └⊳ d directory (folder) / - file

Permissions are important and must be set correctly. You will not be able to change a file if you only have read permissions (r--). See the Guru99 tutorial for more details on file permissions. This is an advanced topic and will be covered later. For now, remember that the ls command can do more than just show the contents of folders.
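
To make the permission columns less abstract, here is a minimal sketch that sets and reads them. It assumes a throwaway directory created with mktemp and a hypothetical file notes.txt; chmod (change mode) and cut are not covered in this chapter and are only used here for illustration.

```shell
## Work in a throwaway directory (assumption: mktemp is available)
cd "$(mktemp -d)"

## Create a hypothetical file; owner may read and write, everyone else only read
echo "some notes" > notes.txt
chmod 644 notes.txt                    # rw- r-- r--
ls -l notes.txt

## Make the file read-only for everyone
chmod 444 notes.txt                    # r-- r-- r--
perms=$(ls -l notes.txt | cut -c1-10)  # first ten characters: file type + permissions
echo "${perms}"
```

The first character of the output is the file type (- for a file), followed by the three permission groups described above.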

Basic Commands

Here is a list of some basic commands.

pwd.................: absolute path name of the current working directory
man <command>.......: manual page for command (exit with q)
file <file>.........: determine file type
cd <where>..........: change directory/folder
cd .. ..............: go up one directory
cd .................: go home
mkdir <dir>.........: create directory
rmdir <dir>.........: remove directory (if empty)
ls <dir>............: list content of directory
ls -alh <dir>.......: more detail list
echo "message"......: prints content or message
cat <file>..........: print and concatenate files
head -n 5 <file>....: show first n=5 lines
tail -n 5 <file>....: show last n=5 lines
more <file>.........: read file (exit with q)
less <file>.........: similar to more but newer
> & >>..............: re-direct output (e.g. pwd > file.txt)
cp <ori> <copy>.....: copy file
mv <old> <new>......: move and/or rename file
rm <file>...........: remove file/directories - be careful and use option -i in the beginning
wc <file>...........: word, line, character, and byte count
grep "query" <file>.: search file(s) for lines matching query
find................: find files based on characteristics (e.g. name)
sed.................: transform content of text files
clear...............: clear terminal 
history.............: show terminal history
date................: display date
cal.................: display calendar

▻ Check out the LinuxCheatSheet for more commands and ideas.

Help

Don't worry if you forget the options for a command, help is at your fingertips! A command line manual is built in. The command man <command> will help you find the option you are looking for. It can also be used to research a command. There are alternatives, and Google would also work, but you may get information that does not apply to your system or version.

## Access the manual (Syntax: man <command>)
man cat # example

## Available commands (and aliases, functions, builtins and keywords)
compgen -c | less # scroll with [↑] and [↓] and exit with [q]

## Search for commands
apropos list | grep "directory"

## Help for shell builtins (Syntax: help <builtin>)
help cd # works only for shell builtins (e.g. cd, echo), not for external commands such as cat

## Help
cat --help # might not work for all commands

## Version
cat --version # might not work for all commands

## Where is binary file for the command located?
which cat

Keyboard Shortcuts

Here are some useful shortcuts. Most are key combinations: hold the first key (e.g. Ctrl) while you press the second one.

Key	Action
Ctrl+A	Go to the beginning of the line.
Ctrl+E	Go to the end of the line.
Ctrl+L	Clear the screen.
Ctrl+U	Clear the line before the cursor position.
Ctrl+K	Clear the line after the cursor.
Ctrl+C	Kill the command that is currently running.
Ctrl+D	Exit the current shell.
Alt+F	Move cursor forward one word (OS X, Esc+F).
Alt+B	Move cursor backward one word (OS X, Esc+B).

System Variables

By convention, built-in shell variables are in uppercase. These are internal and reserved variables. Use them, but do not overwrite them!

echo ${BASH}          # Bash binary
echo ${BASH_VERSION}  # Bash version
echo ${SHELL}         # Gives present shell
echo ${USER}          # Displays username
echo ${HOME}          # Home directory of User
echo ${RANDOM}        # To get a random number
echo ${PWD}           # Current directory

User Defined Variables

You can define your own (local) variables if you like. Remember, they are local and not permanent.

## Create Working Directory (if missing):
[ ! -d ${HOME}/GDA/terminal ] && mkdir -p ${HOME}/GDA/terminal

## Go to Working Directory
cd ${HOME}/GDA/terminal

## Define two Variables
MyRegistry="NCC-1701"
MyName="USS_Enterprise"

## Define Resource Location 
RSSLocation="https://www.gdc-docs.ethz.ch/GeneticDiversityAnalysis/GDA/"

## Download a Text File
curl -o NCC.txt ${RSSLocation}images/NCC.txt # RSSLocation already ends with a slash

## Print variables and text file
clear; echo ""; echo "${MyName} (${MyRegistry})"; cat NCC.txt

# Remark: We can "chain" independent commands together on one line using semicolons (`;`).
#         But be careful, too many and too long chains can become unreadable.
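
That user-defined variables are local can be checked directly. The sketch below reuses MyName from above and starts a second Bash process with bash -c; the variable names seen_before and seen_after are made up for this illustration.

```shell
unset MyName                  # start from a clean state
MyName="USS_Enterprise"       # local to the current shell

## A new (child) shell does not know the local variable
seen_before=$(bash -c 'echo "${MyName:-not set}"')
echo "Child shell before export: ${seen_before}"

## export passes the variable on to child processes
export MyName
seen_after=$(bash -c 'echo "${MyName:-not set}"')
echo "Child shell after export:  ${seen_after}"
```

Even an exported variable is gone once you close the terminal.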

Home Sweet Home

You should be in your home directory when you open the terminal.

## Where am I?
pwd # > current directory name
## Where is my HOME
echo ${HOME}
## Go HOME
cd ${HOME}
## Tilde is equal to HOME
cd ~ # > go HOME with tilde
## Shortcut HOME
cd

Directories / Folders

## Create a working directory/folder in your HOME
mkdir ${HOME}/workDIR          # Work folder (directory) in your home directory
mkdir ${HOME}/workDIR/subDIR   # Subfolder (sub-directory) inside your main folder

# Alternative:
# mkdir ~/workDIR
# mkdir ~/workDIR/subDIR

# Another (one-step) alternative
# mkdir -p ~/workDIR/subDIR

# HOME < you are here
# └── workDIR < created main folder
#     └──subDIR < created subfolder

## "Go to" the working directory

cd ${HOME}/workDIR/subDIR           # Change directory

# Alternatives:
# cd ~/workDIR/subDIR               # Tilde alternative
# cd workDIR/subDIR                 # alternative if you are already in your HOME directory
# cd ${HOME}; cd workDIR; cd subDIR # step-by-step

# HOME
# └── workDIR
#     └──subDIR < now you are here

The windows of your graphical desktop are called directories (folders) in the terminal. In fact, each window corresponds to a directory. We will learn how to move from directory to directory along a path.

# Tree       Path
# A          A
# ├── B      A/B
# │   ├── C  A/B/C
# │   └── D  A/B/D
# └── E      A/E

Summary:

# cd        > HOME 
# cd <path> > Move to specific folder
# cd ..     > Up one folder
# cd ../..  > Up two folders
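
The summary can be tried out on the example tree from above. This is a small sketch that rebuilds the tree in a throwaway directory (created with mktemp, an assumption) and moves between sibling folders using relative paths; the variable names first and second are made up for this illustration.

```shell
## Rebuild the example tree in a throwaway directory
base=$(mktemp -d)
mkdir -p "${base}/A/B/C" "${base}/A/B/D" "${base}/A/E"

cd "${base}/A/B/C"            # absolute path: works from anywhere
cd ../D                       # relative path: up to B, then down to D
first=$(basename "${PWD}")    # name of the current folder
cd ../../E                    # up to B, up to A, then down to E
second=$(basename "${PWD}")
echo "Visited ${first}, then ${second}"
```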

In the previous example, we used mkdir several times to create subfolders. Would it not be convenient to do this with less typing?

## Create Test Folders
cd ${HOME} # start in your home directory, which is the working directory (WD) below
mkdir -p ${HOME}/TestFolder_A/TestFolder_B/TestFolder_C
# ➜ Option -p creates all intermediate folders

# WD < You are (still) here, but you have created a directory and subdirectories. 
# └── TestFolder_A
#     └── TestFolder_B
#         └── TestFolder_C
#
# WD: working directory

## ------------------------------
## Go down and up / step-by-step 
## ------------------------------

# Down at once
cd TestFolder_A/TestFolder_B/TestFolder_C

# WD
# └── TestFolder_A
#     └── TestFolder_B
#         └── TestFolder_C < You are here now.

cd .. # go one folder up

# WD
# └── TestFolder_A
#     └── TestFolder_B < You are here now.
#         └── TestFolder_C

cd ../.. # go two folders up

# WD < You are back HOME.
# └── TestFolder_A
#     └── TestFolder_B
#         └── TestFolder_C 

## ------------------------
## Remove the test folders
## ------------------------

rmdir TestFolder_A

# ✘ does not work because there are sub-folders inside TestFolder_A
# ☛ rmdir deletes only empty folders
# ✔︎ we have to delete subfolder by subfolder

rmdir TestFolder_A/TestFolder_B/TestFolder_C

# WD < You are still here but subfolder C is gone.
# └── TestFolder_A
#     └── TestFolder_B

rmdir TestFolder_A/TestFolder_B

# WD < You did not move but you deleted sub-folder B.
# └── TestFolder_A

ls -l

❖ Challenge #1: The rmdir command only removes empty folders. Can you find an alternative way to remove the folder and all the subfolders at once? Tip: Use the rm (remove) command and check the manual page for help (man rm).

Suggestion #1

We can use the remove rm command with the recursive -r option. Be careful with the force -f option - there is no undo and gone is gone!


  mkdir -p TestFolder_X/TestFolder_XX/TestFolder_XXX
  rm -ri TestFolder_X
  # ➜ Option -r, -R, --recursive: remove directories and their contents recursively
  # ➜ Option -i: safety belt (note: the order of the options matters; if -i and -f are combined, the last one wins) 

  mkdir -p TestFolder_X/TestFolder_XX/TestFolder_XXX  
  rm -fr TestFolder_X
  # ➜ Option -f, --force: ignore nonexistent files and arguments, never prompt

❖ Challenge #2: Does it matter whether you use capital letters or not? In other words, does test.txt == TEST.TXT or is FOLDER_A == folder_a? Find a way to test if your local terminal is case sensitive.

Suggestion #2

Some file systems are not case sensitive. It is important to know if TEST and test are the same on your computer.


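  ## Test Files 
  touch file.txt
  touch FILE.TXT
  ls -l                     # one file or two?
  echo "TEST" >> FILE.TXT
  cat file.txt              # prints TEST only if both names point to the same file
  rm FILE.TXT  
  # ls shows a single file and cat prints TEST ➜ the file system is not case sensitive
  # (on a case-sensitive file system, file.txt is still there: rm file.txt)

  ## Test Folder
  mkdir TEST
  mkdir test
  # mkdir: test: File exists ➜ the file system is not case sensitive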

❖ Challenge #3: Do the following:

Create two directories (RunA_210415 and RunB_210412) in your working directory (${HOME}/GDA/terminal/).
Create a subdirectory (infoA and infoB) for each directory.
Switch between the two subdirectories.

Suggestion #3


  ## Define working directory
  wd="${HOME}/GDA/terminal"

  ## Define user variables for directories 
  pathA="${wd}/RunA_210415/infoA/"
  pathB="${wd}/RunB_210412/infoB/"

  ## Create both directories
  cd ${wd}
  mkdir -p ${pathA} ${pathB}

  ## Move around
  cd ${pathA} # go to the first directory
  pwd
  cd ${pathB}
  pwd

  ## Cleanup
  cd ${wd}
  rm -fr ${wd}/RunA_210415 ${wd}/RunB_210412 # remove the run folders including their subfolders

Data Streams

Every command (or programme) we run from the command line has three data streams. We can modify these streams in interesting and useful ways.

## Data streams
# STDIN  (0) - Standard input
# STDOUT (1) - Standard output (by default printed to the terminal)
# STDERR (2) - Standard error  (by default printed to the terminal)

Redirect Output (overwrite)

We change the default of STDOUT / STDERR from terminal to file.

## Print message on terminal
echo "Hello Terminal" # works and prints result to terminal
icho "Hello Terminal" # typo, does not work and prints error to terminal

## Redirect outputs (`STDOUT` and `STDERR`) to files
echo "Hello Terminal" 1> text1.txt 2> errors.txt # 1> standard output; 2> error output
more text1.txt  # you could also use [less] or [cat] instead of [more]
more errors.txt # empty because there was no error

## Redirect outputs (`STDOUT` and `STDERR`) to files
icho "Hello Terminal #1" 1> text1.txt 2> errors.txt
cat text1.txt  # empty because there was a typo in the command
cat errors.txt # error message

## Redirect outputs (`STDOUT` and `STDERR`) to one file
echo "Hello Terminal #2" > text2.txt 2>&1
cat text2.txt

## Redirect only output (`STDOUT`) in a file
echo "Hello Terminal #3" > text3.txt # you do not need 1> if you only print STDOUT
cat text3.txt
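
One detail of 2>&1 is easy to get wrong: redirections are processed from left to right, so their order matters. A minimal sketch, assuming a small helper function named both (hypothetical) that writes one line to each stream:

```shell
cd "$(mktemp -d)"

## Hypothetical helper: one line to STDOUT, one line to STDERR
both() { echo "to stdout"; echo "to stderr" >&2; }

## STDOUT goes to the file first, then STDERR follows STDOUT into the file
both > a.txt 2>&1
cat a.txt                     # two lines

## STDERR is pointed at the terminal first, then only STDOUT goes to the file
both 2>&1 > b.txt             # "to stderr" still appears on the terminal
cat b.txt                     # one line
```

If both streams should end up in the same file, write the file redirect first and 2>&1 last.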

Be careful: redirecting with > to an existing file overwrites that file without warning.

## Overwrite the previous message
echo "Nothing in life is to be feared." > text1.txt
more text1.txt
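
If accidental overwriting worries you, the shell offers the noclobber option as a safety belt. This is a sketch with a hypothetical file keep.txt; the option only lasts for the current shell session, and appending with >> is not affected by it.

```shell
cd "$(mktemp -d)"
echo "first" > keep.txt

set -o noclobber              # refuse to overwrite existing files with >
echo "second" > keep.txt || echo "Refused to overwrite keep.txt"
protected=$(cat keep.txt)     # still the original content
echo "${protected}"

echo "third" >| keep.txt      # >| overrides noclobber on purpose
set +o noclobber              # switch the option off again
cat keep.txt
```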

You can print to different files and combine (concatenate) the files.

## Print another message to a different file
echo "It is only to be understood." > text2.txt
## Merge content of files
cat text1.txt text2.txt > text12.txt
cat text12.txt

Redirect Output (append)

There is an alternative to merging multiple output files. We can use the double greater than operator (>>) to append the output to an existing file.

## Add (>>) a third line to the combined file
echo "Marie Curie" >> text12.txt
cat text12.txt

Redirecting from a File

We can use the less than operator (<) to change input direction.

<command> file.txt   # do something with the file
<command> < file.txt # feed the file to the command

It looks similar, but there is a subtle difference.

## Count the number of lines in a text file,
## but use two different approaches: 
wc -l text12.txt   >  count.txt # Version #1
wc -l < text12.txt >> count.txt # Version #2

## What was different?
cat count.txt
# 3 text12.txt
# 3 <file name is missing>

## Alternative solution to count the number of lines (more later)
cat text12.txt | wc -l

When we redirect STDIN, we send the data "anonymously": the command does not know where the data comes from. This is a handy trick to avoid unwanted additional information, such as the file name. Here is an example of use:

## This is ugly
wc -l text12.txt > count.txt
echo "We have $(cat count.txt) lines."
## This is better
wc -l < text12.txt > count.txt
echo "We have $(cat count.txt) lines."

❖ Challenge #4: In the previous example, the file information was unwanted. Can you think of an example where it would be useful to have the filename with the line count?

Suggestion #4

Assume you have multiple files and you need to count the lines of each file.


  wc -l text1.txt text2.txt text12.txt
  # 1 text1.txt
  # 1 text2.txt
  # 3 text12.txt
  # 5 total

  # Alternative Solution (matches every file starting with text and ending with .txt, e.g. also text3.txt)
  wc -l text*.txt

Piping

Sending the output (STDOUT) of one command (programme) to the input (STDIN) of another one is called piping. We use the vertical bar | to feed (pipe) the output from one command to the next.

## Create a multi-line message 
echo -e "Think Like a Proton\nStay Positive" > proton.txt
# ➜ \n stands for newline and divides the string into two lines
cat proton.txt

## Show first / last line
cat proton.txt | head -n 1 # first line
cat proton.txt | tail -n 1 # last line

## Count the number of lines (alternative)
cat text12.txt | wc -l
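
Pipes can be chained, with each command doing one small job. A sketch with a hypothetical file insects.txt; sort and uniq are not introduced in this chapter (see man sort and man uniq).

```shell
cd "$(mktemp -d)"

## Hypothetical input: one insect per line, with repeats
printf "ant\nbee\nant\nfly\nbee\nant\n" > insects.txt

## How many different insects are there?
## sort groups identical lines, uniq removes the repeats, wc -l counts what is left
distinct=$(sort insects.txt | uniq | wc -l)
echo "Different insects: ${distinct}"

## Which insect is listed most often?
## uniq -c adds a count, sort -rn puts the highest count first
sort insects.txt | uniq -c | sort -rn | head -n 1
```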

Example: Log-Files

You can use the redirect option to create log files of, for example, your terminal sessions. Some applications have a verbose (-v) option which you should not ignore when testing. You can redirect the output to a file and check for errors or warnings.

## Create an empty file and fill it
rm -f LOG.txt; touch LOG.txt # -f: no error message if LOG.txt does not exist yet
echo "-------------------------------"  >> LOG.txt
echo "Test Log File"                    >> LOG.txt
date +"Date: %d/%m/%y"                  >> LOG.txt
echo "User: ${USER}"                    >> LOG.txt
echo "-------------------------------"  >> LOG.txt
env | grep "LOGNAME" -A 1               >> LOG.txt
env | grep "SHELL"                      >> LOG.txt
echo "-------------------------------"  >> LOG.txt
echo "My working directory: ${PWD}"     >> LOG.txt
echo -n "My grep version: "             >> LOG.txt
grep --version | head -n 1              >> LOG.txt
echo "-------------------------------"  >> LOG.txt
clear; cat LOG.txt
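
The same idea works for the output of a whole analysis step: send STDOUT and STDERR to a log file and search the log afterwards. A minimal sketch, assuming a hypothetical log file run.log and a deliberately missing folder to provoke an error message:

```shell
cd "$(mktemp -d)"

## Run a few commands and append everything (STDOUT and STDERR) to the log
{
  echo "Run started"
  ls missing_folder || echo "WARNING: ls failed"  # fails on purpose
  echo "Run finished"
} >> run.log 2>&1

cat run.log

## Check the log for problems (assumption: the error message starts with "ls:")
problems=$(grep -c "^ls:" run.log)
echo "Lines with ls errors: ${problems}"
```

The curly braces group the commands so that one redirect at the end covers all of them.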

❖ Challenge #5.1: Let us create a virtual dice and store the results of three throws in a text file. We need a way to generate random numbers. We could use the special built-in variable `${RANDOM}`, which returns a new random integer each time it is referenced.

## Default use
echo ${RANDOM}

This will print a random number between 0 and 32767 and we need to find a way to restrict the range to [1-6].

## Restrict the range
echo $(( RANDOM % 7)) # range -> [0-6]

## We need a range from [1-6]

## Range between 1-6
echo $((1 + RANDOM % 6))
# or
echo $((RANDOM % 6 + 1))

So far, so good. Now we need to roll the dice three times and save the results in a file.

Suggestion #5.1

3.1 Step-by-Step


   echo $((1 + RANDOM % 6)) > random_number_1.tmp
   echo $((1 + RANDOM % 6)) > random_number_2.tmp
   echo $((1 + RANDOM % 6)) > random_number_3.tmp
   cat random_number_[123].tmp > random_numbers_S1.txt
   rm -i *.tmp
   cat random_numbers_S1.txt

3.2 No temporary files


   echo $((1 + RANDOM % 6)) >  random_numbers_2.txt
   echo $((1 + RANDOM % 6)) >> random_numbers_2.txt
   echo $((1 + RANDOM % 6)) >> random_numbers_2.txt
   cat random_numbers_2.txt

3.3 Using a FOR loop (for more advanced users)


   for ((i=1; i<=3; i++)); do
     random_number=$((1 + RANDOM % 6))
     echo "Random number ${i}: ${random_number}"
   done > random_numbers.txt 
   cat random_numbers.txt

3.4 Another FOR loop solution


  for i in {1..3}; do
    echo $((RANDOM % 6 + 1))
  done > random_numbers.txt
  cat random_numbers.txt

❖ Challenge #5.2:

For debugging or any scenario where you need reproducible results, you need to ensure that your random number sequences are predictable. Do you have any idea how to do this?

Suggestion #5.2

Because `${RANDOM}` generates pseudo-random numbers, the sequence is deterministic based on the seed. This means that if you set the seed to the same value, you will get the same sequence of random numbers.


  RANDOM=123; echo $RANDOM
  echo $RANDOM
  RANDOM=123; echo $RANDOM

Seeding the Random Number Generator - In Bash, assigning a value to the `${RANDOM}` variable sets the seed for the pseudo-random number generator. This ensures that the sequence of random numbers generated by `${RANDOM}` will be predictable and reproducible from that seed value. Now we are going to apply this idea to the problem of dice:


  RANDOM=123
  for i in {1..5}; do
    echo $((RANDOM % 6 + 1))
  done > random_numbers_1.txt

If you change the seed value (i.e., the assignment to RANDOM), a different sequence of numbers will be produced.


  RANDOM=321
  for i in {1..5}; do
    echo $((RANDOM % 6 + 1))
  done > random_numbers_2.txt


  RANDOM=123
  for i in {1..5}; do
    echo $((RANDOM % 6 + 1))
  done > random_numbers_3.txt

Let us now compare the results:


  diff --brief random_numbers_1.txt random_numbers_2.txt
  diff --brief random_numbers_1.txt random_numbers_3.txt

You can make the code reproducible by seeding the random number generator at the beginning of the script. This way, the same sequence of random numbers will be generated every time you run the code.

Copy, Rename and Remove

## Copy a file - original is kept
cp text12.txt Marie_Curie.txt
ls -l
cat text12.txt Marie_Curie.txt
# Show differences between the two files:
diff text12.txt Marie_Curie.txt

## Rename (move) file - original is lost
touch ZERO.txt          # create an empty file to practise on
mv ZERO.txt logfile.txt
ls -l

## Remove file(s)
rm -i text12.txt Marie_Curie.txt
ls -l

❖ Challenge #6.1: What is the difference between the two commands?

cp file.txt newfile.txt
cat file.txt > newfile.txt

Suggestion #6.1

- The first line makes a copy of the file. Any file can be copied.


  cp file.pdf newfile.pdf # ✔︎

- The second command reads and writes the content of the text file into a new text file.


  cat file.pdf > newfile.pdf # !!!

Copying with `cat` and `>` also works for a PDF file, because `cat` simply reads the bytes of a file and `>` redirects this output to another file. However, this is not the recommended way to copy files: the new file is created with default permissions (an executable file, for example, loses its execute permission), and `cat` cannot copy directories. The `dd` command is another powerful utility for copying files, particularly useful when you need to specify block size and other parameters.


  dd if=file.pdf of=newfile.pdf

cp or cat

cp and dd are optimized for copying files and may be faster and more reliable than cat for this purpose. These commands are designed to handle various errors and edge cases that may occur during the file copying process.
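
A quick way to see one practical difference between cp and cat: both copy the content, but only cp carries the execute permission over. This is a sketch assuming a hypothetical script tool.sh; chmod and cmp (compare two files byte by byte) are not introduced in this chapter.

```shell
cd "$(mktemp -d)"

## Hypothetical executable script
printf '#!/bin/sh\necho "hello"\n' > tool.sh
chmod 755 tool.sh

cp  tool.sh   copy_cp.sh      # copy with cp
cat tool.sh > copy_cat.sh     # "copy" with cat and a redirect

## The content is identical in both cases (cmp prints nothing if the files match)
cmp tool.sh copy_cp.sh
cmp tool.sh copy_cat.sh

## The permissions are not
ls -l tool.sh copy_cp.sh copy_cat.sh
[ -x copy_cp.sh ]  && echo "copy_cp.sh is executable"
[ -x copy_cat.sh ] || echo "copy_cat.sh is not executable"
```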

❖ Challenge #6.2: When would you use mv instead of cp?

cp file.txt newfile.txt
mv file.txt newfile.txt

Suggestion #6.2

cp makes a copy of a file, while mv renames/moves a file (or folder). Copying a file is safer because you keep the original, but large files can take a long time to copy and use up disk space. Renaming/moving a file within the same file system is much faster and more efficient, because the data itself is not copied.

Wildcards

In the previous chapter you created various text files. You can list all files in your working directory or select only specific files. Wildcards can be very handy for this task.

## List all text files
ls *.txt          # list all files with ending .txt
ls text?.txt      # list all files starting with text, followed by one character, and ending with .txt
ls text[123].txt  # list all files starting with text, followed by 1,2 or 3, and ending with .txt
# ➜ * any number of characters (including none)
# ➜ ? exactly one character
# ➜ [123] a group - meaning 1, 2, or 3

## Remove multiple files
rm -i text1.txt text2.txt text3.txt
rm -i text[123].txt

❖ Challenge #7.1: Can you find a command line to delete the index (I1 or I2) samples but keep the forward (R1) and reverse (R2) reads?

Sample_GX0I1_R1.fq.gz
Sample_GX0I1_R2.fq.gz
Sample_GX0I1_I1.fq.gz
Sample_GX0I1_I2.fq.gz
Sample_GX0I2_R1.fq.gz
Sample_GX0I2_R2.fq.gz
Sample_GX0I2_I1.fq.gz
Sample_GX0I2_I2.fq.gz

Suggestion #7.1

There are usually more than just one possible solution. Some might be better (e.g. faster, more secure) than others but it is paramount you understand what you do.


  ## Suggestion 7.1a
  rm -i Sample_GX0I1_I1.fq.gz Sample_GX0I1_I2.fq.gz Sample_GX0I2_I1.fq.gz Sample_GX0I2_I2.fq.gz
  # ➜ Safe and it works but imagine you have a few hundred files.

  ## Suggestion 7.1b
  rm -i Sample_GX0I?_I?.fq.gz
  # ➜ Also safe and would work just fine as long as all samples follow the same name structure.

  ## Suggestion 7.1c
  rm -i *_I[12].*
  # ➜ Short and precise but can be dangerous. 

  ## Tip: Test your wildcards with ls first.
  ls -ah *_I[12].*
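
The tip can be tried safely on empty dummy files. The sketch below recreates the file names from the challenge in a throwaway directory, previews the match with ls, and only then deletes. rm is used without -i here only because the files are disposable; the loop variables sample and part are made up for this illustration.

```shell
cd "$(mktemp -d)"

## Create empty dummy files with the names from the challenge
for sample in GX0I1 GX0I2; do
  for part in R1 R2 I1 I2; do
    touch "Sample_${sample}_${part}.fq.gz"
  done
done

ls *_I[12].fq.gz              # preview: exactly the files that would be deleted
rm *_I[12].fq.gz              # delete the index files
ls                            # only the R1 and R2 files are left
remaining=$(ls | wc -l)
echo "Files left: ${remaining}"
```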

❖ Challenge #7.2: What is the problem with the following command? Can you correct it?

# cat sequence*.fa >> sequence_all.fa # ✖︎✖︎✖︎ Do not use!

Suggestion #7.2

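The wildcard sequence*.fa also matches the output file sequence_all.fa. If that file already exists, cat reads from the very file it is appending to. Depending on your version of cat, the command either stops with an error or keeps growing the file in a never-ending loop until your hard drive is full. Better/correct solutions would include: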


  cat sequence*.fa >> all_sequence.fa
  cat sequence*.fa >> different_path/sequence_all.fa

Terminal History

You may be familiar with the history of your Internet browser. The Terminal also has a history. This is great because with the command history we can not only search the past, but it also means that we do not have to retype previous commands. Use the up and down arrows to move through your history. You can also access it:

history

With no options (default), you will see a list of previous commands with line numbers. Lines prefixed with a ‘*’ have been modified. An argument of n lists only the last n lines.

-c clear history 
-d offset Delete the history entry at position offset.
-d start-end Delete the history entries between positions start and end
-a Append the new history lines to the history file.
-n Append the history lines not already read from the history file to the current history list.
-r Read the history file and append its contents to the history list.
-w Write out the current history list to the history file.

Syntax examples:

history 5                  # show last five entries
history | sed -n '5,10p'   # show lines 5 to 10 of the history list
history -d 2-3             # delete lines two and three
history -c                 # clear entire history
history [-anrw] [filename]

❖ Challenge #8.1: Why would you need a history of your commands?

Suggestion #8.1

There are many good reasons. Let me list a few that are obvious to me. I am happy to learn new ones if you would like to share.
- You might have noticed that you can navigate the history with the arrow keys. This saves a lot of time when typing the same or a similar command.
- You can also use the history for troubleshooting or to keep a log file of your session.

❖ Challenge #8.2: Preserve your history!

Create a text file with a title and your username.
Add the last 20 command lines you used to the file.
Add a date to the bottom of the file.

Suggestion #8.2


  echo "=== Save My History ===" >  MyHistory.txt
  echo "${USER}"                 >> MyHistory.txt
  echo "-----------------------" >> MyHistory.txt
  history | tail -n 20           >> MyHistory.txt
  echo "-----------------------" >> MyHistory.txt
  date "+%A, %d.%B %Y"           >> MyHistory.txt
  clear; cat MyHistory.txt

Math Arithmetic

I am not saying it is perfect, but it is possible. Here are some basic mathematical operations.

The legacy way to do math calculations with integers is using expr.

expr 7 - 2
expr 7 + 4
expr 7 \* 3
expr 9 \/ 3 # floating-point arithmetic does not work

For floating-point arithmetic you can use bc

echo "7-2" | bc
echo "7-4" | bc
echo "7*2" | bc
echo "7/2" | bc          # 3 (no decimals by default)
echo "scale=2; 7/2" | bc # 3.50 (scale sets the number of decimals)
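
Besides expr and bc, Bash has built-in integer arithmetic, $(( )), which was already used for the dice above. A short sketch; the variable names are chosen for this illustration only.

```shell
sum=$((7 + 4))
product=$((7 * 3))            # no backslash needed inside $(( ))
quotient=$((7 / 2))           # integer division: the decimals are cut off
remainder=$((7 % 2))          # modulo: what is left after the division
echo "${sum} ${product} ${quotient} ${remainder}"
# 11 21 3 1
```

Like expr, $(( )) only handles integers; for decimals you still need bc.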

Working With Sequence Files

It is important to play around with a simple example to understand the idea behind a terminal command. However, once we have a solid foundation, we should move on to more practical problems and examples.

# Download a sequence fasta file
pwd # make sure this is the right place for the download
curl -O https://www.gdc-docs.ethz.ch/GeneticDiversityAnalysis/GDA/data/RDP_16S_Archaea_Subset.fasta

# A closer look at the file
ls -lh RDP_16S_Archaea_Subset.fasta

# Count the number of lines in the file 
wc -l RDP_16S_Archaea_Subset.fasta

# Have a look at the first 15 lines
head -n 15 RDP_16S_Archaea_Subset.fasta

# Have a look at the last 15 lines
tail -n 15 RDP_16S_Archaea_Subset.fasta

# Scroll through the fasta file
# Be careful with bigger files.
less RDP_16S_Archaea_Subset.fasta # remember to exit with [q]

# Count the number of sequences
grep ">" RDP_16S_Archaea_Subset.fasta | wc -l
grep ">" -c RDP_16S_Archaea_Subset.fasta

# Find a specific sequence motif and highlight it
grep "cggattagatacccg" --color RDP_16S_Archaea_Subset.fasta

# How often does a sequence motif occur
grep "cgggaggc" -c RDP_16S_Archaea_Subset.fasta

# Find similar sequence motifs (step-by-step)
grep "cgggaggc" -c RDP_16S_Archaea_Subset.fasta
grep "cgggtggc" -c RDP_16S_Archaea_Subset.fasta
grep "cgggcggc" -c RDP_16S_Archaea_Subset.fasta
grep "cggggggc" -c RDP_16S_Archaea_Subset.fasta

# A faster alternative 
grep "cggg[atcg]ggc" -c RDP_16S_Archaea_Subset.fasta

# Find a long series of e.g. Gs
grep "g" -c RDP_16S_Archaea_Subset.fasta
grep "gg" -c RDP_16S_Archaea_Subset.fasta
grep "gggg" -c RDP_16S_Archaea_Subset.fasta
grep -E -c "g{6}" RDP_16S_Archaea_Subset.fasta

❖ Challenge #9: The search for sequence motifs is very limited. Can you see why this is? How could you overcome the limitations?

Suggestion #9

There are at least two things to keep in mind that can have a negative impact on the search.
1 - The search is case sensitive. So a search must cover both upper and lower case (e.g. with the grep option -i), or the sequences must first be converted to the same case:

tr '[:upper:]' '[:lower:]' < RDP_16S_Archaea_Subset.fasta > RDP_16S_Archaea_Subset_AllLow.fa

2 - The sequence in a fasta file may extend over several lines. Therefore, wrapped sequence motifs will not be recognised. You would have to reformat the fasta file to make the sequence line up.

# Wrapped Sequences
>seq_1
tgctgcaccc
cccgcactgc
>seq_2
tgctgcaacc
cccgcattgc

# Lined Up Sequence
>seq_1
tgctgcaccccccgcactgc
>seq_2
tgctgcaacccccgcattgc

We will see how to solve this when we get to the Remote Terminal chapter, which deals with manipulating FASTA files in more detail.
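
Until then, here is one possible sketch for both limitations; it is not necessarily the approach taken in the Remote Terminal chapter. It assumes the two wrapped example sequences from above are stored in a hypothetical file wrapped.fa, and it uses awk, which is not introduced in this chapter, to join the sequence lines of each record.

```shell
cd "$(mktemp -d)"

## The wrapped example from above
printf ">seq_1\ntgctgcaccc\ncccgcactgc\n>seq_2\ntgctgcaacc\ncccgcattgc\n" > wrapped.fa

## Header lines (>) are printed as they are; all other lines are joined
awk '/^>/ { if (seq != "") print seq; seq = ""; print; next }
     { seq = seq $0 }
     END { if (seq != "") print seq }' wrapped.fa > lined_up.fa
cat lined_up.fa

## The motif spans a line break in seq_1 and is only found after joining
## (|| true: grep reports "no match" as a failure, which is expected here)
wrapped_hits=$(grep -c "cccccc" wrapped.fa || true)
joined_hits=$(grep -c "cccccc" lined_up.fa || true)

## Option -i makes the search case insensitive
upper_hits=$(grep -i -c "CCCCCC" lined_up.fa || true)
echo "${wrapped_hits} ${joined_hits} ${upper_hits}"
# 0 1 1
```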

Helpful

⬇︎ Linux Cheatsheet

⬇︎ Linux Pocket Guide

If you are new to the command line you might find these links useful:

In case you prefer a movie ...

This is only a small and limited selection. There is more, much more.