Evolutionary Genetics - Bioinformatics

RegEx - Nik Tesla


The Problem

I would like to create a song list based on the mp3 file names.

│
└── Tinariwen                      (Artist)
    └── plus10                     (Album)
        ├── 01 Toumast Tincha.mp3  (Song)
        ├── 02 Chaghaybou.mp3      (Song)
        └─ ...

I can list the folder and files using the command ls and redirect the output into a text file.

ls -1 Tinariwen/plus10/*.mp3 > SongList.txt
cat SongList.txt
wc -l SongList.txt # How many songs are there?
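
The same listing can also be produced from a script. The following is a minimal Python sketch, not part of the original protocol; the function name list_songs is hypothetical, and it assumes the songs sit directly inside the album folder, as in the tree above.

```python
from pathlib import Path


def list_songs(folder):
    """Return the .mp3 paths in `folder` as sorted strings.

    Roughly equivalent to `ls -1 folder/*.mp3` (hypothetical helper).
    """
    # glob("*.mp3") only looks at files directly inside `folder`.
    # as_posix() keeps forward slashes on every platform.
    return sorted(p.as_posix() for p in Path(folder).glob("*.mp3"))
```

Calling list_songs("Tinariwen/plus10") returns the list of paths, and len() of that list plays the role of wc -l.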

Output

I would like to have the artist’s name first, followed by the album’s name and the song:

Artist [Album] Song: Tinariwen [plus10] Toumast Tincha

Step-by-Step Protocol

I would like to simplify the problem by dividing the task into smaller subtasks.

Remove suffix

This is an easy first step.

Tinariwen/plus10/01 Toumast Tincha.mp3
Tinariwen/plus10/02 Chaghaybou.mp3
Tinariwen/plus10/03 Arhegh Danagh.mp3
Tinariwen/plus10/04 Timadrit In Sahara.mp3
Tinariwen/plus10/05 Imidiwan Ahi Sigdim.mp3
Tinariwen/plus10/06 Tahalomot.mp3
Tinariwen/plus10/07 Sendad Eghlalan.mp3
Tinariwen/plus10/08 Imdiwan Ahi Tifhamam.mp3
Tinariwen/plus10/09 Koud Edhaz Emin.mp3
Tinariwen/plus10/10 Emajer.mp3
Tinariwen/plus10/11 Aghregh Medin.mp3
find [\.mp3]
replace []
Tinariwen/plus10/01 Toumast Tincha
Tinariwen/plus10/02 Chaghaybou
Tinariwen/plus10/03 Arhegh Danagh
Tinariwen/plus10/04 Timadrit In Sahara
Tinariwen/plus10/05 Imidiwan Ahi Sigdim
Tinariwen/plus10/06 Tahalomot
Tinariwen/plus10/07 Sendad Eghlalan
Tinariwen/plus10/08 Imdiwan Ahi Tifhamam
Tinariwen/plus10/09 Koud Edhaz Emin
Tinariwen/plus10/10 Emajer
Tinariwen/plus10/11 Aghregh Medin
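
For readers who prefer a script over an editor's find-and-replace, the suffix removal might look like this in Python. This is only a sketch on two of the lines above. The escaped dot and the end anchor $ are my additions: an unescaped dot matches any character, so escaping it restricts the match to a literal ".mp3" at the end of the line.

```python
import re

paths = [
    "Tinariwen/plus10/01 Toumast Tincha.mp3",
    "Tinariwen/plus10/02 Chaghaybou.mp3",
]

# \. matches a literal dot; $ anchors the match at the end of the string.
without_suffix = [re.sub(r"\.mp3$", "", p) for p in paths]

for line in without_suffix:
    print(line)
```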

Replace slashes

The slashes might cause problems with the regex I am using later.

find [/]
replace [ ]
Tinariwen plus10 01 Toumast Tincha
Tinariwen plus10 02 Chaghaybou
Tinariwen plus10 03 Arhegh Danagh
Tinariwen plus10 04 Timadrit In Sahara
Tinariwen plus10 05 Imidiwan Ahi Sigdim
Tinariwen plus10 06 Tahalomot
Tinariwen plus10 07 Sendad Eghlalan
Tinariwen plus10 08 Imdiwan Ahi Tifhamam
Tinariwen plus10 09 Koud Edhaz Emin
Tinariwen plus10 10 Emajer
Tinariwen plus10 11 Aghregh Medin
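
The slash replacement above can be sketched in Python as well. A plain string replace would do the same job; re.sub is used here only to stay close to the find/replace notation. The variable names are illustrative.

```python
import re

lines = [
    "Tinariwen/plus10/01 Toumast Tincha",
    "Tinariwen/plus10/02 Chaghaybou",
]

# Replace every slash with a single space.
spaced = [re.sub(r"/", " ", line) for line in lines]

for line in spaced:
    print(line)
```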
find [(\w+) (\w+) \d+ (\w+.+)]
replace [$1 [$2] $3]
Tinariwen [plus10] Toumast Tincha
Tinariwen [plus10] Chaghaybou
Tinariwen [plus10] Arhegh Danagh
Tinariwen [plus10] Timadrit In Sahara
Tinariwen [plus10] Imidiwan Ahi Sigdim
Tinariwen [plus10] Tahalomot
Tinariwen [plus10] Sendad Eghlalan
Tinariwen [plus10] Imdiwan Ahi Tifhamam
Tinariwen [plus10] Koud Edhaz Emin
Tinariwen [plus10] Emajer
Tinariwen [plus10] Aghregh Medin
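
The rearrangement with capture groups might be written in Python as follows. This is a sketch under the assumption that artist and album are each a single word, since \w+ does not match spaces. Python writes back-references as \1, \2, \3 instead of $1, $2, $3.

```python
import re

lines = [
    "Tinariwen plus10 01 Toumast Tincha",
    "Tinariwen plus10 04 Timadrit In Sahara",
]

# Group 1: artist, group 2: album, \d+ skips the track number,
# group 3: the song title (everything after the track number).
pattern = re.compile(r"(\w+) (\w+) \d+ (\w+.+)")

formatted = [pattern.sub(r"\1 [\2] \3", line) for line in lines]

for line in formatted:
    print(line)
```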

One-Step Solution

Let's see if I can combine all the steps.

find [(\w+)/(\w+)/\d+ (\w+.+)\.mp3]
replace [$1 [$2] $3]
Tinariwen [plus10] Toumast Tincha
Tinariwen [plus10] Chaghaybou
Tinariwen [plus10] Arhegh Danagh
Tinariwen [plus10] Timadrit In Sahara
Tinariwen [plus10] Imidiwan Ahi Sigdim
Tinariwen [plus10] Tahalomot
Tinariwen [plus10] Sendad Eghlalan
Tinariwen [plus10] Imdiwan Ahi Tifhamam
Tinariwen [plus10] Koud Edhaz Emin
Tinariwen [plus10] Emajer
Tinariwen [plus10] Aghregh Medin
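
As a rough Python counterpart to the one-step solution, the combined pattern can be wrapped in a small function. The name format_song is hypothetical, the dot before mp3 is escaped, and the $ anchor is my addition. As in the step-by-step version, the sketch assumes single-word artist and album folder names.

```python
import re

# artist / album / track number, then the title, then a literal ".mp3" at the end.
ONE_STEP = re.compile(r"(\w+)/(\w+)/\d+ (\w+.+)\.mp3$")


def format_song(path):
    """Turn 'Artist/Album/NN Title.mp3' into 'Artist [Album] Title' (hypothetical helper)."""
    return ONE_STEP.sub(r"\1 [\2] \3", path)


for path in [
    "Tinariwen/plus10/01 Toumast Tincha.mp3",
    "Tinariwen/plus10/10 Emajer.mp3",
]:
    print(format_song(path))
```

A path that does not fit the pattern is returned unchanged, which makes unexpected file names easy to spot in the output.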