Batch processing of files

Using the Python standard library (specifically, the glob and os modules), we can also quickly code up batch operations, e.g., over all files with a certain extension in a directory. For example, we can make a list of all .wav files in the audio directory, use Praat to pre-emphasize these Sound objects, and then write each pre-emphasized sound to both a WAV and an AIFF file.

[1]:

# Find all .wav files in a directory, pre-emphasize them, and save each as a new .wav and .aiff file
import parselmouth

import glob
import os.path

for wave_file in glob.glob("audio/*.wav"):
    print("Processing {}...".format(wave_file))
    s = parselmouth.Sound(wave_file)
    s.pre_emphasize()
    s.save(os.path.splitext(wave_file)[0] + "_pre.wav", 'WAV') # or parselmouth.SoundFileFormat.WAV instead of 'WAV'
    s.save(os.path.splitext(wave_file)[0] + "_pre.aiff", 'AIFF')

Processing audio/5_y.wav...
Processing audio/the_north_wind_and_the_sun.wav...
Processing audio/5_b.wav...
Processing audio/bet.wav...
Processing audio/2_b.wav...
Processing audio/1_b.wav...
Processing audio/3_y.wav...
Processing audio/1_y.wav...
Processing audio/bat.wav...
Processing audio/4_b.wav...
Processing audio/4_y.wav...
Processing audio/3_b.wav...
Processing audio/2_y.wav...

After running this, the audio directory contains, next to the original .wav files, a pre-emphasized copy of each one written as a .wav and as an .aiff file. The reading, pre-emphasis, and writing are all done by Praat, while looping over all .wav files is done by standard Python code.

[2]:

# List the current contents of the audio/ folder
!ls audio/

1_b.wav       2_y_pre.aiff  4_b_pre.wav   bat.wav
1_b_pre.aiff  2_y_pre.wav   4_y.wav       bat_pre.aiff
1_b_pre.wav   3_b.wav       4_y_pre.aiff  bat_pre.wav
1_y.wav       3_b_pre.aiff  4_y_pre.wav   bet.wav
1_y_pre.aiff  3_b_pre.wav   5_b.wav       bet_pre.aiff
1_y_pre.wav   3_y.wav       5_b_pre.aiff  bet_pre.wav
2_b.wav       3_y_pre.aiff  5_b_pre.wav   the_north_wind_and_the_sun.wav
2_b_pre.aiff  3_y_pre.wav   5_y.wav       the_north_wind_and_the_sun_pre.aiff
2_b_pre.wav   4_b.wav       5_y_pre.aiff  the_north_wind_and_the_sun_pre.wav
2_y.wav       4_b_pre.aiff  5_y_pre.wav

[3]:

# Remove the generated audio files again, to clean up the output from this example
!rm audio/*_pre.wav
!rm audio/*_pre.aiff
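
The file-name handling in the batch loop above can be tried out without any audio data. The following is a minimal sketch using only the standard library: the helper name `derived_path` is hypothetical (it is not part of Parselmouth), and the files are empty placeholders created in a temporary directory purely for illustration.

```python
import glob
import os
import tempfile

def derived_path(path, suffix, extension):
    """Return `path` with `suffix` appended to its stem and a new extension."""
    stem, _ = os.path.splitext(path)
    return stem + suffix + extension

# Create empty placeholder files in a temporary directory (no real audio involved)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("bet.wav", "bat.wav", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()

    # glob.glob returns matches in arbitrary order; sorted() makes the order reproducible
    wave_files = sorted(glob.glob(os.path.join(tmp, "*.wav")))
    names = [os.path.basename(p) for p in wave_files]
    outputs = [os.path.basename(derived_path(p, "_pre", ".aiff")) for p in wave_files]

print(names)    # ['bat.wav', 'bet.wav']
print(outputs)  # ['bat_pre.aiff', 'bet_pre.aiff']
```

Note that the pattern `*.wav` also matches generated files such as `bat_pre.wav`, so running the batch loop a second time without cleaning up first would process those files as well.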

Similarly, we can use the pandas library to read a CSV file with data collected in an experiment, and loop over that data to, for example, extract the mean harmonics-to-noise ratio of each recording. The results CSV has the following structure:

condition	…	pp_id
0	…	1877
1	…	801
1	…	2456
0	…	3126

The following code reads such a table, loops over it, uses Praat through Parselmouth to run the analysis for each row, and then writes an augmented CSV file to disk. To illustrate, we use an example set of sound fragments: results.csv, 1_b.wav, 2_b.wav, 3_b.wav, 4_b.wav, 5_b.wav, 1_y.wav, 2_y.wav, 3_y.wav, 4_y.wav, 5_y.wav.

In our example, the original CSV file, results.csv, contains the following table:

[4]:

import pandas as pd

print(pd.read_csv("other/results.csv"))

   condition pp_id
0          3     y
1          5     y
2          4     b
3          2     y
4          5     b
5          2     b
6          3     b
7          1     y
8          1     b
9          4     y

[5]:

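def analyse_sound(row):
    """Return the mean harmonics-to-noise ratio of the sound file belonging to one row."""
    condition, pp_id = row['condition'], row['pp_id']
    # Sound files are named <condition>_<pp_id>.wav, e.g. audio/3_y.wav
    filepath = "audio/{}_{}.wav".format(condition, pp_id)
    sound = parselmouth.Sound(filepath)
    harmonicity = sound.to_harmonicity()
    # Exclude frames equal to -200 (Praat's placeholder for unvoiced or silent frames) before averaging
    return harmonicity.values[harmonicity.values != -200].mean()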

# Read in the experimental results file
dataframe = pd.read_csv("other/results.csv")

# Apply parselmouth wrapper function row-wise
dataframe['harmonics_to_noise'] = dataframe.apply(analyse_sound, axis='columns')

# Write out the updated dataframe
dataframe.to_csv("processed_results.csv", index=False)

We can now have a look at the results by reading in the processed_results.csv file again:

[6]:

print(pd.read_csv("processed_results.csv"))

   condition pp_id  harmonics_to_noise
0          3     y           22.615414
1          5     y           16.403205
2          4     b           17.839167
3          2     y           21.054674
4          5     b           16.092489
5          2     b           12.378289
6          3     b           15.718858
7          1     y           16.704779
8          1     b           12.874451
9          4     y           18.431586
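
The means in the table above are computed only over frames that differ from -200, the value Praat stores for frames it treats as unvoiced or silent. To show why that filter matters, here is a minimal sketch of the same idea in plain Python. The function name `mean_hnr` and the frame values are made up for illustration; the real code operates on the NumPy array `harmonicity.values` instead of a list.

```python
from statistics import fmean

# Placeholder value Praat stores for frames treated as unvoiced or silent
UNVOICED_DB = -200.0

def mean_hnr(frame_values, sentinel=UNVOICED_DB):
    """Mean of all frames that differ from the sentinel; NaN if none remain."""
    voiced = [value for value in frame_values if value != sentinel]
    return fmean(voiced) if voiced else float("nan")

# Made-up frame values, for illustration only
frames = [-200.0, 12.5, 15.5, -200.0, 17.0]

print(mean_hnr(frames))  # 15.0
print(fmean(frames))     # -71.0, dominated by the placeholder frames
```

Without the filter, a few placeholder frames are enough to pull the average far below any plausible harmonics-to-noise ratio.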

[7]:

# Clean up, remove the CSV file generated by this example
!rm processed_results.csv