Title: | Retime and Analyse Speech Signals |
---|---|
Description: | Retime speech signals with a native Waveform Similarity Overlap-Add (WSOLA) implementation translated from the 'TSM toolbox' by Driedger & Müller (2014) <https://www.audiolabs-erlangen.de/content/resources/MIR/TSMtoolbox/2014_DriedgerMueller_TSM-Toolbox_DAFX.pdf>. Design retimings and pitch (f0) transformations with tidy data and apply them via 'Praat' interface. Produce spectrograms, spectra, and amplitude envelopes. Includes implementation of vocalic speech envelope analysis (fft_spectrum) technique and example data (mm1) from Tilsen, S., & Johnson, K. (2008) <doi:10.1121/1.2947626>. |
Authors: | Alistair Beith [aut, cre, cph] |
Maintainer: | Alistair Beith <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.3 |
Built: | 2025-01-23 05:19:17 UTC |
Source: | https://github.com/abeith/retimer |
Extract the amplitude envelope of a filtered speech signal. Adapted from Tilsen & Johnson (2008).
extract_env(x, fs, low_pass = 80, fs_out = 80, win = c(700, 1300), mean_centre = FALSE, replace_init = FALSE)
x | a speech signal |
fs | sampling frequency of signal |
low_pass | frequency of lowpass filter used for smoothing |
fs_out | output sampling frequency |
win | lower and upper frequencies for the initial bandpass filter. Default is 700Hz-1300Hz as in Tilsen & Johnson (2008) |
mean_centre | if TRUE signal will be scaled between 0 and 1 and then mean centred. Default is FALSE |
replace_init | if TRUE (default is FALSE) first sample of result will be replaced with second sample to deal with initialisation issue in resampling |
Procedure:
1. Signal is bandpass filtered to extract the desired frequency range
2. Absolute signal is then lowpass filtered
3. Signal is downsampled and mean centred if desired
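The three steps above can be sketched in base R. This is an illustration only, not the code used by extract_env: fft_filter() and envelope_sketch() are hypothetical helpers, the brick-wall FFT filter is an assumed stand-in for whatever filters the package applies, and downsampling is done by naive sample picking rather than proper resampling.

```r
# Hypothetical helper: brick-wall filter that zeroes FFT bins outside [lo, hi] Hz.
# Assumed stand-in for the package's filters, which are not described here.
fft_filter <- function(x, fs, lo, hi) {
  n <- length(x)
  k <- 0:(n - 1)
  freqs <- pmin(k, n - k) * fs / n  # absolute frequency of each FFT bin
  X <- fft(x)
  X[freqs < lo | freqs > hi] <- 0
  Re(fft(X, inverse = TRUE)) / n
}

# Hypothetical helper following the three documented steps.
envelope_sketch <- function(x, fs, win = c(700, 1300), low_pass = 80, fs_out = 80) {
  band <- fft_filter(x, fs, win[1], win[2])           # 1. bandpass filter
  smooth <- fft_filter(abs(band), fs, 0, low_pass)    # 2. rectify, then lowpass
  idx <- round(seq(1, length(x), by = fs / fs_out))   # 3. naive downsampling
  cbind(time = (idx - 1) / fs, amplitude = smooth[idx])
}

# Synthetic signal: 1 kHz carrier with a 4 Hz amplitude modulation
fs <- 8000
t <- (0:(fs - 1)) / fs
x <- (1 + 0.5 * sin(2 * pi * 4 * t)) * sin(2 * pi * 1000 * t)
env <- envelope_sketch(x, fs)
```

On this synthetic signal the recovered envelope tracks the 4 Hz modulator, which is the kind of slow rhythm component the envelope is meant to expose.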
A matrix with time and amplitude
Tilsen, S., & Johnson, K. (2008). Low-frequency Fourier analysis of speech rhythm. The Journal of the Acoustical Society of America, 124(2), EL34–EL39. doi:10.1121/1.2947626
fft_spectro
Extracts 'Praat' PitchTier from wav object.
extractPitchTier(wav, res = 0.1, fmin = 50, fmax = 250, output = "PitchTier")
wav | path to a wav file or a tuneR WAVE object |
res | resolution of PitchTier |
fmin | minimum frequency of PitchTier |
fmax | maximum frequency of PitchTier |
output | can be "PitchTier" or "file" |
Returns a PitchTier object or the temporary path to the generated PitchTier file
Extract a word from a wav file with reference to a TextGrid.
extractWord(x, word, tier = "Word", ignore_case = TRUE, instance = "random", wd = getwd())
x | path to a TextGrid |
word | word to search for |
tier | name of word tier in TextGrid |
ignore_case | default is 'TRUE' |
instance | instance of word in TextGrid to extract. Default extracts a random instance. Can also be numeric (row number) |
wd | working directory for Praat to use. Accepts relative paths. |
Extracts the section of the wav file corresponding to the word and saves it in the format name_wordi.wav, where name is the original name, word is the word and i is the numeric instance.
density
Calculates the low-frequency power spectrogram of the vocalic interval of a speech signal, following the method of Tilsen & Johnson (2008).
fft_spectro(x, f_out = 80, window_size = 256, padding = 2048, plot = TRUE)
x | a 'tuneR' "Wave" object or the path to a .wav file. |
f_out | the sample frequency for the output |
window_size | number of samples to calculate each spectrum over |
padding | length to zero pad signal to. If signal is longer than padding, this will be increased. |
plot | if TRUE a spectrogram will be plotted |
Returns a tibble with frequency (Hz), time (s) and power
Tilsen, S., & Johnson, K. (2008). Low-frequency Fourier analysis of speech rhythm. The Journal of the Acoustical Society of America, 124(2), EL34–EL39. doi:10.1121/1.2947626
fft_spectrum
Calculates the low-frequency power spectrum of the vocalic interval of a speech signal, following the method of Tilsen & Johnson (2008).
fft_spectrum(signal, f, f_out = 80, padding = 512)
signal | a speech signal |
f | sampling frequency |
f_out | output sampling frequency. Signal will be lowpass filtered at f_out/2 |
padding | length to zero pad signal to. If signal is longer than padding, this will be increased. |
Returns a matrix with columns 'freq' (frequency in Hz) and 'pwr' (spectral power).
Tilsen, S., & Johnson, K. (2008). Low-frequency Fourier analysis of speech rhythm. The Journal of the Acoustical Society of America, 124(2), EL34–EL39. doi:10.1121/1.2947626
fft_spectro
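The role of the padding argument can be illustrated with a generic zero-padded power spectrum in base R. This is a sketch under assumptions, not the package's fft_spectrum: power_spectrum_sketch() is a hypothetical helper, it skips the lowpass filtering and resampling to f_out, and the mean removal is an assumed preprocessing step.

```r
# Hypothetical helper, not the package's fft_spectrum().
power_spectrum_sketch <- function(signal, f, padding = 512) {
  n <- max(padding, length(signal))   # grow the padding if the signal is longer
  # Assumption: remove the mean so the 0 Hz bin does not dominate
  padded <- c(signal - mean(signal), rep(0, n - length(signal)))
  half <- 0:(n %/% 2)                 # non-negative frequency bins
  pwr <- Mod(fft(padded))[half + 1]^2
  cbind(freq = half * f / n, pwr = pwr)
}

# Two seconds of a 5 Hz sinusoid sampled at 80 Hz
f <- 80
t <- (0:(2 * f - 1)) / f
spec <- power_spectrum_sketch(sin(2 * pi * 5 * t), f)
peak <- spec[which.max(spec[, "pwr"]), "freq"]
```

Zero padding does not add information, but it evaluates the spectrum on a finer frequency grid (f / padding Hz per bin), which makes low-frequency peaks easier to locate.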
Find the mode of numeric vector using the peak of its density distribution.
findPeak(x, ...)
x | a numeric vector |
... | further arguments to be passed to 'density' |
Returns the value of 'x' that corresponds to the peak of the density curve.
density
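The idea of taking the mode from the peak of a density estimate can be sketched with stats::density(). density_mode() is a hypothetical helper written for illustration; it returns the grid location of the density peak, and findPeak itself may differ in detail.

```r
# Hypothetical helper illustrating the density-peak idea.
density_mode <- function(x, ...) {
  d <- density(x, ...)    # kernel density estimate; ... is forwarded to density()
  d$x[which.max(d$y)]     # grid location where the estimated density is highest
}

# Simulated f0 values (Hz): mostly around 120, with some high outliers
set.seed(1)
f0 <- c(rnorm(500, mean = 120, sd = 5), rnorm(50, mean = 200, sd = 20))
m <- density_mode(f0)
```

Unlike the mean, which the outliers pull upwards, the density peak stays close to the most common value, which is why it is a reasonable default for picking a single pitch from a contour.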
Find a word in a TextGrid
findWord(x, word = "speech", tier = "Word", ignore_case = TRUE)
x | path to a TextGrid |
word | word to search for |
tier | name of word tier in TextGrid |
ignore_case | default is 'TRUE' |
Returns a tibble with onset (t1) and offset (t2) of each occurrence of the word in the TextGrid
extractWord
Flatten fundamental frequency contour using 'Praat'
flatF0(wav, .f = findPeak, ...)
wav | path to a wav file or a tuneR WAVE object |
.f | function to use to determine pitch. Default is findPeak which finds the mode of the existing pitch contour. |
... | Additional arguments passed to extractPitchTier |
Returns a tuneR WAVE object of the input with a flat F0 contour
extractPitchTier
Convert a set of point anchors to a set of anchors that prevent overlaps while fixing the retiming factor within words.
get_serial_anchors(anc_in, anc_out, w_onsets, w_offsets, fs = NULL, retime_f = NULL, dry_run = FALSE, smudge = 0)
anc_in | a vector of time points in the input signal |
anc_out | a vector of the times anc_in should be mapped to in the output signal |
w_onsets | a vector of time points for the onsets of words. Should be same length as anc_in. |
w_offsets | a vector of time points for the offsets of words. Should be same length as anc_in. |
fs | Sample rate of signal. If provided, returned anchor points will be expressed in samples. If NULL result will be expressed in seconds. |
retime_f | The desired factor that words should be sped by. If NULL the minimum change in rate that will prevent overlaps will be calculated. |
dry_run | If TRUE function will exit early with the minimum factor that will prevent overlaps. |
smudge | If > 0 this applies a crude adjustment to the calculated anchors to ensure monotonicity. Not necessary unless w_onsets are same as previous w_offsets. |
A list that can be used to perform retiming with the wsola function of this package.
wsola
S4 generic for length.
## S4 method for signature 'Wave'
length(x)
x | a 'tuneR' WAVE object |
The length of the left channel of the WAVE object
length
Example speech from Tilsen & Johnson (2008)
mm1
A tuneR "Wave" object:
Tilsen, S., & Johnson, K. (2008). Low-frequency Fourier analysis of speech rhythm. The Journal of the Acoustical Society of America, 124(2), EL34–EL39. doi:10.1121/1.2947626
praatRetime
praatRetime(wav, tg)
wav | path to a wav file or a tuneR WAVE object |
tg | a 'Praat' TextGrid object with 2 tiers: First tier should be intervals in the input audio file and second tier should be the same intervals with the desired onsets (t1) and offsets (t2). |
A wav file with the timing of the second tier of the TextGrid will be saved to the outfile location.
read_tg() for reading an existing TextGrid and write_tg() for saving a tibble as a TextGrid.
set.seed(42)
data(mm1)
dur <- length(mm1)/mm1@samp.rate
x <- runif(10)
t2_out <- dur*cumsum(x)/sum(x)
t1_out <- c(0, t2_out[-length(t2_out)])
t2_in <- dur*seq_len(10)/10
t1_in <- c(0, t2_in[-length(t2_in)])
tg <- dplyr::tibble(
  name = rep(c("old", "new"), each = 10),
  type = "interval",
  t1 = c(t1_in, t1_out),
  t2 = c(t2_in, t2_out),
  label = rep(letters[1:10], times = 2)
) |>
  tidyr::nest(data = c(t1, t2, label))
if (Sys.which("praat") != "") {
  wav_retimed <- praatRetime(mm1, tg)
} else {
  message("Skipping example because Praat is not installed.")
}
Executes a Praat script using the R system function.
praatScript(args, script = "reTimeWin.praat", wd = getwd(), praat = NULL)
args | arguments to pass to Praat script ("--run" not required) |
script | name of script if using a script from this package, or path to script for other scripts |
wd | working directory for Praat to use |
praat | path to Praat. If NULL will search for Praat in C:/Program Files (for Windows) or attempt to use "praat" for Unix based systems. |
Runs script in Praat and prints stdout to console.
Call 'Praat' via system2()
praatSys(args = "--version", praat = NULL, ...)
args | arguments to pass to 'Praat' |
praat | path to 'Praat'. If NULL will search for 'Praat' in C:/Program Files (for Windows) or attempt to use "praat" for Unix based systems. |
... | arguments to pass to the internal get_praat_path() function. This can be used to change the folder searched for 'Praat' on Windows (default is appDir = "C:/Program Files") |
Prints stdout to console
system2
Reads a 'Praat' TextGrid as a nested tibble
read_tg(file, encoding = "auto")
file | path to TextGrid file |
encoding | Passed to rPraat::tg.read: 'auto' (default) will detect encoding, or can be set to 'UTF-8' (rPraat default) |
Returns a nested tibble with 'name', 'type' and 'data'. 'data' has the variables 't1', 't2' and 'label'
Universal spectrogram function.
spectrogram(x, fs = NULL, method = NULL, output = "tibble", wintime = 25, steptime = 10)
x | a signal, 'tuneR' WAVE object, or the path to a .wav or .mp3 file. |
fs | sample rate if supplying the signal as a vector |
method | spectrogram implementation to use. Available options are 'phonTools', 'tuneR', 'gsignal', and 'seewave'. Default is to select the first of these methods that is available. |
output | format of output |
wintime | length of analysis window in ms |
steptime | interval between steps in ms |
Returns a spectrogram in the desired format
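To show how wintime and steptime (both in ms) translate into frames, here is a minimal short-time Fourier transform in base R. It is a sketch under assumptions: stft_sketch() is a hypothetical helper, the Hann window is an assumed choice, and none of the listed backends is claimed to work exactly this way.

```r
# Hypothetical helper: basic STFT power spectrogram in base R.
stft_sketch <- function(x, fs, wintime = 25, steptime = 10) {
  win_n <- round(wintime / 1000 * fs)    # window length in samples
  step_n <- round(steptime / 1000 * fs)  # hop between frames in samples
  starts <- seq(1, length(x) - win_n + 1, by = step_n)
  # Assumed window choice: symmetric Hann window
  hann <- 0.5 - 0.5 * cos(2 * pi * (0:(win_n - 1)) / (win_n - 1))
  power <- sapply(starts, function(s) {
    seg <- x[s:(s + win_n - 1)] * hann
    Mod(fft(seg))[1:(win_n %/% 2 + 1)]^2  # keep non-negative frequencies
  })
  list(
    freq = (0:(win_n %/% 2)) * fs / win_n,      # Hz, one per row of power
    time = (starts - 1 + win_n / 2) / fs,       # s, frame centres, one per column
    power = power
  )
}

# One second of a 1 kHz tone sampled at 8 kHz
fs <- 8000
x <- sin(2 * pi * 1000 * (0:(fs - 1)) / fs)
sg <- stft_sketch(x, fs)
```

With the defaults and fs = 8000 this gives 200-sample windows every 80 samples, so frequency resolution is fs / 200 = 40 Hz and consecutive frames overlap by 60%.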
Writes a nested tibble to a 'Praat' TextGrid file
write_tg(x, file)
x | Nested tibble. Must contain the columns 'name', 'type' and 'data'. 'data' must have the columns 't1', 't2' and 'label' |
file | File name to save TextGrid as |
Returns path of saved TextGrid file
Waveform Similarity Overlap-add. Translated from 'TSM Toolbox'.
wsola(x, s, win = "hann", winLen = 1024, synHop = 512, tol = 512)
x | an audio signal |
s | a scaling factor or a list of two vectors with anchor points |
win | window function. Default is 'hann' for hanning window. Can also be a custom window supplied as a vector |
winLen | window length |
synHop | synthesis window hop size |
tol | tolerance for overlap delta |
retimed audio signal as vector
Driedger, J., Müller, M. (2014). TSM Toolbox: MATLAB Implementations of Time-Scale Modification Algorithms. In Proceedings of the International Conference on Digital Audio Effects (DAFx): 249–256.
fft_spectrum, get_serial_anchors
set.seed(42)
data(mm1)
dur <- length(mm1)
n <- 10
x <- runif(n)
anchors <- list(
  anc_in = c(0, dur*seq_len(n)/n),
  anc_out = c(0, dur*cumsum(x)/sum(x))
)
sig <- wsola(mm1@left, anchors)
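One way to read an anchor list is as a piecewise-linear map between input and output sample positions. The sketch below illustrates that reading with stats::approx() and does not call wsola. It assumes linear interpolation between anchors, as in the TSM Toolbox; the anchor values and the names syn_pos, ana_pos and stretch are made up for illustration.

```r
# Made-up anchors (in samples): the first 1000 input samples are squeezed
# into 500 output samples, the next 1000 are stretched over 1500.
anc_in <- c(0, 1000, 2000)
anc_out <- c(0, 500, 2000)

# Output (synthesis) positions at a regular hop of 500 samples
syn_pos <- seq(0, 2000, by = 500)

# Assumption: the matching input (analysis) positions are found by linear
# interpolation between anchor points
ana_pos <- approx(x = anc_out, y = anc_in, xout = syn_pos)$y

# Local stretch factor of each segment (output duration / input duration)
stretch <- diff(anc_out) / diff(anc_in)
```

Here stretch is 0.5 for the first segment and 1.5 for the second, so a single call can compress some parts of the signal and lengthen others.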