Categorization of Mouse Ultrasonic Vocalizations Using Machine Learning Techniques

Spyros Kouzoupis 1,*, Andreas Neocleous 2 and Irene Athanassakis 3

1 Department of Music Technology and Acoustics, Hellenic Mediterranean University, 71500 Crete, Greece
2 Department of Computer Science, University of Cyprus, 1678 Nicosia, Cyprus; neocleous.andreas@gmail.com
3 Department of Biology, University of Crete, 70013 Heraklion, Greece; athanire@uoc.gr
* Correspondence: skouzo@hmu.gr

Received: 13 July 2019; Accepted: 23 October 2019; Published: 4 November 2019

Abstract: This study examines the ultrasonic vocalizations of several adult male BALB/c mice in the presence of a female. A total of 179 distinct ultrasonic syllables, referred to as "phonemes", were isolated, and k-means and agglomerative clustering algorithms were applied to the resulting dataset to group the vocalizations into clusters based on features extracted from their pitch contours. The elbow method was used to find the optimal number of clusters, yielding nine distinct categories. The k-means results are presented through a matching matrix, while the agglomerative clustering results are presented as a dendrogram. The results of both methods are in line with the manual annotations made by the authors, as well as with those reported in the literature. The two methods of unsupervised analysis, applied to 14-element feature vectors, provide evidence that the vocalizations can be grouped into nine clusters, which supports the claim that there is a distinct repertoire of "syllables" or "phonemes".

Keywords: ultrasonic vocalizations; BALB/c mice; biosignals; k-means clustering; bioacoustics

1. Introduction

It has been known for over three decades that mice produce ultrasonic courtship vocalizations [1].
Specific pitch patterns are characterized by short, repeating monochromatic vocalizations with silent gaps in between [2]. The production of ultrasonic vocalizations in mammals has a communicative role [3–7]. A number of studies have shown that mice produce ultrasonic vocalizations in at least two situations: pups produce "isolation calls" when cold or when removed from the nest [8,9], while males emit ultrasonic vocalizations in the presence of females or when they detect female urinary pheromones [3]. It has also been observed that females in oestrus are attracted to vocalizing males, thus facilitating reproduction [10]. Several studies use the term "courtship vocalizations" [7,11–13]. These vocalizations are mainly monochromatic and appear in sequences of small bursts with approximately 30 ms of silent gaps in between [4]. These small bursts are typically called "phonemes". A spectrogram of such a sequence is presented in Figure 1; the vertical red lines indicate the points where phonemes were manually segmented.

As described by Holy and Guo [3], murine vocalizations fall into two major groups: a group consisting of "modulated" pitch contours and a group containing "pitch jumps". The frequency trace of a vocalization in the "pitch jump" group contains one or two discontinuities. They report three types of phonemes as subcategories. Similar phonemes appear in our dataset and are presented in Section 3.

The concept of using a representation of the pitch tracks as feature vectors, together with machine learning techniques, can also be found in [7,11,14–16]. In [14], Musolf et al. used the Avisoft software to extract 25 features including, among others, the start, center, and end frequency of the pitch track, the frequency at peak energy, and the vocalization duration.

Acoustics 2019, 1, 837–846; doi:10.3390/acoustics1040050; www.mdpi.com/journal/acoustics
They performed dimensionality reduction using principal component analysis and applied supervised techniques such as support vector machines. The phonemes of male mice were classified into seven categories (five modulated and two pitch jumps). In [11], the authors suggest four categories for the repertoire of the wild house mouse: two belong to the modulated group and two to the pitch jump group. Since the phonemes of male vocalizations are repetitive, Holy and Guo introduced the term "male mice songs" [3]. The term "song" has been widely used in animal vocalization research, including for birds, frogs, whales, and bats [17–20]. The terms "calls", "phonemes", "motifs", and "phrases" relate to the structure of a song in a similar way to songs in music [21].

Figure 1. Spectrogram of an ultrasonic vocalization sequence consisting of several syllables.

In this study, we apply two unsupervised machine learning methods to cluster the ultrasonic vocalizations of young male BALB/c mice in the presence of a female in oestrus. (BALB/c is an albino, laboratory-bred strain of the house mouse from which a number of common substrains are derived; first bred in New York in 1920, over 200 generations of BALB/c mice are now distributed globally, and they are among the most widely used inbred strains in animal experimentation.) More precisely, we chose to apply the k-means method not only because it is easy to understand and implement, but also because it is one of the most widely used clustering algorithms. Since k-means is a distance-based method, we normalized the data to the range between 0 and 1. Furthermore, we applied an additional method, agglomerative clustering, to verify our results. Agglomerative clustering results become more informative when visualized as a dendrogram; the added value of such a method is that one can explore subclusters, which gives insight into and a better understanding of the data.
We show that the ultrasonic vocalizations can be categorized into nine distinct categories based on their acoustical features. This finding reinforces the claim that there is a distinct repertoire of "syllables" or "phonemes" and constitutes the main contribution of this work. The rest of the paper is organized as follows: Section 2 presents the methodology pertaining to data acquisition, signal analysis, and dataset formation, along with the machine learning techniques used for clustering. Section 3 presents remarks on our results, followed by a discussion and concluding remarks in Section 4.

2. Materials and Methods

2.1. Experimental Conditions and Recordings

The BALB/c mice used in our experiments were three months old, born and raised in captivity at the Biology Department of the University of Crete (Heraklion, Crete, Greece, EL91-BIObr-09/2016, Athanassakis I., Kourouniotis K., Lika K.). During the experiments, a number of mice were transferred to an adjacent noise-insulated lab equipped with the ultrasonic recording devices (the females being in oestrus at the time of the recordings and at approximately the same oestrus stage). All experiments were conducted with the mice in a container of dimensions 30 × 15 × 15 cm. The ultrasonic condenser microphone (Model CM16/CMPA, Avisoft Bioacoustics, Berlin, Germany) was placed on top of the container at a distance of approximately 30 cm (Figure 2).

Figure 2. Left: Equipment used for the recording of the ultrasonic vocalizations. Right: Container hosting a female and a male mouse.

Signal acquisition was done through the UltraSoundGate (Avisoft Bioacoustics, Berlin, Germany), which performs analog-to-digital conversion and signal conditioning, and provides the polarizing voltage for the ultrasonic condenser microphone. It comes with built-in anti-aliasing filters, and its A/D converters are of a Delta–Sigma architecture.
It can amplify the signal by up to 40 dB, while its sampling frequency can reach 750 kHz at 8-bit resolution. We used a 250 kHz sampling rate at 16-bit resolution, and the recording was done in the SASLab software environment (Avisoft Bioacoustics, Berlin, Germany).

2.2. Data and Pitch Contour Extraction

The total duration of all the ultrasonic recordings amounts to about 2.5 h. Several recording sessions were undertaken, and more than 15 different mice were used. The recordings were manually segmented, resulting in a dataset of 179 phonemes. For each phoneme extracted from the audio recordings, a contour was formed from the detected vector of monochromatic frequencies. The pitch contour of the ultrasonic monochromatic vocalizations is formed from overlapping frames, where in each frame the most prominent frequency is detected through the FFT. (Here, we use the term "pitch" loosely, since the term properly refers to the perceived sensation of frequency.) The amplitude of the prominent frequency is significantly higher than the remaining spectrum, as shown in Figure 3, and thus it is relatively easy to identify a unique pitch candidate for each frame. After the estimation of the pitch candidates, a post-processing procedure is applied to correct errors appearing mainly in the noisiest parts of the signal. All pitch contour shapes were manually categorized into one of the representative categories depicted in Figure 4 (see also Section 3).

2.3. Feature Extraction

The approach taken here was to define a set of 14 features that pertain to: (a) the frequencies themselves and (b) the temporal positions of some designated time-points across the phonemes.
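The frame-wise peak-picking described in Section 2.2 can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation; the frame length, hop size, and the synthetic test tone are our own choices rather than values reported in the paper.

```python
import numpy as np

def pitch_contour(signal, fs, frame_len=512, hop=256):
    """For each overlapping frame, return the frequency (Hz) of the most
    prominent FFT bin, i.e., the frame's single pitch candidate."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    contour = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        contour.append(freqs[np.argmax(spectrum)])
    return np.array(contour)

# Synthetic monochromatic "phoneme": a 60 kHz tone sampled at 250 kHz,
# matching the paper's sampling rate; 50 ms long.
fs = 250_000
t = np.arange(int(0.05 * fs)) / fs
tone = np.sin(2 * np.pi * 60_000 * t)
contour = pitch_contour(tone, fs)
# Every frame's peak should land within one FFT bin (~488 Hz) of 60 kHz.
```

A post-processing pass (e.g., median filtering of the contour) would then remove the outliers that the paper attributes to the noisiest parts of the signal.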
Thus, the features are: (1) the beginning frequency, (2) the end frequency, (3) the mean frequency, (4) the standard deviation of the frequency, (5–6) the middle frequency (value and position), (7–8) the first quartile of the frequency (value and position), (9–10) the second quartile of the frequency (value and position), (11–12) the third quartile of the frequency (value and position), and (13–14) the fourth quartile of the frequency (value and position). The raw frequency values derived from the pitch track are normalized because we use distance metrics (i.e., the Euclidean distance) in the k-means method. The decision on the number and nature of the parameters was made with robustness and low dimensionality in mind. The selected parameters effectively capture the contour shape of each vocalization. Descriptors based on the signal's acoustic characteristics, such as those derived from the Fourier transform (e.g., the spectral centroid or the spectral spread), were not used.

Figure 3. A single audio frame spectrum (amplitude in dB against frequency in kHz). Note the prominence of the main vocalization frequency, which essentially makes this type of high-frequency mouse vocalization monochromatic.

2.4. k-Means

The k-means algorithm repeatedly applies two operations until convergence is reached. It first computes the distances between all data points and the prototypes representing every cluster, and assigns each data point to the cluster of the closest prototype. Then, it repositions the prototypes to the mean of each cluster. The convergence criterion can be the condition that none of the prototypes differ significantly from the mean values of their clusters; another criterion can simply be a specified number of iterations. In this paper, we used the former as the convergence criterion. We used the elbow method for finding the optimal number of clusters.
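The two alternating k-means operations and the elbow computation can be sketched in plain numpy. This is a toy illustration under our own assumptions: the initialization, iteration cap, and synthetic data are not from the paper, and the within-cluster score here is the sum of squared distances to each cluster mean, a common centroid-based variant of the pairwise quantity defined in Equations (1) and (2).

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means: assign each point to its nearest prototype, then move
    each prototype to the mean of its cluster, until prototypes stabilize."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def sse(X, labels, centers):
    """Within-cluster sum of squared errors for a given clustering."""
    return float(sum(((X[labels == j] - c) ** 2).sum()
                     for j, c in enumerate(centers)))

# Toy data: three well-separated 2-D blobs, already in the [0, 1] range.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.05, size=(30, 2)) for m in (0.1, 0.5, 0.9)])

curve = []
for k in range(1, 7):
    labels, centers = kmeans(X, k)
    curve.append(sse(X, labels, centers))
# The SSE drops steeply up to k = 3 (the true number of blobs) and then
# flattens: the "elbow" that picks the number of clusters.
```

As in the paper, repeating the runs with different seeds and keeping the most frequent elbow location makes the choice of k robust to the randomized initialization.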
First, we compute the sum of squared errors (SSE) for k = 1, 2, ..., 15, as shown in Equation (1), and then we examine how the SSE varies with the number of clusters. Typically, the SSE drops significantly as more clusters are added; once the optimal number of clusters for a particular problem is reached, the SSE remains relatively constant.

W_k = \sum_{r=1}^{k} D_r \qquad (1)

In Equation (1), k is the number of clusters, n_r is the number of points in cluster r, and D_r is the sum of distances between all points in a cluster, given by:

D_r = \frac{1}{n_r} \sum_{i=1}^{n_r} \sum_{j=i}^{n_r} \lVert d_i^{r} - d_j^{r} \rVert \qquad (2)

A graph of the SSE against the number of clusters derived from our data is shown in Figure 5; it indicates that beyond nine clusters, the SSE remains relatively constant.

Figure 4. A typical phoneme pitch contour for every category (frequency in kHz against time in ms).

Figure 5. Within-cluster sum of squared errors (SSE) for runs of the k-means algorithm with k = 1, 2, ..., 15. The optimal number of clusters in this example is 9, since beyond this point the SSE does not improve significantly.

The k-means algorithm typically returns different results when experiments are repeated, mainly because of the randomized initialization of the prototypes. As a result, the elbow method may also return a different number of optimal clusters. To reach a robust conclusion for this work, we ran our experiments 500 times and identified the optimal number of clusters in each run. In our dataset, we observed that the optimal number of clusters across runs was between 8 and 10.
However, in the majority of runs, the method suggested nine optimal clusters.

2.5. Agglomerative Clustering

We chose to apply agglomerative clustering [22] to our data in order to compare its results with those of the widely used k-means clustering algorithm. In addition, the data can be viewed as a dendrogram, which gives a better understanding of the subclusters and of the relations between pairs of instances. Agglomerative clustering groups the instances of a dataset in a hierarchical order. Initially, each instance is treated as a separate cluster. Then, a similarity measure is computed between all possible pairs. A linkage criterion determines which pairs of instances are grouped together within a single cluster, creating the first layer of subclusters. In a similar manner, pairs of data are linked into subclusters until all instances become one cluster. A threshold can be applied to any branch of the tree to obtain the clustering results. The similarity measure can be any distance, for instance the Euclidean or the Mahalanobis distance. Examples of linkage criteria include the maximum or minimum distance between pairs, or more sophisticated approaches that take into account the probabilities between distributions. A dendrogram derived from our dataset is shown in Figure 6. We elaborate further on the outcome of the agglomerative clustering in Sections 3 and 4.

3. Results

By visual inspection of the pitch contours, we identified 11 generic contour shapes, which were labeled according to their shape (see Figure 4). In Figure 7, we present some statistics of the phonemes in each category. The ground truth here is used only to measure the performance of the unsupervised methods; it is not involved in the training procedure, and the recommended number of clusters is derived by the elbow method.
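The bottom-up merging just described can be sketched with a naive complete-linkage implementation in numpy. This is our own minimal version for illustration; the paper does not state which linkage criterion it used, and the toy data and threshold below are hypothetical.

```python
import numpy as np

def agglomerative(X, threshold):
    """Naive agglomerative clustering with complete linkage: start from
    singleton clusters and repeatedly merge the closest pair of clusters
    until the smallest inter-cluster distance exceeds the threshold."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: largest pairwise distance between members.
                d = max(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        if best > threshold:
            break
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

# Two tight groups far apart: a threshold between the within-group and
# between-group distances should return exactly two clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])
labels = agglomerative(X, threshold=0.5)
print(labels)  # [0 0 0 1 1 1]
```

Recording the merge distances instead of discarding them yields the linkage values plotted on the vertical axis of a dendrogram such as Figure 6; cutting the tree at a linkage value is exactly the thresholding step used there.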
Figure 6. Dendrogram resulting from the agglomerative clustering algorithm, with clusters 1 to 9 indicated by different colors (in the electronic version of the paper) and the linkage-value threshold set at 0.39.

The phonemes fall into two general groups: those presenting some modulation without any pitch jumps and those with one or two pitch jumps. The modulated group is formed by categories A–G, while the pitch jump group is formed by categories H–K. The number of modulated phonemes (categories A–G) is higher in the sample than the number of phonemes in the pitch jump group (categories H–K). In terms of mean frequency, the values in all categories lie around 60 kHz, except for category J, which has a mean frequency of 71 kHz. The shortest mean duration of the phonemes is found in category F and the longest in category I. The mean duration is around 60 ms, except for categories D and E, which have longer durations.

The 11 categories and their sizes are: A, lambda type (n = 24); B, asymmetric lambda left (n = 29); C, asymmetric lambda right (n = 26); D, M type (n = 22); E, M left (n = 21); F, linear positive (n = 17); G, linear negative (n = 16); H, jump asymmetric lambda right (n = 6); I, jump M left (n = 7); J, jump 1 (n = 6); K, jump 2 (n = 5).

Figure 7. Statistical properties of the frequency (a, in kHz) and the duration (b, in ms) of the phonemes of the 11 categories. The number of instances in each category is shown in parentheses after the labels on the left side of the figure.

In Figure 8, the mean normalized values of the 14 features for all 179 phonemes are plotted. The dots represent these values in every category and are joined with lines for better visualization. The category label (see Figure 4) is annotated below the horizontal axis, while the vertical lines simply separate the different categories.
The number of phonemes in each category is shown at the top of the figure between the separation lines. The average value for each category is pictured as a thick horizontal line. In Section 4, we further discuss the consistency between the feature set, as shown in Figure 8, and the results of the k-means algorithm.

Figure 8. Mean normalized values of the 14 feature vectors in the 11 categories, shown as dots joined with lines; thick horizontal lines depict the average feature values in each category. Vertical lines simply separate the different categories, which are indicated by the letters at the bottom, while the figures at the top designate the number of phonemes in each category.

The clustering results from the k-means algorithm for k = 9 are shown with a matching matrix in Table 1. The rows indicate the 11 ground-truth categories and the columns the k-means output. More than 80% of the A - lambda category has been assigned to cluster 9, together with 45% of the B - asymmetric lambda left category. The majority of the instances in categories C - asymmetric lambda right and G - linear negative are assigned to cluster 3 (77% and 75%, respectively), showing some similarity in their feature vectors. Similarly, 77% of the D - M type, 81% of the E - M left, and 80% of the K - jump 2 categories are assigned to cluster 1. The two categories H - jump asymmetric lambda right and I - jump M left are placed in cluster 2 (67% for H and 100% for I). The category F - linear positive is spread over clusters 4–7. In Figure 6, we present a dendrogram derived from the agglomerative clustering algorithm. In order to view the several clusters of this method, it is necessary to apply a threshold to a branch of the tree, so that the sub-branches form the respective clusters.
With the threshold set at 0.39, we get nine clusters, shown with different colors in Figure 6 (in the electronic version of the paper). The first five clusters stem from the same branch in the two highest levels of the tree, and they contain mainly phonemes from the modulated group. The remaining four clusters emerge from different branches of the tree, and they contain mainly phonemes from the pitch jump group.

Table 1. Matching matrix (k-means).

Category/Cluster                    1   2   3   4   5   6   7   8   9
A - Lambda                          0   0   0   0   0   4   0   0  20
B - Asymmetric lambda left          0   0   0   1   0   2  13   0  13
C - Asymmetric lambda right         0   0  20   0   0   5   0   0   1
D - M type                         17   0   3   0   0   0   0   0   2
E - M left                         17   0   4   0   0   0   0   0   0
F - Linear positive                 0   0   0   7   3   2   5   0   0
G - Linear negative                 2   0  12   0   0   2   0   0   0
H - Jump asymmetric lambda right    0   4   1   0   0   1   0   0   0
I - Jump M left                     0   7   0   0   0   0   0   0   0
J - Jump 1                          0   0   0   0   0   0   0   5   1
K - Jump 2                          4   1   0   0   0   0   0   0   0

4. Discussion

In this work, we used unsupervised machine learning techniques to create clusters of phonemes that share similar characteristics. Figure 7 shows that the mean frequency of the J category is statistically higher than the average frequency of all categories. In the matching matrix (Table 1), by comparing the k-means outputs to the ground truth, we can see that 83% of the phonemes of the J category fall into cluster 8. From the matching matrix, we also observe that categories D, E, and K are seen by k-means mainly as one class. The mean durations of the phonemes in categories D and E are similar, although they are statistically higher than the overall mean duration (Figure 7). Looking at Figure 8, we observe very similar values of the normalized means across the 14 features for categories D, E, and K. Similarly, from the matching matrix, we see that categories C and G fall into cluster 3. The normalized mean feature values of categories A, C, and G are also statistically similar.
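A matching matrix such as Table 1 can be tabulated by counting the co-occurrences of ground-truth categories and cluster assignments. The sketch below uses hypothetical labels, not the paper's data.

```python
import numpy as np

def matching_matrix(truth, clusters, n_cat, n_clu):
    """Rows: ground-truth categories; columns: cluster assignments."""
    M = np.zeros((n_cat, n_clu), dtype=int)
    for t, c in zip(truth, clusters):
        M[t, c] += 1
    return M

# Hypothetical example: 6 phonemes, 3 categories, 2 clusters.
truth    = [0, 0, 1, 1, 2, 2]
clusters = [0, 0, 0, 1, 1, 1]
M = matching_matrix(truth, clusters, n_cat=3, n_clu=2)
print(M)  # [[2 0], [1 1], [0 2]]

# Row percentages show how each category distributes over the clusters,
# which is how the per-category figures in the text are obtained.
pct = 100 * M / M.sum(axis=1, keepdims=True)
```

Note that, unlike a confusion matrix, the column indices here are arbitrary cluster identities, so only the concentration of each row matters, not its position.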
The highest mean feature value appears in category F, and the pitch jump categories have relatively lower mean feature values compared with the total mean value across all categories. The use of the dendrogram as a visualization and clustering technique adds value to the understanding of the data and of the hierarchy of similarity between phonemes. With the threshold placed at 0.39, the agglomerative clustering indeed returns results that are consistent with those of the k-means algorithm, as well as with those reported in the literature [3].

Regarding the size of the dataset used in this study, we have to remark that the number of phonemes belonging to the pitch jump group, in particular, is relatively small. These kinds of vocalizations were much rarer, despite the fact that the collected vocalizations arose from several hours of recordings. Nevertheless, we argue that since our results are consistent with those reported in the literature, a richer dataset would have yielded similar results. Note also that, although data were recorded in both the audio and the ultrasonic spectrum, in this study we focused primarily on the ultrasonic range. The ultrasonic phonemes, in most cases, were mainly continuous whistles with a mean duration of 500 ms. The sounds with discontinuities (denoted as jumps) are attributed to nonlinearities of the vocal fold system, pertaining to bifurcations. Thus, in our study, the precise representation of the frequency trace (contour) was of primary concern. This is the main reason we developed the aforementioned contour extraction and processing method instead of using other freely available software, such as MUPET [23].
MUPET is a remarkable project, but its contour extraction process, which is inspired by the workings of the human auditory system and uses a symmetric, evenly distributed Gammatone filterbank (composed of 64 band-pass filters) spanning the whole mouse ultrasonic vocalization (USV) frequency range, is not fully justified in our opinion (in particular, the fact that the upper and lower bounds of the frequency range are modeled by a smaller number of wider, symmetric filters).

Concluding Remarks

Recordings of the repetitive ultrasonic monochromatic vocalizations of male mice in the presence of a female were made, and from these, phrases were manually isolated and pitch contours extracted. Two methods of unsupervised analysis were performed using 14-element feature vectors, providing evidence that the vocalizations can be grouped into nine clusters, which supports the claim that there is a distinct repertoire of "syllables" or "phonemes". Machine learning techniques such as k-means and agglomerative clustering proved suitable for clustering the different pitch patterns based on their reduced feature representations. This work supports the claims made by the scientific community concerning the existence of a distinctive set of mouse ultrasonic vocalizations.

Author Contributions: Conceptualization, S.K. and I.A.; formal analysis, S.K. and A.N.; investigation, S.K., A.N., and I.A.; methodology, S.K., A.N., and I.A.; resources, S.K., A.N., and I.A.; software, S.K. and A.N.; writing, original draft, S.K. and A.N.

Funding: This research received no external funding.

Acknowledgments: All applicable international, national, and/or institutional guidelines for the care and use of animals were followed. This article does not contain any studies with human participants performed by any of the authors.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Dizinno, G.; Whitney, G.; Nyby, J.
Ultrasonic vocalizations by male mice (Mus musculus) to female sex pheromone: Experiential determinants. Behav. Biol. 1978, 22, 104–113. [CrossRef]
2. Gourbal, B.E.; Barthelemy, M.; Petit, G.; Gabrion, C. Spectrographic analysis of the ultrasonic vocalisations of adult male and female BALB/c mice. Naturwissenschaften 2004, 91, 381–385. [CrossRef] [PubMed]
3. Holy, T.E.; Guo, Z. Ultrasonic songs of male mice. PLoS Biol. 2005, 3, e386. [CrossRef] [PubMed]
4. Hofer, M.A.; Shair, H. Ultrasonic vocalization during social interaction and isolation in 2-week-old rats. Dev. Psychobiol. 1978, 11, 495–504. [CrossRef] [PubMed]
5. Portfors, C.V.; Perkel, D.J. The role of ultrasonic vocalizations in mouse communication. Curr. Opin. Neurobiol. 2014, 28, 115–120. [CrossRef] [PubMed]
6. Kobayasi, K.I.; Ishino, S.; Riquimaroux, H. Phonotactic responses to vocalization in adult Mongolian gerbils (Meriones unguiculatus). J. Ethol. 2014, 32, 7–13. [CrossRef]
7. Hoffmann, F.; Musolf, K.; Penn, D.J. Ultrasonic courtship vocalizations in wild house mice: Spectrographic analyses. J. Ethol. 2012, 30, 173–180. [CrossRef]
8. Ehret, G. Infant rodent ultrasounds—A gate to the understanding of sound communication. Behav. Genet. 2005, 35, 19–29. [CrossRef] [PubMed]
9. Hahn, M.E.; Schanz, N. The effects of cold, rotation, and genotype on the production of ultrasonic calls in infant mice. Behav. Genet. 2002, 32, 267–273. [CrossRef] [PubMed]
10. Hammerschmidt, K.; Radyushkin, K.; Ehrenreich, H.; Fischer, J. Female mice respond to male ultrasonic songs with approach behaviour. Biol. Lett. 2009, 5, 589–592. [CrossRef] [PubMed]
11. Hoffmann, F.; Musolf, K.; Penn, D.J. Spectrographic analyses reveal signals of individuality and kinship in the ultrasonic courtship vocalizations of wild house mice. Physiol. Behav. 2012, 105, 766–771. [CrossRef] [PubMed]
12. D'Amato, F.R. Courtship ultrasonic vocalizations and social status in mice. Anim. Behav. 1991, 41, 875–885. [CrossRef]
13.
Hanson, J.L.; Hurley, L.M. Female presence and estrous state influence mouse ultrasonic courtship vocalizations. PLoS ONE 2012, 7, e40782. [CrossRef] [PubMed]
14. Musolf, K.; Meindl, S.; Larsen, A.L.; Kalcounis-Rueppell, M.C.; Penn, D.J. Ultrasonic vocalizations of male mice differ among species and females show assortative preferences for male calls. PLoS ONE 2015, 10, e0134123. [CrossRef] [PubMed]
15. Liu, R.C.; Miller, K.D.; Merzenich, M.M.; Schreiner, C.E. Acoustic variability and distinguishability among mouse ultrasound vocalizations. J. Acoust. Soc. Am. 2003, 114, 3412–3422. [CrossRef] [PubMed]
16. Kalcounis-Rueppell, M.C.; Petric, R.; Briggs, J.R.; Carney, C.; Marshall, M.M.; Willse, J.T.; Rueppell, O.; Ribble, D.O.; Crossland, J.P. Differences in ultrasonic vocalizations between wild and laboratory California mice (Peromyscus californicus). PLoS ONE 2010, 5, e9705. [CrossRef] [PubMed]
17. Marler, P.R.; Slabbekoorn, H. Nature's Music: The Science of Birdsong; Academic Press: Cambridge, MA, USA, 2004.
18. Kelley, D.B.; Tobias, M.L.; Horng, S.; Ryan, M. Producing and perceiving frog songs: Dissecting the neural bases for vocal behaviors in Xenopus laevis. In Anuran Communication; Ryan, M.J., Ed.; Smithsonian Institution Press: Washington, DC, USA, 2001; pp. 156–166.
19. Au, W.W.; Pack, A.A.; Lammers, M.O.; Herman, L.M.; Deakos, M.H.; Andrews, K. Acoustic properties of humpback whale songs. J. Acoust. Soc. Am. 2006, 120, 1103–1110. [CrossRef] [PubMed]
20. Behr, O.; von Helversen, O. Bat serenades: Complex courtship songs of the sac-winged bat (Saccopteryx bilineata). Behav. Ecol. Sociobiol. 2004, 56, 106–115. [CrossRef]
21. Busnel, R.G. Acoustic Behavior of Animals; Elsevier: Amsterdam, The Netherlands, 1964.
22. Gowda, K.C.; Krishna, G. Agglomerative clustering using the concept of mutual nearest neighbourhood. Patt. Recogn. 1978, 10, 105–112. [CrossRef]
23. Van Segbroeck, M.; Knoll, A.T.; Levitt, P.; Narayanan, S.
MUPET-Mouse Ultrasonic Profile ExTraction: A Signal Processing Tool for Rapid and Unsupervised Analysis of Ultrasonic Vocalizations. Neuron 2017, 94, 465–485. [CrossRef] [PubMed]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Categorization of Mouse Ultrasonic Vocalizations Using Machine Learning Techniques

Loading next page...
 
/lp/multidisciplinary-digital-publishing-institute/categorization-of-mouse-ultrasonic-vocalizations-using-machine-V5bUpeHBMm
Publisher
Multidisciplinary Digital Publishing Institute
Copyright
© 1996-2021 MDPI (Basel, Switzerland) unless otherwise stated Disclaimer The statements, opinions and data contained in the journals are solely those of the individual authors and contributors and not of the publisher and the editor(s). MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. Terms and Conditions Privacy Policy
ISSN
2624-599X
DOI
10.3390/acoustics1040050
Publisher site
See Article on Publisher Site

Abstract

acoustics Article Categorization of Mouse Ultrasonic Vocalizations Using Machine Learning Techniques 1, 2 3 Spyros Kouzoupis * , Andreas Neocleous and Irene Athanassakis Department of Music Technology and Acoustics, Hellenic Mediterranean University, 71500 Crete, Greece Department of Computer Science, University of Cyprus, 1678 Nicosia, Cyprus; neocleous.andreas@gmail.com Department of Biology, University of Crete, 70013 Heraklion, Greece; athanire@uoc.gr * Correspondence: skouzo@hmu.gr Received: 13 July 2019; Accepted: 23 October 2019; Published: 4 November 2019 Abstract: A study of the ultrasonic vocalizations of several adult male BALB/c mice in the presence of a female, is undertaken in this study. A total of 179 distinct ultrasonic syllables referred to as “phonemes” are isolated, and in the resulting dataset, k-means and agglomerative clustering algorithms are implemented to group the ultrasonic vocalizations into clusters based on features extracted from their pitch contours. In order to find the optimal number of clusters, the elbow method was used, and nine distinct categories were obtained. Results when the k-means method was applied are presented through a matching matrix, while clustering results when the agglomerative technique was applied are presented as a dendrogram. The results of both methods are in line with the manual annotations made by the authors, as well as with the ones presented in the literature. The two methods of unsupervised analysis applied on 14 element feature vectors provide evidence that vocalizations can be grouped into nine clusters, which translates into the claim that there is a distinct repertoire of “syllables” or “phonemes”. Keywords: ultrasonic vocalizations; mice BALB/c biosignals; k-means clustering; bioacoustics 1. Introduction It has been known for over three decades that mice produce ultrasonic courtship vocalizations [1]. 
Specific pitch patterns are characterized by short, repeating monochromatic vocalizations with silent gaps in between [2]. The production of ultrasonic vocalizations in mammals has a communicative role [3–7]. A number of studies have shown that mice produce ultrasonic vocalizations in at least two situations: pups produce “isolation calls” when cold or when removed from the nest [8,9], while males emit ultrasonic vocalizations in the presence of females or when they detect female urinary pheromones [3]. It has also been observed that females in oestrus are attracted to vocalizing males, thus facilitating the process of reproduction [10]. Several studies use the term “courtship vocalizations” [7,11–13].

It is known that these vocalizations are mainly monochromatic and appear in sequences of small bursts with approximately 30 ms of silent gaps in between [4]. These small bursts are typically called “phonemes”. A spectrogram of such a sequence is presented in Figure 1, where the vertical red lines indicate the points at which phonemes were manually segmented. As described by Holy and Guo [3], murine vocalizations fall into two major groups: one consisting of “modulated” pitch contours and one containing “pitch jumps”. The frequency trace of a vocalization in the “pitch jump” group contains one or two discontinuities. They report three types of phonemes as subcategories. Similar phonemes appear in our dataset and will be presented in Section 3.

The concept of using a representation of the pitch tracks as feature vectors, together with machine learning techniques, can also be found in [7,11,14–16]. In [14], Musolf et al. used the Avisoft software to extract 25 features including, among others, frequency values indicating the start, center, and end frequency of the pitch track, the frequency at peak energy, and the vocalization duration.
They performed dimensionality reduction using principal component analysis and applied supervised techniques such as support vector machines. The phonemes of male mice were classified into seven categories (five modulated and two pitch jumps). In [11], the authors suggest four categories for the repertoire of wild house mice: two belong to the modulated group and two to the pitch jump group.

Since the phonemes of male vocalizations repeat, Holy and Guo introduced the term “male mice songs” [3]. The term “song” has been widely used in animal vocalization research, including on birds, frogs, whales, and bats [17–20]. The terms “calls”, “phonemes”, “motifs”, and “phrases” relate to the structure of a song in a way similar to songs in music [21].

Figure 1. Spectrogram of an ultrasonic vocalization sequence consisting of several syllables.

In this study, we apply two unsupervised machine learning methods to cluster the ultrasonic vocalizations of young male BALB/c mice in the presence of a female in oestrus. (BALB/c is an albino, laboratory-bred strain of the house mouse from which a number of common substrains are derived; first bred in New York in 1920, over 200 generations of BALB/c mice are now distributed globally, and they are among the most widely used inbred strains in animal experimentation.) More precisely, we chose the k-means method not only because it is easy to understand and implement, but also because it is one of the most widely used clustering algorithms. Since k-means is a distance-based method, we normalized the data to the range between 0 and 1. Furthermore, we applied an additional method, agglomerative clustering, to verify our results. The agglomerative clustering results become more informative when visualized as a dendrogram; the added value of such a method is that one can explore subclusters, which can give insights and a better understanding of the data.
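As a minimal sketch of this pipeline (using scikit-learn and synthetic data in place of the study's 179 × 14 feature matrix, which is not reproduced here), the normalization step and the two clustering methods could be combined as follows:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans, AgglomerativeClustering

# Hypothetical stand-in for the 179 x 14 matrix of pitch-contour features.
rng = np.random.default_rng(0)
X = rng.random((179, 14))

# Scale every feature to the [0, 1] range, as required by distance-based k-means.
X_norm = MinMaxScaler().fit_transform(X)

# k-means with k = 9 (the number of clusters suggested by the elbow method).
kmeans_labels = KMeans(n_clusters=9, n_init=10, random_state=0).fit_predict(X_norm)

# Agglomerative clustering on the same data, used to cross-check the grouping.
agglo_labels = AgglomerativeClustering(n_clusters=9).fit_predict(X_norm)

print(kmeans_labels.shape, len(set(agglo_labels)))
```

The default Ward linkage is assumed here for the agglomerative step; the paper does not pin the linkage criterion down, so treat it as one reasonable configuration rather than the authors' exact setup.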
We show that the ultrasonic vocalizations can be categorized into nine distinct categories based on their acoustical features. This reinforces the claim that there is a distinct repertoire of “syllables” or “phonemes” and constitutes the main contribution of this work. The rest of the paper is organized as follows: In Section 2, the methodology pertaining to data acquisition, signal analysis, and dataset formation, along with the machine learning techniques used for clustering, is presented. In Section 3, remarks on our results are made, followed by a discussion and concluding remarks in Section 4.

2. Materials and Methods

2.1. Experimental Conditions and Recordings

The BALB/c mice used for our experiments were three months old, having been born and raised in captivity at the Biology Department of the University of Crete (Heraklion, Crete, Greece, EL91-BIObr-09/2016, Athanassakis I., Kourouniotis K., Lika K.). During the experiments, a certain number of mice were transferred to an adjacent noise-insulated lab equipped with the ultrasonic recording devices (the females being in oestrus at the time of the recordings and at approximately the same oestrus stage). All the experiments were conducted with the mice in a container of dimensions 30 × 15 × 15 cm. The ultrasonic condenser microphone (Model CM16/CMPA, Avisoft Bioacoustics, Berlin, Germany) was placed on top of the container at a distance of approximately 30 cm (Figure 2).

Figure 2. Left: equipment used for the recording of the ultrasonic vocalizations. Right: container hosting a female and a male mouse.

Signal acquisition was done through an Ultra Sound Gate unit (Avisoft Bioacoustics, Berlin, Germany), which performs analog-to-digital conversion and signal conditioning, and provides the polarizing voltage for the ultrasonic condenser microphone. It comes with built-in anti-aliasing filters, and its A/D converters are of a Delta–Sigma architecture.
It can amplify the signal by up to 40 dB, while its sampling frequency can reach 750 kHz at 8-bit resolution. We used a 250 kHz sampling rate at 16-bit resolution, and the recording was done in the SASLab software environment (Avisoft Bioacoustics, Berlin, Germany).

2.2. Data and Pitch Contour Extraction

The total duration of all the ultrasonic recordings amounts to about 2.5 h. Several recording sessions were undertaken, and more than 15 different mice were used. The recordings were manually segmented, resulting in a dataset of 179 phonemes. For each phoneme extracted from the audio recordings, a contour was formed from the detected monochromatic frequencies. The pitch contour of the ultrasonic monochromatic vocalizations is formed from overlapping frames, where in each frame the most prominent frequency is detected through the FFT. (Here, we use the term “pitch” loosely, since it properly refers to the perceived sensation of frequency.) The amplitude of the prominent frequency is significantly higher than the remaining spectrum, as shown in Figure 3, and thus it is relatively easy to identify a unique pitch candidate for each frame. After the estimation of the pitch candidates, a post-processing procedure is applied to correct errors appearing mainly in the noisiest parts of the signal. All pitch contour shapes were manually categorized into one of the representative categories depicted in Figure 4 (see also Section 3).

2.3. Feature Extraction

The approach taken here was to define a set of 14 features that pertain to: (a) the frequencies themselves and (b) the temporal positions of some designated time-points across the phonemes.
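The frame-by-frame peak picking described in Section 2.2 can be illustrated with a short sketch. This is a simplified version without the post-processing step, and the frame length and hop size are our own assumptions, not values stated in the paper:

```python
import numpy as np

def pitch_contour(signal, fs, frame_len=512, hop=256):
    """Return the per-frame peak frequency (Hz): for each overlapping frame,
    take the FFT and keep the single most prominent spectral component."""
    window = np.hanning(frame_len)
    contour = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        contour.append(np.argmax(spectrum) * fs / frame_len)
    return np.array(contour)

# Synthetic 60 kHz whistle sampled at the paper's 250 kHz rate.
fs = 250_000
t = np.arange(0, 0.05, 1 / fs)
contour = pitch_contour(np.sin(2 * np.pi * 60_000 * t), fs)
print(round(contour.mean() / 1000))  # ≈ 60 (kHz)
```

Because the vocalizations are essentially monochromatic (Figure 3), a single argmax per frame suffices; for real recordings, the error-correction pass mentioned in the text would still be needed in the noisy regions.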
Thus, the features are: (1) the beginning frequency, (2) the end frequency, (3) the mean frequency, (4) the standard deviation of the frequency, (5–6) the middle frequency (value and position), (7–8) the first quartile of the frequency (value and position), (9–10) the second quartile of the frequency (value and position), (11–12) the third quartile of the frequency (value and position), and (13–14) the fourth quartile of the frequency (value and position). The raw frequency values derived from the pitch track are normalized because we use distance metrics (i.e., the Euclidean distance) in the k-means method.

The decision on the number and nature of the parameters was made with robustness and low dimensionality in mind. The selected parameters effectively capture the contour shape of each vocalization. Descriptors based on the signal's acoustic characteristics were not used (e.g., those derived from the Fourier transform, like the spectral centroid or the spectral spread).

Figure 3. A single audio frame spectrum (amplitude in dB against frequency in kHz). Note the prominence of the main vocalization frequency, which essentially makes this type of high-frequency mouse vocalization monochromatic.

2.4. k-Means

The k-means algorithm repeatedly applies two operations until convergence is reached. It first computes the distances between all data points and the prototypes representing every cluster, and assigns each data point to the cluster of the closest prototype. Then, it repositions the prototypes to the mean of each cluster. The convergence criterion can be set as the condition that none of the prototypes differ significantly from the mean values of their clusters; another criterion can simply be a specified number of iterations. In this paper, we used the former as the convergence criterion. We used the elbow method for finding the optimal number of clusters.
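The two alternating operations just described can be written down in a few lines. This is a minimal illustration on synthetic data, not the implementation used in the study:

```python
import numpy as np

def kmeans(X, k, seed=0, max_iter=100):
    """Minimal k-means: (1) assign every point to its closest prototype,
    (2) move each prototype to the mean of its cluster; stop when the
    prototypes no longer change (the convergence criterion in the text)."""
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 1: point-to-prototype distances and cluster assignment.
        dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: reposition each prototype to its cluster mean
        # (an empty cluster simply keeps its current prototype).
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else prototypes[j] for j in range(k)])
        if np.allclose(updated, prototypes):
            break
        prototypes = updated
    return labels, prototypes

# Two well-separated synthetic groups are recovered as two clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
labels, _ = kmeans(X, k=2)
```

The randomized prototype initialization here is also what makes repeated runs differ, a point the elbow-method discussion below returns to.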
First, we compute the sum of squared errors (SSE) for k = 1, 2, . . . , 15, as shown in Equation (1), and then we inspect the SSE across the number of clusters. Typically, the SSE drops significantly as more clusters are added; once the optimal number of clusters for a particular problem is reached, the SSE remains relatively constant.

\[ W_k = \sum_{r=1}^{k} D_r \qquad (1) \]

In Equation (1), k is the number of clusters, n_r is the number of points in cluster r, and D_r is the sum of the distances between all points in a cluster, given by:

\[ D_r = \frac{1}{n_r} \sum_{i=1}^{n_r} \sum_{j=i}^{n_r} \left\lVert d_i^{r} - d_j^{r} \right\rVert \qquad (2) \]

A graph of the SSE against the number of clusters derived from our data is shown in Figure 5. It indicates that beyond nine clusters, the SSE remains relatively constant.

Figure 4. A typical phoneme pitch contour for every category (frequency in kHz against time in ms, for categories “A” through “K”).

Figure 5. Within-cluster sum of squared errors (SSE) across 15 runs of the k-means algorithm with k = 1, 2, . . . , 15. The optimal number of clusters in this example is 9, since beyond this value the SSE does not improve significantly.

The k-means algorithm typically returns different results when experiments are repeated, mainly because of the randomized initialization of the prototypes. As a result, the elbow method may also return a different number of optimal clusters. To reach a robust conclusion, we ran our experiments 500 times, and in each run we identified the optimal number of clusters. In our dataset, we observed that the optimal number of clusters across different runs was between 8 and 10.
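The elbow behaviour described above can be reproduced on synthetic data with a known number of groups, using scikit-learn's `inertia_` (a closely related within-cluster SSE) rather than Equation (1) itself; this is a sketch, not the study's data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three well-separated groups in a 14-dimensional space,
# standing in for the normalized feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.03, size=(50, 14))
               for c in (0.1, 0.5, 0.9)])

# Within-cluster SSE for k = 1..15; the curve drops sharply up to the true
# number of groups (3 here) and stays roughly flat afterwards.
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 16)]
print(sse[0] > sse[2])  # True: the SSE falls steeply until k reaches 3
```

Plotting `sse` against k would reproduce a curve of the same shape as Figure 5, with the elbow at the built-in number of groups.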
However, in the majority of runs, the method suggested 9 as the optimal number of clusters.

2.5. Agglomerative Clustering

We chose to apply agglomerative clustering [22] to our data in order to compare its results with those of the widely used k-means algorithm. In addition, the data can be viewed as a dendrogram, which gives a better understanding of the subclusters and of the relations between pairs of instances. Agglomerative clustering groups the instances in a dataset in a hierarchical order. Initially, each instance is treated as a separate cluster. Then, a similarity measure is computed between all possible pairs. A linkage criterion determines which pairs of instances are grouped together within a single cluster, creating the first layer of subclusters. In a similar manner, pairs of subclusters are linked until all instances form one cluster. A threshold can be applied to any branch of the tree to obtain the clustering results. The similarity measure can be any distance, for instance the Euclidean or the Mahalanobis distance. Examples of linkage criteria include the maximum or minimum distance between pairs, or more sophisticated approaches taking into account the probabilities between distributions. A dendrogram derived from our dataset is shown in Figure 6. We elaborate further on the outcome of the agglomerative clustering in Sections 3 and 4.

3. Results

By visual inspection of the pitch contours, we identified 11 generic contour shapes, which were labeled according to their shape (see Figure 4). In Figure 7, we present some statistics of the phonemes in each category. The ground truth here is used only to measure the performance of the unsupervised methods; the recommended number of clusters is derived by the elbow method, and the ground truth is not involved in the training procedure.
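As a sketch of the hierarchical procedure of Section 2.5 (SciPy with Ward linkage on synthetic data; the study's own distance and linkage settings are not stated in full, so this is one plausible configuration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two tight synthetic groups of 14-dimensional points.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=c, scale=0.02, size=(20, 14)) for c in (0.2, 0.8)])

# Build the tree bottom-up (each point starts as its own cluster) ...
Z = linkage(X, method="ward")
# ... then cut it at a distance threshold, as done with the dendrogram in Figure 6.
labels = fcluster(Z, t=1.0, criterion="distance")
print(len(set(labels)))  # 2
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` would draw the tree itself; lowering the threshold `t` splits the flat clustering into progressively finer subclusters.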
Figure 6. Dendrogram resulting from the agglomerative clustering algorithm (linkage value against instances; threshold = 0.39). Categories are indicated with different colors (in the electronic version of the paper).

The phonemes fall into two general groups: those presenting some modulation without any pitch jumps, and those with one or two pitch jumps. The modulated group is formed by categories A–G, while the pitch jump group is formed by categories H–K. The number of modulated phonemes (categories A–G) is higher in the sample than the number of phonemes in the pitch jump group (categories H–K). From the mean frequency perspective, the values in all categories are found around 60 kHz, except for category J, which has a mean frequency of 71 kHz. The shortest mean duration of the phonemes is found in category F and the longest in category I. The mean duration is around 60 ms, except for categories D and E, which have longer durations.

Figure 7. Statistical properties of the frequency (a) and the duration (b) of the phonemes of the 11 categories: A, lambda type (n = 24); B, asymmetric lambda left (n = 29); C, asymmetric lambda right (n = 26); D, M type (n = 22); E, M left (n = 21); F, linear positive (n = 17); G, linear negative (n = 16); H, jump asymmetric lambda right (n = 6); I, jump M left (n = 7); J, jump 1 (n = 6); K, jump 2 (n = 5).

In Figure 8, the mean normalized values of the 14 features for all 179 phonemes are plotted. The dots represent these values in every category and are joined with lines for better visualization. The category label (see Figure 4) is annotated below the horizontal axis, while the vertical lines simply separate the different categories.
The number of phonemes in each category is shown at the top of the figure between the separation lines. The average value for each category is depicted as a thick horizontal line. In Section 4, we discuss further the consistency between the feature set, as shown in Figure 8, and the results of the k-means algorithm.

Figure 8. Mean normalized values of the 14 feature vectors in the 11 categories, shown as dots joined with lines; thick horizontal lines depict the average feature values in each category. Vertical lines separate the different categories, which are indicated by the letters at the bottom, while the figures at the top designate the number of phonemes in each category.

The clustering results from the k-means algorithm for k = 9 are shown as a matching matrix in Table 1. The rows indicate the 11 ground-truth categories and the columns the k-means output. More than 80% of the A - lambda category has been assigned to cluster 9, together with 45% of B - asymmetric lambda left. The majority of the instances in categories C - asymmetric lambda right and G - linear negative are assigned to cluster 3 (95% and 86%, respectively), showing some similarity in their feature vectors. Similarly, 77% of the D - M type, 81% of the E - M left, and 80% of the K - jump 2 categories are assigned to cluster 1. The two categories H - jump asymmetric lambda right and I - jump M left are placed in cluster 2 (67% for H and 100% for I). Category F - linear positive is spread over clusters 4–7.

In Figure 6, we present a dendrogram derived from the agglomerative clustering algorithm. In order to view the clusters produced by this method, a threshold must be applied to a branch of the tree, so that the sub-branches form the respective clusters.
With the threshold set at 0.39, we get 9 clusters, shown with different colors in Figure 6 (in the electronic version of the paper). The first five clusters stem from the same branch in the two higher levels of the tree and contain mainly phonemes from the modulated group. The remaining four clusters emerge from different branches of the tree and contain phonemes mainly from the pitch jump group.

Table 1. Matching matrix (k-means).

Category/Cluster                    1   2   3   4   5   6   7   8   9
A - Lambda                          0   0   0   0   0   4   0   0   20
B - Asymmetric lambda left          0   0   0   1   0   2   13  0   13
C - Asymmetric lambda right         0   0   20  0   0   5   0   0   1
D - M type                          17  0   3   0   0   0   0   0   2
E - M left                          17  0   4   0   0   0   0   0   0
F - Linear positive                 0   0   0   7   3   2   5   0   0
G - Linear negative                 2   0   12  0   0   2   0   0   0
H - Jump asymmetric lambda right    0   4   1   0   0   1   0   0   0
I - Jump M left                     0   7   0   0   0   0   0   0   0
J - Jump 1                          0   0   0   0   0   0   0   5   1
K - Jump 2                          4   1   0   0   0   0   0   0   0

4. Discussion

In this work, we used unsupervised machine learning techniques to create clusters of phonemes that share similar characteristics. Figure 7 shows that the mean frequency of the J category is statistically higher than the average frequency among all categories. In the matching matrix (Table 1), comparing the k-means output to the ground truth, we can see that 83% of the phonemes of the J category fall in the 8th cluster. From the matching matrix, we also observe that categories D, E, and K are seen by the k-means algorithm mainly as one class. The mean durations of the phonemes in categories D and E are similar, although they are statistically higher than the overall mean duration (Figure 7). Looking at Figure 8, we observe very similar values of the normalized means across the 14 features for categories D, E, and K. Similarly, from the matching matrix we see that categories C and G fall into cluster 3. The normalized mean feature values of categories A, C, and G are also statistically similar.
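A matching matrix like Table 1 can be produced directly from two label sequences. This toy example uses pandas with made-up annotations, not the study's data:

```python
import pandas as pd

# Hypothetical manual categories and k-means cluster ids for ten phonemes.
category = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
cluster = [9, 9, 6, 7, 9, 7, 3, 3, 3, 6]

# Rows: annotated categories; columns: cluster ids; cells: phoneme counts.
matrix = pd.crosstab(pd.Series(category, name="category"),
                     pd.Series(cluster, name="cluster"))
print(int(matrix.to_numpy().sum()))  # 10: every phoneme lands in exactly one cell
```

Row-normalizing such a matrix gives the per-category percentages quoted in the text (e.g. 83% of category J in cluster 8).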
The highest mean feature value appears in category F, and the pitch jump categories appear to have relatively lower mean feature values in relation to the total mean value across all categories. The use of the dendrogram as a visualization and clustering technique adds value to the understanding of the data and of the hierarchy of similarity between phonemes. If the threshold is placed at 0.39, the agglomerative clustering indeed returns results that are consistent with those of the k-means algorithm, as well as with those reported in the literature [3].

Regarding the size of the dataset used in this study, we have to remark that the number of phonemes belonging to the pitch jump group, in particular, is relatively small. These kinds of vocalizations were much rarer, despite the fact that the collected vocalizations arose from several hours of recordings. Nevertheless, we argue that since our results are consistent with those reported in the literature, a richer dataset would have yielded similar results. Note also that although data were recorded both in the audio and the ultrasonic spectrum, in this study we focused primarily on the ultrasonic range. The ultrasonic phonemes, in most cases, were mainly continuous whistles with a mean duration of 500 ms. The sounds with discontinuities (denoted as jumps) are attributed to nonlinearities of the vocal fold system, pertaining to bifurcations. Thus, in our study, the precise representation of the frequency trace (contour) was of primary concern. This is the main reason we developed the contour extraction and processing method described above instead of using other freely available software, such as MUPET [23].
MUPET is a remarkable project, but its contour extraction process, inspired by the workings of the human auditory system, uses a symmetric and evenly distributed Gammatone filterbank (composed of 64 band-pass filters) spanning the whole mouse ultrasonic vocalization (USV) frequency range; in our opinion, this is not fully justified, especially since the upper and lower bounds of the frequency range are modeled by a smaller number of wider filters.

Concluding Remarks

Recordings of the repetitive ultrasonic monochromatic vocalizations of male mice in the presence of a female were made, and from these, phrases were manually isolated and pitch contours extracted. Two methods of unsupervised analysis were performed using 14-element feature vectors, providing evidence that the vocalizations can be grouped into 9 clusters, which supports the claim that there is a distinct repertoire of “syllables” or “phonemes”. Machine learning techniques such as k-means and agglomerative clustering proved suitable for clustering the different pitch patterns based on their reduced feature representations. This work supports the claims made by the scientific community concerning the existence of a distinctive set of mouse ultrasonic vocalizations.

Author Contributions: Conceptualization, S.K. and I.A.; Formal analysis, S.K. and A.N.; Investigation, S.K., A.N., I.A.; Methodology, S.K., A.N., I.A.; Resources, S.K., A.N., I.A.; Software, S.K., A.N.; Writing original draft, S.K., A.N.

Funding: This research received no external funding.

Acknowledgments: All applicable international, national, and/or institutional guidelines for the care and use of animals were followed. This article does not contain any studies with human participants performed by any of the authors.

Conflicts of Interest: The authors declare that they have no conflict of interest.

References

1. Dizinno, G.; Whitney, G.; Nyby, J.
Ultrasonic vocalizations by male mice (Mus musculus) to female sex pheromone: Experiential determinants. Behav. Biol. 1978, 22, 104–113. [CrossRef]
2. Gourbal, B.E.; Barthelemy, M.; Petit, G.; Gabrion, C. Spectrographic analysis of the ultrasonic vocalisations of adult male and female BALB/c mice. Naturwissenschaften 2004, 91, 381–385. [CrossRef] [PubMed]
3. Holy, T.E.; Guo, Z. Ultrasonic songs of male mice. PLoS Biol. 2005, 3, e386. [CrossRef] [PubMed]
4. Hofer, M.A.; Shair, H. Ultrasonic vocalization during social interaction and isolation in 2-week-old rats. Dev. Psychobiol. 1978, 11, 495–504. [CrossRef] [PubMed]
5. Portfors, C.V.; Perkel, D.J. The role of ultrasonic vocalizations in mouse communication. Curr. Opin. Neurobiol. 2014, 28, 115–120. [CrossRef] [PubMed]
6. Kobayasi, K.I.; Ishino, S.; Riquimaroux, H. Phonotactic responses to vocalization in adult Mongolian gerbils (Meriones unguiculatus). J. Ethol. 2014, 32, 7–13. [CrossRef]
7. Hoffmann, F.; Musolf, K.; Penn, D.J. Ultrasonic courtship vocalizations in wild house mice: Spectrographic analyses. J. Ethol. 2012, 30, 173–180. [CrossRef]
8. Ehret, G. Infant rodent ultrasounds: A gate to the understanding of sound communication. Behav. Genet. 2005, 35, 19–29. [CrossRef] [PubMed]
9. Hahn, M.E.; Schanz, N. The effects of cold, rotation, and genotype on the production of ultrasonic calls in infant mice. Behav. Genet. 2002, 32, 267–273. [CrossRef] [PubMed]
10. Hammerschmidt, K.; Radyushkin, K.; Ehrenreich, H.; Fischer, J. Female mice respond to male ultrasonic songs with approach behaviour. Biol. Lett. 2009, 5, 589–592. [CrossRef] [PubMed]
11. Hoffmann, F.; Musolf, K.; Penn, D.J. Spectrographic analyses reveal signals of individuality and kinship in the ultrasonic courtship vocalizations of wild house mice. Physiol. Behav. 2012, 105, 766–771. [CrossRef] [PubMed]
12. D'Amato, F.R. Courtship ultrasonic vocalizations and social status in mice. Anim. Behav. 1991, 41, 875–885. [CrossRef]
13.
Hanson, J.L.; Hurley, L.M. Female presence and estrous state influence mouse ultrasonic courtship vocalizations. PLoS ONE 2012, 7, e40782. [CrossRef] [PubMed]
14. Musolf, K.; Meindl, S.; Larsen, A.L.; Kalcounis-Rueppell, M.C.; Penn, D.J. Ultrasonic vocalizations of male mice differ among species and females show assortative preferences for male calls. PLoS ONE 2015, 10, e0134123. [CrossRef] [PubMed]
15. Liu, R.C.; Miller, K.D.; Merzenich, M.M.; Schreiner, C.E. Acoustic variability and distinguishability among mouse ultrasound vocalizations. J. Acoust. Soc. Am. 2003, 114, 3412–3422. [CrossRef] [PubMed]
16. Kalcounis-Rueppell, M.C.; Petric, R.; Briggs, J.R.; Carney, C.; Marshall, M.M.; Willse, J.T.; Rueppell, O.; Ribble, D.O.; Crossland, J.P. Differences in ultrasonic vocalizations between wild and laboratory California mice (Peromyscus californicus). PLoS ONE 2010, 5, e9705. [CrossRef] [PubMed]
17. Marler, P.R.; Slabbekoorn, H. Nature's Music: The Science of Birdsong; Academic Press: Cambridge, MA, USA, 2004.
18. Kelley, D.B.; Tobias, M.L.; Horng, S.; Ryan, M. Producing and perceiving frog songs: Dissecting the neural bases for vocal behaviors in Xenopus laevis. In Anuran Communication; Ryan, M.J., Ed.; Smithsonian Institution Press: Washington, DC, USA, 2001; pp. 156–166.
19. Au, W.W.; Pack, A.A.; Lammers, M.O.; Herman, L.M.; Deakos, M.H.; Andrews, K. Acoustic properties of humpback whale songs. J. Acoust. Soc. Am. 2006, 120, 1103–1110. [CrossRef] [PubMed]
20. Behr, O.; von Helversen, O. Bat serenades: Complex courtship songs of the sac-winged bat (Saccopteryx bilineata). Behav. Ecol. Sociobiol. 2004, 56, 106–115. [CrossRef]
21. Busnel, R.G. Acoustic Behavior of Animals; Elsevier: Amsterdam, The Netherlands, 1964.
22. Gowda, K.C.; Krishna, G. Agglomerative clustering using the concept of mutual nearest neighbourhood. Pattern Recognit. 1978, 10, 105–112. [CrossRef]
23. Van Segbroeck, M.; Knoll, A.T.; Levitt, P.; Narayanan, S.
MUPET-Mouse Ultrasonic Profile ExTraction: A Signal Processing Tool for Rapid and Unsupervised Analysis of Ultrasonic Vocalizations. Neuron 2017, 94, 465–485. [CrossRef] [PubMed]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
