PhD Thesis


  • Title: Analysis and Generative Model for Expressivity. Applied to Speech and Musical Performance.
  • Author: Gregory Beller
  • Supervisor: Xavier Rodet
  • Institution: Paris VI University – IRCAM, Sound Analysis and Synthesis Team.
  • Examining Board
    • Gérard Bailly, reviewer, GIPSA-lab
    • Christophe D’Alessandro, examiner, LIMSI-CNRS
    • Laurence Devillers, reviewer, LIMSI-CNRS
    • Thierry Dutoit, examiner, TCTS
    • Axel Roebel, examiner, IRCAM
    • Xavier Rodet, supervisor, IRCAM
    • Jean-Luc Zarader, examiner, ISIR
  • Defense: Wednesday, June 24, 2009, at IRCAM, Paris
  • Keywords: Emotions, expressivity, artistic performance, musical performance, speech, prosody, speech signal transformation, generative model, machine learning, Bayesian network.
  • Download the Thesis in French

This thesis is part of current research on emotions and emotional reactions, on the modelling and transformation of speech, and on musical performance. The capacity to express, feign, and identify emotions, moods, intentions, or attitudes appears to be fundamental to human communication. The ease with which we understand a character's state, merely by observing the behavior of the actors and the sounds they utter, shows that this source of information is essential, and sometimes sufficient, in our social relationships. Although an emotional state is idiosyncratic, that is, private to each individual, the same is not true of the associated reaction, which manifests itself through gesture (movement, posture, face) and sound (voice, music), and which is observable by others.

That is why the analysis-transformation-synthesis paradigm for emotional reactions is gaining ground in the therapeutic, commercial, scientific, and artistic domains. This thesis falls within the last two of these domains and makes several contributions. From a theoretical point of view, it proposes a definition of expressivity, a definition of neutral expressivity, a new mode of representing expressivity, and a set of expressive categories common to speech and music. It places expressivity within an inventory of the levels of information available in a performance, an inventory that can be seen as a model of artistic performance. It also proposes an original model of speech and its constituents, as well as a new hierarchical prosodic model.

From an experimental point of view, this thesis provides a protocol for the acquisition of performed expressive data. As a by-product, it makes available three corpora for the observation of expressivity. It provides a new statistical measure of the degree of articulation, as well as several analysis results concerning the influence of expressivity on speech.

From a technical point of view, it proposes a speech processing algorithm that modifies the degree of articulation. It presents an innovative database management system that is already used by other automatic speech processing applications requiring the manipulation of corpora. It shows how a Bayesian network can be set up as a generative model of context-dependent transformation parameters. From a technological point of view, an experimental system has been produced for high-quality transformation of the expressivity of a neutral French utterance, either synthetic or recorded, along with an online interface for perceptual tests.

Finally, and most importantly, from a forward-looking point of view, this thesis proposes various directions for future research on the theoretical, experimental, technical, and technological aspects.

Among these, comparing the manifestations of expressivity in speech and in musical performance appears to be a promising avenue.