The extractive summarization task involves identifying the most salient segments or utterances in meeting transcripts. The disfluencies, speaker overlap, and unstructured nature inherent in spontaneous multi-party dialogue make summarization more challenging than in the text domain. In recent years, researchers have used both supervised and unsupervised approaches to tackle this problem.
To the set of features used in the previously proposed baseline system, we added higher-level features such as dialogue act tags and content words. The skewed data distribution problem has been studied previously; in our two-step classification system, we experimented with the effect of resampling the input fed to the classifiers. The resampling is based on the probabilities assigned by a first-stage classifier. In addition, in the absence of manual gold-standard annotations, we proposed an approach that uses the current annotation scheme for the AMI corpus.
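The resampling step can be illustrated with a minimal sketch. It assumes one plausible reading of the description: the first-stage classifier scores every utterance, and only the highest-scoring fraction is passed on to the second stage, which reduces the skew toward non-summary utterances. The function name `resample_by_probability` and the parameter `keep_ratio` are hypothetical and are not taken from the system described here.

```python
def resample_by_probability(instances, first_stage_probs, keep_ratio=0.5):
    """Keep the fraction of instances that the first stage scores highest.

    instances: list of (utterance, label) pairs.
    first_stage_probs: first-stage probability of being summary-worthy,
        one value per instance.
    keep_ratio: hypothetical parameter, the fraction of instances retained.
    """
    if len(instances) != len(first_stage_probs):
        raise ValueError("one probability is required per instance")

    n_keep = max(1, int(len(instances) * keep_ratio))

    # Rank instance indices from most to least probable.
    ranked = sorted(
        range(len(instances)),
        key=lambda i: first_stage_probs[i],
        reverse=True,
    )

    # Restore transcript order for the retained instances.
    kept = sorted(ranked[:n_keep])
    return [instances[i] for i in kept]
```

With six utterances of which two are summary-worthy, keeping the top half by first-stage probability would raise the positive share from one third to two thirds, provided the first stage ranks both positives near the top.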
We also investigated the effect of decision segments on the final evaluation. A separate decision module, integrated into the summarization system, assigns each sentence a probability of being a decision statement. In a final sentence selection scheme, the segments were ranked by their extract-worthiness.
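One way such a selection scheme could combine the two signals is sketched below. The linear interpolation, the `decision_weight` parameter, and the word budget are illustrative assumptions; the text does not specify how the decision probability enters the ranking.

```python
def select_sentences(sentences, summary_scores, decision_probs,
                     word_budget, decision_weight=0.3):
    """Greedily pick sentences by a combined score under a word budget.

    summary_scores: extract-worthiness score per sentence (assumed in [0, 1]).
    decision_probs: decision-module probability per sentence.
    decision_weight: hypothetical interpolation weight for the decision signal.
    """
    # Combined score: linear interpolation of the two signals (an assumption).
    combined = [
        (1 - decision_weight) * s + decision_weight * d
        for s, d in zip(summary_scores, decision_probs)
    ]

    # Visit sentences from highest to lowest combined score.
    order = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)

    chosen, used = [], 0
    for i in order:
        length = len(sentences[i].split())
        if used + length <= word_budget:
            chosen.append(i)
            used += length

    # Return the selected sentences in transcript order.
    return [sentences[i] for i in sorted(chosen)]
```

Setting `decision_weight` to zero reduces this to ranking by extract-worthiness alone, which gives a simple way to measure how much the decision signal changes the selected summary.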