Deconstructing Human Assisted Video Transcription and Annotation for Legislative Proceedings

Legislative proceedings are a rich source of multidimensional information that is crucial to citizens and journalists in a democratic system. At present, no fully automated solution can capture all the necessary information from such proceedings. Even if professional-quality automated transcription existed, other tasks, such as identifying speakers or their rhetorical positions, cannot be fully automated. This work focuses on improving and evaluating the Transcription Tool, the transcription software used by the Digital Democracy initiative.

An Empathetic AI Coach for Self-Attachment Therapy

In this work, we present a new dataset and a computational strategy for a digital coach that guides users in practicing the protocols of self-attachment therapy. Our framework augments a rule-based conversational agent with a deep-learning classifier that identifies the underlying emotion in a user's text response, and with a deep-learning-assisted retrieval method that produces novel, fluent and empathetic utterances. We also craft a set of human-like personas that users can choose to interact with.
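The classify-then-retrieve step described above can be sketched roughly as follows. This is a minimal illustration under loudly stated assumptions, not the authors' system: the keyword table `EMOTION_KEYWORDS` is a hypothetical stand-in for the deep-learning emotion classifier, the bag-of-words cosine similarity is a hypothetical stand-in for the deep-learning-assisted retrieval, and the utterances in `CANDIDATES` are invented placeholders. The rule-based dialogue flow and the personas are not modelled.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the deep-learning emotion classifier:
# a tiny keyword lookup that only shows where a classifier would plug in.
EMOTION_KEYWORDS = {
    "sad": {"sad", "down", "lonely", "upset"},
    "happy": {"happy", "glad", "great", "good"},
}

# Hypothetical candidate utterances, grouped by the emotion they respond to.
CANDIDATES = {
    "sad": ["I am sorry you feel down today.",
            "It sounds like you feel lonely right now."],
    "happy": ["I am glad to hear you feel good!", "That is great news."],
    "neutral": ["Thank you for sharing. Shall we start a protocol?"],
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify_emotion(text: str) -> str:
    """Return the first emotion whose keywords appear in the text."""
    tokens = set(tokenize(text))
    for emotion, words in EMOTION_KEYWORDS.items():
        if tokens & words:
            return emotion
    return "neutral"


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def respond(user_text: str) -> str:
    # Step 1: identify the user's emotion.
    emotion = classify_emotion(user_text)
    # Step 2: among utterances for that emotion, retrieve the one whose
    # wording is most similar to the user's message.
    query = Counter(tokenize(user_text))
    return max(CANDIDATES[emotion],
               key=lambda c: cosine(query, Counter(tokenize(c))))
```

For example, `respond("I feel lonely and sad")` returns one of the `"sad"` utterances. Splitting the work into an emotion label followed by retrieval restricts the response to emotionally appropriate candidates before fluency or similarity is considered.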

Multimodal Speaker Identification in Legislative Discourse

A first-of-its-kind platform, Digital Democracy offers a searchable archive of all statements made in state legislative hearings in four US states (California, New York, Texas and Florida), which together account for one third of the US population. The platform aims to increase government transparency in state legislatures. It allows citizens to follow state lawmakers, lobbyists, and advocates as they debate, craft, and vote on policy proposals. State hearings in the US are typically recorded on video and broadcast on cable TV stations, but they are not transcribed or indexed.

Combining Corpus-Based Features for Selecting Best Natural Language Sentences

Automated paraphrasing of natural language text has many applications, ranging from producing better translations to generating text in a more appropriate style. In this paper, we address the problem of selecting the best English sentence from a set of machine-generated paraphrases, each designed to express the same content as a human-written original. We present a system that scores sentences based on examples drawn from large corpora.
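One simple way to score candidate sentences against a corpus is to ask how probable each candidate is under a language model estimated from that corpus. The sketch below uses a single feature, an add-one-smoothed bigram log-probability normalized by sentence length; it is an illustrative assumption, not the paper's scoring system, which combines several corpus-based features. The names `BigramScorer` and `pick_best` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BigramScorer:
    """Scores sentences by add-one-smoothed bigram log-probability per token."""

    def __init__(self, corpus_sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in corpus_sentences:
            toks = ["<s>"] + tokenize(sent) + ["</s>"]
            self.unigrams.update(toks)
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab_size = len(self.unigrams)

    def score(self, sentence):
        toks = ["<s>"] + tokenize(sentence) + ["</s>"]
        logp = 0.0
        for prev, cur in zip(toks, toks[1:]):
            # Add-one smoothing keeps unseen bigrams from scoring -infinity.
            logp += math.log((self.bigrams[(prev, cur)] + 1)
                             / (self.unigrams[prev] + self.vocab_size))
        # Normalize by the number of bigrams so length is not penalized.
        return logp / (len(toks) - 1)


def pick_best(candidates, scorer):
    """Return the candidate paraphrase with the highest corpus score."""
    return max(candidates, key=scorer.score)
```

Given a corpus containing "the cat sat on the mat", a fluent candidate such as "the cat sat on the mat" outscores a scrambled one such as "mat the on sat cat the", because its word transitions are attested in the corpus. A fuller system would combine such a score with other features, for example through a weighted sum.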

Taxonomy and Evaluation of Markers for Computational Stylistics

Currently, stylistic analysis of natural language texts is carried out with a wide variety of techniques that employ many different algorithms, feature sets, and collection methods. Most machine-learning methods rely on feature extraction to model the text and perform classification. But which features are best for making style-based distinctions? While many researchers have developed particular collections of style features, called style markers, no definitive list exists.
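To make the idea of style markers concrete, the sketch below extracts a handful of commonly used ones: average sentence length, type-token ratio, average word length, and relative frequencies of function words. It is an assumed illustration, not the taxonomy evaluated in this work; the short `FUNCTION_WORDS` list is a hypothetical subset, and the regex-based sentence splitting is deliberately naive.

```python
import re
from collections import Counter

# Hypothetical, illustrative subset of function words; stylometric studies
# typically use much longer lists.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it"]


def style_markers(text):
    """Map a text to a dictionary of simple style-marker features."""
    # Naive sentence split on terminal punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    n = len(words) or 1  # avoid division by zero on empty input
    features = {
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        "type_token_ratio": len(counts) / n,
        "avg_word_length": sum(len(w) for w in words) / n,
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features[f"freq_{fw}"] = counts[fw] / n
    return features
```

The resulting dictionary can be turned into a fixed-length vector and passed to any classifier; comparing classifiers trained on different subsets of such features is one way to ask which markers matter most.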

Automatic Source Attribution of Text: A Neural Networks Approach

Recent advances in automatic authorship attribution have been promising. Relatively new techniques such as N-gram analysis have yielded notable gains in accuracy. However, much of the work in this area remains statistical analysis that is better suited to assisting human experts than to autonomous attribution. While neural networks have been applied to this area in the past, those attempts have been extremely limited and problem-specific 171.
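For reference, the N-gram analysis mentioned above can be illustrated with a minimal character n-gram profile method: build a frequency profile for each candidate author's known writing and attribute a disputed text to the author whose profile is most similar by cosine similarity. This is a generic sketch of the N-gram technique, assumed for illustration; it is not the neural network approach this work proposes, and the function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-gram frequency profile, with whitespace normalized."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram profiles."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attribute(disputed, known_texts, n=3):
    """Return the author whose known text has the most similar profile."""
    target = char_ngrams(disputed, n)
    return max(known_texts,
               key=lambda author: cosine(target, char_ngrams(known_texts[author], n)))
```

Usage: `attribute("a quick brown fox", {"A": "the quick brown fox jumps over the lazy dog", "B": "lorem ipsum dolor sit amet"})` selects `"A"`. A method like this produces a similarity ranking that a human analyst still has to interpret, which is the limitation the paragraph above describes.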