Machine Learning

Predicting the Vote Using Legislative Speech

As most dedicated observers of voting bodies like the U.S. Supreme Court can attest, it is possible to guess vote outcomes from statements the voting members make during deliberations or questioning. We show that this can also be done automatically using machine learning, potentially giving ordinary citizens a powerful tool. Our working hypothesis is that verbal utterances made by elected representatives during the legislative process can indicate their intent on a future vote, and can therefore be used to predict that vote automatically to a significant degree.
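To make the idea concrete, here is a minimal sketch that assumes a multinomial Naive Bayes classifier over bag-of-words features. The abstract does not say which model or features were used, so the class `NaiveBayesVotePredictor` and the four training utterances are hypothetical toy illustrations, not the study's method or data.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenization; a real system would do much more.
    return text.lower().split()


class NaiveBayesVotePredictor:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, utterances, votes):
        self.labels = sorted(set(votes))
        self.prior = Counter(votes)
        self.n_docs = len(votes)
        # Word counts per vote label, pooled over all of that label's utterances.
        self.word_counts = {label: Counter() for label in self.labels}
        for text, vote in zip(utterances, votes):
            self.word_counts[vote].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        score = math.log(self.prior[label] / self.n_docs)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text):
        # Choose the label with the highest posterior log-probability.
        return max(self.labels, key=lambda label: self.log_score(text, label))


# Toy, invented utterances labeled with the speaker's eventual vote.
utterances = [
    "I strongly support this bill and urge an aye vote",
    "this bill will help families and I support it",
    "I have serious concerns and cannot support this measure",
    "this measure is flawed and I oppose it",
]
votes = ["aye", "aye", "no", "no"]

model = NaiveBayesVotePredictor().fit(utterances, votes)
print(model.predict("I support this bill"))     # aye
print(model.predict("I oppose this measure"))   # no
```

Naive Bayes is used here only because it is a compact, well-understood text classifier; any supervised classifier over utterance features would fit the same hypothesis.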

Computational Style Processing (Ph.D. Dissertation)

Our main thesis is that computational processing of natural language styles can be accomplished using corpus analysis methods and language transformation rules. We demonstrate this first by statistically modeling natural language styles, second by developing tools that carry out style processing, and finally by running experiments with those tools and evaluating the results.

Digital Democracy Project: Making Government More Transparent One Video at a Time

The Digital Democracy (DD) platform obtains data about legislative committee hearings, including video archives and information about the state legislature. Figure 1 shows the design of the DD system. The main source of information for the DD platform is the Cal Channel video archive of legislative sessions, a service provided courtesy of the cable TV companies operating in California.

Combining Corpus-Based Features for Selecting Best Natural Language Sentences

Automated paraphrasing of natural language text has many interesting applications, ranging from aiding in better translation to generating language in a better and more appropriate style. In this paper, we address the problem of picking the best English sentence out of a set of machine-generated paraphrases, each designed to express the same content as a human-generated original. We present a system that scores sentences based on examples in large corpora.
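As a rough illustration of corpus-based scoring, the sketch below assumes a single feature: a bigram language model with add-one smoothing, normalized by sentence length. The paper combines several corpus-based features that are not reproduced here; `BigramScorer`, `pick_best`, and the three-sentence corpus are hypothetical stand-ins for a large corpus.

```python
import math
from collections import Counter


def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))


class BigramScorer:
    """Hypothetical scorer: length-normalized bigram log-probability."""

    def __init__(self, corpus_sentences):
        self.unigrams = Counter()
        self.bigram_counts = Counter()
        for sentence in corpus_sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigram_counts.update(bigrams(tokens))
        self.vocab_size = len(self.unigrams)

    def score(self, sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        pairs = bigrams(tokens)
        total = 0.0
        for prev, word in pairs:
            # P(word | prev) with add-one smoothing.
            total += math.log((self.bigram_counts[(prev, word)] + 1)
                              / (self.unigrams[prev] + self.vocab_size))
        # Divide by the number of bigrams so longer candidates are not penalized.
        return total / len(pairs)


def pick_best(candidates, scorer):
    return max(candidates, key=scorer.score)


corpus = [
    "the committee approved the bill",
    "the committee approved the budget",
    "the senate approved the bill",
]
scorer = BigramScorer(corpus)
candidates = ["the committee approved the bill", "bill the approved committee the"]
print(pick_best(candidates, scorer))  # the committee approved the bill
```

Length normalization matters when paraphrase candidates differ in length, since raw log-probabilities always favor shorter sentences.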

Taxonomy and Evaluation of Markers for Computational Stylistics

Currently, stylistic analysis of natural language texts is carried out with a wide variety of techniques employing many different algorithms, feature sets, and collection methods. Most machine-learning methods rely on feature extraction to model the text and perform classification. But which features are best for making style-based distinctions? Although many researchers have developed their own collections of style features, called style markers, no definitive list exists.
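The sketch below shows what extracting a handful of commonly used style markers into a feature vector might look like. The specific markers and the short `FUNCTION_WORDS` list are illustrative assumptions; they are not the taxonomy proposed in the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately short function-word list.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is")


def style_markers(text):
    """Return a dict of simple lexical and syntactic style markers."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word and sentence")
    counts = Counter(words)
    n = len(words)
    features = {
        "avg_word_length": sum(len(w) for w in words) / n,
        "type_token_ratio": len(counts) / n,  # vocabulary richness
        "avg_sentence_length": n / len(sentences),  # in words
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features[f"freq_{fw}"] = counts[fw] / n
    return features


features = style_markers("The cat sat on the mat. It was happy.")
print(features["avg_sentence_length"])  # 4.5
print(features["avg_word_length"])      # 3.0
```

Vectors like this are what a downstream classifier would consume; comparing which markers actually separate styles is the evaluation question the paper raises.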

Automatic Synonym and Phrase Replacement Shows Promise for Style Transformation

Style transformation refers to the process by which a text written in one style is transformed into another text exhibiting a distinctly different style, without significantly changing the meaning of individual sentences. In this paper we continue our investigation of the linguistic style transformation problem and demonstrate our current results on sample texts from a standard authorship attribution corpus.
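A minimal sketch of synonym and phrase replacement, assuming a hand-built substitution table that maps source-style expressions to target-style ones. The paper's actual lexical resources and replacement criteria are not described here, so `transform_style` and the three-entry table are hypothetical.

```python
import re


def transform_style(text, substitutions):
    """Replace words/phrases using a source-to-target style table.

    Table keys are assumed to be lowercase. Longer phrases are tried
    first so multi-word entries take priority over single words.
    """
    keys = sorted(substitutions, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b", re.IGNORECASE
    )

    def replace(match):
        original = match.group(0)
        replacement = substitutions[original.lower()]
        # Preserve an initial capital from the source text.
        if original[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    return pattern.sub(replace, text)


# Hypothetical informal-to-formal table.
table = {"get": "obtain", "start": "commence", "find out": "ascertain"}
result = transform_style("Start now and find out what we can get.", table)
print(result)  # Commence now and ascertain what we can obtain.
```

Blind substitution like this ignores word sense and agreement, which is exactly why meaning preservation has to be checked when evaluating style transformation.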