As most dedicated observers of voting bodies such as the U.S. Supreme Court can attest, vote outcomes can often be guessed from statements that voting members make during deliberations or questioning. We show that this can also be done automatically using machine learning, potentially providing a powerful tool to ordinary citizens. Our working hypothesis is that verbal utterances made by elected representatives during the legislative process can indicate their intent on a future vote, and can therefore be used to predict that vote automatically to a significant degree.
Our main thesis is that computational processing of natural language styles can be accomplished using corpus analysis methods and language transformation rules. We demonstrate this in three steps: statistically modeling natural language styles, developing tools that carry out style processing, and running experiments with those tools and evaluating the results.
The Digital Democracy (DD) platform obtains data about legislative committee hearings, including the video archives and information about the state legislature. Figure 1 shows the design of the DD system. The main source of information for the DD platform is the Cal Channel video archive of legislative sessions, a service provided courtesy of cable TV companies that operate in California.
Automated paraphrasing of natural language text has many interesting applications, from aiding translation to generating language in a better and more appropriate style. In this paper, we are concerned with the problem of picking the best English sentence out of a set of machine-generated paraphrases, each designed to express the same content as a human-generated original. We present a system for scoring sentences based on examples in large corpora.
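As an illustration of what scoring sentences against corpus examples could look like, the following minimal sketch ranks candidate paraphrases by their length-normalized log-probability under a bigram model with add-one smoothing. The bigram model, the whitespace tokenizer, and all function names (`train_bigram_counts`, `score`, `pick_best`) are assumptions made for this example; the scoring system itself is not specified here and may differ substantially.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


def train_bigram_counts(corpus_sentences):
    """Count bigrams and their left-hand contexts in a reference corpus."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        tokens = ["<s>"] + tokenize(sentence) + ["</s>"]
        contexts.update(tokens[:-1])             # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams


def score(sentence, contexts, bigrams, vocab_size):
    """Average add-one-smoothed bigram log-probability of a sentence."""
    tokens = ["<s>"] + tokenize(sentence) + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        log_prob += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    # Normalize by the number of bigrams so longer candidates are not penalized.
    return log_prob / (len(tokens) - 1)


def pick_best(candidates, corpus_sentences):
    """Return the candidate paraphrase that scores highest against the corpus."""
    contexts, bigrams = train_bigram_counts(corpus_sentences)
    vocab = {w for s in corpus_sentences for w in tokenize(s)} | {"</s>"}
    vocab_size = len(vocab) + 1  # one extra slot for unseen words
    return max(candidates, key=lambda c: score(c, contexts, bigrams, vocab_size))
```

Given a reference corpus and a list of candidate strings, `pick_best(candidates, corpus)` returns the candidate whose word sequences most resemble the corpus. A realistic system would need a much larger corpus and a stronger language model than this toy setup.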
Currently, stylistic analysis of natural language texts is carried out with a wide variety of techniques that differ in their algorithms, feature sets and collection methods. Most machine-learning methods rely on feature extraction to model the text and perform classification. But what are the best features for making style-based distinctions? While many researchers have developed particular collections of style features, called style markers, no definitive list exists.
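To make the notion of feature extraction concrete, the sketch below computes a handful of commonly used style markers from raw text: average sentence length, average word length, type-token ratio, and relative frequencies of a few function words. The particular markers, the `FUNCTION_WORDS` list, and the function name `style_markers` are illustrative assumptions, not a recommended or definitive feature set.

```python
import re
from collections import Counter

# Assumption: a tiny, hand-picked function-word list for illustration only.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "but")


def style_markers(text):
    """Map a text to a dictionary of simple numeric style features."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")

    counts = Counter(words)
    features = {
        "avg_sentence_length": len(words) / len(sentences),  # words per sentence
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(counts) / len(words),        # vocabulary richness
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features[f"freq_{fw}"] = counts[fw] / len(words)
    return features
```

The resulting dictionary can be turned into a fixed-order feature vector and passed to any standard classifier. Which markers actually discriminate between styles remains the open question raised above.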
Style transformation is the process by which a text written in one style is transformed into another text exhibiting a distinctly different style, without significant change to the meaning of individual sentences. In this paper, we continue our investigation of the linguistic style transformation problem and demonstrate current achievements in transformation on sample texts from a standard authorship attribution corpus.