Friday 26 February 2016

Text analysis with R, Python, and Spark, using the State of the Union Address and Congressional Hearings





Frank D. Evans, data scientist at Exaptive, provides a conceptual and technical look at text analysis on big data with open source tools.

His source material is 70 years of State of the Union Addresses, from Truman to Obama, and 20 years of Congressional hearing transcripts. Using R, he performs text clustering to identify trends.

Using Python, he goes a step further and creates topic models that identify commonalities and differences between presidencies. Then, combining Python with Spark, he builds topic models on a data set 2,500 times as large.

He explains the statistical methods behind each analysis and shows how he implemented them in code. He also explains how to produce plots, or even live data applications, that let others explore the modeled data.
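
To give a rough feel for the text clustering idea mentioned above, here is a minimal sketch in Python using only the standard library. It is not Frank's code (his clustering is done in R). The four toy "documents", the function names, and the 0.3 similarity threshold are all assumptions made up for illustration. The sketch weights words with TF-IDF, compares documents by cosine similarity, and groups them with a simple greedy "leader" clustering pass, which is a swapped-in technique chosen for brevity rather than the method used in the talk.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: tf-idf weight} dictionary."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def leader_clusters(vectors, threshold=0.3):
    """Greedy leader clustering: each document joins the first cluster whose
    leader (its first member) is similar enough, else starts a new cluster."""
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical toy documents, not taken from any real speech or hearing.
docs = [
    "economy jobs taxes growth",
    "taxes economy budget jobs",
    "war troops security peace",
    "security troops war allies",
]

vectors = tfidf_vectors(docs)
clusters = leader_clusters(vectors)
print(clusters)  # [[0, 1], [2, 3]]
```

The two economy-themed documents end up in one group and the two security-themed documents in another. On real speeches you would also want proper tokenization, stop-word removal, and a clustering method that is less sensitive to document order.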

Be sure to visit our website to learn more about our Big Data Conference, speakers and workshops.

If you happen to see any of Frank’s work online in the future, we highly recommend taking a few minutes and investigating what he has to say. We’re sure you’ll learn a thing or two.
