Friday 26 February 2016

Text analysis with R, Python, and Spark, using the State of the Union Address and Congressional Hearings





Frank D. Evans, data scientist at Exaptive, provides a conceptual and technical look at text analysis on big data with open source tools.

His fodder is 70 years of State of the Union Addresses, from Truman to Obama, and 20 years of Congressional hearing transcripts. Using R, he performs text clustering to identify trends.

Using Python, he goes a step further to create topic models that identify commonalities and differences between presidencies. Then, combining Python with Spark, he builds topic models on a data set 2,500 times as large.

He explains the statistical methods behind each analysis and shows how he implemented them in code. He also explains how to produce plots, and even live data applications, that let others explore the modeled data.

Be sure to visit our website to learn more about our Big Data Conference, speakers and workshops.

If you happen to see any of Frank’s work online in the future, we highly recommend taking a few minutes and investigating what he has to say. We’re sure you’ll learn a thing or two.

Monday 22 February 2016

Data Science Conference - ODSC


Welcome to our brand new blog! Here you will find some of the biggest and brightest minds sharing what's hot in the Data Science community. We will be posting great content weekly, so be sure to check back soon to see what we have for you!

For more information, visit the Data Science Conference link.

ODSC (Open Data Science Conference) is essential for anyone who wants to connect with the data science community and contribute to the open source applications they use every day.

Our goal is to bring together and cultivate the global data science community to help foster the exchange of innovative ideas and encourage the growth of open source software.