Open Data Science Conference #ODSC: August 2016

Thursday, 18 August 2016

Data Science in Fantasy Football

With football season just three weeks away, millions of fantasy football players are obsessively preparing for their league drafts. They’re reading every guide on the web, participating in daily mock drafts, and engaging in superstitious rituals. Some players have gone beyond the traditional approach and have turned to data science to gain that competitive edge.

Fantasy sports are a perfect subject for data science: they’re stats-heavy, and the main way to win is predicting which players will have great performances. Yet the fact that the two are a match doesn’t guarantee that your average data scientist will be able to conquer this multi-billion dollar industry.

Three years ago Boris Chen, then a data scientist at the New York Times, published a post detailing his method of bringing machine learning to fantasy football. His goal was to come up with a way to rank players at various positions. With data from fantasypros.com, he used a clustering algorithm called a Gaussian mixture model to build a new and improved system of rankings. Chen’s complaint was that most rankings “do not illustrate the true distance between players”, which led him to choose this particular model because it groups players into tiers according to how far apart they really are.
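To make the tiering idea concrete, here is a minimal sketch of a one-dimensional Gaussian mixture model fitted with expectation-maximization. It is not Chen's code: the average ranks below are invented for illustration, the number of tiers is picked by hand, and the function names (fit_gmm_1d, assign_tiers) are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, k, iterations=100, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances).
    """
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial means across the sorted data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(values) / n
    overall_var = sum((v - overall_mean) ** 2 for v in values) / n
    variances = [max(overall_var / k, min_var)] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: how responsible is each component for each value?
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted values.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2
                         for r, x in zip(resp, values)) / nj
            variances[j] = max(spread, min_var)
    return weights, means, variances


def assign_tiers(values, k):
    """Label each value with a tier; tier 1 has the lowest (best) mean rank."""
    weights, means, variances = fit_gmm_1d(values, k)
    order = sorted(range(k), key=lambda j: means[j])
    tier_of = {component: tier + 1 for tier, component in enumerate(order)}
    tiers = []
    for x in values:
        dens = [weights[j] * normal_pdf(x, means[j], variances[j])
                for j in range(k)]
        tiers.append(tier_of[max(range(k), key=lambda j: dens[j])])
    return tiers


# Hypothetical average expert ranks for ten quarterbacks (not real data).
average_ranks = [1.2, 1.9, 2.4, 6.8, 7.1, 7.9, 8.3, 14.0, 14.6, 15.5]
tiers = assign_tiers(average_ranks, k=3)
for rank, tier in zip(average_ranks, tiers):
    print(f"average rank {rank:5.1f} -> tier {tier}")
```

The point of the sketch is that players whose average ranks sit close together end up in the same tier, while a large gap in rank starts a new one. A plain ordered list hides that distinction.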

The following chart shows the results of Chen’s algorithm on the Quarterback position by illustrating the tiers of the players in a clear manner.

To a casual fantasy football player, this graphic provides a simple guide to selecting a quarterback, especially for those players who aren’t obvious stars.

What if you’re an expert data scientist, confident that you can build models that will win not only leagues but actual hard cash in a play-for-pay competition like FanDuel or DraftKings? You might want to tone down your expectations.

A Bloomberg article titled “You Aren’t Good Enough To Win Money Playing Daily Fantasy Football” bluntly explains why you shouldn’t get your hopes up: “Only the top 1.3 percent of players finished in the green during the three months measured by the Sport Business Journal. An unrelated survey of more than 1,400 fantasy sports players conducted by Krejcik of Eilers Research this summer found that 70 percent of participants have lost money.”

It’s unlikely you’ll be able to quit your day job and become a full-time “Fantasy Football Data Scientist”, so you’ll just have to settle for bragging rights over friends and coworkers.

Find this article useful?

Learn more about this topic and others just like it at upcoming Data Science Conferences this year.

Applying Machine Learning and Science to Trading Decisions

Sean Kruzel, founder and C.E.O. of Astrocyte Research, spoke at the most recent Boston ODSC Meetup about using Machine Learning in the investment industry. His first key point stressed that the type of data used in finance is different from that usually used in Machine Learning.

Typical Machine Learning problems use 'stale' and static temporal data, while finance uses evolving data with an adversarial component due to the nature of the markets. Thus the scientific process in finance is harder to implement, and relies on prior beliefs and flexibility.

A lot of Machine Learning investing comes from robo-advisors. While this product is very scalable, it is overly simplistic, relying on traditional techniques like linear regression to estimate factors like risk and return.

For one, correlations expose only superficial relationships within stock data. More importantly, using decades of historical stock data to make future predictions is flawed, as the forces driving the market years ago may have no relationship to today's catalysts.

How then is Machine Learning used in finance? Three areas where it can be applied include the evaluation of portfolio managers, deciding when to enter or exit trades, and converting forecasts into investments. A Bayesian Machine Learning framework works well in tandem with a traditional investment metric like the Sharpe Ratio, the ratio of expected return to expected risk.
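As a rough illustration of the metric, the sketch below computes a Sharpe Ratio from a list of periodic returns. It assumes the common textbook form (mean excess return divided by the sample standard deviation of excess returns); the return figures are invented, and the risk_free_rate and periods_per_year parameters are illustrative choices, not something described in the talk.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=1):
    """Mean excess return divided by its sample standard deviation.

    Setting periods_per_year (e.g. 12 for monthly returns) annualizes the
    ratio by multiplying it by the square root of that number.
    """
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.mean(excess)
    volatility = statistics.stdev(excess)
    return mean_excess / volatility * math.sqrt(periods_per_year)


# Hypothetical per-period returns (not real data).
example_returns = [0.01, 0.03, -0.01, 0.02, 0.00]
print(f"Sharpe Ratio: {sharpe_ratio(example_returns):.3f}")
```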

In evaluating portfolio managers, the target Sharpe Ratio and the frequency of evaluation depend on whether one is an asset manager or a hedge fund manager. The key task is a classification problem: calculating the probability that a portfolio manager achieves a given Sharpe Ratio within a certain time interval.
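One simple way to put a number on that probability is sketched below. This is an assumption-laden illustration, not the method presented in the talk: it assumes normally distributed returns, a flat prior on the mean return, and volatility treated as known and equal to the sample value. Under those assumptions the posterior of the per-period Sharpe Ratio is approximately normal, centered on the observed ratio with standard deviation 1/sqrt(n). The function name prob_sharpe_above is hypothetical.

```python
import math
import statistics


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_sharpe_above(returns, target_sharpe):
    """Approximate posterior probability that the true Sharpe Ratio exceeds
    target_sharpe, under the simplifying assumptions stated above."""
    n = len(returns)
    observed = statistics.mean(returns) / statistics.stdev(returns)
    posterior_sd = 1.0 / math.sqrt(n)
    return 1.0 - normal_cdf((target_sharpe - observed) / posterior_sd)


# Hypothetical per-period returns for one manager (not real data).
manager_returns = [0.01, 0.03, -0.01, 0.02, 0.00]
for target in (0.0, 0.5, 1.0):
    p = prob_sharpe_above(manager_returns, target)
    print(f"P(Sharpe > {target}) = {p:.2f}")
```

With only a handful of observations the posterior is wide, so even a manager with a respectable observed ratio gets only a modest probability of clearing a demanding target. That is one reason the evaluation frequency matters.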

Replacing the Sharpe Ratio with the Bayes Factor in trade entries and exits means that traditional methods like tracking losses over time and allocating a dollar value to risk can be enhanced by a Bayesian classification framework. The process for evaluating forecasts is similar.
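For readers unfamiliar with the Bayes Factor, here is a minimal sketch of the idea applied to a trade: it compares how well two competing hypotheses explain the returns seen so far. The setup is an assumption made for illustration, not the framework from the talk: both hypotheses are normal distributions with the same known volatility, one with a positive mean (the trade has an edge) and one with a zero mean (it does not). All names and numbers are hypothetical.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def bayes_factor(returns, mean_if_edge, mean_if_no_edge, volatility):
    """Likelihood of the returns under 'edge' divided by the likelihood
    under 'no edge'. Values above 1 favor the edge hypothesis."""
    log_bf = sum(
        log_normal_pdf(r, mean_if_edge, volatility)
        - log_normal_pdf(r, mean_if_no_edge, volatility)
        for r in returns
    )
    return math.exp(log_bf)


# Hypothetical daily returns since the trade was opened (not real data).
trade_returns = [0.02, 0.03, -0.01, 0.015]
bf = bayes_factor(trade_returns, mean_if_edge=0.01, mean_if_no_edge=0.0,
                  volatility=0.02)
print(f"Bayes Factor (edge vs. no edge): {bf:.2f}")
```

An exit rule built on this would close the trade once the factor drops below some chosen threshold, instead of waiting for a fixed dollar loss.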

There are more applications to speak of, and more will soon join the fold. In the future Machine Learning could even be applied to automated trade idea generation.

Find this article helpful?

Learn more about this topic and others just like it at ODSC's Machine Learning Conference in California this year.

Wednesday, 17 August 2016

Open Science in Government

The Open Data Science Conference Chicago Meetup recently welcomed Tom Schenk, Chief Data Officer of the city of Chicago, to speak about how his office works with open data, and how it is improving the lives of citizens. Of the initiatives developed to promote open data, the one called Initiative C stands out most clearly. Its goal is to leverage technology to make government more efficient, effective, and open.

One success of this initiative is the presence of bus data on Chicago's open data portal. This data is a possible starting point for projects. In this instance, a platform was developed to estimate traffic congestion using data from the GPS devices located on each city bus.
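The general idea of estimating congestion from bus GPS data can be sketched as follows. This is not the city's implementation: the ping format, the segment IDs, and the 10 mph threshold are all assumptions made for illustration. The sketch computes the speed between consecutive pings from the same bus on the same street segment, averages those speeds per segment, and flags slow segments.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def segment_speeds(pings):
    """Average speed in mph per segment.

    pings: iterable of (bus_id, segment_id, seconds, lat, lon) tuples,
    a hypothetical format chosen for this sketch.
    """
    by_bus = defaultdict(list)
    for bus_id, segment_id, seconds, lat, lon in pings:
        by_bus[bus_id].append((seconds, segment_id, lat, lon))

    speeds = defaultdict(list)
    for track in by_bus.values():
        track.sort()  # order each bus's pings by time
        for (t0, seg0, lat0, lon0), (t1, seg1, lat1, lon1) in zip(track, track[1:]):
            if seg0 != seg1 or t1 <= t0:
                continue  # only compare pings within the same segment
            hours = (t1 - t0) / 3600
            speeds[seg0].append(haversine_miles(lat0, lon0, lat1, lon1) / hours)
    return {seg: sum(v) / len(v) for seg, v in speeds.items()}


def congested_segments(pings, threshold_mph=10.0):
    """Segments whose average bus speed falls below the threshold."""
    return sorted(seg for seg, mph in segment_speeds(pings).items()
                  if mph < threshold_mph)


# Hypothetical pings: bus A crawls along segment S1, bus B moves freely on S2.
example_pings = [
    ("A", "S1", 0, 41.8800, -87.6300),
    ("A", "S1", 60, 41.8810, -87.6300),
    ("B", "S2", 0, 41.9000, -87.6500),
    ("B", "S2", 60, 41.9050, -87.6500),
]
print(congested_segments(example_pings))
```

A production system would also have to separate slow traffic from buses that are simply stopped to pick up passengers, which this sketch ignores.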

Although this information and much more is available on the city's open data website, it is even more convenient to access through APIs, and there are packages available that provide this access point. Public contribution is encouraged by making these libraries open source.
The invitation to participate goes even further.

Analyses done by the city on issues such as water quality and food inspections are also publicly available. Since the data these analyses are built on is on the open data portal, interested individuals can pick apart the analysis from data exploration and munging to model building. Comments and criticism are welcome in the form of words or even a citizen's own model.

This is just an overview of how the city of Chicago is working to make itself more data-driven and to reap the benefits that come from this stance.

Tuesday, 16 August 2016

Open Source Bioinformatics for Data Scientists

Last night was the ODSC Launch Meetup covering Open-Source Bioinformatics for Data Scientists with Amanda Schierz.

Amanda Schierz is a widely recognised data scientist and currently the #1 Kaggler in the United Kingdom. The Meetup was really informative and engaging, and a great community of people was involved. It was such a change from the standard Meetups I usually attend.

The reason this was so different from any other Data Science Meetup I have been to is that it was heavily focused on the medical and health fields. The focus of this talk was on Early-stage Drug Discovery, an ideal area where predictive analytics can have a profound impact. She presented multiple use cases such as druggability predictions, network analysis to explain drug resistance, and the prediction of candidate chemical compounds.

Amanda talked a lot about cancer research and the battle that data scientists and researchers face in communicating the information. It's often the case that people in medical fields don't trust the data, as they are used to doing tests on animals or live subjects to get results.

What Amanda was pushing was that we can now use technology to our advantage so that we no longer have to test on animals or humans. We can now predict things in any direction with multiple outcomes simply by understanding the data created, which was fascinating.

As for the event itself, it was well run, in a very cool startup hub located in East London called the Rise. The venue is almost secret unless you know about it: it is hidden behind a hipster eatery and up a set of stairs. Once you're inside, the space is massive and has a great vibe, full of startup teams and entrepreneurs.

The data science networking was great; as it was the launch of ODSC, there were brand new faces and businesses. As it was a health-related talk, lots of medical companies and research groups were there as well, which was a change from the usual Data Science crowd. All in all, it was a really great night, with the potential to meet up outside the event with some like-minded individuals I may want to partner with on projects.