Open Data Science Conference #ODSC: The Library of Python Data Sets

Python’s move towards being a language for data analysis has seen it copy many features from R, a language that was designed for dealing with data.

Such features include the dataframe, the statsmodels package for building linear models, and the Python port of ggplot2. The latest addition to this list is the PyDataset package, a resource modeled on the data sets that come pre-packaged with R.

7 Mistakes to Avoid in Machine Learning

Getting started with Machine Learning is fairly straightforward, thanks to several excellent open source APIs. Mastering the subject, however, takes a deeper level of knowledge.

One facet of that knowledge is learning how to deal with the assumptions and drawbacks of the algorithms being used. In a post for KDnuggets, former Google engineer Cheng-Tao Chu describes seven mistakes that the aspiring Machine Learning expert should avoid.

Among his seven points, Chu talks about picking an evaluation metric that fits the domain in which the model is applied, being aware of outliers and handling them carefully, and avoiding models that tend to overfit when the features outnumber the data points.
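To make two of these points concrete, here is a minimal sketch using only Python's standard library. The labels and values are made up for illustration and do not come from Chu's post; the helper names accuracy and recall are likewise our own. It shows how accuracy can look good on imbalanced data while recall exposes the problem, and how a single outlier shifts the mean far more than the median.

```python
from statistics import mean, median


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical imbalanced problem: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)  # high, despite finding no positives
rec = recall(y_true, y_pred)    # zero: every positive case is missed

# Hypothetical measurements with one extreme outlier.
values = [10, 11, 12, 13, 14, 1000]
mean_value = mean(values)      # pulled strongly towards the outlier
median_value = median(values)  # stays close to the typical values

print(f"accuracy={acc:.2f} recall={rec:.2f}")
print(f"mean={mean_value:.1f} median={median_value:.1f}")
```

If the domain cares about catching the rare positive cases, recall (or a related metric) is the more telling number here, even though the accuracy looks respectable.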



Wednesday, 9 March 2016

