Are you trying to implement a machine learning algorithm to classify documents? Need to determine the intent of a sentence to use in a chatbot? You might be asking yourself the same question: how do I convert text into a form that my machine learning algorithm can use? In the following post we will go over a simple-to-use model for converting sentences into vectors, called the Bag of Words model. We will implement this algorithm in Python from scratch, and then we will use scikit-learn's built-in functions to vectorize sentences.
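
To make the idea concrete, here is a minimal sketch of a Bag of Words vectorizer in plain Python. It is not the implementation from the full post; the helper names (`tokenize`, `build_vocabulary`, `vectorize`), the lowercase regex tokenizer and the two example sentences are all assumptions made for illustration.

```python
import re


def tokenize(sentence):
    # Hypothetical tokenizer: lowercase, keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def build_vocabulary(sentences):
    # The vocabulary is the sorted set of every token seen in the corpus.
    return sorted({token for s in sentences for token in tokenize(s)})


def vectorize(sentence, vocabulary):
    # Count how often each vocabulary word occurs in the sentence.
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for token in tokenize(sentence):
        if token in index:  # words outside the vocabulary are ignored
            vector[index[token]] += 1
    return vector


# Made-up example corpus.
sentences = ["the cat sat on the mat", "the dog sat"]
vocabulary = build_vocabulary(sentences)
vectors = [vectorize(s, vocabulary) for s in sentences]

print(vocabulary)
print(vectors)
```

Each sentence becomes a vector with one slot per vocabulary word, so word order is discarded and only counts remain. In scikit-learn, the `CountVectorizer` class does a comparable job.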

Feature selection, also known as variable selection or attribute selection, is an important part of building machine learning models. As the saying goes: garbage in, garbage out. Training your algorithms with irrelevant features will hurt the performance of your model. Choosing or engineering the right features is often what separates the best performing models from the rest.

Building neural networks is a complex endeavor with many parameters to tweak prior to achieving the final version of a model. On top of this, the two most widely used numerical platforms for deep learning and neural network machine learning models, TensorFlow and Theano, are too complex to allow for rapid prototyping. The Keras Deep Learning library for Python helps bridge the gap between prototyping speed and the utilization of the advanced numerical platforms for deep learning. Keras is a high-level API for building neural networks that run on top of TensorFlow, Theano or CNTK. It allows for rapid prototyping, supports both recurrent and convolutional neural networks and runs on either your CPU or GPU for increased speed.

If you have been using machine learning, you will sooner rather than later realize that machine learning algorithms require numerical inputs. Unfortunately for us, our features come in various forms. Some are continuous, others are categorical, in numeric or text format. Machine learning algorithms cannot work with variables in text form, so we must perform certain preprocessing steps to get our data into the right format. How do we deal with these categorical variables? Worry no more! In this blog post I will explain how to deal with them by using a technique known as one-hot encoding.
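
As a rough illustration of the technique, the sketch below one-hot encodes a small categorical column in plain Python. The function name `one_hot_encode` and the color data are hypothetical, and the full post may well use a library instead.

```python
def one_hot_encode(values):
    # One column per distinct category, in sorted order.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one "hot" entry per row
        encoded.append(row)
    return categories, encoded


# Made-up categorical feature in text form.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Every text value is replaced by a row of zeros with a single one marking its category, which gives the algorithm numerical inputs without implying any ordering between categories.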

Probability distributions are a powerful tool for modeling random processes. They are widely used in statistics, simulations, engineering and various other settings. I have had to use them in various projects to correctly model randomness. There are many probability distributions to choose from, ranging from the well-known normal distribution to others such as the logistic and Weibull. A problem I have repeatedly faced is finding an easy-to-use tool to quickly fit the best distribution to my data and then use that best-fit distribution to generate random numbers. Once again Python shows its flexibility for data science with its SciPy package, one of the main Python packages for mathematics, science and engineering. We will be using the SciPy package to tackle this task.
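
One possible shape for that workflow is sketched below with `scipy.stats`. The candidate list, the synthetic data and the use of the Kolmogorov-Smirnov statistic to rank fits are assumptions made for this sketch, not necessarily the approach taken in the full post.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only (drawn from a Weibull distribution).
rng = np.random.default_rng(0)
data = stats.weibull_min.rvs(1.5, scale=2.0, size=500, random_state=rng)

# Candidate distributions, keyed by their scipy.stats names.
candidates = {
    "norm": stats.norm,
    "logistic": stats.logistic,
    "weibull_min": stats.weibull_min,
}

results = {}
for name, dist in candidates.items():
    try:
        params = dist.fit(data)  # maximum likelihood estimates: shapes, loc, scale
    except Exception:
        continue  # skip a candidate whose fit fails
    # Assumption: rank candidates by the Kolmogorov-Smirnov statistic (lower is better).
    ks_stat = stats.kstest(data, name, args=params).statistic
    results[name] = (ks_stat, params)

best_name = min(results, key=lambda n: results[n][0])
best_params = results[best_name][1]

# Generate new random numbers from the best-fit distribution.
samples = candidates[best_name].rvs(*best_params, size=5, random_state=rng)
print(best_name, best_params)
print(samples)
```

Other ranking criteria, such as the sum of squared errors against a histogram or an information criterion, would slot into the same loop.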

In the previous post we discussed the theory and history behind the perceptron algorithm developed by Frank Rosenblatt. Even though this is a very basic algorithm, capable only of modeling linear relationships, it serves as a great starting point for understanding neural network machine learning models. In this post, we will implement this basic perceptron in Python.

The perceptron is a supervised learning algorithm used for binary classification. It is one of the oldest algorithms in machine learning, going back to the 1950s, and it has inspired many of the state-of-the-art algorithms used today.
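
For readers who want a feel for the algorithm, here is a minimal perceptron sketch in plain Python, trained on the logical AND function. The function names, the step threshold at zero, the learning rate and the toy data are assumptions chosen for illustration rather than the code from the posts above.

```python
def predict(weights, bias, x):
    # Step activation: output 1 when the weighted sum reaches zero, else 0.
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation >= 0 else 0


def train(samples, labels, learning_rate=1.0, epochs=20):
    # Start from zero weights and nudge them after every misclassification.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0 or 1
            bias += learning_rate * error
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
    return weights, bias


# Toy binary classification task: logical AND, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(weights, bias, predictions)
```

Because AND is linearly separable, the update rule settles on a separating line after a few passes over the data. A function such as XOR, which is not linearly separable, would never converge with a single perceptron.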

Machine learning is big, it’s growing, and it’s here to stay.
