Projects

Sense Your City

As a child I suffered from asthma. Anyone who has gone through a similar experience becomes acutely aware of the importance of the quality of the air we breathe. While looking for data-related hackathons, I came across an awesome dataset from Sense Your City, a pretty exciting project put together by Data Canvas. The dataset contains measurements such as temperature and dust level, collected by citizens' DIY sensors. I live in the Bay Area now, so I got curious and started exploring the data from sensors located in San Francisco. The code is on GitHub.

Spam Detection

The goal of this project is to classify emails as spam or not spam. It was a great learning experience in how to apply the results of machine learning models in real life. For example, it is useful to distinguish spam from good emails. However, it is even more important to ensure that good emails are never classified as spam, that is, to keep false positives as close to zero as possible. The code is on GitHub.
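The trade-off described above can be sketched as a choice of decision threshold. This is only an illustration, not the project's actual code: the function names, the scores, and the labels below are all made up, and it assumes a model that outputs a spam probability for each email.

```python
def false_positive_rate(scores, labels, threshold):
    """Share of good emails (label 0) that get flagged as spam."""
    ham_scores = [s for s, y in zip(scores, labels) if y == 0]
    flagged = sum(1 for s in ham_scores if s >= threshold)
    return flagged / len(ham_scores)


def recall(scores, labels, threshold):
    """Share of spam emails (label 1) that get caught."""
    spam_scores = [s for s, y in zip(scores, labels) if y == 1]
    caught = sum(1 for s in spam_scores if s >= threshold)
    return caught / len(spam_scores)


def pick_threshold(scores, labels, max_fpr=0.0):
    """Lowest threshold whose false positive rate stays within max_fpr."""
    for t in sorted(set(scores)):
        if false_positive_rate(scores, labels, t) <= max_fpr:
            return t
    return float("inf")  # no threshold qualifies: flag nothing


# Hypothetical spam probabilities and true labels (1 = spam, 0 = good email).
scores = [0.05, 0.20, 0.45, 0.60, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1]

threshold = pick_threshold(scores, labels)
print(threshold, recall(scores, labels, threshold))
```

With these made-up numbers, insisting on zero false positives means one spam email slips through. That is the price of never losing a good email.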



Yummly

This is another interesting Kaggle project. The Yummly website is a repository of recipes. The Kaggle data set consists of the ingredients in each recipe and the cuisine to which the recipe belongs. Using NLP techniques, I processed the ingredient lists and trained a model on them. This allows us to predict which cuisine a new recipe belongs to based on its ingredients. The code is on GitHub.
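To give a feel for the idea, here is a minimal sketch that treats each recipe as a bag of ingredient words and fits a Naive Bayes classifier with add-one smoothing. It is not necessarily the model used in the project; the class name, the helper, and the four toy recipes are assumptions made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(ingredients):
    """Lowercase each ingredient and split it into word tokens."""
    tokens = []
    for item in ingredients:
        tokens.extend(re.findall(r"[a-z]+", item.lower()))
    return tokens


class NaiveBayesCuisine:
    """Multinomial Naive Bayes over ingredient words (illustrative only)."""

    def fit(self, recipes, cuisines):
        self.class_counts = Counter(cuisines)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for ingredients, cuisine in zip(recipes, cuisines):
            tokens = tokenize(ingredients)
            self.token_counts[cuisine].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, ingredients):
        tokens = tokenize(ingredients)
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for cuisine, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(count / total)
            denom = sum(self.token_counts[cuisine].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.token_counts[cuisine][tok] + 1) / denom)
            if score > best_score:
                best, best_score = cuisine, score
        return best


# Toy training data, invented for this example.
recipes = [
    ["soy sauce", "ginger", "rice"],
    ["rice", "sesame oil", "soy sauce"],
    ["olive oil", "tomato", "basil"],
    ["pasta", "tomato", "garlic"],
]
cuisines = ["chinese", "chinese", "italian", "italian"]

model = NaiveBayesCuisine().fit(recipes, cuisines)
prediction = model.predict(["Soy Sauce", "garlic", "rice"])
print(prediction)
```

A real data set would need more careful text cleaning, for example merging plural forms or dropping brand names, before counting words.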


Titanic

This was my first Kaggle project and one I am particularly proud of building. I enjoyed doing exploratory data analysis (EDA) on this data set. I also learned to apply various techniques for dealing with missing values. The data set has information about the passengers on the infamous shipwreck and indicates whether or not each passenger survived. Using this data, I built a model that predicts the likelihood of survival from the available features. The code is on GitHub.
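One common way to deal with missing values is to fill numeric columns with the median and categorical columns with the most frequent value. The sketch below shows that idea only; it is not a record of what the project did, and the `impute` helper and the four sample rows are hypothetical.

```python
from statistics import median, mode


def impute(rows, numeric_cols, categorical_cols):
    """Return copies of rows with None replaced by the column median or mode."""
    fill = {}
    for col in numeric_cols:
        fill[col] = median(r[col] for r in rows if r[col] is not None)
    for col in categorical_cols:
        fill[col] = mode(r[col] for r in rows if r[col] is not None)
    return [
        {k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
        for r in rows
    ]


# Made-up passenger rows; None marks a missing value.
passengers = [
    {"age": 22.0, "embarked": "S"},
    {"age": None, "embarked": "C"},
    {"age": 38.0, "embarked": None},
    {"age": 26.0, "embarked": "S"},
]

filled = impute(passengers, ["age"], ["embarked"])
for row in filled:
    print(row)
```

The original rows are left untouched, which makes it easy to compare the data before and after imputation during EDA.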