Wei-Ying Wang Ph.D.

Data Scientist




About Me


I am a Data Science Tech Lead at Wayfair, leading the competitor matching team.

My projects involve solving business problems and building scalable ML pipelines.

I received my Ph.D. in Applied Mathematics from Brown University, where my advisor was Prof. Stuart Geman.

My background is in probability and statistics, along with some broader mathematics.

I enjoy building and repairing things, from algorithms to carpentry.



Data Visualization / Dimension Reduction

We developed a simple algorithm for data visualization and dimension reduction. The left plot shows the original data points, which roughly follow a spiral structure; the right plot shows our visualization of the same spiral data. The same algorithm can also be used for dimension reduction.


Prototype Learning: Generalized K-means

We built an algorithm that finds “linear structures” (such as the lines in the right plot) to summarize data. We characterize a property of the minimizer of our loss function and built an algorithm that approximates that minimizer well.
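Our own loss function and algorithm are not reproduced here. As a rough illustration of using lines instead of points as prototypes, the sketch below is a plain "k-lines" variant of K-means, a well-known technique swapped in for illustration and not our method. Each point is assigned to its nearest line by perpendicular distance, and each line is then refit to its points by total least squares. It assumes 2-D points given as `(x, y)` tuples; all function names are hypothetical.

```python
import math
import random

def fit_line(points):
    """Total-least-squares line through 2-D points.

    Returns (cx, cy, ux, uy): the centroid and a unit direction vector
    along the principal axis of the points' scatter matrix.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return cx, cy, math.cos(theta), math.sin(theta)

def line_distance(p, line):
    """Perpendicular distance from point p to a line (cx, cy, ux, uy)."""
    cx, cy, ux, uy = line
    # |(p - c) x u| for a unit direction vector u.
    return abs((p[0] - cx) * uy - (p[1] - cy) * ux)

def total_cost(points, lines, labels):
    """Sum of squared perpendicular distances to the assigned lines."""
    return sum(line_distance(p, lines[j]) ** 2 for p, j in zip(points, labels))

def k_lines(points, k, iters=20, seed=0):
    """Hypothetical k-lines clustering: K-means with lines as prototypes."""
    rng = random.Random(seed)
    # Initialise each line through two randomly chosen data points.
    lines = [fit_line(rng.sample(points, 2)) for _ in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest line.
        labels = [min(range(k), key=lambda j: line_distance(p, lines[j]))
                  for p in points]
        # Update step: refit every line that has at least two members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if len(members) >= 2:
                lines[j] = fit_line(members)
    return lines, labels
```

Like K-means, each assignment or refit step never increases the sum of squared perpendicular distances, but the procedure can stop at a local minimum, so in practice one would run several random restarts and keep the lowest-cost result.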


Toward the Maximum Compression Rate

We developed a lossless image compression algorithm with an analytic guarantee: as the context size grows, our algorithm reaches the optimal compression rate almost everywhere (a.e.) under very mild assumptions. In practice, a modest context size is already enough to outperform CALIC, one of the best schemes so far.
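Our algorithm is not reproduced here. To make the role of context size concrete, the sketch below computes the ideal code length of a binary sequence under a generic adaptive context model: each symbol's probability is estimated from counts of what followed the same k previous symbols, using the Krichevsky-Trofimov (KT) estimator. This is a standard textbook model swapped in for illustration, not our scheme, and it works on a 1-D bit sequence rather than a 2-D image.

```python
import math
from collections import defaultdict

def adaptive_code_length(bits, k):
    """Ideal code length (in bits) of a binary sequence under an adaptive
    order-k context model with the Krichevsky-Trofimov (KT) estimator."""
    counts = defaultdict(lambda: [0, 0])  # context -> [#zeros, #ones]
    total = 0.0
    for i, b in enumerate(bits):
        # The previous k symbols (fewer near the start of the sequence).
        ctx = tuple(bits[max(0, i - k):i])
        c = counts[ctx]
        # KT estimate: P(b | ctx) = (count_b + 1/2) / (count_0 + count_1 + 1)
        p = (c[b] + 0.5) / (c[0] + c[1] + 1.0)
        total += -math.log2(p)
        # Update counts only after coding, so a decoder could mirror this.
        c[b] += 1
    return total

bits = [0, 0, 1] * 300
for k in (0, 1, 2):
    print(k, adaptive_code_length(bits, k) / len(bits))
```

On this periodic sequence an order-0 model pays roughly the empirical entropy (about 0.92 bits per symbol), while an order-2 context makes every symbol predictable and the rate drops toward zero. An arithmetic coder driven by the same probabilities would achieve these code lengths to within a couple of bits.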


Generating Shakespeare's Text

This is a deep learning (LSTM) model built with Keras in Python. The model generates Shakespeare-style text, and the tutorial helps readers quickly get started with Keras and LSTM models.
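One step that is easy to miss in character-level generation is how the next character is chosen. After the network predicts a probability for each character, a common approach is to rescale those probabilities with a temperature and sample from the result. The plain-Python sketch below shows that loop; `predict_next` is a hypothetical stand-in for the trained LSTM (for example, a wrapper around the model's prediction call), and the toy predictors in the usage are for illustration only.

```python
import math
import random

def rescale(probs, temperature):
    """Temperature-rescale a probability vector: p_i ** (1/T), renormalised."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = [math.log(p) / temperature if p > 0 else -math.inf for p in probs]
    top = max(logits)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

def generate(seed_text, predict_next, vocab, length, temperature=1.0, rng=None):
    """Append `length` sampled characters to seed_text.

    predict_next(text) is a hypothetical stand-in for the trained model:
    it must return one probability per character in vocab.
    """
    rng = rng or random.Random(0)
    text = seed_text
    for _ in range(length):
        probs = rescale(predict_next(text), temperature)
        idx = rng.choices(range(len(vocab)), weights=probs, k=1)[0]
        text += vocab[idx]
    return text

print(generate("to be", lambda t: [0.1, 0.8, 0.1], "abc", 10, temperature=0.5))
```

Lower temperatures sharpen the distribution toward the most likely character (more repetitive text), while higher temperatures flatten it (more varied but noisier text).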


Digit Recognizer with CNN

This tutorial describes deep learning (CNN) models for classifying MNIST digits. Keras lets users set up neural network architectures quickly and without much hassle, and it is easy to reach about 99% accuracy.
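To show what a single convolutional filter followed by ReLU and 2×2 max pooling computes on a grayscale image, here is a plain-Python sketch for one channel and one filter, with no padding ("valid") and stride 1. Like deep learning libraries, it computes a cross-correlation (the kernel is not flipped). The small edge-detecting kernel is a hypothetical example; in a trained CNN the kernel weights are learned.

```python
def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation with stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def relu(fmap):
    """Element-wise max(0, x) on a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# A 4x4 image with a vertical edge and a horizontal-gradient kernel.
img = [[0, 0, 1, 1]] * 4
fmap = conv2d_valid(img, [[-1, 1]])
print(fmap)                     # each row responds at the edge column
print(max_pool2x2(relu(fmap)))
```

A Keras CNN for MNIST stacks many such filters (with learned weights and a bias per filter) and feeds the pooled maps into dense layers for the ten digit classes.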


SMS Spam Detection

Detecting spam in SMS text messages with NLP models. In this post, I compare seven machine learning tools.
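As one concrete example of the kind of baseline such a comparison can include, below is a minimal multinomial Naive Bayes classifier with Laplace smoothing over bag-of-words counts, in plain Python. It is not necessarily one of the seven tools in the post; the tokenizer, the class name `NaiveBayesSpam`, and the toy messages are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayesSpam:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus log likelihood of each known token.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][w] + self.alpha)
                                      / (self.totals[c] + self.alpha * v))
            if score > best_score:
                best, best_score = c, score
        return best

texts = ["win a free prize now", "free cash claim now", "claim your free prize",
         "are we meeting for lunch", "see you at lunch tomorrow",
         "call me when you are free"]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = NaiveBayesSpam().fit(texts, labels)
print(model.predict("free prize now"), model.predict("lunch tomorrow"))
```

Working in log space avoids numerical underflow on long messages; real experiments would use a labelled SMS corpus and a held-out test split rather than toy messages.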


Kaggle Competition on NYC Taxi

Predicting New York City taxi trip durations with a gradient boosting machine (GBM). The data consists of 1.5 million samples, with features such as pickup and dropoff coordinates and the hour of pickup.
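Raw coordinates are usually turned into more informative features before they reach the gradient boosting model. The sketch below computes the great-circle (haversine) distance between pickup and dropoff, extracts the pickup hour and weekday, and implements RMSLE, the evaluation metric of this Kaggle competition. The column names (`pickup_latitude`, `pickup_datetime`, and so on) and the timestamp format are assumptions about the CSV layout, and the GBM itself is not shown.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def make_features(row):
    """Build a feature dict from one trip record (column names assumed)."""
    pickup = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    return {
        "distance_km": haversine_km(row["pickup_latitude"], row["pickup_longitude"],
                                    row["dropoff_latitude"], row["dropoff_longitude"]),
        "pickup_hour": pickup.hour,
        "pickup_weekday": pickup.weekday(),  # Monday == 0
    }

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error."""
    n = len(y_true)
    return math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2
                         for t, p in zip(y_true, y_pred)) / n)

row = {"pickup_datetime": "2016-03-14 17:24:55",
       "pickup_latitude": 40.7679, "pickup_longitude": -73.9822,
       "dropoff_latitude": 40.7656, "dropoff_longitude": -73.9646}
print(make_features(row))
```

Because RMSLE is ordinary RMSE on `log1p` of the target, a common choice is to train the GBM on `log1p(trip_duration)` and convert predictions back with `expm1`.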