In the London research team we are lucky to have many world-class academic institutions on our doorstep. These include University College London (UCL), which, according to the Research Excellence Framework of 2014, is the highest-rated research institute in the UK.
Among its many strong points is its cutting-edge research in display advertising, led by Dr. Jun Wang in the Computer Science department. Jun has spent much of his research career focusing on online advertising, including display advertising.
In 2015 we approached Jun about the possibility of a collaboration between ourselves and UCL. As data scientists, we here at Adform are fortunate to have a vast wealth of complex real-time bidding display advertising data ready and waiting to be analyzed. Such data offers fresh and interesting research possibilities to academics and, luckily for us, Jun and his Ph.D. student, Weinan Zhang, were more than happy to apply their technical skills to our data.
The project, which lasted throughout the summer of 2015, primarily consisted of a series of research meetings in which Jun and Weinan would detail the progress they had made with their work, while we would provide the necessary technical details about the data that we have here at Adform. After the project, Jun and his group successfully published their work, titled “Implicit Look-alike Modelling in Display Ads: Transfer Collaborative Filtering to CTR Estimation”, at the European Conference on Information Retrieval (ECIR).
What problem did they solve? Well, a key part of display advertising is building profiles of users, or cookies, by tracking their online browsing habits. These profiles are then used in statistical models to predict the expected response of a cookie to a given advert. Including cookie profiles in the models in this manner enables us to better deliver relevant adverts according to the cookie's interests, thus providing higher targeting accuracy and improved advertising performance. Current user profiling methods include building keywords and topic tags or mapping users onto a hierarchical taxonomy. In their work, Jun and Weinan propose a general framework that learns user profiles from online browsing behavior and transfers the learned knowledge to the prediction of ad response. Technically, they propose a transfer learning model based on probabilistic latent factor graphical models, in which the ad response profiles of users are generated from their online browsing profiles.
How did they solve it? A core part of their solution is the use of factorization machines. Factorization machines are a class of statistical model introduced in 2010 by Steffen Rendle, then of the Institute of Scientific and Industrial Research, Osaka University, Japan. They are a more general re-expression of previously existing factorization models, such as matrix factorization, SVD++ and PITF. For a feature vector $\mathbf{x} \in \mathbb{R}^n$, the model equation for a factorization machine of order 2 is given as,

$$\hat{y}(\mathbf{x}) = \sigma\left(w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j\right)$$

in which $\sigma$ represents the sigmoid function, $w_0 \in \mathbb{R}$, $\mathbf{w} \in \mathbb{R}^n$ and $\mathbf{V} \in \mathbb{R}^{n \times k}$ (with rows $\mathbf{v}_i$) are model parameters, and $k$ is a hyperparameter that defines the dimensionality of the factorization. Among the benefits of a factorization machine are that it can be evaluated in time linear in the number of features and is amenable to standard optimization techniques, such as stochastic gradient descent. Since their introduction, factorization machines have been a powerful tool in the display advertising domain. For instance, they formed a core part of the winning solution of the Criteo display advertising challenge hosted on Kaggle.
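The linear-time property comes from an identity given in Rendle's paper: the sum over all feature pairs of ⟨v_i, v_j⟩ x_i x_j equals half the sum, over the k latent dimensions, of (Σ_i v_if x_i)² − Σ_i (v_if x_i)². The following is a minimal sketch in plain Python that evaluates an order-2 factorization machine both ways and can be used to check that they agree. The function names and the small parameter values are illustrative assumptions, not taken from the paper or from any library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fm_score_naive(x, w0, w, V):
    """Direct evaluation: loops over every feature pair, O(k * n^2)."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(V[i], V[j]))
            score += inner * x[i] * x[j]
    return score


def fm_score_linear(x, w0, w, V):
    """Same score via the sum-of-squares identity, O(k * n)."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        total = sum(V[i][f] * x[i] for i in range(n))
        total_of_squares = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (total * total - total_of_squares)
    return score


def fm_predict(x, w0, w, V):
    """Predicted response probability: sigmoid of the order-2 score."""
    return sigmoid(fm_score_linear(x, w0, w, V))


# Illustrative (made-up) parameters: n = 4 features, k = 2 latent dimensions.
x = [1.0, 0.0, 1.0, 1.0]
w0 = -1.0
w = [0.2, -0.1, 0.4, 0.0]
V = [[0.1, 0.3], [0.5, -0.2], [-0.4, 0.2], [0.3, 0.1]]

naive_score = fm_score_naive(x, w0, w, V)
linear_score = fm_score_linear(x, w0, w, V)
probability = fm_predict(x, w0, w, V)
```

With one-hot encoded inputs most entries of `x` are zero, so in practice the sums would run over the non-zero features only, which makes evaluation cheaper still.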
In display advertising it is common to use one-hot encodings to represent categorical variables and their higher-order interactions. While this approach provides a general class of models, in the sense that it is possible to learn a separate weight for each individual interaction, it results in an extremely sparse feature representation that provides no generalization across the different interaction terms. This is an issue because there is often not enough data to estimate many of the interactions accurately. In factorization machines this problem is approached through the use of a factorized parametrization of the interactions, $w_{ij} = \langle \mathbf{v}_i, \mathbf{v}_j \rangle$, in which each feature $i$ has its own $k$-dimensional latent vector $\mathbf{v}_i$. This parametrization breaks the independence of the interaction model parameters, which allows factorization machines to generalize across interactions so that data for one interaction also helps to estimate the parameters for related interactions.
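To make the sparsity argument concrete, the sketch below compares the number of pairwise interaction parameters under the two parametrizations, and shows how an interaction weight is obtained for a feature pair that never appeared together in the training data. The feature names and latent vector values are hypothetical, chosen only for illustration.

```python
def pairwise_weight_counts(n_features, k):
    """Return (independent, factorized) counts of pairwise-interaction parameters."""
    independent = n_features * (n_features - 1) // 2  # one weight per feature pair
    factorized = n_features * k                       # one k-dim vector per feature
    return independent, factorized


def interaction_weight(v_i, v_j):
    """Factorized interaction weight: the inner product of two latent vectors."""
    return sum(a * b for a, b in zip(v_i, v_j))


# Hypothetical latent vectors (k = 2), imagined as already learned from data.
latent = {
    "site=sports": [0.9, 0.1],
    "site=news": [0.8, 0.3],
    "ad=trainers": [0.7, -0.2],
}

counts = pairwise_weight_counts(n_features=10_000, k=10)

# Even if (site=news, ad=trainers) never co-occurred in training, its weight is
# defined, because each vector was shaped by the other pairs that feature is in.
unseen_pair_weight = interaction_weight(latent["site=news"], latent["ad=trainers"])
```

For 10,000 one-hot features and k = 10, this gives roughly 50 million independent pair weights against 100,000 factorized parameters, which is why the factorized form can be estimated from far less data.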
In their work Jun and Weinan build two factorization machine models: one to model cookie browsing behavior, which they refer to as web browsing prediction, and one to model click-through rates, which they refer to as advert response prediction. To carry what is learned from cookie browsing histories over to the click prediction model, they propose tying parameters between the two models. Given this parameter tying, they then learn both models jointly through gradient ascent. Experiments based on the real-world data that we provided showed that their solution significantly improves on several strong baselines.
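The idea of tying parameters and learning jointly can be sketched on a toy problem. The code below is a deliberately simplified illustration and not the model from the paper: it assumes hard sharing of the user latent vectors between the two tasks, drops the bias and linear terms, and uses a logistic matrix factorization (the special case of an order-2 factorization machine in which each example has one active user feature and one active item feature). The data, the function names and the `browse_weight` parameter are all invented for this example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def train_jointly(browse_events, click_events, n_users, n_pages, n_ads,
                  k=4, lr=0.1, epochs=300, browse_weight=1.0, seed=0):
    """Stochastic gradient ascent on the summed log-likelihood of both tasks.

    browse_events: (user, page, visited) triples, visited in {0, 1}
    click_events:  (user, ad, clicked) triples, clicked in {0, 1}
    The user vectors U are shared (tied) between the two tasks.
    """
    rng = random.Random(seed)

    def init(n):
        return [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(n)]

    U, P, A = init(n_users), init(n_pages), init(n_ads)
    for _ in range(epochs):
        for u, p, y in browse_events:
            # y - prediction is the log-likelihood gradient w.r.t. the score
            err = browse_weight * (y - sigmoid(dot(U[u], P[p])))
            for f in range(k):
                uf, pf = U[u][f], P[p][f]
                U[u][f] += lr * err * pf
                P[p][f] += lr * err * uf
        for u, a, y in click_events:
            err = y - sigmoid(dot(U[u], A[a]))
            for f in range(k):
                uf, af = U[u][f], A[a][f]
                U[u][f] += lr * err * af
                A[a][f] += lr * err * uf
    return U, P, A


def click_prob(U, A, user, ad):
    return sigmoid(dot(U[user], A[ad]))


# Toy data: users 0 and 1 visit page 0 only; users 2 and 3 visit page 1 only.
browse_events = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0),
                 (2, 0, 0), (2, 1, 1), (3, 0, 0), (3, 1, 1)]
# Click feedback exists only for users 0 and 2.
click_events = [(0, 0, 1), (0, 1, 0), (2, 0, 0), (2, 1, 1)]

U, P, A = train_jointly(browse_events, click_events, n_users=4, n_pages=2, n_ads=2)
```

Users 1 and 3 have no click history here, so `click_prob(U, A, 1, 0)` and `click_prob(U, A, 3, 0)` are driven entirely by vectors learned from browsing. Because user 1 browses like user 0, who clicked ad 0, one would expect the first of these to come out higher than the second, which is the kind of transfer the tying is meant to provide.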
- UCL REF 2014 scores - https://www.ucl.ac.uk/ref2014
- Jun Wang - http://web4.cs.ucl.ac.uk/staff/jun.wang/blog/
- Factorization machines - http://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
- Criteo display advertising challenge - https://www.kaggle.com/c/criteo-display-ad-challenge
- ECIR paper - http://arxiv.org/pdf/1601.02377v1.pdf