AL3: Graph kernels under uncertainty

An important and growing branch of machine learning deals with graph-structured data. Typical applications ask to classify data in areas like chemo–informatics, e.g., to predict the toxicity of a molecule. In this area, graph kernels form a rapidly developing set of techniques that are well suited for such tasks. One reason for their popularity is that they allow the use of welldeveloped kernel methods. Many efficient graph kernels have been developed over the past. A particularly efficient algorithm is the Weisfeiler–Lehman kernel that scales well with growing inputs. This kernel is based on isomorphism testing and plays an important role in descriptive complexity. Recently, it was demonstrated that it can be exploited to perform static code analysis. An important shortcoming of the kernel is, however, that it requires discrete data that is not allowed to contain any form of uncertainty. In fact, the kernel can neither handle noisy nor incomplete data.

This dissertation project aims to develop efficient robust kernels that allow for uncertain input data. Existing robust graph kernels that are applicable to uncertain data do not scale as well as the Weisfeiler–Lehman kernel. We aim to develop a robust kernel that is as efficient as the Weisfeiler–Lehman kernel. In general, the issues arising when applying the kernel to uncertain data are related to those occurring in the context of the robust graph isomorphism problem—given two graphs which are almost isomorphic, find an “almost- isomorphism”. This problem asks for maps between graphs that preserve most of the graph structure. Two-dimensional versions related to matrix multiplication open a direction to combat uncertain data via the route of algebraic operations. The plan of this research project is to develop new graph kernels that are robust under uncertain data by incorporating randomisation into the Weisfeiler–Lehman graph kernel framework, and generalising this framework to the two- dimensional setting.

The second aim of the dissertation project is to develop robust kernels for continuous and uncertain data. There are means to handle continuous data and there exist generic methods to turn discrete kernels into continuous ones. While at first sight it appears that there is an intrinsic need for discrete data in the Weisfeiler–Lehman kernel, recent work shows how randomisation and hashing can be exploited to transform the seemingly intrinsic discrete kernel into a kernel suitable for continuous data. This dissertation project plans to improve randomisation techniques to allow continuous inputs to graph kernels.