Convolutional neural networks for structured omics: OmicsCNN and the OmicsConv layer (1710.05918v1)
Abstract: Convolutional Neural Networks (CNNs) are a popular deep learning architecture applied across many domains, in particular to image classification, where convolution with a filter arises naturally. Unfortunately, CNNs require a distance (or at least a neighbourhood function) on the input feature space, and this requirement has so far prevented their direct use on data types such as omics data. However, many omics data are metrizable, i.e., they can be endowed with a metric structure, which makes it possible to adopt a convolution-based deep learning framework, e.g., for prediction. We propose a generalized solution for CNNs on omics data, implemented as a dedicated Keras layer. For metagenomics data, a metric can be derived from the patristic distance on the phylogenetic tree. For transcriptomics data, we define a distance by combining Gene Ontology (GO) semantic similarity and gene co-expression through a multilayer network: three layers are defined by the GO mutual semantic similarity and the fourth by gene co-expression. As a general tool, OmicsConv, a novel Keras layer, enables the use of feature distances on omics data, yielding OmicsCNN, a dedicated deep learning framework. We demonstrate OmicsCNN on gut microbiota 16S sequencing data for Inflammatory Bowel Disease (IBD), first on synthetic data and then on a metagenomics collection of gut microbiota from 222 IBD patients.
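The abstract does not spell out how a feature metric becomes a convolution. The following is a minimal stdlib-Python sketch of the general idea, not the authors' method. It assumes a k-nearest-neighbour scheme in which each feature is convolved together with its k closest features under the metric, using one filter shared across all features. The toy tree format, the branch lengths, and the function names `patristic_distance`, `nearest_neighbours` and `metric_conv` are hypothetical and are not the OmicsConv Keras API.

```python
# Hypothetical sketch: NOT the OmicsConv API. Assumes neighbourhoods are the
# k nearest features under a patristic (tree path length) metric.
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy tree format: child -> (parent, branch length); the root has no entry.
Tree = Dict[str, Tuple[str, float]]


def ancestors_with_depth(tree: Tree, node: str) -> Dict[str, float]:
    """Map `node` and each of its ancestors to the path length from `node`."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic_distance(tree: Tree, a: str, b: str) -> float:
    """Sum of branch lengths on the tree path between leaves a and b."""
    up_a = ancestors_with_depth(tree, a)
    up_b = ancestors_with_depth(tree, b)
    # With non-negative branch lengths, the minimum over shared ancestors
    # is reached at the lowest common ancestor.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def nearest_neighbours(dist: List[List[float]], k: int) -> List[List[int]]:
    """For each feature i, the indices of its k closest features, i itself first."""
    n = len(dist)
    return [
        sorted(range(n), key=lambda j: (dist[i][j], j != i, j))[:k]
        for i in range(n)
    ]


def metric_conv(x: Sequence[float], nbrs: List[List[int]],
                weights: Sequence[float], bias: float = 0.0) -> List[float]:
    """Apply one shared filter to every feature's neighbourhood, then ReLU."""
    out = []
    for idx in nbrs:
        s = bias + sum(w * x[j] for w, j in zip(weights, idx))
        out.append(max(0.0, s))
    return out


# Toy example: three taxa (leaves A, B, C) with made-up branch lengths.
tree: Tree = {"A": ("n1", 1.0), "B": ("n1", 2.0), "n1": ("root", 0.5), "C": ("root", 2.0)}
leaves = ["A", "B", "C"]
D = [[patristic_distance(tree, a, b) for b in leaves] for a in leaves]
nbrs = nearest_neighbours(D, k=2)  # "kernel size" 2 over the metric
out = metric_conv([1.0, 2.0, 3.0], nbrs, weights=[0.5, 0.25])
print(D)     # [[0.0, 3.0, 3.5], [3.0, 0.0, 4.5], [3.5, 4.5, 0.0]]
print(nbrs)  # [[0, 1], [1, 0], [2, 0]]
print(out)   # [1.0, 1.25, 1.75]
```

Because the filter weights are shared across features, this behaves like a 1D convolution whose window is defined by the metric rather than by positional adjacency. In a trained model the weights would be learned and several filters stacked.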
- Giuseppe Jurman
- Valerio Maggio
- Diego Fioravanti
- Ylenia Giarratano
- Isotta Landi
- Margherita Francescatto
- Claudio Agostinelli
- Marco Chierici
- Manlio De Domenico
- Cesare Furlanello