**Our group's research currently focuses on the following topics:**

## Deciphering the epigenetic basis of resilience

This study examines the molecular mechanisms regulating neuronal function in stress-susceptible and resilient mice following chronic social defeat (CSD) by comparing the transcriptomes of activated cells in selected brain regions using RNA-seq. In addition, using the assay for transposase-accessible chromatin followed by high-throughput sequencing (ATAC-seq) and bisulfite sequencing (Bis-seq), we will monitor the global changes in chromatin accessibility and DNA methylation signatures that alter the functional states of neurons in response to stress.

**People involved**: Tamer Butto, Kanak Mungikar

**Collaboration partner: Dr. Jennifer Winter, Kristina Endres**

**Funding: SFB 1193 Resilience (http://crc1193.de/)**

## BIG data integration of genetic and epigenetic variations in neurodegenerative diseases

To gain a better understanding of the global mechanisms underlying neurodegeneration, we use supercomputing facilities and recently developed high-performance computing methods for multivariate genome-wide association studies (GWAS) to extract global patterns. The analysis covers the genetic, epigenetic and transcriptional aspects underlying neurodegenerative diseases, namely Alzheimer’s, Parkinson’s and Huntington’s disease. Through a trans-omics evaluation followed by in silico modeling, we hope to extract core (biochemical) networks across multiple omics layers (genome, transcriptome, methylome).

In other words, the goal of this project is to integrate data from different sources (genomes, transcription datasets, DNA methylation and others) to determine which information is most relevant for explaining the phenotypes observed in neurodegenerative diseases such as Alzheimer’s disease. This will hopefully lead to the identification of genes or pathways involved in disease formation. Available genomes of control subjects and Alzheimer’s patients (Alzheimer sequencing project) will help to find variations in coding regions, promoters and regulatory regions (whether SNPs or larger variants). RNA-seq will provide information about both expression and possible mutations in structural expressed regions of the genome. Non-coding RNA could therefore be of special interest, as mutations in protein-coding regions are rare. Finally, epigenetic data will provide information on possible regulatory mechanisms of genes.

**People involved**: Susanne Klingenberg, Anna Wierczeiko, Kanak Mungikar

## Genomic features involved in pathology of neurodegenerative diseases

The goal of this project is to integrate data from different sources (genomes, transcription datasets, DNA methylation and others) to determine which information is most relevant for explaining the phenotypes observed in neurodegenerative diseases such as Alzheimer’s disease. This will hopefully lead to the identification of genes or pathways involved in disease formation. Available genomes of control subjects and Alzheimer’s patients (Alzheimer sequencing project) will help to find variations in coding regions, promoters and regulatory regions (whether SNPs or larger variants). Transcriptomics data will help to determine whether these genomic variations affect gene function in different regions of the brain. RNA-seq will provide information about both expression and possible mutations in structural expressed regions of the genome. Non-coding RNA could therefore be of special interest, as mutations in protein-coding regions are rare. Finally, epigenetic data will provide information on possible regulatory mechanisms of genes.

**People involved**: Stanislav Sys

**Collaboration partner: Prof. Illia Horenko**

**Funding:** Center of Computational Sciences

## On computation of relations between categorical data sets and application to genomics

Inferring relations between categorical data sets is an important problem in many areas of computational biology and bioinformatics. One essential difficulty, which frequently leads to strongly biased or even completely wrong results and interpretations, stems from the multiscale character of biological systems. It manifests itself in the presence of latent (unresolved) variables and in model error resulting from possibly wrong a priori mathematical assumptions about the underlying processes. Combining tools and concepts from information theory with the exact law of total probability, we derive a multiscale relation measure that is less restrictive in its underlying mathematical assumptions and makes it possible to infer a potential impact of the latent variables. At the same time, its computational complexity has the same leading order (linear in the size of the data statistics) as that of the standard relation measures. Applying the introduced measure to the analysis of single nucleotide polymorphisms (SNPs) in a part of the human genome reveals more complex relation patterns than those implied by standard measures. This indicates that many of the interpretations deduced from genomic data may result from the bias induced by the overly restrictive mathematical assumptions underlying the standard measures. By allowing a systematic distinction between different dependence and independence combinations of category relations in the data, the proposed measure opens new possibilities for genome-wide association studies (GWAS).
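For orientation, the kind of standard relation measure that serves as a baseline here can be sketched in a few lines. The example below computes the empirical mutual information between two categorical sequences in a single pass over the data, which is what "linear in the size of the data statistics" looks like in practice. It is only an illustration of a standard measure, not the multiscale measure described above; the function name and the toy allele sequences are assumptions made for this sketch.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two categorical sequences.

    Illustrative baseline only: this is the textbook measure, not the
    multiscale relation measure discussed in the text.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    # Counting is a single pass, so the cost is linear in the sample size.
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: alleles observed at two loci in four samples.
locus_a = list("AAGG")
locus_b = list("AAGG")   # perfectly dependent on locus_a
locus_c = list("CTCT")   # empirically independent of locus_a

dependent = mutual_information(locus_a, locus_b)
independent = mutual_information(locus_a, locus_c)
```

A value of zero only says that no dependence is visible under the assumptions of this particular measure, which is exactly the limitation the project addresses.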

**Collaboration partner**: Prof. Illia Horenko

**Related Publications:**

**Funding:** Center of Computational Sciences

## Improving clustering by imposing network information

Cluster analysis is one of the most popular data analysis tools in a wide range of applied disciplines. We propose and justify a computationally efficient and straightforward-to-implement way of imposing the available information from networks/graphs (available a priori in many application areas) on a broad family of clustering methods. The approach is illustrated on the problem of noninvasive unsupervised brain signal classification. This task poses several challenges, such as non-stationary noisy signals and a small sample size, combined with a high-dimensional feature space and huge noise-to-signal ratios. Applying this approach results in an exact unsupervised classification of very short signals, opening new possibilities for clustering methods in the area of noninvasive brain-computer interfaces.
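The general idea of letting a known graph influence a clustering can be illustrated with a toy variant of k-means in which every point pays a penalty for each graph neighbour assigned to a different cluster. This is a generic sketch written for this page, not the algorithm from the publication below; the function names, the `penalty` parameter and the toy data are all assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def network_kmeans(points, edges, init_centres, penalty, n_iter=50):
    """k-means-style clustering with a cost for cutting graph edges.

    Hypothetical illustration: each point is assigned to the cluster that
    minimises (squared distance to the centre) + penalty * (number of graph
    neighbours currently in another cluster).
    """
    n, k = len(points), len(init_centres)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    centres = [list(c) for c in init_centres]
    # Start from the plain nearest-centre assignment (no network term).
    labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]

    for _ in range(n_iter):
        # Centre update: mean of the members; empty clusters keep their centre.
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]

        # Assignment update with the network penalty, one point at a time.
        changed = False
        for i, p in enumerate(points):
            def cost(c):
                cut = sum(1 for j in neighbours[i] if labels[j] != c)
                return sq_dist(p, centres[c]) + penalty * cut
            best = min(range(k), key=cost)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Toy data: the point at 3.0 is geometrically closer to the right-hand group
# but is linked by the graph to the left-hand group.
points = [(0.0,), (0.2,), (3.0,), (5.0,), (5.2,)]
edges = [(0, 1), (1, 2), (3, 4)]
init = [(0.0,), (5.0,)]

plain = network_kmeans(points, edges, init, penalty=0.0)
with_network = network_kmeans(points, edges, init, penalty=10.0)
```

With `penalty=0.0` the procedure reduces to ordinary k-means, so comparing the two results shows how much of the final partition is driven by the network rather than by the features.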

**Collaboration partner: Prof. Illia Horenko**

**Related Publication**: S. Gerber and I. Horenko: Improving clustering by imposing network information. **Science Advances** 1(7), e1500163; DOI: 10.1126/sciadv.1500163 (2015)

## Inference of causality for discrete state models in a multiscale context

Discrete state models are a common modeling tool in many areas. For example, Markov state models, a particular representative of this model family, have become one of the major instruments for analyzing and understanding processes in molecular dynamics (MD). Here we extend the scope of discrete state models to the case of systematically missing scales, resulting in a nonstationary and nonhomogeneous formulation of the inference problem.

We demonstrate how recently developed tools of nonstationary data analysis and information theory can be used to identify discrete state models that are simultaneously optimal (in terms of describing the given data) and as simple as possible (in terms of complexity and causality). We apply the resulting formalism to a problem from molecular dynamics and show how the results can be used to understand spatial and temporal causality information beyond the usual assumptions. We demonstrate that the optimal explanation for appropriately discretized/coarse-grained MD torsion angle data in a polypeptide is given by a causality that is localized both in time and in space, opening new possibilities for deploying percolation theory and stochastic subgrid-scale modeling approaches in the area of MD.
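As a point of reference, the simplest member of this model family, a stationary and homogeneous Markov state model, can be estimated from a discretized trajectory by counting transitions and normalizing each row. The sketch below shows only this textbook baseline, not the nonstationary, nonhomogeneous formulation developed in the project; the function name, the `lag` parameter and the toy trajectory are assumptions.

```python
def estimate_transition_matrix(trajectory, n_states, lag=1):
    """Maximum-likelihood transition matrix of a stationary Markov chain.

    Baseline illustration only: one fixed matrix for the whole trajectory,
    i.e. exactly the stationarity assumption the text goes beyond.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    counts = [[0] * n_states for _ in range(n_states)]
    # Count transitions between states that are `lag` steps apart.
    for a, b in zip(trajectory[:-lag], trajectory[lag:]):
        counts[a][b] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # States that were never left get an all-zero row (no estimate).
        matrix.append([c / total for c in row] if total else [0.0] * n_states)
    return matrix


# Hypothetical two-state trajectory, e.g. a coarse-grained torsion angle.
trajectory = [0, 0, 1, 0, 1, 1, 0, 0]
T = estimate_transition_matrix(trajectory, n_states=2)
```

Each row of `T` holds the estimated probabilities of moving from one state to every other state within `lag` steps.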

**Collaboration partner: Prof. Illia Horenko**

**Related Publication:** Susanne Gerber and Illia Horenko: On inference of causality for discrete state models in multiscale context. **PNAS** 111(41), 14651-14656; DOI: 10.1073/pnas.1410404111 (2014)

## Thermodynamic modeling of cation homeostasis

Metals, and particularly their positively charged ions (cations), are an integral part of our environment, and all living organisms are exposed to metals in their natural habitat. Although significant experimental and theoretical efforts have been made to analyze the individual components and mechanisms of transport systems, these efforts have not resulted in an integrated view of this highly connected and complex system.

The development of kinetic networks might well contribute to the understanding and visualization of cation homeostasis. However, such a systemic kinetic analysis would require more detailed biochemical information than is currently available. We circumvented this problem by using an entirely phenomenological approach based on the theory of non-equilibrium thermodynamics. The methodology does not require a detailed understanding of the structure, function or kinetic parameters of the individual constituents of the system, but produces some unique parameters related to the thermodynamic couplings between different ion fluxes and ATP consumption. These estimated phenomenological constants combine the kinetic parameters and transport coefficients and control the coupling of fluxes. The model predictions are in good agreement with the biological understanding of the roles of the transporter proteins. Our modeling approach might contribute to the development of new diagnostic and therapeutic approaches with cation homeostasis as a key target.
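For readers unfamiliar with the formalism, phenomenological models of this kind are usually written as linear flux-force relations. The block below states that standard textbook form as an assumption; the page itself does not give the equations, and the symbols $J_i$, $X_j$, $L_{ij}$ and $\sigma$ are introduced here only for illustration.

```latex
% Assumed standard linear (near-equilibrium) flux-force relations;
% all symbols are illustrative and not taken from the project description.
J_i = \sum_{j} L_{ij}\, X_j , \qquad
L_{ij} = L_{ji} , \qquad
\sigma = \sum_{i} J_i X_i \;\ge\; 0
```

Here $J_i$ would denote the fluxes (for example individual ion fluxes and the rate of ATP consumption), $X_j$ the conjugate thermodynamic forces, and $L_{ij}$ the phenomenological coefficients. The off-diagonal coefficients express the coupling between different fluxes, the symmetry $L_{ij} = L_{ji}$ is the Onsager reciprocity relation, and $\sigma$ is the entropy production rate for suitably normalized forces.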

**Collaboration partner: Prof. Edda Klipp, Prof. Sergey Shaballa**