May 15th, 2018

Hi!

I'm Federico Marini, Virchow Fellow @CTH Mainz/IMBEI

PhD in Biostatistics/Bioinformatics @IMBEI:
Development of applications for interactive and reproducible data analysis

\(\rightarrow\) Methods and tools to maximize information extraction and knowledge transfer, strengthen translational interactions




You can find this presentation here: https://federicomarini.github.io/erum2018/
@FedeBioinfo

Background

  • large numbers of complex datasets, everywhere!
  • lack of analytical skills: data understanding << data generation

Wishlist for accessible and robust data analyses:

  • Comprehensiveness
  • Interactivity (empowers the domain expert \(\rightarrow\) better insights)
  • Reproducibility (re-performing the same analysis with the same code)

Enabling transparency, independent verification, standing on the shoulders of giants

Particularly true in the field of Bioinformatics!

RNA-seq

High-dimensional snapshot of the transcriptomic activity

RNA-seq: genes \(\times\) samples tables
Aim: identification of differentially expressed genes, gene signatures, many more
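As a rough illustration of what a genes \(\times\) samples table looks like and how a differential expression question can be asked of it, here is a minimal base R sketch. Everything in it is an assumption made for illustration: the data are simulated, the object names (`counts`, `condition`, `lfc`, `adj_p`) are hypothetical, and the per-gene t-test is a simple stand-in, not the method used by the tools presented in this talk (dedicated Bioconductor packages rely on count-based models instead).

```r
# Simulated data for illustration only (not from the talk)
set.seed(42)
n_genes   <- 200
n_samples <- 6
condition <- factor(rep(c("control", "treated"), each = 3))

# genes x samples count matrix: one row per gene, one column per sample
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# spike in a signal: the first 20 genes are 4x higher in treated samples
is_treated <- condition == "treated"
counts[1:20, is_treated] <- counts[1:20, is_treated] * 4

# log-transform to stabilize the variance before testing
log_counts <- log2(counts + 1)

# per-gene log2 fold change (treated vs control)
lfc <- rowMeans(log_counts[, is_treated]) - rowMeans(log_counts[, !is_treated])

# per-gene two-sample t-test (stand-in for a proper count-based model),
# followed by Benjamini-Hochberg correction for multiple testing
p_values <- apply(log_counts, 1, function(x) {
  t.test(x[is_treated], x[!is_treated])$p.value
})
adj_p <- p.adjust(p_values, method = "BH")

# genes ranked by adjusted p-value
head(sort(adj_p))
```

The ranked list at the end is the kind of output a domain expert would then want to explore interactively rather than read as a static table.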

Platelets transcriptomics

Exploratory Data Analysis (EDA) + Differential Expression (DE) analysis \(\rightarrow\) analyze, visualize, integrate

  • thrombocytes: anucleated, yet carrying a vast repertoire of transcripts
  • scenarios: alteration of thrombin signaling, crosstalk with tumor development

🙏 The Bioconductor project makes these tasks possible for many researchers!

My contributions:

\(\rightarrow\) Web-based applications enabling interactivity and reproducibility

Under the hood

pcaExplorer and ideal

Motivation & aim

  • general downweighting of the importance of data exploration (essential step!)
  • lack of tools to do this interactively (and fully integrated in Bioconductor)
  • transparency & help for future self was also needed \(\rightarrow\) automated reporting & state saving!
  • empower the domain expert (extensively used by our coop partners): not just a list of genes, but all the way down to functional interpretation
  • usable platform for reproducible analysis, interactive tours for self-paced learning (via rintrojs)

Workflow overview & quick demo

iSEEing is believing

Single-cell RNA-seq, a.k.a. data exploration has probably never been so important

scRNA-seq: genes \(\times\) cells
Aim: Identification of cell subpopulations, description of developmental trajectories
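To make the genes \(\times\) cells layout and the subpopulation aim concrete, the following base R sketch projects simulated cells onto principal components and groups them with k-means. It is only a sketch under stated assumptions: the data are simulated, the names (`expr`, `population`, `clusters`) are hypothetical, and PCA plus k-means is one simple choice among many, not the workflow used in iSEE or in the demo of this talk.

```r
# Simulated data for illustration only (not from the talk)
set.seed(1)
n_genes <- 100
n_cells <- 60
population <- rep(c("A", "B"), each = n_cells / 2)

# genes x cells expression matrix (values on a log-like scale)
expr <- matrix(
  rnorm(n_genes * n_cells),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# hypothetical marker genes: the first 10 genes are higher in population B
expr[1:10, population == "B"] <- expr[1:10, population == "B"] + 3

# prcomp expects observations in rows, so transpose to cells x genes
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# cluster the cells on the first two principal components
clusters <- kmeans(pca$x[, 1:2], centers = 2, nstart = 10)$cluster

# compare the recovered clusters with the simulated populations
table(population, clusters)
```

Cluster labels are arbitrary, so the table should be read up to a swap of the two columns.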

  • time consuming task + many iterations needed
  • how did I do this plot?

iSEE, interactive SummarizedExperiment Explorer: concept made in December 2017 @Bioc conference (joint work with Aaron Lun, Charlotte Soneson, Kevin Rue-Albrecht)

Wishlist: interactivity, linked information across plots, reproducibility, hammer for many nails

A few months (and about 1000 commits) later…

Ready for a demo?

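# now in Bioc 3.7!
# install iSEE from Bioconductor (biocLite is the installer for Bioc 3.7)
source("https://bioconductor.org/biocLite.R")
biocLite("iSEE")
# example data: the allen dataset shipped with the scRNAseq package
library("scRNAseq")
data("allen")
library("scater")
# convert to a SingleCellExperiment and use the tophat counts as the count assay
sce <- as(allen, "SingleCellExperiment")
counts(sce) <- assay(sce, "tophat_counts")
# compute normalized expression values, then reduced dimensions (PCA, t-SNE)
sce <- normalize(sce)
sce <- runPCA(sce)
sce <- runTSNE(sce)
# launch the interactive app on the prepared object
library("iSEE")
iSEE(sce)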
# couple of genes to check: (Zeisel, Science 2015; Tasic, Nature Neuroscience 2016)
# Tbr1 (TF required for the final differentiation of cortical projection neurons);
# Snap25 (pan-neuronal); 
# Rorb (mostly L4 and L5a); 
# Foxp2 (L6)

Or just visit https://github.com/federicomarini/erum2018

Outlook

Flexibility + Customizability \(\rightarrow\) Accompanying publications as live browser!

Summary

  • Main aim: Making analyses and exploration accessible to a wide range of scientists, especially domain experts, while still upholding the high standards in guaranteeing reproducible research steps
  • real practical companions for any (RNA-seq/omics) dataset, fully integrated in Bioconductor
  • ❤️ bug reports and ❤️❤️❤️ pull requests!
  • Fellowship output focuses on Thrombosis & Hemostasis, enabling analysis workflows of high-dimensional data: platelet-o-pedia portal as unique landing point, lower barriers to genomic data exploration - these apps as cornerstones
  • Proper applications & software enable data literacy for many researchers - naturally encouraging networking possibilities + common ground to promote openness of science

Acknowledgements & Links