class: center, middle, title-slide

# iSEE: RNA-sequencing data exploration made easy and reproducible
### Federico Marini (marinif@uni-mainz.de)
### 2019/03/21 DAGStat Conference 2019 - Munich #dagstat2019

---
class: center

<!-- Submitted abstract: -->
<!-- Data exploration is crucial for the comprehension of large biological datasets obtained by high-throughput assays such as sequencing. -->
<!-- Interactive exploration is a key element in bioinformatics, as it fosters the efficient generation of novel data-driven hypotheses prior to rigorous statistical analysis, enables diagnosis of potential problems during quality control, and facilitates interpretation of the results in the context of a specific scientific question. -->
<!-- Most existing tools for intuitive and interactive visualization are limited to specific assays or analyses and lack support for reproducible analysis. -->
<!-- As a result of a community-driven effort in the scope of the Bioconductor project, we have built a general-purpose tool, iSEE - Interactive SummarizedExperiment Explorer, designed to accommodate any experimental data (bulk RNA-seq, single-cell RNA-seq, mass cytometry, ...) and/or associated metadata, stored in an instance of a SummarizedExperiment container. -->
<!-- iSEE is implemented in R using the Shiny framework, and is compatible with many existing R/Bioconductor packages for analysing high-throughput biological data. -->
<!-- Salient features of iSEE include:
- a customizable interface with different panel types, linked among each other via user selection (brushing)
- automatic tracking, storage, and rendering of the exact R code to generate all visible plots
- interactive tours to showcase datasets and findings, with step-by-step description of relevant publication-ready plots
- extendability with the definition of custom panel types -->
<!-- Example applications are available online to demonstrate the interactive exploration of the TCGA RNA sequencing data (https://marionilab.cruk.cam.ac.uk/iSEE_tcga), single-cell RNA sequencing data (https://marionilab.cruk.cam.ac.uk/iSEE_allen, https://marionilab.cruk.cam.ac.uk/iSEE_pbmc4k), and mass cytometry data (https://marionilab.cruk.cam.ac.uk/iSEE_cytof). -->
<!-- excellent slides: https://docs.google.com/presentation/d/1z_ycM7Rzb7DWgoWCwOTyelaTIkD7DqP8AYuNm6RVOrQ/edit#slide=id.g4fc057d4ed_0_51 -->
<!-- TODO: -->
<!-- new points to touch -->
<!-- figure from Rob -->
<!-- better lead in for the memes -->
<!-- tweets appreciating iSEE to Drake -->
<!-- plug for the shiny contest! -->
<!-- focus more on the "philosophy" as well -->

# Hi!

I'm **Federico Marini**, Virchow Fellow @CTH Mainz/IMBEI

--

I like platelets (and their transcriptome), and you should as well.

--

<a href="mailto:marinif@uni-mainz.de">`marinif@uni-mainz.de`</a><br>
<a href="https://federicomarini.github.io">`federicomarini.github.io`</a><br>
<a href="http://twitter.com/FedeBioinfo">`@FedeBioinfo`</a><br>
<a href="http://github.com/federicomarini">`@federicomarini`</a><br><br>
<a href="http://www.imbei.de">CTH/IMBEI (Mainz, Germany)</a>

You can find this presentation here: [`https://federicomarini.github.io/dagstat2019/`](https://federicomarini.github.io/dagstat2019/)<br>
<!-- [`@FedeBioinfo`](https://twitter.com/FedeBioinfo) -->

<p align="center"> <img src="images/qrcode_dagstat.png" alt="" height="170"/> </p>

---

# Transcriptomics at a glance

- High-dimensional snapshot of the transcriptomic activity of all RNA species in a sample
<!-- - Bulk or single cell? -->

--

**Data**: genes `\(\times\)` samples (e.g. bulk/single cells)

**Objectives**:

- Compare abundances to discover differentially expressed genes, gene signatures, etc.
  - features associated with phenotypic differences
- Identify cell subpopulations, describe developmental trajectories, study noise in transcriptional regulation

--

**Challenges**

- Large sets
- Heterogeneous sets
- Sparse sets

---
background-image: url("images/console_logcounts_sparse.png")
background-size: contain
background-position: 50% 50%
class: middle, center

# Exploration and visualization

Effective and efficient methods are key to delivering...

--
better **quality assessment**

--

better **generation of research hypotheses**

--

better **representation of the results**

--

better **communication** of findings

---

# Efficient data containers: `SummarizedExperiment`s

<p align="center"> <img src="images/sce_class.png" alt="" height="450"/> </p>
<!-- or the one from the OSCA paper? -->

It can store [**RNA-seq**|DNA methylation|Hi-C|Microarray|Mass cytometry|SNPs] data

---
background-image: url("images/marie.png")
background-size: cover
background-position: 50% 50%
class: center, bottom, inverse

# Does the exploration of your data spark joy?

---

# WANTED: A hammer for many nails

Data exploration is crucial, but:

- No general-purpose tool: existing ones are limited to specific assay types or analysis steps
- No tool supports reproducibility while staying intuitive and usable

--

<!-- `iSEE`, interactive SummarizedExperiment Explorer: -->
Joint work with Aaron Lun, Charlotte Soneson, Kevin Rue-Albrecht, initiated at #EuroBioc2017

<p align="center"> <img src="images/lun_aaron_web.jpg" alt="" width="200"/> <img src="images/twit_charlotte.jpg" alt="" width="200"/> <img src="images/twit_kev.jpg" alt="" width="200"/> </p>

<!-- We could have an interactive SummarizedExperiment Explorer tool... -->
<!-- Visualize my data in any (precomputed) reduced dimension space. -->
<!-- Color the data points with any experimental covariate (e.g. batch). -->
<!-- Color the data points with any expression data. -->
<!-- Select data points in a plot, and highlight them in another. -->
<!-- Visualize the distribution of any assay or metadata. -->
<!-- Visualize the correlation between gene A and gene B, specifically in ‘this’ or ‘that’ cluster -->
<!-- Fully empower the data generators - and get lazy! -->

---
class: center, animated, fadeIn

# Hello `iSEE`

<img src="images/iSEE.png" width="170"/>

.pull-left[
<img src="images/ss_bioc_isee.png" alt="" width="500"/>
]

.pull-right[
<img src="images/ss_isee.png" alt="" width="500"/>
]

<br>

[`https://f1000research.com/articles/7-741/v1`](https://f1000research.com/articles/7-741/v1), with live apps inside!
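
---

# `SummarizedExperiment`: a toy example

A minimal sketch of the container that `iSEE` takes as input, assuming the `SummarizedExperiment` package is installed. The counts, the gene and sample names, and the `condition` and `mean_count` columns are simulated placeholders for illustration; none of them come from a dataset shown in this talk.

```r
# Toy SummarizedExperiment: all values below are simulated
library(SummarizedExperiment)

set.seed(42)
n_genes <- 100
n_samples <- 6

# Assay: genes x samples matrix of counts
counts <- matrix(
  rpois(n_genes * n_samples, lambda = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Sample (column) and gene (row) annotations travel with the assay
se <- SummarizedExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(condition = rep(c("ctrl", "treated"), each = 3),
                      row.names = colnames(counts)),
  rowData = DataFrame(mean_count = rowMeans(counts),
                      row.names = rownames(counts))
)

dim(se)

# Launch the explorer only in an interactive session, if iSEE is installed
if (interactive() && requireNamespace("iSEE", quietly = TRUE)) {
  shiny::runApp(iSEE::iSEE(se))
}
```

Subsetting `se` by rows or columns keeps the assay and both annotation tables in sync, which is what makes it a convenient single input for an explorer.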
<!-- -- -->
<!-- Available in Bioconductor <br> -->
<!-- ... or as devel version at [`https://github.com/csoneson/iSEE`](https://github.com/csoneson/iSEE) -->

---

# `iSEE` in action: the *Tabula Muris* dataset

.pull-left[
<p align="center"> <img src="images/iSEE.png" width="170"/> </p>
]

.pull-right[
<img src="images/tabulamuris.png" alt="" width="500"/>
]

--

Preprocessing details + `iSEE` configuration for this dataset can be found at [`https://github.com/federicomarini/iSEE_instances`](https://github.com/federicomarini/iSEE_instances)

- 23036 genes
- 43598 cells `\(\rightarrow\)` >1 billion data points!
- 8 mice (10-15 weeks old)
- 20 organs and tissues

<!-- - First steps: `example(iSEE,ask = FALSE)` to explore the `allen` dataset -->
<!-- - ... or start [`https://marionilab.cruk.cam.ac.uk/iSEE_pbmc4k/`](https://marionilab.cruk.cam.ac.uk/iSEE_pbmc4k/) -->
<!-- <iframe width="720" height="400" src="https://www.youtube.com/embed/wpu9daTE4ok" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe> -->
<!-- Touch upon: -->
<!-- input: SummarizedExperiment -->
<!-- panel types -->
<!-- add/remove plots -->
<!-- link plots -->
<!-- show code tracker -->
<!-- show tour -->
<!-- voice control!! -->

---
background-image: url("images/tm_isee_allpanels.png")
background-size: contain
background-position: 50% 50%
class: center, bottom

# A tour of the panels

---
background-image: url("images/ss_reddim.png")
background-position: 50% 50%
background-size: contain
class: center, animated, zoomInLeft

---
background-image: url("images/tm_isee_allpanels.png")
background-size: contain
background-position: 50% 50%
class: center, animated, fadeIn

---
background-image: url("images/ss_featassay.png")
background-size: contain
background-position: 50% 50%
class: center, animated, zoomIn

---
background-image: url("images/tm_isee_allpanels.png")
background-size: contain
background-position: 50% 50%
class: center, animated, fadeIn

---
background-image: url("images/ss_samplesassay.png")
background-size: contain
background-position: 50% 50%
class: center, animated, zoomInRight

---
background-image: url("images/tm_isee_allpanels.png")
background-size: contain
background-position: 50% 50%
class: center, animated, fadeIn

---
background-image: url("images/ss_coldata_rowdata.png")
background-size: contain
background-position: 50% 50%
class: center, animated, zoomInUp

---
background-image: url("images/tm_isee_allpanels.png")
background-size: contain
background-position: 50% 50%
class: center, animated, fadeIn

---
background-image: url("images/ss_tables.png")
background-size: contain
background-position: 50% 50%
class: center, animated, zoomInUp

---
background-image: url("images/tm_isee_allpanels.png")
background-size: contain
background-position: 50% 50%
class: center, animated, fadeIn

---
background-image: url("images/gif_reordering.gif")
background-size: contain
background-position: 50% 50%
class: center, bottom

# Reordering the panels

---
background-image: url("images/tm_isee_reorderedpanels.png")
background-size: contain
background-position: 50% 25%
class: center

---
background-image: url("images/gif_colorbygene.gif")
background-size: contain
background-position: 50% 50%
class: center, bottom

# Augmenting the observed data with biological knowledge

---
background-image: url("images/gif_linkedtotable.gif")
background-size: contain
background-position: 50% 50%
class: center, bottom

# Linking the panels

---
background-image: url("images/gif_customde.gif")
background-size: contain
background-position: 50% 50%
class: center, bottom

# Custom panels

---
background-image: url("images/gif_reprocode.gif")
background-size: contain
background-position: 50% 50%
class: center, bottom

# Code access for full reproducibility

---
background-image: url("images/gif_toursteps.gif")
background-size: contain
background-position: 50% 50%
class: center, bottom

# Tours as a means to communicate

---

<!-- Markers: Pax9, Olig1, Cd68 -->

# Some feedback from users - early on

<p align="center"> <img src="images/tweet_saskia.png" alt="" height="400"/> </p>

[`https://twitter.com/trashystats/status/1007061299568578561`](https://twitter.com/trashystats/status/1007061299568578561)

---

# Some feedback from users - now

<p align="center"> <img src="images/tweet_rob.png" alt="" height="400"/> </p>

[`https://twitter.com/robamezquita/status/1102612120527548416`](https://twitter.com/robamezquita/status/1102612120527548416)

---

# Some feedback from users

<p align="center"> <img src="images/meme_drake_isee.jpg" height="450"/> </p>

`not sure this one can be trusted`

---
background-image: url("images/iSEE.png")
background-size: 200px
background-position: 90% 10%

# Summary

- **Key features** for optimal exploration:
  - flexibility and customizability
  - linked information across plots
  - guided showcase/usage
  - effective communication
  - reproducibility (for self & for others!)

--

- **Got data?** Accompany your publications with a live browser!

--

- **Outlook**: voice control + new features + steps towards multi-omics

--

> See first, think later, then test. But always see first. <br>
> Otherwise you will only see what you were expecting. Most scientists forget that.
>
> <footer>--- Douglas Adams</footer>

---
background-image: url("images/iSEE.png")
background-size: 200px
background-position: 90% 10%

# Summary

- **Key features** for optimal exploration:
  - flexibility and customizability
  - linked information across plots
  - guided showcase/usage
  - effective communication
  - reproducibility (for self & for others!)
- **Got data?** Accompany your publications with a live browser!
- **Outlook**: voice control + new features + steps towards multi-omics

> ~~See~~ `iSEE` first, think later, then test. But always ~~see~~ `iSEE` first.
> Otherwise you will only see what you were expecting. Most scientists forget that.
>
> <footer>--- Douglas Adams</footer>

---
background-image: url("images/iSEE.png")
background-size: 200px
background-position: 90% 10%

# Links

<!-- a.k.a. the `iSEE`verse -->

- **[`http://bioconductor.org/packages/iSEE/`](http://bioconductor.org/packages/iSEE/)**
- [`https://github.com/csoneson/iSEE`](https://github.com/csoneson/iSEE)
- [`https://github.com/LTLA/iSEE2018`](https://github.com/LTLA/iSEE2018)
- [`https://github.com/kevinrue/iSEE_custom`](https://github.com/kevinrue/iSEE_custom)
- [`https://github.com/federicomarini/iSEE_instances`](https://github.com/federicomarini/iSEE_instances)
- **[`https://federicomarini.github.io/dagstat2019/`](https://federicomarini.github.io/dagstat2019/)**

--

### Acknowledgements

- Center for Thrombosis and Hemostasis (CTH), Mainz (Virchow Fellowship)
- IMBEI - Biostatistics & Bioinformatics division
- Charlotte Soneson, Aaron Lun, Kevin Rue-Albrecht (the `iSEE` team)

--

### ... thank you for your attention!

<code>marinif@uni-mainz.de</code> -
[`@FedeBioinfo`](https://twitter.com/FedeBioinfo)

---

<!-- empty page -->

---

# `iSEE` in action - a few lines of code

```r
# available from Bioconductor since release 3.7
# (BiocManager replaces the deprecated biocLite() installer)
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("iSEE")

# example single-cell RNA-seq data
# (calls below follow the package versions available at the time of this talk)
library("scRNAseq")
data("allen")

# convert, then compute normalized values and reduced dimensions
library("scater")
sce <- as(allen, "SingleCellExperiment")
counts(sce) <- assay(sce, "tophat_counts")
sce <- normalize(sce)
sce <- runPCA(sce)
sce <- runTSNE(sce)

# launch the app
*library("iSEE")
*iSEE(sce)

# couple of genes to check: (Zeisel, Science 2015;
# Tasic, Nature Neuroscience 2016)
# Tbr1 (TF required for the final differentiation of
#   cortical projection neurons);
# Snap25 (pan-neuronal);
# Rorb (mostly L4 and L5a);
# Foxp2 (L6)
```

---

# `iSEE` in action - the PBMC4k dataset

<iframe width="900" height="500" src="https://www.youtube.com/embed/wpu9daTE4ok" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
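
---

# `iSEE` in action - sketching a tour

The tours shown earlier are driven by a plain table of steps. Below is a minimal sketch, assuming the `tour` argument of `iSEE()` and a two-column `data.frame` (`element`, `intro`). The CSS selectors and the step texts are hypothetical placeholders: the actual element IDs depend on the panels in the app and on the `iSEE` version, so they would need to be checked against a running instance.

```r
# Hypothetical tour: one row per step.
# `element`: CSS selector of the UI element to highlight (placeholder IDs)
# `intro`:   text displayed at that step
tour <- data.frame(
  element = c("#Welcome", "#redDimPlot1", "#featAssayPlot1"),
  intro = c(
    "Welcome to this tour of the dataset.",
    "Each point in this reduced dimension plot is one cell.",
    "This panel shows the expression of a single gene across cells."
  ),
  stringsAsFactors = FALSE
)

# A tour can also be kept in a delimited text file next to the analysis code
tour_file <- tempfile(fileext = ".txt")
write.table(tour, tour_file, sep = ";", quote = FALSE, row.names = FALSE)
tour_reloaded <- read.delim(tour_file, sep = ";", stringsAsFactors = FALSE)

# Launch only in an interactive session, with `sce` prepared beforehand
if (interactive() && exists("sce") &&
    requireNamespace("iSEE", quietly = TRUE)) {
  shiny::runApp(iSEE::iSEE(sce, tour = tour_reloaded))
}
```

Keeping the steps in a text file means the narrative can be versioned and shared together with the data and the app configuration.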