class: center, middle, title-slide # Interactive visualization as a key tool for understanding (medical) omics data ### Federico Marini (marinif@uni-mainz.de) ### 2019/09/09 GMDS 2019 - Dortmund #gmds2019
--- class: center <!-- Submitted abstract: --> <!-- # Interactive visualization as a key tool for understanding (medical) omics data --> <!-- Life sciences have been evolving through the last decade to become a quantitative discipline, with a leading role played by high-throughput technologies (gene expression profiling, protein quantitation via mass spectrometry, high-throughput imaging). --> <!-- Data is available in different experimental conditions, at different molecular layers, and also at different resolution, whereas single-cell techniques (especially in the field of transcriptomics) have enabled unprecedented views to understand complex phenomena by means of large, heterogeneous datasets. --> <!-- How can interactive visualization help in this regard? It is an essential tool for quality assessment of often noisy multivariate data, for hypothesis generation and exploration, for the visualization of results, as well as for the efficient communication of findings. --> <!-- The ideal visualization tool offers a variety of views on the data: reduced dimensionality views, features and samples plots for the assays and the metadata (scatter plots, heatmaps, distribution plots, interactive tables). Simultaneous and linked viewpoints are also fundamental to aid the exploration of complex data, and this holds true for anyone who accesses the data. A tool should also account for ways to guide external users (i.e. other practitioners) in obtaining new angles on the same input, to increase usability and impact of the data at hand. --> <!-- We developed a general tool, iSEE (http://bioconductor.org/packages/iSEE/ [1]), for exploring a wide range of high dimensional datasets (bulk and single cell RNA-seq, Mass cytometry, ...), with a solution that delivers scalability, flexibility, interactivity, and reproducibility, and I would like to present in this workshop how we addressed these aspects more in detail. 
--> <!-- Using iSEE, inter-disciplinary efforts in understanding complex data are efficient in the many iterations commonly involved in exploration. Moreover, our tool can become a stepping stone to tackle the challenges of multi-omics approaches, by leveraging data structures able to accommodate the different biological, molecular, and clinical layers. --> <!-- 1. Rue-Albrecht K, Marini F, Soneson C and Lun ATL. iSEE: Interactive SummarizedExperiment Explorer [version 1; referees: 3 approved]. F1000Research 2018, 7:741 --> <!-- (https://doi.org/10.12688/f1000research.14966.1) --> # `Sys.getenv("USER")` -- I'm **Federico Marini**, Virchow Fellow @CTH Mainz/IMBEI -- I like platelets (and their transcriptome), and you should as well. -- <a href="mailto:marinif@uni-mainz.de">
`marinif@uni-mainz.de`</a><br> <a href="https://federicomarini.github.io">
`federicomarini.github.io`</a><br> <a href="http://twitter.com/FedeBioinfo">
`@FedeBioinfo`</a><br> <a href="http://github.com/federicomarini">
`@federicomarini`</a><br><br> <a href="http://www.imbei.de">
CTH/IMBEI (Mainz, Germany)</a> -- You can find this presentation here: [`https://federicomarini.github.io/GMDS2019/`](https://federicomarini.github.io/GMDS2019/)</br> <!--
[`@FedeBioinfo`](https://twitter.com/FedeBioinfo) --> <p align="center"> <img src="images/qrcode_GMDS2019.png" alt="" height="170"/> </p> --- # Why are we here? -- - Because we are dealing with complex subjects, be it life sciences, medical/clinical sciences, you name it -- - Because we have data that can't fit in our heads - and we have to process, manage, and analyse these large (big?) amounts -- - Because we are curious and stubborn enough that we want to try and understand this -- - The Pro & the Con: **We are not alone**, and we work with a variety of experts -- **Mind the gap**: technical expertise to work with data vs. biological expertise to interpret the data! --- # Transcriptomics at a glance **RNA-seq**: High-dimensional snapshot of the transcriptomic activity of all RNA species in a sample <!-- - Bulk or single cell? --> -- **Data**: genes `\(\times\)` samples (e.g. bulk/single cells) **Objectives**: - Compare abundances to discover differentially expressed genes, gene signatures, etc. - features associated with phenotypic differences - Identification of cell subpopulations, description of developmental trajectories, study of noise and heterogeneity in transcriptional regulation -- **Challenges** - Large sets - Heterogeneous sets - Sparse sets --- background-image: url("images/marie.png") background-size: cover background-position: 50% 50% class: center, bottom, inverse -- # Does the exploration of your data spark joy? 
--- # The analysis workflow <p align="center"> <img src="images/Interaction_nostep.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step0.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step1.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step2.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step3.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step4.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step5.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step6.png" alt=""/> </p> --- class: center, bottom # What gives you joy in exploring and visualizing your data? <p align="center"> <img src="images/unclesam.png" alt="" height = 350/> </p> ### *a.k.a., how do we prevent the last steps from happening?* <!-- Is there a better way to support these cycles of exploration, inquiry, and hypothesis generation? --> --- # What's there for us to use? -- The data is **large** - use structures/containers that support alternative (out-of-memory) representations, e.g. 
HDF5 format - combining `assays`, `rowData`, `colData` -- Our task is **complex** - Proper analysis tools _should_ combine interactivity and reproducibility - *Development of Applications for Interactive and Reproducible Research: a Case Study*, Marini and Binder (2016) - [`10.18547/gcb.2017.vol3.iss1.e39`](https://doi.org/10.18547/gcb.2017.vol3.iss1.e39) - *pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components*, Marini and Binder (2019) - [`10.1186/s12859-019-2879-1`](https://doi.org/10.1186/s12859-019-2879-1) - Soon to appear in your preprint venue: *ideal: an R/Bioconductor package for Interactive Differential Expression Analysis* - Joe Cheng's keynote at `useR!2019` in Toulouse (welcome `shinymeta`!) --- background-image: url("images/console_logcounts_sparse.png") background-size: contain background-position: 50% 50% class: middle, center # Exploration and visualization: why? Effective and efficient methods are key to delivering... --
better **quality assessment** --
better **generation of research hypotheses** --
better **representation of the results** --
better **communication** of findings --- # `SummarizedExperiment`s <p align="center"> <img src="images/sce_class.png" alt="" height="350"/> </p> <!-- Thank you Martin Morgan!!! --> <!-- or the one from the OSCA paper? --> <!-- If you are a statistician, just tilt your head by 90 degrees :) --> -- It can store [**RNA-seq**|DNA methylation|Hi-C|Microarray|Mass cytometry|SNPs] data If you are into single cell data, check out [`https://osca.bioconductor.org`](https://osca.bioconductor.org)! --- # Looking for a silver bullet Data exploration is crucial: - No general tool exists for this: available ones are limited to specific assay types or analysis steps - No support for reproducibility while staying intuitive and usable -- Joint work with Aaron Lun, Charlotte Soneson, Kevin Rue-Albrecht initiated at [#EuroBioc2017](https://twitter.com/hashtag/EuroBioC2017) <p align="center"> <img src="images/lun_aaron_web.jpg" alt="" width="200"/> <img src="images/twit_charlotte.jpg" alt="" width="200"/> <img src="images/twit_kev.jpg" alt="" width="200"/> </p> -- "We could have an interactive SummarizedExperiment Explorer tool..." <!-- Visualize my data in any (precomputed) reduced dimension space. --> <!-- Color the data points with any experimental covariate (e.g. batch). --> <!-- Color the data points with any expression data. --> <!-- Select data points in a plot, and highlight them in another. --> <!-- Visualize the distribution of any assay or metadata. --> <!-- Visualize the correlation between gene A and gene B, specifically in "this" or "that" cluster --> <!-- Fully empower the data generators - and get lazy! 
--> --- class: animated, fadeIn # Hello `iSEE` <p align="center"> <img src="images/iSEE.png" width="370"/> </p> * [`https://f1000research.com/articles/7-741/v1`](https://f1000research.com/articles/7-741/v1), live apps inside * Available in Bioconductor [`http://bioconductor.org/packages/iSEE/`](http://bioconductor.org/packages/iSEE/) <!-- .pull-left[ --> <!-- <img src="images/ss_bioc_isee.png" alt="" width="500"/> --> <!-- ] --> <!-- .pull-right[ --> <!-- <img src="images/ss_isee.png" alt="" width="500"/> --> <!-- ] --> <!-- </br> --> <!-- Available in Bioconductor </br> --> <!-- ... or as devel version at [`https://github.com/csoneson/iSEE`](https://github.com/csoneson/iSEE) --> --- class: center, middle # `iSEE(sce)` --- # `iSEE` in action: the *Tabula Muris* dataset .pull-left[ <p align="center"> <img src="images/iSEE.png" width="170"/> </p> ] .pull-right[ <img src="images/tabulamuris.png" alt="" width="500"/> ] -- Preprocessing details + `iSEE` configuration for this set can be found at [`https://github.com/federicomarini/iSEE_instances`](https://github.com/federicomarini/iSEE_instances/tree/master/iSEE_tabulamuris) - 20 organs and tissues, from 8 different mice - 23036 genes - 43598 cells `\(\rightarrow\)` >1 billion data points! <!-- and that is the smaller set --> <!-- - First steps: `example(iSEE,ask = FALSE)` to explore the `allen` dataset --> <!-- - ... or start [`https://marionilab.cruk.cam.ac.uk/iSEE_pbmc4k/`](https://marionilab.cruk.cam.ac.uk/iSEE_pbmc4k/) --> <!-- <iframe width="720" height="400" src="https://www.youtube.com/embed/wpu9daTE4ok" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe> --> <!-- Touch upon: --> <!-- input: SummarizedExperiment --> <!-- panel types --> <!-- add/remove plots --> <!-- link plots --> <!-- show code tracker --> <!-- show tour --> <!-- voice control!! 
--> --- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, bottom # A tour of the panels --- background-image: url("images/ss_reddim.png") background-position: 50% 50% background-size: contain class: center, animated, zoomInLeft --- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, animated, fadeIn --- background-image: url("images/ss_featassay.png") background-size: contain background-position: 50% 50% class: center, animated, zoomIn --- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, animated, fadeIn --- background-image: url("images/ss_samplesassay.png") background-size: contain background-position: 50% 50% class: center, animated, zoomInRight --- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, animated, fadeIn --- background-image: url("images/ss_coldata_rowdata.png") background-size: contain background-position: 50% 50% class: center, animated, zoomInUp --- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, animated, fadeIn --- background-image: url("images/ss_tables.png") background-size: contain background-position: 50% 50% class: center, animated, zoomInUp --- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, animated, fadeIn --- background-image: url("images/gif_bsb2.gif") background-size: contain background-position: 50% 50% class: center, bottom, inverse, fadeIn # Flexibility and customizability --- background-image: url("images/gif_reordering.gif") background-size: contain background-position: 50% 50% class: center, bottom # Reordering the panels --- background-image: url("images/tm_isee_reorderedpanels.png") 
background-size: contain background-position: 50% 25% class: center --- background-image: url("images/gif_colorbygene.gif") background-size: contain background-position: 50% 50% class: center, bottom # Augmenting the observed data with biological knowledge --- background-image: url("images/gif_linkedtotable.gif") background-size: contain background-position: 50% 50% class: center, bottom # Linking the panels --- background-image: url("images/gif_customde.gif") background-size: contain background-position: 50% 50% class: center, bottom # Custom panels --- background-image: url("images/meme_owl.jpg") background-size: contain background-position: 50% 50% class: center, bottom # Working in a reproducible way <br> --- background-image: url("images/brit_bored.gif") background-size: cover class: center, bottom, inverse # "Oh, code..." --- background-image: url("images/gif_reprocode.gif") background-size: contain background-position: 50% 50% class: center, bottom # Code access for full reproducibility --- background-image: url("images/brit_dance.gif") background-size: cover class: center, bottom, inverse -- # `knit` me baby one more time --- background-image: url("images/meme_gandalf.jpg") background-size: contain class: center, bottom <!-- Markers: Pax9, Olig1, Cd68 --> --- background-image: url("images/gif_toursteps.gif") background-size: contain background-position: 50% 50% class: center, bottom # Interactive tours: an efficient means to communicate --- # `iSEE` in action - a few lines of code
```r
# in Bioc since 3.7
install.packages("BiocManager")
BiocManager::install(c("iSEE", "scRNAseq", "scater"))

# written against Bioconductor 3.9; newer releases provide
# ReprocessedAllenData() and logNormCounts() instead of
# data("allen") and normalize()
library("scRNAseq")
data("allen")
library("scater")
# make sure we work on a SingleCellExperiment, which can store
# the reduced dimensions computed below
sce <- as(allen, "SingleCellExperiment")
counts(sce) <- assay(sce, "tophat_counts")
sce <- normalize(sce)  # log-normalized expression values
sce <- runPCA(sce)
sce <- runTSNE(sce)

*library("iSEE")
*iSEE(sce)

# couple of genes to check: (Zeisel, Science 2015;
# Tasic, Nature Neuroscience 2016)
# Tbr1 (TF required for the final differentiation of
#       cortical projection neurons);
# Snap25 (pan-neuronal);
# Rorb (mostly L4 and L5a);
# Foxp2 (L6)
```
--- class: center, middle # `iSEE(sce)` --- class: center, middle # `iSEE()` -- ## `iSEE.data` --- # Some feedback from users/1 <p align="center"> <img src="images/tweet_saskia.png" alt="" height="400"/> </p> [`https://twitter.com/trashystats/status/1007061299568578561`](https://twitter.com/trashystats/status/1007061299568578561) --- # Some feedback from users/2 <p align="center"> <img src="images/tweet_rob.png" alt="" height="400"/> </p> [`https://twitter.com/robamezquita/status/1102612120527548416`](https://twitter.com/robamezquita/status/1102612120527548416) <!-- --- --> <!-- # Some feedback from users --> <!-- <p align="center"> --> <!-- <img src="images/meme_drake_isee.jpg" height="450"/> --> <!-- </p> --> <!-- `not sure this one can be trusted` --> <!-- --- --> <!-- <p align="center"> --> <!-- <img src="images/shiny_contest_winner.gif" alt="" height="400"/> --> <!-- </p> --> --- # Are we/others reinventing the wheel? [`https://github.com/federicomarini/awesome-expression-browser`](https://github.com/federicomarini/awesome-expression-browser) -- <p align="center"> <img src="images/meme_batman.jpg" alt="" height="400"/> </p> [`https://blog.rstudio.com/2019/04/05/first-shiny-contest-winners`](https://blog.rstudio.com/2019/04/05/first-shiny-contest-winners) --- ## `iSEE(sce, voice = TRUE)` <iframe width="900" height="500" src="https://www.youtube-nocookie.com/embed/0crFZLwAJOE" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> <!-- Easter eggs included! 
--> --- # Outlook - CZI is funding a Bioconductor subproject by Stephanie Hicks (Interactive and Scalable Data Visualization) - Performance on large data sets: new fully worked-out examples in the Gallery (demonstrate applicability to many data types) - iSEE as a hub-like service: allowing loading at runtime while providing a centralized server for running operations - Reduce barriers for first-timers: use "modes" (e.g. QC, exploration of single-cell data, DE among clusters), more dedicated vignettes - Promote open data and open science: deploy your data with iSEE and increase the impact of your findings! Foster new viewpoints and new hypotheses on existing data! - Power users: instructions and examples of custom panels - Multi-omics datasets are coming: implement support for `MultiAssayExperiment` or equivalent solutions? --- background-image: url("images/iSEE.png") background-size: 200px background-position: 90% 10% # Summary - **Key features** for optimal exploration: - flexibility and customizability - linked information across plots - guided showcase/usage - effective communication - reproducibility (for self & for others!) - voice control (fun & accessibility) -- - **Got data?** Accompany your publications with a live browser! -- > See first, think later, then test. But always see first. <br> > Otherwise you will only see what you were expecting. Most scientists forget that. > > <footer>--- Douglas Adams</footer> --- background-image: url("images/iSEE.png") background-size: 200px background-position: 90% 10% # Summary - **Key features** for optimal exploration: - flexibility and customizability - linked information across plots - guided showcase/usage - effective communication - reproducibility (for self & for others!) - voice control (fun & accessibility) - **Got data?** Accompany your publications with a live browser! > ~~See~~ `iSEE` first, think later, then test. But always ~~see~~ `iSEE` first. > Otherwise you will only see what you were expecting. 
Most scientists forget that. > > <footer>--- Douglas Adams</footer> --- background-image: url("images/iSEE.png") background-size: 200px background-position: 90% 10% # Links <!-- a.k.a. the `iSEE`verse --> - **[`http://bioconductor.org/packages/iSEE/`](http://bioconductor.org/packages/iSEE/)** - [`https://github.com/csoneson/iSEE`](https://github.com/csoneson/iSEE) - [`https://github.com/LTLA/iSEE2018`](https://github.com/LTLA/iSEE2018) - [`https://github.com/kevinrue/iSEE_custom`](https://github.com/kevinrue/iSEE_custom) - [`https://github.com/federicomarini/iSEE_instances`](https://github.com/federicomarini/iSEE_instances) - **[`https://federicomarini.github.io/GMDS2019/`](https://federicomarini.github.io/GMDS2019/)** - **[`http://kevinrue.github.io/iSEEWorkshop2019/index.html`](http://kevinrue.github.io/iSEEWorkshop2019/index.html)** -- ### Acknowledgements - Center for Thrombosis and Hemostasis (CTH), Mainz (Virchow Fellowship) - IMBEI - Biostatistics & Bioinformatics division - Charlotte Soneson, Aaron Lun, Kevin Rue-Albrecht (the `iSEE` team) -- ### ... thank you for your attention! <code>marinif@uni-mainz.de</code> - [
`@FedeBioinfo`](https://twitter.com/FedeBioinfo) --- background-image: url("images/erum2020_promo_wide.png") background-size: 800px background-position: 20% 60% class: center # eRum2020 is coming <br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> ### [`http://2020.erum.io`](http://2020.erum.io) <!-- <p align="center"> --> <!-- <img src="images/erum2020_promo_wide.png" alt="" height="450"/> --> <!-- </p> --> --- <!-- empty slide --> --- # `iSEE` in action - the PBMC4k dataset <iframe width="900" height="500" src="https://www.youtube.com/embed/wpu9daTE4ok" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
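---

# Backup: a toy `SummarizedExperiment` for `iSEE`

A minimal sketch (not taken from the `iSEE` documentation) of how `assays`, `rowData`, and `colData` can be combined into one container before handing it to `iSEE()`. All gene names, sample names, and values are made up for illustration, and the app is launched explicitly via `shiny::runApp()`, only in an interactive session where `iSEE` is installed.

```r
# Toy example: every name and value below is hypothetical.
library("SummarizedExperiment")

set.seed(42)
n_genes <- 5
n_samples <- 4

# assay: a genes x samples count matrix
counts <- matrix(
  rpois(n_genes * n_samples, lambda = 10),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("sample", seq_len(n_samples))
  )
)

# rowData: one row of annotation per gene
gene_info <- DataFrame(
  biotype = rep("protein_coding", n_genes),
  row.names = rownames(counts)
)

# colData: one row of metadata per sample
sample_info <- DataFrame(
  condition = rep(c("control", "treated"), each = 2),
  row.names = colnames(counts)
)

# combine the three pieces into a single object
se <- SummarizedExperiment(
  assays = list(counts = counts),
  rowData = gene_info,
  colData = sample_info
)

# launch the explorer only when a user is there to interact with it
if (interactive() && requireNamespace("iSEE", quietly = TRUE)) {
  shiny::runApp(iSEE::iSEE(se))
}
```

With a real dataset, the same three slots would hold the quantified assay(s), the feature annotation, and the sample or cell metadata, which is what the panels shown earlier draw from.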