class: center, middle, title-slide

# iSEE: easy and efficient exploration of sequencing data

### Federico Marini (marinif@uni-mainz.de
) ### 2019/11/05 StatOmique Workshop --- class: center <!-- Submitted abstract: --> <!-- iSEE: easy and efficient exploration of sequencing data --> <!-- Recent technological advancements in the life and medical sciences have allowed these fields to evolve into quantitative disciplines. --> <!-- Gene expression profiling with RNA-sequencing, protein quantification via mass spectrometry, high throughput imaging are complex assays that often require advanced methods for processing and generating actionable results. Notably, large and heterogeneous datasets need specific techniques also for the exploration, and I found that an effective combination of interactivity and reproducibility is essential for doing this properly. --> <!-- Interactive visualization can be a key player in this regard: for quality assessment of often noisy multivariate data, for hypothesis generation and exploration, for the visualization of results, as well as for the efficient communication of findings. Moreover, often the scientists generating the primary data might not be well versed in programming, and this can increase the iterations required to extract insight out of it. --> <!-- I will present in detail the functionality of a Bioconductor software I co-developed, iSEE (http://bioconductor.org/packages/iSEE/). Leveraging point-and-click user interfaces with guided tours, I will show how users can efficiently explore high-dimensional datasets, automatically retrieve the code for the generated output, and learn (by doing) how to generate compelling scientific figures. --> # `Sys.getenv("USER")` I'm **Federico Marini**, Virchow Fellow @CTH Mainz/IMBEI, and I like to develop software (mainly R/Bioconductor) -- I like platelets (and their transcriptome), and you should as well. -- <a href="mailto:marinif@uni-mainz.de">
`marinif@uni-mainz.de`</a><br> <a href="https://federicomarini.github.io">
`federicomarini.github.io`</a><br> <a href="http://twitter.com/FedeBioinfo">
`@FedeBioinfo`</a><br> <a href="http://github.com/federicomarini">
`@federicomarini`</a><br><br> <a href="http://www.imbei.de">
CTH/IMBEI (Mainz, Germany)</a> -- You can find this presentation here: [`https://federicomarini.github.io/2019_StatOmique_workshop/`](https://federicomarini.github.io/2019_StatOmique_workshop/)<br> [`http://bit.ly/iSEE_statomique`](http://bit.ly/iSEE_statomique) <p align="center"> <img src="images/qrcode_statomique.png" alt="" height="160"/> </p> <!-- http://bit.ly/iSEE_statomique --> --- # Transcriptomics at a glance **RNA-seq**: High dimensional snapshot of the transcriptomic activity of all RNA species in a sample <!-- - Bulk or single cell? --> -- <p align="center"> <img src="images/wiki_dogma.jpg" alt="" height="400"/> </p> <!-- Quantify and compare --> --- # Transcriptomics at a glance **RNA-seq**: High dimensional snapshot of the transcriptomic activity of all RNA species in a sample <!-- - Bulk or single cell? --> **Data**: genes `\(\times\)` samples (e.g. bulk/single cells) **Objectives**: - Compare abundances to discover differentially expressed genes, gene signatures, etc. - features associated with phenotypic differences - Identification of cell subpopulations, description of developmental trajectories, study noise and heterogeneity in transcriptional regulation -- **Challenges** - Large sets - Heterogeneous sets - Sparse sets --- # The analysis workflow <p align="center"> <img src="images/Interaction_nostep.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step0.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step1.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step2.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step3.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step4.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step5.png" alt=""/> </p> --- # The analysis workflow <p 
align="center"> <img src="images/Interaction_step6.png" alt=""/> </p> --- # The analysis workflow <p align="center"> <img src="images/Interaction_step7.png" alt=""/> </p> --- background-image: url("images/marie.png") background-size: cover background-position: 50% 50% class: center, bottom, inverse -- # Does the exploration of your data spark joy? ### *a.k.a., how do we prevent the last steps from happening?* <!-- Is there a better way to support these cycles of exploration, inquiry, and hypothesis generation? --> --- # The challenges -- The data is **large** ("can't fit in our head") - use structures/containers that support alternative (out-of-memory) representations -- Our task is **complex** - yet we are curious and stubborn enough that we want to try and understand this -- The Pro & the Con: **We are not alone**, we work with a variety of experts - mind (and bridge) the gap when **communicating**! -- ### Here's my view on this - Proper analysis tools _should_ combine **interactivity and reproducibility** - Marini and Binder (2016) - [`10.18547/gcb.2017.vol3.iss1.e39`](https://doi.org/10.18547/gcb.2017.vol3.iss1.e39) - Marini and Binder (2019) - [`10.1186/s12859-019-2879-1`](https://doi.org/10.1186/s12859-019-2879-1) - Joe Cheng's keynote at `useR!2019` in Toulouse (welcome `shinymeta`!) --- class: center # Reproducible (data) science = good (data) science --
-- What do you need for doing that? -- **Data** - needless to say -- **Software** - list all required packages/versions, if needed to recreate environments on demand -- **Analysis steps/parameters**, with all tools and params in each step, documenting order, inputs, outputs --- background-image: url("images/console_logcounts_sparse.png") background-size: contain background-position: 50% 50% class: middle, center # (Interactive) Exploration and visualization: why? Effective and efficient methods are key to deliver... --
better **quality assessment** --
better **generation of research hypotheses** --
better **representation of the results** --
better **communication** of findings --- # `SummarizedExperiment` <p align="center"> <img src="images/sce_class.png" alt="" height="350"/> </p> <!-- Thank you Martin Morgan!!! --> <!-- or the one from the OSCA paper? --> -- It can store [**RNA-seq**|DNA methylation|Hi-C|Microarray|Mass cytometry|SNPs] data Excellent primer on single cell data analysis: [`https://osca.bioconductor.org`](https://osca.bioconductor.org)! --- # Looking for a silver bullet Data exploration is crucial: - No general tool for this, limited to assay types or analysis steps - No support for reproducibility while keeping it intuitive and usable -- Joint work with Aaron Lun, Charlotte Soneson, Kevin Rue-Albrecht initiated at [#EuroBioc2017](https://twitter.com/hashtag/EuroBioC2017) <p align="center"> <img src="images/lun_aaron_web.jpg" alt="" width="200"/> <img src="images/twit_charlotte.jpg" alt="" width="200"/> <img src="images/twit_kev.jpg" alt="" width="200"/> </p> -- "We could have an interactive SummarizedExperiment Explorer tool..." <!-- Visualize my data in any (precomputed) reduced dimension space. --> <!-- Color the data points with any experimental covariate (e.g. batch). --> <!-- Color the data points with any expression data. --> <!-- Select data points in a plot, and highlight them in another. --> <!-- Visualize the distribution of any assay or metadata. --> <!-- Visualize the correlation between gene A and gene B, specifically in ‘this’ or ‘that’ cluster --> <!-- Fully empower the data generators - and get lazy! --> --- # How did we get there? --- class: animated, fadeInUp <!-- Throwback thursday - tuesday :) --> # How did we get there? Pt.1 [`http://bioconductor.org/packages/pcaExplorer/`](http://bioconductor.org/packages/pcaExplorer/) <p align="center"> <img src="images/pcaExplorer_fig1.png" height="400"/> <code>library("pcaExplorer")</code></br> <code>pcaExplorer(dds_object)</code> </p> --- class: animated, fadeInUp # How did we get there? Pt. 
2 [`http://bioconductor.org/packages/ideal/`](http://bioconductor.org/packages/ideal/) <p align="center"> <img src="images/ideal_figWorkflow.png" height="400"/> <code>library("ideal")</code></br> <code>ideal(dds_object)</code> </p> --- class: animated, tada # Hello `iSEE` <p align="center"> <img src="images/iSEE.png" width="370"/> </p> * [`https://f1000research.com/articles/7-741/v1`](https://f1000research.com/articles/7-741/v1), live apps inside * Available in Bioconductor [`http://bioconductor.org/packages/iSEE/`](http://bioconductor.org/packages/iSEE/) --- class: center, middle ## Using iSEE is easy as -- # `iSEE(sce)` --- # `iSEE` in action: the *Tabula Muris* dataset .pull-left[ <p align="center"> <img src="images/iSEE.png" width="170"/> </p> ] .pull-right[ <img src="images/tabulamuris.png" alt="" width="500"/> ] -- Preprocessing details + `iSEE` configuration for this set can be found at [`https://github.com/federicomarini/iSEE_instances`](https://github.com/federicomarini/iSEE_instances/tree/master/iSEE_tabulamuris) - 20 organs and tissues, from 8 different mice - 23036 genes - 43598 cells `\(\rightarrow\)` >1 billion data points! 
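
As a quick sanity check on the "more than 1 billion data points" figure, the sketch below multiplies the gene and cell counts quoted on this slide. The memory estimate assumes a dense matrix of doubles at 8 bytes per value; it is a back-of-the-envelope illustration, not a measurement from the actual dataset.

```r
# Dimensions quoted on this slide for the Tabula Muris dataset
n_genes <- 23036
n_cells <- 43598

# Number of entries in the genes x cells matrix
n_entries <- n_genes * n_cells
n_entries > 1e9  # TRUE

# Assumption: dense storage as doubles, 8 bytes per entry
dense_gb <- n_entries * 8 / 1e9
dense_gb
```

Under that assumption a single dense assay would already need about 8 GB, which is why sparse or out-of-memory representations matter for data of this size.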
--- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, bottom # A tour of the panels --- background-image: url("images/ss_reddim.png") background-position: 50% 50% background-size: contain class: center, animated, zoomInLeft --- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, animated, fadeIn --- background-image: url("images/ss_featassay.png") background-size: contain background-position: 50% 50% class: center, animated, zoomIn --- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, animated, fadeIn --- background-image: url("images/ss_samplesassay.png") background-size: contain background-position: 50% 50% class: center, animated, zoomInRight --- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, animated, fadeIn --- background-image: url("images/ss_coldata_rowdata.png") background-size: contain background-position: 50% 50% class: center, animated, zoomInUp --- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, animated, fadeIn --- background-image: url("images/ss_tables.png") background-size: contain background-position: 50% 50% class: center, animated, zoomInUp --- background-image: url("images/tm_isee_allpanels.png") background-size: contain background-position: 50% 50% class: center, animated, fadeIn --- background-image: url("images/gif_bsb2.gif") background-size: contain background-position: 50% 50% class: center, bottom, inverse, fadeIn # Flexibility and customizability --- background-image: url("images/gif_reordering.gif") background-size: contain background-position: 50% 50% class: center, bottom # Reordering the panels --- background-image: url("images/tm_isee_reorderedpanels.png") background-size: 
contain background-position: 50% 25% class: center --- background-image: url("images/gif_colorbygene.gif") background-size: contain background-position: 50% 50% class: center, bottom # Augmenting the observed data with biological knowledge --- background-image: url("images/gif_linkedtotable.gif") background-size: contain background-position: 50% 50% class: center, bottom # Linking the panels --- background-image: url("images/gif_customde.gif") background-size: contain background-position: 50% 50% class: center, bottom # Custom panels --- background-image: url("images/meme_owl.jpg") background-size: contain background-position: 50% 50% class: center, bottom # Working in a reproducible way --- background-image: url("images/brit_bored.gif") background-size: cover class: center, bottom, inverse # "Oh, code..." --- background-image: url("images/gif_reprocode.gif") background-size: contain background-position: 50% 50% class: center, bottom # Code access for full reproducibility --- background-image: url("images/brit_dance.gif") background-size: cover class: center, bottom, inverse -- # `knit` me baby one more time --- background-image: url("images/meme_gandalf.jpg") background-size: contain class: center, bottom --- <!-- Markers: Pax9, Olig1, Cd68 --> background-image: url("images/gif_toursteps.gif") background-size: contain background-position: 50% 50% class: center, bottom # Interactive tours: an efficient means to communicate --- class: center, middle # What's new in `iSEE`? 
---

class: center, middle

<p align="center"> <img src="images/ss_iSEE_release3_10_news.png" alt="" height="500"/> </p>

Thread here: [`https://twitter.com/FedeBioinfo/status/1189677334086991883?s=20`](https://twitter.com/FedeBioinfo/status/1189677334086991883?s=20)

---

# `iSEE()`

- `iSEE` as a working solution, proposed as a standard way of exploring large datasets ([`https://www.biorxiv.org/content/10.1101/590562v1`](https://www.biorxiv.org/content/10.1101/590562v1) and highlighted in the useR!2019 keynote by Martin Morgan)

--

- Many datasets have reduced impact if they do not properly adhere to the FAIR data principles!

--

- `scRNAseq` as an example repository for published single cell datasets

A toy-not-so-toy prototype...

```r
library(iSEE)
shiny::runApp(appDir = "../iSEE_portal/")
```

---

# Setting up your `iSEE` server

Step 1 - Set up your Shiny server

--

Step 2 - Set up the app in `/srv/shiny-server/[yourApp]`

<p align="center"> <img src="images/ss_server_app_allen.png" alt="" height="200"/> </p>

--

<p align="center"> <img src="images/ss_server_app_custom.png" alt="" height="100"/> </p>

---

# Setting up your `iSEE` server

Step 3 - Set up the server in `/etc/shiny-server/shiny-server.conf`

<p align="center"> <img src="images/ss_server_config_allen.png" alt="" height="180"/> </p>

--

<p align="center"> <img src="images/ss_server_config_custom_portal.png" alt="" height="180"/> </p>

--

Step 4 - That's it!
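
The screenshots show the actual files used for the talk. As a minimal sketch of what Step 2 could look like, an `app.R` in the app directory might load a saved object and end with the `iSEE()` call. The file name `sce.rds` is a hypothetical placeholder, and this is not necessarily the content shown in the screenshots.

```r
# app.R: hypothetical file placed in /srv/shiny-server/[yourApp]/
library("iSEE")

# Assumption: the dataset was saved beforehand with saveRDS();
# "sce.rds" is a placeholder name, not a file from the talk
sce <- readRDS("sce.rds")

# iSEE() returns a Shiny app object; leaving it as the last expression
# of app.R lets Shiny Server pick it up and serve it
iSEE(sce)
```

Keeping the preprocessing outside `app.R` and loading a ready-made object keeps start-up short, since the app is launched afresh for new sessions.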
---

# An example: the TILs atlas

<p align="center"> <img src="images/ss_tils.png" height="250"/> </p>

--

<p align="center"> <img src="images/ss_tils_atlas.png" height="150"/> </p>

---

# `iSEE` gene lists

Code available at [`iSEE_instances/iSEE_small_intestinal_epithelium`](https://github.com/federicomarini/iSEE_instances/tree/master/iSEE_small_intestinal_epithelium)

Marker lists are extracted from the original publication; some machine-readable examples are `markerlist_fig1.txt`, `markerlist_fig3.txt`, and `markerlist_extfig1-2.txt`

---

# Are we/others reinventing the wheel?

<!-- Indeed even Haber provide a browser... -->

--

[`https://github.com/federicomarini/awesome-expression-browser`](https://github.com/federicomarini/awesome-expression-browser)

--

<p align="center"> <img src="images/meme_batman.jpg" alt="" height="400"/> </p>

[`https://blog.rstudio.com/2019/04/05/first-shiny-contest-winners`](https://blog.rstudio.com/2019/04/05/first-shiny-contest-winners)

---

## `iSEE(sce, voice = TRUE)`

<iframe width="900" height="500" src="https://www.youtube-nocookie.com/embed/0crFZLwAJOE" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

<!-- Easter eggs included! -->

---

## `iSEE` diamonds!

Code available at [`iSEE_instances/iSEE_classics`](https://github.com/federicomarini/iSEE_instances/tree/master/iSEE_classics/)

<p align="center"> <img src="images/ss_diamonds_price.png" alt="" height="350"/> </p>

... or you can also have a look at `mtcars`!
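
One possible way to get a plain data frame such as `mtcars` into `iSEE` is sketched below. It wraps the table in a `SummarizedExperiment`, treating each car as a "sample" and each numeric variable as a "feature". That mapping is an assumption made for illustration; the code in `iSEE_instances/iSEE_classics` may well be organized differently.

```r
library("SummarizedExperiment")
library("iSEE")

# Assumption: cars become columns (samples), variables become rows (features)
values <- t(as.matrix(mtcars))

se <- SummarizedExperiment(
  assays = list(values = values),
  # the same table doubles as per-sample metadata, usable for axes and coloring
  colData = DataFrame(mtcars)
)

# Build the app object; launch it only in an interactive session
app <- iSEE(se)
if (interactive()) {
  shiny::runApp(app)
}
```

With this layout the column data panels can plot one variable against another (for example `mpg` versus `wt`), which is roughly what the diamonds screenshot above shows for price.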
--- # Under the hood -- - Panels generation: [Link to GitHub repo code](https://github.com/csoneson/iSEE/blob/a237f26599821fa844bea3d0883cca01faf2459c/R/iSEE-main.R#L341-L367) -- - Deploying custom panels: [Vignette on Bioc](http://bioconductor.org/packages/3.9/bioc/vignettes/iSEE/inst/doc/custom.html) and [some examples](https://github.com/kevinrue/iSEE_custom/blob/master/README.md) -- - Plots and code: Link to GitHub repo code for [plot generation](https://github.com/csoneson/iSEE/blob/a237f26599821fa844bea3d0883cca01faf2459c/R/plotting.R#L350-L397) and [code generation](https://github.com/csoneson/iSEE/blob/a237f26599821fa844bea3d0883cca01faf2459c/R/observers.R#L44-L57) -- - Get your tours running: setup by putting `introjsUI()` in your UI, triggered like [in this example](https://github.com/csoneson/iSEE/blob/a237f26599821fa844bea3d0883cca01faf2459c/R/observers.R#L31-L42), with the [tour itself (anchors and descriptions, supporting HTML!) defined via DOM selectors](https://github.com/csoneson/iSEE/blob/a237f26599821fa844bea3d0883cca01faf2459c/inst/extdata/intro_firststeps.txt) -- - Setup voice control: [Link to GitHub repo code](https://github.com/csoneson/iSEE/blob/a237f26599821fa844bea3d0883cca01faf2459c/R/voice.R), [setup what happens on the js side](https://github.com/csoneson/iSEE/blob/a237f26599821fa844bea3d0883cca01faf2459c/inst/www/voice.js), and put [annyang.min.js](https://github.com/csoneson/iSEE/blob/a237f26599821fa844bea3d0883cca01faf2459c/inst/www/annyang.min.js) in the `www/` folder. [This blog post](https://www.jumpingrivers.com/blog/voice-control-your-shiny-apps/) from Colin Gillespie is a minimal working example! <!-- --- --> <!-- # Outlook --> <!-- - iSEE as a hub-like service: allowing loading at runtime while providing a centralized server for running operations --> <!-- - Power users: instructions and examples of custom panels --> <!-- - Multi-omics datasets are coming: implement support for `MultiAssayExperiment` or equivalent solutions? 
--> --- background-image: url("images/iSEE.png") background-size: 200px background-position: 90% 10% # Summary - **Key features** for optimal exploration: - flexibility and customizability - linked information across plots - guided showcase/usage - effective communication - reproducibility (for self & for others!) - voice control (fun & accessibility) -- - **Got data?** Accompany your publications as live browser, promote open data and open science, increase the impact of your findings & foster new viewpoints -- - **Outlook**: reduce even more the barriers for first timers (`modes`) + Gallery with fully worked out examples & Power users ❤️ custom panels (and even more complex datasets!) -- - `GeneTonic`: a similar approach for enjoying all the components of transcriptome analysis - Design & deploy `Plateletopedia` --- background-image: url("images/iSEE.png") background-size: 200px background-position: 90% 10% # Links <!-- a.k.a. the `iSEE`verse --> - **[`http://bioconductor.org/packages/iSEE/`](http://bioconductor.org/packages/iSEE/)** - [`https://github.com/csoneson/iSEE`](https://github.com/csoneson/iSEE) - [`https://github.com/LTLA/iSEE2018`](https://github.com/LTLA/iSEE2018) - [`https://github.com/kevinrue/iSEE_custom`](https://github.com/kevinrue/iSEE_custom) - [`https://github.com/federicomarini/iSEE_instances`](https://github.com/federicomarini/iSEE_instances) -- Into Differential Expression? (*where it all began*) - **[`http://bioconductor.org/packages/pcaExplorer/`](http://bioconductor.org/packages/pcaExplorer/)** - **[`http://bioconductor.org/packages/ideal/`](http://bioconductor.org/packages/ideal/)** -- ### Acknowledgements - Center for Thrombosis and Hemostasis (CTH), Mainz (Virchow Fellowship) - IMBEI - Biostatistics & Bioinformatics division - Charlotte Soneson, Aaron Lun, Kevin Rue-Albrecht (the `iSEE` team) --- # eRum2020 is coming <p align="center"> <img src="images/plug_erum2020.png" alt="" height="500"/> </p> --- class: center, middle ## ... 
thank you for your attention!

<code>marinif@uni-mainz.de</code> - [`@FedeBioinfo`](https://twitter.com/FedeBioinfo)

[`https://federicomarini.github.io/2019_StatOmique_workshop/`](https://federicomarini.github.io/2019_StatOmique_workshop/)

[`http://bit.ly/iSEE_statomique`](http://bit.ly/iSEE_statomique)

<img src="images/qrcode_statomique.png" alt="" height="200"/>

---

<!-- empty slide -->

---

# `iSEE` in action - a few lines of code

```r
# in Bioc since 3.7
install.packages("BiocManager")
BiocManager::install("iSEE")

library("scRNAseq")
data("allen")
library("scater")
# coerce to SingleCellExperiment so that reduced dimensions can be stored
sce <- as(allen, "SingleCellExperiment")
counts(sce) <- assay(sce, "tophat_counts")
sce <- normalize(sce)
sce <- runPCA(sce)
sce <- runTSNE(sce)
*library("iSEE")
*iSEE(sce)

# couple of genes to check: (Zeisel, Science 2015;
# Tasic, Nature Neuroscience 2016)
# Tbr1 (TF required for the final differentiation of
# cortical projection neurons);
# Snap25 (pan-neuronal);
# Rorb (mostly L4 and L5a);
# Foxp2 (L6)
```

---

# `iSEE` in action - the PBMC4k dataset

<iframe width="900" height="500" src="https://www.youtube.com/embed/wpu9daTE4ok" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>