--- title: "DamageDetective Overview" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{DamageDetective Overview} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} date: 'Compiled: `r format(Sys.Date(), "%B %d, %Y")`' --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(lifecycle_verbosity = "quiet") ``` # Package description The goal of `DamageDetective` is to simplify the process of making informed and reproducible damaged cell filtering decisions during the pre-processing of single cell RNA sequencing data. This requires only a count matrix to run and outputs a damage score ranging from 0 (viable, intact cell) to 1 (broken, non-viable cell). There is an option for automatic filtering using the default upper threshold damage score of 0.5. We will demonstrate briefly using an example dataset provided upon package installation. ```{r} library(DamageDetective) library(Matrix) data("test_counts", package = "DamageDetective") dim(test_counts) ``` # Prerequisites ## Libraries For an improvement in speed, load the `presto` package in addition to `DamageDetective`. ```{r library_prep, echo=FALSE, message=FALSE, warning=FALSE} library(DamageDetective) ``` ```{r library_prep_shown, eval=FALSE} install.packages("remotes") remotes::install_github("madsen-lab/valiDrops") library(DamageDetective) library(presto) ``` ## Input data formatting - Counts should be provided in the form of a compressed, column-oriented sparse matrix (`dgCMatrix`) in `R`. ```{r} # View formatting class(test_counts) ``` - Counts should have gene sets formatted according to HGCN standard, i.e., `MT-...` rather than `ENSG...`. ```{r} # View formatting head(rownames(test_counts)) ``` > See [biomaRt](https://bioconductor.org/packages/release/bioc/html/biomaRt.html) for conversion assistance or use the automated `Seurat` functions for working with alignment output, [ReadMtx](https://satijalab.org/seurat/reference/readmtx). - `DamageDetective` supports data of human and mouse, specified using the `organism` parameter. To analyse a non-standard organism, provide a list with patterns that matches the set of mitochondrially encoded genes and ribosomal genes and genes with a confirmed, permanent nuclear residence. > **Example using humans as organism of interest** > > ```{r data_prep, eval=FALSE} > organism = list(mito_pattern = "^MT-", > ribo_pattern = "^(RPS|RPL)", > nuclear = c("NEAT1","XIST", "MALAT1")) > > ``` For more information on data preparation, view the package articles on our [website]() ## Parameter selection ### `select_penalty` While `detect_damage` requires only a count matrix as input, additional parameters control aspects of the function’s computations. Of these, we recommend `ribosome_penalty` be adjusted for each dataset using the `select_penalty` function. This parameter ranges from 0 to 1 and adjusts the likelihood of ribosomal RNA loss during simulation, correcting for observed discrepancies where ribosomal RNA is retained more than expected based on transcript abundance. ```{r} penalty <- select_penalty( count_matrix = test_counts, max_penalty_trials = 3 # Shortened for the vignette ) penalty ``` ### `filter_threshold` `DamageDetective` offers the upper threshold 0.5 as the damage score above which cells are filtered, where values greater than 0.5 reflect more permissive filtering and values closer to 0 reflect more stringent filtering. We recommend the default, but suggest that if adjustments are made, they are informed by the output detect_damage plots, generate_plot = TRUE. For more information on parameters, please view the function documentation available on our website under **References**. # Running damaged cell detection Damage detection is run using the count matrix and ribosomal penalty as inputs. Below, we have additionally specified for `filter_counts` parameter to be TRUE. This will use the default `filter_threshold` and return the filtered count matrix that can be used immediately for the remainder of pre-processing. ```{r damage_detection} # Perform damage detection detection_results <- detect_damage( count_matrix = test_counts, ribosome_penalty = penalty, display_plot = FALSE, filter_counts = TRUE ) # View the resulting count matrix dim(detection_results$output) # View the plot detection_results$plot ``` # Session Information ```{r session-info, echo=FALSE} sessionInfo() ```