Normalization

1. Choose the incorrect statement about data normalization.

Normalization can remove technical biases from the data
Skewness within the data is usually helpful for extracting meaningful insights
Normalization should preserve the biological features of the data
Downstream analysis steps such as clustering, differential expression, and enrichment analysis can be negatively affected if applied to unnormalized data

2. Which of the following is not a method for performing data normalization?

Z-Scoring
Quantile Normalization
Median Polish
Affymetrix GeneChip
Log Transformation

3. After visualizing the original data shown in the histogram on the left (A), a researcher normalized the data to obtain the histogram on the right (B). Which normalization technique is the most likely to have been used in this process?

Log Transformation
Z-Scoring
Quantile Normalization
Median Polish

4. Select the incorrect description of the Median Polish method.

Establishes an additive model for a two-way distribution
Done by iteratively finding the overall, row, and column factors contributing to each value
Iteration ends when residuals are minimized
To obtain the best results, it is preferable to have as many iterations as possible
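
The additive model behind median polish (value = overall + row effect + column effect + residual) can be sketched in pure Python as below. This is a minimal illustration, assuming a complete numeric table with no missing values; the sweep order and the stopping rule (stop when the total absolute residual stops changing, with defaults eps=0.01 and max_iter=10) are modeled loosely on R's medpolish() and are not taken from this course. The function name median_polish is hypothetical.

```python
from statistics import median


def median_polish(table, max_iter=10, eps=0.01):
    """Tukey's median polish: value[i][j] ~ overall + row[i] + col[j] + resid[i][j]."""
    resid = [[float(v) for v in row] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [v - row_med[i] for v in resid[i]]
            row_eff[i] += row_med[i]
        # Move the median of the column effects into the overall effect.
        delta = median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        # Sweep column medians out of the residuals into the column effects.
        col_med = [median([resid[i][j] for i in range(n_rows)]) for j in range(n_cols)]
        for j in range(n_cols):
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
            col_eff[j] += col_med[j]
        # Move the median of the row effects into the overall effect.
        delta = median(row_eff)
        row_eff = [r - delta for r in row_eff]
        overall += delta
        # Stop once the total absolute residual no longer changes appreciably.
        new_sum = sum(abs(v) for r in resid for v in r)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff, resid
```

For a perfectly additive table such as [[1, 2, 3], [4, 5, 6], [7, 8, 9]], one pass gives an overall effect of 5, row effects [-3, 0, 3], column effects [-1, 0, 1], and all-zero residuals. Each sweep only moves amounts between the residuals and the effects, so overall + row + column + residual always reproduces the original value.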

5. Choose the incorrect statement about RNA-Seq gene count normalization methods.

RPKM: Accounts for differences in gene length and read depth, useful for gene comparison across samples
FPKM: Accounts for the quality of gene-gene co-expression correlation networks
CPM: Accounts for read depth, useful for comparison between replicates
TPM: Accounts for gene length and read depth, useful for gene comparison across samples
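
The difference between CPM, RPKM, and TPM comes down to what is divided out and in which order. The sketch below is a minimal illustration for a single sample, assuming raw counts and gene lengths in base pairs; the function names cpm, rpkm, and tpm are hypothetical helpers, not a library API.

```python
def cpm(counts):
    """Counts per million: scale each gene by the sample's total reads."""
    per_million = sum(counts) / 1e6
    return [c / per_million for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: divide by read depth first, then by gene length."""
    per_million = sum(counts) / 1e6
    return [c / per_million / (length / 1e3) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: divide by gene length first, then by read depth."""
    rpk = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    per_million = sum(rpk) / 1e6
    return [r / per_million for r in rpk]
```

With hypothetical counts [10, 30, 60] and lengths [1000, 2000, 500] bp, CPM gives roughly [100000, 300000, 600000] and RPKM roughly [100000, 150000, 1200000]. Because TPM normalizes by length before depth, the TPM values of every sample sum to one million, which RPKM does not guarantee. FPKM uses the RPKM formula but counts fragments (read pairs) instead of individual reads.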

6. Select the incorrect statement about Z-score normalization.

Calculates a standard score for each data point using the series mean and series standard deviation
As a result of Z-scoring, the distribution will have a mean of 0 and a standard deviation of 1
As a result of Z-scoring, the distribution will have a mean of 1 and a standard deviation of 0
Represents each value as the number of standard deviations it lies from the series mean
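
Z-scoring subtracts the series mean from each value and divides by the series standard deviation. A minimal sketch follows, assuming the population standard deviation (statistics.pstdev); some tools use the sample standard deviation instead, which changes the scale slightly for small series. The function name z_scores is hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Express each value as the number of standard deviations from the series mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; assumes the values are not all identical
    return [(v - mu) / sigma for v in values]
```

For the series [2, 4, 4, 4, 5, 5, 7, 9] (mean 5, population SD 2), the result is [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0], which has mean 0 and standard deviation 1.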

7. The image below represents the last step in the process of quantile normalization. Select the appropriate values to fill in the empty boxes.

(A): 3, (B): 8, (C): 5
(A): 3, (B): 6, (C): 5
(A): 4, (B): 6, (C): 5
(A): 4, (B): 5, (C): 8
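
Quantile normalization sorts each sample, averages the values at each rank across samples, and then writes those rank means back to each gene's original position. The sketch below assumes each sample is a list of equal length, with genes in the same order in every list; the data in the usage note are hypothetical and unrelated to the image above. Ties are broken by position here, whereas some implementations average tied ranks instead. The function name quantile_normalize is hypothetical.

```python
def quantile_normalize(samples):
    """samples: list of samples, each a list of per-gene values of equal length."""
    n_genes = len(samples[0])
    # Step 1: sort each sample and take the mean across samples at each rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(s[r] for s in sorted_samples) / len(samples) for r in range(n_genes)]
    # Step 2: replace each value with the rank mean for its rank within its sample.
    result = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda g: s[g])  # gene indices, smallest first
        normalized = [0.0] * n_genes
        for rank, gene in enumerate(order):
            normalized[gene] = rank_means[rank]
        result.append(normalized)
    return result
```

For two hypothetical samples [2, 7, 4] and [3, 1, 9], the rank means are 1.5, 3.5, and 8.0, giving [1.5, 8.0, 3.5] and [3.5, 1.5, 8.0]. After normalization every sample has exactly the same distribution of values; only their assignment to genes differs.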

8. The tables below show an example of the process of gene count normalization from original counts to RPKM/FPKM. Select the appropriate values to fill in the empty spaces. For simplicity, the processed tables scale by reads per ten rather than reads per million.

(A): 1kb, (B): CPM, (C): 1.43
(A): 10kb, (B): TPM, (C): 2.86
(A): 1kb, (B): Z-score, (C): 1.43
(A): 10kb, (B): CPM, (C): 1.42

9. Select the incorrect statement about the DESeq2 normalization method.

The DESeq2 normalization method uses the median of ratios method to normalize counts between samples to prepare the data for differential expression analysis
The DESeq2 library is a widely used package for the analysis of microarray data
The DESeq2 normalization method finds scaling factors for each sample accounting for differences in library size and composition
The DESeq2 normalization method accounts for genes with zero counts and downplays genes that soak up many reads
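
The median-of-ratios idea can be sketched in a few lines: for each gene, compute the geometric mean of its counts across samples; for each sample, take the median of the ratios of its counts to those geometric means; then divide that sample's counts by this size factor. This is a simplified illustration, not DESeq2's code (DESeq2 is an R/Bioconductor package and computes these factors in R via estimateSizeFactors()). It assumes a genes-by-samples table of raw counts and skips genes with a zero in any sample, since their geometric mean is zero. The names size_factors and normalize are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts[g][s] is the raw count of gene g in sample s."""
    n_samples = len(counts[0])
    # Keep only genes with a nonzero count in every sample.
    rows = [row for row in counts if all(c > 0 for c in row)]
    # Log geometric mean of each gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in rows]
    factors = []
    for s in range(n_samples):
        # Log ratio of each gene's count in sample s to its geometric mean.
        log_ratios = [math.log(row[s]) - lgm for row, lgm in zip(rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide every count by its sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]
```

If the second sample was simply sequenced twice as deeply as the first (for example, hypothetical counts [[10, 20], [5, 10], [100, 200]]), the size factors come out as about 0.71 and 1.41, and the normalized counts for each gene become equal across the two samples. Using the median rather than the total makes the factor robust to a few genes that soak up a large share of the reads.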

10. Select the correct description of single-cell RNA-seq UMI counts.

Single-cell RNA-seq UMI counts are mostly 1
UMI counts display a continuous value distribution because UMI counts are predominantly in the range between 10 and 1000 counts
Continuous values are difficult to normalize for library size compared to discrete value distributions
UMI distribution typically fits a negative binomial distribution, and sometimes a zero-inflated model is used to model the observed distributions

BSR 6806: Programming for Big Data Biomedicine