This tutorial is intended as an addendum to the base qiime-decontam-tutorial, which can be found here.
The decontam-identify-batches functionality is intended for the identification of batch-associated contamination. This form of contamination identification allows individual contaminants to be tracked through the sequencing process and assists in identifying the sources of contamination.
The pipeline action decontam-identify-batches first splits a feature table into multiple subset tables based on the categories in a metadata column provided by the user. For example, to compare contamination across sequencing runs, it may be useful to split the table based on which sequencing run each sample was generated from. The decontam processes are then run on each of the subset tables, and a comprehensive visual is produced. This action utilizes both decontam-identify and decontam-score-viz; however, it is not intended to take the place of the decontam-identify and decontam-score-viz actions.
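To preview how many subset tables a given split column would produce, the categories in that column can be counted directly from the metadata file. The following is a minimal shell sketch, not part of the plugin: the miniature metadata file, its sample IDs, and the "sample-id" header are made up for illustration, and the real metadata.tsv will differ.

```shell
#!/bin/sh
# Hypothetical miniature metadata file (tab-delimited); values are invented.
meta=$(mktemp)
printf 'sample-id\tsubject\tSample_or_Control\n' > "$meta"
printf 'S1\tA\tsample\n' >> "$meta"
printf 'S2\tA\tcontrol\n' >> "$meta"
printf 'S3\tB\tsample\n' >> "$meta"
printf 'S4\tC\tsample\n' >> "$meta"

split_column=subject

# Find the 1-based position of the split column in the header row.
col_index=$(head -n 1 "$meta" | tr '\t' '\n' | grep -n -x -F "$split_column" | cut -d: -f1)

# Count samples per category; each category would become one subset table.
batch_counts=$(tail -n +2 "$meta" | cut -f "$col_index" | sort | uniq -c | awk '{print $2 "=" $1}')
n_batches=$(printf '%s\n' "$batch_counts" | wc -l | tr -d ' ')

echo "$batch_counts"
echo "subset tables expected: $n_batches"
rm -f "$meta"
```

With the made-up data above, the column holds three categories (A, B and C), so three subset tables would be expected. Pointing `meta` at the real metadata.tsv and changing `split_column` gives the same preview for the tutorial data.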
This tutorial will include example commands with data that can be found below:
Feature Data Table | table.qza
Representative Sequence Table | rep_seqs.qza
To get the associated metadata file for the example data you will need to run the below command:
wget https://raw.githubusercontent.com/jordenrabasco/q2-decontam-tutorial/refs/heads/main/data_objs/metadata.tsv
The decontam-identify-batches function has the inputs and parameters of both the decontam-identify and decontam-score-viz actions, as well as the options for feature-table split. Those inputs and parameters are:
--i-table: takes in a FeatureTable[Frequency] artifact such as an ASV or OTU table
--m-metadata-file: is your metadata file (needs to be tab delimited)
--i-rep-seqs: this takes in a FeatureData[Sequence] artifact, which is the artifact generated from uploading a .fasta file into a qiime environment.
--p-split-column: this indicates the metadata column which will subset the input table
--p-filter-empty-features / --p-no-filter-empty-features: These are two options which explicitly indicate whether the split tables will have their empty features filtered out. --p-filter-empty-features will filter out empty features in the subsetted tables and --p-no-filter-empty-features will not remove empty features from the subset tables.
--p-method: denotes the method that will be used (the example command in this tutorial uses “combined”)
--p-freq-concentration-column: denotes the metadata column that holds the concentration information of each sample
--p-prev-control-column: denotes the metadata column that contains the information on whether a sample is an experimental or a control sample
--p-prev-control-indicator: the text within the --p-prev-control-column that identifies a control sample
--p-threshold: this is the decontam score threshold at which features are designated as not contaminants. Features scoring below this threshold are considered contaminants.
--p-weighted / --p-no-weighted: These are the two options which explicitly indicate whether to weight the histogram in the .qzv file by the read abundance. --p-weighted indicates that the histogram will be weighted, while --p-no-weighted will produce a histogram of the features instead of reads at each decontam score bin.
--p-bin-size: This indicates what the bin size of the histogram in the .qzv output should be. It is recommended that the bin size be 0.05.
--o-batch-subset-tables: This indicates the folder where the subset tables will be sent. The tables will be renamed with both the split column name and their subset value to allow for differentiation.
--o-decontam-scores: This indicates the folder where the decontam score tables for the subset tables will be sent. The decontam score tables will be renamed with both the split column name and their subset value to allow for differentiation.
--o-score-histograms: The output .qzv file from the action. It can be visualized on the qiime server, which can be found here
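To make the --p-threshold, --p-weighted / --p-no-weighted and --p-bin-size options more concrete, here is a small shell sketch of the ideas behind them. It is an illustration only and not the plugin's implementation: the feature names, scores and read counts are invented, and the binning shown (integer division of the score by the bin size) is an assumption about how such a histogram could be built.

```shell
#!/bin/sh
# Hypothetical decontam scores: feature id, score, read count (tab-delimited).
scores=$(mktemp)
printf 'feat1\t0.02\t500\n'   > "$scores"
printf 'feat2\t0.03\t20\n'   >> "$scores"
printf 'feat3\t0.42\t1000\n' >> "$scores"
printf 'feat4\t0.97\t80\n'   >> "$scores"

threshold=0.1
bin_size=0.05

# Features scoring below the threshold are treated as contaminants.
contaminants=$(awk -F '\t' -v t="$threshold" \
    '($2 + 0) < (t + 0) {printf "%s%s", sep, $1; sep = " "}' "$scores")

# Unweighted histogram: number of features in each score bin.
unweighted=$(awk -F '\t' -v b="$bin_size" \
    '{n[int($2 / b)] += 1} END {for (i in n) print i * b, n[i]}' "$scores" | sort -n)

# Weighted histogram: number of reads in each score bin.
weighted=$(awk -F '\t' -v b="$bin_size" \
    '{n[int($2 / b)] += $3} END {for (i in n) print i * b, n[i]}' "$scores" | sort -n)

echo "contaminants: $contaminants"
echo "unweighted (bin start, features):"; echo "$unweighted"
echo "weighted (bin start, reads):"; echo "$weighted"
rm -f "$scores"
```

In this toy data the lowest bin holds two features but 520 reads, which shows why the weighted and unweighted histograms can look quite different for the same scores.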
To run decontam-identify-batches with the example data use the following command:
qiime quality-control decontam-identify-batches \
  --i-table table.qza \
  --i-rep-seqs rep_seqs.qza \
  --m-metadata-file metadata.tsv \
  --p-split-column subject \
  --p-filter-empty-features \
  --p-method combined \
  --p-prev-control-column Sample_or_Control \
  --p-prev-control-indicator control \
  --p-freq-concentration-column Concentration \
  --p-threshold 0.1 \
  --p-no-weighted \
  --p-bin-size 0.05 \
  --output-dir batches_outputs
This will produce an output folder with all of the designated outputs, which can be seen here
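Because the example command refers to three metadata columns by name (subject, Sample_or_Control and Concentration), a quick header check before running it can catch typos early. The sketch below is one possible way to do that in plain shell. The header row it writes is a stand-in, and the "sample-id" column name is an assumption; in practice `meta` would point at the downloaded metadata.tsv.

```shell
#!/bin/sh
# Stand-in header row for illustration; use the real metadata.tsv in practice.
meta=$(mktemp)
printf 'sample-id\tsubject\tSample_or_Control\tConcentration\n' > "$meta"

missing=""
for col in subject Sample_or_Control Concentration; do
    # -x matches the whole column name, -F treats it as a literal string.
    if ! head -n 1 "$meta" | tr '\t' '\n' | grep -q -x -F "$col"; then
        missing="$missing $col"
    fi
done

if [ -z "$missing" ]; then
    status="all columns present"
else
    status="missing columns:$missing"
fi
echo "$status"
rm -f "$meta"
```

If any column is reported as missing, the corresponding --p-split-column, --p-prev-control-column or --p-freq-concentration-column value should be adjusted to match the metadata header.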