Analysing full TCR transcripts
In addition to its regular annotation of the rearranged V and J genes and the intervening hypervariable CDR3, autoDCR can, under certain circumstances, look for two additional features that should also be present in a fully mature TCR mRNA: the leader sequence and the constant gene. In suitably amplified/sequenced (or assembled) transcriptomic data, the leader sequence (which encodes the TCR chain signal peptide, and is interrupted by the leader intron in the germline) will be spliced together and found immediately upstream of the V-REGION, while the constant region should be spliced on to the 3’ end of the J-REGION.
This is enabled by changing the --dcr_mode / -m flag from its default value (vjcdr3, describing its search for the V, J, and CDR3 sequences) to full. Note that full mode requires additional steps when generating the reference files, as described in the Generating reference data section. Those steps produce a second set of tags, specific to the constant and leader regions (C/L), which are applied to a read in a second round of Aho-Corasick string matching if a rearranged V-J gene is detected.
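The two-round logic described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not autoDCR's implementation: plain substring search stands in for Aho-Corasick matching, and the tag sequences, gene names, and function names (find_tags, annotate_read) are all invented for the example.

```python
# Toy tag sets: the sequences and gene names below are invented for illustration.
VJ_TAGS = {"AAAC": ("V", "toyV1"), "GGGT": ("J", "toyJ1")}
CL_TAGS = {"TTTA": ("L", "toyL1"), "CCCG": ("C", "toyC1")}


def find_tags(read, tags):
    """Return {region: gene} for each tag present in the read.

    Plain substring search stands in for Aho-Corasick matching here.
    """
    hits = {}
    for tag_seq, (region, gene) in tags.items():
        if tag_seq in read:
            hits[region] = gene
    return hits


def annotate_read(read, mode="vjcdr3"):
    """First round: V/J tags. Second round (full mode only): C/L tags."""
    calls = find_tags(read, VJ_TAGS)
    rearranged = "V" in calls and "J" in calls
    # The C/L tags are only searched once a rearranged V-J pair has been found.
    if mode == "full" and rearranged:
        calls.update(find_tags(read, CL_TAGS))
    return calls


read = "TTTAGAAACTGTGGGTACCCG"  # toy layout: L, V, J, C
print(annotate_read(read))                          # V and J only
print(annotate_read(read, mode="full"))             # V, J, L and C
print(annotate_read("TTTAGAAACCCG", mode="full"))   # no J tag, so no second round
```

Gating the second round on a detected V-J pair means reads without a rearrangement never pay for the extra C/L search.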
Full mode is available for both autoDCR annotate and autoDCR cli (it is the default for the latter), as shown below:
autoDCR annotate -in some-file -m full
tcr="TACCGGCTATCTGTACGGGGACAGATACAGAAGACCCCTCCGTCATGCAGCATCTGCCATGAGCATCGGCCTCCTGTGCTGTGCAGCCTTGTCTCTCCTGTGGGCAGGTCCAGTGAATGCTGGTGTCACTCAGACCCCAAAATTCCAGGTCCTGAAGACAGGACAGAGCATGACACTGCAGTGTGCCCAGGATATGAACCATGAATACATGTCCTGGTATCGACAAGACCCAGGCATGGGGCTGAGGCTGATTCATTACTCAGTTGGTGCTGGTATCACTGACCAAGGAGAAGTCCCCAATGGCTACAATGTCTCCAGATCAACCACAGAGGATTTCCCGCTCAGGCTGCTGTCGGCTGCTCCCTCCCAGACATCTGTGTACTTCTGTGCCAGCAGACTGGACAGGGAGTACGAGCAGTACTTCGGGCCGGGCACCAGGCTCACGGTCACAGAGGACCTGAAAAACGTGTTCCCCTCT"
# Use the default 'full' autoDCR mode to analyse a long TCR read
autoDCR cli $tcr
TCR regions detected!
orientation forward
productive yes
v_call TRBV6-5*01
j_call TRBJ2-7*01
junction_aa CASRLDREYEQYF
l_call TRBV6-5*01
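If the cli output needs to be consumed by a script, the field lines shown above can be turned into a dictionary. The sketch below assumes the output is one whitespace-separated "field value" pair per line, exactly as printed in the example; the helper name parse_cli_output is hypothetical and not part of autoDCR.

```python
import re

# Output copied from the `autoDCR cli` example above.
cli_output = """TCR regions detected!
orientation forward
productive yes
v_call TRBV6-5*01
j_call TRBJ2-7*01
junction_aa CASRLDREYEQYF
l_call TRBV6-5*01
"""


def parse_cli_output(text):
    """Collect 'field value' lines into a dict, skipping banner lines."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        # Field names in the example are lowercase with underscores;
        # anything else (such as the status banner) is ignored.
        if len(parts) == 2 and re.fullmatch(r"[a-z0-9_]+", parts[0]):
            fields[parts[0]] = parts[1].strip()
    return fields


fields = parse_cli_output(cli_output)
print(fields["v_call"], fields["j_call"], fields["junction_aa"])
```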
Things to note
When using ‘full’ mode with annotate, additional columns are generated in the output TSV file. These aim to give a wider inference of the predicted productivity of a given rearrangement than the V-J region alone.
Currently, ‘full’ TCR analysis is only valid for nucleotide sequences, and thus cannot be used with the protein analysis capabilities described in the Annotating TCR protein sequences section.
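One way to see which columns ‘full’ mode adds is to compare the header rows of two annotate outputs for the same input, one per mode. The sketch below does this with the standard csv module. The column names in the sample data are borrowed from the cli output fields and are only placeholders: the real TSV headers may differ, and the helper names header_of and extra_columns are hypothetical.

```python
import csv
import io


def header_of(tsv_text):
    """Return the first (header) row of a tab-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return next(reader)


def extra_columns(default_tsv, full_tsv):
    """List columns present in the full-mode header but not the default one."""
    base = set(header_of(default_tsv))
    return [col for col in header_of(full_tsv) if col not in base]


# Placeholder data: column names are assumed, not taken from real autoDCR output.
default_tsv = "v_call\tj_call\tjunction_aa\nTRBV6-5*01\tTRBJ2-7*01\tCASRLDREYEQYF\n"
full_tsv = "v_call\tj_call\tjunction_aa\tl_call\nTRBV6-5*01\tTRBJ2-7*01\tCASRLDREYEQYF\tTRBV6-5*01\n"

print(extra_columns(default_tsv, full_tsv))
```

With real files, the same functions can be given the text read from each TSV.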