Annotating TCR protein sequences
While the vast majority of TCR analyses are concerned with gDNA or cDNA sequences, some rarer applications require annotation of the V/J/CDR3 regions of TCR protein sequences. Tools capable of processing such data in a manner comparable to nucleotide data are hard to find (e.g. while IgBLAST can determine V genes, it doesn’t seem to be able to annotate CDR3 junctions or J genes). Yet tasks such as re-annotating structural data whose metadata lacks rearrangement information are becoming more common, as people repurpose such data to inform TCR antigen predictors.
The two major autoDCR V/J/CDR3 annotation subcommands, autoDCR annotate and autoDCR cli, are able to process such sequences by generating tag tries from translated TCR germline protein sequences. Note that for this to work, the proper references must have been generated in the correct order, as described in the Generating reference data section.
To annotate protein sequences, users need to supply the boolean --protein / -aa flag to their autoDCR annotate or autoDCR cli commands, as in the following examples, which use sequences taken from PDB TCR-pMHC structures:
# Inspect the file of TCR polypeptide chains
cat prot-tcrs.fasta
>3MV7_4|Chain D|alpha chain of the TK3 TCR|Homo sapiens (9606)
QVTQSPEALRLQEGESSSLNCSYTVSGLRGLFWYRQDPGKGPEFLFTLYSAGEEKEKERLKATLTKKESFLHITAPKPEDSATYLCAVQDLGTSGSRLTFGEGTQLTVNPNIQNPDPAVYQLRDSKSSDKSVCLFTDFDSQTNVSQSKDSDVYITDKCVLDMRSMDFKSNSAVAWSNKSDFACANAFNNSIIPEDTFFPS
>3MV7_5|Chain E|beta chain of the TK3 TCR|Homo sapiens (9606)
DSGVTQTPKHLITATGQRVTLRCSPRSGDLSVYWYQQSLDQGLQFLIQYYNGEERAKGNILERFSAQQFPDLHSELNLSSLELGDSALYFCASSARSGELFFGEGSRLTVLEDLKNVFPPEVAVFEPSEAEISHTQKATLVCLATGFYPDHVELSWWVNGKEVHSGVCTDPQPLKEQPALNDSRYALSSRLRVSATFWQNPRNHFRCQVQFYGLSENDEWTQDRAKPVTQIVSAEAWGRAD
>2AK4_4|Chains D, I, N, S[auth T]|SB27 T cell receptor alpha chain|Homo sapiens (9606)
HMAQKVTQAQTEISVVEKEDVTLDCVYETRDTTYYLFWYKQPPSGELVFLIRRNSFDEQNEISGRYSWNFQKSTSSFNFTITASQVVDSAVYFCALSGFYNTDKLIFGTGTRLQVFPNIQNPDPAVYQLRDSKSSDKSVCLFTDFDSQTNVSQSKDSDVYITDKCVLDMRSMDFKSNSAVAWSNKSDFACANAFNNSIIPEDTFFPSPESS
>2AK4_5|Chains E, J, O[auth P], T[auth U]|SB27 T cell receptor beta chain|Homo sapiens (9606)
HMNAGVTQTPKFQVLKTGQSMTLQCAQDMNHNSMYWYRQDPGMGLRLIYYSASEGTTDKGEVPNGYNVSRLNKREFSLRLESAAPSQTSVYFCASPGLAGEYEQYFGPGTRLTVTEDLKNVFPPEVAVFEPSEAEISHTQKATLVCLATGFYPDHVELSWWVNGKEVHSGVCTDPQPLKEQPALNDSRYALSSRLRVSATFWQNPRNHFRCQVQFYGLSENDEWTQDRAKPVTQIVSAEAWGRAD
# Then analyse this with annotate
autoDCR annotate -in prot-tcrs.fasta -aa
Looking for TCRs in results/prot-tcrs.fa
Took 0.0 seconds
Found 4 rearranged TCRs in 4 reads (~100%)
# Or with cli
trb="HMNAGVTQTPKFQVLKTGQSMTLQCAQDMNHNSMYWYRQDPGMGLRLIYYSASEGTTDKGEVPNGYNVSRLNKREFSLRLESAAPSQTSVYFCASPGLAGEYEQYFGPGTRLTVTEDLKNVFPPEVAVFEPSEAEISHTQKATLVCLATGFYPDHVELSWWVNGKEVHSGVCTDPQPLKEQPALNDSRYALSSRLRVSATFWQNPRNHFRCQVQFYGLSENDEWTQDRAKPVTQIVSAEAWGRAD"
autoDCR cli "$trb" --protein
TCR regions detected!
productive yes
v_call TRBV6-1*01
j_call TRBJ2-7*01
junction_aa CASPGLAGEYEQYF
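Because autoDCR cli takes a single sequence as its argument, inspecting several chains one at a time means first splitting a FASTA file into individual sequences. The following is a minimal shell sketch of one way to do that; it is not part of autoDCR. The helper names fasta_to_tsv and annotate_each are hypothetical, and the only autoDCR invocation used is the cli call with --protein shown above.

```shell
# Flatten a (possibly line-wrapped) FASTA file into "id<TAB>sequence" lines.
# The id is the header text before the first "|" (e.g. 3MV7_4).
fasta_to_tsv() {
    awk '
        /^>/ {
            if (seq != "") print id "\t" seq   # emit the previous record
            id = substr($0, 2)                 # drop the leading ">"
            sub(/[|].*/, "", id)               # keep only the first header field
            seq = ""
            next
        }
        { gsub(/[ \t\r]/, ""); seq = seq $0 }  # join wrapped sequence lines
        END { if (seq != "") print id "\t" seq }
    ' "$1"
}

# Hypothetical wrapper: run the cli subcommand on every chain in a FASTA file.
annotate_each() {
    fasta_to_tsv "$1" | while IFS="$(printf '\t')" read -r id seq; do
        echo "## $id"
        # stdin is redirected so the command cannot consume the loop input
        autoDCR cli "$seq" --protein </dev/null
    done
}
```

With these defined, annotate_each prot-tcrs.fasta would print each chain identifier followed by the corresponding cli output. For anything beyond a handful of sequences, the annotate subcommand is probably the better fit.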
Things to note
Currently the protein version of autoDCR only works in standard ‘vjcdr3’ mode, not ‘full’ (i.e. it cannot be used to detect leader or constant regions).