I just mapped some bulk-seq reads to the reference VDJ genes of T cell receptors in order to extract the T cell clonotypes (using mixcr). The resulting clonotypes come in one txt file per sample that looks like this:
However, in some cases I would like to filter out some TRD clones, for example the first one here, which contains a TRDV3 and a TRDJ gene. Can one do this easily, directly on the txt file, in R? Or is there another way of doing it instead of reading the table as a data frame, filtering it, and then exporting it as txt again? I eventually import these txt files into vdjtools for further processing.
What mixcr command created that output? From their docs it sounds like outputs are generally just TSV files, so it'd be easy enough to do a read.table or what have you in R and go from there. But they also have a feature to convert things to AIRR format, which could be handy too.
In any case, filtering the table will involve some kind of read+filter+write, whether with R or with something else. Is something like an awk one-liner all you need?
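For the read+filter+write route, a minimal Python sketch could look like the one below. It streams the file line by line and copies kept lines through without re-parsing them, so the output stays a plain tab-separated txt file. The column names `allVHitsWithScore` / `allJHitsWithScore` and the hit format `TRDV3*00(score)` are assumptions about a MiXCR-style export, not taken from the table in the question; check the header of your own file and adjust. The function names `filter_clones` and `top_gene` are hypothetical.

```python
# Minimal sketch: drop clones whose top V hit is TRDV* AND top J hit is TRDJ*.
# ASSUMPTION: MiXCR-style column names and hit strings like "TRDV3*00(900)";
# check the header of your own file and adjust V_COL / J_COL.
V_COL = "allVHitsWithScore"
J_COL = "allJHitsWithScore"


def top_gene(hits: str) -> str:
    """Return the first (best) gene name in a hit string, e.g. 'TRDV3'."""
    return hits.split(",")[0].split("*")[0].split("(")[0].strip()


def filter_clones(in_path: str, out_path: str,
                  v_col: str = V_COL, j_col: str = J_COL) -> int:
    """Copy a tab-separated clone table, skipping TRDV+TRDJ clones.

    The header and kept lines are copied through as read. Raises ValueError
    if a named column is missing. Returns the number of rows removed.
    """
    removed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        header = src.readline()
        dst.write(header)
        cols = header.rstrip("\r\n").split("\t")
        vi, ji = cols.index(v_col), cols.index(j_col)
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            # Short or blank lines fail the length check and are kept as-is.
            if (len(fields) > max(vi, ji)
                    and top_gene(fields[vi]).startswith("TRDV")
                    and top_gene(fields[ji]).startswith("TRDJ")):
                removed += 1
                continue
            dst.write(line)
    return removed
```

Usage would be something like `filter_clones("sample1_clones.txt", "sample1_noTRD.txt")` per sample (file names hypothetical). A plain prefix check only removes the unambiguous TRDV+TRDJ case; V genes that IMGT assigns to both loci (e.g. TRAV29/DV5) start with `TRAV` and will not be caught.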
Hi, just to add: it's important to note that TRA and TRD clones often share the same segments (V and J). In our experience, only the C gene can reliably distinguish between the two.
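If the C gene is the more reliable marker, the same kind of filter can key on the C-gene column instead. This is another minimal sketch, assuming a MiXCR-style `allCHitsWithScore` column (an assumption; check your header) and hypothetical helper names. Whether a C gene gets called at all presumably depends on reads reaching into the constant region, so clones with an empty C field are kept rather than guessed at.

```python
# Minimal sketch: drop clones whose top C hit starts with a given prefix.
# ASSUMPTION: MiXCR-style column name and hit strings like "TRDC*00(400)";
# adjust C_COL to match the header of your own file.
C_COL = "allCHitsWithScore"


def top_gene(hits: str) -> str:
    """Return the first (best) gene name in a hit string, e.g. 'TRDC'."""
    return hits.split(",")[0].split("*")[0].split("(")[0].strip()


def drop_by_c_gene(in_path: str, out_path: str,
                   c_prefix: str = "TRDC", c_col: str = C_COL) -> int:
    """Copy a tab-separated clone table, skipping clones whose top C hit
    starts with c_prefix. Clones with no C call (empty field) are kept.
    Returns the number of rows removed.
    """
    removed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        header = src.readline()
        dst.write(header)
        ci = header.rstrip("\r\n").split("\t").index(c_col)
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) > ci and top_gene(fields[ci]).startswith(c_prefix):
                removed += 1
            else:
                dst.write(line)
    return removed
```

Keying on C sidesteps the shared-segment problem for clones that do have a C call; clones without one still need a separate decision.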
Oh, that's good to know about the segments! (chi.delta, watch out, then, if you're trying to recognize TRD from V+J gene names like I mentioned.) Though, aren't alpha/beta/gamma/delta TCR chains assembled from totally different loci? I'm confused how a beta chain could end up using a V gene from TRD for example. This is probably where my ignorance of TCRs vs IGs is showing though.
This just worked perfectly, thanks a lot!