Hello,
I am trying to extract the fasta sequences corresponding to the MSTRG genes (RNA sequence/ tuxedo packages) identified during the DESeq2 analysis.
I´ve created a fasta file with all my genes IDs and MSTRGs and the corresponding sequences (transcripts.fa):
>rna-gnl|WGS:LPUQ|mrna.ZEAMMB73_Zm00001d048578 gene=MSTRG.1459
ATGCCAGGCGCCCGCGGCAAGGTCGCCGGAGCCGCCGAGGCGGGCGT...
and I have a list with all the DE MSTRG (res05_all.txt):
MSTRG.34995
MSTRG.16561
...
MSTRG.26406
I´ve tried this and it works for single MSTRG but not when I try to loop it:
grep -a "MSTRG..34995" stringtie_merged.gtf | head -n 1 > MSTRG..34995.bed
bedtools getfasta -fi genome.fa -bed MSTRG..34995.bed -fo MSTRG..34995.bed.fa.out
I´ve also tried samtools faidx without success. Any recommendations?
Thanks for this. really useful.
just a small correction, there is a missing ")" :
Find sequences with header that contains gene name.
matching_seqs <- seqs[str_detect(names(seqs), str_c(genes, collapse = "|"))]
Thanks! I'll fix it.