Hi everybody,
I am new to bioinformatics and I am following a de novo transcriptome assembly workflow. I have 20 paired-end fastq files (10 samples, 2 files each). The assembly has finished and I got a Trinity.fasta as output. I am now working on a subset of transcripts of interest, stored in a fasta file: subset.fasta.
I ran bowtie2 (using my reads and subset.fasta) and now have 10 corresponding .bam files. I have looked at tools such as HTSeq, but I am honestly lost in the documentation.
Does anyone know an easy way to reach my goal from there (the number of reads mapping to each transcript of subset.fasta, for each of my 10 samples)?
Thank you for your support!
It's still not clear to me what you are doing. Is this DNA-seq or RNA-seq? Do you have assemblies or just bams?
It is RNA-seq. From my read files I got an assembly, Trinity.fasta. I am now working on a subset of Trinity.fasta: subset.fasta. I also have 10 .bam files that I got from bowtie2 (using all the reads and subset.fasta as inputs).
When you align to a subset of what you know is there, it can force reads to align to places they really don't belong. Better to align to the whole reference, then filter out what you don't care about after.
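To make the "filter afterwards" step concrete, here is a rough Python sketch, not a tested pipeline. It assumes you already have per-transcript counts from an alignment against the full Trinity.fasta, held in a plain dict, and that the transcript names in the BAM are the FASTA header text up to the first whitespace. The function names `fasta_ids` and `keep_subset` and the example transcript names are made up for illustration.

```python
def fasta_ids(lines):
    """Collect sequence IDs (header text up to the first whitespace) from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def keep_subset(counts, wanted_ids):
    """Keep only the transcripts of interest from a {transcript: count} dict."""
    return {name: n for name, n in counts.items() if name in wanted_ids}


# Made-up example data standing in for subset.fasta and for real counts.
example_fasta = [
    ">TRINITY_DN1_c0_g1_i1 len=300",
    "ACGTACGT",
    ">TRINITY_DN2_c0_g1_i1 len=200",
    "GGCCGGCC",
]
example_counts = {
    "TRINITY_DN1_c0_g1_i1": 12,
    "TRINITY_DN2_c0_g1_i1": 0,
    "TRINITY_DN3_c0_g1_i1": 40,  # aligned, but not in the subset
}

wanted = fasta_ids(example_fasta)
subset_counts = keep_subset(example_counts, wanted)
```

With a real file you would pass an open file handle instead of the list, e.g. `with open("subset.fasta") as fh: wanted = fasta_ids(fh)`.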
Samtools idxstats will quickly give you a count of how many reads aligned to each sequence in your reference, though you'll need to think about what you want to do with reads that align to multiple places, versus what Bowtie actually does to such reads (in its default mode, Bowtie2 reports only one alignment per read and picks pseudo-randomly among equally good candidates).
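If it helps, here is a minimal Python sketch of how the idxstats output could be turned into one table for all samples. It relies on the idxstats output being tab-separated with four columns (reference name, sequence length, mapped read-segments, unmapped read-segments) and a final `*` line for reads with no reference. The BAM files need to be coordinate-sorted and indexed (`samtools index`) first. The function names and the sample/file names are placeholders, and note that the counts are read segments, so the two mates of a pair are counted separately.

```python
import subprocess


def parse_idxstats(text):
    """Parse `samtools idxstats` output into a {reference_name: mapped_count} dict."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        name = fields[0]
        if name == "*":  # summary line for reads without a reference
            continue
        counts[name] = int(fields[2])  # third column: mapped read-segments
    return counts


def idxstats_counts(bam_path):
    """Run samtools idxstats on one sorted, indexed BAM (samtools must be on PATH)."""
    result = subprocess.run(
        ["samtools", "idxstats", bam_path],
        capture_output=True, text=True, check=True,
    )
    return parse_idxstats(result.stdout)


def count_table(per_sample):
    """Build a tab-separated transcript-by-sample table from {sample: counts}."""
    samples = list(per_sample)
    names = sorted({n for counts in per_sample.values() for n in counts})
    rows = [["transcript"] + samples]
    for name in names:
        rows.append([name] + [str(per_sample[s].get(name, 0)) for s in samples])
    return "\n".join("\t".join(row) for row in rows)


# Hypothetical usage with placeholder file names:
# per_sample = {f"sample{i}": idxstats_counts(f"sample{i}.sorted.bam")
#               for i in range(1, 11)}
# print(count_table(per_sample))
```

The parsing is kept separate from the samtools call so it can be checked on a saved idxstats text file before running it on all 10 BAMs.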