Hello! I am pretty new to the bioinformatic field and I am dealing with some RNAseq data.
I am trying to run the protocol described in Callari et al. BMC Genomics 2018 in order to discriminate mouse and human reads into my PDX-derived samples.
In order to do so, as suggested in the paper and following some of their commands, I merged the human and mouse genome and run Tophat on that. I finally got a .bam file that is now containing reads aligned to both the human and the mouse genome.
In order to proceed with the remaining RNAseq analysis, I want to subselect only those reads that matched only to either the human's or to the mouse's genome.
The problem is that when I run:
samtools view -h path/to/ICRG_aligned.bam
| grep -v $'\t''m.chr'
| grep -v 'SN:m.chr'
| samtools view -b - > path/to/genome_human_only.bam
it returns me an error saying: "[bam_header_read] EOF marker is absent. The input is probably truncated. [bam_header_read] invalid BAM binary header (this is not a BAM file). [main_samview] fail to read the header from "-"."
This code was taken from the original code reported in the paper.
Also, I checked my bam file and it contains normal @-starting headers.
Can anybody help me to figure out what is the problem in my code? Or maybe if there is an easier approach to tackle this easy task?
Note: the chromosome's names of my fastq and gtf files were modified so that the human's ones are reported as "chr" whereas the mouse's as "m.chr".
Thank you in advance!