4.7 years ago
David
Hi,
I'm mapping a set of ONT long reads to the human genome.
I'm using this genome version from GENCODE: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/GRCh38.p13.genome.fa.gz
See the description here: https://www.gencodegenes.org/human/
I'm using a skin sample, so I expect to find bacteria and fungi, for example from the genus Malassezia. But I get no hits, since they are filtered out when mapping with minimap against the human reference above.
I'm wondering how you are dealing with these false-positive sequences (in this case fungal) in human assemblies?
Thanks
Are you certain? Why should they be filtered?
I am just speculating but I assume you prepped this sample to get fragments that were as long as possible. Perhaps non-human DNA was removed in that process? Fungi have tough cell walls so they could have been excluded?
Have you tried to use a software package like this to see if you are able to find other sequences in your entire data set?
The fungal sequences are filtered out together with the human reads because the human assemblies unfortunately contain sequences that belong to contaminants (such as bacteria and fungi). These are hard to filter, so I was wondering.
The following post from Brian Bushnell (BBMap package) discusses fungal contamination in point 3). His suggestion is to map fungal databases against the human genome and mask the matching sequences in the human assembly.
http://seqanswers.com/forums/archive/index.php/t-42552.html
Just wondering if others have used a similar approach.
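To make the masking idea concrete, here is a minimal Python sketch, not Brian's actual BBMap workflow. It assumes you already have the coordinates where fungal sequences aligned to a human contig; the function name mask_regions, the 0-based half-open interval convention, and the toy contig are all hypothetical choices for illustration.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Return a copy of `sequence` with every (start, end) interval masked.

    Intervals are assumed to be 0-based and half-open, i.e. `end` is excluded.
    Coordinates outside the sequence are clipped rather than raising an error.
    """
    masked = list(sequence)
    for start, end in regions:
        start = max(0, start)
        end = min(len(masked), end)
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    # Toy contig and made-up hit coordinates, for illustration only.
    contig = "ACGTACGTACGT"
    fungal_hits = [(2, 5), (8, 10)]
    print(mask_regions(contig, fungal_hits))
```

Reads that previously aligned to the masked stretches would then have nothing but N to match against in the human reference.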
Brian is referring to short reads, which are much more likely to multi-map or map falsely.
What is the average read size in your case? At nanopore read lengths you should see something in your data (if you expect non-human sequences). If you are seeing nothing, then my hunch is still that sample prep eliminated the non-human sequences. I guess you are not interested in human data for this experiment?
Yes, that's a metagenomics experiment on skin; I'm not interested in human reads, only bacteria and fungi. For some reason the fungi are filtered out.
This is the command line:
Is that OK?
Can you try:
Command is from this thread.
contaminant.fasta would be your human genome.
The -F 0x900 is for excluding secondary and supplementary alignments from the unmapped reads. I don't see why secondary alignments would filter fungi unless human reads map to fungi?
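For anyone unsure what the flag filters select, here is a small Python sketch of how the include/exclude flag options of samtools behave on the SAM FLAG field. The bit values (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM specification; the function name passes_filter is hypothetical, and this only mimics the filtering logic rather than calling samtools.

```python
# FLAG bits as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def passes_filter(flag, require=0, exclude=0):
    """Mimic samtools-style flag filtering on a single record.

    `require` plays the role of -f (all of these bits must be set),
    `exclude` plays the role of -F (none of these bits may be set).
    """
    return (flag & require) == require and (flag & exclude) == 0


if __name__ == "__main__":
    # 0x900 is simply the secondary and supplementary bits combined.
    print(hex(SECONDARY | SUPPLEMENTARY))
    # Toy FLAG values: primary, unmapped, secondary, supplementary + reverse.
    for flag in (0, 4, 256, 2064):
        print(flag, passes_filter(flag, require=UNMAPPED, exclude=0x900))
```

With require=0x4 and exclude=0x900, only unmapped records survive, and each read is reported at most once because its secondary and supplementary records are dropped.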
Do you have secondary alignments? You have not said anything about the length of your reads so far.
Yes, I do have secondary alignments:
As for read length: the average read length is 2553 bp.
So I'm going to investigate the secondary alignments, as I guess that's the problem.
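One way to start that investigation is to tally how many records fall into each alignment category. The Python sketch below does this on plain-text SAM lines; the function names classify and tally_sam and the toy records are hypothetical, and it assumes the alignments are available as uncompressed SAM text rather than BAM.

```python
from collections import Counter


def classify(flag):
    """Map a SAM FLAG value to a coarse alignment category."""
    if flag & 0x4:
        return "unmapped"
    if flag & 0x100:
        return "secondary"
    if flag & 0x800:
        return "supplementary"
    return "primary"


def tally_sam(lines):
    """Count records per category, skipping header lines that start with '@'."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[classify(int(fields[1]))] += 1  # FLAG is the second column
    return counts


if __name__ == "__main__":
    # Made-up, truncated records for illustration only.
    toy = ["@HD\tVN:1.6", "r1\t0\tchr1", "r1\t256\tchr2", "r2\t4\t*"]
    print(dict(tally_sam(toy)))
```

A large secondary count on its own does not explain missing fungal reads; the interesting cases are reads whose primary alignment is to human even though they were expected to be fungal.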