SAM to eland conversion
10.3 years ago
geneart$$ ▴ 50

Hello Biostars,

I was trying out a relatively new piece of software, proTRAC, to map piRNA clusters. The software calls for ELAND-formatted files. While looking for conversion tools I found pyicos, which converts SAM to ELAND, but its ELAND output had more fields than the proTRAC documentation specifies. It looks like that is a different version of the ELAND format. The bottom line is that the input files for proTRAC need to have the format described in the proTRAC documentation (quoted below).

My question is:

Does anyone have experience using this software, or has anyone converted SAM to ELAND3 using a tool? Any suggestions would help :)

Thanks:)

Geneart.

4. Input file

proTRAC uses a list of mapped sequence reads (ELAND3) generated by the SeqMap mapping tool (Jiang, H., Wong, W.H. (2008) SeqMap: Mapping Massive Amount of Oligonucleotides to the Genome, Bioinformatics, 24(20)). SeqMap is freely available at http://www-personal.umich.edu/~jianghui/seqmap/. Map your sequence dataset in FASTA-format to a genome of your choice. Many genomes are available at ftp://ftp.ncbi.nih.gov/genomes/. To obtain the correct output format, run SeqMap with the option /output_all_matches. Use the generated output file without any changes as input file for proTRAC. If your sequence dataset contains transcriptional information (a non-redundant FASTA file where each FASTA title refers to the number of identical sequence reads),

>1
ATGGCTCGACTCGCGATAC
>45
TGGCTTTATTGCGCTTTTAACA
>12
ATTCGCTAACGGGCGAAAAG

this information can be used to display different transcription rates within one cluster, since FASTA titles are saved and can be extracted from the SeqMap output file:

trans_id trans_coord target_seq probe_id probe_seq num_mismatch strand
Chr1 10368 ATGGCTCGACTCGCGATAC 1 ATGGCTCGACTCGCGATAC 0 -
Chr1 44754 ATTCGCTAACGGGCGAAAAG 12 ATTCGCTAACGGGCGAAAAG 0 -
Chr1 56834 TGGCTTTATTGCGCTTTTAACA 45 TGGCTTTATTGCGCTTTTAACA 0 -
Chr1 96823 ATTCGCTAACGGGCGAAAAG 12 ATTCGCTAACGGGCGAAAAG 0 -
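To make the layout quoted above concrete, here is a minimal Python sketch that reads a table with those seven columns and pulls out the read count stored in probe_id. It is only an illustration, not part of proTRAC or SeqMap: the function name `read_counts_per_hit` is made up, and it assumes the columns are whitespace-separated and that every FASTA title is a bare integer count.

```python
import io

# Column names exactly as they appear in the quoted documentation.
SEQMAP_COLUMNS = [
    "trans_id", "trans_coord", "target_seq",
    "probe_id", "probe_seq", "num_mismatch", "strand",
]

def read_counts_per_hit(handle):
    """Yield (chromosome, coordinate, strand, read_count) for each hit.

    Assumption: fields are whitespace-separated and probe_id holds the
    number of identical reads (the FASTA title of the collapsed read).
    """
    header = handle.readline().split()
    if header != SEQMAP_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        yield (row["trans_id"], int(row["trans_coord"]),
               row["strand"], int(row["probe_id"]))

# The example rows from the documentation excerpt.
example = """\
trans_id trans_coord target_seq probe_id probe_seq num_mismatch strand
Chr1 10368 ATGGCTCGACTCGCGATAC 1 ATGGCTCGACTCGCGATAC 0 -
Chr1 44754 ATTCGCTAACGGGCGAAAAG 12 ATTCGCTAACGGGCGAAAAG 0 -
Chr1 56834 TGGCTTTATTGCGCTTTTAACA 45 TGGCTTTATTGCGCTTTTAACA 0 -
Chr1 96823 ATTCGCTAACGGGCGAAAAG 12 ATTCGCTAACGGGCGAAAAG 0 -
"""

hits = list(read_counts_per_hit(io.StringIO(example)))
for chrom, coord, strand, count in hits:
    print(chrom, coord, strand, count)
```

Note that the read with title 12 appears at two coordinates, so the same count is reported once per genomic hit.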
9.4 years ago
A. Domingues ★ 2.7k

This comes a bit late, but maybe it can be useful for others. proTRAC does need ELAND-format input, but it is a modified format in which reads with the same sequence are collapsed into one FASTA entry whose header contains the number of reads with that sequence. Here is an example:

>34
TGGCACGTGCGATGGCAGTTCAACGTTCA
>23
TAAATCCCGTAGGCTTTTAGCATCGACG
>2
TTATTCGAAGCGCTTTACGACACGCGCGCA

This is done with the script TBr2_collapse.pl of their NGS toolbox, and it is detailed in proTRAC's documentation. I strongly recommend that you follow the instructions and use the scripts available in the Toolbox. It will save you a lot of trouble.
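For anyone curious what the collapsing step amounts to, a rough Python sketch is below. It is not a substitute for TBr2_collapse.pl, which may do more than this; the function names `collapse_reads` and `to_collapsed_fasta` are hypothetical, and the input is assumed to be a plain list of read sequences.

```python
from collections import Counter

def collapse_reads(sequences):
    """Count identical read sequences, ignoring empty entries."""
    return Counter(seq.strip() for seq in sequences if seq.strip())

def to_collapsed_fasta(counts):
    """Write one FASTA entry per unique sequence, titled with its read count.

    Entries are ordered from most to least abundant (an arbitrary choice
    made for this sketch).
    """
    lines = []
    for seq, n in counts.most_common():
        lines.append(f">{n}")
        lines.append(seq)
    return "\n".join(lines) + "\n"

# Toy input: three reads, two of them identical.
reads = ["ACGTACGTACGTACGTACGTAC", "ACGTACGTACGTACGTACGTAC", "TTGACCGTAGGCTTTTAGCATC"]
collapsed = to_collapsed_fasta(collapse_reads(reads))
print(collapsed, end="")
```

The output has the same shape as the example above: a header line with the count followed by the sequence.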
