
Fastool does not properly parse SRA files #5

@tsackton

When I process SRA RNA-seq FASTQ files with Fastool as part of the Trinity package, Fastool appends /H to the end of every sequence ID, which then causes errors downstream in Trinity.

Here are the first few lines of an SRA file: https://gist.github.com/tsackton/8c5508a4b60a1e33f6f2

When I run fastool --to-fasta --illumina-trinity sra_test.fq > sra_test.1.fa, the output headers look like this:

SRR488565.1/H
SRR488565.2/H
SRR488565.3/H
SRR488565.4/H
SRR488565.5/H
SRR488565.6/H
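One possible explanation for the /H suffix, sketched below as a hypothetical model and NOT taken from Fastool's source: assume that in Illumina/Trinity mode the converter keeps the ID before the first space and appends "/" plus the first character of the comment. In Illumina 1.8+ headers that character is the read number (1 or 2). If the SRA comment instead begins with an instrument name (the HWI-style header below is made up for illustration), the same rule would yield /H.

```python
# Hypothetical model of the header rewrite; this is NOT Fastool's actual code.
# Assumption: the ID before the first space is kept, and "/" plus the first
# character of the comment is appended (in Illumina 1.8+ headers such as
# "1:Y:18:ATCACG", that character is the read number).


def trinity_style_id(header: str) -> str:
    """Return the FASTA ID this hypothetical rule would produce for a FASTQ header."""
    name = header.lstrip("@")
    read_id, _, comment = name.partition(" ")
    if not comment:
        # No comment field (e.g. after seqtk seq -C): the ID passes through unchanged.
        return read_id
    # The first character of the comment is treated as the read number.
    return f"{read_id}/{comment[0]}"


# Illumina 1.8+ header: the comment starts with the read number -> ".../1"
print(trinity_style_id("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"))
# Made-up SRA-style header whose comment starts with an instrument name -> ".../H"
print(trinity_style_id("@SRR488565.1 HWI-ST000:1:1101:1000:2000 length=100"))
# Comment stripped -> plain ID
print(trinity_style_id("@SRR488565.1"))
```

Under this assumption, any SRA header whose comment starts with a letter would get a bogus one-letter suffix, which matches the /H output above.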

If I first remove everything after the first space in the SRA header lines (with seqtk seq -C), the output is normal:

SRR488565.1
SRR488565.2
SRR488565.3
SRR488565.4
SRR488565.5
SRR488565.6

Trinity fails on the files with /H headers, but works on the files produced after seqtk seq -C processing.
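The seqtk seq -C workaround can also be sketched as a single conversion step. This is a minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines); the function names fastq_to_fasta_no_comments and convert are hypothetical and not part of Fastool, seqtk, or Trinity. It keeps only the ID before the first whitespace and writes FASTA, which should give headers like the SRR488565.1 ones shown above.

```python
# Sketch of the workaround: drop everything after the first whitespace in each
# FASTQ header (as seqtk seq -C does to headers), then emit FASTA.
# Assumes plain 4-line FASTQ records; helper names are hypothetical.
from itertools import islice
from typing import Iterable, Iterator


def fastq_to_fasta_no_comments(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines (header, sequence) for each 4-line FASTQ record."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        # Read the record in fixed 4-line chunks, so a quality line that
        # happens to start with "@" is never mistaken for a header.
        record = [header] + list(islice(it, 3))
        if len(record) < 4:
            raise ValueError(f"truncated FASTQ record: {header!r}")
        header, seq, plus, _quality = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        fields = header[1:].split(maxsplit=1)
        if not fields:
            raise ValueError("empty FASTQ header")
        yield ">" + fields[0]
        yield seq


def convert(in_path: str, out_path: str) -> None:
    """Stream a FASTQ file into a FASTA file with header comments removed."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in fastq_to_fasta_no_comments(src):
            dst.write(line + "\n")
```

For example, convert("sra_test.fq", "sra_test.fa") would write ">SRR488565.1", ">SRR488565.2", and so on, without any /H suffix. Reading in fixed 4-line chunks is deliberate: FASTQ quality strings may legally begin with "@", so scanning for header lines by their first character is unreliable.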

This was tested with the latest version of Fastool, compiled on CentOS 6 with gcc 4.8.2.
