Add prophex filter #16

@simonepignotti

Description

An interesting application of ProPhyle would be to filter reads based on their k-mer hits. To make this more efficient and user-friendly, we should add a dedicated command for it.

Specification

Usage:   prophex filter [options] -k INT <index_prefix> <in1.fq> [in2.fq]

Options:
         -k INT   length of k-mer
         -m FLOAT keep only reads with proportion of k-mer hits >= FLOAT, in (0.0,1.0] [0.3]
         -n INT   keep only reads with number of k-mer hits >= INT (alternative to -m)
         -o STR   prefix of FASTQ file(s) for passing reads
         -f STR   prefix of FASTQ file(s) for filtered-out reads
         -u       use k-LCP for querying
         -b       print sequences and base qualities
         -l STR   log file name to output statistics
         -t INT   number of threads [1]
         -h       print help message
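The pass/fail rule implied by -m and -n can be sketched as follows. This is a minimal illustration, not ProPhex code: the function names, and the choice to fail reads that have no k-mers at all, are assumptions. It applies -n when given and otherwise -m, then maps the result to the C/U flag used in the first output column.

```python
from typing import Optional


def passes_filter(hits: int, kmers: int, min_prop: float = 0.3,
                  min_count: Optional[int] = None) -> bool:
    """Hypothetical pass/fail decision following the -m / -n semantics.

    min_prop mirrors -m (default 0.3) and min_count mirrors -n.
    When min_count is given, min_prop is ignored, as the spec states.
    """
    if min_count is not None:
        return hits >= min_count
    if not 0.0 < min_prop <= 1.0:
        raise ValueError("-m must lie in (0.0, 1.0]")
    if kmers == 0:
        # Assumption: a read shorter than k has no k-mers and cannot pass -m.
        return False
    return hits / kmers >= min_prop


def status_flag(passed: bool) -> str:
    """First output column: C (passes) or U (filtered out)."""
    return "C" if passed else "U"


print(status_flag(passes_filter(3, 10)))               # C
print(status_flag(passes_filter(2, 10)))               # U
print(status_flag(passes_filter(2, 10, min_count=2)))  # C
```

Giving -n strict precedence keeps the two thresholds from interacting, so a user never has to reason about both at once.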
  • If -n is used, -m is ignored.

  • If [in2.fq] is provided, ProPhex will create pref.1.fq and pref.2.fq for the -o and -f prefixes (pref.fq for single-end input). The thresholds are applied to the merged read pair, with the k-mers overlapping the N...N separator subtracted from the counts.

  • Output is in a Kraken-like format. The first column indicates whether the read passes the filter: C (passes) or U (filtered out).

  • When k-mer blocks are formed, X can be used for unclassified k-mers (similarly to A for ambiguous ones).
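The paired-end counting and block notes above can be illustrated with a small sketch. It is hypothetical: the separator length `sep_len`, the node label `n1`, and the helper names are assumptions rather than ProPhex internals. Only k-mers lying entirely within one mate are counted, which equals the merged read's k-mer count minus every k-mer overlapping the N...N separator. Per-k-mer labels are then run-length encoded into Kraken-like `label:count` blocks, with X for unclassified and A for ambiguous.

```python
from itertools import groupby
from typing import List, Tuple


def merge_pair(read1: str, read2: str, sep_len: int) -> str:
    """Join mates with an N...N separator (sep_len is a hypothetical parameter)."""
    return read1 + "N" * sep_len + read2


def usable_kmers(read1_len: int, read2_len: int, k: int) -> int:
    """K-mers lying entirely inside one mate, i.e. the merged read's k-mers
    minus every k-mer that overlaps the separator."""
    return max(0, read1_len - k + 1) + max(0, read2_len - k + 1)


def form_blocks(labels: List[str]) -> str:
    """Run-length encode per-k-mer labels into 'label:count' blocks."""
    return " ".join(f"{label}:{sum(1 for _ in run)}"
                    for label, run in groupby(labels))


def hit_stats(labels: List[str]) -> Tuple[int, int]:
    """Return (hits, k-mers); X (unclassified) is a non-hit, A (ambiguous) is a hit."""
    hits = sum(1 for label in labels if label != "X")
    return hits, len(labels)


read1, read2, k = "ACGTA", "TTGCA", 3
merged = merge_pair(read1, read2, sep_len=3)   # "ACGTANNNTTGCA"
n = usable_kmers(len(read1), len(read2), k)    # 3 + 3 = 6
labels = ["n1", "n1", "X", "A", "X", "X"]      # hypothetical per-k-mer assignments
print(form_blocks(labels))  # n1:2 X:1 A:1 X:2
print(hit_stats(labels))    # (3, 6)
```

With these numbers, -m 0.3 would pass the pair, since 3 of its 6 usable k-mers are hits.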

Example

prophex filter -k 13 -u -m 0.2 -o passed -f filtered index_prefix in1.fq > output.txt
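For the example above, with single-end input, the spec implies that `-o passed -f filtered` yields `passed.fq` and `filtered.fq`. The sketch below derives those names and tallies output.txt by its first column. It assumes the output columns are tab-separated, as in Kraken, and the helper names are hypothetical.

```python
import sys
from collections import Counter
from typing import Iterable, List


def output_paths(prefix: str, paired: bool) -> List[str]:
    """FASTQ file names created for an -o or -f prefix, per the spec."""
    if paired:
        return [f"{prefix}.1.fq", f"{prefix}.2.fq"]
    return [f"{prefix}.fq"]


def tally_status(lines: Iterable[str]) -> Counter:
    """Count reads per first-column flag (C = passes, U = filtered out).

    Assumes tab-separated columns, as in Kraken output.
    """
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts[line.split("\t", 1)[0]] += 1
    return counts


print(output_paths("passed", paired=False))    # ['passed.fq']
print(output_paths("filtered", paired=True))   # ['filtered.1.fq', 'filtered.2.fq']

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally.py output.txt
    with open(sys.argv[1]) as fh:
        counts = tally_status(fh)
    print(f"passed (C): {counts['C']}  filtered out (U): {counts['U']}")
```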
