Description
An interesting application of prophyle would be to filter reads based on k-mer hits. To make this more efficient and user-friendly, we should add a separate command for it.
Specification
Usage: prophex filter [options] -k INT <index_prefix> <in1.fq> [in2.fq]
Options:
  -k INT    length of k-mer
  -m FLOAT  keep only reads with proportion of kmers >= FLOAT (0.0,1.0] [0.3]
  -n INT    keep only reads with number of kmers >= INT (alternative to -m)
  -o        prefix for fastq for passing reads
  -f        prefix for fastq for filtered reads
  -u        use k-LCP for querying
  -b        print sequences and base qualities
  -l STR    log file name to output statistics
  -t INT    number of threads [1]
  -h        print help message
- If -n is used, -m is ignored.
- If [in2.fq] is provided, ProPhex will create pref.1.fq and pref.2.fq when the -o or -f options are used (pref.fq otherwise). The thresholds are applied to the merged read (while subtracting the N...N separator from the counts).
- Output is in the Kraken-like format. The first column encodes whether the read passes: C (passes) / U (filtered out).
- When k-mer blocks are formed, X can be used for unclassified (similarly to A = ambiguous).
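The threshold rules above could be sketched roughly as follows. This is only an illustration of the proposed semantics, not ProPhex code: the function names (countable_kmers, passes_filter) are hypothetical, and the sketch assumes that "subtracting the N...N separator" means counting k-mers in each mate separately, so that no k-mer overlapping the separator contributes to the total.

```python
from typing import Optional


def countable_kmers(len1: int, k: int, len2: int = 0) -> int:
    """Number of k-mers counted for a read (or a merged read pair).

    Assumption: for a pair, k-mers are counted per mate, so k-mers that
    would span the N...N separator of the merged read are excluded.
    """
    total = max(len1 - k + 1, 0)
    if len2 > 0:
        total += max(len2 - k + 1, 0)
    return total


def passes_filter(kmer_hits: int, total_kmers: int,
                  min_proportion: float = 0.3,
                  min_count: Optional[int] = None) -> bool:
    """Decide whether a read is kept (C) or filtered out (U).

    min_count mirrors -n and min_proportion mirrors -m;
    when -n is given, -m is ignored.
    """
    if min_count is not None:
        return kmer_hits >= min_count
    if total_kmers <= 0:
        # A read shorter than k has no k-mers and cannot reach the threshold.
        return False
    return kmer_hits / total_kmers >= min_proportion
```

For example, with k = 13 a single 20 bp read has 8 countable k-mers, so under the default -m 0.3 it would need at least 3 matching k-mers to pass.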
Example
prophex filter -k 13 -u -m 0.2 -o passed -f filtered index_prefix in1.fq > output.txt
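A file such as output.txt could then be summarized with a few lines of Python. This is a minimal sketch that relies only on the first column being C or U as specified above; it assumes tab-separated columns, as in Kraken output, and the function name summarize_filter_output is hypothetical.

```python
from collections import Counter
from typing import Iterable


def summarize_filter_output(lines: Iterable[str]) -> Counter:
    """Count records by the value of their first column (C or U).

    Assumption: columns are tab-separated; only the first one is read.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        status = line.split("\t", 1)[0]
        counts[status] += 1
    return counts
```

Passing an open file handle for output.txt to summarize_filter_output would give the number of passing (C) and filtered (U) reads.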