Skip to content

Different counts: Full genome vs split genome #109

@ReemaSingh

Description

@ReemaSingh

Hi,

I wonder if anyone in this community can help me figure out the reason behind the discrepancy in counts. Please see below:

I am using HTSeq-count to generate counts from two different types of alignments. These alignments were generated using two different types of genome indexes built from the same genome using the STAR aligner. See below the detailed process I have used to generate the counts:

Genome used: Mus_musculus.GRCm39.dna_sm.toplevel.fa

Scenario 1: Using full genome to generate genome index and counts:

  1.   Here, I have generated a genome index for the entire genome (i.e., Mus_musculus.GRCm39.dna_sm.toplevel.fa)
    
  2.   Then aligned my fastq (e.g., Sample1.fq.gz) file
    
  3.   Generates the counts
    

Scenario 2: Split the genomes into separate chromosomes.

  1.  ./faSplit byname Mus_musculus.GRCm39.dna_sm.toplevel.fa Genome/ - This generates 61 separate fasta sequences
    
  2.  Then, I generated 61 genome indexes for 61 fasta sequences
    
  3.  Aligned fastq (e.g., Sample1.fq.gz) file separately to the 61 genome indexes. This generates 61 different alignments
    
  4.  Combined 61 bam files into one bam file.
    
  5.  Generated the counts
    

Ideally, the final counts generated from both the above scenarios should be the same. But, in my case, the counts generated from Scenario 2 are much higher than Scenario 1 in a lot of IDs.

I have also tried feature-count (from the Rsubread Bioconductor package) to generate counts from the above two scenarios but again the counts are different.

I would highly appreciate any response.

Many Thanks,
Reema,

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions