EuPaGDT

Eukaryotic Pathogen CRISPR gRNA Design Tool

with (1) custom genome upload, (2) off-target analysis, (3) on-target searching (for targeting gene families), (4) efficiency/activity prediction, (5) assisted oligo repair template design, (6) gRNA transcription problem identification, and (7) flanking microhomology searching (for predicting deletions)




  For each gRNA, the target score reflects how well the gRNA can, in theory, hit its on-targets while avoiding off-targets. It is calculated as follows:

  The first term of the target score equation evaluates how well the current gRNA hits on-targets compared with the gRNA in the current input sequence that hits the most on-targets. The maximum on-target index is the largest "on-target index" among all gRNAs found in a given input sequence; it represents the best on-target number any gRNA in that sequence can achieve. For example, a single-locus, 2-allele gene has a maximum on-target index of 2, and a 5-copy gene has a maximum on-target index of 5. The first term therefore has a maximum value of 1.
  The second term evaluates how well the current gRNA avoids hitting off-targets. Briefly, if a gRNA has far more on-targets than off-targets, the second term is a small fraction deducted from the first term; if a gRNA has more off-targets than on-targets, the second term is >1, making the target score negative. The target score equation as a whole therefore works uniformly for single-locus genes as well as large gene families.
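The equation itself is not reproduced in this text, so the sketch below is only an illustration of the behaviour described above. It assumes the first term is on-target index / maximum on-target index and that the second term is off-target count / on-target index; that second form is an assumption chosen because it gives a small fraction when on-targets dominate and a value >1 when off-targets outnumber on-targets. The function name target_score is hypothetical.

```python
def target_score(on_target_index: int, max_on_target_index: int,
                 off_target_count: int) -> float:
    """Hypothetical sketch of a target score with the described behaviour.

    first term  = on-target index / maximum on-target index (at most 1)
    second term = off-target count / on-target index (ASSUMED form)
    """
    if on_target_index <= 0 or max_on_target_index <= 0:
        raise ValueError("on-target indexes must be positive")
    first_term = on_target_index / max_on_target_index
    second_term = off_target_count / on_target_index  # assumed penalty term
    return first_term - second_term


# 2-allele gene: a gRNA hitting both alleles with one off-target scores 0.5;
# a gRNA hitting one allele with three off-targets goes negative (-2.5).
print(target_score(2, 2, 1), target_score(1, 2, 3))
```

Under this assumption, a gRNA with no off-targets that hits every copy scores exactly 1, and any gRNA with more off-targets than on-targets scores below 0.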


  Targets are identified by aligning the 20nt target sequence plus the 3bp PAM motif to the genome. Up to 5 mismatches are allowed, provided none of them falls within the PAM motif.
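The rule above (up to 5 mismatches in the 20nt target sequence, none in the PAM) can be sketched as a brute-force scan. This is a minimal illustration, not EuPaGDT's aligner: it assumes an SpCas9-style NGG PAM where N matches any base, scans the forward strand only, and uses hypothetical helper names.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def pam_matches(site_pam: str, pam: str = "NGG") -> bool:
    """'N' in the PAM pattern matches any base; other bases must match exactly."""
    return all(p == "N" or p == s for s, p in zip(site_pam, pam))


def find_targets(genome: str, protospacer: str, pam: str = "NGG",
                 max_mismatches: int = 5) -> list[tuple[int, int]]:
    """Return (0-based start, mismatch count) for each candidate target site.

    Brute-force, forward strand only: a site qualifies if its PAM matches
    exactly and its protospacer has at most max_mismatches mismatches.
    """
    genome, protospacer = genome.upper(), protospacer.upper()
    n = len(protospacer)
    site_len = n + len(pam)
    hits = []
    for start in range(len(genome) - site_len + 1):
        site = genome[start:start + site_len]
        if not pam_matches(site[n:], pam):
            continue  # mismatches in the PAM are not tolerated
        mismatches = count_mismatches(site[:n], protospacer)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

A real genome-scale search would use an indexed aligner and both strands; the scan only makes the mismatch and PAM rules explicit.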


  On-targets are determined by regional homology (the region containing the gRNA) between the input sequence and the genome sequence. Briefly, the gRNA sequence plus 50bp homology arms up- and downstream (123bp in total) is taken from the input sequence and BLASTed against the genome; if a hit has >70% coverage and >70% identity relative to the 123bp query sequence, that genomic position is considered an on-target hit for the gRNA. (We are working to make the homology arm length and homology parameters adjustable, to accommodate sequence diversity among gene alleles and gene family members.) Our pilot runs show this method can reliably identify on-target hits for both single-locus genes and gene families. It can also reveal any previously unannotated duplicated copies of your input gene.
  A genomic hit position is automatically assigned as an off-target if it is not identified as an on-target for the gRNA. Please note that some off-targets may be misassigned when an on-target is missed, for example because of truncated gene copies or excessive sequence divergence. Users should refer to the detailed off-target information to examine off-targets manually.
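The on-target/off-target logic above can be sketched in two steps: cut the 123bp query (20nt + 3bp PAM + two 50bp arms) from the input sequence, then keep BLAST hits passing the >70% coverage and >70% identity thresholds and label each target site by whether it falls inside one of them. This is a hypothetical sketch: the field names follow BLAST+ tabular output (-outfmt 6: sseqid, qstart, qend, sstart, send, pident), coverage is assumed to be aligned query length / query length, and "falls inside" is simplified to the site position lying within the hit span.

```python
from dataclasses import dataclass

FLANK = 50           # homology arm length on each side (bp)
SITE_LEN = 23        # 20 nt target sequence + 3 bp PAM
MIN_COVERAGE = 0.70  # fraction of the 123 bp query covered by the hit
MIN_IDENTITY = 0.70  # fraction identity of the hit


def on_target_query(input_seq: str, site_start: int, flank: int = FLANK) -> str:
    """Return the gRNA site plus `flank` bp on each side (123 bp by default)."""
    start, end = site_start - flank, site_start + SITE_LEN + flank
    if start < 0 or end > len(input_seq):
        raise ValueError("gRNA site is too close to the end of the input sequence")
    return input_seq[start:end]


@dataclass
class BlastHit:
    """Hypothetical container mirroring BLAST+ tabular (-outfmt 6) fields."""
    sseqid: str    # chromosome/contig id
    sstart: int    # subject start (greater than send on the minus strand)
    send: int
    qstart: int    # 1-based query coordinates
    qend: int
    pident: float  # percent identity, 0-100


def is_on_target_hit(hit: BlastHit, query_len: int) -> bool:
    """Assumed rule: coverage and identity must both exceed 70%."""
    coverage = (hit.qend - hit.qstart + 1) / query_len
    return coverage > MIN_COVERAGE and hit.pident / 100 > MIN_IDENTITY


def classify_target(chrom: str, pos: int, on_target_hits: list[BlastHit]) -> str:
    """Label a genomic target site; `pos` must use the same coordinates as the hits."""
    for hit in on_target_hits:
        lo, hi = sorted((hit.sstart, hit.send))
        if hit.sseqid == chrom and lo <= pos <= hi:
            return "on-target"
    return "off-target"  # anything not explained by an on-target region
```

The last function makes the caveat above concrete: if a truncated or divergent copy fails the thresholds, its target sites fall through to "off-target".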


  Scoring microhomologies is experimental at this stage, because more information is needed on how parasites use microhomology to repair DSBs. EuPaGDT therefore assigns each gRNA a score on an arbitrary 0-1 scale reflecting the length of its microhomology pair(s) and their proximity to the gRNA-directed cut site. Briefly, the microhomology score is 1 for an ideal microhomology pair (>20 in length and immediately flanking the gRNA cut site); for non-ideal pairs, the score is calculated as follows:

Note: 24000 is the maximum value; dividing by it converts the score to a 0-1 scale.
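The scoring formula for non-ideal pairs is not reproduced in this text, so the sketch below covers only the search step: it lists pairs of identical sequences where one copy lies upstream and the other downstream of the cut site, reporting each pair's length and its gaps to the cut (both gaps are 0 for a pair immediately flanking the cut). The window size, minimum length, and function name are hypothetical; this is not EuPaGDT's implementation.

```python
def find_microhomology_pairs(seq: str, cut: int, window: int = 30,
                             min_len: int = 3) -> list[dict]:
    """List maximal microhomology pairs straddling the cut site.

    `cut` is the 0-based index of the first base after the cut. Pairs are
    sorted longest first, then by total distance from the cut.
    """
    seq = seq.upper()
    left_lo = max(0, cut - window)
    right_hi = min(len(seq), cut + window)
    pairs = []
    for i in range(left_lo, cut - min_len + 1):          # upstream copy start
        for j in range(cut, right_hi - min_len + 1):     # downstream copy start
            # Skip pairs contained in a longer pair that starts one base earlier.
            if i > left_lo and j > cut and seq[i - 1] == seq[j - 1]:
                continue
            length = 0
            while (i + length < cut and j + length < right_hi
                   and seq[i + length] == seq[j + length]):
                length += 1
            if length >= min_len:
                pairs.append({"length": length, "left_start": i, "right_start": j,
                              "left_gap": cut - (i + length), "right_gap": j - cut})
    pairs.sort(key=lambda p: (-p["length"], p["left_gap"] + p["right_gap"]))
    return pairs


# "ACGTAC" immediately flanks a cut between positions 9 and 10.
print(find_microhomology_pairs("TTTTACGTAC" + "ACGTACCCCC", cut=10)[0])
```

The length and the two gaps are the quantities the score is described as reflecting; how they are combined is left to the formula above.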


  gRNA efficiency/activity is scored by GC content and position-specific nucleotide composition, using a scoring matrix developed by Doench et al. 2014 that scores the 4bp upstream of the targeting sequence, the 20nt targeting sequence, the first base of the PAM motif, and the 3bp downstream of the PAM motif.
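The scored window above is a 30-mer: 4bp upstream, the 20nt targeting sequence, the NGG PAM (only its first base varies), and 3bp downstream. The sketch below shows how a position-specific matrix plus a GC-content term can be turned into a 0-1 score, assuming a logistic form. Every numeric weight here is a hypothetical placeholder, NOT the published Doench et al. 2014 coefficients, and the names are illustrative only.

```python
import math

# HYPOTHETICAL placeholder parameters for illustration only;
# these are NOT the coefficients published by Doench et al. 2014.
INTERCEPT = 0.6
GC_WEIGHT = -0.2  # applied per base that the GC count deviates from 10
# (0-based position in the 30-mer, nucleotide) -> weight
POSITION_WEIGHTS = {(20, "T"): -0.2, (23, "G"): 0.2, (24, "C"): 0.1}


def efficiency_score(thirty_mer: str) -> float:
    """Logistic score for a 30-mer laid out as:
    [0:4] upstream | [4:24] 20 nt targeting sequence | [24] PAM 'N' | [25:27] 'GG' | [27:30] downstream
    """
    s = thirty_mer.upper()
    if len(s) != 30 or s[25:27] != "GG":
        raise ValueError("expected 4 nt + 20 nt targeting sequence + NGG + 3 nt")
    x = INTERCEPT
    for (pos, nt), weight in POSITION_WEIGHTS.items():
        if s[pos] == nt:
            x += weight
    gc = sum(1 for c in s[4:24] if c in "GC")  # GC count of the targeting sequence
    x += abs(10 - gc) * GC_WEIGHT
    return 1.0 / (1.0 + math.exp(-x))
```

To use real coefficients, only the three constants would change; the 30-mer layout and the logistic transform stay the same.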


  Total score is an unweighted average of the target score, flanking microhomology score and efficiency/activity score.
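Because the total score is an unweighted average, each of the three component scores contributes equally. A one-line sketch (the function name is hypothetical):

```python
def total_score(target: float, microhomology: float, efficiency: float) -> float:
    """Unweighted mean of the three component scores."""
    return (target + microhomology + efficiency) / 3


print(total_score(0.5, 0.25, 0.75))  # 0.5
```

Since the target score can be negative, the total score can also fall below 0 for gRNAs with many off-targets.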