Data processing


EuphausiiDB data are processed with the assembly and annotation workflows, whose scripts are available in the euphausiiDB-bioanalysis directory.

The transcriptome assembly workflow:

It includes 5 distinct steps:
  1. Quality evaluation of raw data with FastQC;
  2. Raw data processing with Trimmomatic to filter and trim reads according to their sequence quality;
  3. Quality evaluation of cleaned data with FastQC;
  4. De novo assembly using rnaSPAdes / Trinity;
  5. Quality evaluation of the assembled transcripts using BUSCO and Salmon.
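The five steps above can be sketched as a list of command lines. This is only an illustration, not the contents of the euphausiiDB-bioanalysis scripts: the function name, file names, output layout, adapter file, trimming thresholds and the BUSCO lineage (arthropoda_odb10) are all assumptions, and only the rnaSPAdes variant of step 4 is shown. The commands are built and printed, not executed.

```python
import shlex
from pathlib import Path


def assembly_commands(r1, r2, outdir, threads=8, adapters="adapters.fa"):
    """Build the assembly steps as (step name, argv) pairs without running them."""
    out = Path(outdir)
    t = str(threads)
    trimmed = out / "trimmed"  # hypothetical layout; fastqc/trimmomatic output dirs must already exist
    p1, u1 = str(trimmed / "R1.paired.fq.gz"), str(trimmed / "R1.unpaired.fq.gz")
    p2, u2 = str(trimmed / "R2.paired.fq.gz"), str(trimmed / "R2.unpaired.fq.gz")
    asm_dir = out / "rnaspades"
    # rnaSPAdes writes its assembled transcripts to transcripts.fasta in its output directory
    transcripts = str(asm_dir / "transcripts.fasta")
    index = str(out / "salmon_index")
    return [
        # 1. quality evaluation of raw reads
        ("fastqc_raw", ["fastqc", "-t", t, "-o", str(out / "fastqc_raw"), r1, r2]),
        # 2. adapter clipping and quality trimming (assumed thresholds; "trimmomatic" wrapper name assumed)
        ("trimmomatic", ["trimmomatic", "PE", "-threads", t, r1, r2, p1, u1, p2, u2,
                         f"ILLUMINACLIP:{adapters}:2:30:10", "SLIDINGWINDOW:4:15", "MINLEN:36"]),
        # 3. quality evaluation of cleaned reads
        ("fastqc_clean", ["fastqc", "-t", t, "-o", str(out / "fastqc_clean"), p1, p2]),
        # 4. de novo assembly (Trinity would be the alternative assembler)
        ("rnaspades", ["rnaspades.py", "-1", p1, "-2", p2, "-t", t, "-o", str(asm_dir)]),
        # 5a. completeness of the assembly (lineage dataset is an assumption)
        ("busco", ["busco", "-i", transcripts, "-l", "arthropoda_odb10", "-m", "transcriptome",
                   "-o", "busco_assembly", "--out_path", str(out), "-c", t]),
        # 5b. read support for the assembled transcripts
        ("salmon_index", ["salmon", "index", "-t", transcripts, "-i", index]),
        ("salmon_quant", ["salmon", "quant", "-i", index, "-l", "A", "-1", p1, "-2", p2,
                          "-p", t, "-o", str(out / "salmon_quant")]),
    ]


if __name__ == "__main__":
    for step, argv in assembly_commands("sample_R1.fq.gz", "sample_R2.fq.gz", "results"):
        print(f"# {step}\n{shlex.join(argv)}")
```

Keeping each step as an argument list makes it easy to inspect the commands before passing them to subprocess.run or a workflow manager.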

GitHub repository:

The annotation workflow:

Downstream analyses of assemblies include:
  1. Transcriptome completeness evaluation using BUSCO;
  2. Prediction of ribosomal RNA gene locations in transcripts;
  3. Protein and translated DNA searches against the UniRef90 and UniProt/Swiss-Prot databases using the DIAMOND sequence aligner;
  4. Prediction of coding regions using TransDecoder;
  5. Search for homologous RNAs with cmsearch, which uses a covariance model (CM) to scan the transcript sequences and outputs high-scoring alignments;
  6. Functional annotation of predicted proteins using the InterProScan pipeline from EMBL-EBI.
  Note: as of assembly 018, no KEGG terms are associated with the annotations, because InterProScan version 5.59-91.0 does not include them.
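The annotation steps above can be sketched the same way. Again this is a hedged illustration, not the project's actual scripts: the function name, database and covariance-model paths, e-value cutoff, BUSCO lineage and output names are assumptions. The rRNA prediction step is left out because the tool is not named here. The peptide file name assumes TransDecoder's default output naming, and it may need its stop symbols ("*") removed before InterProScan accepts it.

```python
import shlex
from pathlib import Path


def annotation_commands(transcripts, outdir, diamond_db, cm_file, threads=8):
    """Build the annotation steps as (step name, argv) pairs without running them."""
    out = Path(outdir)
    t = str(threads)
    # assumed default: TransDecoder.Predict writes <input name>.transdecoder.pep in the working directory
    pep = Path(transcripts).name + ".transdecoder.pep"
    return [
        # 1. completeness of the transcriptome (lineage dataset is an assumption)
        ("busco", ["busco", "-i", transcripts, "-l", "arthropoda_odb10", "-m", "transcriptome",
                   "-o", "busco_annotation", "--out_path", str(out), "-c", t]),
        # 3. translated search of transcripts against a protein database (UniRef90 or Swiss-Prot)
        ("diamond", ["diamond", "blastx", "--db", diamond_db, "--query", transcripts,
                     "--out", str(out / "diamond.tsv"), "--outfmt", "6",
                     "--evalue", "1e-5", "--max-target-seqs", "1", "--threads", t]),
        # 4. coding region prediction: extract long ORFs, then predict likely coding ones
        ("transdecoder_longorfs", ["TransDecoder.LongOrfs", "-t", transcripts]),
        ("transdecoder_predict", ["TransDecoder.Predict", "-t", transcripts]),
        # 5. covariance-model search for homologous RNAs (cm_file could be, e.g., an Rfam CM file)
        ("cmsearch", ["cmsearch", "--cpu", t, "--tblout", str(out / "cmsearch.tbl"),
                      cm_file, transcripts]),
        # 6. functional annotation of predicted proteins, requesting GO terms and pathways;
        #    which pathway databases are reported depends on the InterProScan release
        ("interproscan", ["interproscan.sh", "-i", pep, "-f", "tsv",
                          "-o", str(out / "interproscan.tsv"), "-goterms", "-pa", "-cpu", t]),
    ]


if __name__ == "__main__":
    steps = annotation_commands("transcripts.fasta", "annotation", "uniref90.dmnd", "Rfam.cm")
    for step, argv in steps:
        print(f"# {step}\n{shlex.join(argv)}")
```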

GitHub repository: