A novel representation of genomic sequences for taxonomic clustering and visualization by means of self-organizing maps.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: Self-organizing maps (SOMs) are readily available bioinformatics methods for clustering and visualizing high-dimensional data, provided that the biological information is first transformed into fixed-size, metric-based vectors. To increase the usefulness of SOM-based approaches for the analysis of genomic sequence data, novel representation methods are required that automatically and objectively transform aligned nucleotide sequences into numeric vectors, dealing with both nucleotide ambiguity and gaps derived from sequence alignment.
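
To make the requirement concrete, the following is a minimal sketch of one possible way to turn aligned nucleotide sequences into fixed-size numeric vectors. It is not the representation proposed in the paper, which this abstract does not specify. The function names (`encode_position`, `encode_sequence`, `euclidean`), the equal-weight treatment of IUPAC ambiguity codes, and the choice to encode gaps as all zeros are assumptions made purely for illustration.

```python
# Illustrative encoding only; NOT the representation proposed in the paper.
# Each alignment column becomes 4 numbers: weights for A, C, G, T.

BASES = "ACGT"

# Standard IUPAC nucleotide codes mapped to the bases they may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encode_position(symbol):
    """Return a 4-component vector (A, C, G, T) for one alignment column."""
    symbol = symbol.upper()
    if symbol in ("-", "."):
        # Assumption: an alignment gap carries no base information.
        return [0.0, 0.0, 0.0, 0.0]
    bases = IUPAC[symbol]  # raises KeyError for unknown symbols
    # Assumption: an ambiguous code spreads its weight equally.
    weight = 1.0 / len(bases)
    return [weight if b in bases else 0.0 for b in BASES]


def encode_sequence(aligned):
    """Concatenate per-column vectors; length is 4 * alignment length."""
    vector = []
    for symbol in aligned:
        vector.extend(encode_position(symbol))
    return vector


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


if __name__ == "__main__":
    # Two hypothetical aligned sequences of equal length.
    x = encode_sequence("ACGT-N")
    y = encode_sequence("ACRT-A")
    print(len(x), euclidean(x, y))
```

Because all sequences in an alignment share the same length, every vector produced this way has the same dimension, which is what a SOM needs in order to compare inputs with a metric such as Euclidean distance.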

Authors

  • Soledad Delgado
    Department of Information Structure and Organization, Universidad Politécnica (UPM), Madrid 28031, Department of Biochemistry and Molecular Biology I, Universidad Complutense (UCM), Madrid 28040, Department of Computer Architecture and Computer Technology, Universidad de Granada (UGR), Granada 18071, Spain, CITIC, Campanillas, Malaga 29590, Spain, Department of Molecular Evolution, Centro de Astrobiología (CSIC-INTA), Torrejón de Ardoz, Madrid 28850 and Centro de Investigación Biomédica en Red de enfermedades hepáticas y digestivas (CIBERehd), Barcelona 08036, Spain.
  • Federico Morán
    Department of Information Structure and Organization, Universidad Politécnica (UPM), Madrid 28031, Department of Biochemistry and Molecular Biology I, Universidad Complutense (UCM), Madrid 28040, Department of Computer Architecture and Computer Technology, Universidad de Granada (UGR), Granada 18071, Spain, CITIC, Campanillas, Malaga 29590, Spain, Department of Molecular Evolution, Centro de Astrobiología (CSIC-INTA), Torrejón de Ardoz, Madrid 28850 and Centro de Investigación Biomédica en Red de enfermedades hepáticas y digestivas (CIBERehd), Barcelona 08036, Spain.
  • Antonio Mora
    Department of Information Structure and Organization, Universidad Politécnica (UPM), Madrid 28031, Department of Biochemistry and Molecular Biology I, Universidad Complutense (UCM), Madrid 28040, Department of Computer Architecture and Computer Technology, Universidad de Granada (UGR), Granada 18071, Spain, CITIC, Campanillas, Malaga 29590, Spain, Department of Molecular Evolution, Centro de Astrobiología (CSIC-INTA), Torrejón de Ardoz, Madrid 28850 and Centro de Investigación Biomédica en Red de enfermedades hepáticas y digestivas (CIBERehd), Barcelona 08036, Spain.
  • Juan Julián Merelo
    Department of Information Structure and Organization, Universidad Politécnica (UPM), Madrid 28031, Department of Biochemistry and Molecular Biology I, Universidad Complutense (UCM), Madrid 28040, Department of Computer Architecture and Computer Technology, Universidad de Granada (UGR), Granada 18071, Spain, CITIC, Campanillas, Malaga 29590, Spain, Department of Molecular Evolution, Centro de Astrobiología (CSIC-INTA), Torrejón de Ardoz, Madrid 28850 and Centro de Investigación Biomédica en Red de enfermedades hepáticas y digestivas (CIBERehd), Barcelona 08036, Spain.
  • Carlos Briones
    Department of Information Structure and Organization, Universidad Politécnica (UPM), Madrid 28031, Department of Biochemistry and Molecular Biology I, Universidad Complutense (UCM), Madrid 28040, Department of Computer Architecture and Computer Technology, Universidad de Granada (UGR), Granada 18071, Spain, CITIC, Campanillas, Malaga 29590, Spain, Department of Molecular Evolution, Centro de Astrobiología (CSIC-INTA), Torrejón de Ardoz, Madrid 28850 and Centro de Investigación Biomédica en Red de enfermedades hepáticas y digestivas (CIBERehd), Barcelona 08036, Spain.