From LipidomicsWiki
Standard Processing Procedures, an Introduction
The definition of Standard Processing Procedures (SPPs) is one of the deliverables of the bioinformatics workpackage (and thus of the bioinformatics task force) of LipidomicNet.
An SPP should typically specify its inputs, its outputs, and the actual processing that takes place.
This requires close coordination with the Task Forces that deliver the input data, as well as with the project participants who will use the resulting output. Domain experts and bioinformaticians can then define the processing steps needed to obtain the desired results from the given input data.
Another important aspect of SPPs is that they rely on underlying standards for data formatting and syntax. This mainly affects the integration of results derived from data obtained in different domains, and may also facilitate the reuse of certain components across domains.
Data Format Standards to Aid Integration
Gene names and identifiers
It was agreed at the First Bioinformatics Workshop of the LipidomicNet project that all genes should be referred to by at least their Ensembl Gene ID (for human genes, these take the form 'ENSG###', with '###' a number; for mouse genes, they take the form 'ENSMUSG###'). The main advantages of using Ensembl Gene IDs are that Ensembl provides consistent coverage of the genome, transcriptome and proteome within a single database, and that Ensembl is extremely well connected to external resources. As a result, most identifiers commonly used in the various domains can be mapped quickly and reliably to Ensembl Gene IDs, for instance with the Ensembl BioMart. Furthermore, Ensembl provides mappings between organisms, which directly supports the translation of mouse-derived results into potential human targets. For more information on Ensembl and BioMart, see the EMBL-EBI presentations and tutorial material among the presentations of the Bioinformatics Workshop, or email Lennart Martens.
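As an illustration of the naming convention above, the following is a minimal sketch of a check that could be run on a gene column before data is submitted. It only tests the 'ENSG' and 'ENSMUSG' prefixes described here; the function names are hypothetical, and matching the numeric part loosely as "one or more digits" and tolerating a '.N' version suffix are assumptions about typical input files, not part of the agreed standard. Identifiers that fail the check would still need to be mapped, for example via BioMart.

```python
import re
from typing import List, Optional, Tuple

# Prefixes as agreed: 'ENSG' for human genes, 'ENSMUSG' for mouse genes.
# Assumption: the numeric part is matched loosely as one or more digits, and an
# optional '.N' version suffix (common in exported files) is tolerated.
_GENE_ID = re.compile(r"^(ENSMUSG|ENSG)(\d+)(?:\.\d+)?$")
_SPECIES = {"ENSG": "human", "ENSMUSG": "mouse"}


def classify_gene_id(identifier: str) -> Optional[str]:
    """Return 'human' or 'mouse' for an Ensembl Gene ID, or None otherwise (hypothetical helper)."""
    match = _GENE_ID.match(identifier.strip())
    if match is None:
        return None
    return _SPECIES[match.group(1)]


def check_gene_column(identifiers: List[str]) -> Tuple[List[str], List[str]]:
    """Split identifiers into accepted Ensembl Gene IDs and ones that still need mapping."""
    accepted: List[str] = []
    needs_mapping: List[str] = []
    for ident in identifiers:
        if classify_gene_id(ident) is not None:
            accepted.append(ident)
        else:
            needs_mapping.append(ident)
    return accepted, needs_mapping


if __name__ == "__main__":
    ok, todo = check_gene_column(["ENSG00000139618", "ENSMUSG00000041147", "ABCA1"])
    print("accepted:", ok)
    print("needs mapping:", todo)
```

Here a gene symbol such as 'ABCA1' ends up in the "needs mapping" list, which is exactly the kind of identifier the BioMart mapping step is meant to resolve.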
Sample Annotation and General Metadata
It was also agreed at the First Bioinformatics Workshop of the LipidomicNet project that existing controlled vocabularies or ontologies should be used to annotate samples and experiment metadata. Various specialized, widely used and freely available ontologies exist, quite a few of which can be browsed online at the Ontology Lookup Service (OLS). For an example of how such annotations can be used to browse assembled data, see the bottom half of the 'Browse Experiments...' page of the Proteomics Identifications Database (PRIDE) at EBI. There, the stored proteomics data can be browsed by species, tissue, cell type, Gene Ontology annotation or disease state, and the page is generated automatically from the actual data in the database. Finally, clicking a term such as brain in the tissue section retrieves not only data annotated as brain, but also data annotated as cerebral cortex, because the ontology used indicates that cerebral cortex is part of brain. Annotating data with an ontology thus makes powerful data retrieval simple and intuitive.
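The brain/cerebral cortex behaviour described above amounts to expanding a query term with every term that is (transitively) part of it. The sketch below shows that idea on a tiny, hypothetical ontology and set of experiments; the terms other than brain and cerebral cortex, the experiment IDs, and the function names are illustrative assumptions, and this is not how PRIDE or the OLS are actually implemented.

```python
from collections import defaultdict, deque

# Hypothetical toy ontology: each (child, parent) pair means "child is part of parent".
PART_OF = [
    ("cerebral cortex", "brain"),
    ("prefrontal cortex", "cerebral cortex"),
]

# Hypothetical experiments, each annotated with a single tissue term.
EXPERIMENTS = {
    "EXP-1": "brain",
    "EXP-2": "cerebral cortex",
    "EXP-3": "prefrontal cortex",
    "EXP-4": "liver",
}


def build_children(pairs):
    """Index the part-of relation from parent term to its direct children."""
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)
    return children


def expand_term(term, children):
    """Return the term plus every term transitively 'part of' it (breadth-first)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def experiments_for(term, experiments, pairs):
    """Retrieve experiments annotated with the term or any of its parts."""
    terms = expand_term(term, build_children(pairs))
    return sorted(exp for exp, tissue in experiments.items() if tissue in terms)


if __name__ == "__main__":
    print(experiments_for("brain", EXPERIMENTS, PART_OF))
```

Querying for brain returns EXP-1, EXP-2 and EXP-3, while querying for cerebral cortex returns only EXP-2 and EXP-3: the annotator only records the most specific term, and the ontology supplies the broader matches at query time.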
Standard Processing Procedures
SPPs by Platform
Lipidomics and Metabolomics SPPs
Microarray SPPs
RT-PCR SPPs
Mass Spectrometry Proteomics and Protein Chip SPPs
Fluorescence Microscopy (High-Content Imaging) SPPs
TaqMan SPPs
Roche SPPs
