From LipidomicsWiki
For the analysis of high-throughput data, it is important that the processing and analysis tools are set up as a pipeline, so that the output of one tool can be used as the input of the next with a minimum of manual intervention. Only then is it possible to handle large quantities of data in a practical amount of time while keeping handling errors to a minimum.
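As a minimal sketch of this idea, assuming each tool can be wrapped as a function that takes the previous tool's output, a pipeline is simply an ordered list of steps applied in turn. The step functions `pick_peaks` and `to_mass_list` and the intensity cutoff are hypothetical placeholders, not tools described on this page.

```python
from functools import reduce
from typing import Any, Callable, Sequence

Step = Callable[[Any], Any]  # one tool: takes the previous output, returns its own


def run_pipeline(steps: Sequence[Step], data: Any) -> Any:
    """Pass `data` through each step in order, with no manual hand-off in between."""
    return reduce(lambda result, step: step(result), steps, data)


# Hypothetical placeholder tools, not programs described on this page.
def pick_peaks(spectrum):
    # keep (m/z, intensity) pairs above an arbitrary intensity cutoff of 100
    return [(mz, i) for mz, i in spectrum if i > 100]


def to_mass_list(peaks):
    # reduce the peak list to the m/z values a database search would take as input
    return [mz for mz, _ in peaks]


spectrum = [(842.51, 350.0), (1045.56, 80.0), (2211.10, 120.0)]
print(run_pipeline([pick_peaks, to_mass_list], spectrum))  # [842.51, 2211.1]
```

Adding or swapping a tool then means editing the list of steps rather than moving files by hand.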
Protein Mass Spectrometry
MALDI-MS
For MALDI-MS, there are two major challenges when setting up a processing pipeline.
Firstly, the large number of individual data sets generated in a fairly short time frame can only be handled by passing the data sets automatically from one software package to the next.
Secondly, PFF (peptide fragment fingerprint) data are acquired much more effectively if the PMF (peptide mass fingerprint) data are processed first and the acquisition of PFF data sets is then initiated based on these results.
Both challenges have been addressed, and the ProteinScape database has been interfaced with the mass spectrometer. Automated processing scripts within the software keep the data flowing. Once the data are in the database, further scripting can be used for automation.
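The page does not say how the PMF results are used to start PFF acquisition. One plausible rule, sketched here purely as an assumption, is to skip spots that PMF has already identified with a confident score and, for the remaining spots, to select the most intense peaks not explained by the best PMF hit as MS/MS precursors. `PmfResult`, `plan_pff`, the score threshold of 70 and the limit of three precursors are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class PmfResult:
    """Hypothetical summary of one spot's PMF database search."""
    spot_id: str
    score: float  # score of the best PMF hit
    matched_mz: Set[float] = field(default_factory=set)  # peaks explained by that hit
    peaks: List[Tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def plan_pff(results: List[PmfResult], score_threshold: float = 70.0,
             max_precursors: int = 3) -> Dict[str, List[float]]:
    """Return {spot_id: [precursor m/z, ...]} for spots that still need MS/MS."""
    plan: Dict[str, List[float]] = {}
    for r in results:
        if r.score >= score_threshold:
            continue  # confident PMF identification: no PFF spectra requested
        # Rank the peaks the PMF hit did not explain by intensity, highest first.
        unexplained = [p for p in r.peaks if p[0] not in r.matched_mz]
        unexplained.sort(key=lambda p: p[1], reverse=True)
        plan[r.spot_id] = [mz for mz, _ in unexplained[:max_precursors]]
    return plan
```

Comparing m/z values by exact equality only works here because `matched_mz` is assumed to be taken from the same peak list; real peak matching would use a mass tolerance.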
LC-ESI-MS/MS
For LC-ESI-MS/MS, processing is considerably simpler, as all data are contained within one data set. The main challenge is therefore the automatic import of the data sets into the database. This is especially important here because processing these large data sets is very time-consuming. As data acquisition runs around the clock, automated import of the data sets is needed to make use of all of the available time for data processing.
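A minimal polling sketch of such an automatic import, assuming the instrument writes each finished run as a file into a single acquisition folder and that a file left unmodified for a few minutes is complete. The `.mzML` suffix, the five-minute age and the `import_fn` callback (standing in for whatever actually loads a data set into the database) are assumptions; this is not the ProteinScape import mechanism.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def find_finished(folder, seen: Set[Path], min_age_s: float = 300.0,
                  suffix: str = ".mzML", now: Optional[float] = None) -> List[Path]:
    """Return data sets in `folder` that are complete and not yet imported."""
    now = time.time() if now is None else now
    ready = []
    for path in sorted(Path(folder).glob("*" + suffix)):
        if path in seen:
            continue
        # Heuristic: a file untouched for min_age_s seconds is assumed to be finished.
        if now - path.stat().st_mtime >= min_age_s:
            ready.append(path)
            seen.add(path)
    return ready


def watch(folder, import_fn: Callable[[Path], None], poll_s: float = 60.0) -> None:
    """Poll `folder` indefinitely and hand each finished data set to `import_fn`."""
    seen: Set[Path] = set()
    while True:
        for path in find_finished(folder, seen):
            import_fn(path)  # hypothetical hook that performs the database import
        time.sleep(poll_s)
```

Checking the modification time avoids importing a run that is still being written; if the acquisition software writes a completion marker file, checking for that marker would be more reliable.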
Further data processing
The further processing of the resulting protein identifications is, by nature, very diverse. Interfacing with the subsequent data analysis steps is made easier by the ability to export almost any data view to Microsoft Excel.
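When a downstream step needs the data outside the database, a plain CSV file is one format that Excel opens directly. A minimal sketch using Python's standard `csv` module, assuming each row of the view is a dictionary; the column names are hypothetical, and this is not ProteinScape's own export function.

```python
import csv
from typing import Iterable, Mapping, Sequence, TextIO


def export_view(rows: Iterable[Mapping], columns: Sequence[str], stream: TextIO) -> None:
    """Write the selected columns of a data view as CSV, one row per identification."""
    # extrasaction="ignore" silently drops fields that are not listed in `columns`
    writer = csv.DictWriter(stream, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
```

When writing to disk, opening the file with `newline=""` avoids blank lines between rows on Windows, and `encoding="utf-8-sig"` adds a byte-order mark that helps Excel recognize UTF-8 text.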
Protein Chips
The programs and tools described under D7.3.3 have been designed and set up as such a pipeline, as shown in the following figure. It shows the data flow from the biological sample to the final bioinformatic output. Programs and tools are shown in blue; input and output data are shown in black.
In a classification study, dozens or hundreds of serum samples are processed, yielding as many GPR files. These files are, however, organized in a well-defined folder hierarchy that is searched automatically by the programs chipQM and mergeGPR. The output of mergeGPR is then used directly by DTREG for classification and by selectFeatures for antigen ranking. Further analysis tools that will make use of the output of mergeGPR are under construction and will also be integrated into the processing pipeline.
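The internals of chipQM and mergeGPR are not described here. As a rough, hypothetical stand-in for the merging step, the sketch below walks a folder tree, reads every GenePix Results (`.gpr`) file, averages replicate spots per antigen `ID` and builds a samples × antigens matrix that a classifier could consume. It assumes the standard ATF layout of GPR files (an `ATF` line, a line giving the number of optional header records, those records, then a tab-separated column header row) and uses the `F635 Median` column as the spot value; the column choice, naming samples by relative path and using NaN for missing antigens are all assumptions.

```python
import csv
from collections import defaultdict
from pathlib import Path
from statistics import mean
from typing import Dict, List, Tuple


def read_gpr(path: Path, value_col: str = "F635 Median") -> Dict[str, float]:
    """Return {antigen ID: mean value over replicate spots} for one GPR file."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        next(reader)                     # "ATF", "1.0"
        n_header = int(next(reader)[0])  # number of optional header records
        for _ in range(n_header):
            next(reader)                 # e.g. "Type=GenePix Results 3"
        columns = next(reader)
        id_idx, val_idx = columns.index("ID"), columns.index(value_col)
        values = defaultdict(list)
        for row in reader:
            if row:  # skip blank lines
                values[row[id_idx]].append(float(row[val_idx]))
    return {antigen: mean(v) for antigen, v in values.items()}


def merge_gprs(root, value_col: str = "F635 Median") -> Tuple[List[str], List[str], List[List[float]]]:
    """Walk `root`, read every *.gpr file and return (samples, antigen IDs, matrix)."""
    per_sample: Dict[str, Dict[str, float]] = {}
    for path in sorted(Path(root).rglob("*.gpr")):
        # Hypothetical naming: sample = path relative to root, without the suffix.
        sample = path.relative_to(root).with_suffix("").as_posix()
        per_sample[sample] = read_gpr(path, value_col)
    features = sorted(set().union(*per_sample.values()))
    samples = list(per_sample)
    matrix = [[per_sample[s].get(f, float("nan")) for f in features] for s in samples]
    return samples, features, matrix
```

In practice a background-corrected value (for example `F635 Median` minus `B635 Median`) may be preferable; that would need a small change to `read_gpr`.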
Proteomics SPPs
