High-throughput data analysis

With the acquisition of a massively parallel sequencer (Illumina NextSeq 500), NanoString and microarray (Illumina BeadStation) platforms, and other high-throughput experimental technologies, The Wistar Institute's Cancer Center is well positioned to pursue new avenues of cancer research, particularly in understanding and modeling the genomic changes in cancer development and progression. These technologies generate multiple large, genome-wide data sets that require correspondingly sophisticated databases and analysis tools. The Bioinformatics Shared Resource collaborates with the Center for Systems and Computational Biology to develop integrative analytical frameworks for the data sets generated by Wistar investigators. Bioinformatics provides consulting and integrative data-mining support for:

  • Analyzing NGS data, including ChIP-seq, RNA-seq, small RNA-seq, whole genome and exome sequencing, SNP genotyping, genome re-sequencing, and de novo sequencing.
  • Analyzing microarray data, including gene expression, ChIP-chip, methylation profiling, copy number variation (CNV), SNP genotyping, miRNA profiling, and protein/peptide array data.
  • Analyzing proteomics data (e.g. mass spectrometry-based spectra, LC-MS, DIGE).
  • Analyzing molecular screening data in collaboration with the molecular screening facility.
  • Analyzing external data sets from public data repositories, including TCGA, GEO, ArrayExpress, and ENCODE.

Data analysis and consulting support

Bioinformatics works closely with Wistar Cancer Center investigators to help them use computational bioinformatics tools and methods for processing and interpreting genomic, molecular, and proteomic data. Bioinformatics staff also help investigators integrate data processing results into their reports and proposals. The facility uses publicly available tools and databases, together with software developed in-house, for analyses from raw data through functional analysis, and offers consultation and training in areas of bioinformatics such as:

  • Advice on experimental design and sample size estimation
  • Point and confidence interval estimation
  • Comparative data analysis, such as t-tests, ANOVA, SAM (Significance Analysis of Microarrays), and non-parametric tests
  • Association studies and contingency table analysis (e.g. chi-square test)
  • High-dimensional data analysis, such as repeated-measures analysis, dimension reduction (e.g. SVD, PCA, MDS), and permutation tests
  • Survival analysis, such as Kaplan-Meier estimation or Cox proportional hazards models
  • Time series data analysis
  • Statistical modeling, predictive modeling, and machine learning: data mining in multivariate settings (supervised and unsupervised learning, regression, classification, clustering, generalized linear models)
  • Sequence analysis, including assistance with annotating protein sequences and genes and with predicting gene regulatory regions such as promoters, transcription factor binding sites, and motifs
  • Gene Ontology and Pathway analysis
  • 3D molecular modeling, particularly homology modeling; analysis of protein structure properties such as electrostatic potential and surface area; protein-ligand docking; small-molecule screening; protein-protein interaction analysis; and molecular dynamics simulation
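
As a rough illustration of one technique named in the list above, the following minimal Python sketch runs a two-sided permutation test for a difference in group means. It is not the facility's software: the function name `permutation_test`, its parameters, and the sample values are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    Hypothetical helper for illustration only; a fixed seed makes the
    Monte Carlo estimate reproducible.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up measurements for two conditions (not real data).
treated = [5.1, 4.9, 5.3, 5.0, 5.2]
control = [3.9, 4.1, 4.0, 3.8, 4.2]
p_value = permutation_test(treated, control, n_permutations=2000)
print(f"estimated p-value: {p_value:.4f}")
```

The test makes no distributional assumption beyond exchangeability of the samples under the null hypothesis, which is one reason permutation approaches are often considered for small-sample, high-dimensional experiments.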

Support for data management and custom programming

Data management

Cancer Center Shared Resources and other research programs generate large volumes of high-dimensional data, such as microarray and sequencing data, tissue-related data, image data, and pharmacodynamics data. The Bioinformatics Shared Resource uses a combination of locally installed and public databases and provides consulting support to design and maintain databases for various data sets, securely share data within or across Cancer Centers, and store and back up data generated by users.

Custom programming

This support is provided for researchers who wish to use the software systems developed and deployed by the shared resource or to develop their own software or tools. The resource provides users with basic training to set up and use the existing software systems and to develop new tools and web applications. Consulting support is also provided to investigators who want to develop databases and workflows in their labs. The Bioinformatics staff analyze the data handling requirements of the investigator’s lab and help the investigator choose the best software solution for their studies.