
Bioinformatics Core Facility
We aim to help scientists advance their research with cutting-edge bioinformatics methodologies, by providing data analysis services, consultation, and training.
About Us
Our main areas of expertise include the design and analysis of experiments that use genomic technologies (e.g. Next Generation Sequencing, mass spectrometry proteomics & metabolomics, NanoString technology and microarrays), as well as mining and re-analysis of publicly available datasets.
As a core facility, we are required to encompass an extremely wide range of research areas. So far, we have been involved in studies of human genetic disorders, infectious diseases, plant sciences, stem cells & development, applied agriculture, aquaculture, biomedicine & biotechnology, and more.
The Bioinformatics Core Facility was established in September 2003, with the aim of helping scientists advance their research with cutting-edge bioinformatics methodologies. The facility provides data analysis services, consultation and training to scientists at BGU and all over Israel. Our clients come from both academia and industry.
We continuously learn and expand our set of skills and capabilities, to cope with the ever-growing range of applications of genomic technologies. We see ourselves as applied bioinformaticians, using existing software tools as much as possible, and developing new software when necessary.
As the large majority of our projects are unique in their own way, there is rarely a standard analysis. For each project we carefully select and test a specific set of software and parameters. Often, projects arrive at our door that are unlike anything we have done before.
Throughout the work on a project, we smoothly navigate from complex biological questions to complex computational analyses, and then return to the biologist with data and knowledge which he/she can comprehend. We go hand-in-hand with the scientists in designing their genomic-scale experiments, in interpreting the results, and in exploring options for further research. Finally, we assist in writing grant proposals and take part in writing the bioinformatics sections of articles submitted for publication.
Among our customers are scientists from all Israeli universities, from government research institutes, from the Ministry of Health, and from a large number of companies.
Our Services
Data Analysis Software Development
Our unit has developed software that supports data analysis:
- NeatSeq-Flow: lightweight software for efficient construction and execution of high-throughput sequencing workflows
- Microbe-Flow: a specialized workflow for bacterial pathogenomics and genomic epidemiology
- mompS: a tool for calling the SBT allelic profile of Legionella pneumophila, overcoming multiple copies of mompS

General Bioinformatics Services
New technologies and new approaches for data analysis are continuously emerging, and accordingly the set of services offered by the Bioinformatics Core Facility is constantly expanding.
We offer general bioinformatics services such as:
- Mining and analysis of high-throughput data from public databases
- Bioinformatics programming
- Database and web development

Next Generation Sequencing Data Analysis
Next Generation Sequencing (NGS) has progressed enormously over the past ten years, transforming the biological sciences and opening up many new opportunities in basic, applied and clinical research. In some respects, the potential of NGS is akin to the early days of PCR, with one's imagination being the primary limitation to its use.
The major advance offered by NGS is the ability to produce an enormous volume of sequence data at an unprecedented speed and a constantly decreasing cost. This feature expands the realm of experimentation beyond just determining the order of bases (Metzker, Nature Reviews Genetics 11, 31-46, 2010), and has prompted the development of a rich catalog of NGS applications.
The Bioinformatics Core Facility has extensive experience in analyzing NGS data from both model and non-model organisms. We have the necessary hardware infrastructure, as well as cutting-edge commercial and publicly available software for high-throughput analysis of genomic data.
Among the NGS data analysis services we provide are:
- de novo sequence assembly and annotation of novel genomes and transcriptomes
- Gene expression profiling (RNA-Seq)
- Single cell RNA-Seq
- Protein-DNA interaction analysis (ChIP-Seq)
- Micro-RNA discovery and profiling (miRNA-Seq)
- SNP discovery and genotyping from exome sequencing, RAD-Seq, GBS and whole-genome sequencing
- Human, plant and livestock genetic analyses
- Microbial genomics
- Metagenomic studies of microbial populations ("microbiome") using 16S, 18S, ITS and shotgun metagenomic sequencing
- Comparative genomics
- Development of genomic markers
Microbial Genomics and Epidemiology
In recent years we have specialized in microbial genome analysis, of both whole genome sequencing (WGS) and metagenomic samples.
We have developed dedicated pipelines for a broad spectrum of bacterial analyses, aimed at basic research as well as clinical, agricultural and biotechnological applications.
Our facility is well equipped for simultaneous analysis of hundreds to thousands of bacterial samples, using the leading software in the field.
Our services include:
Whole Genome Sequencing (WGS)
Analysis of DNA sequence reads obtained from isolated bacteria:
- Quality assurance and de novo assembly of raw sequence reads
- Gene prediction and functional annotation of the assemblies
- in silico molecular typing at species and sub-species levels, including MLST and Core Genome MLST (cgMLST) analyses
- SNP calling and construction of SNP-based phylogenetic trees (the underlying pairwise distance computation is sketched after this list)
- Generation of Minimum Spanning Trees from SNP, MLST, cgMLST and concatenated core gene sequences
- Identification of virulence and antimicrobial resistance determinants
- Molecular epidemiology and outbreak investigation
- Comparative genomics and biomarker development
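To give a flavor of what underlies SNP-based trees and minimum spanning trees, the sketch below counts pairwise SNP differences in a tiny, made-up core-genome alignment. The isolate names and sequences are invented for illustration only; our actual analyses use dedicated variant-calling and tree-building software rather than this snippet.

```python
# Toy illustration only: pairwise SNP distances from a (made-up) core-genome
# alignment. Real projects use dedicated SNP-calling and phylogenetics tools.
import itertools

import numpy as np

alignment = {                      # isolate name -> aligned core-genome sequence
    "isolate_A": "ATGCGTACGTTA",
    "isolate_B": "ATGCGTACGTTG",
    "isolate_C": "ATGAGTACCTTG",
}

names = sorted(alignment)
dist = np.zeros((len(names), len(names)), dtype=int)

for (i, a), (j, b) in itertools.combinations(enumerate(names), 2):
    # Count unambiguous positions that differ between the two isolates.
    d = sum(
        1
        for x, y in zip(alignment[a], alignment[b])
        if x != y and x in "ACGT" and y in "ACGT"
    )
    dist[i, j] = dist[j, i] = d

print(names)
print(dist)  # a matrix like this feeds tree and minimum-spanning-tree construction
```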
Metagenomics - amplicon sequencing (16S and 18S rRNA, ITS etc.)
Analysis of sequence reads obtained from PCR amplification of marker genes from microbial communities / environmental samples:
- Quality assurance and preprocessing of the raw sequence reads
- Identification of the taxa present in environmental samples and their relative abundance (OTU picking and generation of a BIOM table)
- Assessing the degree of species diversity in each sample (alpha diversity analysis)
- Comparative analyses among samples (beta diversity analyses; alpha and beta diversity are illustrated in the sketch after this list)
- Statistical testing for differential abundance of individual taxa among the samples
- Advanced downstream analyses such as functional analyses, association networks, source tracking and more
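As a minimal illustration of the alpha and beta diversity metrics mentioned above, the following sketch computes the Shannon index per sample and pairwise Bray-Curtis dissimilarities on a made-up OTU count table. The sample names and counts are invented, and real projects are run with dedicated amplicon pipelines.

```python
# Hypothetical sketch: alpha (Shannon) and beta (Bray-Curtis) diversity from a
# small, made-up OTU count table.
import numpy as np
from scipy.spatial.distance import braycurtis

# Rows = samples, columns = OTUs (toy counts).
otu_counts = np.array([
    [120,  30,  0,  5],   # sample_1
    [ 80,  60, 10,  0],   # sample_2
    [  5, 100, 90, 40],   # sample_3
])
samples = ["sample_1", "sample_2", "sample_3"]

def shannon(counts):
    """Shannon diversity index H = -sum(p * ln p) over observed OTUs."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

for name, row in zip(samples, otu_counts):
    print(f"{name}: Shannon H = {shannon(row):.3f}")

# Pairwise Bray-Curtis dissimilarities between samples (beta diversity).
for i in range(len(samples)):
    for j in range(i + 1, len(samples)):
        d = braycurtis(otu_counts[i], otu_counts[j])
        print(f"{samples[i]} vs {samples[j]}: Bray-Curtis = {d:.3f}")
```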
Metagenomics - shotgun sequencing
Analysis of DNA (or RNA) sequence reads obtained from microbial communities / environmental samples:
- Quality assurance and preprocessing of the raw sequence reads
- Taxonomy, abundance and diversity analyses as described for amplicon sequencing
- Gene prediction and functional classification
- De novo assembly of individual genomes from the metagenomic samples, followed by gene prediction, functional annotation and molecular typing
- Identification of virulence and antimicrobial resistance determinants
Mass Spectrometry Proteomics Profiling

In the last decade, proteomics has emerged as a promising field for studying global gene expression profiles at the protein level. Developments in mass spectrometry-based technologies enable the identification of proteins and measurement of their abundance in biological samples. This approach, often termed "quantitative proteomics", provides a useful platform for the study of quantitative differences in protein abundance among conditions, tissues or cell types.
Initial preprocessing and analysis of the mass spec data is usually carried out at the proteomics laboratory, yielding an Excel file specifying the estimated abundance per protein per sample.
Subsequent statistical tests for comparing protein abundance among the biological treatments or states can then be performed at the Bioinformatics Core Facility, along with downstream analyses such as clustering and functional and pathway enrichment.
So far, we have analyzed numerous mass spec proteomics profiling datasets, using Partek® Genomics Suite, Perseus and in-house R programs.
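To illustrate the kind of per-protein testing involved, here is a minimal sketch; it is not our actual pipeline, and the protein names and abundance values are invented toy data.

```python
# Illustrative sketch only: per-protein two-group comparison of log2 abundances
# with a t-test, followed by Benjamini-Hochberg FDR correction.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

proteins = [f"protein_{i}" for i in range(1, 6)]
control = rng.normal(loc=20.0, scale=0.5, size=(len(proteins), 4))  # 4 control samples
treated = rng.normal(loc=20.5, scale=0.5, size=(len(proteins), 4))  # 4 treated samples

results = []
for name, ctrl_row, trt_row in zip(proteins, control, treated):
    _, p = stats.ttest_ind(trt_row, ctrl_row)
    results.append((name, trt_row.mean() - ctrl_row.mean(), p))

# Adjust for multiple testing across all tested proteins (Benjamini-Hochberg FDR).
reject, qvals, _, _ = multipletests([p for _, _, p in results], method="fdr_bh")

for (name, log2_fc, p), q, sig in zip(results, qvals, reject):
    print(f"{name}: log2FC = {log2_fc:+.2f}, p = {p:.3g}, FDR = {q:.3g}, significant = {sig}")
```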
miRNA Expression Profiling Using NanoString Technology
NanoString's nCounter miRNA Expression Assays enable direct digital detection and counting of up to 800 miRNAs in a single reaction without amplification. NanoString offers fixed-content panels for analysis of the miRNA transcriptomes of human, mouse and rat.
At the Bioinformatics Core Facility we offer analysis of nCounter data obtained from experiments that profile miRNA expression across different conditions, tissues or cell types. The analysis includes quality assurance and normalization of the miRNA count data, statistical tests for differential expression among the conditions, and clustering analyses.
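As a rough illustration of one common normalization idea for count-based nCounter data (scaling each sample by the geometric mean of its positive-control counts), here is a toy sketch. The counts are invented, and real analyses involve additional QC and normalization steps with dedicated tools.

```python
# Minimal, hypothetical sketch of positive-control-based scaling of count data.
import numpy as np

# Toy count matrices: rows = features, columns = samples.
positive_controls = np.array([
    [1200, 900, 1500],
    [ 640, 480,  800],
    [ 320, 250,  410],
], dtype=float)
mirna_counts = np.array([
    [ 55,  40,  70],
    [300, 210, 420],
    [ 12,   9,  20],
], dtype=float)

# Geometric mean of the positive controls in each sample.
geo_means = np.exp(np.log(positive_controls).mean(axis=0))

# Scaling factor: average geometric mean divided by each sample's geometric mean.
scale = geo_means.mean() / geo_means

normalized = mirna_counts * scale  # broadcasting scales each sample (column)
print("scaling factors:", np.round(scale, 3))
print(np.round(normalized, 1))
```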
DNA Microarray Analysis
DNA microarrays used to be the leading technology for genome-wide studies, until Next Generation Sequencing (NGS) approaches largely took their place. However, DNA microarrays are still used for certain applications, and scientists occasionally wish to re-analyze old microarray datasets.
Throughout the years, the Bioinformatics Core Facility has analyzed numerous microarray experiments of various platforms (Affymetrix, Illumina, Agilent, NimbleGen, custom spotted arrays etc.) and types. Our expertise covers various applications of DNA microarrays, including:
- Differential gene expression, at the exon, gene and pathway levels
- Identification of chromosomal aberrations (CNV, CGH, LOH)
- Identification of protein binding sites in DNA or RNA samples (e.g. ChIP on chip)
- Genetic studies using SNP and CNV arrays: linkage, association, cytogenetics, homozygosity & shared haplotype analyses, pharmacogenomics.
These analyses are complemented by advanced clustering analyses, as well as functional analyses, gene ontology and pathway enrichment, and overlay on protein-protein interaction networks.
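For example, a hierarchical clustering step of the kind used in such downstream analyses could be sketched as follows; the expression matrix here is random toy data, whereas actual projects start from normalized array data and use the facility's standard tools.

```python
# Hypothetical sketch: hierarchical clustering of genes by expression profile.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)

# Toy matrix: 20 genes x 6 samples of (already normalized) expression values.
expression = rng.normal(size=(20, 6))

# Average-linkage clustering on correlation distance (1 - Pearson r).
linkage_matrix = linkage(expression, method="average", metric="correlation")

# Cut the tree into 4 flat clusters.
clusters = fcluster(linkage_matrix, t=4, criterion="maxclust")
print("cluster assignment per gene:", clusters)
```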

Metabolic and Regulatory Pathways

The laborious process of performing and statistically analyzing genomic experiments typically culminates in a list of genes or proteins. However, deriving biological meaning from these lists is often possible only after overlaying them on known biological pathways.
A variety of bioinformatics tools are available for this task, providing sophisticated visualizations of the genes within the pathways, advanced pathway enrichment analyses, and integration of vast amounts of knowledge from pathway databases and the published literature. In addition, pathway data may be integrated with protein-protein, protein-DNA and protein-compound interaction data.
At the Bioinformatics Core Facility we have gained vast experience in downstream analysis of high-throughput genomic experiments using cutting-edge pathway databases and tools such as KEGG, STRING, BioCyc, Reactome, Cytoscape, GSEA and others. In addition, we provide consultation on the use of highly popular commercial software for pathway analysis, such as Ingenuity Pathway Analysis and GeneGo MetaCore. We will be happy to match the most suitable tool to any particular experiment, and to advise the user or analyze the data accordingly.
Our services include:
- Mapping IDs, names or sequences of provided molecules to corresponding entities in pathway databases
- Identifying pathways or reactions which are statistically over-represented in provided gene or protein lists (see the sketch after this list)
- Visually overlaying gene expression data on biological pathways and coloring the pathways according to either absolute or differential gene expression
- Visually overlaying gene expression data on protein-protein interaction networks
- Viewing and comparing equivalent pathways across species
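The over-representation test mentioned in the list above essentially boils down to a hypergeometric calculation. A toy example, with all numbers invented, is:

```python
# Toy illustration of pathway over-representation testing.
from scipy.stats import hypergeom

total_genes = 20000      # genes in the background ("universe")
pathway_genes = 150      # genes annotated to the pathway of interest
hits = 400               # genes in the user's differentially expressed list
overlap = 12             # list genes that fall inside the pathway

# P(X >= overlap) when drawing `hits` genes without replacement from a
# background that contains `pathway_genes` pathway members.
p_value = hypergeom.sf(overlap - 1, total_genes, pathway_genes, hits)
print(f"over-representation p-value = {p_value:.3g}")
```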
Biostatistics
The Bioinformatics Core Facility provides statistical and data mining services for the analysis of both basic science and health research studies, with a focus on the analysis of large-scale information. Our experience in genomics, transcriptomics and other high-throughput biology research fields gives us an advantage in analyzing studies that combine information from different domains, such as clinical and genomic datasets.
We offer the following biostatistics and machine learning services:
- Central tendency and dispersion analysis
- Exploratory data analysis and diagnostic plots
- Feature selection and dimensionality reduction (see the sketch after this list)
- Hypothesis tests: t-test, ANOVA, chi-square, Mann-Whitney, Kruskal-Wallis etc.
- Regression models: Linear regression, Logistic regression, GEE etc.
- Machine learning models: Decision trees, Artificial Neural Networks, Support Vector Machines etc.
- Clustering: K-means, SOM, hierarchical clustering, CLICK and more
- Enrichment analysis: hypergeometric test, chi-square test, GSEA etc.
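As a small, self-contained illustration of two of the approaches listed above (dimensionality reduction with PCA followed by K-means clustering), here is a sketch on randomly generated toy data; real analyses of course start from actual study data.

```python
# Toy data only: PCA for dimensionality reduction followed by K-means clustering.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# 60 samples x 10 features drawn from two shifted groups.
data = np.vstack([
    rng.normal(loc=0.0, size=(30, 10)),
    rng.normal(loc=2.0, size=(30, 10)),
])

scaled = StandardScaler().fit_transform(data)   # standardize the features
pca = PCA(n_components=2)
pcs = pca.fit_transform(scaled)                 # project onto two principal components
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pcs)
print("cluster sizes:", np.bincount(labels))
```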

NeatSeq-Flow: Lightweight Software for Efficient Construction and Execution of High-Throughput Sequencing Workflows
NeatSeq-Flow is a cross-platform, easy-to-install, lightweight Python package that manages the creation and execution of workflows on computer clusters. The software is primarily used for high-throughput sequencing data analysis, but it can accommodate other types of data analysis just as well.
Contact
Liron Levin: levinl@post.bgu.ac.il
NeatSeq-Flow receives a parameter file in YAML format and creates a hierarchy of shell scripts, which may then be executed by running a master script. This ensures full transparency, documentation and reproducibility of the workflow. Alternatively, the user is offered a friendly GUI which guides him/her through building the workflow and then executes it.
The user has full control over the execution process, and he/she may execute the workflow either entirely automatically, step-by-step or even sample-by-sample.
Adding modules to NeatSeq-Flow and creating new workflows is a straightforward process, and can be done by anyone with basic knowledge of Python, based on existing templates.
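The following is not NeatSeq-Flow code and does not use its real parameter-file schema; it is only a minimal, hypothetical sketch of the general idea described above, namely turning a YAML workflow definition into per-step shell scripts plus a master script. For the real parameter-file format, see the NeatSeq-Flow documentation linked below.

```python
# NOTE: hypothetical sketch only; NOT NeatSeq-Flow's actual schema or code.
from pathlib import Path

import yaml  # PyYAML

WORKFLOW_YAML = """
workflow_name: toy_rnaseq
steps:
  - name: fastqc
    command: "echo running FastQC on {sample}"
  - name: trim
    command: "echo trimming {sample}"
samples: [sample_A, sample_B]
"""

config = yaml.safe_load(WORKFLOW_YAML)
out_dir = Path("scripts") / config["workflow_name"]
out_dir.mkdir(parents=True, exist_ok=True)

step_scripts = []
for step in config["steps"]:
    script_path = out_dir / f"{step['name']}.sh"
    lines = ["#!/bin/bash", "set -e"]
    for sample in config["samples"]:
        lines.append(step["command"].format(sample=sample))
    script_path.write_text("\n".join(lines) + "\n")
    step_scripts.append(script_path)

# Master script that executes the step scripts in order (run from inside out_dir).
master = out_dir / "run_all.sh"
master.write_text(
    "#!/bin/bash\nset -e\n" + "\n".join(f"bash {p.name}" for p in step_scripts) + "\n"
)
print(f"Wrote {len(step_scripts)} step scripts and a master script to {out_dir}/")
```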
NeatSeq-Flow is routinely used in our Bioinformatics Core Facility in many types of analyses, such as DNA assembly and annotation, RNA-Seq, ChIP-Seq, variant detection, comparative genomics, microbial typing and phylogeny, metagenomics and more.
Read More
Article at bioRxiv
ISMB 2017 Poster at F1000
NeatSeq-Flow Documentation at Read the Docs
Learn how to use NeatSeq-Flow through a friendly Graphical User Interface (GUI) - NeatSeq-Flow GUI Tutorial
Download and Install
Download NeatSeq-Flow from GitHub
Download NeatSeq-Flow GUI from GitHub
Installation instructions: Read the Docs
Cite
Menachem Y. Sklarz, Liron Levin, Michal Gordon and Vered Chalifa-Caspi (2018) NeatSeq-Flow: A Lightweight High Throughput Sequencing Workflow Platform for Non-Programmers and Programmers Alike. bioRxiv doi: 10.1101/173005
NeatSeq-Flow workshops

Courses and Workshops
A major aim of the Bioinformatics Core Facility has always been to pass our knowledge on to scientists and students. We have gained years of experience in the art of simplifying and communicating bioinformatics knowledge to varied audiences.
Over the years we have developed numerous courses, workshops and single presentations, which we will be happy to offer our clients. Teaching modalities range from illustrating basic concepts to presenting the gory details, and from frontal lectures to hands-on tutorials and self-paced exercises.
We strive to give our listeners the tools and knowledge that will grant them the freedom to devise their own original approaches and implement their wildest ideas.
To request a course, a workshop or a lecture please contact us:
Vered Chalifa-Caspi, veredcc@bgu.ac.il
Liron Levin, levinl@post.bgu.ac.il
Microbe-Flow: A Specialized Workflow for Bacterial Pathogenomics and Genomic Epidemiology
Microbe-Flow is an easy-to-set-up, modular and flexible workflow for analyzing bacterial whole genome sequencing (WGS) data. It is based on the NeatSeq-Flow platform for modular workflow design and efficient execution on computer clusters.
Microbe-Flow uses a wide range of tools to enable a comprehensive overview of large numbers of samples. It offers an automated solution for WGS analysis starting from FASTQ files, including quality control (QC) of the raw reads, de novo assembly and assembly QC, gene prediction and annotation (including virulence and resistance genes), phylogenetic analyses and typing based on variant calling and core SNPs, multilocus sequence typing (MLST) and core genome MLST (cgMLST).
Notably, Microbe-Flow enables a range of studies, including pan-genome and association analyses for bacterial attributes of interest, and bi-clustering of virulence genes with enrichment and gene cluster co-localization tests. Conveniently, Microbe-Flow is not species-specific and can be adjusted to any bacterium, while species-specific steps can be added as needed.
Additionally, Microbe-Flow produces a detailed final report. Microbe-Flow was designed to work hand in hand with Conda environments, enabling easy installation of almost all the required programs and packages. Finally, Microbe-Flow is well documented and has step-by-step walkthrough instructions.
Documentation and installation
Contact
Liron Levin, levinl@post.bgu.ac.il
mompS: A tool for calling the SBT allelic profile of Legionella pneumophila, overcoming multiple copies of mompS
Whole genome sequencing (WGS) has revolutionized the subtyping of Legionella pneumophila, but calling the traditional sequence-based type (SBT) from genomic data is hampered by multiple copies of the mompS locus. We propose a novel bioinformatics solution that rectifies this limitation, ensuring the feasibility of WGS for cluster investigation.
We designed an approach based on the alignment of raw reads to a reference sequence. With WGS, reads originating from either of the two mompS copies cannot be differentiated. Therefore, when non-identical copies were present, we applied a read-filtering strategy based on read alignment to a reference sequence via unique 'anchors'. If a minimum read coverage was achieved after filtering, a consensus sequence was built from the mapped reads, followed by calling of the sequence-based typing allele.
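To convey the read-filtering idea in the abstract, here is a conceptual sketch; it is not the published Perl implementation, and the anchor and reads below are invented for illustration.

```python
# Conceptual sketch only: keep sequencing reads that contain a unique "anchor"
# sequence (on either strand) before building a consensus for the target locus.
def reverse_complement(seq):
    complement = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(complement[b] for b in reversed(seq))

def filter_reads_by_anchor(reads, anchor):
    """Keep reads containing the anchor on either strand."""
    anchor_rc = reverse_complement(anchor)
    return [r for r in reads if anchor in r or anchor_rc in r]

# Toy data: a made-up anchor and a handful of made-up reads.
anchor = "GGTTCAAC"
reads = [
    "ACGTGGTTCAACTTAGC",   # contains the anchor
    "TTTTGCCAGTACGATCG",   # does not
    "GCTAAGTTGAACCACGT",   # contains the reverse complement
]

kept = filter_reads_by_anchor(reads, anchor)
print(f"kept {len(kept)} of {len(reads)} reads")
```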
mompS is written in Perl and is available both as source code and through a web server.
Use mompS
mompS Web Server
Download mompS from GitHub
Cite mompS
M. Gordon, E. Yakunin, L. Valinsky, V. Chalifa-Caspi, J. Moran-Gilad, on behalf of the ESCMID Study Group for Legionella Infections (2017) A bioinformatics tool for ensuring the backwards compatibility of Legionella pneumophila typing in the genomic era. Clinical Microbiology and Infection 23: 306-310. PMID: 28082190
Contact
Michal Gordon, gordonmi@bgu.ac.il
Policies and Guidelines
If you are planning an experiment for which you would like analysis support, the best time to approach the Bioinformatics Core Facility is before you collect or prepare samples and prior to any data being generated. This ensures that good experimental design procedures are observed, that the experiment is sufficiently powered to answer your biological question of interest, and that any potential confounding factors have been addressed.
In an initial meeting, which is free of charge, we will discuss your experiment and requirements, and how we can assist. Before starting actual work (and if necessary, after review of available software and data) we will send you a price quote, at least for the first stages of the work. We will need written approval of the project PI before the work can commence.
Kickoff meetings to plan future experiments, short phone consultations and participation in writing the bioinformatics parts of papers and grant proposals are free of charge. All other bioinformatics and consulting support from the Bioinformatics Core Facility is charged.
| Item No. | Item description | No. of hours | Estimated time (net) | Fee (NIS) |
| --- | --- | --- | --- | --- |
| 200.1.5 | Hourly fee | 1 | An hour | 200 |
| 200.1.6 | Mini project | 2 - 8 | A day | 1,200 |
| 200.1.7 | Light project | 9 - 16 | Two days | 2,100 |
| 200.1.8 | Medium project | 17 - 40 | A week | 4,800 |
| 200.1.9 | Large project | 41 - 80 | Two weeks | 7,680 |
| 200.1.10 | Heavy project | 81 - 160 | A month | 12,800 |
The Bioinformatics Core Facility must charge for services rendered by its staff. However, charging for services does not preclude authorship on scientific publications. As is commonly accepted scientific practice, co-authorship is warranted whenever an individual has made a significant contribution to the scientific work being described. Such contributions include:
- Substantial contributions to the conception or design of the work, or to the acquisition, analysis or interpretation of data for the work
- Drafting the work or revising it critically for important intellectual content
- Final approval of the version to be published
- Agreement to be accountable for all aspects of the work, ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved
According to the Association of Biomolecular Resource Facilities (ABRF), the activities for which authorship is recommended are:
- The author should make substantive contributions to the project:
  - Conception or design of the project, critical input, or original ideas
  - Acquisition of data, and analysis and interpretation beyond routine practices
  - Drafting the article or revising it critically for intellectual content
  - Writing a portion of the paper (not just the materials and methods section)
  - Intellectual contribution
  - Final authority for approval of the article
- Each author should have participated enough to accept responsibility for the content of the manuscript
Please note that research products, including co-authorship on publications, are among the key factors taken into account when our staff scientists are considered for promotion and tenure at the University.
Principal investigators who are not present when members of their research groups use the Bioinformatics Core Facility's services are expected to discuss co-authorship with their group members, to sensitize them to the importance of this issue.
It is recommended to discuss scientific credit at the beginning of a project. If you are uncertain about co-authorship or have any questions or concerns about it, please discuss it with the head of the Bioinformatics Core Facility.
Our Staff - Contact Us

Dr. Liron Levin
Head, Bioinformatics Core Facility

Dr. Vered Chalifa-Caspi
Founder and former head, Bioinformatics Core Facility


Bioinformatician