Data sets for reproducing and extending

"Gentle Integration of Gene Expression And Sequence Attributes (GIGEASA)"

for organism-wise predictions of novel DNA-binding proteins (DBPs).



Readme file: outline of the data and code available on these pages.

Feature data for human proteins/genes

Final prediction scores from sequence and gene expression (GE) features for human, as analyzed in the manuscript

Feature data for Arabidopsis proteins/genes

Feature data for mouse proteins/genes

Common attribute data for different organisms

Data files and source code for processing sequence features (any organism)

Source code for computing sequence and expression-level features (portability issues such as local paths and dependencies may persist; this is still being worked out).

Source code and example data for training prediction models from sequence features (any organism)

A short "how-to" summary for quick reference when reimplementing these analyses and prediction models.

Single download of all the data in this directory (compressed file, about 1.0 GB).



Access the web server to compute binding-site-based features for a large data set from PSSM data