These downloads are provided to reproduce and extend the integration of sequence and gene expression data for predicting novel DNA-binding proteins (DBPs). The human data is mainly for reproducing the work being presented for publication (July 2017); the mouse and Arabidopsis data are provided so that the methods can be extended to other organisms.

The data contains directories and files as follows:

1. arabi/
   Arabidopsis data:

   All predicted binding sites for protein (PPI), RNA, ATP, DNA, etc.
       bs-preds-arabi.tar.gz
   Summary features generated from binding sites and amino acid composition, for use in DBP prediction.
       sequence-bs-features.txt
   Equal-frequency bin breaks for gene expression, based on rank values at a global scale.
       exp-bin-breaks-20.txt
   Gene expression profiles for all genes on the Arabidopsis expression array (GPL198 from GEO), distributed across the bins defined by the breaks above.
       expdata-histo-20bins.txt
   Equal-frequency bin definitions for co-expression (Pearson correlation) values, computed over the entire data set as above.
       coexp-bin-breaks-20.txt
   Co-expression values of each query gene with all other genes in the database.
       coexpdata-histos-20bins.txt
   Gene ontology distribution for the top co-expressed genes of every query gene.
       gene-cgomatrix.txt
   Gene ontology terms associated with each gene in GPL198.
       gene-gomatrix.txt
   Mapping of Affymetrix probe IDs to gene names.
       probe-gene-map.txt

2. mouse/
   Similar to arabi/, with GPL1261 as the expression platform.

3. human/
   Similar to the above, with GPL570 as the expression platform.

4. feature-processing-codes/
   Source code of the R program, and the data, used to extract features from the binding site data.
       code-data.tar.gz
   Details about using the tar archive.
       readme.txt

5. training-code/
   Source code of the R program used to train the model using random forest and linear regression.
       model-train-fsel-and-cv-sq2dbp.R
   Files required to run the above code.
       dbp-annotations-profile.txt (curated from various DBP annotations)
       feature-set-all.txt (produced by the code in (4))
   Other notes for training.
       readme.txt
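
Note on the 20-bin files: the idea behind the equal-frequency bin breaks and the per-gene histograms (the exp-bin-breaks-20.txt / expdata-histo-20bins.txt and coexp-bin-breaks-20.txt / coexpdata-histos-20bins.txt pairs) can be sketched as below. This is only an illustration and not the R code distributed here. It assumes that breaks are taken as quantiles of all values pooled together and that each gene is then summarised by how many of its values fall into each bin. The function names, the synthetic expression matrix and the nearest-rank quantile rule are all assumptions made for the example, and the actual file formats are not reproduced.

```python
import bisect
import math
import random


def equal_frequency_breaks(values, n_bins):
    """Return n_bins + 1 breaks so that each bin holds about the same number of values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    # Break i is the value at the i/n_bins quantile (nearest-rank rule, an assumption).
    return [ordered[round(i * last / n_bins)] for i in range(n_bins + 1)]


def histogram(profile, breaks):
    """Count how many values of one profile fall into each bin defined by breaks."""
    counts = [0] * (len(breaks) - 1)
    inner = breaks[1:-1]  # interior breaks pick the bin; out-of-range values go to the end bins
    for v in profile:
        counts[bisect.bisect_right(inner, v)] += 1
    return counts


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic stand-in for an expression matrix: 50 hypothetical genes x 30 samples.
rng = random.Random(0)
expr = {f"gene{g}": [rng.gauss(8, 2) for _ in range(30)] for g in range(50)}

# Expression: breaks come from all values pooled across genes ("global scale").
pooled = [v for profile in expr.values() for v in profile]
exp_breaks = equal_frequency_breaks(pooled, 20)
exp_histo = {g: histogram(p, exp_breaks) for g, p in expr.items()}

# Co-expression: correlate each query gene with all others, then bin the same way.
genes = list(expr)
coexp = {q: [pearson(expr[q], expr[o]) for o in genes if o != q] for q in genes}
pooled_r = [r for rs in coexp.values() for r in rs]
coexp_breaks = equal_frequency_breaks(pooled_r, 20)
coexp_histo = {q: histogram(rs, coexp_breaks) for q, rs in coexp.items()}

print(exp_histo["gene0"])    # 20 counts summing to the 30 samples
print(coexp_histo["gene0"])  # 20 counts summing to the 49 other genes
```

Because the breaks are computed once on the pooled values, histograms of different genes are directly comparable: a gene whose counts concentrate in the upper bins is highly expressed (or highly co-expressed) relative to the whole data set rather than relative to its own range.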