Data Mining Competition 2008
Department of Statistics & Actuarial Science
University of Central Florida
Announcement
The Data Mining program at the University of Central Florida (UCF) is announcing a data mining competition on marketing response analysis in collaboration with BlueCross BlueShield of Florida (BCBSFL). The purpose of this project is to develop a predictive model that can generate a list of potential responders for a future promotional mailing campaign. The response (target) variable is binary (0-1), with a value of 1 indicating a response in the previous mail campaign. Most of the explanatory variables (inputs) used in this study come from census data, and the rest come from a list data vendor. We have renamed all input variables as X1, X2, ... for data security and privacy reasons.
The datasets are available in two formats: SAS and comma-separated values (CSV). After registering, please select whichever format is more convenient for you.
| Format | Training | Test | Answer |
|---|---|---|---|
| SAS | training.sas7bdat (392.53 MB) | test.sas7bdat (43.89 MB) | answer.sas7bdat |
| CSV | training.csv (257.00 MB) | test.csv (28.55 MB) | answer.csv |
This competition is open to anyone interested. Please review the following rules carefully and contact us with any questions at data.mining.2008@gmail.com.
- Please build your model using the training data set and use it to obtain a predicted probability of response for each individual in the test sample. Two deliverables must be submitted by 5:00 pm (Eastern Time) on 3/31/2008 in order to participate in the contest.
- A data set with two columns: one containing the ID and the other containing your predicted probability of response (not a 0-1 predicted outcome).
- A one-page write-up that contains your contact information and a brief description of your modeling methods and approaches. The contact information should list the names, titles, academic degrees, affiliations, and locations (city, state, and country, if international) of all authors.
- The top three winners will be selected according to predicted probabilities on the test sample data. All participants will be ranked using the following two specific model performance measures.
- Criterion 1: area under the receiver operating characteristic (ROC) curve.
- Criterion 2: percentage of responders caught among the 10,000 individuals with the highest predicted response probabilities.
The final ranking will be based on the sum of these two separate ranks. In the case of a tie (e.g., Tom ranks No. 1 on Criterion 1 and No. 3 on Criterion 2, while Jerry ranks No. 2 on both criteria), the participant with the higher rank on Criterion 1 (i.e., Tom) wins.
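For participants who want to check a model against both criteria before submitting, the following is a minimal sketch in plain Python. It rests on two assumptions that the rules do not spell out: the area under the ROC curve is computed from the rank statistic with tied scores sharing their average rank, and "percentage of responders caught" is read as the share of all responders in the scored sample that fall within the top `k` predictions. The function names `roc_auc` and `top_k_capture` are illustrative only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed from the rank-sum statistic.

    labels: list of 0/1 outcomes; scores: predicted response probabilities.
    Tied scores receive their average rank (an assumption of this sketch).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one responder and one non-responder")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def top_k_capture(labels, scores, k=10_000):
    """Share of all responders found among the k highest-scored individuals."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    caught = sum(labels[i] for i in order[:k])
    return caught / sum(labels)


# Tiny made-up example (not competition data).
example_labels = [1, 0, 1, 0, 0, 1]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
example_auc = roc_auc(example_labels, example_scores)
example_capture = top_k_capture(example_labels, example_scores, k=2)
```

Since the answer file is not available before the deadline, a check like this would typically be run on a hold-out portion of the training data, with `k` scaled to the size of that hold-out set.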
- Cash prizes, all sponsored by BCBSFL, will be awarded as follows: $1,000 to the best performer, $500 to the second, and $250 to the third. The three winning individuals or teams will also be invited to present their results at the Fourth Annual Business Intelligence Symposium in Orlando, FL on April 11, 2008. Award plates will be presented to the winners during the symposium. The work can be completed by an individual or a group, but only one member of a winning team will be invited to present at the Symposium.
- February 08, 2008 Competition Announced
- March 31, 2008 Submissions Due by 5:00 pm (Eastern Time)
- April 02, 2008 Announcement of Winners
- April 11-12, 2008 Fourth Annual Business Intelligence Symposium in Orlando, FL
We thank everyone who took part in this competition. Congratulations to the top three winners:
- Hualin Wang, Retail Marketing Insights, Alliance Data, Columbus, OH
- Jan Wijffels, Belgium Network of Open Source Analytical Consultants (BNOSAC), Belgium
- Nan Yang, UCF - Statistics
They will be invited to present their data mining procedures during the symposium. Please note that both symposium attendance and the presentation are required in order to claim the award.
| No. | Participant / Team Leader | Affiliation | Area Under ROC | Rank I | Percent of Responders Caught in the Top 10,000 | Rank II | Score | Overall Rank |
|---|---|---|---|---|---|---|---|---|
| 1 | Nan Yang | UCF - Statistics | 0.6620 | 5 | 24.79% | 3 | 8 | 3 |
| 2 | Peter Thorne | UCF - Statistics | 0.6107 | 11 | 18.76% | 10 | 21 | 11 |
| 3 | Braulio Medina Dias | PUC-RIO - Brazil | 0.6411 | 8 | 31.01% | 1 | 9 | 5 |
| 4 | Kirolos Maged Haleem | UCF - Transportation | NA | NA | NA | NA | NA | NA |
| 5 | Istvan Nagy | Budapest University of Technology and Economics, Hungary | 0.5012 | 14 | 10.86% | 14 | 28 | 14 |
| 6 | Jin Su | BlueCross BlueShield of Florida | 0.6558 | 6 | 22.84% | 8 | 14 | 8 |
| 7 | Adil Soyuer | UCF - Statistics | NA | NA | NA | NA | NA | NA |
| 8 | Mengfei Yu | UCF - Statistics | 0.5041 | 13 | 11.05% | 13 | 26 | 13 |
| 9 | Lan Zagar | University of Ljubljana, Slovenia | 0.6638 | 4 | 23.49% | 4.5 | 8.5 | 4 |
| 10 | Li Liu | UCF - Data Mining Lab | 0.6466 | 7 | 23.21% | 6 | 13 | 7 |
| 11 | Agung Rahmat Saleh | Telkom Institute of Technology, Indonesia | 0.5042 | 12 | 11.33% | 12 | 24 | 12 |
| 12 | Qiling Shi | UCF - Mathematics | 0.6336 | 9 | 20.33% | 9 | 18 | 9 |
| 13 | Danang Risang Djati | Telkom Institute of Technology, Indonesia | NA | NA | NA | NA | NA | NA |
| 14 | Jan Wijffels | Belgium Network of Open Source Analytical Consultants (BNOSAC), Belgium | 0.6681 | 2 | 23.49% | 4.5 | 6.5 | 2 |
| 15 | Hualin Wang | Retail Marketing Insights, Alliance Data, Columbus, OH | 0.6699 | 1 | 25.26% | 2 | 3 | 1 |
| 16 | Vladimir Nikulin | Suncorp, Brisbane | 0.6645 | 3 | 23.12% | 7 | 10 | 6 |
| 17 | Min Li | UCF - Data Mining Lab | NA | NA | NA | NA | NA | NA |
| 18 | Paulo J. L. Adeodato | NeuroTech Ltd. Federal University of Pernambuco, Brazil | 0.6237 | 10 | 18.66% | 11 | 21 | 10 |
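The Score and overall Rank columns can be reproduced from the two criterion values in the table. The sketch below is one way to do so, not the organizers' own procedure. It assumes that tied values on a criterion share the mean of the ranks they occupy, which is inferred from the two 4.5 entries under Rank II rather than stated in the rules, and it applies the stated tie-break on Criterion 1 when two scores are equal. Participants listed as NA are left out, and the helper name `average_ranks` is illustrative.

```python
# (team leader, area under ROC, % responders caught in top 10,000),
# copied from the results table above; NA rows are omitted.
results = [
    ("Nan Yang", 0.6620, 24.79),
    ("Peter Thorne", 0.6107, 18.76),
    ("Braulio Medina Dias", 0.6411, 31.01),
    ("Istvan Nagy", 0.5012, 10.86),
    ("Jin Su", 0.6558, 22.84),
    ("Mengfei Yu", 0.5041, 11.05),
    ("Lan Zagar", 0.6638, 23.49),
    ("Li Liu", 0.6466, 23.21),
    ("Agung Rahmat Saleh", 0.5042, 11.33),
    ("Qiling Shi", 0.6336, 20.33),
    ("Jan Wijffels", 0.6681, 23.49),
    ("Hualin Wang", 0.6699, 25.26),
    ("Vladimir Nikulin", 0.6645, 23.12),
    ("Paulo J. L. Adeodato", 0.6237, 18.66),
]


def average_ranks(values):
    """Rank values from highest (rank 1) to lowest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


rank1 = average_ranks([r[1] for r in results])  # Criterion 1: area under ROC
rank2 = average_ranks([r[2] for r in results])  # Criterion 2: top-10,000 capture
scores = [a + b for a, b in zip(rank1, rank2)]

# Lower score is better; equal scores are broken by the Criterion 1 rank.
final_order = sorted(range(len(results)), key=lambda i: (scores[i], rank1[i]))
overall = {results[i][0]: pos + 1 for pos, i in enumerate(final_order)}
score_by_name = {r[0]: s for r, s in zip(results, scores)}

for i in final_order:
    print(overall[results[i][0]], results[i][0], scores[i])
```

The tie-break matters in one place in this table: Peter Thorne and Paulo J. L. Adeodato both score 21, and the better Criterion 1 rank decides their order.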
At the request of several participants, the presentation files from the top three winners are available below. UCF students who participated in the competition may stop by the Statistics department main office (CCII 212 D) to pick up their certificates. If you would like your certificate mailed to you, please contact our department secretary, Ms. JoAnne Roche, by email at jroche@mail.ucf.edu or by phone at (407) 823-5562.
- Hualin Wang Presentation.ppt
- Jan Wijffels Report_BNOSAC.zip, which contains the R code used.
- Nan Yang mailing.ppt