Data Mining Competition 2008
 

Department of Statistics & Actuarial Science
University of Central Florida
 

 

| Format | Training | Test | Answer |
|---|---|---|---|
| SAS | training.sas7bdat (392.53 MB) | test.sas7bdat (43.89 MB) | answer.sas7bdat |
| CSV | training.csv (257.00 MB) | test.csv (28.55 MB) | answer.csv |

 

This competition is open to anyone interested. Please review the following rules carefully and contact us with any questions at data.mining.2008@gmail.com.

  1. Please build your model using the training data set and use it to obtain a predicted probability of response for each individual in the test sample. To participate in the contest, you must submit the following two deliverables by 5:00 pm (Eastern Time) on 3/31/2008.
    • A data set with two columns: ID and your predicted probability of response (not a 0-1 predicted outcome).
    • A one-page write-up that contains your contact information and a brief description of your modeling methods and approaches. The contact information should list the names, titles, academic degrees, affiliations, and locations (city, state, and country, if international) of all authors.
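
As an illustration of the first deliverable, here is a minimal Python sketch that writes a two-column file and rejects values that are not probabilities. The column names `ID` and `probability`, the six-decimal formatting, and the function name `write_submission` are assumptions made for this example; the rules above do not prescribe a header or a file layout.

```python
import csv
import io


def write_submission(rows, out):
    """Write (ID, predicted probability) pairs as a two-column CSV.

    The header names "ID" and "probability" are assumptions; the rules
    only ask for an ID column and a predicted-probability column.
    """
    rows = list(rows)
    probs = [p for _, p in rows]
    # Every predicted probability must lie in [0, 1].
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("predicted probabilities must be between 0 and 1")
    # The rules ask for probabilities, not 0-1 predicted outcomes.
    if probs and all(p in (0.0, 1.0) for p in probs):
        raise ValueError("values look like 0-1 outcomes, not probabilities")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "probability"])
    for id_, p in rows:
        writer.writerow([id_, f"{p:.6f}"])


# Hypothetical IDs and probabilities, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([(101, 0.8312), (102, 0.0457), (103, 0.2901)], buffer)
submission_text = buffer.getvalue()
```

Passing an open file instead of the in-memory buffer would write the same content to disk.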

     

  2. The top three winners will be selected according to their predicted probabilities on the test sample. All participants will be ranked on each of the following two model performance measures.
    • Criterion 1: area under the receiver operating characteristic (ROC) curve.
    • Criterion 2: percentage of responders caught among the 10,000 individuals with the highest predicted response probabilities.

    The final ranking will then be based on the sum of these two ranks, with a lower sum being better. In the case of a tie (e.g., Tom ranks No. 1 on Criterion 1 and No. 3 on Criterion 2, while Jerry ranks No. 2 on both criteria), the participant with the higher rank on Criterion 1 (i.e., Tom) wins.
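
The scoring described above can be sketched in Python as follows. This is an illustration, not the organizers' scoring code: the function names are made up for this example, the AUC uses the standard rank-sum (Mann-Whitney) identity, and Criterion 2 is assumed to divide the responders found in the top `k` by all responders in the test sample. Tied values share their average rank, which is consistent with the 4.5 ranks that appear in the results table.

```python
def average_ranks(values):
    """Return 1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average 1-based rank of the tied group
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def auc(labels, scores):
    """Criterion 1: area under the ROC curve via the rank-sum identity."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def capture_rate(labels, scores, k=10_000):
    """Criterion 2 (assumed definition): share of all responders in the top k."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / sum(labels)


def final_order(names, metric1, metric2):
    """Order participants by rank sum, breaking ties on the Criterion 1 rank."""
    rank1 = average_ranks([-m for m in metric1])  # higher metric gets rank 1
    rank2 = average_ranks([-m for m in metric2])
    keyed = sorted(zip(names, rank1, rank2), key=lambda t: (t[1] + t[2], t[1]))
    return [name for name, _, _ in keyed]


# Toy data: four individuals, two of them responders (label 1).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
toy_capture = capture_rate(toy_labels, toy_scores, k=2)

# Two entries from the results table that tie on the rank sum.
tied_order = final_order(
    ["Peter Thorne", "Paulo J. L. Adeodato"],
    [0.6107, 0.6237],  # area under ROC
    [18.76, 18.66],    # percent of responders caught in the top 10,000
)
```

In the two-entry example each participant wins one criterion, so the rank sums tie and the better Criterion 1 rank decides the order, just as in the Tom and Jerry case.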

     

  3. All prizes are sponsored by BLBSFL. A cash prize of $1,000 will be awarded to the best performer, $500 to the second, and $250 to the third. The three winning individuals or teams will also be invited to present their results at the Fourth Annual Business Intelligence Symposium in Orlando, FL on April 11, 2008. Award plates will be presented to the winners during the symposium. The work can be completed by an individual or a group, but for a winning team only one individual will be invited to present at the symposium.

 

 

We thank all participants for taking part in this competition. Congratulations to the top three winners (highlighted below in green):

  1. Hualin Wang, Retail Marketing Insights, Alliance Data, Columbus, OH
  2. Jan Wijffels, Belgium Network of Open Source Analytical Consultants (BNOSAC), Belgium
  3. Nan Yang, UCF - Statistics

 They will be invited to present their data mining procedures during the symposium. Please note that both symposium attendance and the presentation are required in order to claim the award.   

 

| No. | Name of Participant / Team Leader | Affiliation | Area Under ROC | Rank I | Percent of Responders Caught in the Top 10,000 | Rank II | Score | Overall Rank |
|---|---|---|---|---|---|---|---|---|
| 1 | Nan Yang | UCF - Statistics | 0.6620 | 5 | 24.79% | 3 | 8 | 3 |
| 2 | Peter Thorne | UCF - Statistics | 0.6107 | 11 | 18.76% | 10 | 21 | 11 |
| 3 | Braulio Medina Dias | PUC-RIO, Brazil | 0.6411 | 8 | 31.01% | 1 | 9 | 5 |
| 4 | Kirolos Maged Haleem | UCF - Transportation | NA | NA | NA | NA | NA | NA |
| 5 | Istvan Nagy | Budapest University of Technology and Economics, Hungary | 0.5012 | 14 | 10.86% | 14 | 28 | 14 |
| 6 | Jin Su | BlueCross BlueShield of Florida | 0.6558 | 6 | 22.84% | 8 | 14 | 8 |
| 7 | Adil Soyuer | UCF - Statistics | NA | NA | NA | NA | NA | NA |
| 8 | Mengfei Yu | UCF - Statistics | 0.5041 | 13 | 11.05% | 13 | 26 | 13 |
| 9 | Lan Zagar | University of Ljubljana, Slovenia | 0.6638 | 4 | 23.49% | 4.5 | 8.5 | 4 |
| 10 | Li Liu | UCF - Data Mining Lab | 0.6466 | 7 | 23.21% | 6 | 13 | 7 |
| 11 | Agung Rahmat Saleh | Telkom Institute of Technology, Indonesia | 0.5042 | 12 | 11.33% | 12 | 24 | 12 |
| 12 | Qiling Shi | UCF - Mathematics | 0.6336 | 9 | 20.33% | 9 | 18 | 9 |
| 13 | Danang Risang Djati | Telkom Institute of Technology, Indonesia | NA | NA | NA | NA | NA | NA |
| 14 | Jan Wijffels | Belgium Network of Open Source Analytical Consultants (BNOSAC), Belgium | 0.6681 | 2 | 23.49% | 4.5 | 6.5 | 2 |
| 15 | Hualin Wang | Retail Marketing Insights, Alliance Data, Columbus, OH | 0.6699 | 1 | 25.26% | 2 | 3 | 1 |
| 16 | Vladimir Nikulin | Suncorp, Brisbane | 0.6645 | 3 | 23.12% | 7 | 10 | 6 |
| 17 | Min Li | UCF - Data Mining Lab | NA | NA | NA | NA | NA | NA |
| 18 | Paulo J. L. Adeodato | NeuroTech Ltd. Federal University of Pernambuco, Brazil | 0.6237 | 10 | 18.66% | 11 | 21 | 10 |

 

At the request of several participants, we make available below the presentation files from the top three winners. Also, UCF students who participated in the competition may stop by the Statistics Department main office (CCII 212 D) to pick up their certificates for this competition. If you would like your certificate mailed to you, please contact our department secretary, Ms. JoAnne Roche, by email at jroche@mail.ucf.edu or by phone at (407) 823-5562.

  1. Hualin Wang: Presentation.ppt
  2. Jan Wijffels: Report_BNOSAC.zip, which contains the R code used.
  3. Nan Yang: mailing.ppt