Fall DataFest 2017

 

DataFest has concluded successfully.


Thanks to everyone who participated in our Fall 2017 DataFest, and congratulations to our winning teams!

First Place – Team NLG

Members: Shan Guan, Xiayu Niu, and Xiaochen Liu

Second Place – Team Schrodinger’s Dog

Members: Keni Mou, Xincen Xie, Yuhan Zha, Xueting Sun, Yuqing Zhang, and Xiangpeng Zhao

Third Place – Team We Love KFC

Members: Wei Qin, Hanying Ji, Jiarui Tian, and Kuo Yang

 

More pictures can be found here!


All team leaders, please send an email to [email protected] with your team name and a list of your team members (University ID or email address).

There are three awards for this competition valued at $450 total.

Teams are graded on a cumulative score across the following criteria:

Model completeness, Presentation/visualization, Business insight, Best use of external data

The final criteria may change at the judges’ discretion.

You must use at least one of the datasets provided.

Make sure to present your methodology and results as clearly as possible in your slides and presentation.


The Satow Room in Lerner Hall is booked on Saturday from 9:00 AM to 4:00 PM if you need a place to work on the project.

Mudd 833 is booked on Sunday as a work space from 9:00 AM to 2:00 PM, with presentations starting at 2:30 PM and running until 5:00 PM.

DataFest Introduction presentation

 

Data Part 1:

Link

How to evaluate your model: see the info on evaluation and the code for calculating your Gini score. The higher the Gini score, the better.
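For teams who want a feel for the metric before opening the official code, here is a minimal sketch. It assumes the score is the normalized Gini coefficient commonly used to rank insurance loss predictions. The function names `gini` and `normalized_gini` are illustrative and may not match the official scoring code, which remains the reference for the competition.

```python
def gini(actual, predicted):
    """Unnormalized Gini of `actual` when rows are ranked by `predicted`.

    Assumption: this follows the common "cumulative share of losses"
    formulation; it is not taken from the official DataFest code.
    """
    n = len(actual)
    # Rank rows from highest to lowest prediction; sorted() is stable,
    # so tied predictions keep their original order.
    order = sorted(range(n), key=lambda i: -predicted[i])
    total = float(sum(actual))
    running = 0.0
    area = 0.0
    for i in order:
        running += actual[i]
        area += running / total  # cumulative share of actual losses captured
    # Subtract the area expected from a random ordering, then average.
    return (area - (n + 1) / 2.0) / n


def normalized_gini(actual, predicted):
    """Scale so that a perfect ranking of `actual` scores 1.0."""
    return gini(actual, predicted) / gini(actual, actual)


if __name__ == "__main__":
    losses = [0, 0, 1, 1]
    print(normalized_gini(losses, [0.1, 0.2, 0.8, 0.9]))  # perfect ranking
    print(normalized_gini(losses, [0.9, 0.8, 0.2, 0.1]))  # reversed ranking
```

Under this formulation only the ordering of the predictions matters, not their scale: a ranking that places every actual loss first scores 1, and the exact reverse scores -1.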

Data Part 2:

Link

Compiled by Dr. Zifa Wang from the USGS catalog

Data Part 3:

External data (https://www.data.gov/, https://www.usgs.gov/, https://www.census.gov/,… etc.)


Columbia Statistics Club (CSC) is holding DataFest, its biggest event of the Fall 2017 semester.

DataFest Details:

  • All participants will be grouped into teams of 5 or 6.
  • Each team will receive a real-world dataset on Friday (11/03) night. Teams can develop their own topics or questions of interest to work on.
  • Each team has about one and a half days to complete its project.
  • Each team is also asked to give a presentation on Sunday afternoon (11/05).
  • Any software or programming language may be used for the project and presentation.
  • Comments will be given by judges. (Judges are introduced below.)
  • During the event, we will also have tutorials from our guest speaker.
  • Many rewards, including gift cards and prizes, will be given to the teams that perform well in the presentation.
  • Food and drinks will be provided!

Dataset and Introduction:

The Fall 2017 dataset comes from the USGS catalog, compiled by Dr. Zifa Wang, Chief Analytics Officer at Validus Research. There will also be an additional fire insurance dataset.

Competition Judges: 

  • Dr. Zifa Wang, Chief Analytics Officer at Validus Research, who obtained his Doctorate in Architectural Engineering from The University of Tokyo
  • Prof. Tian Zheng from Columbia Statistics Department, Winner of the Columbia Presidential Teaching Award
  • Prof. Siddhartha Dalal from Columbia School of Professional Studies, who previously served as the Chief Data Scientist & Senior VP of Advanced Research and Technology at AIG

They will give thorough comments to each team that presents at DataFest.

Our Mission for the DataFest:

To give all students who love exploring data a hands-on, real-world experience. It does not matter what software or analysis methods you use. Coding is not the main purpose of this event; good ideas are the take-home message we want to create for you. As long as you are interested in playing with real-world data, please come and join us on November 3rd!

 

 

Agenda

Friday Night (11/03):

6:00pm – 6:15pm:

Introduction to the DataFest

6:15pm – 6:45pm:

Tutorial by Prof. Dalal: How to build insurance models + Q&A Session

6:45pm – 7:10pm:

Introduction to the Datasets and requirements

7:10pm – 9:00pm:

Group forming

Saturday (11/04):

9:00am – 4:00pm:

Project preparation

Sunday (11/05):

9:00am – 2:00pm: 

Time for group work

2:30pm – 4:00pm: 

Presentation

4:30pm – 5:00pm:

Evaluation from judges