News update:

Information regarding 2017 DataFest codes and logistics will be shared once the ASA and the Expedia team have finished conducting DataFest competitions at participating schools nationwide. Stay tuned for more information here!



Best in Show:

Team Diversity: Shiyue Zheng, Zeyuan Jin, Winston Lin, Leyao Weng, Jiayi Zhang

Best Visualization:

Team Hero: Zhao Dawei, Shiqi Duan, Ji Shen, Minghao Dong, Peiran Fang

Best Use of External Data:

Team Snarky Puppies: Minh Pham, Gary Buranasampatanon, David Arredondo, Burton Sacks, Kyle Johnson, Karen Xia

Columbia Statistics Club (CSC) is holding DataFest, our biggest event of the Spring 2017 semester, in collaboration with the American Statistical Association (ASA).

DataFest Details:

  • All participants will be grouped into teams of 5 or 6
  • Each team will receive a real-world dataset on Friday (03/24) night. Based on its members' interests and backgrounds, each team can develop its own topics and questions of interest to work on.
  • Each team has about one and a half days to complete its project.
  • Each team is also asked to give a presentation on Sunday afternoon (03/26).
  • Any software or programming language may be used for the project and presentation.
  • Any topic is welcome for the project and presentation.
  • Comments will be given by judges. (Judges are introduced below.)
  • During the event, we will also have tutorials from guest speakers.
  • Many rewards, including gift cards and prizes, will be distributed to the teams that perform well in their presentations.
  • Food and drinks will be provided!

Introduction for American Statistical Association:

The American Statistical Association DataFest is a celebration of data in which teams of students work around the clock to find and share meaning in a large, rich, and complex data set. Many professionals find ASA DataFest to be a great recruiting opportunity–they get to watch talented undergraduate students work under pressure in a team and examine their thinking processes. After two days of intense data wrangling, analysis, and presentation design, each team is allowed a few minutes and no more than two slides to impress a panel of judges. Prizes are given for Best in Show, Best Visualization, and Best Use of External Data.

2017 Dataset and Expedia Introduction:

The 2017 dataset comes from Expedia, Inc., one of the largest travel companies in the United States. Expedia owns and operates several global online travel brands, primarily travel fare aggregator websites and travel metasearch engines, including Expedia.com, Hotels.com, Hotwire.com, trivago, Venere.com, Travelocity, Orbitz, and HomeAway. It also operates more than 200 travel booking websites in more than 75 countries, and has over 350,000 lodging listings and over 500 airline listings.


We are very honored to have Prof. Zheng and Prof. Robbins, from the Columbia Statistics Department; Ying Liu, Data Scientist at Google; and Ke Shen, Data Scientist at Ladders, as our judges. They will give thorough comments to each team that presents at DataFest.

Sample dataset and presentations from last year

Our Mission for the DataFest:

Our mission for this DataFest is to give all students who love exploring data a hands-on, real-world experience. It does not matter what software or analysis methods you use. Coding is not the main purpose of this event; good ideas are the take-home message we hope to create for you. As long as you are interested in digging into real-world data, please come and join us on March 24th!
StatFest Spring 2017 Poster



Friday Night (03/24):

6:45pm – 7pm: Reception

7pm – 7:30pm: Introduction to DataFest

7:30pm – 9pm: Group Forming

Saturday (03/25): 9am – 4pm

Morning: Tutorial by Yuhan Sun, based on her experience with data visualization projects with the United Nations.

Afternoon: Time for Group Work

Sunday (03/26): 9am – 4pm

Morning: Time for Group Work

Afternoon: Presentations



In addition:

DataCamp has offered a free six-month trial membership (access to all their courses) for DataFest participants.

Redemption Link


Telling a Story through Data Visualization by Yuhan Sun

DataFest resources: 

Link to Professor Tian Zheng's presentation files: https://github.com/TZstatsADS/DataFest2017