Statistics at a Crossroads: Challenges and Opportunities in the Data Science Era

22 Jun 2018
Language Confusion Data scientists frequently work on, or should implement, statistical techniques, while statisticians work on machine learning or big data. The two fields use different terminology, so one might try to reinvent the wheel, unaware that the relevant CS or statistics techniques already exist.
How to better grasp the full scientific process The scientific process (what exactly it means to be scientific) is presumed in statistical practice and thinking, but it is less often explicitly studied, discussed, and written about for statisticians (rather than for, say, philosophers).
Add data science component to qualifying exams The traditional qualifying exam needs to be supplemented with something accommodating data science components that are valued in the industry and academic world.
Interest in Application, not Theory As the last presenter said, PhD programs seem to prepare students for more theoretical careers. People like me want courses that teach advanced skills, but are interested in application. I would love more education, but have no interest in generating theorems.
What topics to leave out to accommodate new ones? Everyone values his or her own field. When a new curriculum is developed for data science, we have to make room in the study plan. What do we give up?
Burn the curriculum We inherited our graduate and undergraduate curricula rather than building them for a purpose. Would it make sense to get input from data scientists in IT, banking, and other industries on what they think a data science BS, MS, or PhD program should contain?
Statistically integrated How can we better convince our non-statistical collaborators to bring us to the table earlier? At my work, I am often called "at the last minute" to solve some statistical issue that could have been solved in the design phase.
Define the Frontier Open Problems of Statistics I have given more details in the stats-crossroad forum:
The role of statistical models When faced with a data set and a question of interest, should we first propose a statistical model and use that to inform what algorithm to use, or dream up a plausible algorithm, and then seek to justify it?
Robustness We tend to value highly results on the optimality of statistical procedures under various conditions.  But with Big Data, it's increasingly hard to believe in a statistical model, so we should also focus attention on the robustness of our procedures.
Missing data An irony of the Big Data era is that the more subjects we have, and the more variables we measure, the more likely we are to encounter missing data.  Complete-case analyses become infeasible, and new methods for making appropriate inferences are required.
Interdisciplinary research Interdisciplinary means across schools and departments in a university setting, and between academia and industry. We have built such centers to do this.
Essential Collaboration Skills for Statisticians? What communication and collaboration skills do statisticians and data scientists need to thrive in this era of interdisciplinary research? How can individual statisticians learn these skills, and how can we teach them at scale to keep statistics relevant?
Object Data Analysis (ODA) ODA is BDA of objects of high complexity, such as 3D shapes, textured surfaces, spaces of colors, and affine and projective shapes (AI - machine vision). Such object spaces are often compact, allowing for a more nuanced analysis of location and variability.
TDA (Topological Data Analysis) As Big Data accumulates, the topological types of the supports of probability distributions appear to be nontrivial, unlike what one would expect in classical data analysis, including in high- or even infinite-dimensional cases. TDA is a technique available to tackle this.
Definitions There is a need for up-to-date definitions for statistics, data science, analytics, and business analytics.
Scaling statistical methodology on large systems Some communities are suggesting that statistics can be replaced with lots of data and computing. To demonstrate where this is not the case, we need to be able to play with lots of data using statistical methodology on large systems.
Anchoring statistics in important domain problems Statistics has drifted away from domain problems relative to 100 years ago. This is causing us to lose talent other than the mathematical kind and to have very limited impact on, and relevance to, society. In the long run, stats departments will lose their legitimacy.
DS vs Statistics We should not differentiate stats away from DS because they overlap a lot. Instead we need to strengthen statistics, include machine learning and causal inference, and train stats majors in quantitative critical thinking (by adding case studies).
Emerging applications Neuroscience, material science, imaging in biology, precision medicine
Success stories and valuable lessons UK statisticians' leadership at the Cambridge cancer research institute (Simon Tavaré), the Oxford Big Data Institute (Gil McVean), and the Oxford human genetics institute (Peter Donnelly).
Most critical areas for the future Quantitative critical thinking, communication and leadership skills, and the patience to get into an important domain problem (it easily takes 2 or 3 years before publications). We need incentives and reward systems for people who engage deeply in a domain problem.
Funding priorities at NSF and other agencies Value only quality and impact (not quantity), and create negative incentives for publishing too many papers. Panelists need to implement this vision too.
Statistical Opportunities in the Physical Sciences There's a need for new statistical methodology for complex data problems emerging in the physical sciences. Common themes: spatiotemporal data, uncertainty quantification, ill-posed inverse problems, and computationally intensive simulations.
Challenges of Repeated Analyses How can we manage repeated analyses of publicly available data to control multiple-testing issues?
Extract information from big and complex data The statistical formulation of data information, as contained in a well-defined probability model, would be very limited for big and complex data. We need new thinking on data information and on how statistics can contribute to extracting it.
Translating Population models to individual health Most models on health-related data do not translate to clinical change.
Data vs. Intellectual satisfaction Data shouldn't be used as an example to demonstrate a complicated method; rather, the data should drive the method. Embrace the data and spread this vibe to our students. Think of intelligent, simple, implementable, and communicable solutions to a problem.
Emerging application Neuroscience and the interface between statistics and neuroscience
The role of statistics in the era of big data A critical challenge facing statisticians is to establish our unique role in the era of big data. What are the research or application areas where statisticians can make unique contributions? What kind of training in theory and practice do our PhD students need?
How can statisticians get more involved in science? Computer simulations are widely used in science. By developing modeling and uncertainty quantification tools for simulations, I think statisticians can have a significant impact in science.
Uncertainty quantification for learning methods Most of the popular machine learning methods focus on point prediction. I think statisticians can contribute a lot toward more robust and interpretable prediction and toward quantifying prediction uncertainty.
Increase interactions between theory/applications Modern statistical inference has been shaped by recent advances in science and technology. For statistics to advance, we need to encourage sustained collaborations between statisticians and domain scientists working in theory, methods, and applications.
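
A rough illustration of the "Missing data" item above: the sketch below computes the expected share of subjects with no missing values as the number of measured variables grows. It assumes, purely for illustration, that every variable is missing independently with the same probability; real missingness is rarely that simple. The function name and the 2% rate are hypothetical choices, not taken from any submission.

```python
def complete_case_fraction(p_missing: float, n_variables: int) -> float:
    """Expected fraction of subjects with every variable observed.

    Assumption (illustrative only): each variable is missing independently
    with the same probability p_missing.
    """
    if not 0.0 <= p_missing <= 1.0:
        raise ValueError("p_missing must lie in [0, 1]")
    if n_variables < 0:
        raise ValueError("n_variables must be non-negative")
    # A subject is a complete case only if all n_variables are observed.
    return (1.0 - p_missing) ** n_variables


# Hypothetical 2% missingness per variable, for increasing numbers of variables.
for k in (5, 20, 100, 500):
    share = complete_case_fraction(0.02, k)
    print(f"{k:>4} variables: {share:.4f} of subjects are complete cases")
```

Under these assumptions, 100 variables with 2% missingness each would leave only about 13% of subjects as complete cases, which is one way to see why complete-case analysis stops being practical as more variables are measured.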

The opinions, findings, and conclusions or recommendations expressed on this site are those of the author(s) and do not necessarily reflect the views of Knowinnovation Inc.