Data Science for Everyone

As a social scientist (PDF) and Fellow of the Royal Statistical Society, I am skilled in exploratory and confirmatory data analysis. Statistics was also the focus of my studies at the University of Cologne and Utrecht University (Netherlands). I earned my doctorate at the Justus Liebig University Giessen for conducting one of the first longitudinal media analyses and took additional courses in statistics at the University of Leuven (Belgium) and Bielefeld University. Reading Herbert George Wells, who said "statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write", got me interested in the Johns Hopkins University (USA) data science specialization (PDF) and the Imperial College London (UK) mathematics for machine learning specialization (PDF) on Coursera. Besides my current academic lectures, I advise public as well as governmental organisations on the application of multivariate statistics, machine learning algorithms and the current limitations of artificial intelligence by providing some catchy introductions to Python and R.

See you soon
Prof. Dr. Dennis Klinkhammer
University of Applied Sciences Teacher

Neural Networks with Python

Statistical thinking combined with a powerful programming language like Python can be used to create artificial neural networks. These are meant to imitate neurons within the human brain in order to recognise patterns automatically and learn something new without being explicitly programmed. Given a common situation with some input data (on the left) and the corresponding output data (on the right), an artificial neural network can predict the correct output data when provided with new input data.

A human brain easily identifies that the first input column seems to determine the output column. Thus, a new row of input data (010) should correspond to (0) as output data, and (110) should consistently correspond to (1). A logistic regression model with three predictors (one for each column of the input data) can predict the output data correctly, provided the automated learning process is capable of adjusting the weights for each predictor. That's it: an artificial neural network that recognises the patterns of each similar situation and adapts automatically. Furthermore, this code (ZIP) can be reprogrammed for linear and other non-linear contexts as well.
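The idea can be sketched in a few lines of plain Python. This is a minimal illustration and not the code from the ZIP archive: the four training rows are hypothetical (chosen so that the output copies the first input column), the intercept term is an added assumption beyond the three predictors mentioned above, and the learning rate and number of epochs are arbitrary. The weights are adjusted by gradient ascent on the log-likelihood of a logistic regression, which is one common way to let a single artificial neuron learn.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical training table (not the original figure): the output
# simply copies the first input column.
X = [[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]]
y = [0, 1, 1, 0]


def predict(row, weights, bias):
    """Predicted probability that the output is 1 for one input row."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def train(X, y, learning_rate=1.0, epochs=5000):
    """Fit one weight per input column plus an intercept (an assumption)."""
    weights = [0.0, 0.0, 0.0]
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        # Prediction error for every training row with the current weights.
        errors = [target - predict(row, weights, bias)
                  for row, target in zip(X, y)]
        # Move each weight in the direction that reduces the average error.
        for j in range(len(weights)):
            gradient = sum(e * row[j] for e, row in zip(errors, X)) / n
            weights[j] += learning_rate * gradient
        bias += learning_rate * sum(errors) / n
    return weights, bias


weights, bias = train(X, y)

# New rows that were not part of the training table.
new_rows = [[0, 1, 0], [1, 1, 0]]
predictions = [predict(row, weights, bias) for row in new_rows]
for row, p in zip(new_rows, predictions):
    print(row, "->", round(p, 3))
```

A predicted probability below 0.5 is read as output (0) and one above 0.5 as output (1). The weight of the first column should dominate after training, which mirrors what a human reader spots in the table.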

Screening for Political Extremism

On social media, left- and right-wing extremism can be considered widespread phenomena with a rising amount of radical content. A quantitative research focus (PDF) on the process of radicalisation and social structures may uncover underlying mechanisms like frames and pull-factors within YouTube. Extremist propaganda placed there is frequently viewed and commented on and can be accessed via Application Programming Interfaces (API) in order to identify actors that might be relevant for criminal charges. Since social media provides large amounts of unstructured data, statistical methods common for Big Data have to be applied. The most relevant variables for identifying extremist actors, however, seem to be the number of comments, likes and replies on YouTube as well as the content of each comment, which can be itemised via Natural Language Processing. Due to some security restrictions this specific code can't be made public, but a tutorial on Machine Learning can be found on this page.
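How such variables might be assembled can be hinted at with a generic Python sketch. It is not the restricted code mentioned above and does not call any API: the comment records, their field names and the function names are all hypothetical, and the tokeniser is a deliberately simple stand-in for a full Natural Language Processing pipeline.

```python
import re
from collections import Counter, defaultdict

# Hypothetical comment records; the field names are illustrative and do
# not mirror any real API response.
comments = [
    {"author": "user_a", "text": "Great video, thanks for sharing!",
     "likes": 3, "replies": 1},
    {"author": "user_b", "text": "I disagree with this video.",
     "likes": 0, "replies": 2},
    {"author": "user_a", "text": "Sharing this video again.",
     "likes": 5, "replies": 0},
]


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Only the letters a-z are kept, so this simple version would need to
    be extended for other alphabets.
    """
    return re.findall(r"[a-z]+", text.lower())


def summarise_by_author(records):
    """Aggregate comment, like and reply counts and token frequencies."""
    summary = defaultdict(
        lambda: {"comments": 0, "likes": 0, "replies": 0, "tokens": Counter()}
    )
    for record in records:
        entry = summary[record["author"]]
        entry["comments"] += 1
        entry["likes"] += record["likes"]
        entry["replies"] += record["replies"]
        entry["tokens"].update(tokenize(record["text"]))
    return dict(summary)


features = summarise_by_author(comments)
for author, entry in features.items():
    print(author, entry["comments"], entry["likes"], entry["replies"],
          entry["tokens"].most_common(2))
```

The resulting per-author table of counts and token frequencies is the kind of structured input that statistical or machine learning methods can then work with.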

Some Easy Chunks of R

Basic Analysis

Gain some first experience with R by using the SWISS (ZIP) dataset for sociological analysis.

Regression Models

Create a multivariate model with the MTCARS (ZIP) dataset and get some insights into applied physics.

Machine Learning

The IRIS (ZIP) dataset is a perfect playground for predicting something that is really beautiful.

Nonlinear Models

Learn to calculate the goodness of fit with SIMULATED (ZIP) datasets and nonlinear regression models.

Data Visualisation

Install the package corrplot and visualise the bivariate structure inside the TREES (ZIP) dataset.

Clinical Studies

A TOOTHGROWTH (ZIP) dataset for comparing the effects of ascorbic acid and orange juice in guinea pigs.