The Economics and Finance Letters

June 2018, Volume 5, Issue 2, pp. 28-45

Financial Market Predictions with Factorization Machines: Trading the Opening Hour Based on Overnight Social Media Data

Johannes Stubinger (1), Dominik Walter (1), Julian Knoll (2)

  1. University of Erlangen-Nürnberg, Department of Statistics and Econometrics, Lange Gasse 20, 90403 Nürnberg, Germany

  2. Technische Hochschule Nürnberg Georg Simon Ohm, Keßlerplatz 12, 90489 Nürnberg, Germany

Pages: 28-45

DOI: 10.18488/journal.29.2018.52.28.45

Article History:

Received: 29 August, 2018
Revised: 05 October, 2018
Accepted: 08 November, 2018
Published: 03 December, 2018


This paper develops a statistical arbitrage strategy based on overnight social media data and applies it to high-frequency data of the S&P 500 constituents from January 2014 to December 2015. The trading framework predicts future price changes using Factorization Machines, a state-of-the-art algorithm for high-dimensional data in very sparse settings. Specifically, we implement and analyze the effectiveness of support vector machines (SVM), second-order Factorization Machines (SFM), third-order Factorization Machines (TFM), and adaptive-order Factorization Machines (AFM). In the back-testing study, we demonstrate the efficiency of Factorization Machines in general and show that increasing their complexity leads to higher profitability: annualized returns after transaction costs range from 5.96 percent for SVM to 13.52 percent for AFM, compared to 5.63 percent for a naive buy-and-hold strategy of the S&P 500 index. The corresponding Sharpe ratios range from 1.00 for SVM to 2.15 for AFM. Varying profitability during the opening minutes can be explained by the effects of market efficiency and trading turmoil. Additionally, the AFM approach achieves the highest accuracy rate and generates statistically and economically remarkable returns after transaction costs without loading on any systematic risk exposure.
Contribution/Originality
This study contributes to the existing literature by predicting financial markets based on overnight social media data. To this end, we observe tweets about the S&P 500 companies during the time span in which stock markets are closed and forecast future price changes from the collected information.
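
As a rough illustration of the second-order Factorization Machine (SFM) named in the abstract, the sketch below scores one sparse feature vector with the standard second-order model: a global bias, a linear term, and pairwise interactions whose weights are inner products of latent factor vectors. It is not the authors' implementation. The function names, the toy weights, and the choice of two latent factors are assumptions made for this example only. The pairwise term uses the well-known reformulation whose cost is linear in the number of non-zero features, and a naive double loop is included as a cross-check.

```python
from typing import Dict, List


def fm_predict(x: Dict[int, float], w0: float, w: List[float],
               V: List[List[float]]) -> float:
    """Second-order FM score for one sparse sample x (feature index -> value).

    V[i] is the latent factor vector of feature i (hypothetical toy layout).
    """
    # Linear part: only non-zero features contribute.
    linear = sum(w[i] * xi for i, xi in x.items())

    # Pairwise part via the identity
    #   sum_{i<j} <v_i, v_j> x_i x_j
    #     = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)

    return w0 + linear + pairwise


def fm_predict_naive(x: Dict[int, float], w0: float, w: List[float],
                     V: List[List[float]]) -> float:
    """Same score with an explicit loop over feature pairs (for checking)."""
    idx = sorted(x)
    linear = sum(w[i] * x[i] for i in idx)
    pairwise = 0.0
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


# Toy parameters (assumed values, not taken from the paper).
W0 = 0.1
W = [0.2, -0.1, 0.3, 0.0]
V = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [0.2, 0.2]]

# Sparse sample: only features 0 and 2 are active.
X = {0: 1.0, 2: 2.0}

if __name__ == "__main__":
    print(fm_predict(X, W0, W, V))
    print(fm_predict_naive(X, W0, W, V))
```

Storing the sample as a dictionary of non-zero entries keeps the cost proportional to the number of active features, which is the property that makes this model family practical for very sparse inputs such as bag-of-words representations of tweets.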


Keywords: Finance, Social media data, Factorization Machine, Overnight information, Statistical arbitrage, High-frequency trading.



This research was supported by the GfK Verein e. V., which funded the purchase of the Twitter data set. We are especially grateful to Raimund Wildner and Holger Dietrich for their commitment and effort over the course of this project. Furthermore, we would like to thank Ingo Klein for many helpful discussions.

Competing Interests:

The authors declare that they have no competing interests.


All authors contributed equally to the conception and design of the study.

