Text as a source of data

Course description

The aim of the course is to provide students with a preliminary overview of a few different methodologies used to convert digital text into input for economic research.


  • Dictionary-based methods
  • “Text Regressions”
  • Supervised Machine Learning Methods
  • Unsupervised Machine Learning Methods
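
As a rough illustration of the first topic, a dictionary-based method typically scores each document by the share of its words that appear in a predefined term list. The Python sketch below is only an illustration under stated assumptions: the term list `UNCERTAINTY_TERMS`, the regex tokenizer, and the two example sentences are hypothetical and are not taken from the course materials or from any paper in the reading list.

```python
import re
from collections import Counter

# Hypothetical term list, chosen only for illustration.
UNCERTAINTY_TERMS = {"uncertain", "uncertainty", "risk", "risky"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_share(text, terms):
    """Return the share of tokens in `text` that belong to `terms`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on empty documents
    counts = Counter(tokens)
    hits = sum(counts[term] for term in terms)
    return hits / len(tokens)


# Made-up example documents.
documents = [
    "Policy uncertainty raises risk for firms.",
    "The firm reported stable earnings.",
]

# One numeric score per document, usable as a regressor or outcome.
scores = [dictionary_share(doc, UNCERTAINTY_TERMS) for doc in documents]
```

The resulting `scores` list holds one number per document, which is the kind of variable that can then enter a standard regression. In practice the choice of term list and of tokenization rules matters a great deal and would need to be justified for the application at hand.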



  • Acemoglu, D. and Hassan, T.A. and Tahoun, A. (2014). The Power of the Street: Evidence from Egypt's Arab Spring, NBER Working Papers 20665
  • Baker, S. R. and Bloom, N. and Davis, S. J. (2016) Measuring Economic Policy Uncertainty, The Quarterly Journal of Economics, Volume 131, Issue 4, Pages 1593–1636.
  • Bandiera, O. and Hansen, S. and Prat, A. and Sadun, R. (2017) CEO Behavior and Firm Performance, NBER Working Papers 23248
  • Blei, D. (2012). Probabilistic topic models. Communications of the ACM 55, 77–84.
  • Blumenstock, J. and Cadamuro, G. and On, R. (2015). Predicting poverty and wealth from mobile phone metadata, Science. 350. 1073-1076.
  • Born, B. and Ehrmann, M. and Fratzscher, M. (2014). Central Bank Communication on Financial Stability, Economic Journal, vol. 124(577), pages 701-734, June.
  • Chalfin, A. and Danieli, O. and Hillis, A. and Jelveh, Z. and Luca, M. and Ludwig, J. and Mullainathan, S. (2016). Productivity and Selection of Human Capital with Machine Learning. American Economic Review: Papers and Proceedings, 106(5), 124-127.
  • Choi, H. and Varian, H. (2012) Predicting the Present with Google Trends, The Economic Record, 2012, vol. 88, issue s1, 2-9.
  • Dittmar, J. and Seabold, S. (2018), New Media and Competition: Printing and Europe’s Transformation After Gutenberg, mimeo
  • Fetzer, T. (2014) Social Insurance and Conflict: Evidence from India, EOPP Working Paper No. 53
  • Gentzkow, M. and J. M. Shapiro (2010, January). What drives media slant? Evidence from U.S. daily newspapers. Econometrica 78(1), 35–71.
  • Ginsberg J, et al. (2009) Detecting influenza epidemics using search engine query data. Nature 457(7232):1012–1014.
  • Griffiths, T. L. and M. Steyvers (2004). Finding scientific topics. Proceedings of the National Academy of Sciences 101, 5228–5235.
  • Groseclose, T., and J. Milyo (2005). A Measure of Media Bias, Quarterly Journal of Economics, 120, 1191–1237.
  • Hansen, S., M. McMahon, and A. Prat (2018). Transparency and deliberation within the FOMC: a computational linguistics approach. The Quarterly Journal of Economics.
  • Hoberg, G. and Phillips, G. (2016) Text-Based Network Industries and Endogenous Product Differentiation, Journal of Political Economy, vol. 124(5), pages 1423-1465.
  • Khandani, A. E. and Kim, A. J. and Lo, A. W. (2010) Consumer Credit Risk Models Via Machine-Learning Algorithms, AFA 2011 Denver Meetings Paper
  • Kleinberg, J. and Lakkaraju, H. and Leskovec, J. and Ludwig, J. and Mullainathan, S. (2018). Human Decisions and Machine Predictions, Quarterly Journal of Economics. 133. 237-293.
  • Laver, M., K. Benoit, and J. Garry (2003, May). Extracting policy positions from political texts using words as data. The American Political Science Review 97(2), 311–331.
  • Lim, C. S. H., J. M. Snyder Jr., and D. Strömberg (2015, October). The judge, the politician, and the press: Newspaper coverage and criminal sentencing across electoral systems. American Economic Journal: Applied Economics 7(4), 103–35.
  • Lucca, D. and Trebbi, F. (2009) Measuring Central Bank Communication: An Automated Approach with Application to FOMC Statements, NBER Working Papers 15367
  • McBride, L. and Nichols, A. (2016) Retooling poverty targeting using out-of-sample validation and machine learning, Policy Research Working Paper Series 7849, The World Bank.
  • Muco, A. (2018) Learn from thy neighbor: Do voters associate corruption with political parties?, mimeo
  • Nowak, A. and Smith, P. (2017). Textual Analysis in Real Estate, Journal of Applied Econometrics, 32(4), 896-918.
  • Quinn, K. M., B. L. Monroe, M. Colaresi, M. H. Crespin, and D. R. Radev (2010). How to analyze political attention with minimal assumptions and costs. American Journal of Political Science 54(1), 209–228.
  • Saiz, A. and Simonsohn, U. (2013) Proxying For Unobservable Variables With Internet Document-Frequency, Journal of the European Economic Association, European Economic Association, vol. 11(1), pages 137-165, February.
  • Stephens-Davidowitz, S. (2014). The cost of racial animus on a black candidate: Evidence using Google search data, Journal of Public Economics, vol. 118(C), pages 26-40.
  • Stock, J. H. and Trebbi, F. (2003) Who Invented Instrumental Variable Regression?, Journal of Economic Perspectives 17:177–194
  • Taddy, M. (2013), Multinomial inverse regression for text analysis, Journal of the American Statistical Association 108.
  • Tetlock, P. (2007). Giving Content to Investor Sentiment: The Role of Media in the Stock Market, Journal of Finance, vol. 62(3), pages 1139-1168, June.