Peter Hafez, chief data scientist at RavenPack, knows about processing large volumes of unstructured data. In an effort to beat benchmarks, investment companies sometimes say they are looking at the entire Twitter dataset, known in the business as the "full firehose". In fact, few can manage the sheer scale and storage challenges that come with it, not to mention the costs. A hypothesis-driven attempt to do some of this manually is possible but challenging; for instance, you could start by searching the social media stream for specific hashtags.