Courtesy of Khali Dalbaih

A $517,000 grant was recently awarded to UCR and several collaborating universities to fund a database that will record social media as a kind of historical record. The system, named “Documenting the Now: Support Scholarly Use and Preservation of Social Media Content,” or DocNow for short, will build a chronology of major events as described by the many users of social media, especially Twitter.

This database would provide a valuable resource to scholars in many fields who are looking for primary sources on major historical events. Given the highly personal nature of social media, DocNow would offer its users a wide range of perspectives on the same event. It would also be useful in real time: as events unfold and are added to DocNow, they become available for nearly instantaneous analysis by researchers.

An individual tweet, to be sure, is probably of little significance to any researcher. What matters are the larger trends, and it is in these that DocNow has incredible potential. Because DocNow would hold millions of tweets, scholars can view the mass of data with a statistician’s eye; that is, they can use it to draw conclusions about what people thought and did at a given point in time. For example, in reviewing tweets about an event such as the Paris attacks, a researcher could conclude that a single tweet from someone who finds the tragedy hilarious is probably an outlier when the larger trend is toward sympathy and horror.

All this said, DocNow’s usefulness has limits. For starters, not every tweet is original. Research indicates that about 25 percent of all tweets are retweets, which do not represent the personal views of the user to the degree that an original post might. Second, not every tweet is accurate, factual or sincere. Common sense tells us that not everything on the Internet is true, especially when users are not fully informed about or knowledgeable in a given topic.

Third, and most importantly, even after weeding out the copies, the lies and the blatantly (and often comically) wrong posts, there remains an enormous amount of fluff that cannot possibly be of use to a scholar, current or future. A tweet from a college student saying “I’m eating breakfast” is unlikely to hold any value for a historian. Thus, a major obstacle to assembling a useful database of tweets and other social media is devising a way to filter out the meaningless chatter that likely makes up a significant share of the total data.

It is also important to consider that even these larger social media trends should not necessarily be central to any future scholar’s work on a subject. Granted, primary sources have their place in scholarship, but they cannot replace numerical data, verified facts and the writing of experts in their fields. At most, the information in DocNow can supplement research; it cannot form the backbone of the work.

Several methods for filtering the tweets could be used, each with its own benefits and drawbacks. One would be a kind of censor, in which tweets containing specific words that signal topics of little or no historical value are deleted. This is not to say that whole lines of thought could or should be removed; rather, anything pertaining to frivolous subjects (e.g., “Kardashian,” “ice bucket challenge” or similar topics), which would only be a hassle for researchers, needs to be eliminated from any database that aims to record history. Put simply, if it is not fit to be printed in a history book, it should not be on DocNow.

Another option for cutting down the mass of tweets would be to search for tweets about specific historically significant events and add these to the database first. Since the tweets that are to become part of DocNow are already stored in the cloud, it is simply a matter of focusing on one event at a time and adding large batches of filtered tweets to the database. This way, major events such as the Paris attacks, the Black Lives Matter movement and the current presidential campaign are recorded first, followed by less pressing matters, while new tweets on existing topics are added in real time if they pass the filter.

While such a database could be of value to students of history, it is also highly relevant to the general public and therefore should not be limited to being an academic tool. Any citizen with an interest in history should be able to read about it through the eyes of other people. On their own, however, individual users can view only so many tweets from a small number of sources. Thus, if DocNow is made publicly accessible when completed, it can become a tool for broadening everyone’s perspective, not just that of the subset of the population using it for academic purposes.

DocNow, if it lives up to its potential, could offer a unique and powerful way of studying history, one whose value to humanity would be obvious.