Right now, as you’ve probably figured out, huge amounts of data are being collected about every conceivable topic. From interactions between people to scientific data, everyone wants to track something. Organizations, researchers, companies, and governments are creating new tools to meet their data needs. While much of the information collected is proprietary or private, many teams make their results openly available.
The open data movement is calling for more of this accessible information. You may have already seen progress from sources like the US government (www.data.gov) and the UK government (www.data.gov.uk). Across the web there are many open data sets that you can browse for your own purposes. One especially large and powerful example is GDELT.
The Global Database of Events, Language, and Tone (GDELT) is an open project that bills itself as a “global database of society.” It consists of a system of datasets that record broadcast, print, and online news. GDELT uses data mining and natural language processing algorithms to code each article based on a variety of factors, including themes, sources, locations, and emotions. The data is accessible to anyone who wants to use it, and scores of people, teams, and organizations have already used it to create some amazing projects. This freely available source of big data can be a game-changer for mission success.
The most common uses of the GDELT data are mapping, event analysis, and media studies. The data is invaluable for creating powerful visualizations and for drilling down into further analysis. It can also be applied to create one-off snapshots as well as real-time, ongoing updates.
The GDELT Project, led by Kalev Leetaru, has partnered with Google, which gives advanced users immense analytical power. Average users can also export a variety of data through searches. The database has incredible reach and breadth for use in a variety of applications: the system monitors sources in 65 languages, its event records date back to 1979, and new data is added every 15 minutes.
The original data came in two flavors: the Event Database and the Global Knowledge Graph (GKG). In January 2016, the project introduced the Visual Global Knowledge Graph (VGKG), which covers millions of images. The project also maintains special collections of the GKG.
The easiest way to access the database is through the GDELT Analysis Service. Anyone can enter keywords describing the data they’re looking for and immediately export the results. Export options include spreadsheets, timelines, maps, word clouds, and graphs. The spreadsheets are the most advanced exports and the best option for drilling down, but you will need to review the documentation to make the most of them. If you have programming resources at your disposal, you can use Google BigQuery or download the entire dataset for more advanced querying and analysis.
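If you take the download route, a small script can tell you which files are the newest before you fetch anything. The sketch below assumes that GDELT 2.0 publishes an index of its latest 15-minute update files at http://data.gdeltproject.org/gdeltv2/lastupdate.txt, with each line holding a file size, an MD5 checksum, and a download URL separated by spaces. The names UpdateFile, parse_last_update, and fetch_last_update are hypothetical helpers for illustration, not part of any GDELT tool.

```python
import urllib.request
from typing import List, NamedTuple

# Assumed location of the GDELT 2.0 index of the most recent 15-minute update files.
LAST_UPDATE_URL = "http://data.gdeltproject.org/gdeltv2/lastupdate.txt"


class UpdateFile(NamedTuple):
    """One entry from the update index (hypothetical helper type)."""
    size: int  # file size in bytes
    md5: str   # MD5 checksum of the zipped file
    url: str   # download URL of the zipped CSV file


def parse_last_update(text: str) -> List[UpdateFile]:
    """Parse lines assumed to look like '<size> <md5> <url>' into UpdateFile records."""
    files = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        size, md5, url = parts
        files.append(UpdateFile(int(size), md5, url))
    return files


def fetch_last_update(url: str = LAST_UPDATE_URL) -> List[UpdateFile]:
    """Download the update index and parse it (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_last_update(resp.read().decode("utf-8"))
```

Calling fetch_last_update() needs network access, while parse_last_update() works on any text in the assumed format, so you can check it offline. Keeping the parser separate from the download also makes it easy to verify each file's checksum before unzipping it.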
Have a look at this powerful tool to see if it’s right for your project. If you’re interested in learning more about GDELT, check out the links or post a comment!
Originally published on April 4, 2017.