
The internet – a wealth of data

How IGB researchers harness the online world to generate research findings
Researchers need data to generate scientific findings. Two new research methods – culturomics and iEcology – use the internet for this purpose. These approaches offer many opportunities, especially for the exploration of aquatic habitats.

For example, IGB researchers use social media data to draw conclusions about the use of freshwaters. | Photo: shutterstock/Dmytrenko Vlad

Enormous quantities of photos, videos and texts of all kinds are posted on the internet every day. On YouTube alone, users upload 500 hours of video material per minute; the English version of Wikipedia now contains in excess of 6,000,000 articles. For a while now, scientists have also been taking advantage of big data on the internet: the term culturomics first appeared in an article on digitised books published in Science in 2010; in the past five years, the new line of research has also gained in importance in biodiversity research.

“Culturomics involves analysing how humans react to the environment, whereas iEcology focuses on the nature side of data. For example, we search for indications of how populations of certain species are developing or how ecological states are changing,” explained Gregor Kalinkat, a postdoctoral fellow belonging to the Light Pollution and Ecophysiology research group.

The researcher has already used the new methods in a number of studies. For one study, the results of which were published in summer 2020, a team led by former IGB researcher Ivan Jarić, from the Institute of Hydrobiology of the Academy of Sciences of the Czech Republic, analysed German, British and French websites that reported on species on the Red List of Threatened Species. “We were interested in finding out which threat factors the reports focused on, and particularly the importance assigned to invasive alien species,” stated Ivan Jarić. The most frequently mentioned threat factor was climate change, whereas reports on the role of invasive alien species were rare. “It had already been assumed that much more is known about the impact of climate change on species loss, resulting in more frequent reports. But our analysis has now enabled us to prove this in a simple, quick and cost-effective way,” Gregor Kalinkat remarked.

Key areas are species monitoring, ecosystem status and anthropogenic impacts

The example highlights the potential of culturomics and iEcology. Gregor Kalinkat and Ivan Jarić are convinced that the new, internet-based methods offer many possibilities to investigate aquatic habitats. Together with other researchers, the two scientists have written an overview study in which they identify key areas where culturomics and iEcology can provide particularly valuable insights. These include the distribution of threatened, rare and alien species, ecosystem status and anthropogenic impacts. Gregor Kalinkat sees great potential in the area of monitoring: “We envisage automated species recognition to analyse background information in digital data, such as species captured unintentionally in the background of photos and videos. That would make it much easier to monitor less conspicuous elements of biodiversity, such as vegetation,” he remarked.

The study, published in late October 2020, also identifies some of the problems associated with the new, internet-based methods. For example, data become sparser as distance from shore and water depth increase. In addition, social media users represent specific groups of the population, meaning that material posted online by tourists may contradict assessments and behaviours of local residents. One of the main problems is significant bias in the selection of species: while there are countless videos and photos of birds, amphibians and mammals, little material exists on fish and invertebrates. Data retrieval also poses difficulties. With commercial platforms such as Twitter, Google and Facebook, researchers use an interface to download the required data. “If these services change their interface, it causes us problems. Altered algorithms make temporal analysis difficult because data collected before and after the change can only be compared to a certain extent,” stated Ivan Jarić.

A comparison with offline data helps to validate the results

Simone Podschun, Project Coordinator of AQUATAG, has also encountered this problem. The aim of this project is to identify when and where particularly intensive use is made of freshwaters for recreational activities and to find out how recreational use can be managed more efficiently. The team uses social media data to determine visitor numbers – and it is repeatedly the case that code developed to retrieve data no longer works or that the overall data structure has changed. The researchers analyse data from platforms such as Twitter and Strava, which is an app for runners, cyclists and water sports enthusiasts. “We are particularly interested in geo-referenced tweets, which enable us to see when and where they were posted. Data from Strava are a good addition because they provide us with extra information such as distances, times and the types of activities as well as the number of athletes in a specific area,” stated Simone Podschun.

Since the number of social media users has increased, previous years can only be compared with caution: “More and more people use fitness trackers and post their data online,” commented the biologist and expert in geographic information systems. For this reason, the team led by Markus Venohr pays particular attention to relative patterns – how many people come when it is warm, and how many when temperatures drop? The benefits of using social media data are very clear to Simone Podschun: “On-site counts are tremendously time-consuming and only provide a snapshot of reality. In contrast, online services provide us with information almost in real time: we were immediately able to see that the coronavirus led to an increase in the number of leisure activities of locals in the Spree-Havel area,” she stated. Nonetheless, “we are aware that social-media data are prone to outliers, e.g. when a marathon is held or in areas with no cell phone connection”. The researchers do not rely solely on data available on Twitter and Strava. The AQUATAG team also compares the numbers of tweets from bathing lakes around Berlin with information provided by Berliner Bäder-Betriebe on the use of lidos. And the numbers of cycling activities are matched with electronic counts introduced by the city of Berlin for cyclists, the researcher reported.
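How such a comparison between online and offline counts might look in practice can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the article does not say which statistics the AQUATAG team uses, all numbers are invented, and the function names `pearson` and `shares` are hypothetical. It converts absolute counts into shares of a season total, so that years with different numbers of users become comparable, and checks how closely weekly tweet counts track weekly lido visits.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def shares(counts):
    """Express each count as a share of the total for the whole period."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot compute shares of an all-zero series")
    return [c / total for c in counts]


# Hypothetical weekly numbers for one bathing season (invented for illustration).
tweets_per_week = [12, 30, 45, 80, 60, 20]
lido_visitors_per_week = [900, 2100, 3300, 6100, 4300, 1500]

# Validation: do the online counts rise and fall with the offline counts?
r = pearson(tweets_per_week, lido_visitors_per_week)
print(f"Correlation between tweets and lido visits: {r:.2f}")

# Relative view: which share of the season's activity falls in each week?
print([round(s, 2) for s in shares(tweets_per_week)])
```

A high correlation would suggest that tweets follow the same ups and downs as the offline counts, even though they capture only a fraction of all visitors. Outliers such as a marathon weekend would still need to be checked separately.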

Using YouTube to detect species distribution or to gain knowledge about recreational fishing

Two recent studies conducted by former IGB researcher Valerio Sbragaglia in collaboration with the fisheries professor Robert Arlinghaus from IGB show how video analysis provides valuable insights. Together with other researchers, the behavioural ecologist analysed YouTube videos that had been posted online by recreational anglers. One of the studies involved comparing recreational anglers and spearfishers. To this end, the team scrutinised videos of Italian recreational fishers who had caught an iconic Mediterranean fish species, the common dentex. The researchers were interested in how the size of the fish caught correlates with YouTube users’ social feedback. “To do this, we searched YouTube for videos that were appropriate for our research question, and then analysed the metadata, i.e. details such as the title and the description of the videos. We also looked to see how many likes, views and comments the videos received,” reported the researcher, who completed a Leibniz-DAAD postdoc fellowship in the group of Robert Arlinghaus in 2017, and now works for the Spanish National Research Council’s Institute of Marine Sciences. The search was performed by a script developed in R (a free programming language), which simplified the process enormously, given that more than 20,000 videos had to be analysed.
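The question of how fish size relates to social feedback can be illustrated with a rank correlation. The following sketch is written in Python rather than R and is not the team's script: the article does not state which statistical method was used, the video records are invented, and the field names `fish_weight_kg` and `likes` as well as the functions `average_ranks` and `spearman` are hypothetical. It assumes the metadata have already been downloaded and the fish size has been extracted for each video.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Hypothetical video records (invented for illustration only).
videos = [
    {"fish_weight_kg": 1.2, "likes": 15},
    {"fish_weight_kg": 3.5, "likes": 40},
    {"fish_weight_kg": 2.0, "likes": 22},
    {"fish_weight_kg": 6.8, "likes": 310},
    {"fish_weight_kg": 4.1, "likes": 35},
]

rho = spearman(
    [v["fish_weight_kg"] for v in videos],
    [v["likes"] for v in videos],
)
print(f"Rank correlation between fish weight and likes: {rho:.2f}")
```

A rank-based measure is used here because likes and views tend to be highly skewed, with a few videos attracting most of the attention. Ranks make the result less sensitive to such extreme values.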

The second study focused on the macroecological patterns in groupers, with a specific focus on the dusky grouper and the white grouper. In this case, too, the team led by Valerio Sbragaglia analysed a large volume of videos published by Italian recreational fishers on YouTube. “To ensure the correct classification of species, we also watched some of the videos for this study, but we are now trying to automate this process as well,” the researcher explained. The team discovered that the body mass of dusky grouper is often higher at greater depths, but this pattern does not seem to be related exclusively to exploitation, adding a new perspective to a controversial discussion in fishery science. In addition, the researchers were able to use the digital information extracted from YouTube to quantitatively measure the recent northward expansion of the white grouper in the Mediterranean. The analysis therefore provided valuable insights – without the researchers actually setting eyes on a single fish.

Gregor Kalinkat is convinced that these new methods are here to stay: “The potential of culturomics and iEcology is growing so rapidly that current problems will become less important,” the researcher commented. However, it will not be possible to completely do away with traditional research methods: sampling cannot be replaced by photos of anglers which are analysed to learn more about the state of a lake.

Text: Wiebke Peters

Selected publications
November 2020

Expanding conservation culturomics and iEcology from terrestrial to aquatic realms

Ivan Jarić; Uri Roll; Robert Arlinghaus; Jonathan Belmaker; Yan Chen; Victor China; Karel Douda; Franz Essl; Sonja C. Jähnig; Jonathan M. Jeschke; Gregor Kalinkat; Lukáš Kalous; Richard Ladle; Robert J. Lennox; Rui Rosa; Valerio Sbragaglia; Kate Sherren; Marek Šmejkal; Andrea Soriano-Redondo; Allan T. Souza; Christian Wolter; Ricardo A. Correia
PLoS Biology. - 18(2020)10, e3000935
Contact person

Markus Venohr

Programme Area Speaker
Research group
River System Modelling

Robert Arlinghaus

Research Group Leader
Research group
Integrative Recreational Fisheries Management