A distributed architecture for large scale news and social media processing

Publisher
Inderscience Publishers
Copyright
Copyright © Inderscience Enterprises Ltd
ISSN
1476-1289
eISSN
1741-9212
DOI
10.1504/IJWET.2020.114029

Abstract

When designing a data processing and analytics pipeline for data streams, it is important to provide for the data load and to balance it successfully over the available resources. This is easier to achieve when small processing modules, which require limited resources, replace large monolithic processing software. In this work, we present the case of a social media and news analytics platform, called PaloAnalytics, which performs a series of content aggregation, information extraction (e.g., NER, sentiment tagging) and visualisation tasks on a large amount of data on a daily basis. We demonstrate the architecture of the platform, which relies on micro-modules and message-oriented middleware to deliver distributed content processing. Early results show that the proposed architecture can easily withstand the increased content load that occasionally occurs in social media (e.g., when a major event takes place) and quickly releases unused resources when the content load returns to its normal flow.
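
The pattern the abstract describes, small modules that consume from and publish to message queues, with the number of running workers following the backlog, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the middleware or the scaling policy, so Python's in-process `queue.Queue` stands in for the message broker, and `desired_workers`, `tag_sentiment` and `drain` are hypothetical names rather than parts of PaloAnalytics.

```python
import math
import queue


def desired_workers(backlog: int, per_worker: int = 100,
                    min_workers: int = 1, max_workers: int = 8) -> int:
    """Hypothetical scaling policy: one worker per `per_worker` queued
    messages, clamped to the range [min_workers, max_workers]."""
    needed = math.ceil(backlog / per_worker)
    return max(min_workers, min(max_workers, needed))


def tag_sentiment(message: dict) -> dict:
    """Toy micro-module: a keyword check standing in for a real tagger."""
    label = "positive" if "good" in message["text"].lower() else "neutral"
    return {**message, "sentiment": label}


def drain(inbox: queue.Queue, outbox: queue.Queue) -> int:
    """Consume every message currently in `inbox`, publish each tagged
    result to `outbox`, and return the number of messages processed."""
    processed = 0
    while True:
        try:
            message = inbox.get_nowait()
        except queue.Empty:
            return processed
        outbox.put(tag_sentiment(message))
        processed += 1


if __name__ == "__main__":
    inbox, outbox = queue.Queue(), queue.Queue()
    for i, text in enumerate(["Good news today", "Traffic update"]):
        inbox.put({"id": i, "text": text})

    # A burst of 5000 queued messages hits the cap; an empty queue
    # falls back to the minimum, releasing the extra workers.
    print(desired_workers(5000), desired_workers(0))
    print(drain(inbox, outbox), "messages tagged")
```

Because each module only reads from one queue and writes to another, workers can be added or removed without touching the rest of the pipeline; in a real deployment the worker count would be applied by whatever orchestrator runs the modules.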

Journal

International Journal of Web Engineering and Technology, Inderscience Publishers

Published: Jan 1, 2020
