Abstract— In this paper, we propose a new continuous summarization framework called Sumblr to address this problem. Unlike traditional document summarization methods, which focus on small-scale, static data sets, Sumblr is designed to work with dynamic, fast-arriving, large-scale tweet streams. Our proposed framework consists of three main components. First, we propose an online tweet stream clustering algorithm that clusters tweets and maintains distilled cluster statistics in a data structure called the tweet cluster vector (TCV). Second, we develop a TCV-Rank summarization technique for generating online summaries and historical summaries over arbitrary time periods. Third, we design an effective topic evolution detection method, which monitors summary-based and volume-based variations to automatically produce timelines from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our framework.
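To make the first component concrete, the following is a minimal sketch of single-pass clustering that keeps only summary statistics per cluster. It is not the Sumblr algorithm itself: the `ClusterStats` class is a simplified stand-in for the TCV, and its fields, the whitespace tokenizer, the cosine similarity measure, and the `threshold` value are all assumptions made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def unit_vector(text):
    """Bag-of-words term-frequency vector scaled to unit length."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()}


@dataclass
class ClusterStats:
    """Simplified stand-in for a tweet cluster vector (fields are assumptions)."""
    vec_sum: Counter = field(default_factory=Counter)  # sum of member unit vectors
    n: int = 0           # number of tweets absorbed so far
    ts_sum: float = 0.0  # sum of member timestamps

    def add(self, vec, ts):
        """Fold one tweet into the statistics; the tweet itself is not stored."""
        for term, weight in vec.items():
            self.vec_sum[term] += weight
        self.n += 1
        self.ts_sum += ts

    def similarity(self, vec):
        """Cosine similarity between a unit vector and the cluster's summed vector."""
        norm = math.sqrt(sum(w * w for w in self.vec_sum.values()))
        if norm == 0:
            return 0.0
        dot = sum(w * self.vec_sum.get(term, 0.0) for term, w in vec.items())
        return dot / norm


def cluster_stream(tweets, threshold=0.5):
    """Single pass: each (timestamp, text) joins its most similar cluster or starts a new one."""
    clusters = []
    for ts, text in tweets:
        vec = unit_vector(text)
        best = max(clusters, key=lambda c: c.similarity(vec), default=None)
        if best is None or best.similarity(vec) < threshold:
            best = ClusterStats()  # hypothetical threshold decides when to open a cluster
            clusters.append(best)
        best.add(vec, ts)
    return clusters
```

Because each tweet is folded into running sums and then discarded, memory grows with the number of clusters rather than the number of tweets, which is the property that makes a one-pass design workable on a stream.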
In this paper, we introduce a new summarization framework called Sumblr (continuouS sUMmarization By stream cLusteRing). The framework consists of three main components: the tweet stream clustering module, the high-level summarization module, and the timeline generation module. In the tweet stream clustering module, we design an efficient tweet stream clustering algorithm, an online algorithm that groups tweets with a single pass over the data. The high-level summarization module supports the generation of two types of summaries: online and historical. The core of the timeline generation module is a topic evolution detection algorithm, which consumes online/historical summaries to produce real-time/range timelines. The algorithm monitors the amount of variation during stream processing.
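The idea of monitoring variation to place timeline nodes can be illustrated with a small sketch. It covers only a volume-style signal and assumes, purely for illustration, that variation is the relative increase in tweet count between consecutive fixed-width windows; the function name `volume_timeline`, the `window` width, and the `threshold` are hypothetical, and summary-based variation is not modeled here.

```python
from collections import Counter


def volume_timeline(timestamps, window=60, threshold=1.0):
    """Return start times of windows whose tweet count rose sharply over the previous window."""
    # Count tweets per fixed-width window, keyed by window index.
    counts = Counter(int(ts // window) for ts in timestamps)
    if not counts:
        return []
    nodes = []
    first, last = min(counts), max(counts)
    for w in range(first + 1, last + 1):
        prev, cur = counts[w - 1], counts[w]
        # Relative increase; an empty previous window is treated as 1 to avoid dividing by zero.
        change = (cur - prev) / max(prev, 1)
        if change >= threshold:
            nodes.append(w * window)
    return nodes
```

This version takes all timestamps at once for brevity. A streaming variant would only need to retain the counts of the current and previous windows, so the check can run while the stream is being processed.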