Tech Radar, our Web Crawler
Our Web Crawler to scan the news in the IT world
Tech Radar finds the newest trends in Information Technology by means of a Web Crawler. With it, Spindox Labs meets the client's need to scan the web for up-to-date news about the IT world. Tech Radar analyses online editorial content according to the client's interests, using Augmented Intelligence tools. Content is indexed with classification techniques from the natural language (NL) area. The information, presented through a graphical interface, comes from a large number of data sources.
Our client had one need: a tool that scans the web and searches for the latest trends in the Information Technology market. To build our Web Crawler, we selected multiple data sources that meet standards of reliability, accessibility and completeness, and combined them in a unified solution.
- StackOverflow is widely relied on by developers and is used to extract information about technologies and their adoption trends in the IT landscape.
- StackShare is a search engine for exploring the technology stacks used by major global players.
- Feedly is a news aggregator that allows subscription to personalised feeds by thematic area. It provides referenced feedback on the level of interest in each news item.
Several algorithms were likewise combined for the data analysis:
- NL classification techniques, to codify the various technologies.
- Analysis of the strength of connection between technological tags, based on how often the tags occur together, so that each technology is associated with one or more application areas.
- Classification of the technologies by maturity, based on indicators such as date of appearance, peaks of usage, percentage decline and relative time.
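The tag co-occurrence step described above can be illustrated with a minimal Python sketch. It is not the Tech Radar implementation: the function names, the sample tags, the set of area tags and the `min_count` threshold are all assumptions made for illustration. The sketch counts how often two tags appear on the same item, then links each technology tag to the area tags it appears with often enough.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_items):
    """Count how often each unordered pair of tags appears on the same item."""
    pairs = Counter()
    for tags in tagged_items:
        # Deduplicate and sort so every pair has one canonical key.
        unique = sorted(set(tags))
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


def assign_areas(pairs, area_tags, min_count=2):
    """Link each technology tag to the area tags it co-occurs with
    at least min_count times (the threshold is an assumed parameter)."""
    areas = {}
    for (a, b), n in pairs.items():
        if n < min_count:
            continue
        if a in area_tags and b not in area_tags:
            areas.setdefault(b, set()).add(a)
        elif b in area_tags and a not in area_tags:
            areas.setdefault(a, set()).add(b)
    return areas


# Hypothetical tagged items (for example, the tags attached to questions or articles).
items = [
    ["spark", "big-data", "scala"],
    ["spark", "big-data"],
    ["mqtt", "iot"],
    ["mqtt", "iot", "big-data"],
    ["scala", "big-data"],
]

pairs = cooccurrence_counts(items)
areas = assign_areas(pairs, area_tags={"big-data", "iot"})
print(areas)
```

With this sample data, "mqtt" is linked to "iot" only: it appears with "big-data" once, which is below the assumed threshold. A technology can still end up in several areas when it co-occurs often with more than one area tag.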
Tech Radar is useful because it is a personalised system: it automatically searches for specific information in a vast amount of data and presents the results in a graphically structured and intuitive way. In particular, the content selected by the Web Crawler is arranged in two views:
- A news page, indexed by thematic area and filtered by interest score, which is based on the engagement rate recorded on social media. Just like a specialised newspaper, the news page offers titles and summaries, selected according to the search criteria set by the user.
- A technological radar for mapping, where the slices represent the thematic areas (NL, IOT, Big Data, Augmented Intelligence) and the coloured patterns indicate the kind of technology (library, framework, language). The distance from the centre of the radar indicates the maturity level of each technology.
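The radar layout described above can be sketched as a conversion from thematic area and maturity to a point on the chart. This is only an illustration under stated assumptions: the function name `radar_position`, the maturity score normalised to the range 0 to 1, and the choice to place mature technologies near the centre are all hypothetical. The text says only that distance from the centre reflects maturity, not in which direction.

```python
import math

# Thematic areas named in the text; each one gets an equal slice of the radar.
AREAS = ["NL", "IOT", "Big Data", "Augmented Intelligence"]


def radar_position(area, maturity, areas=AREAS, radius=1.0, mature_at_centre=True):
    """Return (x, y) coordinates for a technology on the radar.

    maturity is an assumed score in [0, 1]. Whether mature technologies sit
    near the centre or near the edge is an assumption, controlled by
    mature_at_centre.
    """
    if not 0.0 <= maturity <= 1.0:
        raise ValueError("maturity must be between 0 and 1")
    slice_width = 2 * math.pi / len(areas)
    # Place the point on the bisector of the slice for its thematic area.
    angle = (areas.index(area) + 0.5) * slice_width
    distance = radius * ((1.0 - maturity) if mature_at_centre else maturity)
    return (distance * math.cos(angle), distance * math.sin(angle))


x, y = radar_position("Big Data", maturity=0.5)
print(round(math.hypot(x, y), 3))
```

A real chart would also spread technologies within a slice so that labels do not overlap, and would encode the kind of technology with the marker colour or pattern.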
Finally, the information is validated by an analyst who, through a graphical interface, verifies the results and sends feedback on how to optimise them.