TY  - BOOK
AU  - Acs, S.
AU  - Arnes Novau, X.
AU  - Hradec, J.
AU  - Listorti, G.
AU  - Macmillan, C.
AU  - Ostlaender, N.
AU  - Tomas, R.
ED  - European Commission
TI  - Semantic text analysis tool: SeTA: supporting analysts by applying advanced text mining techniques to large document collections
T2  - JRC Technical Reports
SN  - 978-92-76-01518-5
SN  - 1831-9424
PY  - 2019///
CY  - [Luxembourg]
PB  - Publications Office
KW  - digital enabling technologies
KW  - information analysis
KW  - document retrieval
KW  - documentary computing
KW  - research report
KW  - reference work
KW  - information technology
N1  - Bibliography: p. 43
N2  - Much of the world's data is textual – in large document archives, in scientific papers, in scattered websites, in social media. The information contained in text is invaluable and yet hard to access. The sheer volume of text means that, unassisted, we cannot hope to read all available sources, nor even to keep up to date with all advances in a particular field. For example, EUR-Lex, the database of EU legal texts, grows by over 15 000 texts per year, while Scopus, a database of scientific papers, has over 70 million entries. The problems of scale are compounded by other challenges such as the breadth of topics covered, the jargon specific to each field and the changes in the meanings of phrases over time. The mission of the JRC is to provide scientific support to policy development, through original and applied research and knowledge management (JRC Strategy 2030). The challenges of accessing information "trapped in text" are highly relevant to this mission, as timely, relevant information is needed at all stages of the policy development process.
To help overcome the challenges posed by text, the JRC has produced a new tool, SeTA – Semantic Text Analyser – which applies advanced text analysis techniques to large document collections, helping policy analysts to understand the concepts expressed in thousands of documents and to see, in a visual manner, the relationships between these concepts and their development over time. A pilot version of this tool has been populated with hundreds of thousands of documents from EUR-Lex, the EU Bookshop and other sources, and used at the JRC in a number of policy-related use cases including impact assessment, the analysis of large data infrastructures, agri-environment measures and natural disasters. The document collections used, the technical approach chosen and the key use cases are described in this document.
UR  - http://publications.europa.eu/publication/manifestation_identifier/PUB_KJNA29708ENN
UR  - https://data.europa.eu/doi/10.2760/577814
ER  -