Semantic text analysis tool

SeTA: supporting analysts by applying advanced text mining techniques to large document collections.

Author(s):
Acs, S | Arnes Novau, X | Hradec, J | Listorti, G | Macmillan, C | Ostlaender, N | Tomas, R
European Commission. Joint Research Centre
Series: JRC Technical Reports
Publisher: [Luxembourg]: Publications Office, 2019
Description: 47 p.: col. ill.
ISBN: 978-92-76-01518-5
ISSN: 1831-9424
Standardized series: JRC Technical Reports
Subject(s): Digital enabling technologies | information analysis | document search | documentation informatics | research report | reference work | information technology
Online resources: Access to the publication

Abstract: Much of the world's data is textual – in large document archives, in scientific papers, in scattered websites, in social media. The information contained in text is invaluable and yet hard to access. The sheer volume of text means that, unassisted, we cannot hope to read all available sources, nor even to keep up to date with all advances in a particular field. For example, EUR-Lex, the database of EU legal texts, grows by over 15 000 texts per year, while Scopus, a database of scientific papers, has over 70 million entries. The problems of scale are compounded by other challenges, such as the breadth of topics covered, the jargon specific to each field and the changes in the meanings of phrases over time. The mission of the JRC is to provide scientific support to policy development through original and applied research and knowledge management (JRC Strategy 2030). The challenges of accessing information "trapped in text" are very relevant to this mission, as timely, relevant information is needed at all stages of the policy development process. To help overcome the challenges posed by text, the JRC has produced a new tool, SeTA – Semantic Text Analyser – which applies advanced text analysis techniques to large document collections, helping policy analysts to understand the concepts expressed in thousands of documents and to see visually the relationships between these concepts and how they develop over time. A pilot version of this tool has been populated with hundreds of thousands of documents from EUR-Lex, the EU Bookshop and other sources, and used at the JRC in a number of policy-related use cases, including impact assessment, the analysis of large data infrastructures, agri-environment measures and natural disasters. This document describes the document collections used, the technical approach chosen and the key use cases.
Holdings: Item type: Reports; Current location: CDO Reports; Collection: Digital collection; free online access (PDF); Barcode: 1000020175336

Bibliography: p. 43.

Reuse is authorised provided the source is acknowledged. The reuse policy for European Commission documents was established by Decision 2011/833/EU (OJ L 330, 14.12.2011, p. 39). Any use or reproduction of photos or other material that is not under the copyright of the European Union requires permission from the copyright holders; European Union.
