Triple Viz: A tool to explore document content from a graphical representation of subject-verb-object triples

Dhananjaya, Jahnavi (2016) Triple Viz: A tool to explore document content from a graphical representation of subject-verb-object triples. Masters thesis, Concordia University.

Text (application/pdf)
Dhananjaya_MCompSc_F2016.pdf - Accepted Version
Available under License Spectrum Terms of Access.
11MB

Abstract

Most available data is unstructured. Text mining is the process of automatically extracting information from text. This thesis combines text mining with visualization to develop TripleViz, a lightweight, web-based tool that processes and analyzes documents, extracts subject-verb-object (SVO) triples, and visualizes them as graphs. The extracted SVO triples are visualized using the open-source visualization tools Turtled and Gephi. To avoid overcrowding the screen, TripleViz extracts noun phrases and displays them in either full or head format. For the same reason, TripleViz offers an option to select only those triples that contain words of interest, which the user supplies as a word list. Within TripleViz, the user can also view color-coded output text that highlights words from a word list. The thesis also presents an experiment in classifying newspaper articles and blogs as either "specific event" or "generic", which shows a moderate improvement over a strong baseline.
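
The word-list selection and the full versus head display described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the `Triple` type, the function names, and the sample triples are hypothetical, and the "last token" head rule is a crude stand-in for whatever head-finding method TripleViz actually uses.

```python
from typing import Iterable, List, NamedTuple, Set


class Triple(NamedTuple):
    """One subject-verb-object triple; subject and obj hold full noun phrases."""
    subject: str
    verb: str
    obj: str


def head(phrase: str) -> str:
    # Assumption: approximate the head of an English noun phrase by its last token.
    tokens = phrase.split()
    return tokens[-1] if tokens else phrase


def filter_triples(triples: Iterable[Triple], word_list: Set[str]) -> List[Triple]:
    """Keep only triples that contain at least one word of interest (case-insensitive)."""
    wanted = {w.lower() for w in word_list}
    kept = []
    for t in triples:
        words = set(" ".join(t).lower().split())
        if words & wanted:
            kept.append(t)
    return kept


def to_head_format(t: Triple) -> Triple:
    """Shorten subject and object to their (approximate) heads to reduce clutter."""
    return Triple(head(t.subject), t.verb, head(t.obj))


# Hypothetical, hand-written triples; a real pipeline would obtain them from a parser.
triples = [
    Triple("the new vaccine", "causes", "mild fever"),
    Triple("the city council", "approved", "the annual budget"),
]

for t in filter_triples(triples, {"vaccine"}):
    print(t, "->", to_head_format(t))
```

Filtering before shortening keeps the match against the full noun phrases, so a word of interest that is not the head of its phrase still selects the triple.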

Divisions: Concordia University > Faculty of Engineering and Computer Science
Item Type: Thesis (Masters)
Authors: Dhananjaya, Jahnavi
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Date: 12 August 2016
Thesis Supervisor(s): Bergler, Sabine
Keywords: Text mining, TripleViz, SVO triples
ID Code: 981517
Deposited By: JAHNAVI DHANANJAYA
Deposited On: 08 Nov 2016 16:15
Last Modified: 08 Nov 2016 16:15

All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
