This page presents a topic analysis of the INDECT documents, using Latent Dirichlet Allocation (LDA).


The data (timestamp: ~2014-04-05 21:23 CET) is publicly available from the INDECT website:

For the time being, the following 80 documents (PDF) are provided: (raw text)


So far, after converting the documents to raw text with pdftotext and removing stop words, the following topics emerge:

indect portal data system user web users services information www
indect block output key public function evaluation behavioural system project
video system data indect camera detection station event prototype communication
object detection indect event analysis events fig features objects final
methods indect data pro relation entity public www project consortium
indect project university security technology ethical issues www consortium agh
image results indect figure algorithm deliverable face detection images system
indect security wp data system information public ontology access management
information search system indect text pattern tool www pu enhanced
indect network protocol nodes node manet routing protocols analysis networks

The list shows ten topics with ten words each, one topic per line.
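
The steps described above (pdftotext conversion, stop word removal, LDA, ten words per topic) can be sketched roughly as follows. This is only a minimal illustration, not the code that produced the list: the page does not say which LDA implementation was used, so a small collapsed Gibbs sampler written with the Python standard library stands in for it. The stop word list, the hyperparameters alpha and beta, the iteration count and all function names are assumptions made for the example.

```python
import random
import re
import subprocess
from collections import Counter

# Tiny illustrative stop word list (assumption; the list actually used is not given).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with"}


def pdf_to_text(path):
    """Convert one PDF to raw text by calling the pdftotext CLI ('-' means stdout)."""
    result = subprocess.run(["pdftotext", path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def lda_gibbs(docs, num_topics=10, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                        # n(k)
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take the token's current assignment out of the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=10):
    """Return up to n most frequent words per topic, one list per topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]
```

With the 80 documents converted by pdf_to_text and passed through tokenize, a call such as top_words(lda_gibbs(docs, num_topics=10), n=10) would give ten lists of ten words, which can be printed one per line as above. The sampler is randomized, so the exact words depend on the seed and the stop word list, and a pure-Python sampler may be slow on a corpus of this size.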
