This is a topic analysis of the INDECT documents, using Latent Dirichlet Allocation (LDA).


The data (timestamp: \~ 2014-04-05 21:23 CET) is publicly available from the INDECT website:

For the time being the following 80 documents (PDF) are given: (raw text)


So far, after converting the documents to raw text with pdftotext and removing stop words, the following topics emerge:

indect portal data system user web users services information www<br />
indect block output key public function evaluation behavioural system project<br />
video system data indect camera detection station event prototype communication<br />
object detection indect event analysis events fig features objects final<br />
methods indect data pro relation entity public www project consortium<br />
indect project university security technology ethical issues www consortium agh<br />
image results indect figure algorithm deliverable face detection images system<br />
indect security wp data system information public ontology access management<br />
information search system indect text pattern tool www pu enhanced<br />
indect network protocol nodes node manet routing protocols analysis networks

Ten topics are shown, each represented by ten words. Each line is one topic.
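
The code behind this analysis is not shown here. As a rough sketch of the steps described above (pdftotext conversion, stop word removal, LDA with ten topics of ten words each), the pipeline could look like the following. It assumes a plain collapsed Gibbs sampler written with the Python standard library only. The function names (`pdf_to_text`, `tokenize`, `lda_gibbs`, `top_words`), the tiny stop word list, and the hyperparameters `alpha` and `beta` are illustrative assumptions, not the original setup.

```python
import random
import re
import subprocess
from collections import Counter

# Tiny illustrative stop word list (an assumption; the list actually used is not given).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def pdf_to_text(pdf_path):
    """Convert one PDF to raw text; "-" makes pdftotext write to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"], capture_output=True, text=True, check=True
    )
    return result.stdout


def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase, keep alphabetic tokens of 2+ letters, drop stop words."""
    return [w for w in re.findall(r"[a-z]{2,}", text.lower()) if w not in stop_words]


def lda_gibbs(docs, n_topics=10, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                        # tokens in topic k
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight of topic t: (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n_words=10):
    """Return up to n_words most frequent words per topic, one list per topic."""
    return [[w for w, c in counts.most_common(n_words) if c > 0] for counts in topic_word]
```

With the defaults, `top_words(lda_gibbs([tokenize(pdf_to_text(p)) for p in pdf_paths]))` would give ten word lists of up to ten words each, where `pdf_paths` is a hypothetical list of the downloaded PDF files. Joining each list with spaces gives one topic per line, in the same layout as above. The resulting topics depend on the random seed, the stop word list, and the hyperparameters, so a run of this sketch should not be expected to reproduce the topics listed here.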