This is a topic analysis of the INDECT documents, using Latent Dirichlet Allocation (LDA).
The data (timestamp: \~ 2014-04-05 21:23 CET) is publicly available from the INDECT website:
For the time being, the following 80 PDF documents are included:
So far, after converting the documents to raw text with pdftotext and removing stop words, the following topics emerge:
indect portal data system user web users services information www<br />
indect block output key public function evaluation behavioural system project<br />
video system data indect camera detection station event prototype communication<br />
object detection indect event analysis events fig features objects final<br />
methods indect data pro relation entity public www project consortium<br />
indect project university security technology ethical issues www consortium agh<br />
image results indect figure algorithm deliverable face detection images system<br />
indect security wp data system information public ontology access management<br />
information search system indect text pattern tool www pu enhanced<br />
indect network protocol nodes node manet routing protocols analysis networks
There are ten topics, each shown by its ten top words. Each line is one topic.
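
The pipeline described above (pdftotext, stop word removal, LDA, top words per topic) can be sketched roughly as follows. This is only a minimal illustration and not necessarily the tool that produced the lists above: the stop word list, the hyperparameters `alpha` and `beta`, the iteration count, and all function names are assumptions, and the LDA step is a plain collapsed Gibbs sampler written with the standard library. The toy input strings are made up for the demo; for the real data one would call `pdf_to_text` on each PDF.

```python
import random
import re
import subprocess
from collections import Counter

# Tiny illustrative stop word list (assumption; the list actually used is not given).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}

def pdf_to_text(path):
    """Convert one PDF to raw text with the pdftotext CLI ('-' writes to stdout)."""
    result = subprocess.run(["pdftotext", path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]        # topic counts per document
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # token counts per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

def top_words(topic_word, n_words=10):
    """Most frequent words per topic, skipping words whose count dropped to zero."""
    return [[w for w, c in counts.most_common(n_words) if c > 0]
            for counts in topic_word]

# Toy demo with made-up strings; real input would be pdf_to_text(path) per document.
raw_texts = [
    "The camera and the video detection system",
    "Routing protocol for the nodes in a network",
    "Video camera detection of an event",
    "Network nodes and routing protocols",
]
docs = [tokenize(t) for t in raw_texts]
topics = top_words(lda_gibbs(docs, n_topics=2), n_words=5)
for words in topics:
    print(" ".join(words))  # one line, one topic
```

With `n_topics=10` and `n_words=10` the output has the same shape as the lists above: ten lines of ten words each. For a corpus of 80 full documents, a library implementation of LDA would be considerably faster than this sketch.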