Task

This page documents a topic analysis of the document set described below, using Latent Dirichlet Allocation (LDA).

Data

The data (snapshot taken around 2014-04-05 19:23 CET) is publicly available from the American Civil Liberties Union (ACLU):

https://www.aclu.org/nsa-documents-search

At present, the following 213 PDF documents are available:

http://paste.the-compiler.org/index.php/view/8526681e (raw text)

Results

So far, after converting the documents to raw text with pdftotext and removing stop words, the following topics emerge:

rel usa secret top fvey data target comint si analytic
nsa br top metadata fisa number compliance court order analysts
nsa intelligence national information al security activities classified communications declaration
information court order nsa application authorized metadata records intelligence states
government information section collection intelligence security states united privacy data
games game gaming world virtual influence online trends fouo activities
si ts nf noforn top secret metadata ras nsa id
court judge access review committee issues report act rules house
ll ii infonnation telephone se data el en ed es
intelligence states united foreign communications person information general activities persons

There are ten topics, each described by ten words, one topic per line.
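The page does not say which tool or inference method produced these topics, so the following is only a minimal sketch of how stop-word removal followed by LDA could look. It uses collapsed Gibbs sampling, one common way to fit LDA, which is not necessarily the method used here. The stop-word list, the sample texts, the function names and the values of alpha, beta and iterations are all assumptions made for illustration.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the list actually used is not given).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "by", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens of two or more letters, drop stop words."""
    return [w for w in re.findall(r"[a-z]{2,}", text.lower()) if w not in STOP_WORDS]


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one word Counter per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per (document, topic)
    topic_word = [Counter() for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                        # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        assignments.append(zs)
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


def top_words(topic_word, n):
    """Return the n most frequent words of each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Made-up sample texts standing in for the pdftotext output.
sample_texts = [
    "The court order authorized the collection of metadata records.",
    "The judge and the court review the application for an order.",
    "Online games and virtual world gaming trends.",
    "Gaming in a virtual world: games and online influence.",
]
sample_docs = [tokenize(t) for t in sample_texts]
sample_counts = lda_gibbs(sample_docs, num_topics=2)
sample_topics = top_words(sample_counts, n=5)
for words in sample_topics:
    print(" ".join(words))
```

Each printed line corresponds to one topic, in the same layout as the list above. On a corpus the size of the one described here, an established LDA implementation would likely be a more practical choice than this pure-Python sampler.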
