Topic-Analysis-NSA-Archive
A topic analysis of NSA-related documents using Latent Dirichlet Allocation (LDA).
Data
The data (timestamp: \~ 2014-04-05 19:23 CET) is publicly available from the American Civil Liberties Union (ACLU):
https://www.aclu.org/nsa-documents-search
For the time being, the following 213 documents (PDF) are included:
http://paste.the-compiler.org/index.php/view/8526681e (raw text)
Results
So far, after converting the documents to raw text with pdftotext and removing stop words, the following topics emerge:
rel usa secret top fvey data target comint si analytic<br />
nsa br top metadata fisa number compliance court order analysts<br />
nsa intelligence national information al security activities classified communications declaration<br />
information court order nsa application authorized metadata records intelligence states<br />
government information section collection intelligence security states united privacy data<br />
games game gaming world virtual influence online trends fouo activities<br />
si ts nf noforn top secret metadata ras nsa id<br />
court judge access review committee issues report act rules house<br />
ll ii infonnation telephone se data el en ed es<br />
intelligence states united foreign communications person information general activities persons
Ten topics are shown, with ten words per topic. Each line is one topic.
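
The steps described above (pdftotext conversion, stop word removal, LDA with ten topics and ten words per topic) could be sketched roughly as follows. This is not the code that produced the results above, only a minimal illustration. It assumes collapsed Gibbs sampling as the inference method, a tiny placeholder stop word list, and made-up hyperparameters (`alpha`, `beta`, `n_iter`). The function names `pdf_to_text`, `tokenize`, `lda_gibbs`, and `top_words` are hypothetical.

```python
import random
import re
import subprocess

# Placeholder stop word list (an assumption; the list actually used is not given).
STOP_WORDS = {"the", "of", "and", "to", "in", "a", "is", "for", "on", "by", "that", "with"}


def pdf_to_text(path):
    """Convert one PDF to raw text; requires the pdftotext CLI to be installed."""
    result = subprocess.run(["pdftotext", path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def tokenize(text):
    """Lowercase, keep alphabetic tokens of 2+ letters, drop stop words."""
    return [w for w in re.findall(r"[a-z]{2,}", text.lower()) if w not in STOP_WORDS]


def lda_gibbs(docs, n_topics=10, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, topic_word counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]              # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                            # tokens per topic
    assignments = []

    # Assign a random initial topic to every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    return vocab, topic_word


def top_words(vocab, topic_word, n=10):
    """Return the n most frequent words of each topic, one list per topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]
```

With the converted documents at hand, something like `docs = [tokenize(pdf_to_text(p)) for p in pdf_paths]` followed by `top_words(*lda_gibbs(docs))` would print one word list per topic, where `pdf_paths` is a hypothetical list of local PDF file names. A pure-Python sampler like this is slow on 213 documents; an optimized LDA library would likely be the practical choice.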