This project performs topic analysis using
Latent Dirichlet Allocation (LDA).
The data (timestamp: ~2014-04-05 19:23 CET) is publicly available from
the American Civil Liberties Union (ACLU):
For the time being, the following 213 documents (PDF) are included:
So far, after converting the documents to raw text with pdftotext and
removing stop words, the following topics emerge:
rel usa secret top fvey data target comint si analytic
nsa br top metadata fisa number compliance court order analysts
nsa intelligence national information al security activities classified communications declaration
information court order nsa application authorized metadata records intelligence states
government information section collection intelligence security states united privacy data
games game gaming world virtual influence online trends fouo activities
si ts nf noforn top secret metadata ras nsa id
court judge access review committee issues report act rules house
ll ii infonnation telephone se data el en ed es
intelligence states united foreign communications person information general activities persons
The output shows ten topics with ten words each, one topic per line.
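
The pipeline described above (raw text, stop word removal, LDA with ten
topics and ten words per topic) could be sketched roughly as follows. This is
a minimal illustration, not the code that produced the topics listed here:
which LDA implementation was actually used is not stated, so the sketch
assumes collapsed Gibbs sampling written in plain Python. The stop word list,
the hyperparameters alpha and beta, the sample sentences and all function
names are hypothetical. It expects text that has already been converted with
pdftotext.

```python
import random
import re

# Hypothetical, deliberately tiny stop word list; a real run needs a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "by"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens of 2+ letters, drop stop words."""
    return [w for w in re.findall(r"[a-z]{2,}", text.lower())
            if w not in STOP_WORDS]


def lda_gibbs(docs, num_topics=10, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one word-count dict per topic.

    alpha and beta are assumed Dirichlet hyperparameters, not values
    taken from the original analysis.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]      # tokens of doc d in topic k
    topic_word = [dict() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                    # all tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                # Sample a new topic and put the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=10):
    """Return up to n most frequent words per topic, one list per topic."""
    return [
        [w for w, c in sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))[:n]
         if c > 0]
        for counts in topic_word
    ]


# Made-up placeholder sentences, NOT taken from the ACLU documents.
sample_texts = [
    "The court issued an order on metadata records",
    "Judge and court review the order",
    "Online games and virtual world gaming trends",
    "Gaming in a virtual world is an online game",
]
docs = [tokenize(t) for t in sample_texts]
counts = lda_gibbs(docs, num_topics=2, iterations=50)

# One line, one topic, as in the listing above.
for words in top_words(counts, n=5):
    print(" ".join(words))
```

For the real corpus one would call lda_gibbs with num_topics=10 and
top_words with n=10 on the tokenized pdftotext output. A pure-Python sampler
is slow on 213 documents, so an established LDA library is likely the more
practical choice there.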