Erin & Anne present CATPAC & WordStat
+
CATPAC
&
WordStat
Anne D. Sito
&
Erin Sonenstein
COM 633: FA 09
+
CATPAC
+
Overview of CATPAC
Designed to recognize frequently used words in text
Identifies and groups patterns of similar words
Provides output from clustering algorithms, perceptual maps, and interactive clustering
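As a rough illustration of the first point, frequency counting on a plain-text input can be sketched in a few lines of Python. This is not CATPAC's own code; the function name `word_frequencies`, the tokenizing rule, and the sample sentence are all assumptions made up for this example.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Return the top_n most frequent lowercase words in text (hypothetical helper)."""
    # Treat runs of letters, optionally joined by apostrophes, as words,
    # so that "you'll" stays one token.
    words = re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())
    return Counter(words).most_common(top_n)

# Invented sample text, not taken from the presenters' data.
sample = "You will love it, and you'll see that you and the cat agree."
print(word_frequencies(sample, top_n=2))
```

Note that common function words rise to the top unless an exclusion list is applied first.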
+
Data Preparation: Text
+
1. Convert document into a .txt file
+
2. Inputting Data
+
3. Select Text File You Want to Analyze
+
4. Select “Make Dendrogram”
+
5. Initial Output Screen
+
6. Output Data Screen
+
7. Output: Dendrogram
+
8. Data Presented in ThoughtView 2D
+
9. Data Presented in ThoughtView 3D
+
10. ThoughtView 3D (Rotated)
+
Discussion and Limitations
Pros:
Found words like “you”, “you’ll”, and “and” to be the most used in this text.
Examines relationships between words based on their proximity in the text.
Cons:
Words are measured by frequency, not importance.
Focuses less on what words “mean” or how they fit together according to dictionaries.
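To make "relationships based on proximity" concrete, here is a minimal sketch that counts how often two words fall inside the same sliding window of text. The slides do not describe CATPAC's internal procedure, so the function `window_cooccurrence`, the default window size of 7, and the sample sentence are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

def window_cooccurrence(words, window=7):
    """Count unordered word pairs that share a sliding window (hypothetical helper)."""
    pairs = Counter()
    # Slide the window one word at a time; a short text yields a single window.
    for start in range(max(1, len(words) - window + 1)):
        seen = sorted(set(words[start:start + window]))
        # Each distinct pair inside the window is counted once per window.
        for a, b in combinations(seen, 2):
            pairs[(a, b)] += 1
    return pairs

# Invented sample text, not taken from the presenters' data.
pairs = window_cooccurrence("the cat sat on the mat".split(), window=3)
print(pairs[("cat", "sat")], pairs[("cat", "mat")])
```

Words that sit close together share more windows and so get higher counts, which is the kind of matrix a clustering step or dendrogram could then be built on. Nothing in this count reflects what the words mean.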
+
WordStat
http://www.provalisresearch.com/wordstat/wordstat.html
+
Overview of WordStat
Content analysis module for SIMSTAT
Designed to process textual information, especially open-ended data such as journal articles, speeches, electronic communication, and interviews
Has an existing dictionary library and can also run analyses from new dictionaries built by the user
Can perform statistical analyses (e.g., factor analysis, word frequencies, multiple regression)
KWIC (Key Word In Context) tables are available for any included or excluded word or word pattern
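A Key Word In Context table lists every occurrence of a word alongside the words around it. The sketch below shows the idea in Python; it is not WordStat's implementation, and the function `kwic`, its `width` parameter, and the sample sentence are assumptions invented for this example.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit (hypothetical helper)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Take up to `width` words on each side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Invented sample text, not taken from the presenters' data.
for left, word, right in kwic("A terrific book. The ending was terrific and sad.", "terrific", width=2):
    print(f"{left:>12} | {word} | {right}")
```

Reading the context on either side is how an analyst can check whether a dictionary match reflects the intended sense of the word.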
+
Data: Comparing Reviews of the Book on Amazon.com Between Men and Women
+
1. Create a Text File
+
2. Input Text File to WordStat
+
3. Define Your Variables
+
4. Running the Analysis
+
5. Existing Dictionary Was Not Relevant for Our Data
+
6. New Dictionary Available Online!
+
7. (Free) New Dictionary Download
+
8. Import New Dictionary; Maintain Exclusion List
+
9. Level 1 Analysis
+
10. Level 2 Analysis
+
11. Overall Frequencies
+
12. Gender Differences
+
13. Dendrogram
+
14. Clustering
+
15. 3-D Figure of Output
+
16. Co-occurrence Matrix
+
17. KWIC by Gender
+
18. Words by Each Text Case
+
19. Word Count Category Frequency
+
20. Aggression Example
+
21. Limitations: Terrific=Anxiety?
+
Discussion & Limitations
Allows multiple independent variables
Dictionaries may not always be complete
Words in the .txt file must be spelled correctly
Could not distinguish between quotes from the book and reviewers' original thoughts
May not account for different usages of certain words (e.g., combating, terrific)
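The last limitation can be seen in a toy dictionary-based count. The sketch below is not WordStat and does not use a real WordStat dictionary: the mapping `TOY_DICTIONARY`, the function `category_counts_by_group`, and the two sample reviews are all invented for illustration, with "terrific" deliberately filed under anxiety to mirror the example on the slides.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy dictionary (word -> category); not a real WordStat dictionary.
TOY_DICTIONARY = {
    "terrific": "ANXIETY",      # mirrors the misclassification noted in the slides
    "afraid": "ANXIETY",
    "combating": "AGGRESSION",
    "fight": "AGGRESSION",
}

def category_counts_by_group(cases, dictionary):
    """cases: iterable of (group, text). Returns {group: Counter of categories}."""
    counts = defaultdict(Counter)
    for group, text in cases:
        for word in re.findall(r"[a-z']+", text.lower()):
            category = dictionary.get(word)
            # A pure lookup cannot tell praise ("terrific read") from fear.
            if category is not None:
                counts[group][category] += 1
    return dict(counts)

# Invented sample reviews, not taken from the presenters' data.
reviews = [
    ("female", "A terrific read about combating stress."),
    ("male", "I was afraid it would be dull, but it was terrific."),
]
print(category_counts_by_group(reviews, TOY_DICTIONARY))
```

Both reviews are positive, yet each picks up an anxiety or aggression count, because the lookup matches word forms without regard to usage.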
+
Any Questions?
Thank You!