Erin & Anne present CATPAC & WordStat


+
CATPAC & WordStat
Anne D. Sito & Erin Sonenstein
COM 633: FA 09
+
CATPAC
+
Overview of CATPAC
Designed to recognize frequently used words in text
Identifies and groups patterns of similar words
Provides output of clustering algorithms, perceptual maps, and interactive clustering
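As a rough illustration of the frequency-counting idea above, here is a minimal Python sketch. It is not CATPAC's implementation; the function name `word_frequencies`, the `exclude` parameter, and the sample sentence are assumptions made for this example.

```python
import re
from collections import Counter

def word_frequencies(text, exclude=frozenset()):
    """Count lowercase word tokens, skipping any word listed in `exclude`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in exclude)

# Hypothetical sample text, not taken from the presenters' data.
sample = "You will love it, and you will read it again and again."

counts = word_frequencies(sample)
filtered = word_frequencies(sample, exclude={"you", "and", "it", "will"})

print(counts.most_common(3))    # raw counts, dominated by function words
print(filtered.most_common(3))  # counts after dropping the excluded words
```

Passing an exclusion set shows why common function words tend to top a raw frequency list unless they are filtered out.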
+
Data Preparation: Text
+
1. Convert document into .txt file
+
2. Inputting Data
+
3. Select Text File You Want to Analyze
+
4. Select “Make Dendrogram”
+
5. Initial Output Screen
+
6. Output Data Screen
+
7. Output: Dendrogram
+
8. Data Presented in ThoughtView 2D
+
9. Data Presented in ThoughtView 3D
+
10. ThoughtView 3D (Rotated)
+
Discussion and Limitations
Strengths (+)

Found words like “you”, “you’ll”, & “and” to be the most used in this text.

Examines relationships between words based on proximity in the text.

Limitations (-)

Words are measured based on frequency, not importance.

Focuses less on what words “mean” or how they fit together according to dictionaries.
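The proximity idea can be sketched as counting how often two words fall inside the same sliding window of text. This is only an illustration under stated assumptions: the window size, the function name `window_cooccurrence`, and the sample sentence are hypothetical, and CATPAC's own procedure may differ.

```python
from collections import Counter
from itertools import combinations

def window_cooccurrence(tokens, window=3):
    """Count unordered word pairs that appear together inside a sliding window."""
    pairs = Counter()
    for start in range(len(tokens) - window + 1):
        # Use a set so a word repeated inside one window is counted once.
        seen = set(tokens[start:start + window])
        for a, b in combinations(sorted(seen), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical sample sentence.
tokens = "the cat sat on the mat".split()
pairs = window_cooccurrence(tokens, window=3)

for pair, n in pairs.most_common():
    print(pair, n)
```

Pairs are scored purely by how often they sit near each other, which matches the limitation noted above: the counts say nothing about what the words mean.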
+
WordStat
http://www.provalisresearch.com/wordstat/wordstat.html
+
Overview of WordStat

Content analysis module for SIMSTAT

Designed to process open-ended textual data, including journal articles, speeches, electronic communication, interviews, etc.

Has an existing dictionary library and can also run analyses from new dictionaries built by the user

Can perform statistical analyses (e.g., factor analysis, word frequencies, multiple regression)

KWIC: Key Word In Context tables are available for any included or excluded word or word pattern
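A Key Word In Context table lists each occurrence of a word together with the words around it. The sketch below shows the general idea in Python; it is not WordStat's implementation, and the function name `kwic`, the `width` parameter, and the sample sentence are assumptions for illustration.

```python
def kwic(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Hypothetical review text.
tokens = "This book was terrific and the ending was terrific too".split()
rows = kwic(tokens, "terrific")

for left, kw, right in rows:
    print(f"{left:>15} [{kw}] {right}")
```

Reading the hits in context is what lets an analyst check whether a word is being used in the sense the dictionary assumes.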
+
Data: Comparing Reviews of the Book on Amazon.com Between Men and Women
+
1. Create a Text File
+
2. Input Text File to WordStat
+
3. Define Your Variables
+
4. Running the Analysis
+
5. Existing Dictionary Was Not Relevant for Our Data
+
6. New Dictionary Available Online!
+
7. (Free) New Dictionary Download
+
8. Import New Dictionary; Maintain Exclusion List
+
9. Level 1 Analysis
+
10. Level 2 Analysis
+
11. Overall Frequencies
+
12. Gender Differences
+
13. Dendrogram
+
14. Clustering
+
15. 3-D Figure of Output
+
16. Co-occurrence Matrix
+
17. KWIC by Gender
+
18. Words by Each Text Case
+
19. Word Count Category Frequency
+
20. Aggression Example
+
21. Limitations: Terrific=Anxiety?
+
Discussion & Limitations

Allows multiple independent variables

Dictionaries may not always be complete

Words in the .txt file must be spelled correctly

Could not distinguish between quotes from the book and original thoughts

May not account for different usages of certain words (e.g., combating, terrific)
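The word-usage limitation can be illustrated with a small dictionary-based counter. Everything here is a hypothetical sketch: the mini-dictionary, the category assignments, and the sample review are invented for illustration and are not the dictionary the presenters downloaded or WordStat's actual matching logic.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary (category -> words); assumed for this example only.
DICTIONARY = {
    "anxiety": {"terrific", "afraid", "worried"},
    "aggression": {"combating", "attack", "fight"},
}

def category_counts(text, dictionary):
    """Count how many tokens in `text` fall into each dictionary category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for category, words in dictionary.items():
            if tok in words:
                counts[category] += 1
    return counts

# Hypothetical review: "terrific" is praise and "combating" is not hostile here.
review = "A terrific guide to combating stress; I was never worried."
result = category_counts(review, DICTIONARY)
print(result)
```

Because matching is by word form alone, a positive use of "terrific" still lands in the anxiety category and "combating stress" is scored as aggression, which is the kind of miscoding the slide describes.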
+
Any Questions?
Thank You!