Data, graphics, and programming in R


Basics in R
part 2
Variable types in R
• Common variable types:
– Numeric - numeric value: 3, 5.9, 0.0001
– Logical - logical value: TRUE or FALSE (1 or 0)
– Factor - categorical value: “male”, “female”
– Strings - sequence of letters: “cat”
– Lists - set of variables
• Typically, variables can be given as scalars, vectors or matrices
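A minimal sketch of the variable types listed above. All object names and values are made up for illustration; class() reports the type R has assigned.

```r
# One example of each common variable type (values are illustrative only)
num = 5.9                                        # numeric
lgl = TRUE                                       # logical
fct = factor(c("male", "female", "male"))        # factor
chr = "cat"                                      # string (class "character" in R)
lst = list(height = 1.8, name = "cat", ok = TRUE) # list: a set of variables

class(num)   # "numeric"
class(lgl)   # "logical"
class(fct)   # "factor"
class(chr)   # "character"
class(lst)   # "list"

# The same kind of value can be stored as a scalar, a vector or a matrix
s = 3                        # scalar (a vector of length 1)
v = c(3, 5.9, 0.0001)        # vector
m = matrix(1:6, nrow = 2)    # matrix with 2 rows and 3 columns
length(v)    # 3
dim(m)       # 2 3
```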
What is a factor?
• A categorical variable. Instead of numerical values a factor has
levels. Levels are indicated with a ‘tag’, i.e. the name of the level.
• Examples
– Sex: “male” or “female”
– Disease: “0” or “1”
– Age class: “young”, “middle age”, “old”, “ancient”
– Temperature treatment: “cold”, “intermediate”, “hot”
• Numeric variables can be converted into a factor by splitting the
values into categories
– [0,3), [3,8), [8,14), [14,…)
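As a sketch of the points above: factor() builds a factor from strings, and cut() splits numeric values into categories. The data values are hypothetical, and the open-ended last interval is represented here with Inf, which is an assumption about how the upper limit would be coded.

```r
# A factor built from strings; levels are sorted alphabetically by default
sex = factor(c("male", "female", "female", "male"))
levels(sex)          # "female" "male"

# The level order can be set explicitly; unused levels are kept
age_class = factor(c("old", "young", "middle age"),
                   levels = c("young", "middle age", "old", "ancient"))
levels(age_class)

# Convert a numeric variable into a factor by splitting it into
# the intervals [0,3), [3,8), [8,14) and [14,Inf)
x = c(1, 3, 7.5, 8, 20)                 # hypothetical measurements
x_class = cut(x, breaks = c(0, 3, 8, 14, Inf), right = FALSE)
x_class
table(x_class)                          # number of values in each interval
```

With right = FALSE the intervals are closed on the left and open on the right, which matches the [a,b) notation used on the slide.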
What is a logical variable?
• It always takes the value TRUE or FALSE
(can be written as T or F, and is sometimes coded as “0” and “1”)
• A logical variable is the result of evaluating a logical proposition
> a = 3
> a > 0
[1] TRUE
• A proposition can be a combination of several propositions, joined with & (and) or | (or)
> (a>1)&(a<4)
> (a<1)|(a>2)
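The propositions above can be evaluated as follows. This is a small sketch; the vector b and the result names p1 to p3 are hypothetical and only there to show that the same comparisons work element by element.

```r
a = 3

p1 = (a > 1) & (a < 4)   # both conditions must hold
p2 = (a < 1) | (a > 2)   # at least one condition must hold
p3 = !(a > 0)            # negation of a proposition
p1   # TRUE
p2   # TRUE
p3   # FALSE

# Applied to a vector, a proposition returns one logical value per element
b = c(-2, 0, 3, 5)       # hypothetical values
b > 0                    # FALSE FALSE TRUE TRUE
sum(b > 0)               # TRUE counts as 1, so this is the number of positive values: 2
```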
What is a string?
• a = "Species: Pinus sylvestris"
• Always given in quotation marks (plain straight quotes in R code)
• Typically used in graphics and as function arguments: for instance an
axis label for a graph has to be given as a string
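A short sketch of strings used as plot labels. The height values and the label texts are hypothetical; paste() joins strings, and xlab, ylab and main are the standard label arguments of plot().

```r
# Build a string from parts
species = "Pinus sylvestris"
a = paste("Species:", species)
a                # "Species: Pinus sylvestris"
nchar(a)         # number of characters in the string

# Labels and titles of a graph are given as strings
height = c(12.1, 15.3, 9.8)              # hypothetical tree heights
plot(height,
     xlab = "Tree index",
     ylab = "Height (m)",
     main = a)
```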
What to do with variables: basic statistics
• Basic statistics for numeric variables
– mean: mean value
– var/sd: variance/standard deviation
– sum: sum of elements
– min: minimum value
– max: maximum value
– range: minimum and maximum value, i.e. the range within which the values lie
– quantile: quantiles for given proportions/probabilities
• For factors and logical variables: table() returns category
frequencies for factors, and frequencies of TRUE and FALSE for logical
variables
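The functions listed above can be tried on a small example. The vectors x, sex and healthy are hypothetical data made up for this sketch.

```r
x = c(2, 6, 9, 11)                  # hypothetical numeric data

mean(x)                             # 7
var(x)                              # sample variance
sd(x)                               # standard deviation, sqrt(var(x))
sum(x)                              # 28
min(x)                              # 2
max(x)                              # 11
range(x)                            # 2 11
quantile(x, probs = c(0.25, 0.5, 0.75))   # quartiles

# Frequencies for a factor and for a logical variable
sex = factor(c("male", "female", "female"))
table(sex)                          # counts per level

healthy = c(TRUE, FALSE, TRUE, TRUE)
table(healthy)                      # counts of FALSE and TRUE
```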
-> DEMO 1
Missing values
• Missing values are coded with NA
• When calculating statistics, one has to state how NAs are
dealt with, e.g. na.rm=TRUE removes them before the calculation
xx = c(2, 6, 9, NA, 11)
mean(xx, na.rm = TRUE)
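A brief sketch of what happens with and without na.rm, using the same vector as above. is.na() is the base R test for missing values; the names m_all, m_obs and n_missing are only illustrative.

```r
xx = c(2, 6, 9, NA, 11)

m_all = mean(xx)                 # NA: the missing value propagates
m_obs = mean(xx, na.rm = TRUE)   # 7: NA is dropped before averaging
m_all
m_obs

is.na(xx)                        # FALSE FALSE FALSE TRUE FALSE
n_missing = sum(is.na(xx))       # number of missing values: 1
n_missing
```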
which() function
• Structure of the function call:
aa = which(“a logical proposition”)
• Returns indices (locations) for those elements of a vector for which the
proposition is TRUE.
• This function is useful when choosing particular elements from a
vector
• Later on, this function will be used for subsetting datasets
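A minimal sketch of which() on a hypothetical vector x: the returned indices are then used to pick out the matching elements.

```r
x = c(4, 12, 7, 15, 1)     # hypothetical values

aa = which(x > 5)          # indices of the elements larger than 5
aa                         # 2 3 4

x[aa]                      # the elements themselves: 12 7 15
x[x > 5]                   # same result, indexing directly with the proposition
```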
ifelse() function
• Function call:
aa = ifelse(“a logical test for a vector”,yes,no)
– yes: value to be given to the corresponding element of aa if the test
obtains value TRUE
– no: value to be given to the corresponding element of aa if the test
obtains value FALSE
• Returns a vector of the same length as the original vector:
each element contains the value indicated above by ‘yes’ or ‘no’
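A small sketch of ifelse() on a hypothetical temperature vector; the threshold of 15 and the labels are assumptions chosen for illustration.

```r
temp = c(3, 18, 25, 9)                   # hypothetical temperatures

# Test each element: "hot" where the test is TRUE, "cold" where it is FALSE
aa = ifelse(temp > 15, "hot", "cold")
aa                                       # "cold" "hot" "hot" "cold"

length(aa) == length(temp)               # the result has the same length: TRUE
```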
-> DEMO 2