Categorize Titanic data hierarchically
I want to cluster the Titanic data and plot a graph. I am looking for a
package with functions that can cluster categorical data hierarchically,
in the way that I show below.
My earlier post has no relation to this question.
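# Flatten the 4-way Titanic table into a data frame with a Freq column.
df <- data.frame(Titanic)
df_Crew <- df[df$Class == "Crew", ]
# Aggregate counts at each depth: Class, Class+Sex, Class+Sex+Age, and so on.
L <- lapply(1:4, function(i) aggregate(df_Crew$Freq, by = df_Crew[1:i], FUN = sum))
# Join the grouping columns into one label per node, e.g. "Crew_Male_Adult".
L2 <- lapply(L, function(d) data.frame(
  group = do.call(paste, c(as.list(d[names(d) != "x"]), sep = "_")),
  freq = d$x))
# Build the edge list. Repeating each level twice lines parents up with
# children only because every factor here has exactly two levels.
L3 <- data.frame()
for (i in 1:3) {
  d <- cbind(from = rbind(L2[[i]], L2[[i]])$group, L2[[i + 1]])
  L3 <- rbind(L3, d)
}
L3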
                from                 group freq
1               Crew             Crew_Male  862
2               Crew           Crew_Female   23
3          Crew_Male       Crew_Male_Child    0
4        Crew_Female     Crew_Female_Child    0
5          Crew_Male       Crew_Male_Adult  862
6        Crew_Female     Crew_Female_Adult   23
7    Crew_Male_Child    Crew_Male_Child_No    0
8  Crew_Female_Child  Crew_Female_Child_No    0
9    Crew_Male_Adult    Crew_Male_Adult_No  670
10 Crew_Female_Adult  Crew_Female_Adult_No    3
11   Crew_Male_Child   Crew_Male_Child_Yes    0
12 Crew_Female_Child Crew_Female_Child_Yes    0
13   Crew_Male_Adult   Crew_Male_Adult_Yes  192
14 Crew_Female_Adult Crew_Female_Adult_Yes   20
Then I could create a tree like:
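library(igraph)
# The first two columns (from, group) become the edges; freq is kept as an edge attribute.
g <- graph_from_data_frame(L3, directed = TRUE)
# Reingold-Tilford tree layout rooted at the first vertex ("Crew").
plot(g, layout = layout_as_tree(g, root = 1), edge.arrow.size = 0.5)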
A different layout for the tree would be better, but that is not related to
the question.
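The loop that builds L3 pairs parents with children by repeating each level twice, which only works because Sex, Age and Survived each have exactly two levels. A minimal sketch of one way around that, using base R only: recover each parent by stripping the last "_"-separated part of the group label. The names `levels_list`, `children` and `edges` are illustrative, and the approach assumes that no factor level itself contains an underscore.

```r
# Sketch: same from/group/freq edge list, without assuming two levels per factor.
df <- data.frame(Titanic)
df_Crew <- df[df$Class == "Crew", ]

# One aggregate per depth: Class, Class+Sex, Class+Sex+Age, Class+Sex+Age+Survived.
levels_list <- lapply(1:4, function(i) {
  d <- aggregate(df_Crew$Freq, by = df_Crew[1:i], FUN = sum)
  data.frame(
    group = do.call(paste, c(as.list(d[names(d) != "x"]), sep = "_")),
    freq = d$x,
    stringsAsFactors = FALSE
  )
})

# Drop the root level; every remaining row is a child node.
children <- do.call(rbind, levels_list[-1])

# Assumption: level names contain no "_", so removing the last "_part"
# of a label yields the label of its parent.
edges <- data.frame(
  from = sub("_[^_]*$", "", children$group),
  group = children$group,
  freq = children$freq,
  stringsAsFactors = FALSE
)
edges
```

Because `edges` has the same three columns as L3, it can be passed to the igraph call in place of L3. Running the same steps on the full `df` instead of `df_Crew` should also work, since the parent lookup no longer depends on row order.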