I have a large file of postal addresses stored as strings. Example address: "1780 wemmel rue hendrik de mol 59/7"
I need to run a PCA on this data so that the clusters corresponding to the physical delivery points (buildings, companies, ...) show up on the individuals graph. To do that, I need to extract relevant information (numeric or not) from the strings and use it as my attributes, which I can then analyze with PCA.
I started by creating 36 attributes (A-Z and 0-9), each counting the occurrences of one letter or digit in the address. The PCA does not give a good result yet, so I need to extract more attributes that characterize the data.
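To make the current approach concrete, here is a minimal sketch of roughly what I am doing. The `addresses` vector is a made-up sample (only the first entry is the real example above), and `symbols`, `count_symbols` and `features` are names chosen for this illustration, not my actual code:

```r
# Hypothetical sample: only the first address comes from the real data,
# the others are invented for illustration.
addresses <- c(
  "1780 wemmel rue hendrik de mol 59/7",
  "1780 wemmel rue hendrik de mol 59/8",
  "1000 bruxelles avenue louise 12",
  "1000 bruxelles avenue louise 14 bte 3"
)

# The 36 attributes: 26 letters followed by the 10 digits.
symbols <- c(letters, 0:9)

# Count how often each symbol occurs in one address (case-insensitive).
# Characters outside the 36 symbols (spaces, "/", ...) are ignored,
# because match() returns NA for them and tabulate() skips NAs.
count_symbols <- function(address) {
  chars <- strsplit(tolower(address), "")[[1]]
  tabulate(match(chars, symbols), nbins = length(symbols))
}

# One row per address, one column per symbol.
features <- t(vapply(addresses, count_symbols, integer(length(symbols))))
colnames(features) <- symbols

# Drop constant columns: prcomp() cannot scale a zero-variance column.
varying <- apply(features, 2, var) > 0

pca <- prcomp(features[, varying, drop = FALSE], center = TRUE, scale. = TRUE)

# Coordinates of the individuals on the first two principal components.
scores <- pca$x[, 1:2]
plot(scores, xlab = "PC1", ylab = "PC2")
```

Character counts like these ignore the order of the characters and the word boundaries, which may be part of why the clusters do not separate well.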
What else could I extract from the data to get a good representation of the clusters on the individuals graph? I'm using R.
Thank you.