ABSTRACT: Many users need to share private data, such as electronic health records, financial transaction records, or college students' records, for data analysis and mining. Anonymization is therefore one of the most important privacy-preserving techniques for addressing privacy concerns. The scale of data in many applications is currently growing rapidly, in line with the Big Data trend. Because existing data anonymization approaches lack efficiency, achieving privacy preservation on private or sensitive data sets at this scale is a major challenge. Here we introduce data anonymization for processing large-scale data using a distributed Bottom-up approach. In the Bottom-up approach, processing starts from the bottom elements of the tree, that is, the child nodes, which are replaced with their parent node. Distributed data anonymization uses the MapReduce framework to improve the scalability and efficiency of the Bottom-up approach over existing approaches, and it is executed for as long as k-anonymity is violated. MapReduce increases the parallelization capability of data anonymization on large-scale data and thereby addresses the scalability problem of anonymizing such data for privacy preservation.
Keywords: Anonymization, Bottom-up approach, MapReduce framework, Cloud, Privacy Preservation
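
The bottom-up generalization summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the taxonomy, the function names, and the single quasi-identifier attribute are all hypothetical, and Python's built-in `map` and `functools.reduce` stand in for a real MapReduce framework. Each round counts values across the data partitions (map and reduce), then replaces the children of any parent that has an under-represented child with that parent, repeating while k-anonymity is violated.

```python
from collections import Counter
from functools import reduce

# Hypothetical taxonomy for one quasi-identifier: child -> parent (root -> None).
TAXONOMY = {
    "Nurse": "Medical", "Doctor": "Medical",
    "Teacher": "Education", "Professor": "Education",
    "Medical": "Any", "Education": "Any",
    "Any": None,
}


def map_count(partition):
    """Map step: count the attribute values inside one data partition."""
    return Counter(partition)


def reduce_counts(left, right):
    """Reduce step: merge the counts produced by two partitions."""
    return left + right


def violating_values(partitions, k):
    """Return the values that occur fewer than k times over all partitions."""
    counts = reduce(reduce_counts, map(map_count, partitions), Counter())
    return {value for value, n in counts.items() if n < k}


def generalize(partitions, k):
    """Replace child values with their parent while k-anonymity is violated."""
    while True:
        # Parents of violating values that can still be generalized.
        parents = {TAXONOMY.get(v) for v in violating_values(partitions, k)}
        parents.discard(None)
        if not parents:
            # Either k-anonymity holds or only the root is left.
            return partitions
        # Lift every child of the selected parents one level up the tree.
        partitions = [
            [TAXONOMY[v] if TAXONOMY.get(v) in parents else v for v in part]
            for part in partitions
        ]


if __name__ == "__main__":
    data = [["Nurse", "Doctor", "Teacher"], ["Doctor", "Teacher", "Professor"]]
    print(generalize(data, k=2))
```

With `k=2`, "Nurse" and "Professor" each occur once, so all children of "Medical" and "Education" are lifted to their parents, after which every value occurs three times. A real deployment would check k-anonymity over combinations of several quasi-identifier attributes and would normally choose which generalization to apply by a quality metric. This sketch omits both.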