MapR Hadoop Distribution for Managing Big Data, and a Step-by-Step MapR Cluster Installation


  • Managing Big Data
  • What is Hadoop?
  • Why is Hadoop important?
  • What is the MapR Hadoop Distribution?
  • Why should you choose the MapR Hadoop distribution?
  • What are Services?
  • Steps to Deploy a MapR Cluster
  • MapR Components
  • MapR Enterprise Edition or Community Edition
  • Migrating HDFS to MapR-FS
  • Advice Before Installing and Configuring MapR
  • Architectural Advice
  • Configure everything for installation
  • Prerequisites
  • Preparing
  • MapR Installation in the Admin Console
  • Licensing Process
  • Local Installation
  • Known Issues
  • MapR Forum
  • References


Managing Big Data

There are a few popular Hadoop distributions on the market, such as Cloudera, MapR, and Hortonworks.

What is Hadoop?

Apache Hadoop is an open source framework for efficiently storing and processing large datasets ranging in size from gigabytes to petabytes of data. Instead of using a single large computer to store and process data, Hadoop allows multiple computers to be clustered together to analyze large datasets more quickly in parallel.

Hadoop consists of four main modules:

HDFS, YARN, MapReduce, Hadoop Common.

Why is Hadoop important?

  • Ability to store and process huge amounts of any kind of data, quickly. This is an important consideration, especially with the ever-increasing volumes and variety of data from social media and the Internet of Things (IoT).
  • Computing power. Hadoop’s distributed computing model processes big data quickly.
  • Fault tolerance.  If a node goes down, jobs are automatically routed to other nodes to ensure distributed computation doesn’t fail. Multiple copies of all data are automatically stored.
  • Flexibility.  You can store as much data as you want and then decide how to use it.
  • Low cost. The open source framework is free and uses commercial hardware to store large amounts of data.
  • Scalability. You can easily grow your system to handle more data by simply adding nodes.

What is the MapR Hadoop Distribution?

The MapR Hadoop distribution is 100% plug-and-play compatible with Apache Hadoop, so you don’t need to make any changes to run your applications on a MapR cluster. MapR Hadoop automatically configures compression and memory settings, task heap sizes, and local volumes for shuffle data.


The MapR Hadoop distribution is built on the idea that a market-driven company can respond to market needs faster.

Unlike Cloudera and Hortonworks, the MapR Hadoop distribution takes a more distributed approach to storing metadata across processing nodes, since it relies on its own file system, the MapR File System (MapR-FS), and has no NameNode architecture. The MapR distribution is also not built on top of the Linux file system.

MapR is considered one of the fastest Hadoop distributions.

MapR is the only Hadoop distribution in which Pig, Hive, and Sqoop have no Java dependencies, because they run on MapR-FS.


Why should you choose the MapR Hadoop distribution?

Though MapR is still number 3 in terms of number of installations, it is one of the easiest and fastest Hadoop distributions compared to the others.

MapR is a great distributed file system which comes in two editions: Community and Enterprise.


What are Services?

A MapR cluster is a full Hadoop distribution. Hadoop itself consists of a storage layer and a MapReduce layer. In addition, MapR provides cluster management tools, data access via NFS, and a few behind-the-scenes services that keep everything running. Some applications and Hadoop components, such as HBase, are implemented as services; others, such as Pig, are simply applications that you run as needed. We will lump them together here, but the distinction is worth making.



  • MapReduce services: JobTracker, TaskTracker
  • Storage services: CLDB, FileServer, HBase RegionServer, NFS
  • Management services: HBase Master, Webserver, ZooKeeper

A daemon called the warden runs on every node to make sure that the proper services are running (and to allocate resources for them). The only service that the warden doesn’t control is the ZooKeeper. Part of the ZooKeeper’s job is to have knowledge of the whole cluster; in the event that a service fails on one node, it is the ZooKeeper that tells the warden to start the service on another node.
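Once the cluster is up, the warden-managed services can be inspected from the command line. A hedged sketch using maprcli (these commands require a running cluster, and the exact output varies by MapR version):

```shell
# List services and their state on this node (requires a live MapR cluster).
maprcli service list -node "$(hostname)"

# Show which nodes currently host the CLDB.
maprcli node listcldbs
```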

MapR Direct Access NFS offers usability and interoperability benefits and makes big data easier and cheaper to handle.

MapR allows for files to be modified and overwritten at high speeds in real time from remote servers via an NFS connection and provides multiple simultaneous reads and writes on any file.
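As a hedged sketch of that NFS access (the node name nosql2 and the cluster name my.cluster.com are placeholders taken from this guide's examples):

```shell
# Mount the cluster via MapR Direct Access NFS on a client machine.
mkdir -p /mapr
mount -o nolock nosql2:/mapr /mapr

# Ordinary POSIX tools then read and write cluster files directly.
ls /mapr/my.cluster.com/
```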

MapR File System does not have a NameNode.

MapR supports high availability, real-time streaming, easy data integration, and true multi-tenancy in YARN.


Steps to Deploy a MapR Cluster

On very small clusters of just a few nodes, it’s impractical to isolate services on dedicated nodes. One layout approach is to run one CLDB and one ZooKeeper on the same node, leaving the other nodes free to run the TaskTracker. All nodes should run the FileServer. If you need HA in a small cluster, you will end up running the CLDB and ZooKeeper on additional nodes.
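Such a small-cluster layout can be sketched as a short script (the node names are placeholders; adjust the service assignment to your own hardware):

```shell
# Hypothetical 3-node small-cluster layout. Every node runs FileServer;
# in the non-HA case, one node also hosts CLDB and ZooKeeper while the
# others run TaskTracker only.
layout_for() {
  case "$1" in
    node1) echo "CLDB,ZooKeeper,FileServer,TaskTracker" ;;
    *)     echo "FileServer,TaskTracker" ;;
  esac
}

for n in node1 node2 node3; do
  echo "$n: $(layout_for "$n")"
done
```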


MapR Components


Hue, Impala, Webserver, Drill, Elasticsearch, HBase, Hive, HTTPFS, Kafka, Oozie, OpenTSDB, YARN, Spark, ZooKeeper, Flume, Object Store, Pig, Sqoop, Tez



MapR Enterprise Edition or Community Edition

MapR comes in two editions: Community and Enterprise.

If you plan to use it in production, you should buy MapR Enterprise.




Migrating HDFS to MapR-FS

Before you copy data from an HDFS cluster to a MapR cluster using the hdfs:// protocol, you must configure the MapR cluster to access the HDFS cluster. To do this, complete the steps listed in Configure a MapR Cluster to Access an HDFS Cluster for the security scenario that best describes your HDFS and MapR clusters, and then complete the steps listed in Verifying Access to an HDFS Cluster.

If the MapR cluster can read the contents of the file, run the distcp command to copy the data from the HDFS cluster to the MapR cluster:

hadoop distcp hdfs://<NameNode>:<NameNode Port>/<HDFS path> maprfs://<MapR-FS path>
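As a hedged sketch, the template above can be filled in with concrete values; the NameNode host, port, and both paths below are hypothetical placeholders that must be adjusted for your clusters:

```shell
# Build the distcp command from placeholder values (adjust all four).
NAMENODE=namenode01
NN_PORT=8020
SRC_PATH=/user/data
DST_PATH=/mapr/my.cluster.com/user/data

DISTCP_CMD="hadoop distcp hdfs://${NAMENODE}:${NN_PORT}${SRC_PATH} maprfs://${DST_PATH}"
echo "$DISTCP_CMD"
# You would then run this command on a node of the MapR cluster.
```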


Advice Before Installing and Configuring MapR

You should use Red Hat 8.

The root password will be reset for direct access.

You should use a minimum of 2 servers.

You should install the EPEL repo.

Direct Internet access is needed; otherwise, download the related MapR packages in advance.

Minimum resources: 16 GB memory (64 GB advised) and 8 CPUs (16 CPUs advised).

Large disk space is needed (64 GB /tmp, 128 GB /opt, 32 GB /home), configured as LVM.

A minimum of 3 raw (unformatted) disks of at least 15 GB each is needed. Do not use RAID. Do not use LVM (Logical Volume Manager) on these disks.

All IPs and hostnames should be defined in DNS.

SELinux should be disabled.

The firewall should be stopped and disabled.

You should install and configure NTP/chronyd.

You should install and configure Java JDK 8.

You should permit port 9443 (MapR installation web interface) and port 8443 (MapR services web interface).

You should configure passwordless SSH between nodes.

You should set swappiness to 1 and transparent_hugepage to never.

You should stop and disable nfs, nfs-server, and nfs-lock.

You should set soft nofile, hard nofile, soft nproc, and hard nproc to 64000 in limits.conf.

You should set ulimit -n 64000 in .bash_profile.

You should set umask to 0022.

PermitRootLogin should be set to yes in sshd_config.

You should set vm.overcommit_memory to 0 in sysctl.conf.

You should configure /etc/pam.d/su for the mapr user.

You should configure resolv.conf for the network.
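The checklist above can be partially automated. The sketch below defines a small check() helper; the demo calls use fixed sample values so the output is deterministic, but on a real node you would feed it live values such as $(cat /proc/sys/vm/swappiness), $(getenforce), or $(ulimit -n):

```shell
# Hedged pre-flight sketch: compare an actual value against the expected one.
check() {  # usage: check <name> <actual> <expected>
  if [ "$2" = "$3" ]; then
    echo "OK   $1"
  else
    echo "FAIL $1 (got '$2', want '$3')"
  fi
}

# Demo calls with fixed sample values; replace with live readings on a node.
check "vm.swappiness" "1" "1"
check "SELinux" "Enforcing" "Disabled"
```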


Architectural Advice

CLDB and ZooKeeper: on small clusters, you may need to run them on the same node.

CLDB and ZooKeeper: on medium clusters, assign them to separate nodes.

CLDB and ZooKeeper: on large clusters, put them on separate, dedicated control nodes.

ResourceManager and ZooKeeper: avoid running them together.

ResourceManager: with more than 200 nodes, run it on dedicated nodes.

Large clusters: avoid running MySQL Server or the webserver on a CLDB node.



Configure everything for installation

1-) You should install and configure the repo.

yum repolist all | grep enabled




yum -y install <required packages>

# or: dnf install <required packages>



2-) Example /opt disk configuration

fdisk -l | grep '^Disk'

#Disk /dev/sde: 130 GiB, 139586437120 bytes, 272629760 sectors


fdisk /dev/sde

n p 1 (Enter) (Enter) w   # new primary partition 1, default sectors, then write

mkfs.ext4 /dev/sde1


mkdir /opt/

mount /dev/sde1 /opt

df -Ph | grep opt


vi /etc/fstab

/dev/sde1               /opt           ext4    defaults        1 2
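The fstab entry above has six whitespace-separated fields; as a small sketch, this labels each field of the /dev/sde1 entry used in this step:

```shell
# Label the six fields of the /etc/fstab entry shown above.
fstab_entry="/dev/sde1 /opt ext4 defaults 1 2"
set -- $fstab_entry
FIELDS="device=$1 mountpoint=$2 fstype=$3 options=$4 dump=$5 fsck_order=$6"
echo "$FIELDS"
```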


Other disks should be as below.

64G   /tmp

128G  /opt

32G   /home


3-) Raw disks

fdisk -l

MapR disks (added as raw)

/dev/sdb: 15 GiB

/dev/sdc: 15 GiB

/dev/sdd: 15 GiB


4-) Hosts file for the cluster. (We use just nosql2.)

cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

<node IP>   nosql1.localdomain nosql1

<node IP>   nosql2.localdomain nosql2


5-) It should be added to DNS.

We used just one node → nosql2.

If there are multiple nodes, you must add them to /etc/hosts: nosql1.localdomain nosql1, nosql2.localdomain nosql2.


#If you use a virtual test machine

vi /etc/hosts

<node IP> nosql1.localdomain

<node IP> nosql2.localdomain

6-1) Selinux should be disabled

cat /etc/selinux/config | grep SELINUX




#If it is not Disabled, apply the change below; coordinate with Ms. İrem for the reboot.


vi /etc/selinux/config

SELINUX=disabled


shutdown -r now






6-2) Firewall should be stopped

systemctl stop firewalld

systemctl disable firewalld

systemctl stop iptables

systemctl disable iptables


7-) chrony install and configure

yum -y install chrony

vi /etc/chrony.conf






systemctl restart chronyd

chronyc sources



8-) You should install and configure Java

Oracle java jdk,jre 8 download and configure on servers,


mkdir /usr/java

cd  /usr/java

tar -xvzf /tmp/jdk-8u281-linux-x64.tar.gz

chmod 755 /usr/java


cd  /usr/java

mv latest latest_old

ln -s /usr/java/jdk1.8.0_281 latest

ls -lrt

#default -> /usr/java/latest

#latest -> /usr/java/jdk1.8.0_281


java -version


vi /etc/profile

export JAVA_HOME=/usr/java/latest

export PATH=$JAVA_HOME/bin:$PATH





sudo update-alternatives --list

sudo update-alternatives --config java

sudo update-alternatives --config javac


9-) umask  should be configured


umask  0022



10-) SSH connectivity should be configured between nodes.

We used just one node for the installation.

If there are multiple nodes, follow the steps below:


  1. ssh-keygen -t rsa (press Enter at each prompt)
  2. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  3. chmod og-wx ~/.ssh/authorized_keys


#Then append each node's public key to the other node's authorized_keys:

cat /root/.ssh/id_rsa.pub | ssh root@nosql2 'cat >> /root/.ssh/authorized_keys'

cat /root/.ssh/id_rsa.pub | ssh root@nosql1 'cat >> /root/.ssh/authorized_keys'


#You should verify

cat /root/.ssh/id_rsa.pub

cat /root/.ssh/authorized_keys


# You should test logins between the nodes

ssh root@localhost

ssh root@nosql1

ssh root@nosql2


11-) Swappiness and transparent hugepages should be configured

echo never > /sys/kernel/mm/transparent_hugepage/enabled

echo never > /sys/kernel/mm/transparent_hugepage/defrag


sysctl vm.swappiness=1

cat /proc/sys/vm/swappiness
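Note that the echo and sysctl commands above do not survive a reboot. A hedged sketch for persisting the swappiness setting is below; it appends to a scratch copy so it can run anywhere, but on a real node you would point CONF at /etc/sysctl.conf (and add the transparent_hugepage echoes to /etc/rc.local or a systemd unit):

```shell
# Append vm.swappiness once, idempotently (scratch file for demonstration;
# use CONF=/etc/sysctl.conf on a real node, then run `sysctl -p`).
CONF=$(mktemp)
grep -q '^vm.swappiness' "$CONF" || echo 'vm.swappiness = 1' >> "$CONF"
SWAPPINESS_LINE=$(grep '^vm.swappiness' "$CONF")
echo "$SWAPPINESS_LINE"
```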


12-) Nfs should be stopped


systemctl stop nfs

systemctl disable nfs-server

systemctl disable nfs-lock

systemctl stop nfs-server

systemctl stop nfs-lock



service mapr-loopbacknfs status

#Active: active (running)


13-) Ulimit in bash_profile

vim ~/.bash_profile

ulimit -n 64000


14-) Tcp_retries2 configure


cat /proc/sys/net/ipv4/tcp_retries2

echo 5 > /proc/sys/net/ipv4/tcp_retries2


15-) Limits.conf configure

vi /etc/security/limits.conf

*               soft     nofile          64000

*               hard     nofile          64000

*               soft     nproc           64000

*               hard     nproc           64000


16-) Create home directories


MapR home:

mkdir -p /opt/mapr

chmod -R 777 /opt/mapr


Elasticsearch home:

mkdir -p /opt/mapr/es_db

chmod -R 777 /opt/mapr/es_db


17-) Configure sysctl.conf

vi /etc/sysctl.conf

vm.overcommit_memory = 0

sysctl -p


18-) Sshd_config

cat /etc/ssh/sshd_config

PermitRootLogin yes


19-) PAM configure

cp /etc/pam.d/su /etc/pam.d/su_bck

> /etc/pam.d/su

vi /etc/pam.d/su

#Check that the /etc/pam.d/su file contains the following settings:


auth            sufficient      pam_rootok.so

# Uncomment the following line to implicitly trust users in the "wheel" group.

#auth           sufficient      pam_wheel.so trust use_uid

# Uncomment the following line to require a user to be in the "wheel" group.

#auth           required        pam_wheel.so use_uid

auth            include         system-auth

account         sufficient      pam_succeed_if.so uid = 0 use_uid quiet

account         include         system-auth

password        include         system-auth

session         include         system-auth

session         optional        pam_xauth.so


20-) Sample resolv.conf configure

cat /etc/resolv.conf

# Generated by NetworkManager

search home localdomain






Prerequisites must be provided on all nodes.

The installer must be run on all nodes. (Otherwise, it throws an error for the second node because the mapr user has not been created.)




Download the MapR installer script (mapr-setup.sh).


Install it on all nodes, but run the installation configuration for all nodes from the first node only.

mkdir -p /tmp/mapr/

chmod -R 777 /tmp/mapr/

wget -P /tmp/mapr/ <mapr-setup.sh URL>

chmod +x /tmp/mapr/mapr-setup.sh

sudo bash /tmp/mapr/mapr-setup.sh

Connect with a browser to:

https://<Installer Node hostname/IPaddress>:9443






You can change the prerequisite checks, but this is not advised.

vim /opt/mapr/installer/ansible/playbooks/library/prereq/

For example, you can set new environment values.



Installation web interface;





Version 6.2.0

MEP 7.0.1


MapR Installation in the Admin Console


1-) MapR Installation



2.1-) Version & Services

You can choose to install the Community or Enterprise option.

You can choose to install custom services and versions.

You can choose the license option "after installation" or "log in to MapR HPE".



2.2-) You can choose to install custom services and versions.



3-) Database Setup

If you choose to install Hue, Hive, or Oozie, MySQL will be installed and you will choose the related users.



4-) Monitoring

Grafana will be installed .






5-) Set Up Cluster

The mapr user was created by mapr-setup.

The cluster name must be defined in DNS.





6-) Node configuration

Nodes can be entered one per line. A minimum of 2 servers is advised. Nodes must be defined in DNS.

Disks must be raw and separated by commas. A minimum of 3 raw disks is advised.

The SSH user should be mapr; you can also use root.

All the data transfer NICs should be the same speed; if you use different speeds, all the NICs will operate at the lowest speed. If MAPR_SUBNETS is not set, MapR uses all NICs present on the node. For more information, see Designating NICs for MapR.
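If you do need to restrict which NICs MapR uses, a minimal sketch is to set MAPR_SUBNETS in /opt/mapr/conf/env.sh (the CIDR below is a placeholder; substitute your data-transfer subnet):

```shell
# Limit MapR traffic to NICs on one subnet (placeholder CIDR).
# Set this in /opt/mapr/conf/env.sh on every node, then restart the warden.
export MAPR_SUBNETS=10.10.1.0/24
```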





7.1-) Verify Pre-checks for nodes.

All nodes will be checked to see whether they satisfy the minimum requirements.

In-progress status codes:

White – verification in progress

Green – ready for installation

Yellow – warnings, but installation can proceed

Red – node cannot be part of the cluster





7.2-) Critical and failed statuses must be fixed. Warnings can be skipped, but it is helpful to fix them later.


8-) Progress Confirmation




9.1-) Configuration Service Layout



9.2-) Node Layout




9.3-) Advanced Component Configuration




10-) Licensing




11-) Installation Step




12-) Installation Complete





13-) Mapr Login



14-) License Accept



15-)  Overview



16-) Overview



17-) Services interface



18-) Node interface



19-) Volume interface



20-) Tables interface



21-) Cluster setting interface




22-) Running processes after installation





23-) Hadoop all applications




24-) All components are listed below.

Links to UI Pages (Service Name – Browser URL):

HBase Master

History Server

Impala Catalog

Impala Server

Impala Statestore

Spark History Server

Spark Thrift Server

YARN Node Manager

YARN Resource Manager


API Services (Service Name – Service Ports):

HBase Master

HBase Region Server

HBase Thrift

Hive Metastore

Hive Server 2

Hive WebHCat

Impala Server

Apache Kafka REST API

YARN Node Manager

YARN Resource Manager

Spark Thrift Server



Licensing Process

You can add the license after installation:

Click here to go to the HPE Ezmeral Data Fabric Control System.


By doing so, you default to agreeing to the community license agreement:

1-) Install the license.

Log in to the HPE Ezmeral Data Fabric Control System and accept the community license. (One account equals one license.)

2-) Add the license after the installation completes.


(Upload or copy-paste the licenses and click Apply Licenses. Then you should restart all services.)


Free software criteria:


For the Community edition:

There can be 1 activation with 1 account.


License agreement, Enterprise:


License agreement, Community:

The MapR Data Platform – Community Edition* is available for free per restrictions specified in the MapR End User License Agreement (EULA).


Local Installation

If you cannot download from the MapR repo, you can do as below.

Currently, you can download and install the Ezmeral Data Fabric install packages from the HPE package repository.


Known Issues

Refer to the list of known issues if needed.


Mapr Forum

You can search for or post problems on the forum.







About Fatih Gençali

- I have worked as an Oracle and NoSQL & Big Data DBA for more than 9 years, in 24x7 production and test environments.
- I hold a 12c OCP certificate, a Couchbase certificate, and a Europass diploma supplement.
- I have supported NoSQL databases (MongoDB, Cassandra, Couchbase) and the Ambari and MapR Hadoop distributions.
- I have supported databases in telecommunications, banking, insurance, finance, retail, manufacturing, marketing, and e-invoicing.
- I provide alignment between prod, prp, stb, and dev environments, and manage and tune application and database machines (Linux).
- Performance tuning and SQL tuning; consolidations; migrations (expdp, XTTS, switchover, etc.); installation, patching, upgrades, Data Guard, shell scripting, backup and restore, Exadata management (cell management, upgrade, switchover), security management, and GoldenGate operations (Oracle to Oracle, Oracle to big data (HDFS, Kafka)).
- I have managed Oracle 10g/11g/12c databases (dev/test/prp/snap/prod/stby) on Linux/HP-UX/AIX/Solaris.
- My work processes follow ITIL, and I prepare automation for everything to reduce human resource requirements and routine work.
