Hi everyone! In this article we will see how to load a dataset downloaded from Kaggle into a Hive table.
Hive is not a database. It provides SQL capabilities by defining metadata over files stored in HDFS. Long story short, it makes HDFS files queryable with SQL.
I’m doing it on the virtual machine I downloaded from Cloudera’s site: https://www.cloudera.com/downloads/quickstart_vms/5-13.html
First, download a dataset from Kaggle.
Let’s try this one, african_crises.csv.
Move the downloaded dataset to the virtual machine with a program such as WinSCP or FileZilla.
Connect to the virtual machine and start working. As you can see in the picture, the file we copied over is now on the virtual machine.
Let’s transfer this file to the Hadoop file system.
hadoop fs -copyFromLocal african_crises.csv /data
hadoop fs -ls /data
Now we will load this CSV file into a table we will create.
You can do this via the Hive shell or Hue; the steps are the same either way.
Since the screenshots are easier to follow, let’s perform this process in Hue.
After opening Hue via the web interface, navigate to the location indicated by the arrow.
I took the table’s column names from the CSV file’s header and set the data types accordingly.
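The CREATE TABLE statement might look like the following sketch. The column names and types below are assumptions based on the african_crises dataset’s header (I renamed the `case` column to `case_number` to avoid the reserved word); verify them against your own file before running it.

```sql
-- Sketch: column names/types assumed from the Kaggle african_crises
-- CSV header; adjust to match your file.
CREATE TABLE african_crises (
  case_number INT,
  cc3 STRING,
  country STRING,
  year INT,
  systemic_crisis INT,
  exch_usd DOUBLE,
  domestic_debt_in_default INT,
  sovereign_external_debt_default INT,
  gdp_weighted_default DOUBLE,
  inflation_annual_cpi DOUBLE,
  independence INT,
  currency_crises INT,
  inflation_crises INT,
  banking_crisis STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");
```

The `skip.header.line.count` table property tells Hive to ignore the first line of the file, so the CSV header row does not end up as a data row in the table.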
Now we are ready to load the data from the CSV file into the table.
load data local inpath '/home/cloudera/african_crises.csv' overwrite into table african_crises;
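Note that LOAD DATA LOCAL INPATH reads from the local filesystem of the machine running the command, not from HDFS. If you instead want to load the copy we placed in HDFS earlier, drop the LOCAL keyword — a sketch, assuming the file sits at /data/african_crises.csv in HDFS:

```sql
-- Loads from HDFS rather than the local filesystem; Hive moves the
-- file into the table's warehouse directory.
LOAD DATA INPATH '/data/african_crises.csv'
OVERWRITE INTO TABLE african_crises;
```

Either way, OVERWRITE replaces any data already in the table; omit it to append instead.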
Check the table,
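A quick sanity check can be done with a couple of simple queries (these use no assumed column names, just the table itself):

```sql
-- Confirm rows were loaded and eyeball the first few records.
SELECT COUNT(*) FROM african_crises;
SELECT * FROM african_crises LIMIT 10;
```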
Everything looks fine.
See you in the next article.