Clustering Wikipedia Hi, in this article I’ll make a simple clustering example using Wikipedia. You …
Read More »
Recent Posts
February, 2021
- 1 February
Kafka Stream API Json Parse
Kafka Stream API Json Parse Hello, in this article, I will talk about how to process data arriving on a Kafka queue with the Kafka Streams API. We can send data from various …
Read More »
November, 2020
- 17 November
PySpark Machine Learning (PySpark ML Classification Decision Tree)
PySpark Machine Learning (PySpark ML Classification) Hello, we continue our PySpark articles. In this article we will work through an example with Decision Tree, one of the classification algorithms. Before moving on to this article, you should read the previous one. …
Read More »
- 17 November
PySpark Machine Learning (PySpark ML Classification Preparing)
PySpark Machine Learning (PySpark ML Classification) Hello, we continue our PySpark articles. In this article we will develop an ML model using PySpark. Before moving on to this article, the previous one …
Read More »
- 16 November
PySpark Machine Learning
PySpark Machine Learning Hello, in this series of articles we will build ML applications using PySpark. We can think of PySpark as a collaboration between Python and Spark. It makes it possible to develop on Spark in the Python language. For Spark installation …
Read More »
October, 2020
- 21 October
DATA WAREHOUSE – ETL/ELT
What is ETL / ELT? ETL or ELT is not a software abbreviation. It is the most important and most complex stage of the data warehouse. ETL (Extract, Transform, Load) …
Read More »
- 20 October
Advanced RDD Actions
Advanced RDD Actions reduce() action The reduce(func) action is used for aggregating the elements of a regular RDD. The function should be commutative (changing the order of the operands does …
Read More »
- 14 October
PySpark RDD Example
PySpark RDD Example Hello, in this post we will work through two short examples using reduceByKey and sortByKey. Rdd = sc.parallelize([(1,2), (3,4), (3,6), (4,5)]) # Apply reduceByKey() operation on …
Read More »
- 13 October
Introduction to PySpark RDD
Introduction to PySpark RDD In this chapter, we will start with RDDs, which are Spark’s core abstraction for working with data. What is RDD RDD = Resilient Distributed Datasets …
Read More »
- 13 October
Introduction to Big Data analysis with Spark
Hello, we’ll be introducing Spark in this series of articles. Spark applications can be developed in many programming languages. We will use Python in our series of articles. Introduction to …
Read More »
September, 2020
- 25 September
Analyzing Social Media Data in Python -1
Analyzing Social Media Data in Python Welcome to analyzing social media data with Python. In this tutorial series we’re going to analyze Twitter data using Python. There are millions of …
Read More »
- 20 September
Clustering Wikipedia
Clustering Wikipedia Hi, in this article I’ll make a simple clustering example using Wikipedia. You can access the full code here: https://drive.google.com/drive/folders/1FKAqwAvaSmEt0jzL3lHu5qQGEcw4FQGS?usp=sharing # Perform the necessary imports from sklearn.decomposition import TruncatedSVD …
Read More »
- 19 September
Dimension reduction with PCA | Python Unsupervised Learning -6
Dimension reduction with PCA Dimension reduction represents the same data using fewer features and is vital for building machine learning pipelines using real-world data. PCA performs dimension reduction by …
Read More »
- 18 September
Introduction of DATA WAREHOUSE-What is DATA WAREHOUSE?
What is the Data Warehouse? A data warehouse is a repository in which related data can be queried and analyzed. The data warehouse has been created in order …
Read More »
- 18 September
Dimension reduction | Python Unsupervised Learning -5
Hello, in this article, we continue the topic of Unsupervised Learning.
Read More »
- 13 September
t-SNE visualization | Python Unsupervised Learning -4
t-SNE visualization of grain dataset I will make a short example about t-SNE in this article. from sklearn.manifold import TSNE import pandas as pd import numpy samples =[[15.26 , 14.84 …
Read More »
- 13 September
Introduction of DATA WAREHOUSE-What is DATA?
What is Data? This highly popular word, data, refers to every letter, number, or date entered into the computers we use as technology and …
Read More »
- 13 September
Oracle XE Installation on Hortonworks Data Flow (HDF)
Oracle XE Installation on Hortonworks Data Flow (HDF) Hi, in this article, I will show you how to install Oracle Express Edition (XE) on HDF (Hortonworks Data Flow). First of …
Read More »
- 9 September
Apache Nifi on Google Cloud
Apache Nifi on Google Cloud Hello, in this article I will explain how to install Apache Nifi on Google Cloud. First, you have to create a Google Cloud account. I …
Read More »
- 8 September
Introduction to gensim (Python)
What is gensim? Popular open-source NLP library Uses top academic models to perform complex tasks Building document or word vectors Performing topic identification and document comparison A word embedding or …
Read More »
- 7 September
Introduction to Natural Language Processing in Python – (Simple text preprocessing)
Why preprocess? Helps make for better input data When performing machine learning or other statistical methods Examples: Tokenization to create a bag of words Lowercasing words Lemmatization/Stemming Shorten words …
Read More »
- 7 September
Introduction to Natural Language Processing in Python – (Words counts with bag-of-words )
Bag-of-words Bag of words is a very simple and basic method for finding topics in a text. For bag of words, you need to first create tokens using tokenization, and …
Read More »
- 7 September
Transforming Features For Better Clustering | Python Unsupervised Learning -3
Hi, we continue where we left off on Unsupervised Learning. I recommend that you read our previous article before moving on to this article. Python Unsupervised Learning -2 Transforming …
Read More »
- 6 September
Evaluating a Clustering | Python Unsupervised Learning -2
Hi, in this article, we continue where we left off from the previous topic. If you haven’t read the previous article, you can find it here. Python Unsupervised Learning -1 …
Read More »
- 6 September
k-means clustering | Python Unsupervised Learning -1
k-means clustering | Python Unsupervised Learning -1 In this series of articles, I will explain the topic of Unsupervised Learning and work through examples of it. Unsupervised learning is a class …
Read More »
August, 2020
- 12 August
Data Warehouse Architectures
Data Warehouse Architectures I would like to talk about the two most important models of Data Warehouse architecture. These are the Bill Inmon and Kimball models. I will not …
Read More »