Recent Posts

September, 2021

  • 7 September

    DATA WAREHOUSE – OLTP/OLAP


    OLTP (Online Transaction Processing): By the mid-1970s, online transaction processing (OLTP) made even faster access to data possible, opening whole new vistas for business and processing. Shortly after the advent …


July, 2021

February, 2021

November, 2020

  • 17 November

    PySpark Machine Learning (PySpark ML Classification: Decision Tree)

    PySpark Machine Learning (PySpark ML Classification): Hello, we are continuing our PySpark series. In this post we will work through an example using Decision Tree, one of the classification algorithms. Before moving on to this post, you should read the previous one. …

  • 17 November

    PySpark Machine Learning (PySpark ML Classification: Preparing)

    PySpark Machine Learning (PySpark ML Classification): Hello, we are continuing our PySpark series. In this post we will develop an ML model using PySpark. Before moving on to this post, the previous post …

  • 16 November

    PySpark Machine Learning

    PySpark Machine Learning: Hello, in this series of posts we will build ML applications using PySpark. We can think of PySpark as a collaboration between Python and Spark: it makes it possible to develop on Spark using the Python language. For the Spark installation …


October, 2020

  • 21 October

    DATA WAREHOUSE – ETL/ELT

    What is ETL/ELT? ETL or ELT is not the name of a piece of software; it is the most important and most complex stage in building a data warehouse. ETL (Extract, Transform, Load) …

  • 20 October

    Advanced RDD Actions

    Advanced RDD Actions: the reduce() action. The reduce(func) action is used to aggregate the elements of a regular RDD. The function should be commutative (changing the order of the operands does …

  • 14 October

    PySpark RDD Example

    PySpark RDD Example: Hello, in this post we will work through two short examples using reduceByKey() and sortByKey(). Rdd = sc.parallelize([(1,2), (3,4), (3,6), (4,5)]) # Apply reduceByKey() operation on …

  • 13 October

    Introduction to PySpark RDD

    Introduction to PySpark RDD: In this chapter, we will start with RDDs, which are Spark’s core abstraction for working with data. What is an RDD? RDD = Resilient Distributed Datasets …

  • 13 October

    Introduction to Big Data analysis with Spark

    Hello, in this series of articles we will introduce Spark. Spark applications can be developed in many programming languages; in this series we will use Python. Introduction to …

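
The October posts above on the reduce() action and on reduceByKey()/sortByKey() describe PySpark RDD operations; in PySpark itself they are called as rdd.reduce(lambda x, y: x + y), rdd.reduceByKey(lambda x, y: x + y) and rdd.sortByKey(ascending=False). Because those calls need a running SparkContext, the following is only a plain-Python, single-machine emulation of the same semantics, not PySpark code. The helpers rdd_reduce, reduce_by_key and sort_by_key are hypothetical names, a list of lists stands in for an RDD's partitions, and summation is assumed as the combining function. It uses the same input pairs as the PySpark RDD Example post, and it also illustrates why Spark expects the reduce() function to be commutative and associative: each partition is reduced on its own and the partial results are then combined, so an operation like subtraction gives answers that depend on how the data happened to be split.

```python
from functools import reduce


def rdd_reduce(partitions, func):
    """Hypothetical emulation of RDD.reduce(): reduce each partition locally,
    then combine the per-partition results with the same function."""
    partials = [reduce(func, part) for part in partitions if part]
    return reduce(func, partials)


def reduce_by_key(pairs, func):
    """Hypothetical emulation of RDD.reduceByKey(): merge all values that share a key."""
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


def sort_by_key(pairs, ascending=True):
    """Hypothetical emulation of RDD.sortByKey(): order (key, value) pairs by key."""
    return sorted(pairs, key=lambda kv: kv[0], reverse=not ascending)


numbers = [1, 2, 3, 4, 5, 6]

# Addition is commutative and associative: the partition split does not matter.
print(rdd_reduce([numbers[:3], numbers[3:]], lambda x, y: x + y))  # 21
print(rdd_reduce([numbers[:2], numbers[2:]], lambda x, y: x + y))  # 21

# Subtraction is neither, so the result changes with the partitioning.
print(rdd_reduce([numbers[:3], numbers[3:]], lambda x, y: x - y))  # 3
print(rdd_reduce([numbers[:2], numbers[2:]], lambda x, y: x - y))  # 11

# Same pairs as in the "PySpark RDD Example" post, assuming summation per key.
pairs = [(1, 2), (3, 4), (3, 6), (4, 5)]
summed = reduce_by_key(pairs, lambda x, y: x + y)
print(summed)                               # [(1, 2), (3, 10), (4, 5)]
print(sort_by_key(summed, ascending=False))  # [(4, 5), (3, 10), (1, 2)]
```

In real Spark the output order of reduceByKey() is not guaranteed, which is one reason it is commonly followed by sortByKey() as in the post.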

September, 2020