Data Science Horizons

From John D. Cook • May 9, 2025

Large language models display emergent behaviors: when the parameter count is scaled to a certain value, suddenly the LLM is capable of performing a new task not possible at a...

How to Build Your Own Local AI: Create Free RAG and AI Agents with Qwen 3 and Ollama

From freeCodeCamp • May 9, 2025

The landscape of Artificial Intelligence is rapidly evolving, and one of the most exciting trends is the ability to run powerful Large Language Models (LLMs) directly on your local machine.

Ranked: The Most Visited Websites in the World

From Visual Capitalist • May 9, 2025

From Google to ChatGPT, we show the most visited websites in the world, a list dominated by search engines and social media giants.

How to Create Serverless AI Agents with Langbase Docs MCP Server in Minutes

From freeCodeCamp • May 9, 2025

Building serverless AI agents has recently become a lot simpler.

Update turns Google Gemini into a prude, breaking apps for trauma survivors

From The Register • May 9, 2025

"I'm sorry, I can't help with that."

How to Deploy Your LLM to Hugging Face Spaces

From KDnuggets • May 9, 2025

Showcase your LLM project with Streamlit and Hugging Face Spaces using Free CPU Instances.

3 Excellent Practical Generative AI Courses

From KDnuggets • May 8, 2025

Learn to build AI agents, fine-tune reasoning models, and master practical AI skills with these courses.

Feature Engineering at Scale: PySpark, Python & Snowflake

From Towards Data Science • May 8, 2025

Automate feature extraction from merchant sites at scale, leveraging async scraping and Snowpark to improve your ML models.

The New Shadow IT: LLMs in the Wild

From The New Stack • May 8, 2025

Why developers are spinning up AI behind your back — and how to detect it.

Fine-Tuning vLLMs for Document Understanding

From Towards Data Science • May 7, 2025

Learn how you can fine-tune language models for specific use cases.

Domain-Driven RAG: Building Accurate Enterprise Knowledge Systems Through Distributed Ownership

From InfoQ • May 7, 2025

Modular RAG applications enhance accuracy and relevancy by assigning ownership to dedicated domain experts.

Build a Python + ChatGPT-3.5 Chatbot in 10 Minutes

From The New Stack • May 7, 2025

Learn to build a simple chatbot using Python and OpenAI's API in just minutes, with code examples that help beginners.

Not everything needs an LLM: A framework for evaluating when AI makes sense

From VentureBeat • May 4, 2025

Question: What product should use machine learning? Project manager answer: Yes.

AI models routinely lie when honesty conflicts with their goals

From The Register • May 4, 2025

Keep plugging those LLMs into your apps, folks. This neural network told me it'll be fine.

Django Crash Course for Beginners

From freeCodeCamp • May 4, 2025

Django is a high-level web framework built with Python that encourages rapid development and clean, pragmatic design.

Will AI Ever Understand Language Like Humans?

From Quanta Magazine • May 3, 2025

AI may sound like a human, but that doesn’t mean that AI learns like a human.

OpenRouter: A Unified Interface for LLMs

From Team DSH • May 3, 2025

Podcast: Achieving Sustainable Mental Peace in Software Engineering with Help from Generative AI

From Team DSH • May 3, 2025

How Diffusion-Based LLM AI Speeds Up Reasoning

From Team DSH • May 3, 2025

Create a SQL REPL for JSON Files in Python

From Team DSH • March 13, 2025

When working with JSON data, it’s common to need quick exploratory queries without writing a full application. By combining Pandas for data handling, DuckDB for SQL querying, and a few...

How to Become a Data Engineer in 2025

From Team DSH • March 13, 2025

The role of a data engineer has evolved dramatically as organizations harness the power of massive datasets to drive insights and innovation. In 2025, aspiring data engineers are stepping...

A Comprehensive Overview of Prompt Engineering Techniques

From Team DSH • March 7, 2025

Prompt engineering has emerged as a critical discipline in the age of large language models (LLMs). With applications spanning from simple query answering to complex reasoning tasks, understanding how...

A Comprehensive Overview of RAG Strategies

From Team DSH • March 5, 2025

Retrieval-Augmented Generation (RAG) represents a paradigm shift in natural language processing by integrating external knowledge into language generation. By combining traditional language models with robust retrieval systems, RAG addresses...

A Practical Guide to Writing a Python Command Line Script

From Team DSH • January 22, 2025

Command line scripts are invaluable for data scientists. They allow you to package tasks — such as data cleaning, analysis, or reporting — into a simple, repeatable interface. Whether you’re...

A Practical Guide to Concurrency and Parallelism in Python

From Team DSH • January 13, 2025

Concurrency and parallelism are crucial concepts for anyone seeking to build efficient, performant applications in Python. From web servers handling thousands of simultaneous requests, to data processing pipelines handling large...

What is Data Science? A Beginner’s Guide

From Team DSH • June 21, 2024

Data science is an interdisciplinary field that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. At its...

Advanced File Handling in Python: Working with CSV, JSON, and XML

From Team DSH • May 17, 2024

File handling is a fundamental skill for data scientists, enabling them to efficiently store, retrieve, and manipulate data. Different file formats, such as CSV, JSON, and XML, are commonly...

Building Python CLI Applications: A Step-by-Step Tutorial

From Team DSH • May 17, 2024

Command Line Interfaces (CLIs) play a crucial role in automating tasks, scripting complex workflows, and streamlining processes. They are especially useful in data science and engineering, where repetitive tasks...

5 Tips for Writing Efficient Python Code for Data Analysis

From Team DSH • May 13, 2024

Python's standing as a favorite in the data science community is well-earned, thanks to its simplicity and the powerful libraries it supports. However, efficiency in coding becomes crucial as...

Why Normalization Matters in Data Science

From Team DSH • October 31, 2023

Data normalization is an indispensable process in the realm of data science, often treated as a preliminary yet crucial step. While the term might sound a bit arcane, especially...

Large Language Model Crash Course for Data Scientists

From Team DSH • October 16, 2023

In recent times, the realm of data science has been buzzing with the advent of Large Language Models (LLMs). These models, epitomized by their ability to understand and generate...

Python Decorators Unleashed [eBook]

From Team DSH • October 11, 2023

Python Power Programming is now part of Data Science Horizons! Unlock the power of Python with our newest ebook release, Python Decorators Unleashed: Harness the Power of Function and Class...

Understanding Data Pipelines: Design and Implementation

From Team DSH • October 10, 2023

Data pipelines form the backbone of modern data analytics, transforming raw data into actionable insights. These pipelines are sequences of data processing stages, each tasked with a specific function,...

The Power of Ensemble Learning: A Comprehensive Python Guide

From Team DSH • October 9, 2023

Ensemble learning harnesses the power of combining multiple machine learning models to generate a more formidable predictive model. The foundational idea of ensemble methods is that...

10 Must-Know Machine Learning Algorithms

From Team DSH • October 2, 2023

Machine learning powers the AI advancements we are all living through these days. It's flipping our world upside down, in mostly good...

NumPy Crash Course for Data Scientists

From Team DSH • September 26, 2023

The role of numerical computations in data science, machine learning, and scientific computing is paramount. NumPy, short for Numerical Python, serves as the cornerstone for numerical operations in Python....

Beautiful Soup Crash Course for Data Scientists

From Team DSH • September 6, 2023

Welcome to our comprehensive guide on Beautiful Soup, a powerful Python library designed for web scraping tasks. This library allows you to parse HTML and XML documents, creating a...

Performance Tuning in SQL: Tips and Techniques

From Team DSH • September 5, 2023

Efficiency and responsiveness are key aspects of any database system. In the world of SQL databases, performance tuning is not merely an option; it's a necessity. Whether it's a...

Building Scalable and Maintainable REST APIs for Data Services

From Team DSH • August 31, 2023

As applications become more data-driven, RESTful APIs have emerged as a popular way to build interfaces that enable diverse client apps to interact with backend data and services. Well-designed...

Database Normalization: A Practical Guide

From Team DSH • August 30, 2023

Database normalization is the process of organizing data in a database to reduce data redundancy and improve data integrity. This practical guide covers the basics of normalization, including the...

Understanding Data Sharding

From Team DSH • August 29, 2023

Data sharding is the process of dividing a large dataset into smaller, more manageable pieces called "shards." These shards are distributed across multiple servers or databases, allowing parallel processing...

spaCy Crash Course for Data Scientists

From Team DSH • August 28, 2023

Natural Language Processing (NLP) has evolved into one of the most vital domains of Artificial Intelligence, enabling machines to understand, interpret, and generate human language. Whether it's sentiment analysis,...

An Overview of Data Virtualization

From Team DSH • August 25, 2023

Data virtualization refers to the sophisticated technology that allows applications to retrieve and manipulate data without requiring technical details about the data's underlying physical location or...

Deploying a Data Engineering Project to Production: A Checklist

From Team DSH • August 24, 2023

Deploying a data engineering project from development to production can be challenging. There are many moving parts that need to come together: infrastructure, data pipelines, models, monitoring, and...

Is Feature Engineering a Dying Art?

From Team DSH • August 24, 2023

The importance of feature engineering is being questioned with the emergence of automated feature engineering tools. While promising, these tools still lack the creativity and contextual understanding a human data...

PyTorch: A Quick & Dirty Intro

From Team DSH • August 10, 2023

Embarking on a journey into the realm of machine learning and artificial intelligence can be a daunting task. While the internet is chock-full of resources, beginners often struggle to...

Docker Crash Course for Data Scientists

From Team DSH • August 8, 2023

Welcome to our Docker crash course designed specifically for data scientists. This tutorial takes you on a journey through the essential components of Docker, from the fundamental concepts to...

Handling Categorical Variables in scikit-learn: Strategies and Encoding Techniques

From Team DSH • August 8, 2023

Categorical variables, which take on a limited set of discrete values rather than a continuous numerical range, are very common in real-world data sets. Examples include gender, country, profession,...

An Overview of Feature Selection Techniques in scikit-learn

From Team DSH • July 26, 2023

Feature selection is a critical process in machine learning pipelines to improve model performance and generalization. It involves identifying and selecting the most relevant features in your dataset that...

Evaluating Classification Model Performance in scikit-learn

From Team DSH • July 25, 2023

Evaluating the performance of machine learning models is a critical part of the model building process. For classification models, there are a number of important metrics that can be...

Scikit-learn Crash Course for Data Scientists

From Team DSH • July 24, 2023

Machine learning has transformed the landscape of data science, providing powerful capabilities to build predictive models from data. As datasets grow larger and more complex, having scalable and easy-to-use...

Handling Imbalanced Datasets in scikit-learn: Techniques and Best Practices

From Team DSH • July 21, 2023

Imbalanced datasets, where the number of samples across different classes varies greatly, are very common in machine learning applications. Often, there are many more samples for...

Unsupervised Learning with scikit-learn: An Overview

From Team DSH • July 20, 2023

In a world overflowing with data, making sense of it all can seem daunting. Fortunately, unsupervised learning techniques offer a way to find structure and meaning...

Introduction to Ensemble Learning with scikit-learn

From Team DSH • July 19, 2023

Ensemble learning refers to combining multiple machine learning models to create a stronger overall model. The rationale is that by combining multiple models, the overall performance...

Exploring Explainable AI: Reasons & Techniques for Interpreting Black Box Models

From Team DSH • July 18, 2023

In the world of artificial intelligence (AI), the concept of black box models has long presented a fundamental challenge. These models, often based on complex machine learning techniques, can...

The Democratization of Data Science: The Impact and Promise of Large Language Models

From Team DSH • July 18, 2023

Over the last decade, the field of natural language processing (NLP) has observed an impressive evolution, culminating in the emergence of Large Language Models (LLMs). These advanced models, trained...

Pandas Crash Course for Data Scientists

From Team DSH • July 17, 2023

As data has exploded in volume and complexity in the modern world, the need for powerful yet easy-to-use data analysis tools is greater than ever. Python has become a...

SQL Crash Course for Data Scientists

From Team DSH • July 10, 2023

Welcome to our SQL crash course designed specifically for data scientists. This tutorial takes you on a journey through the essential components of SQL, from the fundamental to the...

Understanding Cross-Validation in scikit-learn: A Practical Guide

From Team DSH • July 6, 2023

Machine learning (ML) has rapidly become a key technique in a myriad of applications, from predicting stock prices to diagnosing diseases. An essential part of the machine learning workflow...

Data Visualization in Python: Creating Stunning Plots with Matplotlib

From Team DSH • July 4, 2023

Python, an open-source, general-purpose programming language, has become a favorite tool among data scientists and analysts due to its simplicity and vast library ecosystem. One of the libraries, Matplotlib, is...

Mastering Generative AI Text Prompts [eBook]

From Team DSH • July 3, 2023

Are you curious about the exciting possibilities of generative AI text prompts? Look no further! We're thrilled to introduce our free ebook, Mastering Generative AI Text Prompts: A Practical Guide...

Essential MLOps: What You Need to Know for Successful Implementation [eBook]

From Team DSH • June 30, 2023

In today's fast-paced, data-driven world, machine learning has become an indispensable tool for businesses across various industries. However, as the complexity of machine learning models and the volume of data...

Mastering Generative AI and Prompt Engineering: A Practical Guide for Data Scientists [eBook]

From Team DSH • June 29, 2023

Are you a data scientist looking to unlock the full potential of artificial intelligence (AI) in your work? The field of AI has evolved significantly, and two essential components driving...

Unleashing the Power of XGBoost for Machine Learning

From Team DSH • June 28, 2023

In the rapidly evolving field of data science, practitioners continually search for tools and techniques to extract meaningful insights from data. One of the most popular and potent algorithms in...

A Guide to Grid Search and Random Search for Hyperparameter Tuning

From Team DSH • June 27, 2023

As machine learning practitioners, one critical aspect we often grapple with is tuning the hyperparameters of our models. A delicate balance of these hyperparameters is essential to maximize the performance...

Data Preparation with Python: Dealing with Outliers

From Team DSH • June 27, 2023

This is an excerpt from our latest ebook Data Cleaning and Preprocessing for Data Science Beginners. Outliers are unusual observations that significantly differ from the rest of the data. While...

Data Cleaning and Preprocessing for Data Science Beginners [eBook]

From Team DSH • June 26, 2023

Are you eager to dive into the exciting world of data science, but unsure where to start? Well, we've got a fantastic resource just for you — a comprehensive free...

10 Underrated Soft Skills for Data Scientists

From Team DSH • June 22, 2023

As the field of data science continues to grow and evolve, the demand for skilled data scientists remains high. While technical abilities are undoubtedly important, soft skills – the...

Scikit-Learn for Data Standardization and Normalization

From Team DSH • June 21, 2023

Data standardization and normalization are essential preprocessing steps in machine learning. These techniques transform the input data to a consistent format and range, which can improve the accuracy of the...

A Gentle Introduction to AutoML with Auto-WEKA

From Team DSH • June 20, 2023

AutoML, short for automated machine learning, is a process of automating the development of machine learning models. AutoML has gained significant popularity in recent years, owing to its ability to...

Introduction to Scikit-learn: A Beginner’s Guide

From Team DSH • June 19, 2023

Scikit-learn is an open-source Python library that provides a wide range of simple and efficient tools for machine learning, data mining, and data analysis. Developed by a diverse team...

Introduction to Platform Engineering: Exploring Key Concepts, Principles, and Benefits

From Team DSH • June 17, 2023

The digital age has brought forth a plethora of advancements in software and technology. Amid these, a unique and pivotal discipline has emerged: platform engineering. This burgeoning field converges...

Thinking Fast & Slow: Tests for Large Language Models Like ChatGPT-4

From Team DSH • June 16, 2023

The advent of artificial intelligence and machine learning has brought about a significant shift in various sectors, particularly data science. Large Language Models (LLMs), like OpenAI's ChatGPT-4, are not only...

13 Prompt Engineering Tips

From Team DSH • June 16, 2023

As we continue to explore the capabilities of artificial intelligence, particularly in natural language processing, an important aspect arises: how we communicate with these models. Specifically, with OpenAI's GPT-4 and...

smol ai developer: Text to Codebase

From Team DSH • June 14, 2023

In a world increasingly driven by automation, it's not uncommon to find tools that make developers' lives easier. Yet, the idea of a tool that can help generate a whole...

OLTP vs OLAP: Key Differences, Use Cases, and Database Engine Overviews

From Team DSH • June 14, 2023

In the vast landscape of data management and processing, two categories of systems stand out for their critical roles: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP). These...

Getting Started with Weka

From Team DSH • June 13, 2023

Machine learning and data mining have been revolutionizing numerous sectors, from healthcare to finance. With the increasing importance of data, we require powerful tools to make sense of it all....

The Emergent Abilities of Large Language Models: Mirage or Milestone?

From Team DSH • June 12, 2023

In the realm of artificial intelligence (AI), the concept of emergent abilities in Large Language Models (LLMs) has been a topic of fervent discussion. As LLMs continue to evolve, they...

Understanding Bias in Data Science

From Team DSH • June 9, 2023

The exciting journey into data science starts with a promise of the power and potential in data, but there is an essential facet of this field that often gets less...

A Beginner’s Guide to Feature Engineering with Python: Creating Relevant Features

From Team DSH • June 8, 2023

Feature engineering is a crucial aspect of the machine learning process. It involves creating new features or transforming existing ones to enhance the performance of a model. In this tutorial,...

3 Python Tips that Machine Learning Engineers Should Know

From Team DSH • June 7, 2023

As a machine learning engineer, you know how crucial it is to have a go-to language that’s capable of handling large datasets and has a multitude of libraries and frameworks....

Bias-Variance Trade-off: Modern Relevance

From Team DSH • June 6, 2023

As the field of data science continues to expand and mature, one key concept that remains at the forefront of its success is the bias-variance tradeoff. This fundamental principle is...

Prompt Engineering for ELIZA [eBook]

From Team DSH • June 5, 2023

Data Science Horizons is proud to announce our latest ebook, Prompt Engineering for ELIZA, an application of modern prompt engineering strategies to what amounts to a prehistoric rule-based chatbot. Are...

Exploratory Data Analysis (EDA) Techniques: A Step-by-Step Tutorial with Python

From Team DSH • June 2, 2023

Exploratory Data Analysis (EDA) is an essential step in any data science project. It involves examining and understanding the data before diving into modeling and analysis. In this tutorial, we...

Data Manipulation in Python: Working with Databases and CSV Files

From Team DSH • May 31, 2023

Data manipulation is a crucial skill in the world of data science. In this tutorial, we will explore how to work with databases and CSV files using Python, a popular...

Statistical Paradoxes for Data Scientists

From Team DSH • May 30, 2023

In the realm of data science, comprehending the intricacies of statistics is essential to accurately interpreting and conveying findings. One particularly challenging aspect for data scientists is navigating statistical paradoxes...

Ensuring Reproducibility in Data Science

From Team DSH • May 30, 2023

These days, organizations of all sizes and varieties use data science to make informed decisions and to gain a competitive advantage. As a result, data-driven decision-making has become an integral...

Regression to Random Forests: A Concise Guide to Predictive Modeling Techniques

From Team DSH • May 24, 2023

In today's data-driven world, predictive modeling has become an essential tool for businesses and researchers alike. By analyzing historical data and identifying patterns, predictive models can help us make informed...

The Data Lakehouse Walkthrough

From Team DSH • May 22, 2023

What is a Data Lakehouse? A data lakehouse is a novel approach to data storage and management that merges the benefits of a data warehouse and a data lake. It...

Transitioning from Software Engineering to AI Engineering: A Comprehensive Guide

From Team DSH • May 19, 2023

Moving from the realm of traditional software engineering into the fast-paced, evolving landscape of artificial intelligence (AI) engineering can be both an exciting and intimidating journey. With constant developments and...

Simplifying the Attention Mechanism in LLMs

From Team DSH • May 18, 2023

As machine learning engineers, we continually strive to push the boundaries of what is possible with artificial intelligence. One of the most recent advancements in this field is the development...

It Ain’t Origami: K-Fold Cross-Validation with Scikit-learn

From Team DSH • May 18, 2023

Evaluating the performance of a model is a critical step when working on machine learning classification tasks. One of the most widely used methods for model evaluation is k-fold cross-validation....

Getting Docker Up and Running

From Team DSH • May 17, 2023

For developers who run applications on multiple platforms and environments, Docker provides an ideal platform. Docker enables users to build, run, and deploy applications in a containerized environment that is...

Navigating the Data Engineering Landscape: Essential Practices and Tools You Should Be Familiar With

From Team DSH • May 17, 2023

The field of data engineering has witnessed remarkable advancements in recent years. As the volume, velocity, and variety of data generated continue to increase, it is crucial...

Demystifying Hyperparameter Tuning in MLOps

From Team DSH • May 15, 2023

Hyperparameters are adjustable settings in machine learning algorithms that control the model's behavior during training. Unlike model parameters, which are learned from the data during training, hyperparameters are set before...

Chatbot Progression: From ELIZA to ChatGPT

From Team DSH • May 15, 2023

The field of natural language processing (NLP) has experienced a remarkable progression in the development of chatbots, evolving from the early rule-based systems exemplified by ELIZA to the state-of-the-art deep...

Thinking Clearly: A Data Scientist’s Guide to Understanding Cognitive Biases [eBook]

From Team DSH • May 10, 2023

Are you a data scientist looking to enhance your decision-making and analytical skills? Are you interested in understanding how cognitive biases can impact your work and personal life? If so,...

The Importance of Data Storage Solution Selection in Data Engineering

From Team DSH • May 9, 2023

In today's increasingly digital world, the importance of data cannot be overstated. With more and more data being generated every day, organizations are recognizing the critical role that the selection...

The Role of MLOps in Large Language Models

From Team DSH • May 9, 2023

Large Language Models (LLMs) have brought about a paradigm shift in the field of natural language processing (NLP), opening up innovation in new NLP applications. Despite their potential, developing and...

Docker vs Kubernetes: An Overview

From Team DSH • May 9, 2023

The world of software development has been transformed with the advent of Docker and Kubernetes, two technologies that have revolutionized the way applications are deployed, tested, and managed. While both...

The Psychology of Prompt Engineering [eBook]

From Team DSH • May 8, 2023

As artificial intelligence and data science continue to advance at a rapid pace, prompt engineering has emerged as a crucial component for developing effective and engaging interactions between humans and...

10 Practical Python Programming Tricks: Boost Your Efficiency and Code Quality [eBook]

From Team DSH • May 4, 2023

Are you a Python programmer looking to enhance your skills? Then you'll love our new book, 10 Practical Python Programming Tricks: Boost Your Efficiency and Code Quality! We've covered 10...

(The Sometimes Thin Line Between) Data Engineering and MLOps

From Team DSH • May 1, 2023

In today's rapidly evolving data and AI landscape, two disciplines have emerged as critical components for building and maintaining data-driven systems: Data Engineering and MLOps. Although they serve different purposes...
