The Best Free Time Series Analysis Software
Building a powerful time-series stack is about more than just picking a database. You need a complete ecosystem of tools that work together seamlessly. This includes finding the best database for time series data, a platform for insightful dashboards, and libraries for forecasting and anomaly detection. While there's plenty of free time series analysis software to choose from, piecing it all together is the real challenge. Think of this guide as your cheat sheet. We’ll break down the top open-source options in each category, helping you build a cohesive system that turns raw data into real intelligence.
Key takeaways
- Select your database with a specific job in mind: There's no single best database for every task. Align your choice with your primary goal, whether it's real-time monitoring with Prometheus, high-frequency analysis with InfluxDB, or managing IoT data with TimescaleDB.
- Look beyond the database to build a full toolkit: Storing data is just the beginning. To get real value, you need a complete system that includes tools for visualization, forecasting, and data processing to turn raw numbers into actionable insights.
- Focus on long-term health and performance: A set-it-and-forget-it approach won't work. Consistently optimize your system by improving query performance, setting clear data retention policies, and monitoring its health to ensure your data is always secure, reliable, and ready for analysis.
What is time-series analysis and why should you care?
At its core, time-series analysis is the practice of studying data points collected in chronological order. Think of it like looking at a chart of your website’s daily traffic over the past year or tracking a stock's price minute by minute. By examining this data over time, you can spot trends, identify seasonal patterns, and make informed predictions about what might happen next. It’s a powerful way to turn historical data into a roadmap for the future.
So, why does this matter for your business? Organizations are collecting a massive amount of time-stamped data from countless sources—think application logs, IoT sensor readings, financial transactions, and user engagement metrics. This constant stream of information is a goldmine, but only if you have the right tools to analyze it. Regular databases often struggle to handle the sheer volume and velocity of this data. Time-series analysis gives you the framework to make sense of these complex datasets, helping you move from simply collecting data to using it for automated decision-making and forecasting. It's the key to understanding performance, anticipating customer needs, and staying ahead of potential issues before they become critical problems.
How top industries put time-series analysis to work
The applications for time-series analysis are incredibly diverse because nearly every industry generates data over time. In finance, traders use it to analyze market fluctuations and predict stock price movements. E-commerce and retail companies rely on it to forecast product demand and manage inventory, ensuring popular items are always in stock. For tech companies, it’s essential for DevOps and IT operations, where teams monitor system performance and server health in real time to prevent outages. In manufacturing, data from machinery sensors can predict when a part might fail, allowing for proactive maintenance that saves time and money.
Why go with open-source tools?
When you're ready to get started, you’ll find both proprietary and open-source tools available. Open-source solutions offer some compelling advantages, especially for businesses that need flexibility and control. These tools are crucial for efficiently managing and analyzing time-based data, giving you a clear view of real-time information. Because the code is public, you get full transparency into how your data is being handled. Plus, a vibrant community means more support, faster innovation, and a wealth of integrations. This makes it easier to combine your time-series data with other business intelligence systems, creating a more complete picture of your operations without being locked into a single vendor’s ecosystem.
IN DEPTH: Time-series analysis, built with Cake
The case against commercial software
Let's be real: the biggest hurdle with commercial software is often the price tag. Many powerful proprietary tools are locked behind expensive licenses, which can be a major barrier for startups, smaller teams, or anyone just looking to experiment with a new project. Beyond the cost, there’s the issue of control. Commercial software can sometimes feel like a black box, giving you limited visibility into how your data is being processed. This lack of transparency can be a dealbreaker for businesses that need full control over their data pipelines. It also creates a risk of vendor lock-in, making it difficult and costly to switch to a different solution if your needs change down the road. Open-source tools give you the freedom to inspect, modify, and integrate your stack exactly how you see fit.
Understanding the trade-offs of free tools
Of course, open-source tools aren't entirely "free"—the cost just shifts from licensing fees to internal resources. When you build your own stack, you're responsible for the setup, maintenance, and troubleshooting. When something breaks, there's no dedicated support line to call; you'll be relying on community forums and your team's own expertise. Free tools may also lack some of the polished, out-of-the-box features of their paid counterparts, like advanced automation or built-in reporting. Getting these powerful components to work together seamlessly requires careful configuration, and a proper setup is critical for getting reliable results. This is often where the DIY approach gets tricky, and it’s why many teams turn to managed platforms to handle the infrastructure, so they can focus on the analysis itself.
What to look for in a great time-series tool
Choosing the right tool for time-series analysis isn't just about picking the one with the most features. It's about finding the one that fits your specific needs for collecting, storing, and analyzing data. A great tool should feel like a natural extension of your team, making complex data easy to work with. When you're evaluating your options, focus on four core areas: how the tool handles data ingestion and storage, the power and usability of its query language, its ability to scale with your needs, and the strength of its community and documentation. These pillars will determine how effectively you can turn your time-stamped data into meaningful insights for your AI projects.
How does it handle data ingestion and storage?
The foundation of any time-series toolkit is its ability to handle data that changes over time. You need a specialized database, often called a Time-Series Database (TSDB), designed specifically to manage this type of information. Think of data points collected at regular intervals, like server CPU usage, stock prices, or sensor readings from IoT devices. Your tool must be able to ingest this data quickly and store it efficiently. Open-source tools are particularly effective for managing and analyzing this kind of time-based data, providing the flexibility and control needed to build a robust data pipeline without being locked into a proprietary system.
How quickly can you query your data?
Once your data is stored, you need an intuitive way to ask questions of it. The best tools offer a query language that is both powerful and easy to learn. This lowers the learning curve and allows your team to start getting insights faster. Performance is just as important. Your queries should return results quickly, even when you're sifting through massive datasets. A slow query language can create bottlenecks and hinder your ability to make timely, data-driven decisions.
Can it grow with you and connect to your stack?
As your business grows, so will your data. A great time-series tool must be able to scale without slowing down or becoming prohibitively expensive. Look for features like data compression, which can significantly reduce storage costs. For example, some databases can compress time-series data by up to 90%, which is a game-changer for long-term data retention. The tool should also integrate smoothly with the other systems in your tech stack, from data visualization platforms to machine learning frameworks. This ensures that your time-series data can be used across your entire organization to drive different AI initiatives.
What kind of support can you expect?
When you choose an open-source tool, you're also choosing its community. A vibrant, active community is one of the most valuable features you can find. It means you'll have access to a wealth of knowledge through forums, tutorials, and user-contributed code. When you run into a problem, chances are someone else has already solved it. Look for projects with strong activity on platforms like GitHub, indicated by a high number of stars and forks. Clear, comprehensive documentation is also non-negotiable. It’s your primary guide for setup, configuration, and troubleshooting, making it essential for your team's success.
Our picks for the best open-source time-series databases
Choosing the right database is a critical first step in building a solid foundation for your time-series analysis. The best choice really depends on what you’re trying to accomplish. Are you monitoring live systems, tracking millions of IoT sensors, or analyzing high-frequency financial data? Each scenario has unique demands for speed, scale, and querying capabilities.
To help you find the perfect fit, I’ve broken down four of the best open-source databases, each excelling in a different area. Think of this as your cheat sheet for matching the right tool to your specific project needs.
Best for real-time monitoring
If your main goal is to keep a close eye on systems and get alerts the moment something goes wrong, Prometheus is an excellent choice. It’s a powerful open-source monitoring and alerting toolkit designed specifically for the job. Prometheus collects and stores your data with timestamps, making it incredibly efficient for real-time analysis. It’s a favorite among developers and system administrators because it’s built to handle huge volumes of data without breaking a sweat. Plus, it integrates beautifully with visualization tools like Grafana, allowing you to create clear, insightful dashboards that show you exactly what’s happening with your systems at any given moment.
Best for high-frequency data analysis
When you’re dealing with data that comes in fast and furious—think signals from IoT devices or application performance metrics—you need a database built for speed. InfluxDB is a time-series database created specifically for handling high-frequency data. Its entire architecture is optimized for extremely fast writes and queries, which is essential when you’re collecting, storing, and analyzing data in real time. This focus on performance means you can get immediate insights from rapidly changing datasets without the lag you might experience with a more general-purpose database. If your application lives and dies by its ability to process data streams instantly, InfluxDB is definitely one to look at.
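To make that concrete, here's a minimal sketch using the official Python client for InfluxDB 2.x (influxdb-client). The URL, token, org, and bucket names are placeholders you'd swap for your own setup:

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection details for a local InfluxDB 2.x instance.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

# Write a single CPU reading, tagged by host.
write_api = client.write_api(write_options=SYNCHRONOUS)
point = Point("cpu").tag("host", "server-01").field("usage_percent", 63.2)
write_api.write(bucket="metrics", record=point)

# Read the last hour of CPU readings back out with a Flux query.
query_api = client.query_api()
tables = query_api.query(
    'from(bucket: "metrics") |> range(start: -1h) '
    '|> filter(fn: (r) => r._measurement == "cpu")'
)
for table in tables:
    for record in table.records:
        print(record.get_time(), record.get_value())
```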
Best for distributed systems
What happens when your data grows to a massive scale? That’s where OpenTSDB shines. It’s designed from the ground up for large-scale, distributed time-series data storage and analysis. OpenTSDB can efficiently manage billions of data points in real time, making it a great fit for applications that require serious scalability, like large IoT networks or extensive monitoring systems. It leverages the power of Hadoop for its big data capabilities, allowing it to store and retrieve huge amounts of distributed time-series data effectively. If you know your project is going to generate an enormous amount of data across many machines, OpenTSDB provides the robust, scalable foundation you’ll need.
Best for IoT data management
If you’re working with IoT data but want the familiarity and power of SQL, Tiger Data (formerly TimescaleDB) is a fantastic option. It’s actually an extension of PostgreSQL, which means it combines the rock-solid reliability of a traditional relational database with the high performance needed for time-series workloads. This makes it especially well-suited for managing IoT data, as it can handle large volumes of incoming information while still letting you use standard SQL to run queries. For teams that already have experience with PostgreSQL, this makes the learning curve much gentler and allows you to get started without having to learn a whole new query language.
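To give you a feel for how familiar this is, here's a rough sketch from Python using psycopg2. The connection string and table name are placeholders, but create_hypertable is the TimescaleDB call that turns an ordinary PostgreSQL table into a time-partitioned one:

```python
import psycopg2

# Placeholder connection string; adapt to your own PostgreSQL + TimescaleDB setup.
conn = psycopg2.connect("postgresql://postgres:password@localhost:5432/iot")
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sensor_readings (
            time        TIMESTAMPTZ NOT NULL,
            device_id   TEXT        NOT NULL,
            temperature DOUBLE PRECISION
        );
    """)
    # Turn the plain table into a hypertable partitioned on the time column.
    cur.execute(
        "SELECT create_hypertable('sensor_readings', 'time', if_not_exists => TRUE);"
    )
    cur.execute("INSERT INTO sensor_readings VALUES (now(), 'device-42', 21.7);")
    # Plain SQL still works: average temperature per device over the last day.
    cur.execute("""
        SELECT device_id, avg(temperature)
        FROM sensor_readings
        WHERE time > now() - INTERVAL '1 day'
        GROUP BY device_id;
    """)
    print(cur.fetchall())
```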
How to visualize and monitor your data effectively
Collecting time-series data is just the first step. To get any real value from it, you need to see what’s happening. That’s where visualization and monitoring tools come in. They transform endless streams of numbers into clear, actionable insights that help you spot trends, identify problems, and make smarter decisions. Without them, you’re essentially flying blind. The right tools don’t just show you data; they tell you a story about the health and performance of your systems.
Two of the most powerful open-source tools for this job are Grafana and Prometheus. They often work together as a dynamic duo. Prometheus is a beast at collecting and storing metrics from your infrastructure and applications in real time. Then, Grafana steps in to turn that raw data into beautiful, easy-to-understand dashboards. This combination gives you a comprehensive view of your entire system, allowing you to monitor everything from server performance to application behavior. By pairing these tools, you can build a robust monitoring setup that keeps you informed and in control.
How to create dashboards you'll actually use
A great dashboard tells you what you need to know at a glance. Grafana is also the go-to open-source tool for this, excelling at creating dashboards that make time-series data easy to interpret. It connects seamlessly with a wide range of data sources, including popular time-series databases like Prometheus and InfluxDB. This flexibility allows you to pull data from all your different systems into one unified view. With Grafana, you can build dashboards that clearly show trends, patterns, and anomalies, helping your team quickly understand system performance and make data-driven decisions without getting lost in the numbers.
Setting up alerts that matter
Monitoring isn't just about watching graphs; it's about being notified when something goes wrong. Prometheus has a robust alerting system built right in. You can configure rules that trigger alerts based on specific conditions in your data. For example, you can set an alert for when CPU usage spikes, a server goes down, or an application's error rate exceeds a certain threshold. This proactive approach is crucial for maintaining system reliability. Instead of waiting for a user to report a problem, your team gets an immediate notification, allowing you to respond to potential issues before they impact your customers.
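In production, these rules live in YAML files that the Prometheus server evaluates continuously. To give you a feel for the kind of condition an alert expresses, here's a hedged Python sketch that checks the same idea by hand against Prometheus' HTTP query API; it assumes a conventional http_requests_total counter with a status label is being scraped, so swap in whatever metrics your setup actually exposes:

```python
import requests

# Placeholder Prometheus address.
PROM_URL = "http://localhost:9090/api/v1/query"

# Fraction of HTTP responses that were 5xx over the last five minutes.
promql = (
    'sum(rate(http_requests_total{status=~"5.."}[5m])) '
    "/ sum(rate(http_requests_total[5m]))"
)

resp = requests.get(PROM_URL, params={"query": promql}, timeout=10)
resp.raise_for_status()
result = resp.json()["data"]["result"]

# Prometheus returns instant-vector samples as [timestamp, "value"] pairs.
error_rate = float(result[0]["value"][1]) if result else 0.0
if error_rate > 0.05:
    print(f"ALERT: 5xx error rate {error_rate:.1%} is above the 5% threshold")
else:
    print(f"OK: 5xx error rate is {error_rate:.1%}")
```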
Making your data visualizations your own
Every team has different needs, and your dashboards should reflect that. Grafana’s biggest strength is its incredible flexibility. It offers a wide variety of panels, graphs, and charts that you can mix and match to build the perfect dashboard for any use case. Its user-friendly interface makes it easy to drag, drop, and configure visualizations to tell the exact story you want. Whether you need a high-level overview for executives or a detailed diagnostic screen for your engineering team, you can create tailored dashboards that make even the most complex datasets understandable and actionable.
Going beyond monitoring: tools for analysis and forecasting
Once you have your time-series data collected, stored, and visualized, you can start asking the really interesting questions. What’s going to happen next week? Is that sudden spike in traffic normal, or is it a sign of trouble? This is where advanced analysis and forecasting tools come into play. These libraries and models go beyond just showing you what happened in the past; they help you predict the future and identify critical events in your data.
Many of these tools use sophisticated statistical methods and machine learning to uncover patterns that aren't obvious at a glance. They can automate much of the heavy lifting involved in building predictive models, allowing your team to focus on interpreting the results and making data-driven decisions. Whether you're just starting with simple forecasts or diving into complex deep learning models, there's an open-source tool that can help you get there. Let's look at some of the best options for taking your time-series analysis to the next level.
Forecasting with classic statistical libraries
If you're ready to start forecasting, statistical libraries are a fantastic place to begin. They are built on proven mathematical models and are often easier to use than more complex machine learning frameworks. Two popular options, both from Facebook, are Kats and Prophet.
Kats is a lightweight and intuitive library for analyzing time-series data. It’s designed to be an all-in-one toolkit, helping you with everything from forecasting to detecting anomalies and spotting trend changes. For more straightforward forecasting tasks, Prophet is another excellent choice. It’s particularly good at handling data with seasonal patterns, like weekly or yearly cycles, even if they have irregularities. Its simplicity makes it great for generating quick, reliable predictions without a steep learning curve.
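Here's a minimal sketch of what a Prophet forecast looks like in practice. The CSV is a hypothetical file of daily traffic, shaped the way Prophet expects: a 'ds' column with dates and a 'y' column with values:

```python
import pandas as pd
from prophet import Prophet

# Hypothetical daily traffic data with 'ds' (date) and 'y' (value) columns.
df = pd.read_csv("daily_traffic.csv", parse_dates=["ds"])

model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
model.fit(df)

# Forecast the next 30 days.
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)

# yhat is the point forecast; yhat_lower/yhat_upper bound the uncertainty interval.
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```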
Another excellent option is Darts, an open-source Python library purpose-built for time-series forecasting. It offers a unified interface for dozens of models, from classical approaches like ARIMA to deep learning models like N-BEATS and TCN. What makes Darts especially powerful is its ease of experimentation—you can quickly compare different forecasting models on your data with minimal code, helping you find the best-fit solution faster.
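As a rough illustration of that unified interface, the sketch below fits two classical models on a hypothetical monthly sales series and compares them by MAPE; swapping in one of Darts' deep learning models is mostly a matter of changing the import and constructor:

```python
import pandas as pd
from darts import TimeSeries
from darts.models import ARIMA, ExponentialSmoothing
from darts.metrics import mape

# Hypothetical monthly sales data with 'date' and 'sales' columns.
df = pd.read_csv("monthly_sales.csv", parse_dates=["date"])
series = TimeSeries.from_dataframe(df, time_col="date", value_cols="sales")

# Hold out the last 12 months for validation.
train, val = series[:-12], series[-12:]

for model in [ExponentialSmoothing(), ARIMA()]:
    model.fit(train)
    forecast = model.predict(len(val))
    print(f"{type(model).__name__}: MAPE = {mape(val, forecast):.2f}%")
```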
Exploring R for specialized statistical analysis
While Python libraries are incredibly popular, it’s worth mentioning another powerful language in the data science world: R. If your team has a background in statistics or needs to perform highly specialized analysis, R is an outstanding choice. It was created by statisticians specifically for mathematical calculations and creating detailed graphs. This makes it exceptionally good at handling complex statistical tasks that might be more cumbersome in other languages. For example, if you're working with intricate models like hierarchical time series, R has specialized packages that are tailor-made for the job. It’s a great tool to have in your arsenal when you need to go deep into the statistical theory behind your forecasts.
User-friendly forecasting software for non-coders
What if you don’t have a team of coders or data scientists? Don’t worry, you can still get started with time-series analysis. Not every powerful tool requires you to write code. There’s a whole category of user-friendly software designed with a graphical user interface (GUI), which means you can point, click, and configure your analysis without ever touching a command line. These tools are perfect for smaller teams, students, or anyone who wants to experiment with forecasting without a steep technical learning curve. They lower the barrier to entry, making it possible for anyone to begin turning their data into valuable predictions and insights.
Jamovi and JASP
If you're looking for a great place to start, check out Jamovi and JASP. Both are excellent free, open-source tools that are incredibly easy to use. They were built to make statistical analysis more accessible, offering a clean, intuitive interface that feels much less intimidating than traditional software. These are fantastic options for learning the ropes of data analysis or for conducting basic research. Because they are so user-friendly, you can focus on understanding your data rather than getting bogged down in complex syntax, making them perfect for getting quick results.
PSPP: The SPSS alternative
For anyone who has a background in academic research or has used IBM's SPSS software, PSPP will feel immediately familiar. It’s a free, open-source alternative designed to look and operate just like SPSS. It supports a wide range of the same statistical tests and commands, making it a seamless replacement for those who don't have access to a commercial license. This familiarity is a huge advantage, as it allows users to leverage their existing skills to perform powerful statistical analysis without having to learn a new system from scratch. It’s a practical choice for anyone looking for a robust, no-cost statistical package.
Time Series Lab and Zaitun Time Series
For those who need tools specifically designed for time-series forecasting, Time Series Lab and Zaitun Time Series are two more great options. Zaitun Time Series is a free program that helps you analyze and predict trends using various statistical models and even neural networks. It’s a solid tool for digging into your data to understand future patterns. For more advanced work, Time Series Lab allows you to build sophisticated forecasting models using well-known methods like ARIMA and Exponential Smoothing. Both tools provide a focused environment for time-series analysis, giving you the power to create detailed forecasts without needing to code.
How to integrate machine learning models
When you need more power and automation, machine learning libraries can streamline your entire forecasting workflow. These tools can automatically test multiple models on your data and help you choose the best one, saving you a ton of time and effort.
For orchestrating that workflow, two strong open-source options are Kubeflow and MLflow. Kubeflow is ideal for managing the full ML lifecycle, especially when you need to run scalable pipelines across Kubernetes environments. It’s built for teams that want production-grade workflows without getting bogged down in infrastructure management. MLflow, on the other hand, is a lightweight but powerful toolkit for tracking experiments, packaging models, and managing deployment. Together, they provide end-to-end coverage for training, versioning, and serving time-series models at scale.
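To show how lightweight the experiment-tracking side can be, here's a small MLflow sketch. The experiment name, parameters, and metric values are made up, and by default everything lands in a local ./mlruns folder unless you point MLflow at a tracking server:

```python
import mlflow

# Placeholder experiment name; logs go to ./mlruns unless a tracking URI is set.
mlflow.set_experiment("demand-forecasting")

with mlflow.start_run(run_name="prophet-baseline"):
    mlflow.log_param("model", "prophet")
    mlflow.log_param("horizon_days", 30)

    # ... train the model and compute a validation metric here ...
    mlflow.log_metric("mape", 8.4)

    # Plots, serialized models, and other files can be logged as artifacts, e.g.:
    # mlflow.log_artifact("forecast_plot.png")
```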
How to spot anomalies in your data
Sometimes, the most important data points are the ones that don't fit the pattern. Anomaly detection is the process of finding these outliers, which can signal critical events like equipment failure, fraudulent activity, or a sudden surge in customer demand. Identifying these events as they happen allows you to react quickly and address potential issues before they become bigger problems.
Many modern time-series tools include features for anomaly detection. For instance, the Kats library we mentioned earlier has built-in capabilities for finding unusual data points. This is crucial for maintaining the health of your systems and understanding significant shifts in your business metrics. By automatically flagging outliers, these tools help you focus on the events that truly matter.
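Libraries like Kats wrap this behind ready-made detector classes, but the underlying idea is simple enough to sketch without committing to any particular library: flag points that sit far outside the recent rolling behavior. Here's a library-agnostic rolling z-score example (the CSV, window, and threshold are all hypothetical):

```python
import pandas as pd

def flag_anomalies(series: pd.Series, window: int = 24, threshold: float = 3.0) -> pd.Series:
    """Return a boolean mask marking points far from the recent rolling mean."""
    rolling_mean = series.rolling(window).mean()
    rolling_std = series.rolling(window).std()
    z_scores = (series - rolling_mean) / rolling_std
    return z_scores.abs() > threshold

# Hypothetical minute-level metric indexed by timestamp.
metric = pd.read_csv(
    "requests_per_minute.csv", index_col="timestamp", parse_dates=True
)["count"]

print(metric[flag_anomalies(metric)])
```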
You can also use Evidently to monitor your time-series models after deployment. It provides real-time dashboards for tracking drift, data quality, and prediction performance. If your model’s accuracy starts to degrade or if your input data changes in unexpected ways, Evidently will alert you—helping you catch issues before they impact business decisions.
IN DEPTH: Advanced anomaly detection functionality built with Cake
Applying deep learning to your time-series data
For the most complex time-series challenges, deep learning models offer state-of-the-art performance. These models can capture incredibly intricate patterns and relationships in your data that traditional methods might miss. While they can be more complex to implement, the results are often worth the effort, especially for high-stakes forecasting tasks.
The Time-Series-Library (TSLib) is an excellent open-source resource for anyone interested in deep learning for time-series analysis. It provides clean implementations of advanced models, allowing you to test them on your own data or use them as a foundation for building custom solutions. A model called TimesNet has emerged as a top performer within this library for tasks like forecasting, classification, and anomaly detection.
The best tools for processing and streaming your data
Collecting and storing your time-series data is a great start, but the real value comes from what you do with it next. This is where data processing and streaming tools come into play. They are the engines that transform raw data points into actionable insights. Think of it this way: your database is the library, but these tools are the librarians and researchers who find, organize, and interpret the information for you.
There are two main approaches to processing data. The first is stream processing, which analyzes data in real-time as it’s created. This is perfect for tasks that require immediate action, like detecting system anomalies or monitoring financial transactions. The second is batch processing, where data is collected and processed in large groups or "batches." This method is ideal for deep analysis that doesn't require a split-second response, like generating weekly sales reports or training a machine learning model on historical data.
A robust time-series analysis setup often uses both methods. You might use streaming to power a real-time dashboard and trigger alerts, while using batch processing overnight to run complex forecasting models. The tools below help you manage this flow, whether you're working with data in motion or data at rest. They handle everything from building data pipelines to automating your entire analysis workflow, forming the backbone of an efficient AI-ready infrastructure.
Choosing a stream processing framework
Imagine you need to analyze data from thousands of IoT sensors the moment it arrives. You can't wait for an end-of-day report; you need insights now. This is the job of a stream processing framework. These tools are built to handle continuous, high-volume data streams in real time.
A fantastic open-source option here is Apache Flink. It’s a powerful system designed specifically for processing data streams on the fly. Flink excels at handling complex, time-based events and making predictions as new data comes in. Its flexibility allows you to build sophisticated real-time applications, from fraud detection systems that flag suspicious transactions in milliseconds to dynamic pricing engines that adjust to market conditions instantly. It’s a go-to for any scenario where latency is a critical factor.
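Flink jobs are often written in Java or Scala, but PyFlink exposes the same DataStream API from Python. Here's a deliberately tiny sketch; in a real deployment the in-memory collection would be replaced by a Kafka or Kinesis source connector, and the alert threshold is made up:

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Stand-in for a real streaming source such as a Kafka topic of sensor readings.
readings = env.from_collection([
    ("sensor-1", 72.5),
    ("sensor-2", 98.7),
    ("sensor-3", 65.1),
])

# Keep only readings above a hypothetical alert threshold and print them.
alerts = readings.filter(lambda reading: reading[1] > 90.0)
alerts.print()

env.execute("sensor-threshold-alerts")
```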
When to use batch processing
While stream processing is about the "now," batch processing is about the big picture. It involves collecting data over a period and processing it all at once. This approach is perfect when you need to perform complex analysis on large datasets without needing immediate results, like calculating monthly user engagement metrics or training a predictive model on years of historical data.
For this, check out Apache Druid. While it's known as a real-time analytics database, it’s incredibly powerful for querying and summarizing massive historical datasets with lightning speed. Druid is designed to give you answers from your time-series data in less than a second, even when you're dealing with trillions of events. This makes it an ideal backend for interactive dashboards where users need to explore and analyze large volumes of historical information quickly and efficiently.
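Druid speaks SQL over a plain HTTP endpoint, so querying it from Python is just a POST request. The router address and datasource name below are placeholders:

```python
import requests

# Placeholder Druid router address and datasource.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

query = """
SELECT TIME_FLOOR(__time, 'PT1H') AS hour, COUNT(*) AS events
FROM "web_events"
GROUP BY 1
ORDER BY 1 DESC
LIMIT 24
"""

resp = requests.post(DRUID_SQL_URL, json={"query": query}, timeout=30)
resp.raise_for_status()

# Druid returns one JSON object per row by default.
for row in resp.json():
    print(row["hour"], row["events"])
```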
How to manage your data pipeline
Your data needs a reliable way to get from its source to your analytics tools and databases. A data pipeline is the set of processes that moves your data, and managing it is crucial for ensuring data integrity and availability. A good pipeline tool ensures that data flows smoothly, is transformed correctly, and arrives where it needs to be without delays or errors.
OpenTSDB is a solid choice for this task. It’s a large-scale, distributed time-series database built to handle a massive number of data points quickly. It can collect, store, and serve billions of data points in real time, making it a workhorse for demanding environments like IoT platforms and infrastructure monitoring. OpenTSDB is also designed to work well with big data tools like Hadoop, allowing you to build robust, scalable pipelines for your time-series analysis.
Automating your workflows for efficiency
If you’re looking to optimize forecasting models with cutting-edge techniques like hyperparameter tuning or population-based training, Ray Tune is a great addition to your stack. It’s a scalable library for distributed hyperparameter tuning that integrates with popular ML frameworks and runs efficiently across CPUs and GPUs. Combined with tools like Darts and MLflow, Ray Tune helps you train better models faster—and deploy them with confidence.
Another great option here is PyCaret, an open-source machine learning library in Python. PyCaret simplifies the entire model-building process, especially for time-series forecasting. It helps you automate the tedious tasks of selecting, training, and testing different prediction models. With just a few lines of code, you can compare dozens of models to find the best one for your data, allowing your team to build and deploy accurate forecasts much faster.
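Here's roughly what that looks like with PyCaret's time-series module. The CSV of weekly orders is hypothetical, and fh is the forecast horizon you want to predict:

```python
import pandas as pd
from pycaret.time_series import setup, compare_models, predict_model

# Hypothetical univariate series of weekly orders indexed by date.
data = pd.read_csv("weekly_orders.csv", index_col="week", parse_dates=True)["orders"]

# fh is the forecast horizon; PyCaret handles splitting and cross-validation.
setup(data=data, fh=12, session_id=42)

# Train and rank a battery of forecasting models, then forecast with the winner.
best_model = compare_models()
print(predict_model(best_model, fh=12))
```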
Model serving at scale
Once your forecasting model is ready, you need to serve predictions in production with speed and reliability. Triton is a powerful inference server that supports multiple frameworks—including PyTorch, TensorFlow, and ONNX—all in the same deployment. It’s designed for high-throughput, low-latency model serving, and includes features like dynamic batching, model versioning, and GPU support out of the box. For time-series workloads that require real-time predictions (like anomaly detection or dynamic pricing), Triton helps you operationalize your models at scale.
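On the client side, getting a prediction back from Triton is a short script. The model name and tensor names below are placeholders that have to match the model's configuration on your server:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# A batch of one window of 24 recent observations (placeholder data).
window = np.random.rand(1, 24).astype(np.float32)

infer_input = httpclient.InferInput("INPUT__0", list(window.shape), "FP32")
infer_input.set_data_from_numpy(window)
requested_output = httpclient.InferRequestedOutput("OUTPUT__0")

response = client.infer(
    model_name="demand_forecaster",  # placeholder model name
    inputs=[infer_input],
    outputs=[requested_output],
)
print(response.as_numpy("OUTPUT__0"))
```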
How to set up your system for success
Choosing the right open-source tools is a great first step, but how you implement and manage them is what truly determines your success. A powerful tool with a poor setup can lead to slow queries, high costs, and a lot of frustration. Getting the configuration right from the start and continuously optimizing your system will ensure you get the most value from your time-series data. These tips will help you build a system that is not only powerful but also efficient and scalable for the long haul.
Choosing the right hardware for your needs
Your software is only as good as the hardware it runs on. When you're dealing with a constant stream of time-series data, your choices in compute, memory, and storage directly impact performance and your budget, so it pays to think ahead about your infrastructure. For instance, fast storage like SSDs is crucial for handling the high number of writes typical of time-series workloads, while having enough RAM ensures your queries run quickly. To keep costs in check as your data grows, look for features like data compression, which can significantly reduce storage costs. The goal is to build a system that can handle your current needs while being flexible enough to grow with you.
The role of modern infrastructure: storage and Kubernetes
Managing all these different open-source tools can get complicated, which is why many teams turn to modern infrastructure like Kubernetes to orchestrate everything. Think of Kubernetes as a system for managing all your applications, making sure they run reliably and can scale up or down as needed. Many time-series databases can run well on Kubernetes when they have reliable storage. This setup gives you a resilient and flexible foundation, but the key is that last part: reliable storage. Without a solid storage solution, you risk data loss and performance issues, undermining the very system you’re trying to build. Getting your Kubernetes storage strategy right is a critical step for any production-grade time-series stack.
Simplifying the stack with a managed platform
While open-source solutions offer compelling advantages like flexibility and control, piecing together and managing the entire stack—from the hardware and Kubernetes to all the individual software components—is a full-time job. This is where a managed platform can be a game-changer. It allows you to get all the benefits of a powerful, open-source time-series toolkit without the headache of managing the underlying infrastructure. Instead of your team spending valuable time on system administration and updates, they can focus on analyzing data and building the AI applications that drive your business forward. This approach simplifies the entire process, providing a production-ready environment from day one.
Platforms like Cake handle the entire stack for you, including the compute infrastructure, the open-source tools, and all the necessary integrations. This accelerates your AI initiatives by removing the operational hurdles that often slow teams down. It’s about making a strategic choice to invest your team’s energy where it matters most—turning your time-series data into tangible business value.
How to create a smart data retention policy
Time-series data can accumulate incredibly fast, and storing everything forever is rarely practical or cost-effective. A clear data retention policy is essential for managing storage costs and maintaining system performance. Many modern databases can help you automate this process. For example, some tools let you set rules to automatically compress older data, making it smaller but still accessible, or delete it entirely after a certain period. Decide what data is critical to keep in its raw form, what can be summarized, and what can be archived or deleted. This data lifecycle management approach keeps your database lean and efficient.
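As a concrete example, TimescaleDB lets you express both compression and retention as background policies with a couple of SQL calls. Here's a hedged sketch from Python; the connection string, table name, and intervals are placeholders to adapt to your own data:

```python
import psycopg2

# Placeholder connection string and hypertable name.
conn = psycopg2.connect("postgresql://postgres:password@localhost:5432/iot")
with conn, conn.cursor() as cur:
    # Compress chunks once they're older than 7 days...
    cur.execute("ALTER TABLE sensor_readings SET (timescaledb.compress);")
    cur.execute("SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');")
    # ...and drop raw data entirely after 90 days.
    cur.execute("SELECT add_retention_policy('sensor_readings', INTERVAL '90 days');")
```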
How to manage your resources without the headache
These open-source tools are powerful, but how you set them up largely determines the results you get. That means allocating the right amount of CPU, memory, and storage for your workload from the beginning. Don't just set it and forget it; continuously monitor your resource usage to identify potential bottlenecks before they become major problems. If you notice your CPU is constantly maxed out or you're running low on memory, it’s a sign that you need to scale your resources or optimize your queries. Effective resource management ensures your system remains stable and responsive as your data volume grows.
Simple tweaks for better performance
Most time-series tools come with specific features designed to improve performance for certain use cases. Take the time to learn about and use them. For instance, some databases use features like compression to make data smaller and continuous aggregates to create pre-calculated summaries of your data. These features can dramatically speed up common queries because the database has already done some of the work ahead of time. Dig into the documentation for your chosen tools and look for configuration settings you can adjust. A small tweak to a cache setting or query planner can sometimes yield a massive improvement in performance.
How to keep your system secure and running smoothly
Setting up your time-series database is a huge accomplishment, but the work doesn’t stop there. To get real, lasting value from your data, you need to keep your system running smoothly, securely, and efficiently over the long haul. Think of it like maintaining a car; regular check-ups prevent major breakdowns on the road. For your data infrastructure, this means having a solid plan for security, backups, updates, and monitoring.
A healthy system is a reliable one, which is the foundation of any successful AI or data analysis project. When your database is secure and performing well, you can trust the insights you pull from it. Neglecting these maintenance tasks can lead to security vulnerabilities, data loss, or performance bottlenecks that slow down your queries and frustrate your team. Taking a proactive approach to system health ensures your data is always available and protected when you need it. At Cake, we believe that managing the entire stack is key to building production-ready solutions, and that includes these essential maintenance practices. Let’s walk through the four key areas you need to focus on to keep your system in top shape.
IN DEPTH: AI Governance, Powered by Cake
1. Set up clear access controls
Not everyone on your team needs access to every piece of data. Limiting access on a need-to-know basis is a fundamental security practice that protects your sensitive information. To manage this effectively, you need a clear "front door" for your APIs and a system for authentication. Think about how you will handle user logins or device authentication to ensure that only authorized people or services can access your data. Using tools like an API gateway can help you centralize access control, making it easier to manage permissions and see who is accessing your system and when. This simple step can prevent both accidental and malicious data breaches.
2. Build a reliable backup strategy
Imagine losing all your historical data overnight. A robust backup strategy is your insurance policy against data loss, whether it’s caused by hardware failure, software bugs, or human error. But a good strategy is more than just making copies. It also involves smart data lifecycle management to control storage costs. For instance, some databases like TimescaleDB allow you to set rules to automatically compress old data, making it smaller but still queryable, or to delete it entirely after a certain period. This ensures you can recover from disaster while efficiently managing your storage and retaining the data that matters most.
3. Don't forget to keep your system updated
It can be tempting to ignore those update notifications, but keeping your database and its components up-to-date is critical for security and performance. Software updates often contain vital security patches that protect your system from newly discovered vulnerabilities. They can also include performance improvements and bug fixes that make your database faster and more reliable. Regular maintenance, which includes applying updates and performing routine checks, is essential for keeping your system running smoothly. Scheduling a regular maintenance window helps you stay on top of updates without disrupting your operations.
4. Keep an eye on your system's health
Don't wait for your system to fail before you investigate its health. Proactive monitoring allows you to spot potential issues before they become critical problems. By using tools like Prometheus, you can collect and store time-series data about your system's own performance—like query latency, CPU usage, and memory consumption. Setting up alerts for unusual metrics helps you stay ahead of trouble. Consistently monitoring your system's health gives you the visibility you need to maintain high performance, diagnose problems quickly, and ensure your database can handle your workload as it grows.
Related articles
- 9 Top Open-Source Tools for Financial AI Solutions
- 6 Top Open Source Tools for Observability & Tracing
- The High Cost of Sticking with Closed AI
Frequently asked questions
How do I choose the right time-series database from all these options?
The best way to decide is to focus on your primary goal. If your main objective is real-time system monitoring and alerting, Prometheus is built specifically for that job. If you're dealing with extremely high volumes of data coming in very quickly, like from IoT sensors, InfluxDB is optimized for that speed. For projects that you know will grow to a massive, distributed scale, OpenTSDB is a solid choice. And if your team is already comfortable with SQL and you want a powerful yet familiar option, TimescaleDB is a fantastic starting point.
What's the real difference between a time-series database and a regular one like PostgreSQL?
Think of it as using a specialized tool for a specific job. A regular database like PostgreSQL is a versatile jack-of-all-trades, great for storing all kinds of business data. A time-series database (TSDB) is a specialist, designed from the ground up to handle data points that are always tied to a timestamp. This specialization makes them incredibly fast at ingesting massive streams of chronological data and running queries over specific time ranges, tasks that can be slow and inefficient in a general-purpose database.
Do I need separate tools for storing, visualizing, and analyzing my data?
Yes, and that’s actually a good thing. The most effective setups use a combination of specialized tools that work together, often called a "stack." You might use one tool like Prometheus to collect and store the data, connect it to Grafana to create beautiful, real-time dashboards, and then use a library like Kats or PyCaret to perform advanced forecasting. This modular approach lets you pick the best-in-class tool for each part of the process, giving you more power and flexibility than a single, all-in-one solution might.
My team is small. How complex is it to set up and manage these open-source tools?
It’s a valid concern, as setting up and maintaining any new system requires effort. The initial setup can be complex, which is why it's important to start with clear goals and read the documentation thoroughly. The good news is that the most popular open-source tools have large, active communities. This means you'll find plenty of tutorials, forums, and guides to help you through any challenges. Start simple with one or two tools, like Prometheus and Grafana, to get comfortable before adding more advanced components.
When does it make sense to move from basic monitoring to advanced forecasting?
Moving to forecasting is a natural next step once you have a solid handle on your historical data. Start by using visualization and monitoring tools to understand your past performance and identify key trends and patterns. Once you can confidently explain what has already happened, you're ready to start predicting what will happen next. This is the point where you can introduce statistical or machine learning libraries to build models that turn your historical insights into a strategic advantage for the future.