Overview
Financial markets generate massive amounts of publicly available data every day, but interpreting it to make informed decisions can be difficult. This project uses machine learning (ML) to generate daily stock signals, track their effectiveness against a baseline (the average return across ~120 commonly traded stocks), and provide a hands-on tool for analyzing model performance and trading strategies. So far, results show strong performance in predicting long-term trends (150-day horizon), while short-term predictions (10-day horizon) are unreliable. Future iterations will explore other techniques to improve short-term predictions.
Additional detail on the methodology of the end-to-end data pipeline is available in the code repository: GitHub
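One common way to frame the two horizons is as binary classification targets built from forward returns. The sketch below assumes a label of 1 when the closing price `horizon` trading days ahead is higher than today's close; the project's actual labeling rule is not documented here, and `forward_return_labels` is a hypothetical helper.

```python
from typing import List, Optional

# Horizons from the overview, in trading days
SHORT_HORIZON = 10
LONG_HORIZON = 150


def forward_return_labels(closes: List[float], horizon: int) -> List[Optional[int]]:
    """Label each day 1 if the close `horizon` days later is higher, else 0.

    Assumed labeling rule for illustration only. The final `horizon` days
    have no future price yet, so they are labeled None and would be
    excluded from training.
    """
    if horizon <= 0:
        raise ValueError("horizon must be a positive number of trading days")
    labels: List[Optional[int]] = []
    for i, price in enumerate(closes):
        j = i + horizon
        if j < len(closes):
            labels.append(1 if closes[j] > price else 0)
        else:
            labels.append(None)
    return labels
```

For example, `forward_return_labels([100.0, 101.0, 99.0, 105.0], 2)` returns `[0, 1, None, None]`. The longer the horizon, the more recent days remain unlabeled, which is one reason a 150-day model trains on older data than a 10-day model.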
Application
View Application
Interactive Filters
- Stock Symbol (all or individual stocks)
- Horizon (Short or Long)
- Date Selection
Visualize Results
- "Buy", "Hold", and "Sell" signals (by Date Selection and Horizon)
- Performance charts of average return and win rate for signals over time, compared with the baseline (average gains/losses across the full population of ~120 stocks)
- Additional information on pipeline and model methodologies
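The comparison behind the performance charts can be sketched as follows. This assumes "win rate" means the share of stocks with a positive forward return and that the baseline is the simple average over all ~120 stocks; `signal_performance` is a hypothetical helper, not the dashboard's actual code.

```python
from statistics import mean
from typing import Dict, Sequence


def signal_performance(returns: Sequence[float], buy_flags: Sequence[bool]) -> Dict[str, float]:
    """Compare Buy-signal stocks against the full-population baseline.

    returns:   realized forward return per stock over the horizon (0.05 == +5%)
    buy_flags: whether the model issued a Buy signal for that stock
    """
    if len(returns) != len(buy_flags):
        raise ValueError("returns and buy_flags must have the same length")
    if not returns:
        raise ValueError("need at least one stock")

    # Baseline: every stock in the population, regardless of signal
    result = {
        "baseline_avg_return": mean(returns),
        "baseline_win_rate": sum(r > 0 for r in returns) / len(returns),
    }

    # Signal group: only stocks the model flagged as Buy (may be empty on some days)
    picked = [r for r, flag in zip(returns, buy_flags) if flag]
    if picked:
        result["signal_avg_return"] = mean(picked)
        result["signal_win_rate"] = sum(r > 0 for r in picked) / len(picked)
    return result
```

Computing this per signal date and plotting the two average-return series (and the two win-rate series) side by side gives the kind of over-time comparison described above.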
API Access and Documentation
Description:
Access daily ML stock signals and historical indicators via API endpoints. Additional endpoints for model performance and technical indicators are planned.
Documentation:
- Swagger UI: https://api.crowedata.com/docs/
Data Endpoint:
| Endpoint | Method | Description |
|---|---|---|
| /signals | GET | Returns daily "Buy/Hold/Sell" signals for all stocks or a specific symbol. |
Parameters:
| Parameter | Type | Description | Required | Example |
|---|---|---|---|---|
| date | string (YYYY-MM-DD) | The date for which to get signals | Yes | 2025-10-14 |
| horizon | string | short / long | Yes | short |
| symbol | string | Optional stock symbol | No | MSFT |
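A client can check parameters against the table above before calling the API. This is a minimal sketch; `build_signal_params` is a hypothetical helper, and uppercasing the symbol is an assumption about how the API expects tickers.

```python
from datetime import date
from typing import Dict, Optional

VALID_HORIZONS = {"short", "long"}


def build_signal_params(date_str: str, horizon: str, symbol: Optional[str] = None) -> Dict[str, str]:
    """Validate /signals query parameters and return them as a dict."""
    # fromisoformat raises ValueError for invalid dates such as "2025-13-01";
    # isoformat() normalizes the result to YYYY-MM-DD
    parsed = date.fromisoformat(date_str)
    if horizon not in VALID_HORIZONS:
        raise ValueError(f"horizon must be 'short' or 'long', got {horizon!r}")
    params = {"date": parsed.isoformat(), "horizon": horizon}
    # symbol is optional; omitting it requests signals for all stocks
    if symbol:
        params["symbol"] = symbol.upper()  # assumed: tickers are uppercase
    return params
```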
Example Request:
GET https://api.crowedata.com/signals?date=2025-10-14&horizon=short&symbol=MSFT
Example Response:
[
  {
    "date": "2025-10-14T00:00:00",
    "symbol": "MSFT",
    "horizon": "short",
    "buy_prob": 0.4993219753635199,
    "buy_signal": false
  }
]
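The Example Request and Example Response above can be reproduced with the Python standard library. In this sketch the base URL and field names come from the examples, the response is assumed to always be a JSON array, and the function names are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.crowedata.com/signals"


def signals_url(date: str, horizon: str, symbol: Optional[str] = None) -> str:
    """Build the GET URL; symbol is left out to request all stocks."""
    params = {"date": date, "horizon": horizon}
    if symbol:
        params["symbol"] = symbol
    return f"{BASE_URL}?{urlencode(params)}"


def parse_signals(body: str) -> List[Dict[str, Any]]:
    """Decode the JSON array of signal records returned by /signals."""
    return json.loads(body)


def fetch_signals(date: str, horizon: str, symbol: Optional[str] = None,
                  timeout: float = 10.0) -> List[Dict[str, Any]]:
    """Call the live endpoint (requires network access)."""
    with urlopen(signals_url(date, horizon, symbol), timeout=timeout) as resp:
        return parse_signals(resp.read().decode("utf-8"))
```

Usage: `fetch_signals("2025-10-14", "short", "MSFT")` requests the same URL as the example and returns a list of dicts with `date`, `symbol`, `horizon`, `buy_prob`, and `buy_signal` keys.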
Tech Stack
Programming & Scripting:
- Python (data processing, Random Forest ML model training, Streamlit app)
- SQL (queries, data analysis, database interactions)
Data Integration & ETL:
- Kafka (streaming of stock data)
- Airflow (orchestration of daily DAGs for data ingestion and model updates)
Data Warehousing / Storage:
- PostgreSQL (storing raw stock data, model outputs, and buy/hold/sell signals)
- EC2 (hosting the API service and database for production deployment)
API & Visualization:
- FastAPI (serving the stock signal endpoint for programmatic access)
- Streamlit (interactive dashboard for exploring daily stock signals, effectiveness plots, and baseline comparisons)
DevOps / Automation:
- Docker (containerized Airflow, Kafka, PostgreSQL, and Python workflows)
- Git (version control)
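The PostgreSQL schema behind the signals is not documented here. As a rough sketch, the table below uses the field names from the API response, with Python's built-in `sqlite3` standing in for PostgreSQL so the example runs without a server. The table name `signals`, the composite primary key, the helper `query_signals`, and the `XYZ` row are all hypothetical; only the MSFT row comes from the example response.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical schema; in PostgreSQL, date would be DATE and buy_signal BOOLEAN
SCHEMA = """
CREATE TABLE IF NOT EXISTS signals (
    date       TEXT NOT NULL,
    symbol     TEXT NOT NULL,
    horizon    TEXT NOT NULL CHECK (horizon IN ('short', 'long')),
    buy_prob   REAL NOT NULL,
    buy_signal INTEGER NOT NULL,
    PRIMARY KEY (date, symbol, horizon)
)
"""


def query_signals(conn: sqlite3.Connection, date: str, horizon: str,
                  symbol: Optional[str] = None) -> List[Tuple]:
    """Mirror the /signals filters: date and horizon required, symbol optional."""
    sql = ("SELECT date, symbol, horizon, buy_prob, buy_signal "
           "FROM signals WHERE date = ? AND horizon = ?")
    args: List[object] = [date, horizon]
    if symbol:
        sql += " AND symbol = ?"
        args.append(symbol)
    sql += " ORDER BY symbol"
    return conn.execute(sql, args).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO signals VALUES (?, ?, ?, ?, ?)",
    [
        ("2025-10-14", "MSFT", "short", 0.4993219753635199, 0),  # from the example response
        ("2025-10-14", "XYZ", "short", 0.75, 1),                 # placeholder row
    ],
)
conn.commit()
```

Keying on `(date, symbol, horizon)` matches how the endpoint is filtered and prevents a daily Airflow run from inserting duplicate signals for the same stock and horizon.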
Future Improvements
- Add endpoints to the FastAPI layer and deploy on RapidAPI for greater visibility
- Expand the stock population to the full S&P 500, so the baseline comparison is against a consistently performing index
- Explore other machine learning techniques for short-term predictions and compare the effectiveness of their signals