Business Problem Statement
The business problem is to determine which customers to target for an upcoming marketing summit and to provide ideas/mapping for incremental testing.
Data Description
The data set consists of 2 years of data, totaling 180,720 rows of customer-level data.
The variables included are customer_id, transaction_date, total amount paid ($), and total coupon amount used ($). My initial assessment is that with such a large amount of data we will have reliable blocks (groupings) of data. We should be able to look at a customer’s rankings and see their lifestyle (e.g. whether the customer is spending more or less over time). We should also be able to tell whether a customer is using coupons, how coupon use affects their monetary ranking, and whether they are purchasing more or less often over time.
Analysis Plan
The following are the steps I will take when running the analysis:
1. Get to know the data so I understand what the data represents, what each field means
2. Check data for missing values and fill in with ‘average’ of other rows in that field
3. Calculate an RFMC field for each row of data for comparing each customer’s data
   o Recency - based on most recent date of purchase compared
   o Frequency - based on number of distinct purchase dates
   o Monetary – based on sum of spend
   o Coupons – based on sum of amount of coupon value
4. Rank the data by putting them in quintiles (5 groups / 20% increments)
5. Validate the ranking by running descriptive statistics (e.g. highest rank group, 4, should have the most recent (or lowest) max score for recency)
6. If the descriptive statistics do not look right, go back to the data to see what went wrong, correct the error, and re-run the process from where the error exists
7. Create a computed column, RFMC Score, to put the rankings together by concatenation
8. Run descriptive statistics with the RFMC Score to see who the best customers are (e.g. RFMC score of or near 4444) compared to the worst customers (e.g. RFMC score of or near 0000)
9. If the descriptive statistics do not look right, go back to Step 5
10. Review output and make recommendations
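Steps 2, 3, 4 and 7 of the plan can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool or code used for this analysis: the function names (fill_missing, rank_quintiles, rfmc_scores), the row layout, and the as_of reference date are all hypothetical. Ranks run from 0 (worst) to 4 (best), matching the 0000 to 4444 scores described above, and customers are ranked by sorted position, so tied values can land in different groups.

```python
from collections import defaultdict
from datetime import date

def fill_missing(values):
    """Step 2: replace None entries with the mean of the non-missing entries."""
    present = [v for v in values if v is not None]
    avg = sum(present) / len(present)
    return [avg if v is None else v for v in values]

def rank_quintiles(values, higher_is_better=True):
    """Step 4: map each customer to a rank 0-4 by sorted position (4 = best)."""
    # Order customers from worst to best so that position maps directly to rank.
    ordered = sorted(values, key=lambda c: values[c], reverse=not higher_is_better)
    n = len(ordered)
    return {cust: i * 5 // n for i, cust in enumerate(ordered)}

def rfmc_scores(rows, as_of):
    """Steps 3 and 7: rows are (customer_id, transaction_date, paid, coupon) tuples."""
    last = {}
    purchase_days = defaultdict(set)
    spend = defaultdict(float)
    coupons = defaultdict(float)
    for cust, when, paid, coupon in rows:
        last[cust] = max(last.get(cust, when), when)  # most recent purchase date
        purchase_days[cust].add(when)                 # distinct purchase dates
        spend[cust] += paid                           # sum of spend
        coupons[cust] += coupon                       # sum of coupon value
    recency = {c: (as_of - d).days for c, d in last.items()}
    frequency = {c: len(days) for c, days in purchase_days.items()}
    ranks = [
        rank_quintiles(recency, higher_is_better=False),  # fewer days since purchase is better
        rank_quintiles(frequency),
        rank_quintiles(dict(spend)),
        rank_quintiles(dict(coupons)),
    ]
    # Concatenate the four ranks into one RFMC score string, e.g. "4444".
    return {c: "".join(str(r[c]) for r in ranks) for c in last}
```

Calling rfmc_scores(rows, as_of=date(2018, 1, 22)) returns a dictionary from customer_id to a four-character score. The validation in step 5 would then check, for example, that the customers ranked 4 on recency have the smallest number of days since their last purchase.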
Results
The first step in building the RFM was running the analysis with all fields for all years to see how customers compared in the last 2.2 years.
Next, I ran the analysis one year at a time in order to see RFMC year over year. Year 2018 had only 2 months’ worth of data, so it was not an apples-to-apples comparison with 2016 and 2017. I kept the output for 2018 to show that it was part of the analysis and what the trend was over the first two months.
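A partial year such as 2018 can be flagged before the year-over-year comparison by counting how many calendar months of each year contain transactions. The sketch below is a hypothetical helper, not part of the original analysis; it assumes the transaction dates are available as Python date objects.

```python
from collections import defaultdict
from datetime import date

def months_covered_by_year(transaction_dates):
    """Count the distinct calendar months with at least one transaction, per year."""
    months = defaultdict(set)
    for d in transaction_dates:
        months[d.year].add(d.month)
    return {year: len(seen) for year, seen in sorted(months.items())}
```

A year with far fewer than 12 covered months should be reported separately rather than compared directly against full years.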
DESCRIPTIVE ANALYSIS FOR ALL YEARS: Over a 2.2 year span, the best customers, on average, have purchased in the last 14 days, 215 times, spent $8,399.00, and saved $64.00 by using coupons. The worst customers, on average, have purchased in the last 112 days, 17 times, spent $397.00, and saved $1.412.00 by using coupons.
[Table: YEAR_ALL descriptive statistics; surviving column headers: RECENCY, FREQUENCY]
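Per-group averages like those quoted for the best and worst customers can be produced by grouping customers on their RFMC score and averaging each measure. The following is a minimal sketch under assumptions: the function name describe_by_score and the dictionary keys (score, recency, frequency, monetary, coupons) are illustrative and do not come from the original analysis.

```python
from collections import defaultdict
from statistics import mean

def describe_by_score(customers):
    """Average each measure within every RFMC score group.

    customers: iterable of dicts with keys
    score, recency, frequency, monetary, coupons (hypothetical layout).
    """
    groups = defaultdict(list)
    for c in customers:
        groups[c["score"]].append(c)
    fields = ("recency", "frequency", "monetary", "coupons")
    return {
        score: {"n": len(members), **{f: mean(m[f] for m in members) for f in fields}}
        for score, members in groups.items()
    }
```

The n value gives the number of customers in each score group, which is also what identifies the largest group.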
The next page shows descriptive analysis for years 2016 – 2018 separately. Recency looks high because I ran the data with a reference date of 22Jan2018. Subtracting ~365 days for each gives a recency of approximately 25 days for 2016 and 30 days for 2017. For the best customers, the average purchase frequency for 2016 and 2017 is 112 and 107 purchases, respectively. The average amount of money spent per customer for 2016 and 2017 is $4,500.00 and $4,100.00, respectively. The average savings from coupons per customer for 2016 and 2017 are $24.00 and $25.00, respectively. Running the analysis for each year separately shows that customers on average are purchasing approximately once a month compared to twice a month, approximately 110 times versus 215 times, and spend approximately $4,300.00 compared to $8,399.00.
Below is a sample of the RFMC descriptive analytics. I wanted to compare three groups: customers who have not purchased recently or often, have not spent much money, but use coupons; customers who have purchased recently and frequently, have little spend, and use coupons often; and customers who have purchased recently and frequently, spend a lot of money, and have not used coupons. This shows that there are more customers who use coupons and do not spend much than customers who have purchased recently and frequently, spend more money, and use fewer coupons. Below is a portion of the process flow of the steps taken per the Analysis Plan.
As shown in the chart above, the largest group of customers has an RFMC score of 0004 (highlighted in red). These customers are using coupons, have not purchased recently or frequently, and spend very little. I would suggest not giving coupons to these customers. The cells highlighted in yellow are customers who have purchased frequently, spent a lot, and not used coupons, but have not purchased recently. I would suggest sending coupons to these customers. The cells highlighted in green are customers who have purchased recently and frequently and spent a lot of money, but have not used coupons. Sending coupons to these customers may not be necessary, as they tend to purchase regardless of having a coupon. For next steps, I would suggest performing incremental testing to look for a difference in response. This will help predict which blocks of customers to market to, based on a sample response.
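One way to set up the incremental test is to split each RFMC block at random into a test group that receives the coupon and a control group that does not, then compare response rates. The sketch below is an assumption about how such a test could be run, not a description of a test that was performed; the function names, the holdout share, and the 0/1 response encoding are hypothetical.

```python
import random

def split_test_control(customer_ids, holdout=0.5, seed=0):
    """Randomly assign a holdout share of customers to control; the rest are test."""
    rng = random.Random(seed)  # fixed seed keeps the assignment reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout)
    return shuffled[cut:], shuffled[:cut]  # (test, control)

def incremental_lift(test_responses, control_responses):
    """Difference in response rate, test minus control; responses are 0 or 1."""
    test_rate = sum(test_responses) / len(test_responses)
    control_rate = sum(control_responses) / len(control_responses)
    return test_rate - control_rate
```

A positive lift in a block suggests the coupon changes behavior there; a lift near zero, as might be expected for the green cells, suggests those customers purchase regardless of the coupon. Whether an observed difference is meaningful would still need a significance test on an adequately sized sample.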