1. Project background and analysis
1) Project background
Online shopping has become an indispensable part of daily life. Using behavior data from the Taobao mobile app, this project analyzes user behavior through relevant indicators in order to uncover user behavior patterns.
2) Data and field description
The data set used in this article covers one month of Taobao App (mobile) user behavior, from 2014-11-18 to 2014-12-18. It contains 12,256,906 records in 6 columns:
- user_id: user identity
- item_id: item id
- behavior_type: user behavior type (including four behaviors of clicking, collecting, adding to shopping cart, and paying, respectively represented by numbers 1, 2, 3, and 4)
- user_geohash: geographic location
- item_category: category id (category to which the product belongs)
- time: the time when the user behavior occurred
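To make the schema concrete, here is a toy frame imitating the six fields above (all values invented) with the numeric behavior codes decoded into readable labels:

```python
import pandas as pd

# Toy rows imitating the schema described above (values are invented)
df = pd.DataFrame({
    "user_id": ["10001", "10001", "10002"],
    "item_id": ["53616", "53616", "87236"],
    "behavior_type": ["1", "3", "4"],
    "user_geohash": [None, None, "9q8yy"],
    "item_category": ["4756", "4756", "1863"],
    "time": ["2014-12-01 10", "2014-12-01 11", "2014-12-02 20"],
})

# Decode behavior codes: 1=click, 2=collect, 3=add to cart, 4=pay
type_dict = {"1": "click", "2": "collect", "3": "add_to_cart", "4": "pay"}
df["behavior"] = df["behavior_type"].map(type_dict)
print(df[["behavior_type", "behavior"]])
```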
3) Dimensions of analysis
- Analysis of traffic indicators
- User Behavior Analysis
- Funnel Churn Analysis
- RFM Analysis of User Value
4) Common analysis methods for e-commerce
5) What is funnel analysis?
“Funnel analysis” is a process-oriented data analysis method: it tracks users through each stage of a flow from start to finish, reflecting the state of user behavior at every step, and it is an important model for measuring the conversion rate at each stage.
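As a minimal illustration of the idea (stage counts invented), the two conversion metrics used later in this article, overall conversion versus the first stage and step-by-step conversion, can be computed like this:

```python
# Hypothetical stage counts for a three-step funnel
stages = {"click": 10000, "collect_or_cart": 500, "pay": 100}

counts = list(stages.values())
# Overall conversion: each stage divided by the first stage
overall = [c / counts[0] for c in counts]
# Step-by-step conversion: each stage divided by the previous stage
stepwise = [1.0] + [counts[i] / counts[i - 1] for i in range(1, len(counts))]
print(overall)   # [1.0, 0.05, 0.01]
print(stepwise)  # [1.0, 0.05, 0.2]
```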
2. Import related libraries
```python
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Set the seaborn drawing style
sns.set(style="darkgrid", font_scale=1.5)
# Display Chinese labels
mpl.rcParams["font.family"] = "SimHei"
# Display the minus sign correctly
mpl.rcParams["axes.unicode_minus"] = False
# Running the code can produce a lot of warning output (e.g. version reminders);
# suppress it with the following line
warnings.filterwarnings("ignore")
```
3. Data preview, data preprocessing
```python
# Note: dtype=str reads all fields as strings
df = pd.read_csv("taobao.csv", dtype=str)
df.shape
df.info()
df.sample(5)
```
The results are as follows:
1) Calculate the missing rate
```python
# The geographic location column has too many missing values to fill in,
# so that column will be deleted first
df.apply(lambda x: sum(x.isnull()) / len(x), axis=0)
```
The results are as follows:
2) Delete the geographic location column
```python
df.drop(["user_geohash"], axis=1, inplace=True)
```
3) Process the time column and split the column into a date column and an hour column
```python
df["date"] = df.time.str[0:-3]
df["hour"] = df.time.str[-2:]
df.sample(5)
```
The results are as follows:
4) Change the time and date columns into standard date format, and change the hour column into int format
```python
df["date"] = pd.to_datetime(df["date"])
df["time"] = pd.to_datetime(df["time"])
df["hour"] = df["hour"].astype(int)
df.dtypes
```
The results are as follows:
5) Arrange the data in ascending order according to the time column
```python
df.sort_values(by="time", ascending=True, inplace=True)
df.head()
```
The results are as follows:
6) Delete the original index and regenerate a new index
```python
df.reset_index(drop=True, inplace=True)
df.head()
```
The results are as follows:
Knowledge point: Pay attention to the use of the parameter drop in reset_index().
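A quick contrast of the two settings of `drop` on a toy frame: with the default, the old index is preserved as a new `index` column; with `drop=True` it is discarded entirely.

```python
import pandas as pd

s = pd.DataFrame({"a": [10, 20, 30]}, index=[7, 3, 5])

kept = s.reset_index()               # old index becomes a new "index" column
dropped = s.reset_index(drop=True)   # old index is discarded entirely

print(kept.columns.tolist())     # ['index', 'a']
print(dropped.columns.tolist())  # ['a']
print(dropped.index.tolist())    # [0, 1, 2]
```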
7) Use the describe() function to view the distribution of the data. Note the use of the include parameter here.
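By default `describe()` only summarizes numeric columns; the `include` parameter widens the report. A small sketch on toy data:

```python
import pandas as pd

df = pd.DataFrame({"user_id": ["a", "b", "a"], "hour": [10, 21, 10]})

num_stats = df.describe()                    # numeric columns only, by default
all_stats = df.describe(include="all")       # every column, mixed statistics
obj_stats = df.describe(include=["object"])  # string (object) columns only

# Object columns report count / unique / top / freq instead of mean and std
print(obj_stats.loc["unique", "user_id"])    # 2
```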
⑤ Daily ARPPU analysis: it represents the average revenue per paying user. ARPPU = total revenue / APA; since the data set contains no amount field, it is approximated here by [number of payments / number of paying users].
```python
total_custome = df[df["behavior_type"] == "4"].groupby(["date", "user_id"])["behavior_type"].count() \
    .reset_index().rename(columns={"behavior_type": "total"})
total_custome.head()
total_custome2 = total_custome.groupby("date")["total"].sum() / \
    total_custome.groupby("date")["total"].count()
total_custome2.head(10)
# Draw as follows
x = len(total_custome2.index.astype(str))
y = total_custome2.index.astype(str)
plt.plot(total_custome2.values)
plt.xticks(range(0, 30, 7), [y[i] for i in range(0, x, 7)], rotation=90)
plt.title("Per capita consumption per day")
plt.tight_layout()
plt.savefig("Per capita consumption per day", dpi=300)
plt.show()
```
The results are as follows:
The drawing is as follows:
⑥ Daily ARPU analysis: it represents the average revenue per active user. ARPU = total revenue / AU; it is approximated here by [number of payments / number of active users].
```python
df["operation"] = 1
aa = df.groupby(["date", "user_id", "behavior_type"])["operation"].count() \
    .reset_index().rename(columns={"operation": "total"})
aa.head(10)
aa1 = aa.groupby("date").apply(
    lambda x: x[x["behavior_type"] == "4"]["total"].sum() / x["user_id"].nunique()
)
aa1.head(10)
# Draw as follows
x = len(aa1.index.astype(str))
y = aa1.index.astype(str)
plt.plot(aa1.values)
plt.xticks(range(0, 30, 7), [y[i] for i in range(0, x, 7)], rotation=90)
plt.title("Consumption times of active users per day")
plt.tight_layout()
plt.savefig("Consumption times of active users per day", dpi=300)
plt.show()
```
The results are as follows:
The drawing is as follows:
⑦ Payment rate: PUR = APA / AU, approximated here by [number of paying users / number of active users]
```python
rate = aa.groupby("date").apply(
    lambda x: x[x["behavior_type"] == "4"]["total"].count() / x["user_id"].nunique()
)
rate.head(10)
# Draw as follows
x = len(rate.index.astype(str))
y = rate.index.astype(str)
plt.plot(rate.values)
plt.xticks(range(0, 30, 7), [y[i] for i in range(0, x, 7)], rotation=90)
plt.title("Payment rate analysis")
plt.tight_layout()
plt.savefig("Payment rate analysis", dpi=300)
plt.show()
```
The results are as follows:
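Setting the full data set aside, the paying-users / active-users ratio can be sketched on a toy per-day log (values invented; here distinct paying users are counted directly with `nunique`, which gives the same result as counting the per-user payment rows above):

```python
import pandas as pd

# Toy behavior log for one day: one row per (user, behavior) action
aa = pd.DataFrame({
    "date": ["d1"] * 4,
    "user_id": ["u1", "u2", "u3", "u1"],
    "behavior_type": ["1", "1", "4", "4"],
})

# Payment rate per day = number of paying users / number of active users
rate = aa.groupby("date").apply(
    lambda x: x[x["behavior_type"] == "4"]["user_id"].nunique() / x["user_id"].nunique()
)
print(rate["d1"])  # u1 and u3 paid out of 3 active users -> 2/3
```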
⑧ Analysis of repurchase situation (repurchase rate)
```python
# Count distinct purchase dates per paying user; two or more counts as a repurchase
re_buy = df[df["behavior_type"] == "4"].groupby("user_id")["date"].apply(lambda x: x.nunique())
print(len(re_buy))
re_buy[re_buy >= 2].count() / re_buy.count()
```
The results are as follows:
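The `nunique`-based repurchase definition above is easy to verify on a toy payment log (values invented): a user counts as a repurchaser only when they bought on at least two distinct dates.

```python
import pandas as pd

# Toy payment log: u1 buys on two distinct days (one day twice), u2 on one day
buys = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u1"],
    "date": ["2014-12-01", "2014-12-05", "2014-12-03", "2014-12-05"],
})

# Distinct purchase days per user; >= 2 distinct days counts as a repurchase
re_buy = buys.groupby("user_id")["date"].nunique()
repurchase_rate = re_buy[re_buy >= 2].count() / re_buy.count()
print(repurchase_rate)  # 1 repurchaser out of 2 paying users -> 0.5
```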
3) Funnel analysis
```python
df_count = df.groupby("behavior_type").size().reset_index() \
    .rename(columns={"behavior_type": "link", 0: "number"})
type_dict = {"1": "Click", "2": "Collect", "3": "Add to cart", "4": "Pay"}
df_count["link"] = df_count["link"].map(type_dict)
a = df_count.iloc[0]["number"]
b = df_count.iloc[1]["number"]
c = df_count.iloc[2]["number"]
d = df_count.iloc[3]["number"]
funnel = pd.DataFrame({"link": ["Click", "Collect and add to cart", "Pay"],
                       "number": [a, b + c, d]})
funnel["overall conversion rate"] = [i / funnel["number"][0] for i in funnel["number"]]
funnel["single conversion rate"] = np.array([1.0, 2.0, 3.0])
for i in range(0, len(funnel["number"])):
    if i == 0:
        funnel["single conversion rate"][i] = 1.0
    else:
        funnel["single conversion rate"][i] = funnel["number"][i] / funnel["number"][i - 1]

# Draw as follows
import plotly.graph_objs as go

trace = go.Funnel(
    y=["Click", "Collect and add to cart", "Pay"],
    x=[funnel["number"][0], funnel["number"][1], funnel["number"][2]],
    textinfo="value+percent initial",
    marker=dict(color=["deepskyblue", "lightsalmon", "tan"]),
    connector={"line": {"color": "royalblue", "dash": "solid", "width": 3}})
data = [trace]
fig = go.Figure(data)
fig.show()
```
The results are as follows:
The drawing is as follows:
Result analysis: since collecting and adding to the shopping cart both signal purchase intent, and their relative order is not fixed, we merge them into a single stage. The funnel chart and funnel table show that the conversion rate from browsing to purchase intent (collect + add to cart) is only 5%, and the conversion rate from browsing to actual purchase is only 1%. Looking at the single-step conversion rate, however, conversion from purchase intent to actual purchase reaches 20%. This indicates that the stage from browsing to collecting/adding to cart is the key link for improving the metrics.
4) Customer value analysis (RFM analysis)
```python
from datetime import datetime

# Days since the last purchase (R)
recent_buy = df[df["behavior_type"] == "4"].groupby("user_id")["date"] \
    .apply(lambda x: datetime(2014, 12, 20) - x.sort_values().iloc[-1]).reset_index() \
    .rename(columns={"date": "recent"})
recent_buy["recent"] = recent_buy["recent"].apply(lambda x: x.days)
recent_buy[:10]
# Number of purchases (F)
buy_freq = df[df["behavior_type"] == "4"].groupby("user_id")["date"].count().reset_index() \
    .rename(columns={"date": "freq"})
buy_freq[:10]
# Merge the two columns
rfm = pd.merge(recent_buy, buy_freq, on="user_id")
rfm[:10]
# Score R and F: smaller recency scores higher, larger frequency scores higher
r_bins = [0, 5, 10, 15, 20, 50]
f_bins = [1, 30, 60, 90, 120, 900]
rfm["r_score"] = pd.cut(rfm["recent"], bins=r_bins, labels=[5, 4, 3, 2, 1], right=False)
rfm["f_score"] = pd.cut(rfm["freq"], bins=f_bins, labels=[1, 2, 3, 4, 5], right=False)
for i in ["r_score", "f_score"]:
    rfm[i] = rfm[i].astype(float)
rfm.describe()
# Compare each score with its mean (values taken from describe() above)
rfm["r"] = np.where(rfm["r_score"] > 3.943957, "high", "low")
rfm["f"] = np.where(rfm["f_score"] > 1.133356, "high", "low")
# Concatenate the r and f flags
rfm["value"] = rfm["r"].str[:] + rfm["f"].str[:]
rfm.head()


# Label users by their r/f combination
def trans_labels(x):
    if x == "highhigh":
        return "Important value customer"
    elif x == "lowhigh":
        return "Important recall customer"
    elif x == "highlow":
        return "Important deepening customer"
    else:  # "lowlow"
        return "Important retention customer"


rfm["labels"] = rfm["value"].apply(trans_labels)
# Count the number of users for each label
rfm["labels"].value_counts()
```
The results are as follows:
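The pd.cut-based R/F scoring above can be traced end-to-end on three invented users (the bins follow the code above; the mean is computed from the toy data instead of the hard-coded thresholds, which came from describe() on the full data set):

```python
import pandas as pd
import numpy as np

# Three invented users: days since last purchase (R) and purchase count (F)
rfm = pd.DataFrame({"recent": [2, 12, 30], "freq": [5, 70, 200]})

# Smaller recency -> higher R score; larger frequency -> higher F score
r_bins = [0, 5, 10, 15, 20, 50]
f_bins = [1, 30, 60, 90, 120, 900]
rfm["r_score"] = pd.cut(rfm["recent"], bins=r_bins, labels=[5, 4, 3, 2, 1], right=False).astype(float)
rfm["f_score"] = pd.cut(rfm["freq"], bins=f_bins, labels=[1, 2, 3, 4, 5], right=False).astype(float)

# Split at the mean of each score, then combine into a two-flag segment key
rfm["r"] = np.where(rfm["r_score"] > rfm["r_score"].mean(), "high", "low")
rfm["f"] = np.where(rfm["f_score"] > rfm["f_score"].mean(), "high", "low")
rfm["value"] = rfm["r"] + rfm["f"]
print(rfm["value"].tolist())  # ['highlow', 'lowlow', 'lowhigh']
```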