“Identifying customer priority for new products in target marketing: Using RFM model and TextRank”

Target marketing is a key strategy used to increase revenue. Among the many methods that identify prospective customers, the recency, frequency, monetary value (RFM) model is considered the most accurate. However, no RFM study has focused on prospects for new product launches. This study addresses this gap by using website access data to identify prospects for new products, thereby extending RFM models to include website-specific weights. Two models were developed: an RF model, built using frequency and recency information from customers' website access data, and an RwF model, built by adding website weights to the frequency of access. A TextRank algorithm was used to analyze weights for each website based on the access frequency, thus defining the weights in the RwF model. South Korean mobile users’ website access data between May 1 and July 31, 2020 were used to validate the models. Through a significant lift curve, the results indicate that the models are highly effective in prioritizing customers for target marketing of new products. In particular, the RwF model, reflecting website-specific weights, showed a customer response rate of more than 30% among the top 10% of customers. The findings extend the RFM literature beyond purchase history and enable practitioners to find target customers without a purchase history.


INTRODUCTION
With the development of the Internet and the growth of the online market, online consumption has increased, and COVID-19 has recently accelerated this trend (Bhatti et al., 2020). In this environment, consumers can easily access and purchase products from multiple online stores. These trends increase the likelihood of being able to infer customers' interests from their online behavior. Analyzing customers' Internet access history makes it possible to deduce items of interest and to measure the likelihood of customers purchasing those items. Using this approach, companies that lack customers' purchasing histories can still efficiently identify marketing targets when launching new products.
As the telecommunications market in the Republic of Korea is almost saturated, telecom companies are under pressure to cut rates and create new revenue sources (Yoon, 2007). New businesses must emerge to preserve or increase their profit. Telecommunications companies are well placed to develop hyperscale businesses that can identify customers' online behaviors. Specifically, they can gain insights into customer preferences from data generated by mobile phones and thereby build new businesses that generate revenue (Chui & Manyika, 2015). Telecommunications companies want to launch new products rather than rely on existing communications-related products, such as mobile phones and internet protocol TV. In this regard, one new product being launched is a video content service utilizing augmented reality/virtual reality technology. This service uses augmented reality to provide home training services such as yoga. However, it is important to break through early entry barriers because many operators already provide home training content. One strategy to break into new markets is to identify targets and market to them directly (Stryker, 1996). However, it is not easy to identify marketing targets efficiently when entering a new business for the first time.
In target marketing, it is important to identify a customer base with a high probability of consumption to increase revenue. During this process, companies can use various types of data. One of the most common approaches is to conduct targeted marketing through recency, frequency, monetary value (RFM) analysis based on customers' purchase history. However, this method can only be applied to businesses that have a portfolio of products and a history of purchases for each product. Therefore, it is difficult for companies that are new entrants to an existing market or are pioneering a new business to identify prospective customers. It is important for a company to secure an initial market share. If the identification of targets and related activities is delayed relative to competitors, this could result in a significant loss for the company.

RFM model
There are various ways to identify prospects and effectively conduct target marketing. Chou et al. (2000) stated that demographic information was used to define customer groups using the central value in a clustering technique (K-cluster, clique). Specifically, clusters were based on income data classified by customer occupation and gender (Chou et al., 2000). For verification, the results of the campaign execution by customer group were compared with a lift chart (Chou et al., 2000). This approach enables efficient identification of prospective customers using only demographic information (Chou et al., 2000). However, this approach has limitations as it presupposes that customers with the same demographic information have the same characteristics. Therefore, the most widely used approach is the RFM model.
Most studies use a basic RFM model that assigns equal weights to recency (R), frequency (F), and monetary (M) value (Blattberg et al., 2008; Hughes, 1994; Nimbalkar & Shah, 2013; Wei et al., 2012). One study identified a key prospective group by adding a length factor, applied to the purchase period, to the RFM model (Wei et al., 2012). Another proposed method segments customer groups using K-means clustering after adding the period of product activity as a variable to the RFM model. This study was verified using data from an Iranian delivery company, SAPCO (Hosseini et al., 2010). The verification showed that the proposed model had a significant effect on segmenting customer groups based on loyalty (Hosseini et al., 2010). In a separate study, customer groups with similar purchasing behaviors were classified using purchase data and a genetic-algorithm-based clustering technique (Tsai & Chiu, 2004). Then, an RFM model was used for analysis between clusters (Tsai & Chiu, 2004). A further study proposed an RFM model for each product (RFM/P), considering the product perspective rather than that of the customer. This model first estimated the customer value of all products individually and then aggregated them to obtain the overall customer value. Based on this, Heldt et al. (2019) conducted an empirical analysis on financial companies and supermarkets and were able to verify the significance of the results.

Application of the RFM model
The RFM model measures when, how often, and how much a customer purchases a product. The history of customer purchases can effectively predict future purchasing behavior, and companies can estimate the value of the customers that contribute to their revenue. Thus, RFM models are widely applied in database marketing. For example, they are often developed for marketing programs (e.g., direct mail) targeting specific customers to improve customer response rates (Sohrabi & Khanlari, 2007). This shows that RFM facilitates the choice of which customers to target through proposals (Colombo & Jiang, 1999).
Companies can benefit from the adoption of RFM models in terms of increased response rates, reduced order costs, and increased profits. In many studies, RFM models based on purchase history data from various industries have been applied to define high-value customers, who are the most attractive potential customers, and real-world data have verified the significance of high-value customers. The RFM model is used in various fields, such as finance, among others (Lumsden et al., 2008). Although different industries have been analyzed, these studies all identify high-value customers to shed light on topics such as loyalty, purchase, and revisit behaviors through RFM models. RFM models have proven effective in many industries as a means of finding target customers.
The specific aspect of new product launches has not been considered before, including in the studies discussed above. In most studies, the main contribution is a refinement of existing RFM models. Although upgrading a model is theoretically a positive improvement, introducing new aspects is also necessary. Therefore, a method for the effective application of the RFM model in the context of new product launches (without any purchasing history) is proposed. This aspect is a concern for all practitioners who seek target consumer data without a purchase history, and this research aims to address this concern.

Aims
This study proposes a method to prioritize valid marketing targets through website access for related product categories when there is no product purchase history. It also establishes a hypothesis that the more frequently customers access a website, the more priority they should have in target marketing. This hypothesis and the methodologies developed are subsequently verified through real data. Thus, this research aims to guide companies seeking marketing targets for the launch of new products.

Hypotheses
This study applies an RFM model that assumes shopping website access represents a history of purchases. Further, it defines hypotheses to test whether it is meaningful to use website access history to predict customer value. It also considers access time and number of connections because website access history, unlike purchase history data, does not contain purchase amount data. In particular, many websites may be accessed in relation to a product or service, but because of website differences, such as quality (Griffiths & Christensen, 2005) and quantity, customers tend to access certain websites frequently. This has implications for the research because, among the websites related to a product category that customers frequent, some have higher utility weights than others. Thus, customers who are more likely to access sites with higher utility weights are expected to demonstrate more positive buying behavior.
Accordingly, based on the aims and the literature review, the following three hypotheses are proposed:
H1: Using website access data related to the product categories in the RFM model, instead of historical purchase data, leads to significant target marketing results.
H2: There are websites related to product categories that several customers access frequently.
H3: In target marketing, the RFM model that weights websites frequently accessed by people is more meaningful than the model that does not take weights into account.
Figure 1 illustrates the conceptual RFM model for identifying prospective customers adopted in this study. As mentioned in the literature review, this model is adopted because it has a strong ability to find potential customers. In addition, this study expands this model to identify potential customers for new products even without a purchase history. Therefore, data on the purchase history of the product are replaced with the website access history related to product categories.

Research model
Accordingly, the RFM model is redefined as the RF model because monetary information cannot be defined within the website access history.
Then, the RwF model, which reflects website-specific utility weights, and the RF model, which does not, are applied to analyze prospects and thereby to evaluate the performance of the two models through empirical research.
The research analysis process comprised the following steps.
Step 1: Data pre-processing. Initially, web access history data related to product categories were extracted. Then, outliers and inaccurate values were removed and the initial dataset was generated. Next, redundant properties were removed to transform the data into an easier and more efficient form for processing to analyze customer value.
Step 2: Define the factors of the models that identify prospects. The factors of the RF and RwF models (recency, frequency, and utility-weighted frequency) are defined as follows.
Recency: Recency (R) represents how recent the last website access is and is defined as the gap between the start date of a given period and the date of the last website access. It is normalized to a value between 0 and 1. The higher the recency, the greater the R.
Frequency: This refers to the number of visits to the same site during a given period. This number is normalized to a value between 0 and 1. The greater the frequency of visits, the greater the F.
Utility-weighted frequency: This refers to the product of the utility weight and the access frequency for each website, summed over the websites accessed. Each website offers different degrees of purchasing opportunities to customers. As each website has different types, quantities, and quality of information, each website has a different utility weight. As shown in Figure 2, suppose there are two websites, website 1 and website 2, associated with a specific category, such as yoga; website 1 has a utility weight of 0.7 and website 2 has a utility weight of 0.3. Consumer A accesses website 1 twice; therefore, the weighted sum of website utility and number of times accessed is 1.4 (= 0.7 weight × 2 times). Consumer B accesses website 1 once and website 2 once, and the weighted sum of website utility and the number of times accessed becomes 1 (= 0.7 weight × 1 time + 0.3 weight × 1 time). In terms of website access frequency, consumers A and B have the same two connections. However, in terms of the weighted value of the website utility and number of connections, consumer A scores 1.4 and consumer B scores 1. Thus, consumer A has a higher value than consumer B. Even if consumers access websites the same number of times, the weighted sum can vary depending on the specific websites they access.
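The two-consumer example above can be reproduced with a short calculation. The following is a minimal sketch rather than the implementation used in the study; the labels `website_1` and `website_2` and the function name `weighted_frequency` are placeholders introduced here for illustration.

```python
from collections import Counter

# Utility weights taken from the two-website example in the text.
# The website labels are placeholders, not names from the dataset.
utility_weights = {"website_1": 0.7, "website_2": 0.3}


def weighted_frequency(visits, weights):
    """Sum of (utility weight x access count) over the websites a consumer visited."""
    counts = Counter(visits)
    return sum(weights[site] * n for site, n in counts.items())


consumer_a = ["website_1", "website_1"]  # two visits to the higher-weight site
consumer_b = ["website_1", "website_2"]  # one visit to each site

wf_a = weighted_frequency(consumer_a, utility_weights)
wf_b = weighted_frequency(consumer_b, utility_weights)
print(round(wf_a, 2), round(wf_b, 2))
```

Both consumers have the same raw access count of two, yet the weighted sums differ, which is the distinction between the frequency term of the RF model and that of the RwF model.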
To quantify the utility weights for each website page, as shown in Figure 2, the study utilizes the simultaneous access frequency of customers. Websites that attract many customers are defined as having high utility weights. A TextRank algorithm is used for analysis of these website-specific utility weights based on the frequency of the customer access of each website. The frequency term of the model is defined by the weighted sum of utility weights and access counts for each website. The RF model and the RwF model differ in the frequency term, whereby the latter reflects the utility weights for each website, not just the frequency itself.
The PageRank algorithm assigns relative importance to documents with hyperlink structures (Ding et al., 2009). This algorithm has been used in many studies in various fields to find key elements, and it has been found to produce better results than traditional approaches, such as degree centrality and closeness centrality (Mihalcea & Tarau, 2004). Griffiths and Christensen (2005) used PageRank as an indicator of the quality of web pages as experienced by consumers. The PageRank algorithm is used in various fields and can even be applied to find keywords in text analysis. A sentence is split into words to create a co-occurrence matrix for the words. The weight for each word is calculated using the TextRank algorithm, which applies the PageRank algorithm to text analysis. A word with a high TextRank value is considered a keyword for that text (Jaffery & Liu, 2009; Tian, 2013).
The method used to obtain website-specific weights is the TextRank algorithm, which finds keywords through word-specific weight calculations (Tian, 2013). To apply the TextRank algorithm to text, an undirected graph is defined in which words are the nodes and co-occurrences between words are the links. Applying the same approach to websites, a TextRank score is defined for each website. In the final step (step 3), the weight value of each web page is analyzed through the TextRank algorithm.
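For reference, the weighted TextRank score in its standard published form (Mihalcea & Tarau, 2004) can be written as below. This is the general formula and not necessarily the exact expression used in this study. Reading $V_i$ as a website, $\mathrm{Adj}(V_i)$ as the websites linked to it in the undirected graph, and $w_{ji}$ as the co-access weight between websites $j$ and $i$ is an interpretation assumed here; $d$ is the damping factor, commonly set to 0.85.

```latex
WS(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{Adj}(V_i)}
          \frac{w_{ji}}{\sum_{V_k \in \mathrm{Adj}(V_j)} w_{jk}} \, WS(V_j)
```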
Using these steps, the weighted value of each website is determined, and it is higher for websites that are frequently accessed by a relatively large number of customers. Then, the TextRank value of each website is used as its utility weight.
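The weighting procedure can be sketched as a power iteration over a website co-access graph. This is an illustrative sketch under stated assumptions, not the study's implementation: the edge weight is assumed to be the number of customers who accessed both websites, the damping factor of 0.85 is the conventional default, the scores are rescaled to sum to 1, and the customer histories and site names are hypothetical. Websites that are never co-accessed with another website do not appear in the graph.

```python
from collections import defaultdict
from itertools import combinations


def co_access_graph(histories):
    """Undirected graph; edge weight = number of customers who accessed both sites (assumption)."""
    graph = defaultdict(lambda: defaultdict(float))
    for sites in histories.values():
        for a, b in combinations(sorted(set(sites)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def textrank(graph, damping=0.85, iterations=100):
    """Weighted TextRank scores by power iteration, rescaled to sum to 1."""
    scores = {node: 1.0 for node in graph}
    # Total edge weight attached to each node.
    strength = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    for _ in range(iterations):
        scores = {
            node: (1 - damping) + damping * sum(
                graph[nbr][node] / strength[nbr] * scores[nbr]
                for nbr in graph[node]
            )
            for node in graph
        }
    total = sum(scores.values())
    return {node: score / total for node, score in scores.items()}


# Hypothetical access histories: customer ID -> websites accessed.
histories = {
    "c1": ["site_a", "site_b"],
    "c2": ["site_a", "site_b", "site_c"],
    "c3": ["site_a", "site_c"],
    "c4": ["site_a", "site_d"],
}
weights = textrank(co_access_graph(histories))
top_site = max(weights, key=weights.get)
print(top_site, {site: round(w, 3) for site, w in sorted(weights.items())})
```

In this toy input, `site_a` is co-accessed with every other site and therefore receives the largest weight, which mirrors the idea that websites commonly accessed by many customers obtain higher utility weights.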
Step 3: Normalization of factors in models. Given the differences in scale, the data for the three factors of the models (namely, recency, frequency, and utility-weighted frequency) need to be standardized. Applying min-max standardization to the initial values eliminates the impact of the respective numerical scales on the analytical results.
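Min-max standardization rescales each factor to the interval from 0 to 1. A minimal sketch follows; the function name and the handling of a constant column (mapped to 0.0) are choices made here for illustration and are not taken from the study.

```python
def min_max_normalize(values):
    """Rescale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw access counts for five customers.
raw_frequency = [2, 4, 6, 10, 3]
normalized_frequency = min_max_normalize(raw_frequency)
print(normalized_frequency)
```

The same function would be applied separately to recency, frequency, and utility-weighted frequency so that no factor dominates merely because of its scale.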
Step 4: Formulate and analyze the models for identifying prospects. The RF and RwF values are analyzed for each prospective customer according to the following expressions.
As defined above, the RF model combines the recency and frequency factors, whereas the RwF model replaces the frequency factor with the utility-weighted frequency.
Step 5: Evaluate. As target marketing aims to find prospective customers with a high purchase value rather than to classify all prospective purchasers accurately, a lift chart on a quantitative scale can be used to meet this objective (Jaffery & Liu, 2009; Piatetsky-Shapiro & Masand, 1999). The lift chart is often used in studies that measure the effectiveness of target marketing. To determine the significance of the models used in the study, the mobile long message service (LMS) is used to contact the top N% of customers by prospective value, and then the service subscription rate is compared. The LMS content consists of a description of the service and a subscription URL. For the LMS campaign, 1,000 people are sampled from each 10% section of the values produced by the RF and RwF models. In total, 10,000 people are sampled for each model, such that the campaign targets of the two models do not overlap. Then, LMS campaigns are conducted for these customers.
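The evaluation step can be illustrated with a small lift calculation. This sketch assumes the common definition of lift as the response rate within a score-ranked segment divided by the overall response rate; the study does not spell out its formula here, and the scores and responses below are hypothetical.

```python
def lift_by_segment(scores, responded, n_segments=10):
    """Lift per segment after ranking customers by score, highest first.

    Lift = response rate in the segment / overall response rate (assumed definition).
    Requires at least n_segments customers and at least one response.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    overall_rate = sum(responded) / len(responded)
    size = len(order) // n_segments
    lifts = []
    for s in range(n_segments):
        start = s * size
        # The last segment absorbs any remainder.
        end = (s + 1) * size if s < n_segments - 1 else len(order)
        segment = order[start:end]
        rate = sum(responded[i] for i in segment) / len(segment)
        lifts.append(rate / overall_rate)
    return lifts


# Hypothetical model scores and campaign responses (1 = subscribed).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
lifts = lift_by_segment(scores, responded, n_segments=5)
print([round(x, 2) for x in lifts])
```

A lift above 1 in the top segments and a decline toward the lower segments indicates that the ranking concentrates responders among the highest-scored customers.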

Data collection
Website access data are collected from a mobile telecommunications company in the Republic of Korea. The study collects and uses the website access history data of 357,856 mobile subscribers from May 1 to July 31, 2020. The study focuses on a home training service with augmented reality, which is scheduled to be launched and which will offer various types of content, such as yoga and fitness, in the future. However, as most of the content at the time of launch is related to yoga categories, only website access history data related to yoga are used for this study. The list of websites for the study includes yoga-related websites, which mainly comprise yoga shopping sites and yoga academy websites where consumption can occur. A list of 156 websites was collected by the mobile communication company itself and verified manually. The access history dataset for yoga websites consists of the customer identification (ID), website name, website category, number of connections, and access date (see Table 1).

Prospective customer value by models
The statistics (average, maximum, and minimum) of the prospective customer values calculated with the RF and RwF models are shown in Table 2 and Table 3. The statistics for each decile interval are distributed without overlapping intervals. This shows that each decile segment has a different intensity of prospective purchasing power. The statistics show that the values calculated from the RF model are higher than those from the RwF model. This is because the RF model does not differentiate weights between websites.

Estimation results for target marketing
As a result, the total number of customers who are expected to subscribe is 618 for the RF model and 623 for the RwF model. Table 4 shows that the lift value of the RF model follows an overall decreasing trend. According to the lift value for each percentile segment, there is a slight increase in the lift value in the 50th percentile section, but it is not large enough to reverse the declining trend. In comparison, the lift value of the RwF model continues to decrease from the 10th percentile to the 100th percentile. General practice suggests that the higher-ranked the interval, the higher the lift value should be, so it can be argued that both models are significant for target marketing. These results support H1.
As mentioned previously, high TextRank websites are those with high utility weights. The lift value of the RwF model, which reflects this utility, is 34% higher than that of the RF model in the 10th percentile section and 32% higher in the 20th percentile interval. Up to the 50th percentile section, the RwF model shows a higher lift value than the RF model. By contrast, in the 80th percentile interval, the RF model is 1% higher than the RwF model. Typically, if the slope of a lift chart decreases sharply to the right, the discovery of the top prospective customers has been performed well (Wang et al., 2013). Figure 3 shows that the RwF model, which has a steeper slope in the lift value up to the 30th percentile section, is more effective in target marketing than the RF model. For this reason, H3 is supported only for highly prospective customers.

Estimation results for website utility weight
The cumulative weight distribution of the websites shows that the top-40 websites account for 80% of the total weight, as shown in Figure 4. In particular, the top-10 websites account for 70% of the total weight, which implies that these are the websites commonly accessed by consumers interested in yoga.
The top website has a weight of approximately 0.2, meaning that it has a huge influence in the field of yoga. In other words, there are websites that are relatively frequently accessed by customers and those that are not. Therefore, H2 is strongly supported.
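The cumulative share held by the top-ranked websites can be computed directly from the utility weights. The sketch below uses hypothetical weights rather than the study's values, and the function name is introduced here for illustration.

```python
def cumulative_share(weights, top_n):
    """Fraction of the total weight held by the top_n highest-weighted websites."""
    ranked = sorted(weights, reverse=True)
    return sum(ranked[:top_n]) / sum(ranked)


# Hypothetical utility weights for six websites.
site_weights = [0.05, 0.40, 0.10, 0.25, 0.15, 0.05]
share_top2 = cumulative_share(site_weights, 2)
print(round(share_top2, 2))
```

A steeply rising cumulative share over the first few ranks indicates that a small set of websites concentrates most of the utility weight.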

DISCUSSION AND IMPLICATIONS
Similar to Babaiyan, RFM models were applied and expanded to further confirm that the RFM model is a powerful tool for finding prospective customers. However, the results of this study reflect two differences. First, the possibility of utilizing website access history instead of purchase history was verified. Second, the RwF model, which assigned different prospect values depending on which websites customers accessed, was significantly better at identifying prospects than the RF model.
Both the proposed RF and RwF models were tested with LMS campaigns and showed higher target marketing success rates for customers with higher prospect values. This means that access to websites for related product categories is a positive factor affecting product purchase. It also confirms that using website access history can be an important approach to prioritizing potential customers if a company or marketer launching a new product does not have a consumer purchase history.
The TextRank algorithm was used to identify websites that should be more heavily weighted than others. Simply making assumptions based on the number of website pages visited can be ineffective because that number may reflect connections generated by a small number of customers. However, the TextRank method, based on the co-occurrence matrix between websites, allocates higher weights to sites that consumers commonly access. This TextRank approach can overcome the problem of finding representative websites simply by the number of connections. As a result, most prospective customers could be identified as accessing the top 25% of websites, as determined by the website utility weights.
This study proposed an RwF model with utility weights by website and confirmed that the actual LMS campaign, which was based on the RwF model, showed a higher lift value than the RF model. In particular, the value of the lift chart in the top 10% section was 23% higher than that of the RF model, and the difference between the top 50% section was more than 5%. This shows that customers are more likely to sign up for products on websites that they access more frequently. This approach will improve the implementation of higher-accuracy target marketing for companies launching new products. As consumption-related activities, such as information exploration and purchasing, gradually shift to online-based activities, the value of the approach developed in this study is likely to become an ever more powerful way to estimate individual interests and discover prospective customers.
Specifically, telecommunications companies or other firms that collect website access data can apply the study results immediately. The proposed approach can be effective in prioritizing prospective customers in the early stages of the launch of various products.

CONCLUSIONS AND FUTURE WORK
This study sought to develop a procedure to identify prospective customers by priority for targeted marketing for new product launches. Notably, the model yields the value of the customer likely to be interested in a new product. The primary output of the proposed procedure is a decision-making process that enables an entity entering a new business to easily identify customers to be targeted with new offerings. That is, the model identifies which customers are more important and thus likely to contribute more to firm revenue. The model allows marketing practitioners to analyze the value of marketing targets, prioritize them, and market to as many customers as necessary, even if those customers do not have a purchase history.
Ultimately, the proposed approach could be used by a variety of companies seeking to launch new products. Future research is required to further verify the performance of the model by applying it to products in various fields. Collecting access time data that detail how long a customer has spent on a website will help to find prospective customers more accurately when such data are applied to the website weight analysis.