Using Python for Text Analytics in Accounting (Essay Sample)
My thesis research question is: Can stock returns be explained by sentiments extracted from 10-K reports?
I did find some similar papers about this topic, so I would like you to develop a small ‘twist’ or ‘take’ on this topic.
I have to write a workplan outlining how I intend to research and report on this research question.
- Each thesis is structured around a clear research question. The thesis raises a question, explains why that question is important (academic and practical relevance), reports on the research that you have carried out, and makes it clear what answer to the question is given on the basis of this research.
- The thesis shows that you can position your work in the academic literature. You must discuss what the most important relevant publications have to say about your topic, either in substance or in terms of research methods.
The supervisor has made available a list of required readings, mainly academic papers relating to the topic of my group: Using Python for text analytics in Accounting. The literature is chosen to give a good start in the relevant literature, but it is certainly not all the literature I need. As you write the workplan, you will need to actively search for additional literature in support of my research question.
These are the required readings: (See attached)
Anand, V., Bochkay, K., Chychyla, R., & Leone, A. J. (2020). Using Python for Text Analysis in Accounting Research. Foundations and Trends in Accounting.
El‐Haj, M., Rayson, P., Walker, M., Young, S., & Simaki, V. (2019). In search of meaning: Lessons, resources and next steps for computational analysis of financial discourse. Journal of Business Finance & Accounting, 46(3-4), 265-306.
Lewis, C., & Young, S. (2019). Fad or future? Automated analysis of financial text and its implications for corporate reporting. Accounting and Business Research, 49(5), 587-615.
Loughran, T., & McDonald, B. (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54(4), 1187-1230.
Zhang, M. C., Stone, D. N., & Xie, H. (2019). Text data sources in archival accounting research: insights and strategies for accounting systems' scholars. Journal of Information Systems, 33(1), 145-180.
It is required that at least one focal variable is measured based on textual data that is transformed into numerical measures using Python.
Attached you will find the work plan form and a document to explain each step of the work plan.
Furthermore, as instructions, my supervisor has mentioned the following for this research question:
If you would study this research question, you would combine textual analysis with doing an event study, and you could (among others) build on the following paper (see attached):
Yekini, L. S., Wisniewski, T. P., & Millo, Y. (2016). Market reaction to the positiveness of annual report narratives. The British Accounting Review, 48(4), 415-430.
So please look at these papers. I have basic knowledge of Python and data analytics, so could you make this not too difficult? Please look at the paper: Anand, V., Bochkay, K., Chychyla, R., & Leone, A. J. (2020). Using Python for Text Analysis in Accounting Research. As I have mentioned, it is required that at least one focal variable is measured based on textual data that is transformed into numerical measures using Python.
For the workform, the last question of 3000 words doesn't have to be answered yet. So answer all the questions except the last one please. And be precise and detailed in answering them.
Python will be used in the data analysis, and if you look at the workform, you will see that I need about 1000 words to answer the questions. I actually have a research question and my twist on it. The research question is: Can stock returns be explained by sentiments extracted from 10-K reports? I want to state two hypotheses. H1: The TF-IDF approach can be used to explain stock returns by analyzing the sentiment of 10-K reports. H2: The TF-IDF approach is better at explaining stock returns than the Bag of Words approach using Loughran & McDonald’s (2011) financial dictionary to analyze the sentiment of 10-K reports.
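To make the contrast between the two hypotheses concrete, the sketch below scores the same text twice: once with equal-weighted dictionary counts (Bag of Words) and once with each dictionary word weighted by its inverse document frequency. The word lists are tiny stand-ins, not the actual Loughran & McDonald (2011) lists, and the weighting is a plain tf × idf, which is an assumption and may differ from the term weighting used in any particular paper.

```python
import math
import re
from collections import Counter

# Toy word lists standing in for a financial dictionary (illustrative assumption,
# NOT the real Loughran & McDonald lists).
POSITIVE = {"gain", "improve", "strong"}
NEGATIVE = {"loss", "decline", "impairment"}

def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bow_tone(tokens):
    """Equal-weighted tone: (positive count - negative count) / total words."""
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return (pos - neg) / len(tokens) if tokens else 0.0

def idf_weights(docs_tokens):
    """Inverse document frequency log(N / df) for every word in the corpus."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {w: math.log(n_docs / d) for w, d in df.items()}

def tfidf_tone(tokens, idf):
    """Tone in which each dictionary hit is weighted by the word's idf."""
    counts = Counter(tokens)
    pos = sum(counts[w] * idf.get(w, 0.0) for w in POSITIVE)
    neg = sum(counts[w] * idf.get(w, 0.0) for w in NEGATIVE)
    return (pos - neg) / len(tokens) if tokens else 0.0

# Three made-up sentences standing in for 10-K text.
docs = [
    "Revenue shows a strong gain and a small loss",
    "We report a loss and a decline in revenue",
    "Impairment charges led to a loss",
]
tokens = [tokenize(d) for d in docs]
idf = idf_weights(tokens)
for t in tokens:
    print(round(bow_tone(t), 4), round(tfidf_tone(t, idf), 4))
```

In this toy corpus the word "loss" appears in every document, so its idf is zero and it stops contributing to the weighted tone. That is the practical difference between the two approaches: common words are discounted under TF-IDF but count fully under Bag of Words.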
Can you look at these statements and see whether I should change or add something? I did not search thoroughly enough online to know whether there are similar papers or theses on this. My thesis needs to be relevant and add something to the existing literature.
Hi, as I mentioned in the instructions, my supervisor advised me to build (among others) on the following paper: Yekini, L. S., Wisniewski, T. P., & Millo, Y. (2016). Market reaction to the positiveness of annual report narratives. The British Accounting Review, 48(4), 415-430. I added this paper to the instructions. So the hypotheses I just stated may not be the best ones to research. Could you look into this?
Hi, my supervisor just gave me some comments regarding my research. See below:
In addition to the documents that I have just sent to all students and that are also very relevant for you (I will attach these documents in the files section: the paper by Ding et al. (2018), A review of short-term event studies in operations and supply chain management, and a document with code that can be used to iterate over all (.txt) files in a directory), the following remarks:
- Why do you want to compare the TF-IDF approach with the BoW approach? What exactly is the difference? Have you found papers which have something similar? If not, it may be better to instead examine the influence of (different types of) tone (i.e., positiveness, etc.) in annual reports on firms' stock returns.
- You propose to examine the period 2008-2020. Given the amount of work, it may be better to limit yourself to one year.
- You mention different window lengths for the event window. For this (and many other methodological issues), I recommend you to carefully look at the Ding et al. (2018) paper.
So I would like to do as he advised: examine the influence of (different types of) tone (i.e., positiveness, etc.) in annual reports on firms' stock returns.
Also, I would like to examine the year just before the coronavirus pandemic (2019), so that the pandemic cannot influence the outcome in any way. For the window lengths, you could look at the paper the teacher mentioned (which I will attach).
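The event-study side of this design can be sketched as follows. This is a minimal illustration of a market-model cumulative abnormal return (CAR), assuming daily returns are already available as lists. The function names, the estimation window, and the three-day event window are hypothetical choices for illustration, not recommendations taken from Ding et al. (2018).

```python
def ols_line(x, y):
    """Return (alpha, beta) from a least-squares fit of y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

def cumulative_abnormal_return(stock_est, market_est, stock_evt, market_evt):
    """Fit the market model on the estimation window, then sum the
    abnormal returns (actual minus expected) over the event window."""
    alpha, beta = ols_line(market_est, stock_est)
    abnormal = [r - (alpha + beta * m) for r, m in zip(stock_evt, market_evt)]
    return sum(abnormal)

# Made-up returns: the stock follows 0.001 + 1.2 * market in the estimation window.
market_est = [0.01, -0.02, 0.015, 0.0, -0.005]
stock_est = [0.001 + 1.2 * m for m in market_est]

# Made-up three-day event window around the 10-K filing date.
market_evt = [0.0, 0.01, -0.01]
stock_evt = [0.021, 0.013, -0.011]

car = cumulative_abnormal_return(stock_est, market_est, stock_evt, market_evt)
print(round(car, 4))
```

In the thesis, one CAR per firm would then be regressed on the tone measure extracted from that firm's 10-K.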
Using Python for Text Analytics in Accounting
Date of Submission
Can stock returns be explained by sentiments extracted from 10-K reports?
Type of Research to Answer the Question
A Linear Regression Approach to Share Price Forecasting
Framing and Contribution of the Research
Take a look at the New York Times business section. As you read, you form views on the different companies in the daily news. Your brain assigns the text a "sentiment" rating of positive, negative, or neutral (Anand et al., 2020). Sentiment analysis is the computer science counterpart of reading the news. It is the systematic processing of features extracted from text.
As can be seen from a glance at a newspaper page, text significantly outnumbers numerical information. Anecdotes, recollections, and quotations trump charts and graphs. Previously limited to price ratios and margins, financial analysis is undergoing a sentiment revolution (Anand et al., 2020).
Thanks to seminal publications, a Google Scholar search for the phrase "sentiment analysis in finances" now produces 661,000 results. In a typical study, the text is tokenized (split into words), preprocessed, possibly stemmed, and classified (Anand et al., 2020). The literature now draws on text from several sources, including (a) forums, blogs, and wikis and (b) news headlines and research reports.
The existing academic literature is primarily concerned with using sentiment to forecast stock market returns. As a result, the literature has not evaluated whether textual analysis can predict a company's future revenue, working capital, or leverage (Azhar et al., 2019).
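The preprocessing steps named above (tokenizing, cleaning, and stemming) can be sketched in a few lines of Python. The stop-word list and the suffix-stripping rule below are deliberately crude assumptions for illustration; a real study would more likely use an established stemmer such as Porter's and a fuller stop-word list.

```python
import re

# A very short stop-word list (illustrative assumption).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "was", "were"}

def tokenize(text):
    """Lowercase the text and keep runs of letters, dropping digits and punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word):
    """Strip one common suffix if at least three letters remain.
    This is a toy rule, not the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Tokenize, remove stop words, and stem what is left."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]

print(preprocess("The losses increased in 2019 and margins were declining."))
```

The stemmed tokens are what would then be matched against a sentiment dictionary or passed to a classifier.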
Strategies and Tactics
We examine whether the sentiment expressed in the Management Discussion and Analysis (MD&A) section of SEC Form 10-K predicts a company's fundamentals during the subsequent filing period. The MD&A section provides management's assessment of the company's current financial condition and prospects for growth (Azhar et al., 2019).
Our findings will aid in determining the accuracy of management's tone and portrayal of a firm's trajectory. From a practical point of view, we (1) offer computational models that quantitative financial firms can use to anticipate the financial condition of their investments, (2) direct insurance businesses to the critical 10-K statements that help assess the health of a business's balance sheet, and (3) increase awareness of market volatility among CEOs and CFOs. Corporate managers responsible for accurately representing enterprise value can now choose their wording more carefully.
Using edgarWebR, we scrape the 10-K annual reports of all S&P 500 companies in 2016. Then we carry out an NRC sentiment classification in Syuzhet using Saif Mohammad's Emotion Lexicon. According to Mohammad, the NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). NRC sentiment annotations were initially collected manually via Mechanical Turk and have since gained widespread adoption in computational linguistics and finance. We combined our analysis with each company's net income, operating cash flows, investing cash flows, financing cash flows, dividends, and stock-based compensation from 2016 to 2017 (Azhar et al., 2019).
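The study described above uses R packages, but the same lexicon-based classification can be sketched in Python, which the assignment requires for the focal text variable. The miniature lexicon below is invented for illustration and does not reproduce the actual word-emotion associations of the NRC Emotion Lexicon. In practice, the real lexicon file would be loaded instead.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon in the style of the NRC Emotion Lexicon.
# The word-to-category associations are made up for this example.
TOY_LEXICON = {
    "growth": {"positive", "anticipation", "joy"},
    "confident": {"positive", "trust"},
    "risk": {"negative", "fear"},
    "loss": {"negative", "sadness"},
    "litigation": {"negative", "anger", "fear"},
    "expect": {"anticipation"},
}

def emotion_counts(text, lexicon=TOY_LEXICON):
    """Count how many word occurrences fall into each emotion or sentiment category."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return counts

counts = emotion_counts("We expect growth despite litigation risk.")
print(dict(counts))
```

Running this over the MD&A text of each filing gives one row of category counts per firm, which can then be merged with the financial variables.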
Overview of Sentiment
We immediately notice firms' inherent positivity, as evidenced by the fact that positive words outnumber negative words by a factor of two in the average 10-K. This likely reflects the excessive confidence that CEOs and CFOs place in their forecasting skills. Additionally, we keep in mind that 2016 saw a robust economy and bull market, which bolstered optimistic forecasts (Brink & Stoel, 2019).
The most frequently occurring words are those associated with Trust. This is primarily attributable to the emphasis placed on financial regulation and moral principles. It is also unsurprising that Anticipation has the second-highest word count, given the MD&A section's forward-looking nature. Surprisingly, Sadness and Fear also occur frequently, most likely due to the plight of distressed energy and retail companies in 2016.
Plots & Visualizations of Correlation
We next consider the correlation plot between the various NRC emotions. Contrary to expectations, we find no significant correlation between Anticipation and Fear, Joy and Anger, or Surprise and Disgust. In fact, the more Fear-related words a filing contains, the more Joy-related words it contains (a testament to the polarity of management). Given the optimistic inclinations mentioned above, excessive positivity appears to have "masked out" significant differences in the underlying emotional content. Furthermore, differences in the overall lengths of 10-Ks should be taken into account, since longer 10-Ks may contain more words of both Joy and Fear (Azhar et al., 2019).
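The point about document length can be illustrated with a small sketch. The counts below are fabricated purely to show the mechanism: raw Fear and Joy counts both grow with filing length and so correlate positively, while the same counts expressed as shares of total words can correlate negatively.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Made-up filings: (total words, Fear words, Joy words).
filings = [
    (10000, 100, 200),
    (20000, 240, 360),
    (40000, 560, 640),
    (80000, 1280, 1120),
]

fear_raw = [fear for _, fear, _ in filings]
joy_raw = [joy for _, _, joy in filings]

# Normalize each count by the length of the filing.
fear_share = [fear / total for total, fear, _ in filings]
joy_share = [joy / total for total, _, joy in filings]

raw_corr = pearson(fear_raw, joy_raw)
share_corr = pearson(fear_share, joy_share)
print(round(raw_corr, 3), round(share_corr, 3))
```

Dividing by total word count before computing correlations is one simple way to keep filing length from driving the result.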
Compared with the emotional correlation plot, the financial correlation plot is full of intriguing relationships. (1) Investing cash flows and operating cash flows, for example, are negatively correlated with net income. Unfortunately, lower bonuses this year lead to higher income figures for CEOs in the following year. (2) The correlation of -0.16 between net income and operating cash flow is also worthy of note. The negative correlation indicates that S&P 500 companies use accruals and income smoothing extensively. (3) All other relationships follow logically and are relatively well established; increasing operating cash flows entail greater capacity for reinvestment, which leads to a strong correlation with increasing investing cash flows (Brink & Stoel, 2019).
Design Choices in the Research and Limitations
We immediately notice that various emotional categories are highly predictive of changes in the fundamentals of a business. These are the first findings of their kind, as all previous academic research has limited sentiment analysis to relationships with equity returns without regard for underlying fundamentals. Five significant observations summarize our findings (Brink & Stoel, 2019).
Each financial variable is associated with a unique set of sentiment categories. For example, Disgust exhibits a statistically significant negative correlation with the logged change in Net Income. It becomes insignificant, however, for the change in Operating Cash Flow. This finding is relevant to analysts who forecast specific parts of a company's balance sheet or remuneration policies; case-specific research requires an appropriate 10-K analysis approach and strategic planning conversations (Azhar et al., 2019).
Emotional changes are logically related to changes in firm fundamentals. A 1% rise in Joy leads, for example, to a statistically significant increase in cash flows from investing and financing. CEOs who are optimistic about the future naturally increase capital expenditures and issue additional debt to fund speculative plant expansions. Investment banks can use this finding to establish relationships with clients expected to have higher future growth CAPEX (e.g., through M&A).
It is worth noting a few niche relationships. In terms of dividends, an increase in Surprise results in increased dividend distributions, whereas an increase in Disgust results in a reduction in dividend distributions of the same magnitude. Typically, 10-Ks that include Surprise do not include a great deal of Disgust. Increases in Anticipation result in decreased stock-based compensation. When CEOs express anxiety about the near future, they appear to be reducing existing incentive plans. Additionally, CEOs appear to be forgoing near-term compensation in order to pursue long-term investment opportunities. This finding can help firms to determine future compensation plans (Brink & Stoel, 2019).
Changes in overall positive and negative sentiment do not predict changes in corporate fundamentals at a statistically significant level. A data analyst should decompose such sentiments into the more nuanced emotions previously discussed.
The regressions above have an average R-squared of 3%. Although this is significant, we next apply a machine learning approach to financial sentiment analysis to maximize the percentage of variance explained.
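The kind of regression described in this section, a logged change in a financial variable regressed on a change in an emotion measure, can be sketched as follows. All numbers are invented for illustration, and the single-regressor setup is a simplifying assumption; the study itself reports regressions across several emotion categories.

```python
import math

def fit_simple_ols(x, y):
    """Fit y = alpha + beta * x by least squares and return (alpha, beta, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    ss_res = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return alpha, beta, 1.0 - ss_res / ss_tot

# Made-up net income for four firms in two consecutive years.
net_income_2016 = [100.0, 200.0, 150.0, 80.0]
net_income_2017 = [110.0, 190.0, 165.0, 80.0]

# Made-up change in the share of Joy words between the two filings.
joy_change = [0.002, -0.001, 0.003, 0.0]

# Dependent variable: logged change in net income.
log_change = [math.log(b / a) for a, b in zip(net_income_2016, net_income_2017)]

alpha, beta, r_squared = fit_simple_ols(joy_change, log_change)
print(round(beta, 2), round(r_squared, 3))
```

The R-squared returned here is the share of variance in the logged change that the emotion measure explains, which is the quantity the 3% figure above refers to.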
Major potential issues and "Plan B."
In this case, abnormal trends in the data described above would be a significant potential problem. Our Plan B would then be to improve the efficiency of the AI.
The use of textual data in accounting research has risen significantly in recent years. To help researchers understand and use textual data, this monograph defines and describes common measures of textual data and then shows how to store and analyze textual data using the Python programming language. The monograph contains sample code that replicates textual analysis tasks from recent papers.
In the first section of the monograph, we discuss how to get started with Python. We begin by describing Anaconda, a Python distribution that includes the necessary libraries for textual analysis and how to install it. The Jupyter notebook is then introduced, a programming environment that facilitates researc...