Welcome to Nobias’ abstractive summarization challenge in collaboration with Finance Club, IIT Bombay. Nobias builds machine learning models that automatically tag financial news articles as either bullish or bearish. The first step in creating such labels is to understand what an article is talking about. You will create a machine learning model to do exactly that!
Your machine learning model should output a short (3-5 sentences) abstractive summary of a given news article. Your model should then use the summaries to rank each stock mentioned in the articles (these stock tickers are available in both the training and test data, so you do not need to extract them yourself). To rank the stocks, give each one a rating from 1 to 5 with the following criteria:
In NLP, there are two types of summarization tasks. Extractive summarization selects important phrases from the original source to create a concise summary of it. Abstractive summarization does not simply copy important phrases from the source text; it can also come up with new phrases that are relevant. This technique entails identifying the key pieces of information, interpreting their context, and re-creating them in a new way. Because it requires both extracting relevant information from a document and automatically generating coherent text, abstractive summarization is considered a more complex problem than extractive summarization.
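To make the contrast concrete, here is a minimal sketch of the extractive approach only, assuming a simple word-frequency heuristic for scoring sentences. The function name `extractive_summary` and the scoring rule are illustrative assumptions, not part of the challenge or of Nobias' pipeline. Every sentence it returns is copied verbatim from the source, which is exactly what an abstractive model is not limited to.

```python
import re
from collections import Counter


def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences of `text`, in their original order.

    Illustrative heuristic (an assumption, not the challenge's method):
    a sentence's score is the average document-level frequency of its words.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies over the whole document (no stop-word filtering).
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Pick the indices of the best sentences, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

An abstractive model would instead generate new text conditioned on the article, typically with a sequence-to-sequence architecture, so its output need not appear anywhere in the source.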
We constructed a dataset of 100k news articles about various US-listed stocks from Nobias’ database. Each data point contains the raw content of the news article, a one-sentence summary of the article, and the ticker of the stock we want to focus on for that article.
The dataset is organized as a single JSON file, with all articles in one array. Each object in the array contains five properties: “id”, “date”, “article”, “stock” and “summary”.
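As a starting point, the layout described above could be loaded and sanity-checked roughly as follows. This is a sketch under assumptions: the file path is whatever you saved the dataset as, the helper names `load_articles` and `articles_per_stock` are hypothetical, and the value types of each property are not specified here, so the code checks only that the five keys are present.

```python
import json
from collections import Counter

# The five properties each article object is described as having.
EXPECTED_KEYS = {"id", "date", "article", "stock", "summary"}


def load_articles(path):
    """Load a single JSON file whose top level is an array of article objects."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)

    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")

    # Fail early if any object lacks one of the documented properties.
    for i, rec in enumerate(records):
        missing = EXPECTED_KEYS - rec.keys()
        if missing:
            raise ValueError(f"record {i} is missing {sorted(missing)}")
    return records


def articles_per_stock(records):
    """Count how many articles focus on each stock ticker."""
    return Counter(rec["stock"] for rec in records)
```

Calling `articles_per_stock(load_articles(path)).most_common(10)` gives a quick view of which tickers dominate the data before any modeling.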
Jan. 17, 2023 - Feb. 1, 2023
Finance Club, IIT Bombay
Online
₹ 200,000