Natural Language Processing (NLP)

Natural Language Processing, or NLP, is a subfield of linguistics, computer science, information technology, and artificial intelligence that gives machines the ability to read, understand, speak, and derive meaning from human languages.

NLP is among the hottest topics in data science. Companies are investing heavily in research in this field. Many people are trying to understand Natural Language Processing and its applications so they can build a career around it, and many businesses want to integrate it into their products in some way.

My first article in the Telugu language

Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The term can also be applied to any machine that exhibits traits associated with a human mind, such as learning and problem-solving. The ideal characteristic of artificial intelligence is its ability to rationalize and take the actions that have the best chance of achieving a specific goal. A subset of artificial intelligence is machine learning, which refers to the concept that computer programs can automatically learn from and adapt to new data without human assistance. Deep learning techniques enable this automatic learning by absorbing large amounts of unstructured data such as text, images, or video.


Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
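To make "an agent taking actions to maximize cumulative reward" concrete, here is a minimal tabular Q-learning sketch. Q-learning is just one RL algorithm, picked here for illustration; the five-state corridor environment, the helper names (`step`, `greedy_action`, `q_learning`), and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical choices, not taken from any particular library.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, 1)     # move left or move right


def step(state, action):
    """Toy environment: reward 1.0 only when the agent reaches the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy_action(q, state, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore occasionally, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Discounted future value is zero once the episode has ended.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


q = q_learning()
# Greedy policy for every non-goal state (+1 means "move right").
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Because the discount factor `gamma` makes a reward reached sooner worth more, the learned policy should settle on moving right from every non-goal state, which is the shortest route to the reward.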

Here’s what I found:

1. Combine work and play with AWS DeepRacer

AWS introduced DeepRacer in November 2018 as the “fastest way to get rolling with machine learning.” In December 2020, they had more than 10,000 competitors and a grand prize that included $10,000 of AWS promotional credits.

Don’t let the competition scare you away, because DeepRacer is a superb learning tool…


In order to use textual data for predictive modelling, the text must first be split into individual words or tokens. This process is called tokenization, and it is often combined with removing certain words (such as stop words). The tokens then need to be encoded as integers or floating-point values for use as inputs to machine learning algorithms. This process is called feature extraction (or vectorization).

Scikit-learn’s CountVectorizer converts a collection of text documents into a matrix of token counts. It also handles pre-processing of the text (for example, lowercasing and tokenization) before generating the vector representation, which makes it a highly flexible feature-representation module for text.
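The tokenize-then-count idea can be sketched without scikit-learn. The stdlib-only function below is a hypothetical, simplified stand-in for `CountVectorizer`: it mimics two of its defaults (lowercasing, and a token pattern that matches runs of two or more word characters) and builds an alphabetically sorted vocabulary, but it returns dense Python lists, whereas scikit-learn returns a sparse matrix. With scikit-learn installed, the rough equivalent is `CountVectorizer().fit_transform(docs)`, with column names available from `get_feature_names_out()`.

```python
import re
from collections import Counter

# Same regex as CountVectorizer's default token_pattern: 2+ word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    """Lowercase the document and split it into tokens."""
    return TOKEN_PATTERN.findall(doc.lower())


def count_vectorize(docs):
    """Hypothetical helper: return (vocabulary, document-term count matrix)."""
    tokenized = [tokenize(d) for d in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocabulary)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocabulary)
        for tok, n in Counter(toks).items():
            row[index[tok]] = n
        matrix.append(row)
    return vocabulary, matrix


docs = ["The cat sat on the mat.", "The dog sat."]
vocab, X = count_vectorize(docs)
print(vocab)
print(X)
```

For these two documents the vocabulary is `['cat', 'dog', 'mat', 'on', 'sat', 'the']`, and the first row records `the` twice: `[1, 0, 1, 1, 1, 2]`.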

About Scikit-Learn’s vectorizers

As you know, machines…

Question 1) How do you choose the value of the regularisation parameter (λ)?

Selecting the regularisation parameter is tricky. If λ is too high, the regression coefficients are shrunk to extremely small values, which makes the model underfit (high bias, low variance). On the other hand, if λ is 0 or very small, the model will tend to overfit the training data (low bias, high variance).

There is no single formula that gives the right value of λ. What you can do is take sub-samples of the data and run…
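One common data-driven approach is k-fold cross-validation: for each candidate λ, fit the model on part of the data, measure the error on the held-out part, and keep the λ with the lowest average error. The sketch below assumes ridge (L2) regression with a single feature and no intercept, so the coefficient has a closed form; the helper names, the synthetic data, and the λ grid are all hypothetical. It also shows the shrinkage effect: a very large λ pulls the coefficient towards zero.

```python
import random


def ridge_fit_1d(xs, ys, lam):
    """Closed-form ridge estimate for y ≈ w * x (one feature, no intercept).

    Minimizes sum((y - w*x)^2) + lam * w^2, giving w = sum(x*y) / (sum(x^2) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_error(xs, ys, lam, k=5):
    """Mean squared error on held-out folds (k-fold cross-validation)."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th point forms one fold
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        w = ridge_fit_1d(train_x, train_y, lam)
        total += sum((ys[i] - w * xs[i]) ** 2 for i in held_out)
    return total / n


def choose_lambda(xs, ys, candidates, k=5):
    """Return the candidate λ with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: cv_error(xs, ys, lam, k))


# Synthetic data: y = 2x plus Gaussian noise (fixed seed for repeatability).
rng = random.Random(42)
xs = [i / 10 for i in range(1, 41)]
ys = [2 * x + rng.gauss(0, 0.5) for x in xs]

candidates = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
best = choose_lambda(xs, ys, candidates)
print("chosen lambda:", best)
print("w at lambda=0:", ridge_fit_1d(xs, ys, 0.0))
print("w at lambda=1000:", ridge_fit_1d(xs, ys, 1000.0))
```

On this clean, one-feature toy problem a small λ wins; on noisier data with many correlated features, the same search would typically favour a larger value.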

Question 1) What is linear regression?

In simple terms, linear regression is a method of finding the straight line that best fits the given data, i.e. the best linear relationship between the independent and dependent variables.

In technical terms, linear regression is a machine learning algorithm that finds the best linear relationship between the independent and dependent variables in a given dataset. The line is most commonly fitted by minimizing the sum of squared residuals (the ordinary least squares method).
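For a single independent variable, minimizing the sum of squared residuals has a closed-form solution: the slope is Σ(x − x̄)(y − ȳ) divided by Σ(x − x̄)², and the intercept is ȳ − slope·x̄. The stdlib-only sketch below computes both; the function names and the sample points are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """The quantity that ordinary least squares minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
slope, intercept = fit_simple_ols(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

For these points the fitted line is y = 1.96x + 0.14, and nudging the slope in either direction increases the sum of squared residuals.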

Question 2) State the assumptions in a linear regression model.

There are three main assumptions in a linear regression model:

1. Assumption about the form of the…

Natural Language Processing (NLP) is one of the most important fields of study and research in today’s world. It has many applications in the business sector such as chatbots, sentiment analysis, and document classification.

Preprocessing and representing text is one of the trickiest and most annoying parts of working on an NLP project, because text-based datasets can be messy and difficult to clean. Fortunately, a Python package called Texthero can help you with these challenges.

What is Texthero?

Texthero is a simple Python toolkit that helps you work with a text-based dataset. It provides quick and easy functionalities that let you…
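Texthero bundles common cleaning steps into a single call on a pandas Series (for example, `hero.clean(df["text"])`). As a rough, dependency-free illustration of what such a cleaning pipeline does, the sketch below lowercases the text and strips digits, punctuation, diacritics, and stopwords. It is not Texthero's implementation: the function names and the deliberately tiny stopword list are hypothetical, and Texthero's own defaults may differ.

```python
import re
import string
import unicodedata

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}

# Map every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans({p: " " for p in string.punctuation})


def remove_diacritics(text):
    """Decompose accented characters and drop the combining marks."""
    normalized = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in normalized if not unicodedata.combining(ch))


def clean_text(text):
    """Hypothetical cleaning pipeline applied to a single string."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # drop digits
    text = text.translate(PUNCT_TO_SPACE)  # drop punctuation
    text = remove_diacritics(text)
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)  # split/join also collapses extra whitespace


print(clean_text("The Café opened in 2020, and the coffee is GREAT!"))
```

Running it on the example sentence leaves `cafe opened coffee great`. Texthero applies comparable steps column-wise to a whole DataFrame column rather than to one string at a time.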

Gopathi Suresh Kumar

Lead Data Scientist
