Text Analytics – Getting insights from text

The technology revolution is changing every aspect of human life, and media and marketing are no exception. Among the technologies driving this change, Data Science is at the forefront. As text data is continuously generated and consumed in various formats and sizes from many sources, it is becoming an important asset to organisations. But this asset can be leveraged only if it is stored, processed and analysed efficiently with the help of intelligent algorithms. There is growing interest in using such data to improve business, health, education, society, etc. There are many ways to process and analyse such data, including text visualisation, classification, named entity recognition and sentiment analysis. Effective application of these techniques can give organisations valuable insights, leading to competitive advantage, efficient service delivery and, above all, higher customer satisfaction.

With this in view, CDAC Mumbai is conducting a series of short-term courses in Data Science and Machine Learning. This is the second series of such courses, and the latest course in it is “Text Analytics”, to be conducted during May 18-20, 2017. Registrations for the course are open. More details can be accessed at http://www.kbcs.in/datascience.

AlphaGo: Possible repercussions and India

ET’s editorial of March 11, 2016 talks about AlphaGo and draws some interesting sketches. At one point: “An AI-run factory, goes a joke, employs just a man and a dog. The dog’s job is to keep the man away from the factory. Why have the man at all, in that case? Someone has to feed the dog.”
From the same editorial, a possible scenario for India: “AI will enhance productivity and profits for all companies that can master it and deploy it. Much of India’s advanced IT services industry might get replaced by AI, unless industry itself deploys AI. Indian universities have to teach and advance AI in all its myriad forms. India’s human intelligence potential must be realized, for the Indian economy to benefit from AI rather than be its victim.”
Let’s wait and watch how it unfolds.

Call for Participation – Short-term Courses on Data Science at CDAC, Kharghar, Navi Mumbai


We are living in a Data Age. Data is being continuously generated and consumed in various formats and sizes from a number of varied sources. This data can be a big asset if it is stored, processed and analysed efficiently in real time with the help of intelligent algorithms. There is growing interest in using such data to improve business, health, education, society, etc. There are many ways to process and analyse such data, spanning techniques such as data visualisation, text analysis, prediction and recommendation. Applying these techniques can give companies and organisations valuable insights, leading to competitive advantage, efficient service delivery and, above all, customer satisfaction. As a result, the demand for people skilled in these fields is growing day by day.

With this in view, CDAC, Mumbai is announcing the following short-term courses in Data Science and Machine Learning.

  1. Using R for data visualization and analytics: This course introduces R, a language and environment for statistical computing and visualisation. In recent years, R has become very popular due to its open source, cross-platform nature, robust package repository and strong graphics capabilities. During the course, participants will learn not only the basics of R but also techniques of data acquisition and processing. The course will also cover in detail the features of R related to data analysis and visualisation.
  2. Text Analytics: The course aims to provide learners an understanding of the methods for text analytics. It will cover major techniques for mining and analyzing text data to discover interesting patterns, extract useful knowledge, and support decision making. The techniques will include Named Entity Recognition, Sentiment Analysis and Text Categorization among others. Learners will also be introduced to various open source utilities for developing text analytics applications.
  3. Predictive Analytics and Recommender Systems: The course covers various methods of Predictive Analytics and Recommender Systems drawn from Statistics, Data Mining, and Machine Learning. We will discuss popular algorithms in the domain and their use in various applications. The course emphasizes a hands-on approach for a better understanding of the techniques used in the domain. During the course, mainly open-source tools will be used for illustrations and lab sessions.

Target Audience: Individuals, students, and professionals from government, industry, and academia who work in or are interested in Data Science

Courses Schedule:

Course Name | Course Dates | Final Registration Date
Using R for data visualization and analytics | May 19 – 21, 2016 | May 04, 2016
Text Analytics | June 16 – 18, 2016 | June 01, 2016
Predictive Analytics and Recommender Systems | July 14 – 16, 2016 | June 30, 2016

Registration Process: The registration fee is Rs. 7500/- per candidate per course. For more details about the registration and payment process, please visit http://www.kbcs.in/datascience.

Note: Registration will be on a first-come, first-served basis. Final participation in any of the courses will be subject to the realization of payment of the applicable registration fee.

For more details, please contact:

Centre for Development of Advanced Computing (Formerly NCST)

Near Bharati Vidyapeeth, Raintree Marg, Sector 7, CBD Belapur,

Navi Mumbai – 400614, Maharashtra, INDIA

Telephone: +91-22-27565303/304/305

Fax: +91-22-27565004

email: kbcs@cdac.in

URL: http://www.kbcs.in/datascience

Sangrah – Knowledge Repository for FOSS in Education from CDAC, Mumbai

CDAC, Mumbai has announced the beta release of the portal Sangrah – Knowledge Repository for FOSS in Education. This portal contains resources in categories such as Learning Management Systems, Content Management Systems, etc. It also contains user experiences for these categories, comparative analyses of various tools from these categories, specialised search, and a collaboration facility for community-supported content updates.

The portal is maintained with minimal manual intervention, as most of the tasks, including resource collection, categorization, user experience identification and comparative analysis, are largely automated.

The portal is intended for academic institutions and entrepreneurs, among others, to help them adopt Free and Open Source Software (FOSS).

The portal is still evolving, so feedback and suggestions for improvement are welcome through the feedback section on the portal.

Users can visit and register on the portal at – http://nrcfoss.cdacmumbai.in/sangrah

Release of new version of GNU/Linux distribution for Cognitively Challenged by CDAC, Mumbai

Centre for Development of Advanced Computing (CDAC) has released a new version (version 0.1.2) of its GNU/Linux distribution for the Cognitively Challenged. Cognitively challenged people face different kinds of problems, such as memory loss, forgetfulness and attention problems. The major objective of this distribution is therefore to provide an accessible desktop environment suitable for such users. Its major highlights are a simplified and accessible desktop environment, simplified applications, a tagged file system, tag-based searching, a user activity log and a reminder facility, all specifically aimed at reducing distraction and memory load during computer interaction. These features can be of immense help to such users and their caretakers while using a computer. This distribution is based on Ubuntu 10.04 and offers a number of improvements over the previously released version (version 0.1.1). These improvements have been incorporated based on feedback and suggestions received from various organisations and users.

Major highlights in the current release:

  • Faster tag based searching
  • Facility to add new user-defined image tags
  • Enhanced tag control center to edit/delete existing tags (both textual and image)
  • Enhanced tag control center to add new file extensions for which the tag setting option should be enabled
  • New educational games included (The Number Race and Tux Type)

GNU/Linux distribution for Cognitively Challenged-0.1.2 can be downloaded from here.

More details about the distribution can be accessed at http://www.cdacmumbai.in/glcc.

Details of various enhancements made in the current version can be found at http://nrcfoss.cdacmumbai.in/access/LinuxForCC-0.1.2-docs/ChangeLog_0.1.2.pdf.

Feedback and suggestions about the distribution can be sent to ossd[at]cdac[dot]in.

Does our privacy really matter?

I was reading an article in today’s “The Hindu” titled “Through the PRISM, Big Brother is watching”. The article talks about how the USA’s National Security Agency (NSA), in the name of surveillance and backed by US law, has direct access through a programme called “PRISM” to the servers of all the big companies: Microsoft, Google, Facebook, Apple, AOL etc., as also reported by “The Guardian”. As per the article, it all started with Microsoft, which says “Your privacy is Our priority”, in April 2007, and Apple was the latest company to join the programme, in October 2012. Dropbox may also be included in the programme now. Google, Yahoo, Facebook etc. are part of it as well. With direct access to servers, the NSA can access any kind of stored data as well as real-time data, be it email contents, voice and video chat, photos, documents, search history or file transfers involving any person outside the USA. The article reports that the NSA issues almost 2000 reports every month. On reading this, some questions come to mind:

1. Does the USA have a moral and legal right to access data that does not pertain to a US citizen?

2. Why are these top companies, which project themselves as protectors of free speech and make huge claims about protecting users’ privacy, involved in a secret programme like this?

3. Are these companies under pressure from the government in the name of law and security? If so, and they are compromising users’ privacy, then why make such huge claims about protecting users’ privacy and data?

4. Of course, senators are concerned and companies are also issuing statements. See here. But the most important question that needs to be asked is: does our (users’) privacy really matter (to these companies)?

In the past too, concerns have repeatedly been raised over the approach of these companies towards the protection of privacy and user data, be it Google, Facebook or any other company. Sometimes our data is sold to advertisers to increase revenue; sometimes our privacy is breached in the name of a better user experience.

In India also, concerns have been raised about the government’s intent to spy on users’ online activities and data. Some acts of the government and authorities in the recent past have also fuelled the perception that they want to curb freedom of speech. We all know how our privacy is breached by telecom companies in India despite the efforts of regulatory bodies. We are constant victims of pesky calls and SMSes.

What I think is that all these big companies, whether Internet or telecom, are least interested in protecting our privacy and data. For them, we exist only as commodities that can be bought and sold. Only we can protect our own privacy and data. It is essential to think twice before sharing any kind of personal or important information or document on the Internet or on social networking sites.

One-day workshop on Parikshak — an online program grading system organised by CDAC, Mumbai

Computer programming is an important part of any computer science or information technology course. It is also among the most difficult subjects to teach, since being a good programmer is an acquired skill that must be coupled with knowledge of many different aspects such as abstraction, analysis, structured programming and debugging. Usually, computer programming is taught through a combination of theory classes on concepts and languages, and lab classes on specific languages. Assignments are given, which are manually graded by the instructor. Manual grading of programs is very tedious and time consuming. While evaluating student assignments, the teacher has to act as a compiler/interpreter, inspecting each line of the program and judging whether the overall program would work correctly. While this grading approach works fine for a small number of simple assignments, it gets unwieldy as the complexity and number of assignments increase. This, in turn, results in inadequate attention to the practical programming skills of students.

Parikshak is a web-based system which offers a solution to this. A tool like Parikshak, which facilitates automated evaluation of software programs, can significantly reduce the load on the faculty and give direct feedback to students, thereby leading to efficient handling of programming assignments and exams. Parikshak allows teachers to define programming assignments systematically, allows registered students to attempt solving them online, and automatically assesses the solutions. The feedback is available to the students and the faculty, for follow-up and record. Using Parikshak to manage the programming component of a course clearly offers many advantages.
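The workflow described above (a teacher defines an assignment, a student submits a solution, and the system assesses it automatically) can be illustrated with a minimal sketch. This is not Parikshak's actual implementation, which this announcement does not describe. The function name `grade_submission`, the test-case format of (input, expected output) pairs, and the exact-match comparison of output are all assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def grade_submission(source_code, test_cases, timeout_seconds=5):
    """Run a Python submission against (stdin, expected stdout) pairs.

    Hypothetical helper, for illustration only. Returns a list with one
    pass/fail flag per test case and the fraction of tests passed.
    """
    results = []
    with tempfile.TemporaryDirectory() as workdir:
        # Save the student's program so it can run as a separate process.
        script = Path(workdir) / "submission.py"
        script.write_text(source_code)
        for stdin_text, expected in test_cases:
            try:
                completed = subprocess.run(
                    [sys.executable, str(script)],
                    input=stdin_text,
                    capture_output=True,
                    text=True,
                    timeout=timeout_seconds,
                )
                # A test passes if the program exits cleanly and its output
                # matches the expected output, ignoring surrounding whitespace.
                passed = (
                    completed.returncode == 0
                    and completed.stdout.strip() == expected.strip()
                )
            except subprocess.TimeoutExpired:
                # Programs that run too long (e.g. infinite loops) fail the test.
                passed = False
            results.append(passed)
    score = sum(results) / len(results) if results else 0.0
    return results, score


if __name__ == "__main__":
    # Assignment (assumed): read two integers and print their sum.
    submission = "a, b = map(int, input().split())\nprint(a + b)\n"
    tests = [("1 2\n", "3"), ("10 -4\n", "6")]
    print(grade_submission(submission, tests))
```

A real grading system would also need to run submissions in a sandbox with restricted file and network access, since student code is untrusted. The sketch leaves that out.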

During the workshop, we will demonstrate the system, discuss various ways in which the tool can be used in an academic setting, and provide hands-on sessions for faculty. Since we have a limited number of seats, we request you to confirm participation from your colleges by June 12, 2013 at the latest, by email or phone call. Teachers from CS/IT departments are preferred, though others with some programming experience or interest are welcome.

Contact Information

Email: parikshak@cdac.in

Phone: 022-27565303, Extn 301

Contact Person: Ms. Mercy Sobhan

Details of workshop

Date & Timing: Sat, 15 June, 2013, 10:00 AM – 5:00 PM

Venue: CDAC Kharghar

Registration Fees: Rs 600/-

Tea and Lunch will be provided to all participants

The registration fee has to be paid by Demand Draft / Cheque (local or at par) drawn in favour of CDAC, payable at Mumbai. Please mention your name, organisation and contact no. on the back of the Demand Draft / Cheque.

Flyer: parikshak

A glimpse of current state of Indian IT industry

Recently, I read an article about this year’s Google Code Jam contest, which is a sort of Olympics for programming skills. At the qualifying stage, 17% of the contestants were Indians. By the third round, as the competition intensified, Indians were down to 0.7%.

Although India began with the highest number of participants, just three Indians lasted till round three, compared to 83 Chinese, 77 Russians, 36 Japanese, 25 Americans, 21 Poles, 13 Belarusians, 11 South Koreans etc. In fact, the code Olympics look much like the sports Olympics. So much for the IT superpower. – Dhirendra Kumar, CEO, Value Research

Read the entire story here.