Python Read Text File Into Numpy Array

Welcome to another module on numpy. In our previous module, we got an overview of numpy in Python. But the task becomes harder when dealing with text files or CSV files, as a file can hold a huge amount of information. To make this task easier, we will use the numpy module in Python. If you have not studied numpy, I would recommend studying my previous tutorial to understand numpy.

Introduction

One of the difficult tasks when working with data is loading it properly. The most common way data is formatted is CSV. You might wonder if there is a direct way to import the contents of a CSV file into a record array, much in the way that we do in R programming.

Why is the CSV file format used?

CSV is a plain-text format that makes data manipulation easier and is easy to import into a spreadsheet or database. For example, you might want to export certain statistics to a CSV file and then import it into a spreadsheet for further data analysis. It also makes working with the data programmatically in Python very easy. Python supports text file and string manipulation with CSV files directly.

Ways to load CSV file in python

There are many ways to load a CSV file in Python. The four common approaches are the following:

  1. Load CSV using numpy.
  2. Using Standard Library function.
  3. Load CSV using pandas.
  4. Using PySpark.

Out of all of these, today we will focus on how to read a CSV file into a numpy array. Moving ahead, let's see how Python natively handles CSV.

Reading of a CSV file with numpy in python

As mentioned before, numpy is used extensively by data scientists and machine learning engineers because they have to work a lot with data that is mostly stored in CSV files. numpy makes it a lot easier for the data scientist to work with CSV files. The six ways to read a CSV file into a numpy array in Python are:

  1. Without using any library.
  2. numpy.loadtxt() function
  3. Using numpy.genfromtxt() function
  4. Using the CSV module.
  5. Use a Pandas dataframe.
  6. Using PySpark.

1. Without using any library

Sounds unreal, right? But with the help of Python, we can achieve anything. There is a built-in function provided by Python called 'open' through which we can read any CSV file. The open built-in function returns a file object from which the contents of the CSV file can be read as strings. Let us go to the syntax to make it clearer.

Syntax:-

open('File_name')

Parameter

All we need to do is pass the file name as a parameter to the open built-in function.

Return value

It returns a file object. Iterating over it yields each line of the file as a string.

Let's do some coding.

file_data = open('sample.csv')
# Each row is one line of the file, returned as a string
for row in file_data:
    print(row)

OUTPUT:-

Name,Hire Date,Salary,Sick Days Left
Graham Bell,03/15/19,50000.00,10
John Cleese,06/01/18,65000.00,8
Kimmi Chandel,05/12/20,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/20,48000.00,7
Michael Palin,05/23/20,66000.00,8
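A common variation on the loop above is to open the file in a with block, so that it is closed automatically, and to strip the trailing newline from each line before splitting it on commas. The sketch below is only an illustration under stated assumptions: the two-column data is hypothetical, and it is written to a temporary sample.csv first so that the example runs on its own.

```python
import os
import tempfile

# Hypothetical sample data, written to a temporary file so the example is self-contained.
csv_text = "Name,Salary\nGraham Bell,50000.00\nJohn Cleese,65000.00\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")
    with open(path, "w") as f:
        f.write(csv_text)

    rows = []
    # The with block closes the file automatically when the block ends.
    with open(path) as file_data:
        for line in file_data:
            # Each line is a string ending in "\n"; strip it, then split on commas.
            rows.append(line.rstrip("\n").split(","))

print(rows)
```

Splitting on commas by hand does not handle quoted fields that themselves contain commas, which is one reason the csv module discussed later is usually preferred.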

2. Using numpy.loadtxt() function

It is used to load text file data in Python. numpy.loadtxt() is similar to the function numpy.genfromtxt() when no data is missing.

Syntax:

numpy.loadtxt(fname)

The default data type (dtype) parameter for numpy.loadtxt() is float.

import numpy as np

# delimiter="," is needed for comma-separated values; the default delimiter is whitespace
data = np.loadtxt("sample.csv", delimiter=",", dtype=int)
print(data)  # Text file data converted to integer data type

OUTPUT:-

[[1 2 3]
 [4 5 6]]

Explanation of the code

  1. Imported the numpy library with the alias np.
  2. Loaded the CSV file and converted the file data into the integer data type by using dtype.
  3. Printed the data variable to get the desired output.
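Real CSV files often start with a header row and contain columns that are not needed. As a minimal sketch, assuming a small hypothetical file with a header line, the example below uses the delimiter, skiprows, and usecols arguments of numpy.loadtxt(). An in-memory StringIO object stands in for a file on disk so that the example is self-contained.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV content with a header row; StringIO stands in for a file on disk.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

# delimiter="," splits on commas and skiprows=1 skips the header line.
data = np.loadtxt(StringIO(csv_text), delimiter=",", skiprows=1)
print(data)        # values are floats because dtype defaults to float
print(data.dtype)

# usecols selects columns by zero-based index; dtype=int converts the values to integers.
first_and_last = np.loadtxt(
    StringIO(csv_text), delimiter=",", skiprows=1, usecols=(0, 2), dtype=int
)
print(first_and_last)
```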

3. Using numpy.genfromtxt() function

The genfromtxt() function is used quite often to load data from text files in Python. We can read data from CSV files using this function and store it in a numpy array. This function has many arguments available, making it a lot easier to load the data in the desired format. We can specify the delimiter, deal with missing values, delete specified characters, and specify the data type of the data using the different arguments of this function.

Let's write some code to make the concept clearer.

Syntax:

numpy.genfromtxt(fname)

Parameter

The parameter is usually the CSV file name that you want to read. Other than that, we can specify delimiter, names, etc. The other optional parameters are the following:

Name            Description
fname           File, file name, or list to read.
dtype           The data type of the resulting array. If None, the data type will be determined by the content of each column.
comments        The character that marks the start of a comment. All characters occurring on a line after a comment are discarded.
delimiter       The string used to separate values. By default, any whitespace occurring consecutively acts as a delimiter.
skip_header     The number of lines to skip at the beginning of a file.
skip_footer     The number of lines to skip at the end of a file.
missing_values  The set of strings corresponding to missing data.
filling_values  The set of values that should be used when some data is missing.
usecols         The columns that should be read, with 0 being the first. For instance, usecols = (1, 4, 5) will extract the 2nd, 5th and 6th columns.
Description of the parameters

Return value

It returns an ndarray.

from numpy import genfromtxt

# Split on commas and skip the first (header) line of the file
data = genfromtxt('sample.csv', delimiter=',', skip_header=1)
print(data)

OUTPUT:

[[1. 2. 3.]
 [4. 5. 6.]]

Explanation of the code

  1. Imported genfromtxt from the numpy package.
  2. Stored the data in the variable data, which holds the ndarray returned by passing the file name, delimiter, and skip_header as parameters.
  3. Printed the variable to get the output.
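The handling of missing values mentioned above can be sketched as follows. This is an illustrative example on hypothetical data, not the tutorial's own file: one field is left empty, and the example compares the default behaviour (the gap becomes nan) with the filling_values argument. StringIO is used in place of a file on disk.

```python
from io import StringIO

import numpy as np

# Hypothetical CSV content where the middle value of the first data row is missing.
csv_text = "a,b,c\n1,,3\n4,5,6\n"

# By default, an empty field in a float column is loaded as nan.
with_nan = np.genfromtxt(StringIO(csv_text), delimiter=",", skip_header=1)
print(with_nan)

# filling_values replaces missing entries with the given value instead.
filled = np.genfromtxt(
    StringIO(csv_text), delimiter=",", skip_header=1, filling_values=0
)
print(filled)
```

numpy.loadtxt() would raise an error on the same input, which is the practical difference between the two functions when data is missing.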

4. Using the CSV module in python

The csv module is used to read and write data to CSV files efficiently in Python. This method reads the data from a CSV file using this module and stores it in a list. It then converts this list to a numpy array.

The code below will explain this.

import csv
import numpy as np

with open('sample.csv', 'r') as f:
    # csv.reader returns every field as a string
    data = list(csv.reader(f, delimiter=","))

# dtype=float converts the string fields to numbers
data = np.array(data, dtype=float)
print(data)

OUTPUT:-

[[1. 2. 3.]
 [4. 5. 6.]]

Explanation of the code

  1. Imported the csv module.
  2. Imported numpy, as we want to use the numpy.array feature in Python.
  3. Opened the file sample.csv in read mode, as indicated by 'r'.
  4. After separating the values using a delimiter, we store the data in array form using numpy.array.
  5. Printed the data to get the desired output.
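When the file has a header row, the header has to be separated from the numeric rows before the conversion, because text such as column names cannot be converted to float. The sketch below shows one way to do this under the assumption of a small hypothetical file with a header; StringIO stands in for the opened file.

```python
import csv
from io import StringIO

import numpy as np

# Hypothetical CSV content with a header row; StringIO stands in for an opened file.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"

reader = csv.reader(StringIO(csv_text), delimiter=",")
header = next(reader)   # consume the first row and keep the column names
rows = list(reader)     # remaining rows, each a list of strings

# The string fields are converted to floats when the array is built.
data = np.array(rows, dtype=float)

print(header)
print(data)
```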

5. Use a Pandas dataframe in python

We can use a pandas dataframe to read CSV data into an array in Python. We can do this by using the values attribute. For this, we read the CSV file into a dataframe and then convert it into a numpy array by using the values attribute of the dataframe.

from pandas import read_csv

df = read_csv('sample.csv')
# values exposes the dataframe contents as a numpy array
data = df.values
print(data)

OUTPUT:-

[[1 2 3]
 [4 5 6]]

To show some of the power of pandas' CSV capabilities, I've created a slightly more complicated file to read, called hrdataset.csv. It contains data on company employees:

hrdataset CSV file

Name,Hire Date,Salary,Sick Days Left
Graham Bell,03/15/19,50000.00,10
John Cleese,06/01/18,65000.00,8
Kimmi Chandel,05/12/20,45000.00,10
Terry Jones,11/01/13,70000.00,3
Terry Gilliam,08/12/20,48000.00,7
Michael Palin,05/23/20,66000.00,8

import pandas

dataframe = pandas.read_csv('hrdataset.csv')
print(dataframe)

OUTPUT:-

            Name Hire Date   Salary  Sick Days Left
0    Graham Bell  03/15/19  50000.0              10
1    John Cleese  06/01/18  65000.0               8
2  Kimmi Chandel  05/12/20  45000.0              10
3    Terry Jones  11/01/13  70000.0               3
4  Terry Gilliam  08/12/20  48000.0               7
5  Michael Palin  05/23/20  66000.0               8
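A dataframe like this one mixes text and numbers, so converting all of it at once gives an array of generic Python objects. As a rough sketch, assuming only the numeric columns are needed, the example below selects them by name and converts them with DataFrame.to_numpy(), which pandas offers alongside the values attribute. The data is a shortened copy of the file above, read from a StringIO object so the example is self-contained.

```python
from io import StringIO

import pandas as pd

# A shortened copy of hrdataset.csv; StringIO stands in for the file on disk.
csv_text = (
    "Name,Hire Date,Salary,Sick Days Left\n"
    "Graham Bell,03/15/19,50000.00,10\n"
    "John Cleese,06/01/18,65000.00,8\n"
)

df = pd.read_csv(StringIO(csv_text))

# Select only the numeric columns, then convert them to a numpy array.
numeric = df[["Salary", "Sick Days Left"]].to_numpy()

print(numeric)
print(numeric.dtype)
```

Because one selected column holds floats and the other integers, the resulting array uses a common float type for both.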

6. Using PySpark in Python

Reading and writing data in Spark is an important task. Generally, it is the starting point for any form of big data processing. There are different ways to read a CSV file using PySpark in Python, and it helps to know the core syntax for reading data before moving on to the specifics.

Syntax:-

spark.read.format("...").option("key", "value").schema(…).load()

Parameters

DataFrameReader is the foundation for reading data in Spark; it can be accessed via the spark.read attribute.

  1. format: specifies the file format, such as CSV, JSON, or parquet. The default is parquet.
  2. option: a set of key-value configurations. It specifies how to read the data.
  3. schema: optional; used to supply a schema explicitly instead of inferring it from the data source.

3 ways to read a CSV file using PySpark in python.

  1. df = spark.read.format("csv").option("header", "true").load(filepath)
  2. df = spark.read.format("csv").option("inferSchema", "true").load(filepath)
  3. df = spark.read.format("csv").schema(csvSchema).load(filepath)

Let's do some coding to understand.

# The parentheses let the chained calls span several lines
diamonds = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv")
)

OUTPUT:-

[Image: diamonds]

Conclusion

This article has covered the different ways to read data from a CSV file into a numpy array. This brings us to the end of our article, "How to read CSV File in Python using numpy." I hope you are clear on all the concepts related to CSV, how to read it, and the different parameters used. If you understand the basics of reading CSV files, you won't ever be caught flat-footed when importing data.

Make sure you practice as much as possible and gain more experience.

Got a question for us? Please mention it in the comments section of this "6 ways to read CSV File with numpy in Python" article, and we will get back to you as soon as possible.

FAQs

  1. How do I skip the first line of a CSV file in python?

Ans:- Use csv.reader() and next() if you are not using any other library. Let's code to understand.

Let us consider the following sample.csv file to understand.

sample.csv

fruit,count
apple,1
banana,2

import csv

file = open('sample.csv')
csv_reader = csv.reader(file)
# next() consumes the first (header) row so the loop starts at the data
next(csv_reader)

for row in csv_reader:
    print(row)

OUTPUT:-

['apple', '1']
['banana', '2']

As you can see, the first line, which had fruit,count, is eliminated.
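One detail worth knowing is that next() raises StopIteration when the file is empty. A small sketch of a more defensive version follows; the helper name read_without_header is hypothetical and is used only for illustration, and StringIO stands in for the opened file. Passing None as the second argument to next() makes it return that default instead of raising.

```python
import csv
from io import StringIO


def read_without_header(text):
    """Return all rows after the first line of the given CSV text."""
    reader = csv.reader(StringIO(text))
    # The second argument is returned instead of raising StopIteration on empty input.
    next(reader, None)
    return list(reader)


rows = read_without_header("fruit,count\napple,1\nbanana,2\n")
empty = read_without_header("")

print(rows)
print(empty)
```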

  2. How do I count the number of rows in a csv file?

Ans:- Use len() and list() on a csv reader to count the number of lines.

Let's go to this sample.csv data

1,2,3
4,5,6
7,8,9

import csv

file_data = open("sample.csv")
reader = csv.reader(file_data)
# list() reads every row; len() then counts them
Count_lines = len(list(reader))
print(Count_lines)

OUTPUT:-

3

As you can see, the sample.csv file had three rows, and that count was displayed with the help of the len() function.
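Calling list() on the reader loads every row into memory at once, which can be costly for a very large file. As an alternative sketch, not part of the original answer, the rows can be counted one at a time with a generator expression. The same three-row data is used here, with StringIO standing in for the opened file.

```python
import csv
from io import StringIO

# The same three-row data as above; StringIO stands in for the opened file.
csv_text = "1,2,3\n4,5,6\n7,8,9\n"

reader = csv.reader(StringIO(csv_text))

# Add 1 for each row as it is read, without building a list of all rows.
count_lines = sum(1 for _ in reader)

print(count_lines)
```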


Source: https://www.pythonpool.com/numpy-read-csv/
